Wednesday Bonus Exercise 2.2: Use Singularity to Run TensorFlow

In this tutorial, we submit a TensorFlow job to the OSG using Singularity containers. We currently offer CPU and GPU containers for TensorFlow (both based on Ubuntu). Here, we focus on the CPU container.

Setup

You should still be logged into training.osgconnect.net (the OSG Connect submit server for this workshop).

Get the example files and understand the job requirements.

In order to run this example quickly, you can download all the files into a new folder using the tutorial command:

username@training $ tutorial tensorflow-matmul

This creates a directory named tutorial-tensorflow-matmul. Change into the directory and list its contents.

username@training $ cd tutorial-tensorflow-matmul
username@training $ ls -F

You will see the following files:

tf_matmul.py            (Python program that multiplies two matrices using the TensorFlow package)
tf_matmul.submit        (HTCondor job description file)
tf_matmul_wrapper.sh    (Job wrapper shell script that executes the Python program)
tf_matmul_gpu.submit    (HTCondor job description file targeting GPUs)

NOTE: The file tf_matmul_gpu.submit targets GPUs, which this exercise does not cover. You are welcome to take a look.

The Python script `tf_matmul.py` uses TensorFlow to perform a matrix multiplication on a `2x2` matrix.
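To make the arithmetic concrete, here is a minimal, library-free sketch of a `2x2` matrix multiplication. It is not the tutorial's `tf_matmul.py` and does not use TensorFlow. It assumes, based only on the identity-matrix result expected at the end of this exercise, that the job multiplies a matrix by its inverse. The input values and the helper names `matmul_2x2` and `inverse_2x2` are hypothetical.

```python
# Illustrative only: NOT the tutorial's tf_matmul.py, and no TensorFlow is used.
# Assumption: the job multiplies a 2x2 matrix by its inverse.


def matmul_2x2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def inverse_2x2(m):
    """Invert a 2x2 matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical input values, chosen only for this illustration.
matrix = [[1.0, 2.0], [3.0, 4.0]]
product = matmul_2x2(matrix, inverse_2x2(matrix))
print(product)
```

With single-precision floats, as TensorFlow commonly uses, a product like this can differ from the exact identity matrix by small rounding errors.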

The submit file has requirements and options similar to those of our previous job, including:

Requirements = HAS_SINGULARITY == True

In addition, we also provide the full path of the image via the keyword +SingularityImage.

+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest"

Submit the tensorflow example job

Now submit the job to the OSG.

username@training $ condor_submit tf_matmul.submit 

The job will look for a machine on the OSG that has Singularity installed. On a matched machine, the job creates the Singularity container from the image /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest. Inside this container, the program tf_matmul.py begins to execute.

After your job completes, you will see an output file named tf_matmul.output.

username@training $ cat tf_matmul.output 
result of matrix multiplication
===============================
[[ 1.0000000e+00  0.0000000e+00]
 [-4.7683716e-07  1.0000002e+00]]
===============================

The result printed in the output file should be a 2x2 identity matrix, up to floating-point rounding error (note the small deviations above, such as -4.7683716e-07).
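If you would rather check the result programmatically than by eye, the following sketch parses text in the format shown above and tests whether the matrix is close to the identity. The helper names `parse_matrix` and `is_identity` and the tolerance of `1e-5` are illustrative assumptions, not part of the tutorial files.

```python
import math
import re

# Matches plain or scientific-notation numbers such as 1.0000002e+00.
NUMBER = re.compile(r"[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?")


def parse_matrix(text):
    """Extract rows of floats from lines that contain an opening bracket."""
    rows = []
    for line in text.splitlines():
        if "[" not in line:
            continue  # skip the title line and the '=====' separators
        rows.append([float(n) for n in NUMBER.findall(line)])
    return rows


def is_identity(matrix, tol=1e-5):
    """Return True if matrix is square and within tol of the identity."""
    n = len(matrix)
    if n == 0 or any(len(row) != n for row in matrix):
        return False
    return all(
        math.isclose(matrix[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
        for i in range(n)
        for j in range(n)
    )


# Sample text copied from the output shown in this exercise.
sample = """result of matrix multiplication
===============================
[[ 1.0000000e+00  0.0000000e+00]
 [-4.7683716e-07  1.0000002e+00]]
===============================
"""

print(is_identity(parse_matrix(sample)))
```

To check your own job, replace `sample` with the contents of the file, for example `open("tf_matmul.output").read()`.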