Using containers on the Nef cluster

Containers have become a very popular way of packaging software. The most widely used format is currently the Docker container image. One problem with Docker is that it cannot be used on Nef: it requires a root daemon, which would bypass the OAR scheduler running on the Nef nodes. We therefore need a way to run containers as an unprivileged user inside a job on Nef. To do this, we can use the Singularity software, which provides the same functionality as Docker and can even import Docker images.

Create your own container

You can create containers on your local machine and then upload the container image to Nef.
You first have to install Singularity on your workstation or laptop. If you are using a Fedora system, you can simply install it with dnf:

dnf install singularity
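
To check that the installation worked, you can print the installed version (the exact version reported will depend on your distribution's packages):

singularity --version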

To create your own container, you have to write a Singularity recipe (see the Singularity documentation for more information):

This example will create a container based on a CentOS 6.9 image, using Docker as a bootstrap source, and then install a few extra packages. (This example is used to build conda packages: we need an old distribution in order to build binary packages compatible with a wide range of operating systems.)

# Build from the centos:6.9 image on Docker Hub
BootStrap: docker
From: centos:6.9
MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearch/
Include: yum

# Commands executed inside the container after the base image is unpacked
%post
    echo "Hello from inside the container"
    yum -y install gcc gcc-c++ glew-devel mesa-libGL-devel freeglut-devel libICE-devel libSM-devel libXt-devel libXrender-devel libX11-devel openssh-clients psutils rsync

By convention, the recipe should be named Singularity. You can then run the following command to build the CentOS container image:

sudo singularity build centos6.simg Singularity
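
If you need to iterate on the recipe, Singularity can also build into a writable sandbox directory instead of a compressed image, which is convenient for local testing. A minimal sketch (the directory name centos6/ is just an example):

sudo singularity build --sandbox centos6/ Singularity
sudo singularity shell --writable centos6/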

Run your container on Nef

You can then upload the image to Nef and reserve a node to run the container:

scp centos6.simg nef-devel:
oarsub -I -l /nodes=1 

Once you have an interactive shell, you have to load the singularity module to be able to run the singularity command.

$ module load singularity/2.5.2
$ singularity exec centos6.simg cat /etc/centos-release
CentOS release 6.9 (Final)

So you can see that the container is indeed running a CentOS 6 system.
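
By default, Singularity mounts your home directory inside the container. If your job needs access to other directories of the cluster file system, you can bind them explicitly with the -B option (the /data path below is only an illustration, adapt it to the storage paths you actually use):

singularity exec -B /data:/data centos6.simg ls /data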

Import a Docker image and use it on a GPU node

You can also import existing Docker containers that include many libraries and tools for deep learning computations.

For example, TensorFlow is available through Docker images on Docker Hub. Let's see how we can run the TensorFlow tutorials on the Nef cluster with Singularity.

We will use example files from https://github.com/tensorflow/models.git
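
Assuming git is available on the frontend, you can clone the repository next to your container image (the tutorials/ layout used below matches the repository at the time of writing and may have changed since):

git clone https://github.com/tensorflow/models.git
cd models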

First, reserve a GPU node; we need at least a Maxwell GPU, so we add the gpucapability>='5.1' property:
oarsub -I -p "gpu='YES' AND gpucapability>='5.1'" -l /gpunum=1,walltime=1

Once the job has started, load the Singularity module from the interactive shell:
module load singularity/2.5.2

Now we have two options: either run an ephemeral container, or create a container image on the file system that can be reused several times. To simply run an ephemeral container and get an interactive shell in it, type the following command (note the --nv option, which makes the NVIDIA GPU devices available inside the container):

singularity shell --nv docker://tensorflow/tensorflow:latest-gpu
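
From that shell, you can quickly check that TensorFlow sees the GPU (this assumes a TensorFlow 1.x image, where tf.test.is_gpu_available() exists):

python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"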

The other option is to create a local container file:
singularity pull docker://tensorflow/tensorflow:latest-gpu

This will create a tensorflow-latest-gpu.simg file.
Now we can run:
singularity shell --nv ./tensorflow-latest-gpu.simg

Inside your container, you can then run the MNIST example from the models repository:
python tutorials/image/mnist/convolutional.py

We can see in the terminal output that TensorFlow has found the NVIDIA device and will use it:

2018-09-06 13:58:10.979745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.74GiB
2018-09-06 13:58:10.979789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-06 13:58:14.284711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-06 13:58:14.284753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 
2018-09-06 13:58:14.284761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N 
2018-09-06 13:58:14.285076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10389 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Initialized!

You can also run the script without an interactive shell:
singularity exec --nv ./tensorflow-latest-gpu.simg python tutorials/image/mnist/convolutional.py
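
The same commands also work in a non-interactive batch job. Here is a minimal sketch, assuming the image and the models repository are in your home directory; run_mnist.sh is a hypothetical wrapper script name and the paths should be adapted to your own setup:

#!/bin/bash
# run_mnist.sh -- hypothetical wrapper script, adapt paths to your setup
module load singularity/2.5.2
singularity exec --nv $HOME/tensorflow-latest-gpu.simg python $HOME/models/tutorials/image/mnist/convolutional.py

Make the script executable and submit it with oarsub instead of opening an interactive session:

chmod +x run_mnist.sh
oarsub -p "gpu='YES' AND gpucapability>='5.1'" -l /gpunum=1,walltime=1 ./run_mnist.sh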
