Last modified on 01 Oct 2021.

👉 Docker note.

WSL + Windows

Make WSL2 recognize GPU on Windows 10 👉 Check this tut.

If you meet the error “Your insider preview build settings need attention” and restarting many times doesn’t solve the problem, 👉 go to Account settings, then choose “Verify”.
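Once that’s fixed, a quick way to confirm that Docker inside WSL2 actually sees the GPU (a minimal sketch; the CUDA image tag is illustrative):

# inside the WSL2 distro, assuming Docker is already set up for GPU support
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi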

With Tensorflow or PyTorch

👉 Official doc for TF + docker
👉 My note for docker + TF.
👉 An example of docker pytorch with gpu support.
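A quick sanity check that a framework container actually sees the GPU (a sketch; the image tags are illustrative):

# TensorFlow: should list at least one physical GPU
docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# PyTorch: should print True
docker run --rm --gpus all pytorch/pytorch \
    python -c "import torch; print(torch.cuda.is_available())"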

Basic installation

It works perfectly on Pop!_OS 20.04,

sudo apt update
sudo apt install -y nvidia-container-runtime
sudo apt install -y nvidia-container-toolkit
sudo apt install -y nvidia-cuda-toolkit
# restart required

Check info

# verify that your computer has a graphics card
lspci -nn | grep '\[03'
# First, install drivers and check
nvidia-smi
# output: NVIDIA-SMI 450.80.02 Driver Version: 450.80.02    CUDA Version: 11.0
# this is the maximum CUDA version that your driver supports
# check current version of cuda
nvcc --version
# If nvcc is not found, it may be in /usr/local/cuda/bin/
# Add this location to PATH
# modify ~/.zshrc or ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH
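# then reload your shell config and re-check
source ~/.bashrc  # or: source ~/.zshrc
nvcc --version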

# You may need to install
sudo apt install -y nvidia-cuda-toolkit
# install and check nvidia-docker
dpkg -l | grep nvidia-docker
# or
nvidia-docker version
# Verifying --gpus option under docker run
docker run --help | grep -i gpus
# output: --gpus gpu-request GPU devices to add to the container ('all' to pass all GPUs)
# Listing out GPU devices
docker run -it --rm --gpus all ubuntu nvidia-smi -L
# output: GPU 0: GeForce GTX 1650 (...)
# Verifying again with nvidia-smi
docker run -it --rm --gpus all ubuntu nvidia-smi
# test a working setup with nvidia-container-toolkit
docker run --rm --gpus all nvidia/cuda nvidia-smi
# test a working setup with nvidia-container-runtime
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
# note: newer nvidia/cuda images may require an explicit tag, e.g. nvidia/cuda:11.0-base

# Error response from daemon: Unknown runtime specified nvidia.
# Search below for "/etc/docker/daemon.json"
# Maybe it helps.

Install nvidia-docker2

This package is the only docker-specific package of the set. It takes the runtime script associated with nvidia-container-runtime and registers it in docker’s /etc/docker/daemon.json file for you. This then allows you to run (for example) docker run --runtime=nvidia ... to automatically add GPU support to your containers. It also installs a wrapper script around the native docker CLI called nvidia-docker, which lets you invoke docker without needing to specify --runtime=nvidia every single time. Finally, it lets you set an environment variable on the host (NV_GPU) to specify which GPUs should be injected into a container.
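In practice this gives three equivalent ways to launch a GPU container (a sketch; the image tag is illustrative):

# plain docker CLI with the explicit runtime flag
docker run --runtime=nvidia --rm nvidia/cuda:11.0-base nvidia-smi
# the nvidia-docker wrapper adds --runtime=nvidia for you
nvidia-docker run --rm nvidia/cuda:11.0-base nvidia-smi
# the wrapper also reads NV_GPU to inject only specific GPUs
NV_GPU=0 nvidia-docker run --rm nvidia/cuda:11.0-base nvidia-smi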

👉 Official guide to install.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-docker2

# restart docker
sudo systemctl restart docker
# check version
nvidia-docker version

Difference: nvidia-container-toolkit vs nvidia-container-runtime

👉 What’s the difference between the latest nvidia-docker and nvidia container runtime?

In this note, for Docker 19.03+ (check with docker --version), he says that nvidia-container-toolkit is used for --gpus (in docker run ...), while nvidia-container-runtime is used for --runtime=nvidia (which can also be used in a docker-compose file).

However, if you want to use Kubernetes with Docker 19.03, you actually need to continue using nvidia-docker2 because Kubernetes doesn’t support passing GPU information down to docker through the --gpus flag yet. It still relies on the nvidia-container-runtime to pass GPU information down the stack via a set of environment variables.
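That env-var mechanism looks like this (a sketch; the image tag is illustrative):

# the nvidia runtime injects GPUs based on NVIDIA_VISIBLE_DEVICES
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:11.0-base nvidia-smi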

👉 Installation Guide — NVIDIA Cloud Native Technologies documentation

Using docker-compose?

Purpose?

# instead of using
docker run \
    --gpus all \
    --name docker_thi_test \
    --rm \
    -v abc:abc \
    -p 8888:8888 \
    tensorflow/tensorflow:latest-gpu-jupyter # the image to run, for example
# we use this with docker-compose.yml
docker-compose up
# check version of docker-compose
docker-compose --version
# If "version" in docker-compose.yml < 2.3
# Modify: /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
# restart our docker daemon
sudo pkill -SIGHUP dockerd
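# verify that the nvidia runtime is registered
docker info | grep -i runtime
# expect "nvidia" among the runtimes and as the default runtime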
# If "version" in docker-compose.yml >= 2.3
# docker-compose.yml => able to use "runtime"
version: '2.3' # MUST BE >=2.3 AND <3
services:
  testing:
    image: nvidia/cuda:10.2-base # for example; any GPU-ready image
    ports:
      - "8000:8000"
    runtime: nvidia
    volumes:
      - ./object_detection:/object_detection

👉 Check more in my repo my-dockerfiles on Github.

Run the test,

docker pull tensorflow/tensorflow:latest-gpu-jupyter
mkdir -p ~/Downloads/test/notebooks

Without using docker-compose.yml (tensorflow) (cf. this note for more)

docker run --name docker_thi_test -it --rm -v $(realpath ~/Downloads/test/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter

With docker-compose.yml?

# ~/Downloads/test/Dockerfile
FROM tensorflow/tensorflow:latest-gpu-jupyter
# ~/Downloads/test/docker-compose.yml
version: '2'
services:
  jupyter:
    container_name: 'docker_thi_test'
    build: .
    volumes:
        - ./notebooks:/tf/notebooks # notebook directory
    ports:
        - 8888:8888 # exposed port for jupyter
    environment:
        - NVIDIA_VISIBLE_DEVICES=0 # which gpu do you want to use for this container
        - PASSWORD=12345

Then run (note that docker-compose run ignores the ports mapping unless --service-ports is given),

docker-compose run --rm --service-ports jupyter

Make NVIDIA work in docker (Linux)

This section still works (as of 26-Oct-2020), but it’s outdated compared with the newer methods above.

Idea: use the NVIDIA driver of the base machine; don’t install any driver inside docker!

  1. First, make sure your base machine has an NVIDIA driver.

     # list all gpus
     lspci -nn | grep '\[03'
    
     # check nvidia & cuda versions
     nvidia-smi
    
  2. Install nvidia-container-runtime

     curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
     distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    
     curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
    
     sudo apt-get update
    
     sudo apt-get install nvidia-container-runtime
    
  3. Note that we cannot use docker-compose.yml in this case!!!
  4. Create an image img_datas with the following Dockerfile:

     FROM nvidia/cuda:10.2-base
    
     RUN apt-get update && \
         apt-get -y upgrade && \
         apt-get install -y python3-pip python3-dev locales git
    
     # install dependencies
     COPY requirements.txt requirements.txt
     RUN python3 -m pip install --upgrade pip && \
         python3 -m pip install -r requirements.txt
     COPY . .
    
     # default command
     CMD [ "jupyter", "lab", "--no-browser", "--allow-root", "--ip=0.0.0.0"  ]
    
  5. Create a container,

     docker run --name docker_thi --gpus all -v /home/thi/folder_1/:/srv/folder_1/ -v /home/thi/folder_1/git/:/srv/folder_2 -dp 8888:8888 -w="/srv" -it img_datas
    
     # -v: volumes
     # -w: working dir
     # --gpus all: using all gpus on base machine
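
To double-check from the host that the container actually sees the GPUs (a sketch, assuming the container above is running):

# verify the GPU is visible inside the running container
docker exec -it docker_thi nvidia-smi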
    

This article is also very interesting and helpful in some cases.

References

  1. Difference between base, runtime and devel in Dockerfile of CUDA.
  2. Dockerfile on Github of Tensorflow.