Tutorial created by Abel Brown, NVIDIA (abelb@nvidia.com)

It is probably fair to say that access to servers has never been easier. With platforms such as AWS, Azure, and Google GCE we can now launch on-demand servers of all varieties and configurations. This programmable infrastructure (IaaS) helps companies, agencies, and institutions maintain agility as market and mission pressures evolve. However, even with the rise of IaaS, application packaging, configuration, and composition have not advanced despite considerable efforts in configuration management. This is where docker comes in.

Docker is not about full virtualization but rather about the ease of packaging and running applications using Linux containers. The idea is that docker containers wrap a piece of software or application in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries (i.e. anything that can be installed on a server). This guarantees that the software will always run the same everywhere, regardless of the OS/compute environment running the container. Docker also provides portable Linux deployment: containers can be run on any Linux system with kernel 3.10 or later. All major Linux distros have supported Docker since 2014. While containers and virtual machines no doubt have similar resource isolation and allocation benefits, the architectural approach of Linux containers allows containerized applications to be more portable and efficient.

At NVIDIA, we use containers in a variety of ways including development, testing, benchmarking, and of course in production as the mechanism for deploying deep learning frameworks. Using nvidia-docker, a light-weight docker plugin, we can develop and prototype GPU applications on a workstation, and then deploy those applications anywhere that supports GPU containers.

Setup

In the interest of time, we've already configured docker and nvidia-docker. If you're interested in setup details, see Appendix A at the bottom.

First Contact

The simplest way to interact with docker is probably to just ask for the version information


In [17]:
docker --version


Docker version 1.12.3, build 6b644ec

We can ask nvidia-docker for the version information too


In [18]:
nvidia-docker --version


Docker version 1.12.3, build 6b644ec

Notice that the nvidia-docker invocation here simply "passed through" to the docker command itself.

The next best way to get familiar with the docker command line is to ask for --help


In [4]:
docker --help


Usage: docker [OPTIONS] COMMAND [arg...]
       docker [ --help | -v | --version ]

A self-sufficient runtime for containers.

Options:

  --config=~/.docker              Location of client config files
  -D, --debug                     Enable debug mode
  -H, --host=[]                   Daemon socket(s) to connect to
  -h, --help                      Print usage
  -l, --log-level=info            Set the logging level
  --tls                           Use TLS; implied by --tlsverify
  --tlscacert=~/.docker/ca.pem    Trust certs signed only by this CA
  --tlscert=~/.docker/cert.pem    Path to TLS certificate file
  --tlskey=~/.docker/key.pem      Path to TLS key file
  --tlsverify                     Use TLS and verify the remote
  -v, --version                   Print version information and quit

Commands:
    attach    Attach to a running container
    build     Build an image from a Dockerfile
    commit    Create a new image from a container's changes
    cp        Copy files/folders between a container and the local filesystem
    create    Create a new container
    diff      Inspect changes on a container's filesystem
    events    Get real time events from the server
    exec      Run a command in a running container
    export    Export a container's filesystem as a tar archive
    history   Show the history of an image
    images    List images
    import    Import the contents from a tarball to create a filesystem image
    info      Display system-wide information
    inspect   Return low-level information on a container, image or task
    kill      Kill one or more running containers
    load      Load an image from a tar archive or STDIN
    login     Log in to a Docker registry.
    logout    Log out from a Docker registry.
    logs      Fetch the logs of a container
    network   Manage Docker networks
    node      Manage Docker Swarm nodes
    pause     Pause all processes within one or more containers
    port      List port mappings or a specific mapping for the container
    ps        List containers
    pull      Pull an image or a repository from a registry
    push      Push an image or a repository to a registry
    rename    Rename a container
    restart   Restart a container
    rm        Remove one or more containers
    rmi       Remove one or more images
    run       Run a command in a new container
    save      Save one or more images to a tar archive (streamed to STDOUT by default)
    search    Search the Docker Hub for images
    service   Manage Docker services
    start     Start one or more stopped containers
    stats     Display a live stream of container(s) resource usage statistics
    stop      Stop one or more running containers
    swarm     Manage Docker Swarm
    tag       Tag an image into a repository
    top       Display the running processes of a container
    unpause   Unpause all processes within one or more containers
    update    Update configuration of one or more containers
    version   Show the Docker version information
    volume    Manage Docker volumes
    wait      Block until a container stops, then print its exit code

Run 'docker COMMAND --help' for more information on a command.

The format of docker command line interactions is:

docker [OPTIONS] COMMAND [arg...]

and as the help display shows there are a lot of commands to choose from. Don't worry, much like a big city, once we become more familiar with these commands the list won't seem so big. We can start to drill down and get help that is specific to each command. For example, one of the most useful docker commands is images, which lists all images on the host that docker knows about


In [19]:
docker images --help


Usage:	docker images [OPTIONS] [REPOSITORY[:TAG]]

List images

Options:
  -a, --all             Show all images (default hides intermediate images)
      --digests         Show digests
  -f, --filter value    Filter output based on conditions provided (default [])
      --format string   Pretty-print images using a Go template
      --help            Print usage
      --no-trunc        Don't truncate output
  -q, --quiet           Only show numeric IDs

OK, let's now ask docker what container images are available locally on the host


In [20]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nvidia/cuda         8.0-cudnn5-devel    31582c303549        8 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        8 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

Here the output specifies three container images with some general metadata associated with each one. First you'll notice that the images are quite large on average (~ 2 GB) and that each image is associated with a unique ID hash. When containers are created (i.e. via the create command) they are created from images. There is no limit to the number of containers we can create from an image, so it is important that docker associates unique IDs with each image and container. Notice that the REPOSITORY and TAG columns specify more human-readable image labels. The repository loosely corresponds to the image name (i.e. url) and, just as in the version control system Git, images can be modified and "tagged" rather than explicitly changing the image name for each image version.

Here we have the "nvidia/cuda" image with ID 367795fb1051, tagged as the "latest" version of the image (i.e. most current). The deep learning library cuDNN was added to the image and a new image was created under the same name but tagged appropriately as "8.0-cudnn5-devel".

You're probably wondering already "where does docker store these images?". In general, docker works in /var/lib/docker and images are stored in the image subdirectory. For more information and details about where and how docker stores images on the host machine, see here.
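
If you're curious, you can peek at this directory yourself (a quick sketch, assuming a default installation; the exact layout varies by storage driver):

    sudo ls /var/lib/docker        # docker's working directory
    sudo du -sh /var/lib/docker    # total disk space used by images and containers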

For now just know that docker works with "images" and all containers are created from these images. We will go into all the details about creating and modifying images in just a bit. But first, let's actually kick around some containers!

Getting Started with Containers

First things first, let's have docker list all containers using the ps command


In [21]:
docker ps -a


CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

There are probably no containers listed, which is fine because we're going to create some containers from images next. Again, don't forget you can get help for each command with docker [COMMAND] --help. Use this to get additional details on the ps command.

Let's now use the docker command create to initialize a container from the nvidia/cuda:latest image


In [22]:
docker create nvidia/cuda:latest


41524c54cf1c5e4370fbee2d2a6f5965290f59e61e3a8f52e5d3c9a5284d5001

The response we received is the sha256 ID of the generated container, and listing the docker containers again we see this new container now listed


In [23]:
docker ps -a


CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS              PORTS               NAMES
41524c54cf1c        nvidia/cuda:latest   "/bin/bash"         6 seconds ago       Created                                 fervent_kalam

It is important to understand that the container is not actually doing anything right now. We've only "stamped out" a container from an image -- the container is not running at this point. Were the container running, the STATUS column would read "Up ...". OK, so what is the container doing there? Well the answer is "nothing". Think about when we enter commands on the command line -- each time we hit enter we implicitly specify that we would like that command to be executed immediately. You can think of a container as a command that has not yet been executed. This command is wrapped up in the container and has all the resources (libraries etc) needed for successful execution. Speaking of which, let's actually run this container ...

Using the 12-character container ID provided by the docker ps -a command above, we can try to run the container as follows


In [24]:
# copy your CONTAINER ID from the docker ps -a command above
nvidia-docker run 41524c54cf1c


Using default tag: latest
Pulling repository docker.io/library/41524c54cf1c
nvidia-docker | 2016/12/16 20:59:56 Error: image library/41524c54cf1c:latest not found

hm ... we got an ERROR. Using the run command seemed like a good guess! Why does the run command not work here? More on this in the next few cells.

Let's try the start command instead


In [25]:
nvidia-docker start 41524c54cf1c


41524c54cf1c

OK, that looks better. Using the start command, docker echoed back the container ID. Let's have a look at the docker containers again


In [26]:
docker ps -a


CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS                     PORTS               NAMES
41524c54cf1c        nvidia/cuda:latest   "/bin/bash"         5 minutes ago       Exited (0) 3 seconds ago                       fervent_kalam

Now the status says "Exited (0) ...". Notice that the command (i.e. entry point) is /bin/bash. When start was issued, the "COMMAND" was executed, and bash is by definition a command language interpreter that executes commands read from standard input or from a file. However, there were no commands to execute from standard input! Containers can have other entry points -- the reason /bin/bash is used most often is that it allows the container to act more generically as a shell so we can send it additional instructions. Note that all containers have a default entrypoint of "/bin/sh -c" unless otherwise specified.

Here our hands are tied with what the container will do. Each time we issue the start command the container executes the entrypoint "/bin/bash" and since there is nothing on standard input the container simply exits. This is where the run command comes in.

Instead of creating and starting a container explicitly, we can use the run command to execute a command within a particular image by creating a container from that image with the appropriate entrypoint. Let's issue a run command, passing the image ID of the "nvidia/cuda:latest" image as the argument.


In [27]:
# don't forget to use the image id from the "docker images" command
nvidia-docker run 367795fb1051



Notice that the start command takes a container ID as the argument while the run command takes an image ID. Let's have a look at the containers


In [28]:
docker ps -a


CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS                      PORTS               NAMES
c0d4c1b7ec22        367795fb1051         "/bin/bash"         6 seconds ago       Exited (0) 5 seconds ago                        prickly_almeida
41524c54cf1c        nvidia/cuda:latest   "/bin/bash"         5 minutes ago       Exited (0) 40 seconds ago                       fervent_kalam

Now we have an additional container, also created from image 367795fb1051, and both containers have exited.

At this point the run command has done exactly what the start command did (i.e. started a container which executed the entrypoint and exited). However, the docker run command allows us to pass an alternative command to the container (docker run --help). Let's try to pass an alternative instruction.


In [29]:
nvidia-docker run 367795fb1051 nvidia-smi


Fri Dec 16 21:01:03 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   23C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Finally! Just to be clear, the nvidia-smi command was executed within the container -- not on the host. Let's have a look at the containers yet again


In [30]:
docker ps -a


CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS                          PORTS               NAMES
9cedbf73f11d        367795fb1051         "nvidia-smi"        10 seconds ago      Exited (0) 8 seconds ago                            sleepy_leavitt
c0d4c1b7ec22        367795fb1051         "/bin/bash"         36 seconds ago      Exited (0) 35 seconds ago                           prickly_almeida
41524c54cf1c        nvidia/cuda:latest   "/bin/bash"         6 minutes ago       Exited (0) About a minute ago                       fervent_kalam

We now have a new container from image 367795fb1051 but the "COMMAND" has been set to nvidia-smi as instructed by our run command. Just for kicks, let's issue a start command to this new container. Each container gets a unique ID that will change every time this lab is run, so make sure to replace the container ID in the command below with the appropriate container ID listed above.


In [31]:
docker start 9cedbf73f11d


9cedbf73f11d

Now wait a minute, where is our output??


In [32]:
docker ps -a


CONTAINER ID        IMAGE                COMMAND             CREATED              STATUS                          PORTS               NAMES
9cedbf73f11d        367795fb1051         "nvidia-smi"        About a minute ago   Exited (0) 4 seconds ago                            sleepy_leavitt
c0d4c1b7ec22        367795fb1051         "/bin/bash"         About a minute ago   Exited (0) About a minute ago                       prickly_almeida
41524c54cf1c        nvidia/cuda:latest   "/bin/bash"         7 minutes ago        Exited (0) 2 minutes ago                            fervent_kalam

Sure enough, when we check the docker container status it reads "Exited (0) 4 seconds ago", which means that the start command did indeed start the container. The long story short is that the run command automatically provides the standard output from the command specified, whereas start does not forward stdout by default -- we have to explicitly ask. According to the help section for the start command, the option "--attach" attaches STDOUT/STDERR and forwards signals.


In [33]:
docker start --attach 9cedbf73f11d


Fri Dec 16 21:02:19 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   23C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Bingo! We got the output from the command by using the --attach option with the start command. We can check the help for the start command to see that the attach option does indeed give us STDOUT/STDERR


In [60]:
docker start --help


Usage:	docker start [OPTIONS] CONTAINER [CONTAINER...]

Start one or more stopped containers

Options:
  -a, --attach               Attach STDOUT/STDERR and forward signals
      --detach-keys string   Override the key sequence for detaching a container
      --help                 Print usage
  -i, --interactive          Attach container's STDIN

It is reasonable to ask where STDOUT goes when not attached. The answer is that STDOUT/STDERR are piped to the container log file. Each container has a log file associated with it which can be accessed using the logs command


In [38]:
docker logs 9cedbf73f11d


Fri Dec 16 21:01:03 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   23C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Fri Dec 16 21:02:06 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   23C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Fri Dec 16 21:02:19 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   23C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

A few final words on starting and running containers. Keep an eye on the container list when using the run command, as each invocation creates a new container. There is no problem having many (many) containers sitting around but eventually it creates clutter. Remember, containers are meant to be light-weight and disposable. To that end, let's clean up our containers.


In [39]:
# generate a list of container ID from the docker ps command
docker ps -a | awk '{print $1}' | tail -n +2


9cedbf73f11d
c0d4c1b7ec22
41524c54cf1c

In [40]:
# for each container ID use the docker "rm" command to remove/delete the container
for cid in $(docker ps -a | awk '{print $1}' | tail -n +2);do docker rm $cid; done


9cedbf73f11d
c0d4c1b7ec22
41524c54cf1c

All cleaned up!


In [42]:
docker ps -a


CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

So, the run command creates a new container each time, and using the docker ps command we can see each new container as we run commands. However, it can get cumbersome to have to manually clean out containers all the time. The solution is to use the --rm option with the run command. This instructs docker to simply remove the container after execution, which is quite convenient for most situations. Keep in mind, however, that once the container has been deleted it cannot be started again -- it's gone. In general this is the desired workflow since containers are intended to be light-weight, disposable execution units. After all, if you need the container again, no problem, just create another one!
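
For example, a quick sketch of the --rm workflow:

    # the container is removed automatically once the command exits
    nvidia-docker run --rm nvidia/cuda:latest nvidia-smi
    docker ps -a    # the container does not appear in the list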

Summary

So far we have discussed what docker is (i.e. virtual machines vs. Linux containers) and how to view (ps), create, start, run, and rm containers created from docker images. Furthermore, we've investigated various options associated with these docker commands such as --attach and --rm, and familiarized ourselves with how to obtain help for docker and each of the docker commands.

Exercises

Make sure that you're comfortable creating containers and executing commands before moving forward. Docker is quite forgiving, so don't be afraid to try lots of different things out while you explore. Here are a few suggestions (a few starter commands are sketched after the list):

  1. Try launching containers from the other images
  2. Try issuing the run command with the option --rm and confirm the container is cleaned up
  3. It is often useful to give our containers a name; try this with the --name option of the run command
  4. Have the container execute whoami. Think about what user might get returned before you run this.
  5. Maybe try ifconfig inside of the container. Is the MAC address the same every time?
  6. Get the container to ping google (pro tip: use option -c1 so you don't ping forever)
  7. What does the container return when asked for disk usage (i.e. df -h)?
  8. Use the --env option with the run command to set environment variables AWS_S3_BUCKET, AWS_ACCESS_KEY, and AWS_SECRET_KEY
  9. Notice that containers maintain state. Verify this by touching a file in a container and then starting that same container again
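
A few starter commands for the exercises above (image, container, and variable values here are just illustrative):

    nvidia-docker run --rm --name my-test nvidia/cuda:latest whoami
    nvidia-docker run --rm nvidia/cuda:latest ping -c1 google.com
    nvidia-docker run --rm --env AWS_S3_BUCKET=my-bucket nvidia/cuda:latest env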

Food for Thought:

What happens when you execute rm -rf / inside a container?

Diving Deeper into Images

By now you're probably comfortable launching containers from the images that were already available when we started. The next step is understanding how to manage your own images. This includes things like importing images into docker, modifying existing images, exporting images, and of course deleting images.

In the docker world most images have a "parent", meaning that the image was created by modifying some existing image. This is the typical workflow in docker: since it is easy to create images, we simply reuse what already exists so as to be most efficient.

However, there is an essential difference when working with docker images. In the virtual machine world, you modify a 4 GB machine image and then do "save as" to create a new 4 GB machine image that contains your changes/updates. In this way virtual machine images are totally independent but very heavyweight.

When we make modifications to an existing docker image and use this to create our own "new" image, docker does not store two full images. Just as the Git version control system does not make a new copy of a file every time a modification is committed, docker works with images in "layers", so that changes or modifications to an image are stored as light-weight image deltas called "layers". In this way we can take an existing 2 GB docker image and create 10 new images, each with a few modifications of this base image, without having to store 20 GB of new images! That is, in creating a new docker image from a parent, we only have to store the changes to the parent image.

As you might imagine, only having to store deltas to images allows for many many (many!) images to be generated without having to pay the full cost of having all those images around. Therefore in the docker world, images abound since it is efficient and light-weight to generate new images from an existing parent image. Luckily, docker provides ways to manage all these images using "image repositories" so that we can manage images just like we would version-controlled files in a git repository. More on docker repositories later.
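
You can see the layers that make up any local image with the history command; for example:

    docker history nvidia/cuda:latest    # one row per layer, with each layer's size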

New Image from Container Modifications

Let's first update the existing nvidia/cuda image by creating a container that executes apt-get update.


In [43]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nvidia/cuda         8.0-cudnn5-devel    31582c303549        8 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        8 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

In [44]:
nvidia-docker run 367795fb1051 apt-get update


Ign http://archive.ubuntu.com trusty InRelease
Get:1 http://archive.ubuntu.com trusty-updates InRelease [65.9 kB]
Get:2 http://archive.ubuntu.com trusty-security InRelease [65.9 kB]
Ign http://developer.download.nvidia.com  InRelease
Get:3 http://developer.download.nvidia.com  Release.gpg [819 B]
Get:4 http://developer.download.nvidia.com  Release [564 B]
Get:5 http://archive.ubuntu.com trusty Release.gpg [933 B]
Get:6 http://archive.ubuntu.com trusty Release [58.5 kB]
Get:7 http://archive.ubuntu.com trusty-updates/main Sources [480 kB]
Get:8 http://developer.download.nvidia.com  Packages [107 kB]
Get:9 http://archive.ubuntu.com trusty-updates/restricted Sources [5921 B]
Get:10 http://archive.ubuntu.com trusty-updates/universe Sources [214 kB]
Get:11 http://archive.ubuntu.com trusty-updates/main amd64 Packages [1161 kB]
Get:12 http://archive.ubuntu.com trusty-updates/restricted amd64 Packages [20.4 kB]
Get:13 http://archive.ubuntu.com trusty-updates/universe amd64 Packages [505 kB]
Get:14 http://archive.ubuntu.com trusty-security/main Sources [157 kB]
Get:15 http://archive.ubuntu.com trusty-security/restricted Sources [4621 B]
Get:16 http://archive.ubuntu.com trusty-security/universe Sources [54.9 kB]
Get:17 http://archive.ubuntu.com trusty-security/main amd64 Packages [700 kB]
Get:18 http://archive.ubuntu.com trusty-security/restricted amd64 Packages [17.0 kB]
Get:19 http://archive.ubuntu.com trusty-security/universe amd64 Packages [191 kB]
Get:20 http://archive.ubuntu.com trusty/main Sources [1335 kB]
Get:21 http://archive.ubuntu.com trusty/restricted Sources [5335 B]
Get:22 http://archive.ubuntu.com trusty/universe Sources [7926 kB]
Get:23 http://archive.ubuntu.com trusty/main amd64 Packages [1743 kB]
Get:24 http://archive.ubuntu.com trusty/restricted amd64 Packages [16.0 kB]
Get:25 http://archive.ubuntu.com trusty/universe amd64 Packages [7589 kB]
Fetched 22.4 MB in 5s (4246 kB/s)
Reading package lists...

OK, since we did not use the --rm option, our container is still available -- which is what we want since we're going to create a new image from this updated container. Notice that we cannot save changes to a container that has been removed/deleted. This should be obvious but just saying ... :)


In [45]:
docker ps -a


CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                      PORTS               NAMES
938e7acea158        367795fb1051        "apt-get update"    26 seconds ago      Exited (0) 12 seconds ago                       cranky_bartik

Docker lets us keep these changes by committing them into a new image. Under the hood, docker keeps track of the differences from the base image (nvidia/cuda, or rather 367795fb1051) by creating a new image layer using the union filesystem (UnionFS). To see this, we can inspect the changes to the container using the docker diff command, which takes the container ID as an argument.


In [48]:
# make sure to use container id from "docker ps -a" command above
docker diff 938e7acea158


C /usr
C /usr/local
A /usr/local/nvidia
C /var
C /var/lib
C /var/lib/apt
C /var/lib/apt/lists
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-security_universe_binary-amd64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-updates_restricted_binary-amd64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-updates_universe_source_Sources.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_restricted_binary-amd64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-security_InRelease
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_Release
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_main_source_Sources.gz
A /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1404_x86%5f64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-updates_InRelease
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-updates_universe_binary-amd64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_Release.gpg
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-security_restricted_source_Sources.gz
A /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1404_x86%5f64_Release.gpg
A /var/lib/apt/lists/lock
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_restricted_source_Sources.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-updates_restricted_source_Sources.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-updates_main_source_Sources.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/partial
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-security_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-security_restricted_binary-amd64_Packages.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-security_universe_source_Sources.gz
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_source_Sources.gz
A /var/lib/apt/lists/developer.download.nvidia.com_compute_cuda_repos_ubuntu1404_x86%5f64_Release
A /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty-security_main_source_Sources.gz

where A means the file or directory listed was added, C means changed, and D means deleted

Let's now use the docker commit command to generate a new image from this container. You might want to check your disk usage before and after image creation just to verify for yourself that the new image does not eat up an additional 2 GB of disk space on the host.


In [68]:
df -h


Filesystem      Size  Used Avail Use% Mounted on
udev            7.4G   12K  7.4G   1% /dev
tmpfs           1.5G  800K  1.5G   1% /run
/dev/xvda1       32G  8.4G   22G  28% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            7.4G     0  7.4G   0% /run/shm
none            100M   16K  100M   1% /run/user

In [124]:
# make sure to use the appropriate container ID here
docker commit <CONTAINER-ID> newiamgename:update


sha256:5980494cc2123a15bb4ed8f5393d847332a791f23fa237bfc37975bc328d3921

Here, using the docker commit command, we provided the unique container ID from the ps command and a new name:tag for the resulting image. Now let's list the docker images and we should see our new image there.


In [125]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
newiamgename        update              5980494cc212        5 seconds ago       1.637 GB
nvidia/cuda         8.0-cudnn5-devel    31582c303549        5 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        5 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

Again, notice that the image size says something like 1.6 GB. Verify with df that we have not actually used additional physical space on the host disk in generating this image


In [73]:
df -h


Filesystem      Size  Used Avail Use% Mounted on
udev            7.4G   12K  7.4G   1% /dev
tmpfs           1.5G  800K  1.5G   1% /run
/dev/xvda1       32G  8.4G   22G  28% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            7.4G     0  7.4G   0% /run/shm
none            100M   16K  100M   1% /run/user

GOTCHA: docker does not allow upper-case characters in image names; using them generates the error message "invalid reference format".

Generating Tar Files for Sharing

There are two pairs of docker commands for creating a tar file that can be shared with others. The first option is to use the docker commands save and load to create and ingest image tar files. The second option is to use the docker commands export and import to create and ingest container tar files.

Notice the help definitions for each set of commands:

When working with containers:

    export   Export a container's filesystem as a tar archive
    import   Import the contents from a tarball to create a filesystem image

When working with images:

    save     Save one or more images to a tar archive (streamed to STDOUT by default)
    load     Load an image from a tar archive or STDIN

Let's save the new image created in the previous section as a tar-ball on the file system.


In [126]:
docker save -o dockerimageexport.tar newiamgename:update



Now if we look in our current working directory we should see a nice fat tar-ball of our docker image


In [102]:
ls -lah dockerimageexport.tar


-rw------- 1 ubuntu ubuntu 1.6G Nov 30 18:31 dockerimageexport.tar

Keep in mind that when an image is saved to the host file system the full size of the image is physically allocated. We can see this here as the file dockerimageexport.tar has size 1.6G.
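
If disk space is a concern, the save output can be compressed on the fly since save streams to STDOUT by default (a sketch; the output file name is illustrative):

    docker save newiamgename:update | gzip > dockerimageexport.tar.gz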

Let's now remove the new image we just committed from docker using the rmi command


In [127]:
docker rmi 5980494cc212


Untagged: newiamgename:update
Deleted: sha256:5980494cc2123a15bb4ed8f5393d847332a791f23fa237bfc37975bc328d3921
Deleted: sha256:9a5db72eda6a4bebd250ce33ee0e6c24866e8f519de2c34c2e248056fed3ca8d

If we look at our docker images again we no longer see newiamgename:update


In [128]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nvidia/cuda         8.0-cudnn5-devel    31582c303549        5 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        5 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

Finally, load the saved image into docker using the load command


In [129]:
docker load --input dockerimageexport.tar


Loaded image: newiamgename:update

In [131]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
newiamgename        update              5980494cc212        4 minutes ago       1.637 GB
nvidia/cuda         8.0-cudnn5-devel    31582c303549        5 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        5 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

You should see the image newiamgename:update listed!

A few final words on saving vs exporting. While the two methods are indeed similar in functionality, the difference is that saving an image keeps its history (i.e. all parent layers, tags, and versions) while exporting a container squashes its history, producing a flattened, single-layer resource.
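
As a sketch of the container-side workflow (the container ID and image name here are placeholders):

    # export flattens the container's filesystem into a tar archive
    docker export -o flatcontainer.tar <CONTAINER-ID>
    # import creates a new single-layer image from that archive
    docker import flatcontainer.tar flattened:latest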

Creating Images with Dockerfiles

Launching containers, making updates, and committing the changes does work well; however, it is quite manual. To automate this image construction workflow docker provides a manifest, called a Dockerfile, which is a text file that lists the build steps. Dockerfiles are quite popular in the docker community and often Dockerfiles are exchanged rather than image tar-balls. While simple, there are a few common pitfalls when creating Dockerfiles. Read up on the Dockerfile best practices for some excellent pointers that will save you lots of time.

Dockerfiles are quite simple. Let's create a Dockerfile to build an updated version of the nvidia/cuda:latest image


In [82]:
cat << LINES > Dockerfile
FROM nvidia/cuda:latest
RUN apt-get update
ENTRYPOINT ["/bin/sh", "-c"]
CMD ["nvidia-smi"]
LINES



The FROM instruction sets the base image for subsequent instructions. As such, a valid dockerfile must have FROM as its first instruction. The image can be any valid image. The RUN instruction will execute any commands in a new layer on top of the current image and commit the results. The resulting committed image will be used for the next step in the dockerfile. Finally, the main purpose of a CMD is to provide defaults for an executing container. These defaults can include an executable, or they can omit the executable, in which case you must specify an ENTRYPOINT instruction as well. For more information on how CMD and ENTRYPOINT interact see here.

To build an image using this Dockerfile we invoke the docker build command (this takes about 60 seconds)


In [117]:
docker build -t foo:bar .


Step 1 : FROM nvidia/cuda:latest
 ---> 367795fb1051
Step 2 : RUN apt-get update
 ---> Running in 90a83491d019
Ign http://archive.ubuntu.com trusty InRelease
Get:1 http://archive.ubuntu.com trusty-updates InRelease [65.9 kB]
Ign http://developer.download.nvidia.com  InRelease
Get:2 http://developer.download.nvidia.com  Release.gpg [819 B]
Get:3 http://developer.download.nvidia.com  Release [564 B]
Get:4 http://archive.ubuntu.com trusty-security InRelease [65.9 kB]
Get:5 http://archive.ubuntu.com trusty Release.gpg [933 B]
Get:6 http://archive.ubuntu.com trusty-updates/main Sources [480 kB]
Get:7 http://developer.download.nvidia.com  Packages [107 kB]
Get:8 http://archive.ubuntu.com trusty-updates/restricted Sources [5921 B]
Get:9 http://archive.ubuntu.com trusty-updates/universe Sources [214 kB]
Get:10 http://archive.ubuntu.com trusty-updates/main amd64 Packages [1161 kB]
Get:11 http://archive.ubuntu.com trusty-updates/restricted amd64 Packages [20.4 kB]
Get:12 http://archive.ubuntu.com trusty-updates/universe amd64 Packages [505 kB]
Get:13 http://archive.ubuntu.com trusty Release [58.5 kB]
Get:14 http://archive.ubuntu.com trusty-security/main Sources [157 kB]
Get:15 http://archive.ubuntu.com trusty-security/restricted Sources [4621 B]
Get:16 http://archive.ubuntu.com trusty-security/universe Sources [54.9 kB]
Get:17 http://archive.ubuntu.com trusty-security/main amd64 Packages [700 kB]
Get:18 http://archive.ubuntu.com trusty-security/restricted amd64 Packages [17.0 kB]
Get:19 http://archive.ubuntu.com trusty-security/universe amd64 Packages [191 kB]
Get:20 http://archive.ubuntu.com trusty/main Sources [1335 kB]
Get:21 http://archive.ubuntu.com trusty/restricted Sources [5335 B]
Get:22 http://archive.ubuntu.com trusty/universe Sources [7926 kB]
Get:23 http://archive.ubuntu.com trusty/main amd64 Packages [1743 kB]
Get:24 http://archive.ubuntu.com trusty/restricted amd64 Packages [16.0 kB]
Get:25 http://archive.ubuntu.com trusty/universe amd64 Packages [7589 kB]
Fetched 22.4 MB in 4s (4928 kB/s)
Reading package lists...
 ---> 049dd8e11f2a
Removing intermediate container 90a83491d019
Step 3 : ENTRYPOINT /bin/sh -c
 ---> Running in 1b85fabf673c
 ---> 7038f1e7a8f8
Removing intermediate container 1b85fabf673c
Step 4 : CMD nvidia-smi
 ---> Running in 4a010da2ff72
 ---> 704bbf8f9727
Removing intermediate container 4a010da2ff72
Successfully built 704bbf8f9727

In [118]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
foo                 bar                 704bbf8f9727        17 seconds ago      1.637 GB
nvidia/cuda         8.0-cudnn5-devel    31582c303549        8 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        8 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

You should now see the new image built with the Dockerfile. Notice that we used the -t option when building so that we could provide a REPOSITORY and TAG. Without the -t option both repository and tag would be set to "<none>". This is not necessarily a problem; it just means that you will have no option but to reference the image by its IMAGE ID. Don't hesitate to use the rmi command to clean up.
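
Recall that our Dockerfile set ENTRYPOINT to /bin/sh -c with a default CMD of nvidia-smi, so the default command can be overridden at run time; for example:

    nvidia-docker run --rm foo:bar            # runs the default CMD: nvidia-smi
    nvidia-docker run --rm foo:bar "df -h"    # overrides CMD: runs /bin/sh -c "df -h"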

Dockerfiles are quite powerful and have many additional commands for adding mount points, exposing ports, setting environment variables etc. Be sure to read the docs for complete details. For an advanced example of how to add Jupyter notebooks to an image see Appendix B below.

It is a Dockerfile best practice to use a .dockerignore file when building images. With a .dockerignore file you can prevent files and directories from being copied into the image during the build, ensuring the image contains only essential files.
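
For example, a minimal .dockerignore might look like this (the entries are illustrative):

    # keep version control data and large artifacts out of the build context
    .git
    *.tar
    logs/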

OK, notice that our image created with the Dockerfile has a nondescript name, foo:bar. We can use the tag command to rename an image


In [119]:
docker tag foo bettername


Error response from daemon: no such id: foo

hm ... we got an ERROR. We have to use the full name with TAG (i.e. foo:bar) when renaming images. Alternatively, we could use the "IMAGE ID" instead.


In [120]:
docker tag foo:bar bettername




In [121]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
bettername          latest              704bbf8f9727        27 seconds ago      1.637 GB
foo                 bar                 704bbf8f9727        27 seconds ago      1.637 GB
nvidia/cuda         8.0-cudnn5-devel    31582c303549        8 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        8 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

It is important to notice here that we now have an extra image listed! However, you can confirm with the df -h command that the rename/copy we did here didn't actually use any additional hard disk space. Also, notice that docker automatically gave the image a TAG of "latest". We can of course provide a tag explicitly when renaming the image, as follows


In [122]:
docker tag foo:bar bettername:tagtagtag




In [123]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
foo                 bar                 704bbf8f9727        32 seconds ago      1.637 GB
bettername          latest              704bbf8f9727        32 seconds ago      1.637 GB
bettername          tagtagtag           704bbf8f9727        32 seconds ago      1.637 GB
nvidia/cuda         8.0-cudnn5-devel    31582c303549        8 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        8 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

Typically, for personal/local use, image names don't matter too much. However, once we start to share and distribute images, there is an image naming convention that must be followed with docker. Use the rmi command to clean up.


In [126]:
docker rmi foo:bar; 
docker rmi bettername:latest;
docker rmi bettername:tagtagtag;


Error response from daemon: No such image: foo:bar
Untagged: bettername:latest
Untagged: bettername:tagtagtag
Deleted: sha256:704bbf8f97273ebc0e60a73e65c4ed76254dcbbcd33a90df1baa17d1e663b750
Deleted: sha256:7038f1e7a8f84cd6079df6f3bf9a820446c9e35132c267d208a76d6b53963878
Deleted: sha256:049dd8e11f2aff4b65c9049269600826c4630ee3d4da960b1d8851341a149d10
Deleted: sha256:09796a2db57130d4727fba0ce912d2e25530b60502b02d98ecbd37335164d9f7

FYI: when deleting images there is a possibility that you could get the ERROR message

Error response from daemon: conflict: unable to delete <IMAGE ID> (must be forced) - image is being used by stopped container <CONTAINER ID>

This means that there are existing containers using the image you are trying to delete. You must first remove those containers and then remove the associated image.
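
One way to find and remove the offending containers first (a sketch; the image name is illustrative):

    # remove every container created from the image, then the image itself
    docker rm $(docker ps -aq --filter ancestor=foo:bar)
    docker rmi foo:bar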


In [127]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nvidia/cuda         8.0-cudnn5-devel    31582c303549        8 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        8 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

Summary

In this section we learned how to create new images from existing containers using the docker commit command. Furthermore, using the export/import commands with containers, or alternatively the save/load commands with images, we can move containers and images in and out of docker for sharing, backup, etc. To facilitate a consistent build process we can use a Dockerfile with the build command to generate images requiring more complex configuration. Finally, we saw how to use the tag command to rename/copy our images and the rmi command to delete images.

Exercises

At this point you should be comfortable launching containers, looking at logs, attaching standard output, creating your own images with docker commands and Dockerfiles, deleting images, renaming images, etc. Here are a few suggestions to investigate further and grow your docker knowledge base:

  1. Use the Dockerfile command ENV to create an image with predefined environment variables
  2. Use the docker command inspect to have a look at what environment variables are defined in an image
  3. Suppose you add environment variables AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_S3_BUCKET when building your docker image.
    What problems might arise when sharing this docker image with others?
  4. What is the difference between the ADD and COPY commands in a Dockerfile?
  5. What is the difference between the ENTRYPOINT and CMD commands in a Dockerfile?
  6. What is the default entrypoint for a container?
  7. What does the VOLUME command do and when should it be used?
  8. When should we use the WORKDIR command in a Dockerfile?
  9. According to the Dockerfile Best Practices, which set of commands is better and why?

    RUN apt-get install -y automake
    RUN apt-get install -y build-essential

OR

RUN apt-get install -y automake build-essential


Food for Thought:

Is it possible to automate container builds using a git repository hook?

Working with Image Repositories

Docker by nature is a social container framework. That is, the docker community likes to share containers. The docker repository is the primary mechanism for pushing and pulling images. Furthermore, many developers have come to distribute software as ready-made containers using public docker repositories. In this way, users simply pull the docker container and launch the software with zero configuration hassle. For example, many deep learning frameworks are rather difficult to configure and install locally. Therefore, most deep learning frameworks are also published as docker images which can be pulled from DockerHub. Additionally, many deep learning frameworks have conflicting library dependencies, which prevents having those frameworks installed locally at the same time. Pulling ready-made DL framework containers from DockerHub alleviates much of the drudgery of the framework lifecycle.

The default docker repository is index.docker.io, which points to what the community calls "DockerHub". DockerHub is the central public repository for docker images. No credentials are required to pull images from DockerHub. On the other hand, pushing images to this public repository does require creating an account to obtain a DockerHub ID, with which all your published images will be associated.
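
For example, publishing an image under your own DockerHub ID looks roughly like this (the user and image names are placeholders):

    docker login                                      # prompts for your DockerHub credentials
    docker tag newiamgename:update myuser/cuda-updated:latest
    docker push myuser/cuda-updated:latest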


In [1]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nvidia/cuda         8.0-cudnn5-devel    31582c303549        8 weeks ago         1.776 GB
nvidia/cuda         latest              367795fb1051        8 weeks ago         1.615 GB
hello-world         latest              c54a2cc56cbb        5 months ago        1.848 kB

When others try to pull your publicly available image they will reference the image by the username/image:tag convention. For example, the full public identifier of the cuda container we have been working with here is

nvidia/cuda:latest

where "nvidia" is the official NVIDIA user ID on DockerHub, "cuda" is the name of the image published by user "nvidia" and finally the tag "latest" is used to ask for the most recent version of that image. As usual, even when pulling images from DockerHub, if no tag is specified by the user, the "latest" tag will be automatically infered. The generic naming convention is then

[USER]/IMAGE[:TAG]

See here for additional information on getting started with DockerHub.

Using Images from Public Repository

Let's pull some images from the public DockerHub and see how easy it is to utilize preconfigured software in containers. First let's clean up all existing containers, then remove all existing docker images and start from scratch.


In [2]:
# DELETE ALL EXISTING CONTAINERS
# for each container ID use the docker "rm" command to remove/delete the container
for cid in $(docker ps -a | awk '{print $1}' | tail -n +2);do docker rm $cid; done


938e7acea158

In [3]:
# DELETE ALL EXISTING IMAGES
# for each image ID, use the docker "rmi" command to remove/delete the image
for iid in $(docker images| awk '{print $3}' | tail -n +2);do docker rmi $iid; done




In [4]:
# confirm no images
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

Now let's use the docker command pull to get the latest cuda image from NVIDIA on DockerHub (this takes about a minute)


In [9]:
docker pull nvidia/cuda:latest


latest: Pulling from nvidia/cuda
Digest: sha256:a73077e90c6c605a495566549791a96a415121c683923b46f782809e3728fb73
Status: Downloaded newer image for nvidia/cuda:latest

In [10]:
docker images


REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nvidia/cuda         latest              d189f42bc63e        17 hours ago        1.617 GB

Congratulations! You have now pulled your first image from DockerHub. Of course, you already know how to run a command using this new image


In [12]:
nvidia-docker run --rm nvidia/cuda:latest nvidia-smi


Sat Dec 17 16:32:33 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   31C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

FYI: Notice that if you skip the docker pull command and issue a run command directly, docker will, for better or worse, look for the specified image locally and, if it is not found, automatically pull it from DockerHub. Let's remove the image we just pulled and issue a run command while the image is not available locally.


In [42]:
docker rmi nvidia/cuda:latest


Untagged: nvidia/cuda:latest
Untagged: nvidia/cuda@sha256:a73077e90c6c605a495566549791a96a415121c683923b46f782809e3728fb73
Deleted: sha256:d189f42bc63e5cce15b07d0bdf0eebac2a84725495d903eb79d8b7c48236a010
Deleted: sha256:0ee0aa554b8be64c963aaaf162df152784d868d21a7414146cb819a93e4bdb9e
Deleted: sha256:92ff5c86c10e04ad87dd16e8d92df3c8db1228305b1b0d297d10fa363eed4e55
Deleted: sha256:b71881e6e6e9aa0df33d3f0d8448f6448c588b13c4b46d654a0eaf43bc0a2263
Deleted: sha256:c83526fb182682c39fa8b6b0be2dcc457b9f5c1af409916d5bc1830006305214
Deleted: sha256:16ef8421620904603ccd54bf01b94af32106d906be7059eee08b941b5efb57ce
Deleted: sha256:565903b66233d5576592815ca4d499bd6fe09a9b4baf83f345aaf64544f1cd78
Deleted: sha256:b653e4373a4b35aa760ff67cfa3de2c9fe3c089823b63ec797eb04de256f86ba
Deleted: sha256:362e536c4e530b94ce4204461e4f8c998705bcb98c91be54dd40b22f50531f3a
Deleted: sha256:b69ad682d83af6a6962b4a60a0b5f033c8d39efcd20dbdf320b6dd8136e50aae
Deleted: sha256:bc224b1b676d12be2a49f99778dda08b90d22747244d0a0afcdf4cfeb7db5d89

In [43]:
nvidia-docker run --rm nvidia/cuda:latest nvidia-smi


latest: Pulling from nvidia/cuda

Digest: sha256:a73077e90c6c605a495566549791a96a415121c683923b46f782809e3728fb73
Status: Downloaded newer image for nvidia/cuda:latest
Sat Dec 17 16:58:49 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   31C    P8    17W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Stale Local Images (!!)

However, be very careful here since the docker run command does not pull the image every time. That is, docker run will only pull the image once (i.e. the first time), if the image is not available locally. Every time after that, docker run will use the local image. This means the "latest" image could be updated remotely on DockerHub, but your local copy will not change: your local image becomes stale. You must issue a docker pull command to update your local images with the most recent changes on DockerHub. Unfortunately, the docker run command does not have a --pull option (see discussion), so if you want the bona fide most recent image every time you run, you will need to issue pull and run commands in tandem like so:

docker pull <IMAGE> && nvidia-docker run <IMAGE>

This usage ensures the "latest" image is retrieved from the repository before running a container from that image.
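
For example, to guarantee a fresh CUDA image before every run:

docker pull nvidia/cuda:latest && nvidia-docker run --rm nvidia/cuda:latest nvidia-smi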

Docker Storage Driver

You might be wondering where exactly docker put this new image. The contents of the /var/lib/docker directory vary depending on the storage driver docker is using. You can find out more about how docker organizes images here. To figure out which driver is being used for storage, we can use the docker info command


In [39]:
docker info 2> /dev/null | grep "Storage Driver:"


Storage Driver: devicemapper

In [38]:
docker info 2> /dev/null | grep "Data loop file:"


 Data loop file: /var/lib/docker/devicemapper/devicemapper/data

In this case docker is using the devicemapper storage driver. You can read more about how the devicemapper storage driver works here. The actual binary file holding the image data is specified by the "Data loop file". Out of curiosity, you can ask how big this file is, though it does require sudo access:


In [41]:
sudo du -sh /var/lib/docker/devicemapper/devicemapper/data


1.9G	/var/lib/docker/devicemapper/devicemapper/data
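
As an aside, docker releases newer than the 1.12.3 used in this tutorial (1.13 and later) include a built-in command that summarizes disk usage across images, containers, and volumes, no sudo required:

docker system df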

Example: MNIST with TensorFlow

To demonstrate how easy it is to do awesome stuff with docker containers, let's use TensorFlow from Google to perform optical character recognition of handwritten digits 0 through 9 in the MNIST dataset.

First, let's pull the GPU-accelerated tensorflow container from DockerHub


In [44]:
docker pull tensorflow/tensorflow:latest-gpu


latest-gpu: Pulling from tensorflow/tensorflow

Digest: sha256:b5c10face4b6a4fb772aad7e5e8f691d5944d8b50f06c308a4f520938265f124
Status: Downloaded newer image for tensorflow/tensorflow:latest-gpu

In [45]:
docker images


REPOSITORY              TAG                 IMAGE ID            CREATED             SIZE
nvidia/cuda             latest              d189f42bc63e        19 hours ago        1.617 GB
tensorflow/tensorflow   latest-gpu          48a64e7e7fee        7 days ago          2.62 GB

Next, let's train a deep convolutional neural network to recognize 28x28 pixel images of handwritten digits 0 - 9 (this takes a few minutes). Note that here we reference the image by its ID, 48a64e7e7fee, rather than by name; both work.


In [46]:
nvidia-docker run --rm 48a64e7e7fee python -m tensorflow.models.image.mnist.convolutional


I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Initialized!
Step 0 (epoch 0.00), 11.5 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 19.0 ms
Minibatch loss: 3.236, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 7.7%
Step 200 (epoch 0.23), 18.8 ms
Minibatch loss: 3.363, learning rate: 0.010000
Minibatch error: 10.9%
Validation error: 4.2%
Step 300 (epoch 0.35), 18.8 ms
Minibatch loss: 3.139, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 3.1%
Step 400 (epoch 0.47), 18.8 ms
Minibatch loss: 3.201, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 2.7%
Step 500 (epoch 0.58), 18.8 ms
Minibatch loss: 3.182, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 2.4%
Step 600 (epoch 0.70), 18.8 ms
Minibatch loss: 3.115, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 2.1%
Step 700 (epoch 0.81), 18.8 ms
Minibatch loss: 2.966, learning rate: 0.010000
Minibatch error: 1.6%
Validation error: 2.4%
Step 800 (epoch 0.93), 18.8 ms
Minibatch loss: 3.060, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 1.9%
Step 900 (epoch 1.05), 18.9 ms
Minibatch loss: 2.931, learning rate: 0.009500
Minibatch error: 3.1%
Validation error: 1.7%
Step 1000 (epoch 1.16), 18.8 ms
Minibatch loss: 2.867, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.8%
Step 1100 (epoch 1.28), 18.9 ms
Minibatch loss: 2.822, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.5%
Step 1200 (epoch 1.40), 18.9 ms
Minibatch loss: 2.943, learning rate: 0.009500
Minibatch error: 3.1%
Validation error: 1.6%
Step 1300 (epoch 1.51), 18.8 ms
Minibatch loss: 2.790, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.8%
Step 1400 (epoch 1.63), 18.8 ms
Minibatch loss: 2.823, learning rate: 0.009500
Minibatch error: 3.1%
Validation error: 1.5%
Step 1500 (epoch 1.75), 18.8 ms
Minibatch loss: 2.855, learning rate: 0.009500
Minibatch error: 4.7%
Validation error: 1.4%
Step 1600 (epoch 1.86), 18.8 ms
Minibatch loss: 2.732, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.5%
Step 1700 (epoch 1.98), 18.8 ms
Minibatch loss: 2.664, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.6%
Step 1800 (epoch 2.09), 18.8 ms
Minibatch loss: 2.661, learning rate: 0.009025
Minibatch error: 3.1%
Validation error: 1.4%
Step 1900 (epoch 2.21), 18.8 ms
Minibatch loss: 2.630, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.3%
Step 2000 (epoch 2.33), 18.8 ms
Minibatch loss: 2.633, learning rate: 0.009025
Minibatch error: 3.1%
Validation error: 1.3%
Step 2100 (epoch 2.44), 18.9 ms
Minibatch loss: 2.575, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.2%
Step 2200 (epoch 2.56), 18.8 ms
Minibatch loss: 2.564, learning rate: 0.009025
Minibatch error: 1.6%
Validation error: 1.2%
Step 2300 (epoch 2.68), 18.8 ms
Minibatch loss: 2.577, learning rate: 0.009025
Minibatch error: 3.1%
Validation error: 1.2%
Step 2400 (epoch 2.79), 18.8 ms
Minibatch loss: 2.508, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.2%
Step 2500 (epoch 2.91), 18.8 ms
Minibatch loss: 2.474, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.1%
Step 2600 (epoch 3.03), 18.9 ms
Minibatch loss: 2.453, learning rate: 0.008574
Minibatch error: 0.0%
Validation error: 1.2%
Step 2700 (epoch 3.14), 18.8 ms
Minibatch loss: 2.509, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.2%
Step 2800 (epoch 3.26), 18.8 ms
Minibatch loss: 2.451, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.2%
Step 2900 (epoch 3.37), 18.8 ms
Minibatch loss: 2.518, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.1%
Step 3000 (epoch 3.49), 18.8 ms
Minibatch loss: 2.410, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 0.9%
Step 3100 (epoch 3.61), 18.8 ms
Minibatch loss: 2.380, learning rate: 0.008574
Minibatch error: 3.1%
Validation error: 1.1%
Step 3200 (epoch 3.72), 18.8 ms
Minibatch loss: 2.358, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.2%
Step 3300 (epoch 3.84), 18.9 ms
Minibatch loss: 2.319, learning rate: 0.008574
Minibatch error: 0.0%
Validation error: 1.0%
Step 3400 (epoch 3.96), 18.8 ms
Minibatch loss: 2.312, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.1%
Step 3500 (epoch 4.07), 18.9 ms
Minibatch loss: 2.270, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.0%
Step 3600 (epoch 4.19), 18.9 ms
Minibatch loss: 2.252, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.0%
Step 3700 (epoch 4.31), 18.8 ms
Minibatch loss: 2.229, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.1%
Step 3800 (epoch 4.42), 18.8 ms
Minibatch loss: 2.222, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.0%
Step 3900 (epoch 4.54), 18.8 ms
Minibatch loss: 2.237, learning rate: 0.008145
Minibatch error: 1.6%
Validation error: 1.0%
Step 4000 (epoch 4.65), 18.8 ms
Minibatch loss: 2.273, learning rate: 0.008145
Minibatch error: 3.1%
Validation error: 1.0%
Step 4100 (epoch 4.77), 18.8 ms
Minibatch loss: 2.167, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 0.9%
Step 4200 (epoch 4.89), 18.8 ms
Minibatch loss: 2.172, learning rate: 0.008145
Minibatch error: 1.6%
Validation error: 1.1%
Step 4300 (epoch 5.00), 18.8 ms
Minibatch loss: 2.173, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.0%
Step 4400 (epoch 5.12), 18.9 ms
Minibatch loss: 2.121, learning rate: 0.007738
Minibatch error: 0.0%
Validation error: 1.2%
Step 4500 (epoch 5.24), 18.7 ms
Minibatch loss: 2.160, learning rate: 0.007738
Minibatch error: 3.1%
Validation error: 1.0%
Step 4600 (epoch 5.35), 18.8 ms
Minibatch loss: 2.084, learning rate: 0.007738
Minibatch error: 0.0%
Validation error: 1.0%
Step 4700 (epoch 5.47), 18.8 ms
Minibatch loss: 2.087, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 0.9%
Step 4800 (epoch 5.59), 18.8 ms
Minibatch loss: 2.065, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 0.9%
Step 4900 (epoch 5.70), 18.9 ms
Minibatch loss: 2.057, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.0%
Step 5000 (epoch 5.82), 18.9 ms
Minibatch loss: 2.141, learning rate: 0.007738
Minibatch error: 4.7%
Validation error: 1.0%
Step 5100 (epoch 5.93), 18.9 ms
Minibatch loss: 2.006, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.2%
Step 5200 (epoch 6.05), 18.8 ms
Minibatch loss: 2.100, learning rate: 0.007351
Minibatch error: 7.8%
Validation error: 0.9%
Step 5300 (epoch 6.17), 18.8 ms
Minibatch loss: 1.971, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 1.1%
Step 5400 (epoch 6.28), 18.8 ms
Minibatch loss: 1.964, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 1.0%
Step 5500 (epoch 6.40), 18.8 ms
Minibatch loss: 1.987, learning rate: 0.007351
Minibatch error: 1.6%
Validation error: 1.0%
Step 5600 (epoch 6.52), 18.7 ms
Minibatch loss: 1.931, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.9%
Step 5700 (epoch 6.63), 18.8 ms
Minibatch loss: 1.910, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.9%
Step 5800 (epoch 6.75), 18.8 ms
Minibatch loss: 1.902, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.9%
Step 5900 (epoch 6.87), 18.8 ms
Minibatch loss: 1.889, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 1.1%
Step 6000 (epoch 6.98), 18.8 ms
Minibatch loss: 1.898, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 1.0%
Step 6100 (epoch 7.10), 18.9 ms
Minibatch loss: 1.867, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.9%
Step 6200 (epoch 7.21), 18.9 ms
Minibatch loss: 1.844, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.9%
Step 6300 (epoch 7.33), 18.8 ms
Minibatch loss: 1.834, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.9%
Step 6400 (epoch 7.45), 18.8 ms
Minibatch loss: 1.828, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.9%
Step 6500 (epoch 7.56), 18.8 ms
Minibatch loss: 1.807, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.9%
Step 6600 (epoch 7.68), 18.9 ms
Minibatch loss: 1.819, learning rate: 0.006983
Minibatch error: 1.6%
Validation error: 0.9%
Step 6700 (epoch 7.80), 18.9 ms
Minibatch loss: 1.782, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 1.0%
Step 6800 (epoch 7.91), 18.8 ms
Minibatch loss: 1.770, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 1.0%
Step 6900 (epoch 8.03), 18.8 ms
Minibatch loss: 1.757, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7000 (epoch 8.15), 18.8 ms
Minibatch loss: 1.752, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7100 (epoch 8.26), 18.9 ms
Minibatch loss: 1.735, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.8%
Step 7200 (epoch 8.38), 18.8 ms
Minibatch loss: 1.734, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 1.0%
Step 7300 (epoch 8.49), 18.8 ms
Minibatch loss: 1.743, learning rate: 0.006634
Minibatch error: 3.1%
Validation error: 0.9%
Step 7400 (epoch 8.61), 18.8 ms
Minibatch loss: 1.700, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.8%
Step 7500 (epoch 8.73), 18.9 ms
Minibatch loss: 1.691, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7600 (epoch 8.84), 18.9 ms
Minibatch loss: 1.748, learning rate: 0.006634
Minibatch error: 1.6%
Validation error: 0.9%
Step 7700 (epoch 8.96), 18.8 ms
Minibatch loss: 1.666, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7800 (epoch 9.08), 18.9 ms
Minibatch loss: 1.657, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Step 7900 (epoch 9.19), 18.8 ms
Minibatch loss: 1.645, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8000 (epoch 9.31), 18.9 ms
Minibatch loss: 1.643, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Step 8100 (epoch 9.43), 18.8 ms
Minibatch loss: 1.629, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8200 (epoch 9.54), 18.8 ms
Minibatch loss: 1.617, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Step 8300 (epoch 9.66), 18.8 ms
Minibatch loss: 1.612, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8400 (epoch 9.77), 18.8 ms
Minibatch loss: 1.596, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8500 (epoch 9.89), 18.8 ms
Minibatch loss: 1.605, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Test error: 0.8%

You can visit the DockerHub page for the tensorflow image here. Additionally, the actual Dockerfile used to generate this TensorFlow image on DockerHub is available on GitHub here. If you look at the Dockerfile for creating the GPU-enabled TensorFlow image, you will see that it is built FROM the NVIDIA container nvidia/cuda:8.0-cudnn5-devel. Building the image from the Dockerfile ensures that the image is fresh and contains the latest version of everything, but it takes a long time to build. Usually, pulling the image from DockerHub is much faster, but it might not contain the most recent versions of all sources.
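
If you do want to build the image yourself, the workflow is roughly the following; the clone URL is real, but the Dockerfile name and its location within the repository are assumptions, so check the repository layout first:

# hypothetical local build of the GPU-enabled TensorFlow image
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow    # then cd into the directory containing the GPU Dockerfile
docker build -t mytensorflow:latest-gpu -f Dockerfile.gpu .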

You can learn more about MNIST in TensorFlow here.

Example: MNIST with MXNet

Let's first pull the MXNet container from DockerHub


In [55]:
docker pull kaixhin/cuda-mxnet:8.0


8.0: Pulling from kaixhin/cuda-mxnet

Digest: sha256:4272d98e8f8e2904a4650369f26cc503ece7333d3cf5aafed49830dc1718fc55
Status: Downloaded newer image for kaixhin/cuda-mxnet:8.0

Next, we run the MXNet example for just a few epochs of training


In [58]:
nvidia-docker run                                             \
--rm                                                          \
--workdir=/root/mxnet/example/image-classification            \
kaixhin/cuda-mxnet:8.0                                        \
python train_mnist.py --network lenet --gpus 0 --num-epochs 2


libdc1394 error: Failed to initialize libdc1394
INFO:root:start with arguments Namespace(batch_size=64, disp_batches=100, gpus='0', kv_store='device', load_epoch=None, lr=0.1, lr_factor=0.1, lr_step_epochs='10', model_prefix=None, mom=0.9, network='lenet', num_classes=10, num_epochs=2, num_examples=60000, num_layers=None, optimizer='sgd', test_io=0, top_k=0, wd=0.0001)
INFO:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com
DEBUG:urllib3.connectionpool:Setting read timeout to <object object at 0x7f2fd7d270c0>
DEBUG:urllib3.connectionpool:"GET /exdb/mnist/train-labels-idx1-ubyte.gz HTTP/1.1" 200 28881
INFO:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com
DEBUG:urllib3.connectionpool:Setting read timeout to <object object at 0x7f2fd7d270c0>
DEBUG:urllib3.connectionpool:"GET /exdb/mnist/train-images-idx3-ubyte.gz HTTP/1.1" 200 9912422
INFO:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com
DEBUG:urllib3.connectionpool:Setting read timeout to <object object at 0x7f2fd7d270c0>
DEBUG:urllib3.connectionpool:"GET /exdb/mnist/t10k-labels-idx1-ubyte.gz HTTP/1.1" 200 4542
INFO:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com
DEBUG:urllib3.connectionpool:Setting read timeout to <object object at 0x7f2fd7d270c0>
DEBUG:urllib3.connectionpool:"GET /exdb/mnist/t10k-images-idx3-ubyte.gz HTTP/1.1" 200 1648877
INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [100]	Speed: 12441.73 samples/sec	Train-accuracy=0.204219
INFO:root:Epoch[0] Batch [200]	Speed: 12389.08 samples/sec	Train-accuracy=0.437188
INFO:root:Epoch[0] Batch [300]	Speed: 12354.09 samples/sec	Train-accuracy=0.590938
INFO:root:Epoch[0] Batch [400]	Speed: 12582.99 samples/sec	Train-accuracy=0.643281
INFO:root:Epoch[0] Batch [500]	Speed: 12577.48 samples/sec	Train-accuracy=0.659375
INFO:root:Epoch[0] Batch [600]	Speed: 12598.03 samples/sec	Train-accuracy=0.736563
INFO:root:Epoch[0] Batch [700]	Speed: 12625.93 samples/sec	Train-accuracy=0.718750
INFO:root:Epoch[0] Batch [800]	Speed: 12580.54 samples/sec	Train-accuracy=0.745000
INFO:root:Epoch[0] Batch [900]	Speed: 12465.23 samples/sec	Train-accuracy=0.745000
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=4.814
INFO:root:Epoch[0] Validation-accuracy=0.758260
INFO:root:Epoch[1] Batch [100]	Speed: 12540.31 samples/sec	Train-accuracy=0.722812
INFO:root:Epoch[1] Batch [200]	Speed: 12470.70 samples/sec	Train-accuracy=0.720156
INFO:root:Epoch[1] Batch [300]	Speed: 12450.90 samples/sec	Train-accuracy=0.758437
INFO:root:Epoch[1] Batch [400]	Speed: 12462.37 samples/sec	Train-accuracy=0.726719
INFO:root:Epoch[1] Batch [500]	Speed: 12435.83 samples/sec	Train-accuracy=0.732812
INFO:root:Epoch[1] Batch [600]	Speed: 12403.44 samples/sec	Train-accuracy=0.759844
INFO:root:Epoch[1] Batch [700]	Speed: 12509.41 samples/sec	Train-accuracy=0.753125
INFO:root:Epoch[1] Batch [800]	Speed: 12438.39 samples/sec	Train-accuracy=0.777188
INFO:root:Epoch[1] Batch [900]	Speed: 12424.33 samples/sec	Train-accuracy=0.785781
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=4.821
INFO:root:Epoch[1] Validation-accuracy=0.705613

There you go! Notice that we have run both the TensorFlow and MXNet frameworks with little to no configuration effort. We just pull the latest images (with GPU tags) and run the training in a container. Setting up these frameworks on your local machine is often a lengthy task taking hours or, in the worst case, even days. As you can see, using containers encapsulates the framework and allows us to focus on actually getting work done. Not to mention, using containers keeps the local machine clean.

You might notice the error in the output:

libdc1394 error: Failed to initialize libdc1394

but don't worry about it. libdc1394 is a camera driver and does not affect this example.

If you look at the Dockerfile for creating the GPU-enabled MXNet docker image, you will see that it is built FROM the NVIDIA container nvidia/cuda:8.0-cudnn5-devel, just like the TensorFlow image. Again, building from the Dockerfile keeps the image fresh at the cost of a long build, while pulling from DockerHub is much faster but may lag the latest sources. Both TensorFlow and MXNet push regular image builds to DockerHub.
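
If you are curious about an image's lineage without reading its Dockerfile, you can list the layers it was built from locally:

docker history kaixhin/cuda-mxnet:8.0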

You can learn more about MNIST in MXNet here.

Private Image Repositories

There are many reasons you might want to run your own local repository. For example, maybe there is super secret code in your container, or your network is restricted and you can't access DockerHub. Basic instructions for getting started with your own local registry are available on the docker blog. For simplicity, we are actually going to use a docker container to run our local registry.

Let's first pull the registry image from DockerHub


In [59]:
docker pull registry


Using default tag: latest
latest: Pulling from library/registry

Digest: sha256:1152291c7f93a4ea2ddc95e46d142c31e743b6dd70e194af9e6ebe530f782c17
Status: Downloaded newer image for registry:latest

Notice we did not have to specify a user name there! That is because registry is an official image living in the top-level "library" namespace (as the "Pulling from library/registry" line shows). Next, we launch a container in detached mode (i.e. in the background), since the registry is a service that runs continuously, waiting for requests. If we don't detach here, the command never finishes evaluating and we just wait and wait ... and wait, since the service will never quit.


In [61]:
docker run --detach --publish 5000:5000 registry:latest


917dc854922f1071fe9d517b6bde966c88f6c999810b660ae04402de0b62a266

Now we have a container running in the background. You might have noticed that we launched this container with docker rather than nvidia-docker. This is because there is no GPU activity in the registry image, so there is no need for the nvidia-docker plugin when launching containers from it.

Notice that we have published port 5000 from the container and mapped it to port 5000 on our localhost. In this way, the container can listen for incoming connections. Learn more about binding container ports to the host here.
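
You can confirm the binding at any time with the docker port command, using the container ID returned above:

docker port 917dc854922f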

To see our running containers, use the docker ps command


In [62]:
docker ps


CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                    NAMES
917dc854922f        registry:latest     "/entrypoint.sh /etc/"   About a minute ago   Up About a minute   0.0.0.0:5000->5000/tcp   evil_stonebraker

We can get a response from the registry using curl


In [63]:
curl -i http://localhost:5000/v2


<a href="/v2/">Moved Permanently</a>.

As we can see from the response (a redirect to /v2/), this Docker registry is running version 2 of the registry API. More on the docker registry API here. We are now ready to prepare an image for our local registry. The key here is that we must use the full docker naming convention so that docker knows where to push the image. To do this we will simply use the tag command to manipulate the image name. For demonstration purposes we will use the docker image busybox, which is a very small image (~1 MB) that does essentially nothing.
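
The general pattern for addressing a registry other than DockerHub is to prefix the image name with the registry host and port:

docker tag IMAGE[:TAG] HOST:PORT/NAME[:TAG]
docker push HOST:PORT/NAME[:TAG]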


In [64]:
docker pull busybox


Using default tag: latest
latest: Pulling from library/busybox

Digest: sha256:29f5d56d12684887bdfa50dcd29fc31eea4aaf4ad3bec43daf19026a7ce69912
Status: Downloaded newer image for busybox:latest

In [65]:
docker images


REPOSITORY              TAG                 IMAGE ID            CREATED             SIZE
mxnet                   latest              dc34347c591b        59 minutes ago      3.27 GB
<none>                  <none>              1db339a39660        About an hour ago   7.329 GB
kaixhin/cuda-mxnet      8.0                 df23056fd695        9 hours ago         3.614 GB
nvidia/cuda             8.0-cudnn5-devel    d1ff577d4d0c        22 hours ago        1.778 GB
nvidia/cuda             latest              d189f42bc63e        22 hours ago        1.617 GB
tensorflow/tensorflow   latest-gpu          48a64e7e7fee        8 days ago          2.62 GB
registry                latest              c9bd19d022f6        8 weeks ago         33.27 MB
busybox                 latest              e02e811dd08f        10 weeks ago        1.093 MB

In [66]:
docker tag busybox localhost:5000/busybusy



Now we issue an images command to see the localhost:5000/busybusy:latest image created by the tag command


In [67]:
docker images


REPOSITORY                TAG                 IMAGE ID            CREATED             SIZE
mxnet                     latest              dc34347c591b        About an hour ago   3.27 GB
<none>                    <none>              1db339a39660        About an hour ago   7.329 GB
kaixhin/cuda-mxnet        8.0                 df23056fd695        9 hours ago         3.614 GB
nvidia/cuda               8.0-cudnn5-devel    d1ff577d4d0c        22 hours ago        1.778 GB
nvidia/cuda               latest              d189f42bc63e        22 hours ago        1.617 GB
tensorflow/tensorflow     latest-gpu          48a64e7e7fee        8 days ago          2.62 GB
registry                  latest              c9bd19d022f6        8 weeks ago         33.27 MB
busybox                   latest              e02e811dd08f        10 weeks ago        1.093 MB
localhost:5000/busybusy   latest              e02e811dd08f        10 weeks ago        1.093 MB

Next, we use the push command to publish our image to the local registry


In [68]:
docker push localhost:5000/busybusy


The push refers to a repository [localhost:5000/busybusy]

latest: digest: sha256:29f5d56d12684887bdfa50dcd29fc31eea4aaf4ad3bec43daf19026a7ce69912 size: 527

Now, we can ask the registry for the image catalog to verify our image is now available


In [70]:
curl http://localhost:5000/v2/_catalog


{"repositories":["busybusy"]}

If we push another image (under a different name) to the local repository, we will see another entry in the catalog


In [71]:
docker tag busybox localhost:5000/verybusy
docker push localhost:5000/verybusy
curl http://localhost:5000/v2/_catalog


The push refers to a repository [localhost:5000/verybusy]

latest: digest: sha256:29f5d56d12684887bdfa50dcd29fc31eea4aaf4ad3bec43daf19026a7ce69912 size: 527
{"repositories":["busybusy","verybusy"]}

Each image in the repository is described by a manifest, which can be accessed via the API. For example, to access the manifest describing the image verybusy, we write


In [80]:
curl http://localhost:5000/v2/verybusy/manifests/latest | head -15


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2735  100  2735    0     0   433k      0 --:--:-- --:--:-- --:--:--  445k
{
   "schemaVersion": 1,
   "name": "verybusy",
   "tag": "latest",
   "architecture": "amd64",
   "fsLayers": [
      {
         "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
      },
      {
         "blobSum": "sha256:56bec22e355981d8ba0878c6c2f23b21f422f30ab0aba188b54f1ffeff59c190"
      }
   ],
   "history": [
      {

Finally, we can ask for available tags with the API call

GET /v2/<name>/tags/list 

as follows


In [81]:
curl http://localhost:5000/v2/verybusy/tags/list


{"name":"verybusy","tags":["latest"]}

For more information, see the official docker documentation for registry deployment.

Don't forget to stop your registry container and clean up! After stopping and removing the container, a final docker ps confirms that nothing is left running.


In [83]:
docker stop 917dc854922f && docker rm 917dc854922f


917dc854922f

In [85]:
docker ps


CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Summary

In this section you learned how to leverage images hosted on the public docker image registry, DockerHub. You can always pull images from DockerHub anonymously, but if you want to push images you'll need to sign up and get a Docker ID. When pulling images to your local machine, keep in mind that the images can be updated on DockerHub, and you'll need to re-pull to receive those updates; remember this so that you don't get stuck working with stale images. A common practice is to distribute Dockerfiles, which lets users build images locally and ensures that all of the image contents are up to date. Furthermore, we identified docker images hosted on DockerHub for major deep learning frameworks such as TensorFlow and MXNet, as well as where to find their respective Dockerfiles, and for both frameworks we demonstrated how to run deep learning training on the MNIST dataset. Finally, for those not able or willing to use public image repositories such as DockerHub, we briefly showed how to create a local docker registry using the registry:latest image and how to interact with the service using the Docker Registry HTTP API V2. In configuring this local registry, we learned how to launch containers in detached mode, so that a container can run continuously in the background while returning control to the user, and, for services requiring connections, how to bind host and container network ports.

What's Next?

What you have seen here is just the tip of the iceberg. In follow-on tutorials we will show how to scale out containerized workflows using Kubernetes and Mesos. In learning how to scale out containerized applications, you'll need to understand a bit more about docker container networking. Additionally, as you might guess, there is an entire ecosystem of utilities and services for managing docker configurations, development, and deployments such as runc, nsenter, Docker Remote, docker-py, Docker Compose, Docker Machine, Docker Swarm, and Vagrant. Not to mention container support by the major Cloud service providers such as Google Compute Engine, Amazon Web Services, and Microsoft Azure. Finally, there is a new breed of operating systems gaining popularity which contain only enough functionality to launch containers! Examples of these optimized container OSes are CoreOS, Atomic, Ubuntu Core, and RancherOS. Many of these container OSes are already supported on the aforementioned Cloud providers, AWS in particular.

Appendix A

Here are some installation details for getting docker and nvidia-docker up and running from scratch ...

Step 0 - GPU Driver

In order to get GPU access within docker/nvidia-docker, we need to make sure that the NVIDIA driver is available on the host system. It's possible to obtain the appropriate device driver from either the standard driver download page or via a CUDA installation.

Step 1 - Docker Install

Once the NVIDIA device driver has been successfully installed, we need to install docker itself. The installation of docker is quite simple, but it is slightly different for each OS. The steps for docker installation on Ubuntu 14.04 can be found here. Don't worry, the docker docs have install instructions for many other operating systems including RedHat, CentOS, Debian, and so on.

Docker provides an official installation script via https://get.docker.com, which can be accessed from the command line using "wget -qO-" or "curl -sSL"
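
For example (as always, review a script before piping it to a shell):

curl -sSL https://get.docker.com/ | sh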

Step 2 - NVIDIA Docker

The final configuration step is to obtain the nvidia-docker plugin, which properly exposes the GPU hardware and drivers to docker containers. Official installation instructions for nvidia-docker on Ubuntu, CentOS, and other distributions can be found here.
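
Once both pieces are installed, a quick smoke test is the same command used throughout this tutorial; if it prints the GPU table, the driver, docker, and the plugin are all wired up correctly:

nvidia-docker run --rm nvidia/cuda nvidia-smi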

Appendix B

Here are the snippets for generating a Dockerfile that adds Jupyter Notebook to an image (thanks Ryan Olson @ NVIDIA)


In [ ]:
# create config file for jupyter (file copied by Dockerfile)
cat << LINES > jupyter_notebook_config.py
c.NotebookApp.ip = '*'
c.NotebookApp.port = 8888
c.NotebookApp.open_browser = False
LINES

In [ ]:
# create run hook for launch (file copied by Dockerfile)
# quote the heredoc delimiter so "$@" is written literally, and write to
# jupyter.sh -- the file the Dockerfile below actually copies
cat << 'LINES' > jupyter.sh
#!/bin/bash
jupyter notebook "$@"
LINES
chmod +x jupyter.sh

In [ ]:
# quote the delimiter so the backslash line continuations are written literally
cat << 'LINES' > Dockerfile
FROM <YOUR/IMAGE:HERE>

RUN apt-get update && apt-get install -y \
        libzmq3-dev \
        python-dev \
        python-matplotlib \
        python-pandas \
        python-pip \
        python-sklearn && \
    rm -rf /var/lib/apt/lists/*

RUN pip install \
        ipykernel \
        jupyter && \
    python -m ipykernel.kernelspec

COPY jupyter_notebook_config.py /root/.jupyter/

COPY jupyter.sh /usr/local/bin

WORKDIR /data
VOLUME /data

EXPOSE 8888

CMD ["jupyter.sh"]

LINES