Introduction to Cloud Computing with Docker

Professor Robert J. Brunner


Introduction

In this Notebook, we will review some Docker fundamentals, including an overview of the Docker engine and basic Docker usage scenarios, advanced Docker commands, how to acquire the Hadoop Docker image, and how to perform a Boot2Docker tune-up.


Docker Overview

To date in this course, we have pulled Docker images from the official Docker registry hub, run containers from these images, and stopped containers that were running. While this basic approach has already provided considerable benefit, Docker actually supports significantly more functionality. Before proceeding, a quick overview of the Docker technology should prove useful.

First, in order to run any Docker commands, we need to have a Docker daemon process. The Docker daemon processes the Docker commands that we enter, such as docker ps or docker run. On Linux systems, this daemon is fairly lightweight since many of the core functions required for Docker to work are embedded in the Linux kernel. On Windows or Mac OSX laptops, however, we need a separate application to function as the Docker daemon. This application is Boot2Docker, which uses the VirtualBox virtual appliance manager to operate a compact Linux OS (tinycore64 Linux) to provide the Docker daemon functionality. Since we are using a remote Docker daemon like Boot2Docker, we do not need to use sudo before executing any Docker command, as is typically shown in the Docker documentation. The docker daemon can be explicitly run by using the docker -d command, but in this class we do not need to do this since the boot2docker application already provides us with a Docker daemon.

Docker manages software by dealing with layers. A read-only layer is known as an image, and images can be stacked to form a more complex image. In this manner, we can start with a base operating system image and add software layers on top of this parent image to build a more complex image. This layering can be seen when you pull a Docker image and different components are individually pulled from the Docker registry. Images do not change and thus they have no state. Images are referred to by a 64 hexadecimal digit string, although a short identifier string consisting of the first twelve digits can be used at the command line.
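The relationship between the full and short identifiers can be sketched in a few lines of shell. The 64-digit value below is a made-up identifier used only for illustration; it does not refer to any real image.

```shell
# Hypothetical 64-digit hexadecimal image ID (illustration only).
full_id="4f5c2d1e9a7b3c6d8e0f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d"

# The short identifier is simply the first twelve characters of the full one.
short_id=$(printf '%s' "$full_id" | cut -c1-12)

echo "$short_id"
```

Either form can then be given wherever a Docker command expects an image identifier.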

When an image is run, a new Docker container is created from the indicated image in which changes are allowed. The container can run in either detached or foreground mode. Detached mode is often used by servers (like our IPython Notebook server) as the container runs in the background. Interactions with the server are either through networked connections or via shared volumes. A detached container can be connected to via a docker attach command, or via the exec command for containers that support interactive access. In foreground mode, the container by default attaches to the console's STDIN, STDOUT, and STDERR (although this can be changed).

A running container can be referenced in one of three ways:

  1. the full 64 hexadecimal digit string (UUID)
  2. the first twelve digits from the full digit string (short UUID)
  3. the human readable name assigned by the Docker daemon (or supplied via the --name flag).

A container can have an explicit restart policy, which can be useful for running server processes; for example, a container can be started so that it always restarts when it exits. In addition, by default a container's file system persists after the container exits. This allows post-mortem debugging; however, these exited containers can quickly add up, consuming resources. A container can be started with the --rm flag to indicate that the container's file system should be removed after the container exits.
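As a rough sketch of these two options, the commands below start one container with an always-restart policy and one throwaway container that is removed on exit. Several names here are assumptions: lcdm/info490 is the course image used later in this Notebook, the container name nbserver is hypothetical, and the image is assumed to provide /bin/bash. The small dry helper is also hypothetical; it only prints each command so the sketch can be read and tried without a Docker daemon.

```shell
# Hypothetical helper: print each command instead of running it.
# Set RUN=1 in the environment to execute against a live Docker daemon.
dry() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Detached server container that the daemon restarts whenever it exits.
dry docker run -d --restart=always --name nbserver lcdm/info490

# Interactive, throwaway container whose file system is removed on exit.
dry docker run --rm -it lcdm/info490 /bin/bash
```

Without --rm, the second container would remain in the docker ps -a listing after it exits.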


Hadoop Docker image

If you haven't yet done so, download the Hadoop Docker image built by SequenceIQ on an Ubuntu OS. Note that this is a trusted Docker image that is automatically built and made available on the Docker Hub registry.

You can do this by opening a new Boot2Docker shell and entering:

docker pull sequenceiq/hadoop-ubuntu:2.6.0

This downloads an Ubuntu-based Docker image that contains a functional Hadoop environment. One component missing from this image is support for Hadoop streaming, which we will address in the introduction to Hadoop IPython Notebook. To simplify using this Docker image, we can add a tag that allows us to refer to the image by a shorter name. The syntax for this command is shown below, which you should enter at a Boot2Docker prompt.

docker tag sequenceiq/hadoop-ubuntu:2.6.0 hadoop


Docker Volumes

We have already used the Docker shared volume feature to share a folder between our host operating system and the Docker container by using the -v flag. By default, the Boot2Docker application will try to mount /Users/ on a Mac OSX system or C:\Users on a Windows system so that we can easily mount host directories under these directories on a running Docker container.

A more general approach can be used for sharing folders, however, where a specific data volume can be created and used by other Docker containers. For example, we could create a shared data volume from our IPython Server Docker container by using the following Docker command:

docker create -v /notebooks --name notebooks lcdm/info490

We could subsequently reuse this volume in other containers, for example a Hadoop Docker container, by passing the --volumes-from flag to the docker run command:

docker run -it --volumes-from notebooks --name hd1 hadoop

This data volume could be referenced as many times as necessary, simplifying the sharing of data (for example, from an instructor to students in a classroom with shared disk space).


Linking Containers

Docker containers are useful even when run by themselves; however, Docker provides several methods to link containers together. This can be useful when server processes need to communicate or when building more complex processes. The simplest approach to link Docker containers is by network port mappings using the -p flag, where each Docker container exposes network ports to enable inter-container communication. A second approach involves using container names to link two or more running containers via the --link flag, which creates a secure tunnel between linked containers to enable communication. This linking information can be accessed in a linked container by using either environment variables or entries in the /etc/hosts file.
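To make the environment-variable mechanism more concrete, the sketch below builds the name of the variable that holds a linked container's address, following the ALIAS_PORT_port_PROTO_ADDR naming pattern used by the --link flag. The function name link_addr_var, the alias hadoop, and the port 50070 are assumptions chosen for illustration.

```shell
# Build the name of the environment variable that carries a linked
# container's IP address: <ALIAS>_PORT_<port>_<PROTO>_ADDR.
# Arguments: link alias, exposed port, protocol (defaults to tcp).
link_addr_var() {
    alias_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    proto_upper=$(printf '%s' "${3:-tcp}" | tr '[:lower:]' '[:upper:]')
    printf '%s_PORT_%s_%s_ADDR\n' "$alias_upper" "$2" "$proto_upper"
}

# Hypothetical alias and port, as if a container had been started
# with --link hd1:hadoop and hd1 exposed port 50070.
link_addr_var hadoop 50070
```

Inside a linked container, entering env lists the variables that were actually injected, which is the simplest way to confirm the names for a given link.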

A more powerful approach to connecting Docker containers is Docker Swarm, which is still under development. Docker Swarm provides native clustering for Docker that can turn a pool of Docker hosts into a single, virtual host.


Advanced Docker Commands

Of the many Docker commands, most accept very few parameters. The lone exception is docker run, which actually creates a running container from a Docker image. In the previous few sections, we have commented on some of the more advanced options that can be passed to the docker run command to enable shared data volumes, to force container deletion upon exit, or to link containers. The full list of Docker commands is quite lengthy, and can be accessed at the command line by simply entering docker. Additional help on each command can be obtained at the command line by entering docker followed by the command name, like docker tag.

Some of the more notable Docker command line tools include:

  • cp: used to copy files between a Docker container and the host operating system.
  • history: displays the history of a Docker image.
  • info: displays system-wide Docker information.
  • restart: used to restart a container.
  • rm: removes a Docker container; use the -f flag to force removal.
  • rmi: removes a Docker image; use the -f flag to force removal.
  • search: searches the official Docker registry for specific Docker images.
  • stats: used to monitor the system resources used by a running container.
  • stop: used to stop a running Docker container.
  • tag: used to add a tag, like a new, human-readable name, to an image.
  • top: displays the processes running inside a container.


The Boot2Docker Partition

A Boot2Docker virtual machine has, by default, only 18.2 GB of allocated space. Over time, this space can become consumed by Docker images or containers. This can easily happen for two reasons:

  1. When you pull the latest tagged version of a Docker image, you often retain the original image(s).

  2. When a container exits, by default its filesystem is persisted so that you can debug the results if necessary.

Given these facts, you may eventually run out of available disk space in your Boot2Docker partition. In this case, your Docker containers may fail to function as expected. To address this issue, we will now explore how to find out how much space you have available in a boot2docker partition, and how to release more space as necessary.

The simplest method to quantify the available disk for a Boot2Docker instance is to first start a Boot2Docker shell. Once you have a Boot2Docker prompt, you can secure-shell into the Boot2Docker virtual machine by entering:

boot2docker ssh

This will open a shell in the Boot2Docker virtual machine (that is running in VirtualBox). At this new prompt, you can enter df -h, which will display the disk usage information in a human readable format as demonstrated in the following screenshot.

In this case, the /dev/sda1 device holds the Boot2Docker partition where we store all data for the virtual machine and all Docker images and containers. Here, 14.2 GB has been used, leaving only 2.9 GB.
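When only one device matters, the relevant figure can be filtered out of the df -h listing. The following is a minimal sketch: the function name usage_percent is hypothetical, and the sample listing is an illustrative reconstruction from the numbers quoted above (the tmpfs row, the mount point, and the percentage are assumptions, not captured output).

```shell
# Print the Use% column for one device from df -h style output on stdin.
usage_percent() {
    awk -v dev="$1" '$1 == dev { print $5 }'
}

# Illustrative sample only. On the virtual machine, pipe in real output:
#   df -h | usage_percent /dev/sda1
sample='Filesystem                Size      Used Available Use% Mounted on
tmpfs                     1.8G    123.0M      1.6G   7% /
/dev/sda1                18.2G     14.2G      2.9G  83% /mnt/sda1'

printf '%s\n' "$sample" | usage_percent /dev/sda1
```

Matching on the first column keeps the header line and any other file systems out of the result.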


Docker Resource Usage

The amount of space available to the Boot2Docker partition seems rather small. To find out what is consuming the available resources, we can see the list of Docker images by running the docker images command, while to see the list of Docker containers we can use the docker ps command. By itself, however, docker ps simply displays the list of running containers. To display all Docker containers, including those that have exited, you need to use the -a flag. On my system, after already removing a number of exited containers, the docker ps -a command results in the following display:

If the listing for either the Docker images or containers is rather long, you can always simply count the number of images or containers by piping the output of the respective Docker command into the wc -l command. For example, to count the number of Docker images you would enter:

docker images | wc -l

While to count the number of Docker containers you would enter:

docker ps -a | wc -l

If either of these commands indicates you have more than a handful of images or containers, you likely need to clean up Docker resources.
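Note that both listings begin with a header line, so the counts reported by wc -l are one higher than the actual number of images or containers. A small filter can skip the header before counting. The function name count_rows is hypothetical, and the listing passed to it below is made-up sample text.

```shell
# Count the data rows of a Docker listing read from stdin,
# skipping the one-line column header.
count_rows() {
    tail -n +2 | wc -l | tr -d ' '
}

# With a live daemon (not run here):
#   docker images | count_rows
#   docker ps -a  | count_rows

# Made-up sample listing: one header line followed by two images.
printf 'REPOSITORY TAG IMAGE ID\nlcdm/info490 latest 0123456789ab\nhadoop latest ba9876543210\n' | count_rows
```

The tr call only strips the padding that some versions of wc place in front of the number.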


Docker Cleanup

After a lengthy period of using Docker, we can easily consume the available resources, leaving running containers unable to acquire the resources, like swap memory, that they need to run properly. When this occurs (or before, for the proactive), we have several options to clean up unnecessary Docker images or containers:

  1. Remove everything and return to a clean slate.
  2. Resize the Boot2Docker partition to provide more space.
  3. Perform selective clean-up.

The first option is simple and an easy choice if you do not need to retain any exited containers and can easily reacquire any necessary Docker images. For example, in this class you can always re-download the course Docker image, and if you have used shared folders you still have all data saved locally. To perform this step, we simply run the following two commands to remove all images and containers and reinitialize the Boot2Docker environment. Note that this removes everything, so only use this option if you have nothing of value in a saved container or image.

boot2docker destroy
boot2docker init

The second option is more difficult, in that you have to manually perform a number of steps as detailed in the official Docker documentation. However, this approach does not require any changes to existing Docker images or containers.

The final option involves cleaning up unused images and containers. The actual approach you should take depends on the number of images and containers you may have in your Boot2Docker environment. If you have a small number of images and containers, the best approach is to use docker rm to remove unneeded containers and docker rmi to remove unneeded images.

If you have a large number of either containers or images, however, this task should be automated. You can remove all containers, all images, or both by using the following commands:

docker rm `docker ps -a -q`
docker rmi `docker images -q`
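When there is nothing to remove, the inner command prints no identifiers and docker rm or docker rmi is then called without arguments, which produces a usage error. One way to guard against this is sketched below. The function name apply_if_any is hypothetical, and echo stands in for the real Docker command so the sketch can be tried without a daemon.

```shell
# Read identifiers from stdin and pass them to the given command,
# but only when at least one identifier was supplied.
apply_if_any() {
    ids=$(cat)
    if [ -n "$ids" ]; then
        # $ids is left unquoted on purpose so each ID becomes its own argument.
        "$@" $ids
    else
        echo "nothing to remove"
    fi
}

# With a live daemon (not run here):
#   docker ps -a -q  | apply_if_any docker rm
#   docker images -q | apply_if_any docker rmi

# Stand-in demonstration with made-up identifiers.
printf 'a1b2c3d4e5f6\n0f9e8d7c6b5a\n' | apply_if_any echo docker rm
printf '' | apply_if_any echo docker rm
```

The first demonstration line prints the command that would be run; the second reports that there is nothing to remove.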

Alternatively, you can list the containers or images, use the Unix grep command to select specific entries, and pass the selected identifiers to docker rm or docker rmi by using the Unix xargs command.

For example, the following command removes every image whose listing contains lcdm. The same pattern can be used to remove untagged images, which docker images lists as <none> (when a newer version of an image is pulled, the old version may become untagged because it is no longer the latest version).

docker images | grep lcdm | awk '{print $3 ; }' | xargs -n 1 docker rmi
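For the untagged images mentioned above, a stricter selection than grep is to test the repository column directly. The following is a minimal sketch, assuming the docker images layout in which the first column is the repository and the third is the image ID. The function name untagged_ids is hypothetical, and the sample listing is made up for illustration.

```shell
# Print the image ID (third column) of every row whose repository
# column is <none>, skipping the header line.
untagged_ids() {
    awk 'NR > 1 && $1 == "<none>" { print $3 }'
}

# With a live daemon (not run here):
#   docker images | untagged_ids | xargs -n 1 docker rmi

# Made-up sample listing with one tagged and one untagged image.
sample='REPOSITORY     TAG      IMAGE ID       CREATED        VIRTUAL SIZE
lcdm/info490   latest   0123456789ab   2 days ago     1.2 GB
<none>         <none>   ba9876543210   3 weeks ago    1.1 GB'

printf '%s\n' "$sample" | untagged_ids
```

Comparing the whole first column avoids accidentally matching an image whose name merely contains the search text. Docker releases that support image filters may offer a shorter route through docker images -q -f dangling=true.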