To date in this course, we have pulled Docker images from the official Docker registry hub, run containers from these images, and stopped containers that were running. While this basic approach has already provided considerable benefit, Docker actually supports significantly more functionality. Before proceeding, a quick overview of the Docker technology should prove useful.
First, in order to run any Docker commands, we need a Docker daemon
process. The Docker daemon processes the Docker commands that we enter,
such as docker ps or docker run. On Linux systems, this daemon is fairly
lightweight since many of the core functions required for Docker to work
are embedded in the Linux kernel. On Windows or Mac OS X laptops,
however, we need a separate application to function as the Docker
daemon. This application is Boot2Docker, which uses the VirtualBox
virtual appliance manager to operate a compact Linux OS (tinycore64
Linux) that provides the Docker daemon functionality. Since we are using
a remote Docker daemon provided by Boot2Docker, we do not need to use
sudo before executing any Docker command, as is typically shown in the
Docker documentation. The Docker daemon can be explicitly run by using
the docker -d command, but in this class we do not need to do this since
the Boot2Docker application already provides us with a Docker daemon.
Docker manages software by dealing with layers. A read-only layer is
known as an image, and images can be stacked to form a more complex
image. In this manner, we can start with a base operating system image
and add software layers on top of this parent image to build a more
complex image. This layering can be seen when you pull a Docker image
and different components are individually pulled from the Docker
registry. Images do not change, and thus they have no state. Images are
identified by a string of 64 hexadecimal digits, although a short
identifier consisting of the first twelve digits can be used at the
command line.
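The relationship between the full and the short identifier can be sketched in a few lines of shell. The helper name short_id and the sample identifier are made up for illustration; the function simply keeps the first twelve characters of whatever string it is given.

```shell
#!/bin/sh
# short_id: print the first twelve characters of a full 64-digit image
# identifier. The function name is hypothetical, used only in this sketch.
short_id() {
    printf '%.12s\n' "$1"
}

# A made-up 64-digit identifier (not a real image).
full_id=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
short_id "$full_id"
```

Either form should be accepted wherever a Docker command expects an image identifier, for example as the argument to docker rmi.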
When an image is run, a new Docker container is created from the
indicated image, and changes are allowed within this container. The
container can run in either detached or foreground mode. Detached mode
is often used by servers (like our IPython Notebook server), as the
container runs in the background. Interactions with the server are
either through network connections or via shared volumes. A detached
container can be connected to via the docker attach command, or via the
docker exec command for containers that support interactive access. In
foreground mode, the container by default attaches to the console's
STDIN, STDOUT, and STDERR (although this can be changed).
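As a rough sketch of the detached workflow described above, the helpers below wrap the three commands involved. The function names and the container name nbserver are assumptions made for illustration, and open_shell further assumes that the image provides /bin/bash.

```shell
#!/bin/sh
# Hypothetical helpers; "nbserver" in the usage notes is an assumed
# container name, while lcdm/info490 is the course image named in the text.

# Start a container in detached (background) mode under a chosen name.
start_detached() {
    docker run -d --name "$1" "$2"
}

# Open an interactive shell inside a running container. This assumes the
# image ships /bin/bash, which may not hold for every image.
open_shell() {
    docker exec -it "$1" /bin/bash
}

# Re-attach the console to the main process of a detached container.
reattach() {
    docker attach "$1"
}

# Example usage (requires a running Docker daemon):
#   start_detached nbserver lcdm/info490
#   open_shell nbserver
```

The -it flags request an interactive session with a terminal, which is what a shell needs; a server started with -d keeps running after the shell opened by docker exec is closed.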
A running container can be referenced in one of three ways: by its long (64-character) identifier, by its short (12-character) identifier, or by its name.
A container can have an explicit restart policy, which can be useful
for running server processes; for example, a container can be started
so that it always restarts when it exits. In addition, by default a
container's file system persists after the container exits. This allows
post-mortem debugging; however, these zombie containers can quickly add
up, consuming resources. A container can be started with the --rm flag
to indicate that the container's file system should be removed after the
container exits.
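The two behaviours described above map onto two docker run flags, sketched here as small wrapper functions. The function names are hypothetical, and the image and container names in the usage notes are only examples. Note that Docker rejects a docker run that combines --restart with --rm, so the two cases are kept separate.

```shell
#!/bin/sh
# Hypothetical wrappers around docker run; the names are illustrative.

# Start a detached server container that Docker restarts whenever it exits.
run_always() {
    docker run -d --restart=always --name "$1" "$2"
}

# Run a throwaway interactive container whose file system is removed as
# soon as the container exits, so nothing is left behind to clean up.
run_disposable() {
    docker run -it --rm "$1"
}

# Example usage (requires a running Docker daemon):
#   run_always nbserver lcdm/info490
#   run_disposable hadoop
```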
If you haven't yet done so, download the Hadoop Docker image built by SequenceIQ on top of Ubuntu. Note that this is a trusted Docker image that is automatically built and made available on the Docker Hub registry.
You can do this by opening a new Boot2Docker shell and entering:
docker pull sequenceiq/hadoop-ubuntu:2.6.0
This downloads an Ubuntu-based Docker image that contains a functional Hadoop environment. One component missing from this image is support for Hadoop streaming, which we will address in the introduction to Hadoop IPython Notebook. To simplify using this Docker image, we can add a tag that allows us to refer to the image by a shorter name. The syntax for this command is shown below; enter it at a Boot2Docker prompt.
docker tag sequenceiq/hadoop-ubuntu:2.6.0 hadoop
We have already used the Docker shared volume feature to share a folder
between our host operating system and the Docker container by using the
-v flag. By default, the Boot2Docker application will try to mount
/Users/ on a Mac OS X system or C:\Users on a Windows system so that we
can easily mount host directories under these directories on a running
Docker container.
A more general approach can be used for sharing folders, however, where a specific data volume can be created and used by other Docker containers. For example, we could create a shared data volume from our IPython Server Docker container by using the following Docker command:
docker create -v /notebooks --name notebooks lcdm/info490
We could subsequently reuse this volume in other containers, for example
a Hadoop Docker container, by passing the --volumes-from flag to the
docker run command:
docker run -it --volumes-from notebooks --name hd1 hadoop
This data volume could be referenced as many times as necessary, simplifying the sharing of data (for example, from an instructor to students in a classroom with shared disk space).
Docker containers are useful even when run by themselves; however,
Docker provides several methods to link containers together. This can
be useful when server processes need to communicate or when building
more complex processes. The simplest approach to link Docker containers
is network port mapping with the -p flag, where a Docker container
publishes network ports through which other Docker containers can
communicate with it. A second approach involves using container names to
link two or more running containers via the --link flag, which creates a
secure tunnel between linked containers to enable communication. This
linking information can be accessed in a linked container through either
environment variables or entries in the /etc/hosts file.
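The two linking approaches described above can be sketched as follows. The function names, the port number 8888, and the alias notebook are assumptions chosen for illustration; they are not taken from the course image.

```shell
#!/bin/sh
# Hypothetical helpers. The port and alias used in the usage notes are
# assumptions for this sketch, not values defined by the course image.

# Start a detached server and publish one of its ports on the Docker host,
# using the same number for the host port and the container port.
run_with_port() {
    name=$1
    image=$2
    port=$3
    docker run -d -p "$port:$port" --name "$name" "$image"
}

# Start an interactive container linked to an already running container.
# Inside the new container, the alias is resolvable through /etc/hosts.
run_linked() {
    target=$1
    link_alias=$2
    image=$3
    docker run -it --link "$target:$link_alias" "$image"
}

# Example usage (requires a running Docker daemon):
#   run_with_port nb lcdm/info490 8888
#   run_linked nb notebook hadoop
```

With the link in place, running cat /etc/hosts or env inside the second container should show the entries that Docker added for the alias.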
A more powerful approach to connecting Docker containers is Docker Swarm, which is still under development. Docker Swarm provides native clustering for Docker, turning a pool of Docker hosts into a single, virtual host.
Of the many Docker commands, most accept very few parameters. The lone
exception is docker run, which actually creates a running container from
a Docker image. In the previous few sections, we have commented on some
of the more advanced options that can be passed to the docker run
command to enable shared data volumes, to force container deletion upon
exit, or to link containers. The full list of Docker commands is quite
lengthy, and can be accessed at the command line by simply entering
docker. Additional help on each command can be obtained at the command
line by entering docker and the command name, like docker tag.
Some of the more notable Docker command line tools include:
cp: used to copy files between a Docker container and the host operating system.
history: displays the history of a Docker image.
info: displays system-wide Docker information.
restart: used to restart a container.
rm: removes a Docker container; use the -f flag to force removal.
rmi: removes a Docker image; use the -f flag to force removal.
search: searches the Docker Hub registry for specific Docker images.
stats: used to monitor the system resources used by a running container.
stop: used to stop a running Docker container.
tag: used to add a tag, like a new, human-readable name, to an image.
top: displays the processes running in a container.
A Boot2Docker virtual machine has, by default, only 18.2 GB of allocated space. Over time, this space can become consumed by Docker images or containers. This can easily happen for two reasons:
When you pull the latest tagged version of a Docker image, you often retain the original image(s).
When a container exits, by default its filesystem is persisted so that you can debug the results if necessary.
Given these facts, you may eventually run out of available disk space in your Boot2Docker partition. In this case, your Docker containers may fail to function as expected. To address this issue, we will now explore how to find out how much space you have available in a boot2docker partition, and how to release more space as necessary.
The simplest method to quantify the available disk for a Boot2Docker instance is to first start a Boot2Docker shell. Once you have a Boot2Docker prompt, you can secure-shell into the Boot2Docker virtual machine by entering:
boot2docker ssh
This will open a shell in the Boot2Docker virtual machine (that is
running in VirtualBox). At this new prompt, you can enter df -h, which
will display the disk usage information in a human-readable format, as
demonstrated in the following screenshot.
In this case, the /dev/sda1 device holds the boot2docker partition
where we store all data for the virtual machine and all Docker images
and containers. Here, 14.2 GB has been used, leaving only 2.9 GB
available.
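If you only want the free space for one device rather than the whole table, the df output can be filtered with awk. The sketch below assumes the usual df column order (device, size, used, available) and that each entry fits on one line; the function name avail_on is hypothetical, and the sample listing is illustrative, built from the figures quoted above rather than copied from a real system.

```shell
#!/bin/sh
# avail_on: read df output on standard input and print the "available"
# column (fourth field) for the named device. Hypothetical helper name.
avail_on() {
    awk -v dev="$1" '$1 == dev { print $4 }'
}

# Illustrative listing only; the mount point and percentage are made up.
sample='Filesystem  Size   Used   Available Use% Mounted on
/dev/sda1   18.2G  14.2G  2.9G      83%  /mnt/sda1'
printf '%s\n' "$sample" | avail_on /dev/sda1

# Inside the Boot2Docker virtual machine you would instead run:
#   df -h | avail_on /dev/sda1
```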
The amount of space available to the Boot2Docker partition seems rather
small. To find out what is consuming the available resources, we can see
the list of Docker images by running the docker images command, while to
see the list of Docker containers we can use the docker ps command.
By itself, however, docker ps simply displays the list of running
containers. To display all Docker containers, including those that have
exited, you need to use the -a flag. On my system, after already
removing a number of exited containers, the docker ps -a command results
in the following display:
If the listing for either the Docker images or containers is rather
long, you can always simply count the number of images or containers by
piping the output of the respective Docker command into the wc -l
command. For example, to count the number of Docker images you would
enter:
docker images | wc -l
To count the number of Docker containers, you would instead enter:
docker ps -a | wc -l
If either of these commands indicates you have more than a handful of images or containers, you likely need to clean up Docker resources.
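One caveat with the counts above is that both docker images and docker ps print a header line, so wc -l reports one more than the actual number of images or containers. A minimal sketch of a header-aware count follows; the helper name count_rows is hypothetical.

```shell
#!/bin/sh
# count_rows: count the lines on standard input, skipping the first
# (header) line. Hypothetical helper name.
count_rows() {
    # tr strips the leading padding that some wc implementations add.
    tail -n +2 | wc -l | tr -d ' '
}

# Example usage (requires a running Docker daemon):
#   docker images | count_rows
#   docker ps -a | count_rows
# The -q flag prints only identifiers, without a header, so this works too:
#   docker images -q | wc -l
```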
After a lengthy period of using Docker, we can easily consume the available resources, leaving running containers unable to acquire the resources, like swap memory, that they need to run properly. When this occurs (or before, for the proactive), we have three options for cleaning up unnecessary Docker images or containers, each described below.
The first option is simple, and it is an easy choice if you do not need to retain any exited containers and can easily reacquire any necessary Docker images. For example, in this class you can always re-download the course Docker image, and if you have used shared folders you still have all data saved locally. To perform this step, we simply run the following two commands to remove all images and containers and reinitialize the boot2docker environment. Note that this removes everything, so only use it if you have nothing of value in a saved container or image.
boot2docker destroy
boot2docker init
The second option is more difficult, in that you have to manually perform a number of steps as detailed in the official Docker documentation. However, this approach does not require any changes to existing Docker images or containers.
The final option involves cleaning up unused images and containers. The
actual approach you should take depends on the number of images and
containers you may have in your boot2docker environment. If you have a
small number of images and containers, the best approach is to use
docker rm to remove unneeded containers and docker rmi to remove
unneeded images.
If you have a large number of either containers or images, however, this task should be automated. You can remove all containers, all images, or both by using the following commands:
docker rm `docker ps -a -q`
docker rmi `docker images -q`
Or you can list the containers or images, use the Unix grep command to
select specific containers or images for removal, and pass the selected
identifiers to docker rm or docker rmi by using the Unix xargs command.
For example, the following command removes all images that are untagged (when newer versions of an image are pulled, old versions may become untagged as they are no longer the latest version). Untagged images are listed as <none> by docker images:
docker images | grep '<none>' | awk '{print $3 ; }' | xargs -n 1 docker rmi
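Because a pipeline that ends in docker rmi is destructive, it can help to separate the selection step from the removal step and inspect the selection first. The sketch below does the selection with awk alone; the function name select_image_ids is hypothetical, and it assumes the default docker images layout in which the image ID is the third column.

```shell
#!/bin/sh
# select_image_ids: read `docker images` output on standard input, skip
# the header line, and print the third column (the image ID) of every row
# that contains the given fixed string. Hypothetical helper name.
select_image_ids() {
    awk -v pat="$1" 'NR > 1 && index($0, pat) > 0 { print $3 }'
}

# Example usage (requires a running Docker daemon). Review the list first:
#   docker images | select_image_ids '<none>'
# and only then remove the selected images:
#   docker images | select_image_ids '<none>' | xargs -n 1 docker rmi
```

Using a fixed-string match rather than a regular expression avoids surprises when the search text contains characters such as a dot or a slash.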