Last edited: July 21, 2015
Ryan J. O'Neil
Engineering @ Yhat, Inc.
SEOR @ George Mason University
As cloud computing becomes more prevalent in the world of software and information technology, techniques for automated and repeatable cloning of computing environments grow increasingly important. One strategy for doing this is the creation of containers. A container is an isolated environment similar to a virtual machine, but with a much lighter weight architecture. Instead of setting another kernel on top of the host operating system, containers share the host's kernel. Container resources are isolated from the host, making it easy to run programs within their own custom environments, while still benefitting from low overhead.
Running containers on a Unix host operating system is not a new thing. Development of the chroot
system call for Unix and BSD dates back to the late 1970s and early '80s. This enabled creation of chroot
"jails" in which system users could execute commands as the root
user, but were only allowed to see a subset of the file system. Later, the FreeBSD project extended the concept into a command called jail
. In 2005, Sun Microsystems coined the term container with the creation of "Solaris Containers," which were also built upon the chroot
concept.
Recent developments in the Linux kernel have made container virtualization more convenient and powerful. Notable among these are the addition of "control groups" in the 2.6.24 kernel, which allow isolation of hardware resources to a group of processes, and "namespace isolation" in the 3.8 kernel, which isolates processes from seeing other processes on the same system. These advancements form the basis for modern container architectures, such as LXC, Docker, and Rocket, which allow containers to function as though they are their own operating systems and user spaces.
The lure of low-overhead virtualization is proving extremely attractive for engineers across many disciplines in information technology. Containers provide easy replication of environments for software development, testing, and serving of infrastructure needs. They can be set up and torn down quickly without modifying their hosts, thus mapping naturally to distributed architectures. They allow one to capture complicated depedency structures for software in a single place, significantly speeding up development and testing times.
However, virtualization is no silver bullet. With the adoption of containers systems come new operational problems. This paper explores one of these which we call the Docker Image Construction Problem. While our focus will be on Docker's specific implementation of containers, the salient aspects of this problem can be mapped to other container implementations as well.
Consider a scenario many devops professionals are faced with regularly: the creation of environments in which to develop, test, and finally running prduction software. Depending on the scale of a given organization, there may be a great many of these. Each environment will require that a number of commands be run in order to create it. In Docker's terminology, an environment is an image, and an image is the result of running its associated commands. Each command requires a, potentially large, amount of time to run. The computing time required to create an image is the sum of the time required to run its commands.
For the purposes of this paper, we assume these commands are independent and that their order does not matter. There can, of course, be instances where that is not true. But those are refinements of the problem we present here. Examples of commands for creating these environment include:
Any number of commands are possible to set up these environments, many of which require substantial computing time. In an organization that embraces the idea of continuous integration, these environments will be built up and torn down regularly, magnifying the cost of building them significantly.
Without virtualization or other techniques to mitigate the required computing time, the resources required will always be the sum of time required for each command in every image. That is, if we have a set $I$ of images to build and for each $i \in I$ there is a set of commands $c \in C_i$ where the time required by commmand $c$ is $t_c$, then our required compute time is:
$$\sum_{i \in I} \sum_{c \in C_i} t_c$$It is common for commands to repeat across the creation of multiple images. An organization may standardize on a particular programming language or set of development tools. There may be a single data set created for both development and staging of a software product. Each image may use the same operating system as its base and expect that its kernel is regularly updated with the latest security patches.
If a container technology such as Docker is embraced, this sharing of commnands may allow one to reduce the compute time required to create a set of images. Docker, for instance, maintains an image cache. If a series of commands are run, it will reuse whatever it can safely determine has already been accomplished. Thus our goal in formulating and exploring this problem is build a set of required images in such as way that maximizes our use of the Docker cache. This will result in minimizing the requisite compute time, or makespan.
An image in Docker is constructed by running a series of commands. Commands are run in a specified order, and after each one Docker saves the changes made by that command to a layer. The final image is the result of applying each command's associated layer to some start image, such as a clean install of a Linux distribution.
For example, a given image may require the following commands be run in any order. We provide single-character aliases for them for the sake of convenience.
Command | |
---|---|
$A$ | Install the Java compiler and runtime environment. |
$B$ | Download a set of external dependencies. |
$C$ | Set of a database schema and populate it with test data. |
Docker first runs command $A$ and saves the results of that command to its cache. It now run command $B$ and saves the results of $AB$ to the cache. Finally, it runs command $C$ and saves the reults of $ABC$ to the cache. Thus, layers are constructed and cached additively. Docker has $A$ and $AB$ in its cache, but not $B$ by itself.
After constructing Image 1, the Docker cache contains $\{A, AB, ABC\}$. So far we are building only a single image and each comand is unique, so this schedule obviously uses the minimal amount of compute time. Now say we want to build another version of the same image, but with a new kernel.
Command | |
---|---|
$A$ | Install the Java compiler and runtime environment. |
$B$ | Download a set of external dependencies. |
$C$ | Set of a database schema and populate it with test data. |
$D$ | Upgrade the kernel to the latest version. |
We request the image $ABCD$ from Docker. It first sees that $A$ already exists in its cache and need not be re-created. The same goes for $AB$ and $ABC$. $ABCD$ does not exist in the cache, so Docker runs command $D$ and applies it to $ABC$, saving the result in its cache.
In this case we were able to build two unique images, while saving ourselves the time required to run commands $A$, $B$, and $C$ a second time. This is due to the mechanics of Docker's cache. Clearly, our current schedule for the creation of these two image uses the minimal amount of compute time.
However, say we are not so good at scheduling. Instead of putting the kernel upgrade last on our creation of Image 2, we decide to put it first. Now we request image $DABC$ from Docker. Docker looks at the first command and sees that it doe not have $D$ in its cache. It runs $D$ and caches its results. Similarly, it does not have $DA$, so it (needlessly) runs $A$ again. The same goes for $B$ and $C$. At the end of the process, we have run $D$ once, and $A$, $B$, $C$ twice each. This results in using the maximum amount of compute time possible for creation of our two images.
It is important to note that the Docker uses the full lists of commands in order as its cache keys. This means that a set of images diverge by running different commands, they cannot be brought back together again. Figure 3 shows an image construction plan that is not possible.
What actually happens in this case is that the time required for running command $D$ is incurred twice. Figure 4 shows the way that works using Docker's cache.
This is feasible but clearly not minimal. Moving command $D$ earlier in the plan allows the two images to share its cost.
It is easy to see that, due to the particulars of its cache mechanics, Docker is very sensitive to the order in which commands are run during image creation. Certain command schedules may waste much more compute time than others. Our goal is to always find the best schedule for creation of images.
In this section we propose a Mixed Integer Linear Program (MILP) for constructing sets of Docker images while requiring the minimal makespan. As a proxy for minimizing makespan we instead maximize our use of the Docker cache, which is easier to measure within the context of a linear model. Our model takes a input a set of commands and their associated compute times, along with a set of required images and which commands they are constructed out of. We assume the commmand times are known a priori and that they can run in any order to construct an image. The output is a schedule for creating the required images that can be executed directly within Docker.
For the sake of simplicity, we first consider the construction plans for each image irrespective of any other images we are attempting to construct. Let $I = \{1,\dotsc,n\}$ be the set of Docker images we must construct and $C = \{1,\dotsc,m\}$ the set of commands available. The commands used to construct image $i \in I$ is $C_i \subseteq C$.
We define a stage of a command with respect to an image $i \in I$ as the order that command is executed with respected to the other commands comprising the image. Each command must be executed once, so image $i$ has stages $s = 1,\dotsc,\left\vert{C_i}\right\vert$.
For each image, each command must be executed once and each stage can only have one command assigned to it. This results in a set of assignments contraints. $x_{i,s,c}$ is a binary variable that takes the value $1$ if the plan for constructing image $i$ runs command $c$ during stage $s$, and $0$ if it does not. Thus for each image $i \in I$ we have the following constraints.
$$\sum\limits_{c \in C_i} x_{i,s,c} = 1\, \forall\, i \in I,\, s = 1,\dotsc,\left\vert{C_i}\right\vert$$
$$\sum\limits_{s = 1}^{\left\vert{C_i}\right\vert} x_{i,s,c} = 1\, \forall\, i \in I,\, c \in C_i$$
The assignment constraints will be enough to ensure our model generates feasible image construction schedules, but do nothing to guide the optimization of the model. For this purpose we introduce a second set of variables that depend on $x$ and allow us to reap some benefit when they are positive. Specifically, $y_{i,j,s,c} \ge 0\, \forall\, i < j \in I,\, s = 1,\dotsc,\left\vert{C_i \cap C_j}\right\vert$ will be $1$ if images $i < j \in I$ share a path up to stage $s$ and run the same command $c$ during stage $s$. This definition leads naturally to the following constraints.
$$y_{i,j,s,c} \leq x_{i,s,c}\, \forall\, i < j \in I,\, s = 1,\dotsc,\left\vert{C_i \cap C_j}\right\vert,\, c \in C_i \cap C_j$$
$$y_{i,j,s,c} \leq x_{j,s,c}\, \forall\, i < j \in I,\, s = 1,\dotsc,\left\vert{C_i \cap C_j}\right\vert,\, c \in C_i \cap C_j$$
$$\sum\limits_{c \in C_i \cap C_j} y_{i,j,s,c} \leq \sum\limits_{c \in C_i \cap C_j} y_{i,j,s-1,c}\, \forall\, i < j \in I,\, s = 2,\dotsc,\left\vert{C_i \cap C_j}\right\vert$$
Our $y$ variables allow us to realize benefit from sharing the Docker cache between images. Given our parameters $t_c$, which are the expected time for running command $c$, we can now define our model's objective.
$$\max z = \sum\limits_{i < j \in I}\sum\limits_{s = 1}^{\left\vert{C_i \cap C_j}\right\vert}\sum\limits_{c \in C_i \cap C_j} t_c y_{i,j,s,c}$$It is important to note that this objective is superadditive with respect to the number of images that share a path. That is, four images sharing the path $AB$ will yield a higher objective value than two pairs of images sharing the paths $AB$ and $BA$, respectively.
We combine the above sections into a full representation of our Docker Image Construction Problem (DICP).
$$ \begin{array}{r l} \max & z & = &\sum\limits_{i < j \in I}\sum\limits_{s = 1}^{\left\vert{C_i \cap C_j}\right\vert}\sum\limits_{c \in C_i \cap C_j} t_c y_{i,j,s,c} & & {\scriptsize\left(1\right)} \\ \text{s.t.} & & & \sum\limits_{c \in C_i} x_{i,s,c} = 1 & \forall\, i \in I,\, s = 1,\dotsc,\left\vert{C_i}\right\vert & {\scriptsize\left(2\right)} \\ & & & \sum\limits_{s = 1}^{\left\vert{C_i}\right\vert} x_{i,s,c} = 1 & \forall\, i \in I,\, c \in C_i & {\scriptsize\left(3\right)} \\ & & & y_{i,j,s,c} \leq x_{i,s,c} & \forall\, i < j \in I,\, s = 1,\dotsc,\left\vert{C_i \cap C_j}\right\vert,\, c \in C_i \cap C_j & {\scriptsize\left(4\right)} \\ & & & y_{i,j,s,c} \leq x_{j,s,c} & \forall\, i < j \in I,\, s = 1,\dotsc,\left\vert{C_i \cap C_j}\right\vert,\, c \in C_i \cap C_j & {\scriptsize\left(5\right)} \\ & & & \sum\limits_{c \in C_i \cap C_j} y_{i,j,s,c} \leq \sum\limits_{c \in C_i \cap C_j} y_{i,j,s-1,c} & \forall\, i < j \in I,\, s = 2,\dotsc,\left\vert{C_i \cap C_j}\right\vert & {\scriptsize\left(6\right)} \\ & & & x_{i,s,c} \in \{0,1\} & & {\scriptsize\left(7\right)} \\ & & & y_{i,j,s,c} \geq 0 & & {\scriptsize\left(8\right)} \\ \end{array} $$Given this model, ${\scriptsize\left(1\right)}$ is our objective function that maximizes use of the Docker cache with respect to compute time. ${\scriptsize\left(2\right)}$ assigns one command to each stage for each image in $I$, while ${\scriptsize\left(3\right)}$ ensures that each command is run once. ${\scriptsize\left(4\right)}$ and ${\scriptsize\left(5\right)}$ only allow paths to be shared if two images run the same command during a given stage. ${\scriptsize\left(6\right)}$ requires that paths be shared before a stage if they are to be shared during that stage.
${\scriptsize\left(7\right)}$ ensures $x$ is binary, which is required by the assignment portion of the model. ${\scriptsize\left(8\right)}$ allows $y$ to be continuous and nonnegative, which is acceptable for mathematical purposes. Practical implementations of the model may prefer to make $y$ a binary vector, though leaving it as continuous may have benefits for decomposition approaches.
Consider an example with three available commands, $C = \{A, B, C\}$, and two images, $C_1 = \{A, B\}$ and $C_2 = \{A, B, C\}$. Compute times for the command are $t_A = 20$, $t_B = 10$, and $t_C = 5$. We construct the model as shown in the previous section.
$$ \begin{array}{r l} \max & z & = & 20 y_{1,2,1,A} + 10 y_{1,2,1,B} + 20 y_{1,2,2,A} + 10 y_{1,2,2,B} \\ \text{s.t.} & & & x_{1,1,A} + x_{1,1,B} = 1 \\ & & & x_{1,2,A} + x_{1,2,B} = 1 \\ & & & x_{1,1,A} + x_{1,2,A} = 1 \\ & & & x_{1,1,B} + x_{1,2,B} = 1 \\ & & & x_{2,1,A} + x_{2,1,B} + x_{2,1,C} = 1 \\ & & & x_{2,2,A} + x_{2,2,B} + x_{2,2,C} = 1 \\ & & & x_{2,3,A} + x_{2,3,B} + x_{2,3,C} = 1 \\ & & & x_{2,1,A} + x_{2,2,A} + x_{2,3,A} = 1 \\ & & & x_{2,1,B} + x_{2,2,B} + x_{2,3,B} = 1 \\ & & & x_{2,1,C} + x_{2,2,C} + x_{2,3,C} = 1 \\ & & & y_{1,2,1,A} \le x_{1,1,A} \\ & & & y_{1,2,1,A} \le x_{2,1,A} \\ & & & y_{1,2,1,B} \le x_{1,1,B} \\ & & & y_{1,2,1,B} \le x_{2,1,B} \\ & & & y_{1,2,2,A} \le x_{1,2,A} \\ & & & y_{1,2,2,A} \le x_{2,2,A} \\ & & & y_{1,2,2,B} \le x_{1,2,B} \\ & & & y_{1,2,2,B} \le x_{2,2,B} \\ & & & y_{1,2,2,A} + y_{1,2,2,B} \le y_{1,2,1,A} + y_{1,2,1,B} \\ & & & x_{i,s,c} \in \{0,1\} \\ & & & y_{i,j,s,c} \ge 0 \\ \end{array} $$This model has two equivalent optimal solutions, yielding the image construction plans $\{AB, ABC\}$ and $\{BA, BAC\}$, each with objective value $z = 30$.
In [ ]: