Running Docker Containers in the Cloud

Professor Robert J. Brunner


Introduction

Assuming you have already completed the steps for downloading the Google Cloud SDK Docker image and registering to use Google Compute Engine as outlined in the Week 13 README, we can now learn how to use Google Compute Engine to run our class Docker containers. There are two concurrent sets of processes to keep track of in the rest of this Notebook. The first set takes place in a Web browser, where you register to use Google Compute Engine and subsequently create, monitor, and delete virtual machine instances. The second set takes place locally in your detached Google Cloud SDK Docker container.

Before proceeding, start Boot2Docker and at the Boot2Docker prompt, enter

docker ps -a | grep gcloud

If this command returns nothing, you should try executing this command to create and initialize the Google Cloud SDK Docker container:

docker run -it --name gcloud-config google/cloud-sdk gcloud auth login

After running this command, you should be able to see an exited Docker container.

bash-3.2$ docker ps -a | grep gcloud
7db61cac9698        google/cloud-sdk:latest   "gcloud auth login"    52 seconds ago      Exited (0) 17 seconds ago                            gcloud-config

If this previous step fails, review the steps listed in the Week 8 README.

On the other hand, if you have one or more containers listed (most of them may have Exited) you should be ready to execute Docker commands to interact with Google Compute Engine. First, however, you might want to update the Google Cloud SDK Docker container by issuing the following command:

docker run -it --volumes-from gcloud-config  google/cloud-sdk \
    gcloud components update

This command may return a message that All components are up to date, or alternatively you may be presented with a list of components along with a prompt asking if you wish to continue. Enter Y for yes and allow the updates to be downloaded and installed. Once this process is completed you should proceed to registering for access to Google Compute Engine.
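The check-then-create-or-update logic described above can be wrapped in a small shell function. This is only a minimal sketch: it assumes the container name gcloud-config used in this Notebook, and the function name ensure_gcloud_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create the gcloud-config container if it is missing,
# otherwise update the SDK components, using the same commands shown above.
ensure_gcloud_config() {
    if docker ps -a | grep -q gcloud-config; then
        echo "gcloud-config exists; updating components"
        docker run -it --volumes-from gcloud-config google/cloud-sdk \
            gcloud components update
    else
        echo "gcloud-config missing; creating it"
        docker run -it --name gcloud-config google/cloud-sdk gcloud auth login
    fi
}
```

After defining the function at the Boot2Docker prompt, entering ensure_gcloud_config runs whichever branch applies.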

Google Compute Engine Registration

To start the first set of processes, you must register to use the Google Compute Engine. Normally this step requires payment since you are actively using compute resources owned by Google. However, we will sign up for the free sixty-day trial period. The first step is to visit Google Compute Engine in a Web browser (click the link and you should be taken to the website). You will first need to provide your Google credentials, which might require two-step verification (for example, Google may send a verification code to your cell phone).

After you have authenticated, you will need to create a new project. You can do this by selecting the Create a new project item from the drop-down menu in the center of the page and entering a suitable name for your project (for example, RP Data Science 2015). Your Web page should resemble the following screenshot:

When you have completed this page, copy the Project ID (circled in red in the above screenshot, but your id will be different) to a text file for later use, select the checkbox stating that you agree to the terms of service, and click the Create button. A message window showing the Google Compute Engine Activities may appear as the new project is created.

Once the project has been created, we need to enter personal information to complete the project creation (note that this only needs to be done for the first project). To enter this information, select the Billing & settings item under the Projects tab on the left-hand side of the Web page (shown in the following image).

Even though this is a free trial, Google will collect your personal information and a credit card number to (according to Google) verify that there is a person completing the form. Multiple assurances are given that no billing will be applied to your account without your express agreement. If you use all of your free resources, your account will be locked until you provide payment. After selecting Billing & settings you will be presented with a new Web page that provides this assurance, your project information, and a button entitled Sign up for a free trial. An example page is shown below.

Click on the free trial button, and enter your personal information. Your account type will most likely be Individual (for this class at least). Once you have completed the account sign-up phase, you will need to click on the Accept and start free trial button.

At this point, you should switch back to your Boot2Docker prompt and enter the following Docker command to assign your project id to your Google Cloud SDK Docker container. For example, my project-id is proven-reporter-88419 so I used the following Docker command:

docker run -it --volumes-from gcloud-config  google/cloud-sdk \
    gcloud config set project proven-reporter-88419

You will need to use your project-id as indicated by Google on your project Web page.
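Because the project ID is easy to confuse with the project name you typed in the browser, it can help to check the value before handing it to gcloud. The sketch below assumes the commonly documented project ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen); both function names are hypothetical.

```shell
#!/bin/sh
# Return success if "$1" looks like a project ID under the assumed format.
is_valid_project_id() {
    printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Hypothetical helper: validate the ID, then run the same command shown above.
set_gcloud_project() {
    if ! is_valid_project_id "$1"; then
        echo "'$1' does not look like a project ID" >&2
        return 1
    fi
    docker run -it --volumes-from gcloud-config google/cloud-sdk \
        gcloud config set project "$1"
}
```

With these definitions, set_gcloud_project proven-reporter-88419 runs the command above, while set_gcloud_project "RP Data Science 2015" is rejected because that is a project name, not an ID.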

Enabling Google Compute Engine

You should now switch back to the web-based Google Developers Console to finish setting up your Google Compute Engine. First, select APIs under the APIs & auth menu item on the left-hand side of the Web page. You will be presented with a list of APIs with sliders indicating the status of each API. Scroll down to the Google Compute Engine slider and click on it to change it from Off to On. This should open a new Web page as shown in the following screenshot.

Now that the Google Compute Engine has been enabled, we can create a virtual machine to run our Docker container. On the left-hand menu, select VM instances under the Compute menu item. This will allow you to either Create instance or Take the quickstart as shown below.

Click on Create instance, which will open a long form in which you can customize your new Virtual Machine instance. You will want to enter an instance name (for example, rppds-15), select Allow HTTP Traffic, select the us-central1-a zone, a standard n1-standard-1 Machine Type, and ubuntu-1404-trusty Image, as shown in the following screenshot.

Once you have filled out the form, click the Create button. You may be presented with an Activities window similar to the one shown below. This simply provides an update to the status of creating your new virtual machine.

After your new virtual machine has been created, you will be presented with a Google Compute Engine dashboard that provides an overview of your new virtual machine. At the moment, as shown below, your dashboard is fairly empty, but this will change as you begin to use your new virtual machine to run a Docker container.

Circled in red on the dashboard image above are two key pieces of information you need to retain, the project name and the external IP address of your virtual machine (this will change if you stop and restart your virtual machine).

At this point, you can return to your Google Cloud SDK Docker container and test the connection to your new Google Compute Engine by entering the following command (shown below with my output).

bash-3.2$ docker run --rm -ti --volumes-from gcloud-config google/cloud-sdk \
    gcutil listinstances
+----------+---------------+---------+----------------+-----------------+
| name     | zone          | status  | network-ip     | external-ip     |
+----------+---------------+---------+----------------+-----------------+
| rppds-15 | us-central1-a | RUNNING | 10.240.225.126 | 130.211.153.108 |
+----------+---------------+---------+----------------+-----------------+
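Since the external IP address changes when the virtual machine is stopped and restarted, it can be convenient to pull it out of this table rather than copy it by hand. The following is a minimal sketch that assumes the five-column table layout shown above; the function name external_ip is hypothetical.

```shell
#!/bin/sh
# Print the external-ip column for the instance named "$1",
# reading a table in the layout shown above on standard input.
external_ip() {
    awk -F'|' -v name="$1" '
        NF >= 6 {                     # skip the +----+ border lines
            n = $2; ip = $6
            gsub(/ /, "", n)          # strip column padding
            gsub(/ /, "", ip)
            if (n == name) print ip
        }'
}

# Demonstration on the sample output from this Notebook.
external_ip rppds-15 <<'EOF'
+----------+---------------+---------+----------------+-----------------+
| name     | zone          | status  | network-ip     | external-ip     |
+----------+---------------+---------+----------------+-----------------+
| rppds-15 | us-central1-a | RUNNING | 10.240.225.126 | 130.211.153.108 |
+----------+---------------+---------+----------------+-----------------+
EOF
# prints 130.211.153.108
```

To use it on live data, pipe the listing command into the function, for example by appending | external_ip rppds-15 to the docker run command above (dropping the -ti flags is advisable when the output goes to a pipe).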

Connecting to Google Compute Engine

You now have a virtual machine running in Google Compute Engine, and the ability to connect to this virtual machine from your Google Cloud SDK Docker container. The next step is to actually connect by using ssh with the Google Docker container. This is done by issuing an ssh command from the Google Cloud SDK Docker container:

docker run --rm -ti --volumes-from gcloud-config google/cloud-sdk \
    gcloud compute ssh rppds-15 --zone us-central1-a

You will be prompted for a passphrase; you can safely hit Enter twice without entering anything. After a period of time, once the connection is established, you will be given a prompt in your new virtual machine running in Google Compute Engine.

This virtual machine is running the Ubuntu operating system (since that is what we specified during the creation of the virtual machine). Thus the first step should be to update the package list, which you do by entering sudo apt-get update at the VM shell prompt. After this you should apply any available software updates, which is done by entering sudo apt-get -y upgrade at the VM prompt. We specified the -y flag to automatically answer yes to any queries since we will want to apply any upgrades. Finally, we should apply any distribution upgrades, which is done by entering sudo apt-get -y dist-upgrade.

Once the update and upgrade process is completed you need to download the Docker tools for Ubuntu by entering sudo apt-get -y install docker.io at the VM prompt. During this process, the apt-get program would normally ask if you want to continue and download the necessary files, but since we used the -y flag, apt-get will automatically download all necessary files. The next step is to enable tab completion, which is done by executing a shell script: source /etc/bash_completion.d/docker.io.

Finally, we need to ensure we have the latest Docker components by processing a web-accessible file on the docker.com website. The easiest technique for doing this is to use the curl program, which is similar to wget, except in this case we will download the file and pass it directly into a Unix shell for immediate processing; the format for this command is curl -sSL https://get.docker.com/ubuntu/ | sudo sh.

To summarize, we need to execute the following commands in order:

sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y dist-upgrade
sudo apt-get -y install docker.io
source /etc/bash_completion.d/docker.io
curl -sSL https://get.docker.com/ubuntu/ | sudo sh

After these steps have all been completed, you will now have an Ubuntu virtual machine that is ready to run Docker containers.
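If you expect to rebuild the virtual machine more than once, the steps above could be collected into a single function that stops at the first failure. This is a sketch rather than a tested provisioning script; the function name provision_docker_host is hypothetical, and the tab-completion step is left out on purpose because sourcing a file only affects the interactive shell in which it is entered.

```shell
#!/bin/sh
# Hypothetical helper: run the provisioning commands listed above in order.
# The body runs in a subshell so that "set -e" does not leak into your session.
provision_docker_host() (
    set -e                               # abort on the first failing command
    sudo apt-get update                  # refresh the package list
    sudo apt-get -y upgrade              # apply available package upgrades
    sudo apt-get -y dist-upgrade         # apply distribution upgrades
    sudo apt-get -y install docker.io    # install the Ubuntu Docker package
    # Caveat: for a pipeline, set -e only sees the exit status of "sudo sh",
    # so a failed download by curl is not detected here.
    curl -sSL https://get.docker.com/ubuntu/ | sudo sh
)
```

Call it as a plain command (provision_docker_host) at the VM prompt; when a function is called as part of an if test or an && / || list, the shell ignores set -e inside it.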

Running Docker containers in Google Compute Engine

You now have a working Ubuntu virtual machine that is ready to run Docker commands. Since you are no longer using Boot2Docker, you would normally need to enter sudo before any docker commands. However, since we are the root user in this Ubuntu vm (because we did not create any user accounts) we can enter Docker commands in the same manner as we have at the Boot2Docker shell.

The first step is to download our course Docker image (alternatively, you could also download the Hadoop container) by entering docker pull lcdm/info490 at the vm prompt. Next, we need to make a new directory that will contain our course material. In keeping with tradition, we will call the directory i2ds, thus you need to enter mkdir i2ds followed by cd i2ds. Now we can clone the course git repository so that we can easily run the IPython Notebooks for the course. We do this by entering git clone https://github.com/INFO490/spring2015.

Now we are ready to run a Docker IPython Notebook server. We can now change the default port mapping such that we can use the standard web port 80 instead of the special port 8888. We also need to map our shared folder into our Notebook server. In the end, our docker run command takes the full form: docker run -d -p 80:8888 -e "PASSWORD=temp123" -v /root/i2ds:/notebooks/i2ds lcdm/info490. At this point, you are running our course Docker image inside the Google Compute Engine virtual machine as shown in the following screenshot.

To summarize, the following steps are required to successfully run our course Docker container to enable Web-accessible IPython Notebooks:

docker pull lcdm/info490
mkdir i2ds
cd i2ds
git clone https://github.com/INFO490/spring2015
docker run -d -p 80:8888 -e "PASSWORD=temp123" \
    -v /root/i2ds:/notebooks/i2ds lcdm/info490
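The docker run command combines three settings: -p maps a port on the virtual machine (80) to the port the Notebook server listens on inside the container (8888), -e sets the server password, and -v shares the cloned course folder with the container. The sketch below makes the first two into parameters and generates a random password instead of a fixed one. Both function names are hypothetical, and the random password is a suggestion that is not part of the original steps.

```shell
#!/bin/sh
# Generate a 24-character hexadecimal password from 12 random bytes.
random_password() {
    od -An -tx1 -N12 /dev/urandom | tr -d ' \n'
}

# Hypothetical helper: start the course Notebook server.
#   $1 = password for the Notebook server (required)
#   $2 = port to expose on the virtual machine (default 80)
start_notebook() {
    if [ -z "$1" ]; then
        echo "usage: start_notebook PASSWORD [HOST_PORT]" >&2
        return 1
    fi
    host_port=${2:-80}
    # 8888 is the port inside the container, as in the command above.
    docker run -d -p "${host_port}:8888" -e "PASSWORD=$1" \
        -v /root/i2ds:/notebooks/i2ds lcdm/info490
}
```

For example, pw=$(random_password); echo "$pw"; start_notebook "$pw" prints the password once so you can note it, then starts the server on port 80.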

After these steps have been executed and the Docker container is running, you can open a Web browser to view the Google Compute Engine hosted course website. The URL for this server is the external IP address that was listed on your Google Compute Engine Dashboard; it is also available by using the Google Cloud SDK Docker container to list the instances you have running on the Google Compute Engine. This second option can be done on your host machine by entering the following command at a Boot2Docker prompt:

docker run --rm -ti --volumes-from gcloud-config google/cloud-sdk \
    gcutil listinstances

In the case outlined in this Notebook, the external IP address is 130.211.153.108. Entering this address in your web browser will allow you to access the IPython Notebook server I have running on Google Compute Engine, which can be seen in the following screenshot that was available after entering the password used to start the server.

You can use this Notebook server in the same manner as you have been using the course IPython Notebook server throughout class, only this server is now running in the cloud. You can connect to the running server via docker exec or when needed shut the server down with docker stop. Finally, do note that leaving a virtual machine running can consume resources; in the next section we review how to shut down your virtual machine in order to reduce your resource consumption.
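Both docker exec and docker stop need the ID of the running container. As a sketch, the helpers below look that ID up from docker ps by image name and then connect to or stop the server. The function names are hypothetical, and the use of /bin/bash assumes the course image provides that shell.

```shell
#!/bin/sh
# Print the ID of the first running container started from lcdm/info490.
# In "docker ps" output the first field is the container ID, the second the image.
notebook_container_id() {
    docker ps | awk '$2 ~ /^lcdm\/info490/ { print $1; exit }'
}

# Open an interactive shell inside the running Notebook container.
connect_notebook() {
    docker exec -it "$(notebook_container_id)" /bin/bash
}

# Shut the Notebook server down.
stop_notebook() {
    docker stop "$(notebook_container_id)"
}
```

These are entered at the VM prompt, where the course container is running, not at the Boot2Docker prompt.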

Shutting down Google Compute Engine

The last step should be to reclaim any resources you have running. If this virtual machine was running a server for a business, you likely would want to leave it running since this is now a publicly accessible site. However, for a personal server that is being used solely for development purposes it is best to shut the server down gracefully in order to prevent your limited resources from being exhausted (especially with a weak password that, in the case of this notebook, is also available online!).

To shut down the virtual machine, simply open the Google Compute Engine developer console, select your project from the list displayed, and click on VM instances in the left hand menu. Select the running rppds-15 virtual machine at the bottom of the page and next click on Stop as shown below:

You will be given a prompt asking you to confirm this decision to stop the virtual machine. After you select Stop, your virtual machine and the IPython Notebook server running on it will no longer be running.

Once a virtual machine has been stopped, you can restart it by clicking the Start button (you may have to first select the appropriate virtual machine if it is not already selected). The IPython Notebook server will not restart automatically (we did not set it up to do so, although we could). Thus you will need to ssh into the Google Compute Engine virtual machine and issue the docker run command to start the IPython Notebook server.
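As an alternative to issuing docker run again (which creates a second container), the stopped container from the previous session can usually be started again with docker start. The sketch below assumes that container still exists on the virtual machine's disk; the function name restart_notebook is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: restart the most recent container created from the
# course image. "docker ps -a" lists stopped containers too, newest first.
restart_notebook() {
    id=$(docker ps -a | awk '$2 ~ /^lcdm\/info490/ { print $1; exit }')
    if [ -z "$id" ]; then
        echo "no lcdm/info490 container found; use docker run instead" >&2
        return 1
    fi
    docker start "$id"
}
```

For an automatic restart, Docker releases from 1.2 onward also accept a --restart=always option on docker run; whether that is sufficient depends on the Docker daemon itself starting when the virtual machine boots, which is an assumption to verify on your image.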

You also can delete a virtual machine, but only do that when you know you will no longer need the virtual machine.
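The same stop, start, and delete actions can also be issued from the Google Cloud SDK Docker container instead of the browser, using the gcloud compute instances subcommands. This is a sketch that assumes your installed SDK version provides them. The function name gce_instance is hypothetical, and the defaults are the instance name and zone used in this Notebook.

```shell
#!/bin/sh
# Hypothetical helper: stop, start, or delete a Compute Engine instance.
#   $1 = action (stop, start, or delete)
#   $2 = instance name (default rppds-15)
#   $3 = zone (default us-central1-a)
gce_instance() {
    action=$1
    name=${2:-rppds-15}
    zone=${3:-us-central1-a}
    case "$action" in
        stop|start|delete) ;;
        *)
            echo "usage: gce_instance stop|start|delete [name] [zone]" >&2
            return 1
            ;;
    esac
    docker run --rm -ti --volumes-from gcloud-config google/cloud-sdk \
        gcloud compute instances "$action" "$name" --zone "$zone"
}
```

Entering gce_instance stop at the Boot2Docker prompt stops rppds-15. As noted above, use delete only when you know you will no longer need the virtual machine.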