Goal: To understand what Docker is and how it can be used with Jupyter notebooks for reproducible research.
Docker is a tool that creates high-performance, shareable, reproducible computational environments. Jupyter notebooks are tools for interactive analysis that interweave prose, code, and results. Together, Docker and Jupyter notebooks are best-of-breed tools for creating reproducible research.
In [ ]:
#Imports for running this presentation live
from ipywidgets import interact, interactive
from IPython.display import clear_output, display, HTML, YouTubeVideo
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import cnames
from matplotlib import animation
%matplotlib inline
!docker info
!docker load -i busybox.dockerarchive.tar
Even though computers are often considered deterministic, the software they run is a rapidly changing landscape. Libraries are constantly adding new features and fixing issues.
Image source: http://www.michaelogawa.com/research/storylines/
Even libraries with the strictest backwards-compatibility policies can change in significant ways.
Image source: http://www.bonkersworld.net/backwards-compatibility/
A reproducible computational environment has a sufficiently consistent state for the computational task at hand.
For example, this can consist of
Image source: https://www.youtube.com/watch?v=g1LgVfV5_ZQ
Image source: http://time-az.com/images/2014/02/20140203carjam.jpg
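One way to see how much state a "sufficiently consistent" environment involves is to record it. The following is a minimal sketch, not part of the original talk: `environment_snapshot` is a hypothetical helper, and the package names passed to it are simply the libraries imported at the top of this notebook. It captures only the Python layer; system libraries, compilers, and the kernel are not covered.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Collect interpreter, OS, and package versions into a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


# Package names are taken from this notebook's import cell (an assumption
# about what matters for this particular analysis).
snapshot = environment_snapshot(["numpy", "matplotlib", "ipywidgets"])
print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Comparing two such snapshots from different machines shows quickly where their environments differ.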
Linux container systems, like Docker, are a new type of tool to easily build, ship, and run reproducible, binary applications.
It is "good enough" for a reproducible computational environment.
In this talk, we will introduce Docker from the perspective of a scientific research software engineer. We will
Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.
In [ ]:
!docker run --rm busybox sh -c 'echo "Hello Docker World!"'
Docker works with images that consume minimal disk space and are versioned, archivable, and shareable. Executing applications in these images does not require dedicated resources and is high performance.
It works with containers as opposed to virtual machines (VMs).
In [ ]:
%time !docker run --rm busybox sh -c 'echo "Hello Docker World!"'
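The `%time` magic only works inside IPython. As a rough sketch of the same measurement in plain Python, the block below times an arbitrary command with `subprocess`. `time_command` is a hypothetical helper, not part of Docker or of the original talk, and the block falls back to timing a trivial Python process when the `docker` CLI is not on the `PATH`.

```python
import shutil
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, completed.returncode


if __name__ == "__main__":
    if shutil.which("docker"):
        # Same command as the notebook cell above
        cmd = ["docker", "run", "--rm", "busybox",
               "sh", "-c", 'echo "Hello Docker World!"']
    else:
        # Fallback so the sketch still runs where Docker is not installed
        cmd = [sys.executable, "-c", "pass"]
    elapsed, code = time_command(cmd)
    print(f"{cmd[0]} exited with {code} after {elapsed:.3f} s")
```

The measured time includes process start-up and, for Docker, container creation and removal, so it is best read as an upper bound on container overhead.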
A Docker container is similar to running an application in a chroot, but it also sandboxes processes and the network stack using features of the Linux kernel:
In [ ]:
!docker search itk
docker <subcommand>
docker push, docker pull, docker tag
docker export will create an archivable tarball of a container's filesystem.
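Because the result of `docker export` is an ordinary tar archive, it can be inspected with standard tools. The following is a minimal sketch using Python's `tarfile` module. `list_tar_members` and `make_sample_archive` are hypothetical helpers, and the tiny in-memory archive stands in for a real export so that the example runs without Docker.

```python
import io
import tarfile


def list_tar_members(fileobj, limit=10):
    """Return up to `limit` member names from a tar archive file object."""
    names = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


def make_sample_archive():
    """Build a tiny in-memory tar archive (a stand-in for a real export)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="etc/motd")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


print(list_tar_members(make_sample_archive()))
```

With a real archive, for example one written with `docker export -o rootfs.tar <container>` (the file name is an assumption), the same function can be called as `list_tar_members(open("rootfs.tar", "rb"))`.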
In [ ]:
!docker images
In [ ]:
!docker ps
In [ ]:
!docker run -d busybox sh -c 'sleep 3'
In [ ]:
!docker ps
In [ ]:
!docker ps -a
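The difference between `docker ps` and `docker ps -a` is that the latter also lists containers that have exited. To process that listing in a script, `docker ps` can print one JSON object per line with `--format '{{json .}}'`. The sketch below parses such output; `parse_ps_json` and `exited_containers` are hypothetical helpers, and `SAMPLE` is made-up data shaped like that output, not a recording of a real run.

```python
import json


def parse_ps_json(output):
    """Parse `docker ps --format '{{json .}}'` output: one JSON object per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def exited_containers(rows):
    """Keep rows whose Status text starts with 'Exited'."""
    return [row for row in rows if row.get("Status", "").startswith("Exited")]


# Made-up sample shaped like `docker ps -a --format '{{json .}}'` output
SAMPLE = "\n".join([
    json.dumps({"ID": "aaa111", "Image": "busybox", "Status": "Up 2 seconds"}),
    json.dumps({"ID": "bbb222", "Image": "busybox",
                "Status": "Exited (0) 5 seconds ago"}),
])

rows = parse_ps_json(SAMPLE)
print([row["ID"] for row in exited_containers(rows)])
```

In the notebook, the real listing could be captured with `subprocess.run(["docker", "ps", "-a", "--format", "{{json .}}"], capture_output=True, text=True).stdout` and passed to `parse_ps_json`.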
In [ ]:
!mkdir -p docker-ls-data
!cp $PWD/Data/*.png docker-ls-data/
In [ ]:
%%writefile docker-ls-data/Dockerfile
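FROM busybox
# MAINTAINER is deprecated; a maintainer label carries the same information
LABEL maintainer="Matt McCormick <matt.mccormick@kitware.com>"
RUN mkdir -p /Data
# Copy the PNG files from the build context into the image
COPY *.png /Data/
VOLUME /Data
# Default command: list the contents of /Data
CMD ["/bin/sh", "-c", "ls /Data"]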
In [ ]:
!docker build -t ls-data ./docker-ls-data
In [ ]:
!docker run --rm ls-data
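The build and run steps above can also be scripted so that they are repeatable outside the notebook. The following is a minimal sketch under the assumption that the `docker` CLI is installed. `build_command`, `run_command`, and `build_and_run` are hypothetical helpers; they only wrap the same `docker build -t` and `docker run --rm` invocations used in the cells above.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(tag, context_dir):
    """Argument list equivalent to `docker build -t <tag> <context_dir>`."""
    return ["docker", "build", "-t", tag, str(context_dir)]


def run_command(tag):
    """Argument list equivalent to `docker run --rm <tag>`."""
    return ["docker", "run", "--rm", tag]


def build_and_run(tag, context_dir):
    """Build the image from context_dir, run it once, and return its stdout."""
    context = Path(context_dir)
    if not (context / "Dockerfile").is_file():
        raise FileNotFoundError(f"no Dockerfile in {context}")
    subprocess.run(build_command(tag, context), check=True)
    result = subprocess.run(
        run_command(tag), check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Only attempt the build when Docker and the build context are present
    if shutil.which("docker") and Path("docker-ls-data/Dockerfile").is_file():
        print(build_and_run("ls-data", "docker-ls-data"))
```

Keeping command construction separate from execution makes the commands easy to log alongside the results they produced.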
A portable Docker image assumes only that standard CPU, memory, disk, and network resources are available. If an image depends on local USB devices or video card devices, it will not run everywhere.