Bare Requirements

To run these examples, you need the following in a Linux(-like) environment.

  • git
  • make
  • vi, or your favorite editor, to edit the .env file and the Makefile
  • Anaconda Python 3, able to start Jupyter notebooks that can render plots

On a standard Linux environment, the first three are likely already installed.
Instructions for installing Anaconda can be found here: https://www.continuum.io/downloads
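
Before going further, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch and not part of the project: the function name check_tools is made up for illustration, and it only tests that each command can be found, not that its version is suitable.

```shell
# Hypothetical helper (not part of the lunch_and_learn project):
# report which of the given commands cannot be found on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The tools named in the list above; swap vi for your own editor if you like.
check_tools git make vi python3 jupyter || echo "Install the missing tools before continuing."
```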

reveal.js

To create the slides, I also use reveal.js. You can download the latest version, extract the contents and move them into a folder called notebooks/reveal.js.

Getting the Code

The code is hosted on GitHub. Run the following command:

> git clone https://github.com/gsentveld/lunch_and_learn

This clones the project into a new subdirectory called lunch_and_learn.

Please note that I will use this project for future sessions, so content may change.

Create your own .env file

> cd lunch_and_learn

In this folder, create a file called .env. Its contents should look like this:

# Environment variables go here, can be read by `python-dotenv` package:
#
# DO NOT ADD THIS FILE TO VERSION CONTROL! (It is added to the .gitignore)

PROJECT_NAME=lunch_and_learn
PYTHON_INTERPRETER=python3

BASE_URL=https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2015/

EXTERNAL_DATA_DIR=/data/external
RAW_DATA_DIR=/data/raw
INTERIM_DATA_DIR=/data/interim
PROCESSED_DATA_DIR=/data/processed
FILES="fmlydisb funcdisb familyxx househld injpoiep personsx samadult samchild paradata cancerxx"
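
Because every line in this file is a plain NAME=value assignment, a POSIX shell can also read it directly, which is a quick way to check the values. The sketch below writes a shortened sample copy to a temporary directory so that it does not touch your real .env. The temporary path and the loop are illustrative only; this is not necessarily how the project itself loads the file (the comment in the file mentions python-dotenv for that).

```shell
# Illustration only: write a shortened sample .env to a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/.env" <<'EOF'
PROJECT_NAME=lunch_and_learn
BASE_URL=https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2015/
RAW_DATA_DIR=/data/raw
FILES="fmlydisb funcdisb familyxx"
EOF

# set -a exports every variable assigned while the file is sourced.
set -a
. "$tmpdir/.env"
set +a

echo "Project:  $PROJECT_NAME"
echo "Base URL: $BASE_URL"

# FILES is a space-separated list, so an unquoted expansion splits it into words.
for name in $FILES; do
    echo "dataset: $name"
done

rm -rf "$tmpdir"
```

The quotes around the FILES value matter here: without them the shell would treat everything after the first space as a separate command.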

Starting the Notebook server

> cd lunch_and_learn/notebooks
> jupyter notebook --no-browser --ip='*' --port=8888 --NotebookApp.token='lunchandlearn'

(Feel free to set your own token, of course!)

Then open a browser, navigate to http://localhost:8888 and enter the token: lunchandlearn.
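
A fixed value such as lunchandlearn is convenient for a workshop, but anyone who can reach the port can guess it. As a sketch of one alternative, the snippet below builds a random token from /dev/urandom and prints the command you would run with it. The variable name is arbitrary, and the command is only printed, not executed.

```shell
# Read 16 random bytes and print them as 32 hexadecimal characters.
token=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')

# Print the command rather than running it, so you can inspect it first.
echo "jupyter notebook --no-browser --ip='*' --port=8888 --NotebookApp.token='$token'"
```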

Using the command line

Besides exploring the data interactively in the Notebook interface, you will reach a point where you want to run things in batch.
After cloning the code, do the following to learn what is already possible on the command line.

> cd lunch_and_learn
> make

This returns something like:

Available rules:

$(CSV_FILES)        Get the CSV files extracted from the zipfiles
$(SAVED_ZIP_FILES)  Get the zipfiles from the CDC website
clean               Delete all compiled Python files
create_environment  Set up python interpreter environment
data                Make Dataset
requirements        Install Python Dependencies
test_environment    Test python environment is setup correctly

Then install the required Python libraries:

> make requirements

Virtual Environment

If you want to run in a virtual environment, first run:

> make create_environment

Then activate the environment:

> source activate lunch_and_learn

and then install the requirements:

> make requirements

Either way, 'requirements' is a prerequisite of all the other make targets, so it will be run at least once.

Using Docker Anaconda Images

Anaconda can also run on Docker. The Docker images are fully installed and ready to run Python 3 and Jupyter.
The downside is that it adds another layer of technology that you need to learn.

Get the Anaconda image

For example, we can use the continuumio/anaconda3 image, which can be pulled from the Docker repository:

 docker pull continuumio/anaconda3 

Next, we can run the Anaconda image with Docker and start an interactive shell:

 docker run -i -t continuumio/anaconda3 /bin/bash

Once the Docker container is running, we can start an interactive Python shell, install additional conda packages or run Python applications.

Installing a few extra things that you will need

With the fresh Docker container running, you will see a prompt like:

 root@495c58dfeaa6#

At that prompt, install make and vi (the vim package; you can skip it if you prefer another editor). Refresh the package index first, since a fresh image may not have one:

root@495c58dfeaa6# apt-get update
root@495c58dfeaa6# apt-get install make
root@495c58dfeaa6# apt-get install vim

Running as root is not good practice, so add a user to work as (replace gsentveld with whatever name you want, of course):

root@495c58dfeaa6# useradd -m gsentveld

Then set the password and exit the container:

root@495c58dfeaa6# passwd gsentveld
root@495c58dfeaa6# exit

Save environment changes in your docker image!

When you exit a Docker container, the changes you made inside it are not stored in the image, so a new container started from that image will not have them. To save the state of your container in an image, do the following:

docker ps -l

This should show something like:

$ docker ps -l
CONTAINER ID IMAGE                 COMMAND     CREATED         STATUS        PORTS              NAMES
495c58dfeaa6 continuumio/anaconda3 "/bin/bash" 15 minutes ago  Up 15 minutes 6006/tcp, 8888/tcp xenodochial_curran

Copy that container ID and commit the changes under a new image name:

docker commit 495c58dfeaa6 lunchandlearn
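
Copying the ID by hand works, but docker ps also has a -q flag that prints only the container ID, so the two steps can be combined. This is a minimal sketch, assuming the container you want to save is the most recently created one; the function name commit_latest is made up for this example.

```shell
# Hypothetical helper: commit the most recently created container
# under the image name given as the first argument.
commit_latest() {
    image_name="$1"
    # -l selects the latest container, -q prints only its ID.
    container_id=$(docker ps -l -q)
    if [ -z "$container_id" ]; then
        echo "No container found to commit." >&2
        return 1
    fi
    docker commit "$container_id" "$image_name"
}
```

Calling commit_latest lunchandlearn should then have the same effect as the two manual steps, provided no other container was created in between.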

Install Jupyter and the basic requirements of the CookieCutter environment

docker run -i -t lunchandlearn /bin/bash

As root, run the following:

Alternatively, we can start a Jupyter Notebook server with Anaconda from a Docker image:

 docker run -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash -c "\
     /opt/conda/bin/conda install jupyter -y --quiet && \
     mkdir /opt/notebooks && \
     /opt/conda/bin/jupyter notebook \
         --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser"

You can then view the Jupyter Notebook by opening http://localhost:8888 in your browser, or http://<DOCKER-MACHINE-IP>:8888 if you are using a Docker Machine VM.

Once you are inside of the running notebook, you can import libraries from Anaconda, perform interactive computations and visualize your data.