To run these examples, you need the following in a Linux(-like) environment.
On a standard Linux environment, the first 3 are likely already there.
Instructions for installing Anaconda can be found here: https://www.continuum.io/downloads
To create the slides, I also use reveal.js. Download the latest version, extract the contents, and move them into a folder called notebooks/reveal.js.
> cd lunch_and_learn
In this folder, create a file called .env
The contents of this file should look like:
# Environment variables go here, can be read by `python-dotenv` package:
#
# DO NOT ADD THIS FILE TO VERSION CONTROL! (It is added to the .gitignore)
PROJECT_NAME=lunch_and_learn
PYTHON_INTERPRETER=python3
BASE_URL=https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2015/
EXTERNAL_DATA_DIR=/data/external
RAW_DATA_DIR=/data/raw
INTERIM_DATA_DIR=/data/interim
PROCESSED_DATA_DIR=/data/processed
FILES="fmlydisb funcdisb familyxx househld injpoiep personsx samadult samchild paradata cancerxx"
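To illustrate how these settings might be consumed from Python, here is a minimal sketch. It uses a small standard-library parser instead of the `python-dotenv` package mentioned in the file header, so it runs without extra dependencies. The `parse_env` helper and the shortened `SAMPLE` text are assumptions made for illustration, not part of the project.

```python
import shlex


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the surrounding quotes from values such as FILES
        parts = shlex.split(raw)
        values[key.strip()] = parts[0] if parts else ""
    return values


# Shortened sample of the .env contents shown above (illustrative only)
SAMPLE = '''\
# Environment variables go here
PROJECT_NAME=lunch_and_learn
RAW_DATA_DIR=/data/raw
FILES="fmlydisb funcdisb familyxx"
'''

env = parse_env(SAMPLE)
datasets = env["FILES"].split()  # the space-separated list becomes a Python list
print(env["PROJECT_NAME"], datasets)
```

Splitting `FILES` on whitespace gives one entry per dataset name, which is convenient for looping over the files.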
> cd lunch_and_learn/notebooks
> jupyter notebook --no-browser --ip='*' --port=8888 --NotebookApp.token='lunchandlearn'
(feel free to set your own password of course!)
Then open a browser, navigate to http://localhost:8888, and enter lunchandlearn when asked for the password or token.
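The token can also be passed in the URL itself as a `token` query parameter, which skips the login prompt. The sketch below only builds such a URL. The `notebook_url` helper is a hypothetical name, and the defaults mirror the values used in the command above.

```python
from urllib.parse import urlencode, urlunsplit


def notebook_url(host="localhost", port=8888, token="lunchandlearn"):
    """Return a notebook URL that carries the token as a query parameter."""
    query = urlencode({"token": token})
    return urlunsplit(("http", f"{host}:{port}", "/", query, ""))


print(notebook_url())
```

If you chose your own token, pass it as `token=...` so that any special characters are URL-encoded for you.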
Besides using the Notebook interface to explore the data interactively, you will get to a point where you want to be able to run things in batch.
After cloning the code, do the following to learn what is already possible on the command line.
> cd lunch_and_learn
> make
This returns something like:
Available rules:
$(CSV_FILES) Get the CSV files extracted from the zipfiles
$(SAVED_ZIP_FILES) Get the zipfiles from the CDC website
clean Delete all compiled Python files
create_environment Set up python interpreter environment
data Make Dataset
requirements Install Python Dependencies
test_environment Test python environment is setup correctly
Then install the python libraries required:
> make requirements
If you want to run in a virtual environment, then first run:
> make create_environment
Then activate the environment
> source activate lunch_and_learn
and then
> make requirements
Either way, 'requirements' is a prerequisite of all the other make targets, so it will be run at least once.
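As a rough illustration of why a shared prerequisite runs before everything else and only once per invocation, the sketch below walks a small dependency graph the way make resolves prerequisites. The `PREREQS` edges are an assumption made for this example; the real ones are defined in the project's Makefile.

```python
# Hypothetical prerequisite graph; the real edges live in the project's Makefile.
PREREQS = {
    "test_environment": [],
    "requirements": ["test_environment"],
    "data": ["requirements"],
}


def build_order(target, prereqs, done=None):
    """Return the targets to run for `target`, prerequisites first, each once."""
    done = [] if done is None else done
    for dep in prereqs.get(target, []):
        build_order(dep, prereqs, done)
    if target not in done:
        done.append(target)
    return done


print(build_order("data", PREREQS))
```

With this assumed graph, asking for `data` runs `requirements` first, even if you never call `make requirements` yourself.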
If you are on Windows, instructions for installing Docker Toolbox can be found here: https://docs.docker.com/toolbox/toolbox_install_windows/#step-2-install-docker-toolbox
For example, we can use the continuumio/anaconda3 image, which can be pulled from the Docker repository:
docker pull continuumio/anaconda3
Next, we can run the Anaconda image with Docker and start an interactive shell:
docker run -i -t continuumio/anaconda3 /bin/bash
Once the Docker container is running, we can start an interactive Python shell, install additional conda packages or run Python applications.
With the fresh Docker container running, you will see a prompt like:
root@495c58dfeaa6#
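At that prompt, run the following to install make and vim (you can skip the latter if you prefer another editor). A fresh container may not have package lists yet, so refresh them first.
root@495c58dfeaa6# apt-get update
root@495c58dfeaa6# apt-get install make
root@495c58dfeaa6# apt-get install vim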
Running as root is not good practice, so add a user to work as (replace gsentveld with whatever name you like).
root@495c58dfeaa6# useradd -m gsentveld
Then change the password.
root@495c58dfeaa6# passwd gsentveld
root@495c58dfeaa6# exit
When you exit a Docker container, the changes you made are not stored in the image, so they are gone the next time you start a container from that image. To save the state of your container in an image, do the following:
docker ps -l
This should show something like:
$ docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
495c58dfeaa6 continuumio/anaconda3 "/bin/bash" 15 minutes ago Up 15 minutes 6006/tcp, 8888/tcp xenodochial_curran
Copy that Container ID and do the following (you commit the changes and give it a new name):
docker commit 495c58dfeaa6 lunchandlearn
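If you prefer not to copy the ID by hand, the lookup can be scripted. The sketch below takes the first field of the row under the header in `docker ps -l` output and builds the matching `docker commit` command. The helper names are hypothetical, and `PS_OUTPUT` is the sample output shown above rather than a live call to Docker.

```python
import shlex

# Sample `docker ps -l` output copied from above (not a live call)
PS_OUTPUT = """\
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
495c58dfeaa6 continuumio/anaconda3 "/bin/bash" 15 minutes ago Up 15 minutes 6006/tcp, 8888/tcp xenodochial_curran
"""


def latest_container_id(ps_output):
    """Return the first field of the first row below the header."""
    rows = ps_output.strip().splitlines()[1:]
    if not rows:
        raise ValueError("no container listed")
    return rows[0].split()[0]


def commit_command(container_id, image_name):
    """Build the argument list for `docker commit`."""
    return ["docker", "commit", container_id, image_name]


cmd = commit_command(latest_container_id(PS_OUTPUT), "lunchandlearn")
print(shlex.join(cmd))
```

To actually run the command, the list in `cmd` could be passed to `subprocess.run(cmd, check=True)` on a machine where Docker is installed.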
Alternatively, we can start a Jupyter Notebook server with Anaconda from a Docker image:
docker run -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash -c "/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser"
You can then view the Jupyter Notebook by opening http://localhost:8888 in your browser, or http://<DOCKER-MACHINE-IP>:8888 if you are using a Docker Machine VM.
Once you are inside of the running notebook, you can import libraries from Anaconda, perform interactive computations and visualize your data.