Efficient Data Analysis with the IPython Notebook

Objectives

  • Become familiar with the IPython Notebook.
  • Get an overview of the IPython landscape.
  • Get started with exploratory data analysis in Python.
  • Conduct reproducible data analysis and computing experiments.

  • How do you currently:

    • wrangle data?
    • visualize results?
    • run analyses (machine learning, statistics)?
    • use parallel computing?
    • handle big data?

What is Python?

Python is a general-purpose programming language that blends procedural, functional, and object-oriented paradigms.

Mark Lutz, Learning Python

  • Simple, clean syntax
  • Easy to learn
  • Interpreted
  • Strongly and dynamically typed
  • Runs everywhere: Linux, Mac, and Windows
  • Free and open
  • Expressive: do more with fewer lines of code
  • Lean: a small core language, with extra functionality provided by modules
  • Options: Procedural, object-oriented, and functional.

Abstractions

  • Python provides high-level abstraction
  • Performance can be on par with compiled code if the right approach is used
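One common reading of "the right approach" is to push inner loops into compiled library code. The sketch below is an illustration under that assumption, not a benchmark: it compares a pure-Python loop with a NumPy call on the same data. The function names and the data size are made up for this example, and timings will vary by machine.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: every multiply and add goes through the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Same computation, delegated to NumPy's compiled dot product."""
    return float(np.dot(values, values))


rng = np.random.default_rng(0)          # seeded so the sketch is repeatable
data = rng.standard_normal(100_000)

loop_result = sum_of_squares_loop(data)
vec_result = sum_of_squares_vectorized(data)

# Time a few runs of each version; absolute numbers depend on the machine.
loop_time = timeit.timeit(lambda: sum_of_squares_loop(data), number=3)
vec_time = timeit.timeit(lambda: sum_of_squares_vectorized(data), number=3)

print("loop:       %.4f s for 3 runs" % loop_time)
print("vectorized: %.4f s for 3 runs" % vec_time)
```

Both versions compute the same value up to floating-point rounding; only where the loop runs differs.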

IPython and the IPython Notebook

IPython

  • Platform for interactive computing
  • Shell or browser-based notebook
  • Project Jupyter
    • Language-independent notebook
    • Can be used with R, Julia, bash ...

IPython Notebook

http://blog.fperez.org/2012/01/ipython-notebook-historical.html

Interactive web-based computing, data analysis, and documentation.

  • One document for code and output
  • Run locally and remotely
  • Document process
  • Share results

Integrate Code and Documentation

  • Data structure output
  • Inline plots
  • Conversation-style programming (literate programming)
  • Telling a data story
  • Great for iterative programming.

    • Data analysis
    • Quick scripts
    • Prototyping
  • Two types of cells:

    • Markdown for documentation
    • Code for executing programs

In [1]:
2+4


Out[1]:
6

In [2]:
print("hello")


hello

In [3]:
print("Hello world!")


Hello world!
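Markdown and code cells live side by side in one .ipynb file, which is plain JSON. The sketch below builds a tiny two-cell document by hand, assuming the version 4 notebook format. The cell contents are invented for illustration, and real notebooks carry more metadata than shown here.

```python
import json

# A hand-written, minimal notebook document (assumes the nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            # Markdown cell: documentation only, never executed.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Telling a data story\n", "Some *documentation*."],
        },
        {
            # Code cell: the input and its stored output travel together.
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["2+4"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["6"]},
                }
            ],
        },
    ],
}

# Round-trip through JSON text, as a notebook file on disk would be.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Storing the output next to the code is what makes "one document for code and output" possible.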

Local and Remote

Documentation and Sharing

Keyboard Shortcuts

Markdown and LaTeX

  • Markdown
  • LaTeX: $y = \sqrt{a + b}$

Images

<img src='https://s3.amazonaws.com/research_computing_tutorials/monty-python.png' width="300">

This is an image:

Embedded Plots


In [4]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randn(10000)
print(x)


[ 2.21814171 -1.3200109  -0.32797836 ..., -0.80605013 -0.48163981
 -0.38008204]

Plot a Histogram of x


In [5]:
plt.hist(x, bins=50)
plt.show()
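The plot itself is not reproduced in this text. As a rough stand-in, the sketch below computes the bin counts that a 50-bin histogram is drawn from, using np.histogram. It swaps in a seeded np.random.default_rng generator for np.random.randn so the run is repeatable; that swap is an assumption of this example, not part of the original cell.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator (assumption for repeatability)
x = rng.standard_normal(10000)      # same kind of data as the cell above

# counts[i] is the number of samples falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(x, bins=50)

print("samples binned:", counts.sum())
print("number of bin edges:", len(edges))
print("fullest bin starts at: %.2f" % edges[counts.argmax()])
```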


Customizable

  • Custom CSS
  • Custom JavaScript libraries
  • Create your own output format.
  • Tools and workflow

Magic Commands

  • Built-in useful functions
  • % line commands
  • %% cell commands

In [6]:
%lsmagic


Out[6]:
Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %install_default_config  %install_ext  %install_profiles  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

In [7]:
%timeit y = np.random.randn(100000)


100 loops, best of 3: 3.71 ms per loop
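Outside a notebook, the standard library's timeit module offers a roughly comparable measurement. The sketch below is only an approximation of what %timeit reports: the make_samples helper is hypothetical, and it uses random.gauss in place of NumPy so that it needs nothing beyond the standard library.

```python
import random
import timeit

LOOPS = 5      # calls per timing run
REPEATS = 3    # number of timing runs; the best one is reported


def make_samples(n=100000):
    """Hypothetical stand-in workload: draw n normally distributed numbers."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]


# Each entry in timings is the total time for LOOPS calls of make_samples.
timings = timeit.repeat(make_samples, repeat=REPEATS, number=LOOPS)
best_per_loop = min(timings) / LOOPS

print("%d loops, best of %d: %.2f ms per loop"
      % (LOOPS, REPEATS, best_per_loop * 1e3))
```

Taking the minimum rather than the mean follows the usual timeit advice: slower runs mostly reflect interference from other processes.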

In [8]:
%ll


total 6172
-rw-r--r-- 1 tom   30618 Jun 24 09:10 01_introduction-IPython-notebook.ipynb
-rw-r--r-- 1 tom  138029 Jun 24 09:09 01_introduction-IPython-notebook.pdf
drwxr-xr-x 6 tom     204 Jun 24 09:09 01_introduction-IPython-notebook_files/
-rw-r--r-- 1 tom  257949 Jun 24 15:37 01walrus_animation.mp4
-rw-r--r-- 1 tom  795157 Jun 23 16:35 02_Data_Transfer.pdf
-rw-r--r-- 1 tom  281098 Jun 24 15:42 02walrus_animation.mp4
-rw-r--r-- 1 tom  177190 Jun 24 09:46 03_HPC_File_Systems.pdf
-rw-r--r-- 1 tom  117870 Jun 24 11:08 04a_exploratory_pandas.ipynb
-rw-r--r-- 1 tom  164554 Jun 24 09:45 04a_exploratory_pandas.pdf
drwxr-xr-x 6 tom     204 Jun 24 09:45 04a_exploratory_pandas_files/
-rw-r--r-- 1 tom  696878 Jun 24 11:24 04b_python-map-plotting.ipynb
-rw-r--r-- 1 tom  473051 Jun 24 09:46 04b_python-map-plotting.pdf
drwxr-xr-x 6 tom     204 Jun 24 09:46 04b_python-map-plotting_files/
-rw-r--r-- 1 tom  213926 Jun 24 05:04 05_Data_Conversion_Cleaning.pdf
-rw-r--r-- 1 tom    4709 Jun 24 05:04 06_CSV_to_NetCDF_Exercise.ipynb
-rw-r--r-- 1 tom   11436 Jun 26 09:09 06_CSV_to_NetCDF_Solution.ipynb
-rw-r--r-- 1 tom 1735152 Jun 24 15:43 07_python-matplotlib.ipynb
lrwxr-xr-x 1 tom      77 Jun 23 05:51 Walrus_Data -> /Users/tom/Google Drive/Grants/2014_USGS/2015-06-23-Visualization/Walrus_Data/
-rw-r--r-- 1 tom   87723 Jun 23 14:10 basic_animation.mp4
-rw-r--r-- 1 tom  191347 Jun 23 05:17 data_overview.png
-rwxr-xr-x 1 tom    9220 Jun 24 09:08 example*
-rw-r--r-- 1 tom      82 Jun 24 09:07 example.cpp
-rw-r--r-- 1 tom  230492 Jun 23 05:17 ipython-notebook-keyboard.png
-rw-r--r-- 1 tom  172569 Jun 23 05:17 ipython-notebook-sharing.png
-rw-r--r-- 1 tom   77163 Jun 23 05:17 ipython-notebook.png
-rw-r--r-- 1 tom    9674 Jun 23 05:42 rc_logo.png
-rw-r--r-- 1 tom   72981 Jun 23 05:17 traditional_python.png
-rw-r--r-- 1 tom  262061 Jun 24 13:13 walrus_animation.mp4
-rw-r--r-- 1 tom   17607 Jun 24 11:01 walrus_behav.pdf
-rw-r--r-- 1 tom   31545 Jun 24 11:05 walrus_behav.png

Other Languages: Bash


In [9]:
%%bash
ls -l


total 6172
-rw-r--r-- 1 tom staff   30618 Jun 24 09:10 01_introduction-IPython-notebook.ipynb
-rw-r--r-- 1 tom staff  138029 Jun 24 09:09 01_introduction-IPython-notebook.pdf
drwxr-xr-x 6 tom staff     204 Jun 24 09:09 01_introduction-IPython-notebook_files
-rw-r--r-- 1 tom staff  257949 Jun 24 15:37 01walrus_animation.mp4
-rw-r--r-- 1 tom staff  795157 Jun 23 16:35 02_Data_Transfer.pdf
-rw-r--r-- 1 tom staff  281098 Jun 24 15:42 02walrus_animation.mp4
-rw-r--r-- 1 tom staff  177190 Jun 24 09:46 03_HPC_File_Systems.pdf
-rw-r--r-- 1 tom staff  117870 Jun 24 11:08 04a_exploratory_pandas.ipynb
-rw-r--r-- 1 tom staff  164554 Jun 24 09:45 04a_exploratory_pandas.pdf
drwxr-xr-x 6 tom staff     204 Jun 24 09:45 04a_exploratory_pandas_files
-rw-r--r-- 1 tom staff  696878 Jun 24 11:24 04b_python-map-plotting.ipynb
-rw-r--r-- 1 tom staff  473051 Jun 24 09:46 04b_python-map-plotting.pdf
drwxr-xr-x 6 tom staff     204 Jun 24 09:46 04b_python-map-plotting_files
-rw-r--r-- 1 tom staff  213926 Jun 24 05:04 05_Data_Conversion_Cleaning.pdf
-rw-r--r-- 1 tom staff    4709 Jun 24 05:04 06_CSV_to_NetCDF_Exercise.ipynb
-rw-r--r-- 1 tom staff   11436 Jun 26 09:09 06_CSV_to_NetCDF_Solution.ipynb
-rw-r--r-- 1 tom staff 1735152 Jun 24 15:43 07_python-matplotlib.ipynb
lrwxr-xr-x 1 tom staff      77 Jun 23 05:51 Walrus_Data -> /Users/tom/Google Drive/Grants/2014_USGS/2015-06-23-Visualization/Walrus_Data
-rw-r--r-- 1 tom staff   87723 Jun 23 14:10 basic_animation.mp4
-rw-r--r-- 1 tom staff  191347 Jun 23 05:17 data_overview.png
-rwxr-xr-x 1 tom staff    9220 Jun 24 09:08 example
-rw-r--r-- 1 tom staff      82 Jun 24 09:07 example.cpp
-rw-r--r-- 1 tom staff  230492 Jun 23 05:17 ipython-notebook-keyboard.png
-rw-r--r-- 1 tom staff  172569 Jun 23 05:17 ipython-notebook-sharing.png
-rw-r--r-- 1 tom staff   77163 Jun 23 05:17 ipython-notebook.png
-rw-r--r-- 1 tom staff    9674 Jun 23 05:42 rc_logo.png
-rw-r--r-- 1 tom staff   72981 Jun 23 05:17 traditional_python.png
-rw-r--r-- 1 tom staff  262061 Jun 24 13:13 walrus_animation.mp4
-rw-r--r-- 1 tom staff   17607 Jun 24 11:01 walrus_behav.pdf
-rw-r--r-- 1 tom staff   31545 Jun 24 11:05 walrus_behav.png

In [10]:
files = !ls # But glob is a better way
print(files[:5])


['01_introduction-IPython-notebook.ipynb', '01_introduction-IPython-notebook.pdf', '01_introduction-IPython-notebook_files', '01walrus_animation.mp4', '02_Data_Transfer.pdf']
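The comment in the cell above mentions glob. A minimal sketch of that alternative follows; it creates a few empty files in a temporary directory so it can run anywhere, and the file names are invented for this example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a few empty placeholder files (hypothetical names).
    for name in ["b_notes.pdf", "a_intro.ipynb", "c_plot.png", "d_data.ipynb"]:
        open(os.path.join(workdir, name), "w").close()

    # glob matches shell-style patterns without starting a shell process.
    # Its result order is not guaranteed, so sort it explicitly.
    notebooks = sorted(glob.glob(os.path.join(workdir, "*.ipynb")))
    notebook_names = [os.path.basename(path) for path in notebooks]

print(notebook_names)
```

Unlike `!ls`, this works the same way in a plain Python script and on systems without an `ls` command.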

Keep it all together


In [11]:
%%writefile example.cpp
#include <iostream>

int main(){
    std::cout << "hello from c++" << std::endl;
}


Overwriting example.cpp

In [12]:
%ls


01_introduction-IPython-notebook.ipynb	 06_CSV_to_NetCDF_Solution.ipynb
01_introduction-IPython-notebook.pdf	 07_python-matplotlib.ipynb
01_introduction-IPython-notebook_files/  Walrus_Data@
01walrus_animation.mp4			 basic_animation.mp4
02_Data_Transfer.pdf			 data_overview.png
02walrus_animation.mp4			 example*
03_HPC_File_Systems.pdf			 example.cpp
04a_exploratory_pandas.ipynb		 ipython-notebook-keyboard.png
04a_exploratory_pandas.pdf		 ipython-notebook-sharing.png
04a_exploratory_pandas_files/		 ipython-notebook.png
04b_python-map-plotting.ipynb		 rc_logo.png
04b_python-map-plotting.pdf		 traditional_python.png
04b_python-map-plotting_files/		 walrus_animation.mp4
05_Data_Conversion_Cleaning.pdf		 walrus_behav.pdf
06_CSV_to_NetCDF_Exercise.ipynb		 walrus_behav.png

In [13]:
%%bash
g++ example.cpp -o example
./example


hello from c++
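The same write, compile, and run loop can be sketched in plain Python for use outside a notebook. This version assumes a Unix-like system with g++ on the PATH and returns None when no compiler is found. The build_and_run helper is a name made up for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

CPP_SOURCE = """#include <iostream>

int main(){
    std::cout << "hello from c++" << std::endl;
}
"""


def build_and_run(source, workdir):
    """Write source to example.cpp, compile it with g++, and return its stdout.

    Returns None if g++ is not available (hypothetical helper for this sketch).
    """
    src = Path(workdir) / "example.cpp"
    exe = Path(workdir) / "example"
    src.write_text(source)               # plays the role of %%writefile

    compiler = shutil.which("g++")
    if compiler is None:
        return None

    # Plays the role of the %%bash cell: compile, then run the binary.
    subprocess.run([compiler, str(src), "-o", str(exe)], check=True)
    result = subprocess.run([str(exe)], capture_output=True, text=True, check=True)
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    output = build_and_run(CPP_SOURCE, tmp)

print(output)
```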

NBconvert examples

  • HTML
  • PDF (print): requires LaTeX to be installed
  • Slides
  • Dynamic Slides
  • reStructuredText (Sphinx)

In [14]:
!ipython nbconvert --to 'PDF' 01_introduction-IPython-notebook.ipynb


[NbConvertApp] Converting notebook 01_introduction-IPython-notebook.ipynb to pdf
[NbConvertApp] Support files will be in 01_introduction-IPython-notebook_files/
[NbConvertApp] Making directory 01_introduction-IPython-notebook_files
[NbConvertApp] Writing 27285 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running pdflatex 3 times: [u'pdflatex', u'notebook.tex']
[NbConvertApp] PDF successfully created
[NbConvertApp] Support files will be in 01_introduction-IPython-notebook_files/
[NbConvertApp] Making directory 01_introduction-IPython-notebook_files
[NbConvertApp] Writing 137987 bytes to 01_introduction-IPython-notebook.pdf

In [15]:
!open 01_introduction-IPython-notebook.pdf
