Data Analysis and Visualization with the IPython Notebook

Materials adapted from tutorials by Monte Lunacek and Thomas Hauser

Objectives

  • Become familiar with the IPython Notebook.
  • Introduce the IPython landscape.
  • Get started with visualization and data analysis in Python.
  • Conduct reproducible data analysis, visualization, and computing experiments.

  • How do you currently:

    • wrangle data?
    • visualize results?
    • analyze data (machine learning, statistics)?
    • do parallel computing?
    • work with big data?

What is Python?

Python is a general-purpose programming language that blends procedural, functional, and object-oriented paradigms.

Mark Lutz, Learning Python

  • Simple, clean syntax
  • Easy to learn
  • Interpreted
  • Strongly and dynamically typed
  • Runs everywhere: Linux, Mac, and Windows
  • Free and open
  • Expressive: do more with fewer lines of code
  • Lean core, extended through modules
  • Multiple paradigms: procedural, object-oriented, and functional

Abstractions

  • Python provides high-level abstractions
  • Performance can be on par with compiled code if the right approach is used (for example, vectorized NumPy operations whose inner loops run in compiled code)
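The "right approach" usually means pushing loops out of the interpreter. The sketch below compares a pure-Python loop with a vectorized NumPy call that computes the same sum of squares. The function names and the array size are illustrative choices rather than part of the tutorial. Actual timings depend on your machine, so none are claimed here.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: every iteration is executed by the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Vectorized: the loop runs inside NumPy's compiled code."""
    return float(np.dot(values, values))


if __name__ == "__main__":
    data = np.random.default_rng(0).standard_normal(200_000)
    # Both versions compute the same quantity (up to floating-point rounding).
    assert np.isclose(sum_of_squares_loop(data), sum_of_squares_numpy(data))
    loop_t = timeit.timeit(lambda: sum_of_squares_loop(data), number=3)
    vec_t = timeit.timeit(lambda: sum_of_squares_numpy(data), number=3)
    print(f"loop: {loop_t:.3f} s   numpy: {vec_t:.5f} s")
```

Both functions return the same result. The difference is where the loop runs.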

IPython and the Jupyter Notebook

IPython

  • Platform for interactive computing
  • Shell or browser-based notebook
  • Project Jupyter: https://jupyter.org
    • Language independent notebook
    • Can be used with R, Julia, bash ...

Jupyter IPython Notebook

http://blog.fperez.org/2012/01/ipython-notebook-historical.html

Interactive web-based computing, data analysis, and documentation.

  • One document for code and output
  • Run locally or remotely
  • Document the process
  • Share results

Integrate Code and Documentation

  • Data structure output
  • Inline plots
  • Conversational-style programming (literate programming)
  • Telling a data story
  • Great for iterative programming:

    • Data analysis
    • Quick scripts
    • Prototyping
  • Two types of cells:

    • Markdown for documentation
      • Markdown can contain LaTeX for equations
    • Code for executing programs
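Under the hood, a notebook file (.ipynb) is JSON. In the current nbformat 4 layout it has a top-level "cells" list, and each cell records its "cell_type": "markdown", "code", or the less common "raw". The sketch below counts cells by type under that assumption. Older nbformat 3 files nest cells under "worksheets" and are not handled here. The helper names are hypothetical.

```python
import json
from collections import Counter


def load_notebook(path):
    """Read a .ipynb file; it is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_cell_types(nb):
    """Count cells by type in an nbformat 4 notebook dictionary."""
    return Counter(cell["cell_type"] for cell in nb.get("cells", []))


# Usage (assumes the notebook from this tutorial is in the current directory):
# print(count_cell_types(load_notebook("01_introduction-IPython-notebook.ipynb")))
```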

Markdown

Here is a formula:

$f(x,y) = x^2 + e^x$

Images

This is an image:

<img src='https://s3.amazonaws.com/research_computing_tutorials/monty-python.png' width="300">

Code


In [1]:
2+4


Out[1]:
6

In [2]:
print("hello")


hello

In [3]:
a=2
print("Hello world!")


Hello world!

Run Locally and Remotely

Documentation and Sharing

Keyboard Shortcuts

Embedded Plots


In [4]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(10000)
print(x)


[-1.96081537  0.51098991 -0.03038128 ...,  2.67338147  0.45963294
 -0.80157216]

Plot a Histogram of x


In [5]:
plt.hist(x, bins=50)
plt.show()
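The histogram is also available as numbers rather than as a picture. `plt.hist` returns the bin counts and bin edges as its first two return values, and `np.histogram` computes the same binning without drawing anything. The sketch uses a seeded generator, with an arbitrary seed, as a reproducible stand-in for the `x` above.

```python
import numpy as np

# Reproducible stand-in for x (the seed 42 is an arbitrary choice).
rng = np.random.default_rng(42)
x = rng.standard_normal(10000)

# Same 50-bin partition that plt.hist(x, bins=50) draws.
counts, edges = np.histogram(x, bins=50)

print(len(counts), len(edges))  # 50 51  (50 bins need 51 edges)
print(counts.sum())             # 10000  (every sample lands in exactly one bin)
```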


Notebooks can be customized

  • Custom CSS
  • Custom JavaScript libraries
  • Custom output formats
  • Integration with your own tools and workflow

Magic Commands

  • Useful built-in commands
  • % prefix: line magics, which apply to a single line
  • %% prefix: cell magics, which apply to the whole cell

In [6]:
%lsmagic


Out[6]:
Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %install_default_config  %install_ext  %install_profiles  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

In [7]:
%timeit y = np.random.randn(100000)


100 loops, best of 3: 3.81 ms per loop
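`%timeit` is an IPython convenience. Outside IPython, the standard-library `timeit` module gives a comparable measurement. The sketch mimics the "100 loops, best of 3" report above. Note that the loop count is fixed by hand here, whereas `%timeit` chooses it automatically. `best_per_loop` is a hypothetical helper name.

```python
import timeit

import numpy as np


def best_per_loop(stmt, number=100, repeat=3):
    """Run `stmt` `number` times per trial for `repeat` trials and return
    the best average time per call, in seconds."""
    trials = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(trials) / number


t = best_per_loop(lambda: np.random.randn(100000))
print(f"100 loops, best of 3: {t * 1e3:.2f} ms per loop")
```

Reporting the minimum rather than the mean is the usual choice, because slower trials mostly reflect interference from other processes.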

In [8]:
%ls


01_inclass-notebook.ipynb		03_inclass_matplotlib.ipynb
01_introduction-IPython-notebook.ipynb	03_intro_matplotlib.ipynb
02_inclass_python.ipynb			04_effective_visualizations.ipynb
02_python_overview.ipynb

Other Languages: Bash


In [9]:
%%bash
ls -l


total 360
-rw-r--r-- 1 tom staff   3165 Jan 19 15:18 01_inclass-notebook.ipynb
-rw-r--r-- 1 tom staff  30074 Jan 19 15:23 01_introduction-IPython-notebook.ipynb
-rw-r--r-- 1 tom staff   2013 Jan 19 15:18 02_inclass_python.ipynb
-rw-r--r-- 1 tom staff   7169 Jan 19 15:18 02_python_overview.ipynb
-rw-r--r-- 1 tom staff   1545 Jan 19 15:18 03_inclass_matplotlib.ipynb
-rw-r--r-- 1 tom staff 201672 Jan 19 15:18 03_intro_matplotlib.ipynb
-rw-r--r-- 1 tom staff 109122 Jan 19 15:18 04_effective_visualizations.ipynb

In [10]:
files = !ls # But glob is a better way
print(files[:5])


['01_inclass-notebook.ipynb', '01_introduction-IPython-notebook.ipynb', '02_inclass_python.ipynb', '02_python_overview.ipynb', '03_inclass_matplotlib.ipynb']
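The comment above suggests glob instead of shelling out to `ls`. One way to read that is a pure-Python listing that needs no subprocess, works on Windows, and returns a list directly. This is a sketch: the helper name and the `*.ipynb` pattern are illustrative choices.

```python
import glob
import os


def list_notebooks(directory="."):
    """Return the notebook file names in `directory`, sorted alphabetically."""
    pattern = os.path.join(directory, "*.ipynb")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


files = list_notebooks()
print(files[:5])
```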

Keep it all together


In [11]:
%%writefile example.cpp
#include <iostream>

int main(){
    std::cout << "hello from c++" << std::endl;
}


Writing example.cpp

In [12]:
%ls example.cpp


example.cpp

In [13]:
%%bash
g++ example.cpp -o example
./example


hello from c++
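The same write, compile, and run sequence can also be scripted in plain Python, for example outside a notebook. The sketch below makes two assumptions: a g++-compatible compiler on PATH, and a POSIX-style executable name. If no compiler is found, it returns None instead of failing. `compile_and_run` is a hypothetical helper, not part of IPython.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

CPP_SOURCE = """#include <iostream>

int main(){
    std::cout << "hello from c++" << std::endl;
}
"""


def compile_and_run(source, compiler="g++"):
    """Write `source` to a temporary .cpp file, compile it, run the binary,
    and return its stdout. Returns None if `compiler` is not on PATH."""
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.cpp"
        exe = Path(tmp) / "example"
        src.write_text(source)  # the %%writefile step
        subprocess.run([compiler, str(src), "-o", str(exe)], check=True)  # g++ step
        result = subprocess.run([str(exe)], capture_output=True, text=True, check=True)
        return result.stdout


print(compile_and_run(CPP_SOURCE))
```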

nbconvert examples

  • HTML
  • PDF (print): requires a LaTeX installation
  • Slides
  • Dynamic slides
  • reStructuredText (Sphinx)

In [14]:
!jupyter nbconvert --to pdf 01_introduction-IPython-notebook.ipynb


[NbConvertApp] Converting notebook 01_introduction-IPython-notebook.ipynb to pdf
[NbConvertApp] Support files will be in 01_introduction-IPython-notebook_files/
[NbConvertApp] Making directory 01_introduction-IPython-notebook_files
[NbConvertApp] Writing 27538 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running pdflatex 3 times: [u'pdflatex', u'notebook.tex']
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 140631 bytes to 01_introduction-IPython-notebook.pdf

In [15]:
!open 01_introduction-IPython-notebook.pdf  # macOS; on Linux use xdg-open

In [16]:
!jupyter nbconvert --to html 01_introduction-IPython-notebook.ipynb


[NbConvertApp] Converting notebook 01_introduction-IPython-notebook.ipynb to html
[NbConvertApp] Writing 221940 bytes to 01_introduction-IPython-notebook.html
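In current Jupyter releases, the converter is invoked as `jupyter nbconvert --to <format> <notebook>`. To convert every notebook in a directory, you can build one such command per file. The sketch below defaults to a dry run that only prints the commands, since actually running them assumes Jupyter (and LaTeX, for PDF) is installed. The helper names are hypothetical.

```python
import glob
import shlex
import subprocess


def nbconvert_commands(notebooks, to="html"):
    """Build one `jupyter nbconvert --to <format> <notebook>` command per notebook."""
    return [["jupyter", "nbconvert", "--to", to, nb] for nb in notebooks]


def convert_all(pattern="*.ipynb", to="html", dry_run=True):
    """Convert every notebook matching `pattern`. With dry_run=True, only
    print the commands instead of running them."""
    cmds = nbconvert_commands(sorted(glob.glob(pattern)), to=to)
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds


convert_all(to="pdf")
```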