IPython Notebook & Pandas

OpenTechSchool Workshop - Hugo Herter - August 2014

Description

During this workshop, you will learn how to combine the ease of use of IPython Notebook with the power of Pandas for data processing in Python.

The target audience is scientists and other people doing data analysis. No prior knowledge of Python is required, but a basic understanding of programming (variables, functions, ...) is.

IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.

pandas is a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
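To give a first taste of what that means, here is a minimal sketch, assuming a recent pandas release. The column names and numbers are arbitrary placeholder values chosen for illustration: it builds a small table, computes a new column, and sums values per group.

```python
import pandas as pd

# A DataFrame is a table with labelled columns; these values are arbitrary.
df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "value": [1, 2, 3, 4],
})

# Operations apply to a whole column at once, without an explicit loop.
df["doubled"] = df["value"] * 2

# Split the rows by "group" and sum the "value" column within each group.
totals = df.groupby("group")["value"].sum()
print(totals)
```

The second part of the workshop goes much further; this only shows the general style of working with whole columns and groups instead of individual values.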

Resources

This workshop is divided into two parts: the first covers installing everything and getting started with IPython Notebook, and the second is an introduction to Pandas within the Notebook.

These slides/documentation and examples are hosted on GitHub:

You can also have a look at this other OpenTechSchool workshop, which goes into more detail on every step but does not cover Pandas.

Patches are welcome!

* These slides are nothing but an IPython notebook

Installing

You are free to use Python 2 or the newer Python 3, but the folium library, used in the last part of the Pandas document, has not been ported to Python 3 yet and won't work there.
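If you are not sure which version a notebook is running, you can ask the interpreter itself. This small snippet only uses the standard `sys` module and should behave the same on Python 2 and Python 3:

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, releaselevel, serial).
major, minor = sys.version_info[0], sys.version_info[1]
print("This notebook runs Python %d.%d" % (major, minor))
```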

POSIX vs Windows

We will start this workshop by installing Python, IPython Notebook, Pandas and the extra libraries we will be using. The installation procedure is pretty similar on popular Linux distributions and on Mac OS X, but quite different on Windows which is covered in a dedicated section.

Terminal

Some instructions are meant to be run in a terminal on Linux and OS X. They will appear as follows:

some instruction for the terminal

Windows users should not need these commands if they follow the installation method described below.

Install Python

Linux Ubuntu/Debian

Python comes pre-installed on your system, but we will need the package manager pip and some extra files in order to compile the latest version of Pandas:

sudo apt-get install build-essential python-dev python-pip

Mac OS X

Python comes pre-installed on your system, but we will need the package manager pip and some extra files in order to compile the latest version of Pandas:

  • Install Xcode from the Mac App Store
  • Install Homebrew from http://brew.sh
  • Install Homebrew's version of Python: brew install python

Install IPython Notebook and Pandas


pip install --user "ipython[notebook]" pandas

(This compiles a lot of code and can take a while.)

Note: Recent Linux distributions also include pre-compiled versions of these libraries, but they might be out-of-date. Search for ipython-notebook and python-pandas.
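To check that the installation worked, and to see which versions you ended up with (useful if you used your distribution's packages, which may be older), you can run a snippet like this one in Python or in a notebook cell. `installed_version` is a hypothetical helper written for this example, and it assumes the packages expose a `__version__` attribute, as IPython and pandas do:

```python
import importlib

def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Most scientific packages expose __version__; fall back to "unknown".
    return getattr(module, "__version__", "unknown")

for name in ("IPython", "pandas"):
    print("%s: %s" % (name, installed_version(name)))
```

A `None` in the output means the package is not importable from the Python you are running, which often points to pip having installed it for a different Python.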

Install on Windows with Anaconda

Python does not come pre-installed on Windows, and the lack of a decent shell and compiler makes it difficult to install all the dependencies you will need, so the best plan is to install a scientific bundle that already includes most of the tools we need.

http://continuum.io/downloads

Note: You can also use Anaconda on OS X and Linux, but it is not the recommended approach.

Other Libraries

You will be using a few other libraries during this workshop. They are optional but some are pretty cool. You can install them now, or keep going and install them when needed.

  • Requests, for HTTP requests: pip install requests
  • Folium, for maps: pip install folium

You may also want to have a look at this library which is not covered:

  • Vincent, for nice-looking graphs: pip install vincent

Launch IPython Notebook

Linux / Mac OS X

ipython notebook

Windows

Open a command prompt and run the same command:

ipython notebook

IPython Notebook pops up

If your web browser does not automatically open on a page like this one, open it and go to http://localhost:8888.

Create a New Notebook

Give it a shot!

Type some Python code in the cell, then click Run Cell in the menu or press Shift-Enter.

The notebook shows everything that is printed, as well as the value of the last expression in the cell, unless that value is None.


In [6]:
print("Hello everybody !")
print("I hope you are doing fine.")


Hello everybody !
I hope you are doing fine.

In [7]:
r = list(range(0, 100, 6))  # list() gives the same output on Python 2 and 3

In [20]:
r


Out[20]:
[0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96]
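The rule for what appears after Out[...] can be modelled with a short sketch. This is not how IPython is implemented internally, only a simplified model built on the standard `ast` module (it needs Python 3.8 or later because of the `type_ignores` argument): run every statement of the cell, and if the last one is an expression, evaluate it and return its value. `run_cell` is a hypothetical name.

```python
import ast

def run_cell(source, namespace=None):
    """Run `source` like a notebook cell; return what would appear in Out[...].

    A return value of None means the cell would show no Out[...] line.
    """
    if namespace is None:
        namespace = {}
    tree = ast.parse(source, mode="exec")
    last = tree.body[-1] if tree.body else None
    if isinstance(last, ast.Expr):
        # Execute every statement except the final expression...
        head = ast.Module(body=tree.body[:-1], type_ignores=[])
        exec(compile(head, "<cell>", "exec"), namespace)
        # ...then evaluate the final expression to obtain its value.
        tail = ast.Expression(body=last.value)
        return eval(compile(tail, "<cell>", "eval"), namespace)
    # The cell ends with a statement (e.g. an assignment): nothing to display.
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None

print(run_cell("r = list(range(0, 100, 6))\nr"))
```

For instance, `run_cell('print("Hi")')` prints `Hi` but returns `None`, because `print` itself returns `None`; this is why a cell ending with a print call shows printed text but no Out[...] line.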

Mix code with notes and explanations

Change the cell type to Markdown in the toolbar menu, and write your notes around the code.

Advanced content

You can display anything you want in a notebook, as long as it can be rendered as HTML. Below are two basic examples.
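Your own objects can take part in this too, through IPython's rich display protocol: when the value shown at the end of a cell has a `_repr_html_` method, the notebook renders the HTML string it returns instead of the plain text `repr`. The `Swatch` class below is a hypothetical example of this, a sketch rather than anything provided by IPython:

```python
class Swatch(object):
    """A colour sample that a notebook renders as a coloured box."""

    def __init__(self, color):
        self.color = color

    def __repr__(self):
        # Used by plain Python shells, and as a fallback in the notebook.
        return "Swatch(%r)" % self.color

    def _repr_html_(self):
        # The notebook calls this and renders the returned string as HTML.
        return ('<div style="width:60px;height:20px;background:%s"></div>'
                % self.color)

# As the last line of a notebook cell, this displays an orange box.
Swatch("orange")
```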

Images

Whether you want to include photos or visualize some 2D output, images are always useful. Note that the image file must be present on your hard drive and referenced by an absolute or relative file path.


In [45]:
from IPython.display import Image
# Display an image file from disk, scaled to a height of 100 pixels
Image(filename='files/ots-logo.png', height=100)


Out[45]: [image: files/ots-logo.png]
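Relative paths are resolved against the kernel's current working directory, which is usually the directory containing the notebook. If an image does not show up, a quick check like the one below tells you where Python is actually looking. `resolve_image` is a hypothetical helper that only uses the standard `os` module:

```python
import os

def resolve_image(path):
    """Return the absolute path of `path`, raising IOError if the file is missing."""
    # Relative paths are interpreted from the current working directory.
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise IOError("Image not found: %s" % full_path)
    return full_path

# The directory that relative paths such as 'files/ots-logo.png' start from.
print(os.getcwd())
```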

Arbitrary HTML

If you can't find what you need in IPython or another library for displaying your data, you can always fall back to inserting raw HTML in your notebook.

Here is arbitrary HTML that includes another website within the page using an iframe element.


In [59]:
from IPython.display import HTML
# Embed the pandas website in the output area using an iframe
HTML("<iframe src='http://pandas.pydata.org/' width='880px' height='400px'></iframe>")


Out[59]: [embedded page: http://pandas.pydata.org/]
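When the HTML is assembled from values rather than typed by hand, it is safer to escape those values first, so that quotes or angle brackets in a URL cannot break the markup. Here is a minimal sketch using the standard `html` module (Python 3 only); `iframe_html` is a hypothetical helper, and its result can be passed to `HTML(...)` as in the cell above:

```python
from html import escape

def iframe_html(url, width="880px", height="400px"):
    """Build an <iframe> tag with every attribute value escaped."""
    # quote=True also escapes ' and ", so values cannot close the attribute early.
    return "<iframe src='%s' width='%s' height='%s'></iframe>" % (
        escape(url, quote=True),
        escape(width, quote=True),
        escape(height, quote=True),
    )

snippet = iframe_html("http://pandas.pydata.org/")
print(snippet)
# In a notebook cell: HTML(snippet)
```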

Next: Numeric and Pandas

You now have IPython Notebook working and have run your first code in it.

It is time to move on to some data processing!