OpenTechSchool Workshop - Hugo Herter - August 2014
During this workshop, you will learn how to combine the ease of use of IPython Notebook with the power of Pandas for data processing in Python.
The target audience is scientists and other people doing data analysis. No prior knowledge of Python is required, but a basic understanding of programming (variables, functions, ...) is expected.
IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.
pandas is a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
This workshop is divided into two parts: the first part covers installing everything and getting started with IPython Notebook, and the second part is an introduction to Pandas within the Notebook.
These slides/documentation and examples are hosted on GitHub:
You can also have a look at this other workshop from OpenTechSchool, which goes into more detail on every step but does not cover Pandas.
Patches are welcome!
* These slides are nothing but an IPython notebook
You are free to use Python 2 or the newer Python 3, but the library folium, used in the last part of the Pandas document, has not been ported to Python 3 yet and won't work there.
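Since the choice of interpreter matters for the folium part, it can help to check which one is actually running. A minimal sketch using only the standard library; the helper name `describe_python` is made up for illustration:

```python
import sys


def describe_python():
    """Return a short label for the running interpreter, e.g. 'Python 2.7'."""
    major, minor = sys.version_info[0], sys.version_info[1]
    return "Python {0}.{1}".format(major, minor)


print(describe_python())
if sys.version_info[0] >= 3:
    # Reflects the statement in this workshop text, not a check of folium itself.
    print("Note: this workshop states that folium is not ported to Python 3.")
```

Run it in a Notebook cell or in a plain Python prompt; both report the interpreter behind that session.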
We will start this workshop by installing Python, IPython Notebook, Pandas and the extra libraries we will be using. The installation procedure is pretty similar on popular Linux distributions and on Mac OS X, but quite different on Windows, which is covered in a dedicated section.
Some instructions are meant to be run in the terminal on Linux and OS X. These will appear as follows:
some instruction for the terminal
Windows users should not need these commands if they follow the installation method described further below.
Python comes pre-installed on your system, but we will want the package manager pip, and some extra files in order to compile the latest version of Pandas:
apt-get install build-essential python-dev
Python comes pre-installed on your system, but we will want the package manager pip, and some extra files in order to compile the latest version of Pandas:
brew install python
Install IPython Notebook and Pandas:
pip install --user ipython[notebook] pandas
(compiles a bunch of stuff...)
Note: Recent Linux distributions also include pre-compiled versions of these libraries, but they might be out-of-date. Search for ipython-notebook and python-pandas.
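Once the installation has finished, one way to confirm that the libraries are importable is a short check like the one below. This is only a sketch: the function name `check_modules` is hypothetical, and the module names `IPython` and `pandas` are the import names of the packages from the pip command above.

```python
import importlib

# Import names of the packages installed with pip above.
MODULES = ["IPython", "pandas"]


def check_modules(names):
    """Map each module name to its version string, or None if the import fails."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            results[name] = getattr(module, "__version__", "unknown")
    return results


for name, version in sorted(check_modules(MODULES).items()):
    print("{0}: {1}".format(name, version if version else "NOT INSTALLED"))
```

A line reading `NOT INSTALLED` suggests the pip step did not complete for that package, or that it was installed for a different Python interpreter.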
Python does not come pre-installed on your system, and the absence of a decent shell and compiler makes it difficult to install all the dependencies you will need. The best plan is to install a scientific bundle that ships with most of the tools we need.
Note: You can also use Anaconda on OS X and Linux, but it is not the recommended approach.
You will be using a few other libraries during this workshop. They are optional but some are pretty cool. You can install them now, or keep going and install them when needed.
You may also want to have a look at this library which is not covered:
pip install vincent
If your web browser does not automatically open on a page like this one, open it and go to http://localhost:8888.
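If the page does not load, it may be worth checking whether anything is listening on that port at all. The following is a rough sketch using only the standard library; `notebook_is_listening` is a made-up helper name, and the host and port are taken from the URL above.

```python
import socket


def notebook_is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        return False
    conn.close()
    return True


if notebook_is_listening():
    print("A server answers on http://localhost:8888")
else:
    print("Nothing is listening on port 8888; is the notebook server running?")
```

This only tells you that some program accepts connections on the port; it does not verify that the program is the Notebook server.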
In [6]:
print("Hello everybody !")
print("I hope you are doing fine.")
In [7]:
r = range(0, 100, 6)
In [20]:
r
Out[20]:
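What the cell above displays depends on the interpreter: in Python 2, `range` returns a list, while in Python 3 it returns a lazy `range` object. A small sketch that makes the values explicit on either version:

```python
r = range(0, 100, 6)  # start at 0, stop before 100, step by 6

values = list(r)  # copies the list on Python 2, materialises the range on Python 3
print(values[:4])   # first values: [0, 6, 12, 18]
print(values[-1])   # last value below 100: 96
print(len(values))  # 17 elements in total
```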
You can display anything you want in a Notebook, as long as it is HTML. Below are two basic examples.
Whether you want to include some photos or to visualize some 2D output, images are always useful. Note that the image must be present on your hard drive and can be referenced by either an absolute or a relative file path.
In [45]:
from IPython.display import Image  # public import path for the display classes
Image(filename='files/ots-logo.png', height=100)
Out[45]:
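A missing file is a common reason for an image not showing up. Relative paths are resolved against the current working directory of the Python session, which for a notebook is typically the directory containing it. The sketch below checks the path before it is handed to `Image`; the helper `resolve_image_path` is hypothetical, not part of IPython.

```python
import os


def resolve_image_path(path):
    """Return the absolute form of path, or raise IOError if no such file exists."""
    absolute = os.path.abspath(path)  # relative paths resolve against os.getcwd()
    if not os.path.isfile(absolute):
        raise IOError("Image not found: {0}".format(absolute))
    return absolute


try:
    print(resolve_image_path("files/ots-logo.png"))
except IOError as error:
    print(error)
```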
If you don't find what you need in IPython or another library for displaying your data, you can always fall back to inserting raw HTML in your notebook.
Here is arbitrary HTML that includes another website within the page using an iframe element.
In [59]:
from IPython.display import HTML  # public import path for the display classes
HTML("<iframe src='http://pandas.pydata.org/' width='880px' height='400px'></iframe>")
Out[59]:
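Raw HTML can also come from your own objects: the Notebook's rich display looks for a `_repr_html_` method and renders the string it returns, which is also how Pandas tables end up displayed as HTML. A minimal sketch, where the class `HtmlTable` is a made-up illustration and not part of IPython or this workshop:

```python
try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape  # Python 2


class HtmlTable(object):
    """Hypothetical helper: renders rows of values as an HTML table."""

    def __init__(self, rows):
        self.rows = rows

    def _repr_html_(self):
        # The Notebook calls this when the object is the last expression in a cell.
        body = "".join(
            "<tr>" + "".join("<td>{0}</td>".format(escape(str(cell))) for cell in row) + "</tr>"
            for row in self.rows
        )
        return "<table>" + body + "</table>"


table = HtmlTable([["a", 1], ["b", 2]])
print(table._repr_html_())
```

In a Notebook, ending a cell with `table` would show the rendered table instead of the raw markup printed here.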