Matplotlib

A Python 2D plotting library

Andreas Linz, Ying-Chi Lin and Huan Meng.

This presentation refers to the latest stable matplotlib version 1.4.2.


Outline:

- What is matplotlib?
- How to install matplotlib?
- Setting up matplotlib for IPython Notebook
- Pyplot tutorial

What is matplotlib?

  • open-source Python library for 2D plotting
  • API inspired by MATLAB
  • PSF-like license (Python Software Foundation)
  • publication-quality figures
  • makes heavy use of NumPy and other extension code
  • can be used in Python scripts, IPython notebooks, etc.
  • backends for generating vector or pixel graphics, PDF, PostScript, etc.
  • some of the available diagram types: line plots, histograms, bar charts, scatter plots ...
  • matplotlib is divided into three main parts:
    1. the pylab interface: a set of functions, provided by matplotlib.pylab, for creating MATLAB-like figures
    2. the matplotlib frontend or matplotlib API: the classes that create and manage figures, text, lines, plots, etc.
    3. the matplotlib backends: renderers for the different output formats and device-dependent drawing devices
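The three layers can be seen in a few lines of code. The following is a minimal sketch, assuming a headless environment: it picks the non-interactive Agg backend (a pixel renderer), draws one figure through the MATLAB-like pyplot functions (the same functions pylab bundles together with NumPy) and one through the object-oriented API, and lets the backend write both to PNG files in a temporary directory. The file names are arbitrary placeholders.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')            # backend: render to pixel images, no window needed
import matplotlib.pyplot as plt  # MATLAB-like interface built on the matplotlib API

outdir = tempfile.mkdtemp()      # placeholder output directory

# 1. MATLAB-like style: functions act on the "current" figure and axes
plt.figure()
plt.plot([1, 2, 3], [1, 4, 9], 'b-')
plt.title('pyplot style')
pyplot_file = os.path.join(outdir, 'pyplot_style.png')
plt.savefig(pyplot_file)
plt.close()

# 2. matplotlib API: call methods on explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], 'b-')
ax.set_title('object-oriented style')
oo_file = os.path.join(outdir, 'oo_style.png')
fig.savefig(oo_file)             # 3. the backend renders the figure to a PNG file
plt.close(fig)

print(matplotlib.get_backend(), os.path.exists(pyplot_file), os.path.exists(oo_file))
```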

How to install matplotlib?

First, make sure you have pip installed (or use one of the pre-built packages described below)!

  • pip works on Unix/Linux, OS X, and Windows.
  • pip comes with Python 2.7.9+ and 3.4+ (no need to install it separately)

Source install from git

  • If you are interested in contributing to matplotlib development, running the latest source code, or just like to build everything yourself, it is not difficult to build matplotlib from source:
git clone git@github.com:matplotlib/matplotlib.git
cd matplotlib
python setup.py build
python setup.py install

OSX

  • open a new Terminal window and run:
curl -O https://bootstrap.pypa.io/get-pip.py
python get-pip.py
pip install matplotlib
  • if you also want to install IPython with the dependencies needed for IPython notebooks:
pip install ipython[notebook]

Linux

  • use the package manager of your choice:
    • Fedora / Redhat:
      • sudo yum install python-matplotlib
    • Debian / Ubuntu:
      • sudo apt-get install python-matplotlib
    • Arch:
      • sudo pacman -S python-matplotlib
  • package names vary between distributions: the matplotlib package is usually called something like python-matplotlib, and pip something like python-pip

Windows

  • either use one of the SciPy-stack compatible Python distributions such as Python(x,y), Enthought Canopy, or Continuum Anaconda,
  • or install the latest Python release (Python 3.4+ includes pip) and add pip to your PATH by adding the Scripts\ subdirectory of your Python installation to the Path environment variable:
    • Start → Control Panel → System → Advanced → Environment Variables ...
    • under System Variables, append the path of the Scripts directory to Path; the default is C:\Python34\Scripts

install matplotlib

  • side note: if you haven't installed IPython yet, you can install it via pip install ipython[notebook] (this also installs the dependencies for IPython notebooks)
  • pip install matplotlib
  • or use the precompiled matplotlib installers (recommended)

check your installation

  • python3 -c 'import matplotlib; print(matplotlib.__version__, matplotlib.__file__)'
  • python2.7 -c 'import matplotlib; print(matplotlib.__version__, matplotlib.__file__)'
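The same check can also be run from inside Python or a notebook cell. This small sketch additionally prints the NumPy version (matplotlib depends on it), the backend matplotlib would use, and the matplotlibrc configuration file that is in effect; which values you see depends entirely on your installation.

```python
import matplotlib
import numpy

# installed matplotlib version and the location it was imported from
print(matplotlib.__version__, matplotlib.__file__)
# matplotlib needs NumPy; print its version as well
print(numpy.__version__)
# backend that will be used to draw figures in this environment
print(matplotlib.get_backend())
# configuration file (matplotlibrc) whose settings are in effect
print(matplotlib.matplotlib_fname())
```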

Setting up matplotlib for IPython Notebook

Enable matplotlib

  • start IPython with the --matplotlib switch: ipython notebook --matplotlib
  • or use the magic command in the first cell of your notebook: %matplotlib inline
    • inline: show plots inside the notebook

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt

A small test example to see if matplotlib is working properly in your notebook:


In [ ]:
plt.plot([2,4,6,9],[1,4,9,16], 'ro')
plt.axis([0, 10, 0, 20])
plt.show()

Pyplot tutorial

What is pyplot?

  • matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB
  • each pyplot function makes some change to a figure: e.g. it creates a figure, creates a plotting area in a figure, decorates the plot with labels, etc.
  • for detailed descriptions of all pyplot functions, see the pyplot API documentation
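The "each function changes a figure" idea rests on a global current figure and current Axes that pyplot keeps track of. A minimal sketch (using the Agg backend so it runs without a display) that makes this state visible with plt.gcf() and plt.gca():

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.figure()                     # creates a new figure and makes it the current one
fig = plt.gcf()                  # gcf() = get current figure
plt.plot([1, 2, 3])              # no Axes yet, so pyplot creates one in the current figure
ax = plt.gca()                   # gca() = get current axes
plt.title('changed via pyplot')  # acts on the current axes

print(len(fig.axes), ax.get_title())
plt.close(fig)
```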

Parts of a Figure

Figure

  • can contain several Axes (= plots)
  • to create a new figure use

In [ ]:
fig = plt.figure()  # create an empty figure with no axes
fig, ax = plt.subplots(2, 2)  # a figure with a 2x2 grid of Axes

The function subplots() returns a tuple (fig, ax):

  • fig is the matplotlib.figure.Figure object
  • ax is a single Axes object if only one subplot is created, otherwise a NumPy array of Axes objects
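Because the type of ax depends on the grid size, it helps to check what subplots() actually returns. A short sketch, assuming the Agg backend; squeeze=False is an existing subplots() parameter that forces a 2-D array even for a single subplot:

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig, ax = plt.subplots()                        # one subplot: ax is a single Axes object
fig2, axes = plt.subplots(2, 2)                 # 2x2 grid: a 2x2 NumPy array of Axes
fig3, row = plt.subplots(1, 3)                  # one row: the array is squeezed to shape (3,)
fig4, grid = plt.subplots(1, 1, squeeze=False)  # squeeze=False always gives a 2-D array

axes[0, 1].plot([1, 2, 3])                      # draw into the top-right Axes of the grid
print(isinstance(ax, Axes), axes.shape, row.shape, grid.shape)
plt.close('all')
```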

How to draw a simple plot (Axes)

Function: plot(*args, **kwargs)

  • for plotting lines and/or markers to the Axes
  • args can be a single list of y values, an x, y pair, or multiple x, y pairs, each optionally followed by a format string

In [ ]:
# define data as lists
x1 = [2,4,6,8]
y1 = [5.5,7,2,4]
x2 = [4,6,8,10,12,14]
y2 = [-12,-14,-7,-12,-3.3,-1]
#let's draw the first line
# if no x-value is given an incrementing index, beginning from zero, will be used
plt.plot(y1)

#draw x, y plot
#plt.plot(x1,y1)

#draw two lines in one plot
#plt.plot(x1,y1,x2,y2)

plt.show()
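plot() also returns the lines it created, one Line2D object per x, y pair, which makes the rules above easy to check. A minimal sketch with the same data, assuming the Agg backend:

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

x1 = [2, 4, 6, 8]
y1 = [5.5, 7, 2, 4]
x2 = [4, 6, 8, 10, 12, 14]
y2 = [-12, -14, -7, -12, -3.3, -1]

# only y values: matplotlib uses the indices 0, 1, 2, ... as x values
only_y = plt.plot(y1)
# two x, y pairs in one call: one Line2D object per pair
pairs = plt.plot(x1, y1, x2, y2)

print(list(only_y[0].get_xdata()))  # x values generated from the index
print(len(pairs))                   # one line per x, y pair
plt.close('all')
```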

Change line/marker style

  • fmt: an optional format string after each x, y pair that specifies the formatting of the plot. It combines abbreviations ('keys') for line style, marker style and color, e.g. 'ro-' means red circles connected by a solid line.

In [ ]:
#-g : line in green 
plt.plot(y1,'-g')

#ro- : red circle with line
plt.plot(x1,y1,'ro-')

#g* : star symbols in green
plt.plot(x1,y1,'ro-',x2,y2,'g*')
plt.show()
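To see that a format string is only a shorthand, the sketch below (Agg backend assumed) draws the same line once with 'ro-' and once with the equivalent Line2D keyword arguments, then reads the resulting properties back:

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

x1 = [2, 4, 6, 8]
y1 = [5.5, 7, 2, 4]

# format string: 'r' = red, 'o' = circle markers, '-' = solid line
fmt_line, = plt.plot(x1, y1, 'ro-')
# the same styling spelled out as Line2D keyword arguments
kw_line, = plt.plot(x1, y1, color='r', marker='o', linestyle='-')

for line in (fmt_line, kw_line):
    print(line.get_color(), line.get_marker(), line.get_linestyle())
plt.close('all')
```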
  • Instead of the abbreviations in a format string, the formatting can also be given as keyword arguments (kwargs) that set Line2D properties.
  • axis(): define the range of the axes [xmin, xmax, ymin, ymax]
  • legend(): show a legend with the labels given via label=

In [ ]:
plt.plot(x2, y2, color='green', linestyle='dashed', marker='o',
     markerfacecolor='blue', markersize=12,label='line1',linewidth=3)
plt.axis([2, 16, -16, 2])
plt.legend()
plt.show()
  • setp() ("set property") can also be used to set the line style, for a single line or a whole list of lines

In [ ]:
lines = plt.plot(x1, y1, x2, y2)
# use keyword args
plt.setp(lines, color='r', linewidth=2.0) 
# or MATLAB style string value pairs
plt.setp(lines, 'color', 'r', 'linewidth', 2.0)
plt.show()

Adding axes title, axis label and line legend

For the available text properties (horizontalalignment, rotation, etc.), see the text properties section of the matplotlib documentation.


In [ ]:
line1 = plt.plot(x1, y1)
line2 = plt.plot(x2, y2)
plt.setp(line1, color='r', linewidth=2.0, label='mouse')
plt.setp(line2, color='b', linewidth=2.0, label='pig')
plt.title('test results', color='g',fontsize=18)
plt.xlabel('days',fontsize=16,style='italic')
plt.ylabel('value',fontsize=16,style='italic', rotation='horizontal')
plt.legend() # to show legend
plt.show()
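The same figure can be built with the object-oriented API, where each pyplot call has a corresponding Axes method (title → set_title, xlabel → set_xlabel, and so on), and label= can be passed directly to plot(). A sketch assuming the Agg backend:

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

x1, y1 = [2, 4, 6, 8], [5.5, 7, 2, 4]
x2, y2 = [4, 6, 8, 10, 12, 14], [-12, -14, -7, -12, -3.3, -1]

fig, ax = plt.subplots()
# label= can be passed directly to plot() instead of via setp()
ax.plot(x1, y1, color='r', linewidth=2.0, label='mouse')
ax.plot(x2, y2, color='b', linewidth=2.0, label='pig')
# Axes methods corresponding to plt.title / xlabel / ylabel / legend
ax.set_title('test results', color='g', fontsize=18)
ax.set_xlabel('days', fontsize=16, style='italic')
ax.set_ylabel('value', fontsize=16, style='italic', rotation='horizontal')
legend = ax.legend()

print(ax.get_title(), ax.get_xlabel(), [t.get_text() for t in legend.get_texts()])
plt.close(fig)
```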

Many other types of graphs are possible; see the matplotlib documentation for more examples.

Plot functions


In [ ]:
import numpy as np
import math

# numpy.linspace (http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)
# returns an array of evenly spaced numbers over the given interval (50 numbers by default)
x1 = np.linspace(0.0, 5.0)
x2 = np.linspace(0.0, 1.0, 1000)

# f(x) = cos(2*pi*x) * e^(-x)
y1 = np.cos(2 * np.pi * x1) * np.exp(-x1)

# second plot without numpy array for the y-axis
def f(x):
    return math.sin(x**x)

y2 = [ f(x*math.pi) for x in x2 ]

plt.suptitle('Two subplots')  # title of the whole figure; plt.title would only title a single Axes
plt.subplot(2, 1, 1)
plt.plot(x1, y1, 'yo-')
plt.ylabel(r'$\cos(2\pi x) \cdot e^{-x}$', fontsize=16)

plt.subplot(2, 1, 2)
plt.plot(x2, y2, 'r.-')
plt.xlabel('x')
# y2 = f(x*pi) = sin((pi*x)^(pi*x))
plt.ylabel(r'$\sin((\pi x)^{\pi x})$', fontsize=16)

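# change the plot size via the global rcParams defaults; the new size only applies
# to figures created afterwards, so it shows up when the cell is run again (global state!)
from matplotlib import rcParams
xSize, ySize = rcParams['figure.figsize']  # the previous default size
rcParams['figure.figsize'] = (16, 9)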
plt.show()

In [ ]:
# reset global size
rcParams['figure.figsize'] = (6.0, 4.0)
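Changing rcParams affects every figure created later. An alternative sketch (Agg backend assumed) sets the size for a single figure via the figsize argument of subplots(), uses fig.suptitle() for the overall title, and replaces the list comprehension with NumPy's vectorised np.sin, which yields the same values as f(x*math.pi):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

x1 = np.linspace(0.0, 5.0)                  # 50 points by default
x2 = np.linspace(0.0, 1.0, 1000)
y1 = np.cos(2 * np.pi * x1) * np.exp(-x1)
y2 = np.sin((np.pi * x2) ** (np.pi * x2))   # vectorised version of f(x*math.pi)

# figsize applies only to this figure; the global rcParams stay untouched
fig, (top, bottom) = plt.subplots(2, 1, figsize=(16, 9))
fig.suptitle('Two subplots')
top.plot(x1, y1, 'yo-')
top.set_ylabel(r'$\cos(2\pi x) \cdot e^{-x}$', fontsize=16)
bottom.plot(x2, y2, 'r.-')
bottom.set_xlabel('x')
bottom.set_ylabel(r'$\sin((\pi x)^{\pi x})$', fontsize=16)

print(fig.get_size_inches())
plt.close(fig)
```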

Using external datasets

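  • plotting functions work on np.array or np.ma.masked_array input (where np is numpy); plain Python lists, as used above, are converted automatically, but other "array-like" objects (e.g. pandas data structures) are best converted to np.array before plotting
  • example data source: Berlin OpenData (their data is not UTF-8 encoded!)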
    • I've used the first dataset that I've found on daten.berlin.de, so the following diagram is only for demonstration purposes and nothing else!
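Masked arrays are useful for missing or invalid measurements: masked entries are skipped when drawing, so a line plot shows a gap instead of a spike. A minimal sketch with made-up values, where -999 is a hypothetical placeholder for "no data" (Agg backend assumed):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

x = np.arange(6)
# hypothetical data; -999 marks a missing value
y = np.array([1.0, 2.0, -999.0, 4.0, 5.0, 6.0])

# mask the invalid entry instead of deleting it, so x and y keep the same length
y_masked = np.ma.masked_equal(y, -999.0)
line, = plt.plot(x, y_masked, 'o-')  # the masked point is not drawn, leaving a gap

print(y_masked.mask.sum(), y_masked.mean())
plt.close('all')
```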

In [ ]:
# dataset
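# dataset
datasetpath = 'data/EWR_Ortsteile_2012-12-31.csv'
# the file is not UTF-8 encoded; Latin-1 is an assumption, adjust it if needed
with open(datasetpath, encoding='latin-1') as f:
    data = f.readlines()

# print the first 10 lines (each line already ends with a newline)
for line in data[:10]:
    print(line, end='')
print('...')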

In [ ]:
# distribution of age and population for Berlin districts divided by foreigners and locals
import csv
from statistics import mean
import numpy as np
from collections import defaultdict

def clean(age, count):
    if 'und' in age:
        # fix inconsistent values
        agerange = '95-100+'
    else:
        ages = age.split('_')
        agerange = '{}-{}'.format(ages[0], ages[1])
    return agerange, int(count.split(',')[0])

data = { 
'total': {
    'foreigner': defaultdict(list),
    'local': defaultdict(list)
    }
}
with open(datasetpath, encoding='latin-1') as f:  # not UTF-8; Latin-1 is an assumption
    # automatically uses csv header to generate keys
    # regardless of their name, csv files are not always delimited by `,`
    csvdata = csv.DictReader(f, delimiter=';') 
    # csvdata is an iterator over dicts, one dictionary for each line of data
    for dict_row in csvdata:
        # collect values for each district
        district = dict_row['Bez-Name']
        # maybe I should use defaultdicts ...
        if district not in data:
            data[district] = {
                'foreigner': {},
                'local': {}
            }
        citizenship = dict_row['Staatsangeh']
        # clean up inconsistent values
        age, cnt = clean(dict_row['Altersgr'], dict_row['Häufigkeit'])
        if citizenship == 'A':
            # totals could be accumulated by a separate function, but it's more convenient this way
            data['total']['foreigner'][age].append(cnt)
            data[district]['foreigner'][age] = cnt
        elif citizenship == 'D':
            data['total']['local'][age].append(cnt)
            data[district]['local'][age] = cnt

# calculate, for each age group, the mean number of foreigners and locals over all districts
foreigners = {
    'age': [],   # x-axis
    'means': []  # y-axis
}

# avoid the name `locals`, which would shadow the built-in locals()
natives = {
    'age': [],
    'means': []
}

def age_key(item):
    # sort age groups by their numeric lower bound ('5-10' before '10-15'),
    # which plain string sorting would get wrong
    return int(item[0].split('-')[0])

# iterate over the sorted keys and add the values in order to the list
for age, number in sorted(data['total']['foreigner'].items(), key=age_key):
    # number is a list containing the number of foreigners/locals of the specific age for each district
    foreigners['age'].append(age)
    # add the arithmetic mean over all numbers of foreigners/locals for that age in each district
    foreigners['means'].append(mean(number))
for age, number in sorted(data['total']['local'].items(), key=age_key):
    natives['age'].append(age)
    natives['means'].append(mean(number))

# import pprint
# pp = pprint.PrettyPrinter()
# pp.pprint(natives)

# numpy.arange returns evenly spaced values within the given interval, e.g. arange(3) returns array([0, 1, 2])
# this will be used to set the x-coordinate of each bar in the plot
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html
indices = np.arange(len(foreigners['means']))
# http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar
# plt.bar("x-coordinates of the bars", "bar heights", ...)
# align='center' centres each bar on its index so that the tick labels line up
fplt = plt.bar(indices, foreigners['means'], color='r', align='center')
# bottom: y-coordinate where each bar starts, here on top of the foreigner bar (stacked bars)
lplt = plt.bar(indices, natives['means'], color='y', bottom=foreigners['means'], align='center')
# http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xticks
plt.xticks(indices, natives['age'], rotation='vertical')
# labels
plt.xlabel('age groups')
plt.ylabel('citizens')
# prints the legend
# http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.legend
plt.legend((fplt, lplt), ('foreigners', 'locals'))
# show the plot
plt.show()
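One pitfall when ordering age groups: sorted() on labels like '5-10' and '10-15' compares them as strings, so '10-15' comes before '5-10'. The sketch below uses hypothetical toy numbers (not taken from the Berlin dataset), sorts the labels by their numeric lower bound, and draws the same kind of stacked bar chart with bottom= and align='center' (Agg backend assumed):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# hypothetical mean counts per age group, keyed like the labels produced by clean()
foreigner_means = {'0-5': 120, '10-15': 90, '5-10': 100, '95-100+': 5}
local_means = {'0-5': 800, '10-15': 750, '5-10': 780, '95-100+': 60}

def lower_bound(agerange):
    """Numeric start of an age range such as '10-15' or '95-100+'."""
    return int(agerange.split('-')[0])

print(sorted(foreigner_means))                   # plain string order
ages = sorted(foreigner_means, key=lower_bound)  # numeric order
print(ages)

indices = np.arange(len(ages))
f_heights = [foreigner_means[a] for a in ages]
l_heights = [local_means[a] for a in ages]

fplt = plt.bar(indices, f_heights, color='r', align='center')
# bottom= starts each local bar on top of the foreigner bar: a stacked bar chart
lplt = plt.bar(indices, l_heights, color='y', bottom=f_heights, align='center')
plt.xticks(indices, ages, rotation='vertical')
plt.legend((fplt, lplt), ('foreigners', 'locals'))
plt.close('all')
```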

Fin!

PS

If you are interested in more examples, check this out.

