Scientific Python: part 2

Plotting with matplotlib

Software Carpentry Bootcamp
eResearch NZ 2013

Prepared by: Ariel Rokem

Thanks to: Justin Kitzes, Paul Ivanov

1. Getting Started

1.1 What is matplotlib?

Matplotlib is the most popular and mature library for plotting data using Python. It has all of the functionality you would expect, including the ability to control the formatting of plots and figures at a very fine level.

The official matplotlib documentation is at http://matplotlib.org/
The matplotlib gallery is at http://matplotlib.org/gallery.html

1.2 Importing matplotlib

Matplotlib is often used through 'pyplot', which provides a high-level interface for plotting.


In [ ]:
# In IPython or the IPython notebook, it's easiest to use the pylab magic, which
# imports numpy (as np) and matplotlib's pyplot (as plt).

# The inline flag means that images will be shown here in the notebooks, rather
# than in pop-up windows.

%pylab inline

# If you are using 'regular' Python, however, you'll want the following. You'll
# need to also separately import numpy and any other packages that you might need.

#import matplotlib.pyplot as plt

2. Creating Figures

There are two major challenges in creating figures. The first is understanding the syntax needed to make the basic plot appear. The second is formatting the basic plot to look exactly how you would like it to look. In general, the formatting will probably take you longer...

Within pyplot (currently imported as 'plt'), there are two basic ways to go about making plots: the MATLAB-style interface and the object-oriented approach. The latter provides better control over plot features while requiring only slightly more typing. It's easy to outgrow the MATLAB-style interface quickly, so we'll go right to the object-oriented syntax.
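
As a rough illustration of the difference, the sketch below draws the same sine curve twice: once through pyplot functions that act on an implicit "current" figure, and once through explicit Figure and Axes objects. The data and the variable names (implicit_ax, fig, ax) are made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi)

# MATLAB-style: pyplot functions act on an implicit "current" figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.xlabel('x')
implicit_ax = plt.gca()  # the axes that the calls above drew on

# Object-oriented: keep explicit references and call methods on them
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlabel('x')
```

Both halves produce the same plot. The explicit version becomes more convenient once a figure has several subplots, because each call states which axes it modifies.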

2.1 A first plot

In simple matplotlib plotting, there are two concepts to distinguish:

  • Figure - the entire figure, like what you might see in a journal, including all subplots, axes, lines, labels, etc. The whole enchilada.

  • Subplot/Axes - one of the sub-sections of the figure, labeled (a), (b), etc. in articles. Each subplot contains one Axes object, which is the container that houses all of the useful stuff, such as the actual lines, legends, labels, etc.
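
The relationship between the two kinds of object can be inspected directly. The following is a small sketch with made-up variable names (ax_a, ax_b) and made-up data, not part of the lesson's own code:

```python
import matplotlib.pyplot as plt

# One figure containing two subplots side by side
fig, (ax_a, ax_b) = plt.subplots(1, 2)

print(type(fig).__name__)   # the class of the figure object
print(len(fig.axes))        # how many Axes the figure contains
print(ax_a.figure is fig)   # each Axes knows which Figure it belongs to

# Lines, labels, and legends are added through an Axes, not through the Figure
line, = ax_a.plot([0, 1, 2], [0, 1, 4], label='demo')
print(line in ax_a.lines)   # the new line is stored on that Axes
```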

For example, here's how to make one figure with two subplots, the second of which contains two lines.


In [ ]:
# Make some data to plot
x = np.linspace(0, 2*np.pi)
y1 = np.sin(x)
y2 = np.cos(x)

# First, create an empty figure with 2 subplots
# - The arguments (1, 2) indicate 1 row and 2 cols
# - The function plt.subplots returns an object for the figure and for each axes
# - There are multiple ways to accomplish this same goal, but this is probably the
#   simplest - notice that each subplot is associated with one of the axes objects.
fig, (ax1, ax2) = plt.subplots(1, 2)

# Next, put one line on the first axes and both lines on the second axes
# - On the second axes, add a legend to distinguish the two lines
ax1.plot(x, y1)

ax2.plot(x, y1, label='sin')  # The label is the text shown in the legend
ax2.plot(x, y2, label='cos')
ax2.legend()

# Finally, save the figure as a png file
fig.savefig('myfig.png')  # 'myfig.png' is just an example filename

Exercise 1 - Simple formatting

There are lots of formatting options to play with. Modify the code above to make some changes to the formatting of these plots.

First, make some changes to the axes. HINT: These adjustments are methods of the ax1 and ax2 objects, and (conveniently) they all start with the text 'set'. Try typing 'ax1.set' and hitting tab to see some options.

  • Change the x axis on ax1 to run from 0 to 4. (HINT: set_xlim)
  • Add labels to the x and y axes on both subplots (HINT: set_xlabel, set_ylabel)

Second, make some changes to the lines that you plotted using ax1.plot(...). These changes can be made by looking at the various arguments that you can give to the plot method. You can do this easily by typing ax1.plot? in the cell below and running it - this will give you pop-up help for the plot method.

  • Make the sine line on ax1 red and dashed.
  • Put a circular black marker on top of the cos line on ax2. Make it really big.

Bonus: Eliminate the box around the legend on the second subplot.


In [ ]:
#ax1.plot?

2.2 Other types of plots

In the example above, we used the plot method to make line plots. There are also methods to make scatter plots, barplots, histograms, loglog plots, semilog plots, etc.


In [ ]:
# Make some data to plot
x = np.arange(0, 100)
y = np.random.rand(100)  # 100 random numbers

# Make a figure with 6 subplots and axes
fig, ((ax1, ax2), (ax3, ax4), (ax5, ax6)) = plt.subplots(3, 2)

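# Add data to each axis. Optional arguments to each method will customize each plot.
# (Which plot type goes on which axes is an arbitrary choice here.)
ax1.bar(x, y)             # Bar plot
ax2.loglog(x, y)          # Line plot, log scale on both axes
ax3.semilogx(x, y, 'bo')  # Blue circles, log scale on x only
ax4.semilogy(x, y)        # Line plot, log scale on y only
ax5.scatter(x, y)         # Scatter plot
ax6.hist(y)               # Histogram of the y values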

2.3 Plotting images

Matplotlib also makes it easy to plot images. For this, you can use the Axes method imshow (syntax borrowed from Matlab).


In [ ]:
# Read an image file for first subplot, generate random array for second
img1 = plt.imread('lena.png')
img2 = np.random.rand(128, 128)

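# Make figure with two subplots and show one image on each
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(img1)
ax2.imshow(img2)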

It can be very intimidating to try to craft exactly the figure that you want, especially if you are used to being able to adjust things visually using a program like Excel.

If you get stuck and don't know where to start, or just want to learn more about what matplotlib can do, a great option is to have a look at the matplotlib gallery, which can be found at http://matplotlib.org/gallery.html. A good way to get started is to find a figure here that sort of looks like what you want, copy the code, and modify it for your own needs.

Have a look at the matplotlib gallery, find a cool-looking figure, copy the code into the box below, and modify it. Note that some of the examples might require packages that are not installed on your machine (in particular those that make maps). If this is the case, pick another example for the purposes of this exercise.

In IPython, you can use the "load magic". Type %loadpy and then the URL of the py file containing the code, and it will automatically copy it into a cell below. Run the cell with the code to see the figure.


In [ ]:
# Try it here...
%loadpy http://matplotlib.org/mpl_examples/pylab_examples/contour_demo.py

4. Formatting figures

The formatting of figures often takes longer than actually setting them up and adding data. There are many different approaches to formatting figures in matplotlib (many goals can be accomplished in different ways, using different commands), and you will come across many of these as you learn more. The tips below give a few simple ways to get started.

4.1 Common formatting tricks

There are hundreds of formatting options available in matplotlib, many of which you will end up using occasionally. There are a few options, however, that you will use very frequently. A short list of these might include:

  • Changing axis limits
  • Changing line colors
  • Changing lines to dashed (for black and white figures)
  • Adding markers to lines
  • Making tick marks point outward instead of inward
  • Getting rid of the box surrounding the plot
  • Adding subplot letters, like (a) and (b)

Here's how to accomplish all of these things.


In [ ]:
# Make some data to plot
x = np.linspace(0, 2*np.pi)
y1 = np.sin(x)
y2 = np.cos(x)

# First, create an empty figure with 1 subplot
fig, ax1 = plt.subplots(1, 1)

# Add title and labels
ax1.set_title('My Plot')
ax1.set_xlabel('x')
ax1.set_ylabel('y')

# Change axis limits
ax1.set_xlim([0,2])
ax1.set_ylim([-1, 2])

# Add the lines, changing their color, style, and marker
ax1.plot(x, y1, 'k--o', label='sin') # Black line, dashed, with 'o' markers
ax1.plot(x, y2, 'r-^', label='cos') # Red line, solid, with triangle-up markers

# Adjust tick marks and get rid of 'box'
ax1.tick_params(direction='out', top=False, right=False) # Turn ticks out, hide top and right ticks
ax1.spines['top'].set_visible(False) # Get rid of top axis line
ax1.spines['right'].set_visible(False) # Get rid of right axis line

# Add subplot letter
ax1.annotate('(a)', (0.01, 0.96), size=12, xycoords='figure fraction')

# Add legend
ax1.legend()

# Finally, save the figure as a png file
fig.savefig('myfig-formatted.png')

In [ ]:
import matplotlib.mlab as ml
tab = ml.csv2rec('../testing/sightings_tab_lg.csv')
print(type(tab))

In [ ]:
tab[0]

In [ ]:
tab.dtype

In [ ]:
tab['count']

In [ ]:
tab['animal']

In [ ]:
elk_tab = tab[np.where(tab['animal']=='Elk')]

In [ ]:
fig, ax = plt.subplots(1)
ax.plot(elk_tab['date'], elk_tab['count'])
#fig.autofmt_xdate()
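
Note that csv2rec has been removed from recent Matplotlib releases, so the cells above may fail on a current installation. As a hedged alternative, the sketch below reads a CSV into a structured array with np.genfromtxt and filters it in the same way. The three sample rows are made up for illustration and do not come from the real sightings file; only the column names date, count, and animal are taken from the cells above.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# Made-up sample data standing in for the sightings file
csv_text = """date,count,animal
2011-04-22,2,Elk
2011-04-23,5,Owl
2011-04-24,7,Elk
"""

# names=True takes field names from the header row; dtype=None guesses column types
tab = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True,
                    dtype=None, encoding='utf-8')

# A boolean mask selects the rows where the animal is an elk
elk_tab = tab[tab['animal'] == 'Elk']

# Convert the date strings so that matplotlib treats the x axis as dates
elk_dates = elk_tab['date'].astype('datetime64[D]')

fig, ax = plt.subplots(1)
ax.plot(elk_dates, elk_tab['count'], 'k-o')
fig.autofmt_xdate()  # Rotate the date labels so they don't overlap
```

Unlike csv2rec, genfromtxt does not parse dates on its own, which is why the date column is converted explicitly before plotting.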

4.2 Advanced formatting with rcParams

A fast way to control many aspects of figure formatting is to temporarily modify a dictionary called rcParams. This dictionary allows you to set, in one place, many of the options that you will need to change before submitting your figures for publication, including figure fonts, font sizes, figure size, figure dpi, etc., as well as many options regarding how elements are spaced in your figures (i.e., the distance between different elements of subfigures).

WARNING: The rcParams dictionary is GLOBAL to matplotlib's plot library - therefore, if you make a change to it, all future plots that you make will also have those changes (until you close your Python session). If you do modify rcParams, it's good hygiene to set it back to the defaults after you've made your plot, as described below.

A description of the rcParams options can be found at http://matplotlib.org/users/customizing.html
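
One way to limit the reach of such changes, not used in the rest of this lesson, is the plt.rc_context context manager: settings passed to it apply only inside the with block and are restored automatically afterwards. A minimal sketch, with parameter values chosen only for illustration:

```python
import matplotlib.pyplot as plt

# Remember the current value so we can compare afterwards
size_before = plt.rcParams['axes.labelsize']

# Inside the with block the temporary settings are active
with plt.rc_context({'axes.labelsize': 14, 'legend.frameon': False}):
    size_inside = plt.rcParams['axes.labelsize']
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 0], label='demo')
    ax.set_xlabel('x')
    ax.legend()

# On leaving the block, rcParams is back to what it was before
size_after = plt.rcParams['axes.labelsize']
print(size_before, size_inside, size_after)
```

This suits one-off figures. For settings that should apply to a whole session, updating rcParams directly remains the simpler choice.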

Run the code below to see all of the different options you can set here. Once you've looked this over, hit the Toggle button on the left to hide the output again.


In [ ]:
# View rcParams
matplotlib.rcParams

Now, save the default rcParams dictionary so we can 'reset' everything after we change it. In a 'regular' Python interpreter, or inside a Python module, you can just use plt.rcdefaults() to reset the defaults, instead of saving them here and then resetting them manually later. In the notebook, however, the save-and-restore approach used here is the one that works.


In [ ]:
# Save default rcParams so we can reset them later
# WARNING: Do not run this cell after changing rcParams, as it will overwrite the
# defaults that we are trying to preserve.
rcdef = plt.rcParams.copy()

Now let's make a simple plot, using mostly default formatting.


In [ ]:
# Make sure rcParams is at default settings, since we're messing with it
plt.rcParams.update(rcdef)

# Make a simple figure with default formatting
fig, axall = plt.subplots(1, 2)  # axall is a NumPy array containing both axes objects

for ax in axall:
    ax.plot(np.random.rand(100), 'k-o', label='Random')
    ax.set_ylim([0, 1.2])
    ax.set_ylabel('Value')
    ax.legend()

There are many obvious formatting problems here. The legend is too big, the axis labels are too small, the legend shouldn't have a box (arguably), and the y-axis label on the second subplot is hidden behind the first subplot. Also, although you can't see it here, the figure resolution is too low to print without appearing fuzzy.

The code below changes a whole bunch of values in rcParams to get a figure to look juuuuust right. Then it makes the figure, saves it, and puts rcParams back to its default.


In [ ]:
# Choose a bunch of new parameter values
# In practice, you'll try modifying these, running the code and saving the figure,
# looking at the figure, then making more modifications until you're happy.
newparams = {'axes.labelsize': 14, 'axes.linewidth': 1, 'savefig.dpi': 300, 
             'lines.linewidth': 1.5, 'figure.figsize': (8, 3),
             'figure.subplot.wspace': 0.4,
             'ytick.labelsize': 12, 'xtick.labelsize': 12,
             'ytick.major.pad': 5, 'xtick.major.pad': 5,
             'legend.fontsize': 12, 'legend.frameon': False, 
             'legend.handlelength': 1.5}

# Update the global rcParams dictionary with the new parameter choices
# Before doing this, we reset rcParams to its default again, just in case
plt.rcParams.update(rcdef)
plt.rcParams.update(newparams)

# Make the new figure with new formatting
fig, axall = plt.subplots(1, 2)

for ax in axall:
    ax.plot(np.random.rand(100), 'k-o', label='Random')
    ax.set_ylim([0, 1.2])
    ax.set_ylabel('Value')
    ax.legend()
    
fig.savefig('myfig-advanced.png')

# Put rcParams back to default
plt.rcParams.update(rcdef)

You'll want to check that your formatting looks good by looking at the file that's being saved to your hard disk, since that's exactly how it will look to the publisher or when inserted into your manuscript.
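
One quick check, sketched here under the assumption of a PNG saved without bbox_inches='tight', is to read the file back and compare its pixel dimensions with the figure size multiplied by the dpi. The filename check-dpi.png and the size and dpi values are only examples.

```python
import os
import tempfile
import matplotlib.pyplot as plt

# Example values: an 8 x 3 inch figure saved at 300 dots per inch
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot([0, 1, 2], [0, 1, 0], 'k-o')

# Save into a temporary directory rather than the working directory
outfile = os.path.join(tempfile.mkdtemp(), 'check-dpi.png')
fig.savefig(outfile, dpi=300)

# Read the file back: the array shape is (height, width, channels) in pixels
img = plt.imread(outfile)
height_px, width_px = img.shape[:2]
print(width_px, height_px)
```

With these example values the saved image should be 8 x 300 = 2400 pixels wide and 3 x 300 = 900 pixels high. A much smaller size suggests that the dpi setting did not take effect.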