02 Graphics

Some of this tour is modeled on the Matplotlib pyplot tutorial, the SciPy lecture notes on plotting, Jake Vanderplas' Matplotlib Intro and the Software Carpentry Bootcamp lesson on Matplotlib.


Matplotlib

Matplotlib is probably the most widely used Python package for 2D graphics. It provides both a quick way to visualize data from Python and publication-quality figures in many formats.

This notebook will provide a brief summary of the most important aspects of Matplotlib for our work, but you are encouraged to explore the extensive documentation for the library and in particular, the gallery of plots.

No one can keep all of the functions and fine layout control commands in their brain. Often when I need to make a plot, I go to the gallery page and browse the images until I find one that is similar to what I want to create and then I copy the code and modify it to suit my needs. You are encouraged to do the same.

Importing the library

To import the parts of Matplotlib that we will need, we use


In [ ]:
# In IPython or the IPython notebook, it's easiest to use the pylab magic, which
# imports matplotlib and numpy.

# The 'notebook' argument means that images will be shown here in the notebook,
# rather than in pop-up windows.

%pylab notebook

# If you are using 'regular' Python, however, you'll want the following. You'll
# need to also separately import numpy and any other packages that you might need.

import matplotlib.pyplot as plt
import numpy as np

Creating figures

There are two major challenges with creating figures. First is understanding the syntax to actually make the basic plot appear. Second is formatting the basic plot to look exactly how you would like it to look. In general, the formatting will probably take you longer...

Within Matplotlib's pyplot module (currently imported as 'plt'), there are two basic ways to go about making plots: the MATLAB-like interface and the object-oriented approach. The latter provides better control over plot features while requiring only slightly more typing. The MATLAB-like syntax is good for quick and simple plotting, while the object-oriented syntax is better for refining your plots to make them "publication-ready". We will use a little bit of both here.

Note: When you look at the source code from the Matplotlib gallery, the examples will mostly be using the object-oriented syntax.
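
As a rough side-by-side sketch of the two styles (the data and the axis label here are only illustrative), both halves below draw the same sine curve:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# MATLAB-style: pyplot tracks the "current" figure and axes behind the scenes
plt.figure()
plt.plot(x, np.sin(x))
plt.xlabel("x (rad)")
implicit_ax = plt.gca()  # the axes pyplot has been drawing on

# Object-oriented style: keep explicit Figure and Axes objects and call their methods
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlabel("x (rad)")
```

With several figures or subplots open at once, holding on to `fig` and `ax` makes it unambiguous which plot each command changes.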

Simple Plots

Here is an example of creating a simple plot of two functions. To plot a function, we need to create a NumPy array for the independent variable ($x$) and then set the dependent variable ($y = f(x)$) by using NumPy's built-in functions.


In [ ]:
#create the data to be plotted
x = np.linspace(0, 2*np.pi, 300)
y = np.sin(x)
y2 = np.sin(x**2)

In this case, x is now a NumPy array with 300 values ranging from 0 to 2$\pi$ (included). y is the sine of x (array of 300 values) and y2 is the sine of x squared, $\sin(x^2)$ (array of 300 values), evaluated at each of those 300 x values.
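
A quick way to check what these arrays contain is to inspect them directly. The sketch below also shows that np.sin(x**2), the sine of x squared, is not the same function as np.sin(x)**2, the square of the sine (the variable names are chosen here for illustration only):

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 300)

print(x.shape)       # (300,)
print(x[0], x[-1])   # first and last values: 0 and 2*pi (endpoint included)

sine_of_square = np.sin(x**2)   # what y2 holds in the cell above
square_of_sine = np.sin(x)**2   # a different function
print(np.allclose(sine_of_square, square_of_sine))  # False
```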


In [ ]:
#Now plot it
plt.plot(x, y) 
plt.plot(x, y2)
plt.show()

What this plot lacks is important information about what is in it - there are no axis labels, no legend telling us which curve is which, and the tick labels on the axes are a bit small. We should probably try to improve the plot's readability so other people viewing it will understand what we are trying to convey.

Font size for labels is especially important when you are creating plots for inclusion in a slide presentation. If your labels are too small for the audience to read, you will quickly lose their attention.
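
As a minimal sketch of one way to enlarge the text, font sizes (in points) can be passed directly when labels, ticks, and the legend are set. The label strings and the sizes 16 and 14 are illustrative choices, not recommendations from Matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 300)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.sin(x**2), label="sin(x^2)")

# Larger text for axis labels, tick labels, and legend (sizes are in points)
ax.set_xlabel("x (rad)", fontsize=16)
ax.set_ylabel("y", fontsize=16)
ax.tick_params(labelsize=14)
ax.legend(fontsize=14)
```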

Our demos during the final exam are a perfect place to practice creating readable plots. Grading of the demos will include an evaluation of how well information is presented, including readability of plots.

We'll explore some of the capabilities in Matplotlib for refining how our data is presented graphically below, but first, we'll look at a couple more simple tricks.

You can control the style, color and other properties of the lines and markers, for example:


In [ ]:
plt.plot(x, y, linewidth=2);
plt.plot(x, y2, linewidth=2);

In [ ]:
#decrease the number of points to illustrate the use of markers
x = np.linspace(0, 2*np.pi, 50)
y = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y, 'o', markersize=5, color='r');
plt.plot(x, y2, '^', markersize=5, color='b');

See the Matplotlib line style demo to view the different types of line styles you can choose from. The source code is a little "slick", though, so it may not be obvious to you how the markers are set in that example.
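
For comparison, here is a less "slick" sketch of the two usual ways to set colors, line styles, and markers: a compact format string, or explicit keyword arguments. The particular styles are arbitrary examples.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 20)
fig, ax = plt.subplots()

# Format string: color 'r' (red), line style '--' (dashed), marker 'o' (circle)
line_a, = ax.plot(x, np.sin(x), "r--o")

# The same kind of settings given as keyword arguments
line_b, = ax.plot(x, np.cos(x), color="b", linestyle=":", marker="^", markersize=5)
```

Format strings are quick to type, while keyword arguments are easier to read back later and allow colors and styles that have no one-letter code.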

Saving a figure

To save a figure in a standard format such as png, jpg or pdf, just call the savefig method with a filename that includes the extension for the type of file you wish to create:


In [ ]:
#back to our original data
x = np.linspace(0, 2*np.pi, 300)
y = np.sin(x)
y2 = np.sin(x**2)

plt.plot(x, y)
plt.plot(x, y2)

#add a grid
plt.grid()

plt.savefig("Example.pdf")
plt.savefig("Example.png")
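
savefig also accepts options that are often useful for figures going into documents. The sketch below assumes you want a higher-resolution PNG and tightly cropped margins; the file names are placeholders, and a temporary directory is used only so the example cleans up after itself.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 300)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

with tempfile.TemporaryDirectory() as out_dir:
    png_path = os.path.join(out_dir, "example.png")
    pdf_path = os.path.join(out_dir, "example.pdf")

    # Raster format: dpi controls the resolution of the saved image
    fig.savefig(png_path, dpi=150, bbox_inches="tight")
    # Vector format: stays sharp when zoomed, so dpi matters much less
    fig.savefig(pdf_path, bbox_inches="tight")

    # Record the file sizes (in bytes) before the directory is removed
    sizes = {os.path.basename(p): os.path.getsize(p) for p in (png_path, pdf_path)}
```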

Refining our plots

Now that we've seen how to create some data and plot it, we'll iteratively improve on the look of the plots as we explore features of Matplotlib. Start with something simple: draw the cosine/sine functions.


In [ ]:
#create the data
x = np.linspace(-np.pi, np.pi, 56, endpoint=True)
c, s = np.cos(x), np.sin(x)

In [ ]:
plt.plot(x, c)
plt.plot(x, s)
plt.show()

Matplotlib comes with a set of default settings that allow customizing all kinds of properties. You can control the defaults of almost every property with Matplotlib: figure size and dpi, line width, color and style, axes, axis and grid properties, text and font properties and so on.
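
These defaults live in a dictionary-like object called rcParams. As a small sketch, you can look up the current values and override one temporarily; the printed values depend on your Matplotlib version, so none are listed here.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Look up a few current defaults (the values depend on your Matplotlib version)
print(mpl.rcParams["figure.figsize"])
print(mpl.rcParams["figure.dpi"])
print(mpl.rcParams["lines.linewidth"])

# Temporarily override a default; it is restored when the with-block ends
old_width = mpl.rcParams["lines.linewidth"]
with mpl.rc_context({"lines.linewidth": 3.0}):
    fig, ax = plt.subplots()
    thick_line, = ax.plot([0, 1], [0, 1])  # drawn with the overridden width
```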

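We can explicitly set all of the parameters that define the plot when we create it. Here the settings are written out explicitly. The figure size, dpi and line width shown were the defaults in older Matplotlib releases; the defaults in current releases differ somewhat.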

Play around with them to see how they change the look of the plot.


In [ ]:
# Create a figure of size 8x6 inches, 80 dots per inch
plt.figure(figsize=(8, 6), dpi=80)

# Create a new subplot from a grid of 1x1
plt.subplot(1, 1, 1)

# Plot cosine with a blue continuous line of width 1 (points)
plt.plot(x, c, color="blue", linewidth=1.0, linestyle="-",label="cosine")

# Plot sine with a green continuous line of width 1 (points)
plt.plot(x, s, color="green", linewidth=1.0, linestyle="-",label="sine")

# Set x limits
plt.xlim(-4.0, 4.0)

# Set x ticks
plt.xticks(np.linspace(-4, 4, 9, endpoint=True))

# Set y limits
plt.ylim(-1.0, 1.0)

# Set y ticks
plt.yticks(np.linspace(-1, 1, 5, endpoint=True))

# Add axis labels, setting a readable font size
plt.xlabel("x (rad)",fontsize=15)
plt.ylabel("sin,cos",fontsize=15)

# Add a legend
plt.legend(loc='upper left')

# Save figure using 72 dots per inch
plt.savefig("plot_example.png", dpi=72)

# Show result on screen
plt.show()

That's a pretty nicely formatted figure, ready for presentation. The axes are labeled, and the legend and labels are large enough to read easily. For a black-and-white publication you would still want thicker lines with different line styles, since both curves here are thin solid lines.

Exercise 1

Try to create a plot that matches the one you see below. The function that is plotted in blue is $ y(x) = 5\cos(x) - \frac{1}{2}\sqrt{x}$.


In [ ]:
#your code here

Other types of plots

In the example above, we used the plot function to make line plots. There are also functions to make scatter plots, bar plots, histograms, loglog plots, semilog plots, etc.

Errorbars

For data you might collect in the laboratory, you want to show the uncertainties on your data points.


In [ ]:
#Simple constant error bars on each point
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
y = np.array([1.1, 1.9, 3.2, 4.0, 5.9])
plt.figure()
plt.errorbar(x, y, xerr=0.2, yerr=0.6, marker='o')
plt.title("Simplest errorbars, 0.2 in x, 0.6 in y");

Perhaps your error bars vary from point to point:


In [ ]:
# example data
x = np.arange(0.1, 4, 0.5)
y = np.exp(-x)

# example variable error bar values
yerr = 0.1 + 0.2*np.sqrt(x)
xerr = 0.1 + yerr

plt.figure()
# Pass the errors by keyword: positionally, errorbar expects yerr before xerr
plt.errorbar(x, y, xerr=xerr, yerr=yerr, marker='^')
plt.show()
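
Error bars can also be asymmetric, with a different size below and above each point. In the sketch below the 10% and 30% uncertainties are made-up values for illustration; the errors are passed as an array with two rows, lower errors first and upper errors second.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0.1, 4, 0.5)
y = np.exp(-x)

# Made-up uncertainties: 10% below each point, 30% above
yerr_asym = np.array([0.1 * y, 0.3 * y])  # shape (2, N): row 0 = lower, row 1 = upper

fig, ax = plt.subplots()
# fmt="o" draws markers only (no connecting line); capsize adds caps to the bars
bars = ax.errorbar(x, y, yerr=yerr_asym, fmt="o", capsize=3)
```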

Subplots and Logarithmic axes

You may wish to plot one or both axes on a logarithmic scale. There are two ways to do this: "log-log", where both axes are in log scale, or "semilog", where only one axis is in log scale. To simplify the presentation of the different options, we will also divide our figure up into four subplots. Have a look at each possibility:


In [ ]:
x = np.linspace(0., 5.)
y = np.exp(-x)

#Make a figure with 4 subplots and axes side-by-side
fig, ax = plt.subplots(1,4, figsize=(10,6))

#Plot on each axis
ax[0].plot(x,y)
ax[1].loglog(x,y)
ax[2].semilogx(x,y)
ax[3].semilogy(x,y);

Recall that an exponential function plotted in semilogy is a straight line!
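
The reason is that taking the logarithm of $y = e^{-x}$ gives $\ln y = -x$, which is linear in x. As a quick numerical check (a sketch, not part of the plotting itself), fitting a straight line to log(y) versus x should return a slope of about -1 and an intercept of about 0:

```python
import numpy as np

x = np.linspace(0., 5.)
y = np.exp(-x)

# Fit a first-degree polynomial (a straight line) to log(y) versus x
slope, intercept = np.polyfit(x, np.log(y), 1)
print(slope, intercept)  # close to -1 and 0, up to rounding error
```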

Scatter plots

For sparse data, sometimes you want to see the values plotted as a scatter plot of y vs. x:


In [ ]:
# Make some data to plot
x = np.arange(0, 100)
y = np.random.rand(100)  # 100 random numbers

plt.scatter(x,y);
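
scatter can also encode extra information in the color and size of each marker. The sketch below colors each point by its y value and adds a colorbar; the seed, colormap, and marker size are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeded so the picture is reproducible
x = np.arange(0, 100)
y = rng.random(100)              # 100 random numbers between 0 and 1

fig, ax = plt.subplots()
# c colors each point by its y value; s sets the marker area in points^2
points = ax.scatter(x, y, c=y, s=30, cmap="viridis")
fig.colorbar(points, ax=ax, label="y value")
```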

Histograms

Histograms are a class of plot that allow you to present information on the frequency of a particular value occurring in a distribution of events. They are used universally in many fields of science, physics included.

Here is the Wikipedia definition:

A histogram is a representation of tabulated frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area proportional to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data. A histogram may also be normalized displaying relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equaling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent, and often are chosen to be of the same size. The rectangles of a histogram are drawn so that they touch each other to indicate that the original variable is continuous.

Here is an example histogram annotated with text inside the plot, using the text function:


In [ ]:
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
# (density=True replaces the older normed=1, which recent Matplotlib no longer accepts)
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Smarts',fontsize=20)
plt.ylabel('Probability',fontsize=20)
plt.title('Histogram of IQ',fontsize=20)

# This will put a text fragment at the position given:
plt.text(45, .027, r'$\mu=100,\ \sigma=15$', fontsize=20)
plt.axis([40, 160, 0, 0.03])
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.grid();

The number of bins was set at 50 in the second argument of plt.hist. The histogram is normalized so that its total area is 1: the values on the y axis are probability densities, and the bar heights multiplied by the bin width add up to 1. The alpha parameter sets the transparency of the fill color.
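
To see what the normalization means in practice, the sketch below checks that it is the total area of the bars (height times bin width), not the sum of the heights, that comes out to 1. It uses the density=True keyword of recent Matplotlib releases and a seeded random generator, both chosen here for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)             # seeded for reproducibility
x = 100 + 15 * rng.standard_normal(10000)  # mean 100, standard deviation 15

fig, ax = plt.subplots()
n, bins, patches = ax.hist(x, 50, density=True)

bin_widths = np.diff(bins)                 # width of each of the 50 bins
print(np.sum(n * bin_widths))              # total area of the bars: 1 up to rounding
print(np.sum(n))                           # sum of the heights alone: generally not 1
```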

Play with the settings a little bit to see how they change the look of the plot.

Exercise 2

Make a NumPy array of 1000 normally distributed (bell curve) values with a mean of 42 and a standard deviation (width) of 7. Then plot a histogram of the values in 42 bins with the color of the bins being a transparent blue. An example is provided but your plot will look slightly different because our arrays will be different.


In [ ]:
#Your code here

When you use the Matplotlib gallery to template a figure, you can very easily load the source code into your notebook and then modify it as needed to fit your specific needs. Try it now. After the code is loaded, just execute the cell to see the output.


In [ ]:
# Try it here...
%loadpy http://matplotlib.org/mpl_examples/pylab_examples/contour_demo.py

Common formatting tricks

There are hundreds of formatting options available in Matplotlib, many of which you will end up using occasionally. There are a few options, however, that you will use very frequently. Here's a short list...

  • Changing axis limits
  • Changing line colors
  • Changing lines to dashed (for black and white figures)
  • Adding markers to lines
  • Making tick marks point outward instead of inward
  • Getting rid of the box surrounding the plot
  • Adding subplot letters, like (a) and (b)

...and some examples for how to accomplish all of these things.


In [ ]:
# Make some data to plot
x = np.linspace(0, 2*np.pi)
y1 = np.sin(x)
y2 = np.cos(x)

# First, create an empty figure with 1 subplot
fig, ax1 = plt.subplots(1, 1)

# Add title and labels
ax1.set_title('My Plot',fontsize=20)
ax1.set_xlabel('x',fontsize=20)
ax1.set_ylabel('y',fontsize=20)

# Change axis limits
ax1.set_xlim([0,2])
ax1.set_ylim([-1, 2])

# Add the lines, changing their color, style, and marker
ax1.plot(x, y1, 'k--o', label='sin') # Black line, dashed, with 'o' markers
ax1.plot(x, y2, 'r-^', label='cos') # Red line, solid, with triangle-up markers

# Adjust tick marks and get rid of 'box'
ax1.tick_params(direction='out', top=False, right=False) # Turn ticks out
ax1.spines['top'].set_visible(False) # Get rid of top axis line
ax1.spines['right'].set_visible(False) # Get rid of right axis line

# Add subplot letter
ax1.annotate('(a)', (0.01, 0.96), size=12, xycoords='figure fraction')

# Add legend
ax1.legend()

# Finally, save the figure as a png file
fig.savefig('myfig-formatted.png')
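
When a figure has more than one subplot, it can be more convenient to place each letter relative to its own axes rather than relative to the whole figure. The sketch below is one possible way to do that, using ax.text with transform=ax.transAxes; the positions and sizes are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].plot(x, np.sin(x), 'k--')
axes[1].plot(x, np.cos(x), 'r-')

for ax, letter in zip(axes, "ab"):
    # With transform=ax.transAxes, (0, 0) is the lower-left corner of this
    # subplot and (1, 1) is its upper-right corner
    ax.text(0.03, 0.95, f"({letter})", transform=ax.transAxes, va="top", size=12)
    ax.spines['top'].set_visible(False)    # remove the top axis line
    ax.spines['right'].set_visible(False)  # remove the right axis line

fig.tight_layout()  # adjust spacing so labels do not overlap
```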