So, you want to plot something in Python. Perhaps you've typed import matplotlib.pyplot as plt a few times... maybe you even managed to format your axis tick labels for a special plot. Do you have a few examples lying around where you once used matplotlib to do something, but you don't quite remember why it worked?
This tutorial is for you. We will assume that you can follow the simple plotting tutorials, so we won't repeat them here. Instead, we will try to understand the ideas behind matplotlib's structure, but only enough to develop intuition for efficiently switching between simple, high-level commands and all the glorious guts beneath.
This intuition requires an understanding of the different roles of pyplot, backends, and the matplotlib API. We'll start by understanding backends, why they're confusing, and how we can stop thinking about them. We'll then see how convenient pyplot is, and what its limitations are. We'll end with a dive into the matplotlib object space. Expect some bouncing back and forth between concepts and explicit examples.
Requirements: matplotlib, numpy
In [17]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot([1,2,3,4])
plt.ylabel('some numbers')
plt.show()
Observations:
The show function caused the plot to render in a Jupyter notebook.
Lots of magic here...
Matplotlib consists of 3 groups of Python objects: pyplot, the backends, and the matplotlib API.
Efficient, effective use of matplotlib involves: configuring the backend correctly, using pyplot when possible, and dropping into the API when necessary.
Foundational ideas: (http://matplotlib.org/faq/usage_faq.html)
Environmental hierarchy:
matplotlib.pyplot.FUNCTION
The cell below is the same example as above, but more verbose: the pyplot state machine keeps track of what was plotted with plot, such that a call to show renders the figure, and the %matplotlib magic command handles the backend (object-to-pixel mapping, display, inline figure embedding). We've also made the namespaces really explicit.
In [18]:
import matplotlib.pyplot
%matplotlib inline
matplotlib.pyplot.plot([1,2,3,4])
matplotlib.pyplot.ylabel('some numbers')
#matplotlib.pyplot.show()
Out[18]:
It turns out that we don't even need the call to show because of what the %matplotlib call does.
We can remove the magic function if we specify the backend with an explicit 'write' call. This is a good example of how pyplot and the backend manage separate concerns.
Restart kernel here if you want to prove it actually works.
In [19]:
import matplotlib.pyplot
matplotlib.pyplot.plot([1,2,3,4])
matplotlib.pyplot.ylabel('some numbers')
matplotlib.pyplot.savefig('test.png')
Look for test.png in your local directory.
What have we done? There's still the pyplot state machine (running in the IPython state machine!), but we've made explicit some of the magic the %matplotlib inline call was performing. There are other arguments for %matplotlib, including notebook, which provides a more interactive inline plot; see http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=magic#magic-matplotlib
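The pyplot state machine mentioned above can be poked at directly. Here is a minimal sketch, assuming it runs as a plain script with the non-interactive Agg backend (inside the notebook, leave out the matplotlib.use line). It relies on plt.gcf and plt.gca, which return the figure and axes that pyplot currently treats as "active".

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; omit this line inside Jupyter
import matplotlib.pyplot as plt

plt.close("all")                   # start from a clean pyplot state
line, = plt.plot([1, 2, 3, 4])     # implicitly creates a Figure and an Axes

first_fig = plt.gcf()              # "get current figure"
first_ax = plt.gca()               # "get current axes"

# The line we just plotted lives on the axes that pyplot is tracking.
line_is_on_current_axes = line.axes is first_ax

# Creating a new figure changes what pyplot considers current.
new_fig = plt.figure()
current_changed = (plt.gcf() is new_fig) and (plt.gcf() is not first_fig)

print(line_is_on_current_axes, current_changed)
```

Nothing in this sketch names a figure or axes when plotting; pyplot keeps that bookkeeping for us, which is exactly the convenience (and the opacity) of the state machine.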
[OPTIONAL]
We can also set the backend explicitly; see http://matplotlib.org/faq/usage_faq.html#what-is-a-backend. You'll need to restart your kernel again and uncomment the code in the cell below.
In [23]:
#import matplotlib
#matplotlib.use('TkAgg')
#import matplotlib.pyplot
#matplotlib.pyplot.plot([1,2,3,4])
#matplotlib.pyplot.ylabel('some numbers')
#matplotlib.pyplot.show()
That's all we'll say about backends. You have options for how and where to render your plots: you can make use of the magic %matplotlib function, which sets an appropriate backend and does the inline plotting, or you can write your plots explicitly to files. We'll do the former here.
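To make the "write explicitly" option concrete, here is a small sketch under the assumption that we are outside the notebook and pick the non-interactive Agg backend ourselves. Writing to an in-memory buffer instead of test.png is an illustrative choice, not something the tutorial requires; matplotlib.get_backend reports which backend is active.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless script; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig = plt.figure()
plt.plot([1, 2, 3, 4])
plt.ylabel("some numbers")

# pyplot describes the plot; the backend turns it into PNG bytes.
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
png_bytes = buffer.getvalue()

backend_name = matplotlib.get_backend()
print(backend_name, len(png_bytes))

plt.close(fig)
```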
You can find lots of neat plotting examples at http://matplotlib.org/examples/index.html. Most of them involve both pyplot function calls and operations on matplotlib API objects. The pattern is simple:
matplotlib.OBJECT() = matplotlib.pyplot.FUNCTION()
Let's go back to our example, in which we were discarding the return values of the pyplot functions.
In [20]:
import matplotlib.pyplot as plt
%matplotlib inline
# call to 'plot' returns a list of Line2D objects
line, = plt.plot([1,2,3,4])
# call to 'ylabel' returns a Text object
text = plt.ylabel('some numbers')
line is an object representing the line on the chart and all of its properties. Because it's an object, we can use its methods to modify the line.
In [21]:
# Here's an example of a setter method
line, = plt.plot([1,2,3,4])
line.set_color('r')
In [22]:
# and here's the pyplot way of doing the same thing:
_ = plt.plot([1,2,3,4],'r')
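As a quick check that the two routes above end up in the same place, the sketch below draws one line with the setter and one with the 'r' format string, then compares the resulting colors. It assumes a headless Agg backend and uses plt.subplots and matplotlib.colors.to_rgba, neither of which appears in the cells above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; omit this line inside Jupyter
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

fig, ax = plt.subplots()

# API route: plot first, then modify the returned Line2D object.
line_a, = ax.plot([1, 2, 3, 4])
line_a.set_color("r")

# pyplot-style route: pass the color as a format string.
line_b, = ax.plot([1, 2, 3, 4], "r")

# Normalize both colors to RGBA tuples before comparing.
same_color = to_rgba(line_a.get_color()) == to_rgba(line_b.get_color())
print(same_color)

plt.close(fig)
```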
In [23]:
# an example of some getter methods
line, = plt.plot([1,2,3,4])
axes = line.axes
y_axis = axes.get_yaxis()
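# Fix the tick positions first so the seven labels line up with seven ticks
y_axis.set_ticks([1, 1.5, 2, 2.5, 3, 3.5, 4])
# And another setter call (it returns the tick label Text objects)
labels = y_axis.set_ticklabels([1,'',2,'','three','',4])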
In [24]:
# and the pyplot version
import numpy as np
plt.plot([1,2,3,4])
_ = plt.yticks(np.arange(1,4.5,0.5),(1,'',2,'','three','',4))
In [27]:
# now for something really crazy
import matplotlib.ticker as ticker
# plot and get the y-axis
line, = plt.plot([1,2,3,4])
axes = line.axes
y_axis = axes.get_yaxis()
# I want major tick marks at integer y-values (meh, just use pyplot)
_ = plt.yticks(range(1,5))
# I want a single, un-labeled, minor tick at y = 2.75
y_axis.set_minor_locator( ticker.FixedLocator([2.75]))
# I want proportional precision in my y-labels
def func(x,pos):
    """
    return a string representation of `x` with floating point precision `pos`
    """
    return_str = '{0:.' + str(pos) + 'f}'
    return return_str.format(x)
tick_formatter = ticker.FuncFormatter( func )
labels = y_axis.set_major_formatter(tick_formatter)
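To see what the formatter does without drawing anything, we can call it by hand. This is a sketch only: matplotlib passes each tick's value as x and its index along the axis as pos, and here we mimic that with enumerate over four hypothetical tick values. The function body is an equivalent, slightly more compact rewrite of func above.

```python
import matplotlib.ticker as ticker

def func(x, pos):
    """Format `x` with `pos` digits after the decimal point."""
    return "{0:.{1}f}".format(x, pos)

tick_formatter = ticker.FuncFormatter(func)

# Mimic matplotlib calling the formatter: (tick value, tick index).
tick_values = [1, 2, 3, 4]  # hypothetical major tick positions
tick_labels = [tick_formatter(value, pos) for pos, value in enumerate(tick_values)]

print(tick_labels)  # ['1', '2.0', '3.00', '4.000']
```

Because the precision is tied to the tick index rather than the value, each label higher up the axis gains one more decimal place.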
What do we observe?
We've seen how pyplot functions interact with the matplotlib API by returning objects. Now we'll briefly note the 3 layers of classes that make the API work.
Inherits from matplotlib.backend_bases.FigureCanvasBase
Inherits from matplotlib.backend_bases.RendererBase
Inherits from matplotlib.artist.Artist
Artist objects handle all the high-level constructs like representing and laying out the figure, text, and lines. Almost all objects you interact with are Artists, including container-like objects such as Figure, Axes, Axis, and graphical primitives such as Rectangle, Line2D, and Text.
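The claim that containers and primitives alike are Artists is easy to check with isinstance. A minimal sketch, assuming a headless Agg backend; the dictionary of objects is just for illustration, and ax.patch is the background Rectangle that every rectilinear Axes carries.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; omit this line inside Jupyter
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
line, = ax.plot([1, 2, 3, 4])
text = ax.set_ylabel("some numbers")

# One example each of the containers and primitives named above.
objects = {
    "Figure": fig,
    "Axes": ax,
    "Axis": ax.yaxis,
    "Rectangle": ax.patch,
    "Line2D": line,
    "Text": text,
}

for name, obj in objects.items():
    print(name, type(obj).__name__, isinstance(obj, Artist))

all_artists = all(isinstance(obj, Artist) for obj in objects.values())

plt.close(fig)
```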
Axes objects are one of the most important API components, because they are the containers that hold and reference most other objects. The Axes class provides helpful interface methods like plot and hist which create primitive Artist instances, like Line2D, from input numpy arrays and strings.
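A short sketch of that last point: plot turns arrays plus a format string into Line2D objects, and hist (with its default bar style) turns an array into Rectangle patches. The sample data and the "g--" format string are made up for illustration, and the Agg backend is assumed so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()

# plot: numpy arrays + a format string -> a list of Line2D primitives
x = np.arange(5)
lines = ax.plot(x, x ** 2, "g--")

# hist: a numpy array -> bin counts, bin edges, and one Rectangle per bin
samples = np.array([1, 1, 2, 3, 3, 3])
counts, bin_edges, bars = ax.hist(samples, bins=3)

print(type(lines[0]).__name__, lines[0].get_linestyle())
print(counts, [type(bar).__name__ for bar in bars])

plt.close(fig)
```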
In [28]:
import matplotlib.pyplot as plt
# use pyplot to get a figure
fig = plt.figure()
# create an array of Axes objects and get the specified one
ax = fig.add_subplot(2,1,1) # two rows, one column, first plot
Note above that there is a Subplot class, and that it inherits from Axes.
A Figure can place an Axes at an arbitrary location with the add_axes method.
In [29]:
fig2 = plt.figure()
ax2 = fig2.add_axes([0.15, 0.1, 0.7, 0.3])
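The four numbers passed to add_axes are [left, bottom, width, height], expressed as fractions of the figure size. The sketch below reads them back with get_position, assuming a headless Agg backend; the variable names are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

fig2 = plt.figure()
requested = [0.15, 0.1, 0.7, 0.3]   # left, bottom, width, height (figure fractions)
ax2 = fig2.add_axes(requested)

# get_position returns a bounding box; .bounds is (x0, y0, width, height)
bounds = ax2.get_position().bounds
matches_request = np.allclose(bounds, requested)
print(bounds, matches_request)

plt.close(fig2)
```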
Let's make some dummy data.
In [30]:
# have to recreate the figure and subplots because inline plotting is opaque.
fig = plt.figure()
ax = fig.add_subplot(2,1,1)
t = np.arange(0.0, 1.0, 0.01)
s = np.sin(2*np.pi*t)
line, = ax.plot(t, s, color='blue', lw=2)
Now, let's compare the 'line' object, a Line2D instance, to the graphical primitives that the Axes knows about.
In [31]:
ax.lines[0]
Out[31]:
In [32]:
line
Out[32]:
As expected (hoped for?), they are the same object.
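The same comparison can be made explicit with the is operator, along with the back-references that tie a primitive to its containers. A minimal sketch, assuming a headless Agg backend and rebuilding the figure so it is self-contained:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless script; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(2, 1, 1)
t = np.arange(0.0, 1.0, 0.01)
s = np.sin(2 * np.pi * t)
line, = ax.plot(t, s, color="blue", lw=2)

# The Axes stores the very object that plot returned, not a copy.
same_object = ax.lines[0] is line

# Each level also points back up to its container.
back_references = (line.axes is ax) and (ax.figure is fig)

# So a change made through one reference is visible through the other.
ax.lines[0].set_color("red")
color_seen_via_line = line.get_color()

print(same_object, back_references, color_seen_via_line)

plt.close(fig)
```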
In addition to modules that define common Artists, the API contains a bunch of modules that provide helper classes around a particular topic. You saw an example earlier when we imported matplotlib.ticker and used its Locator and Formatter classes.
Other examples are (http://matplotlib.org/api/index.html):
Most of what is implemented in these modules are additional Artists, which can delegate their rendering to the same classes used by other Artists.
Let's dig into an example that uses the colors module. There are patterns of use here that are typical across the API. See documentation and examples at http://matplotlib.org/examples/color/colormaps_reference.html.
Goal: make a dynamic set of curves with different colors, and label the curves by their RGB values.
Why? Suppose you have an unknown number of curves that will display on a dashboard. To distinguish them, you'll need to dynamically assign distinguishable colors.
In [33]:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
import numpy as np
%matplotlib inline
# some random data:
NCURVES = 6
np.random.seed(101)
# each curve consists of 20 points with random values in [0,1]
curves = [np.random.random(20) for i in range(NCURVES)]
# make one Subplot/Axes object in the figure,
# and make the figure a bit bigger
fig = plt.figure(figsize=(14,8))
ax = fig.add_subplot(111)
# use pyplot to get a color map
jet = plt.get_cmap('jet')
# make an object that represents a normalization onto a range
color_normalizer = colors.Normalize(vmin=0, vmax=len(curves)-1)
# an object to map scalar data in the range specified by the normalizer
# to a color on the specified color map
scalar_map = cmx.ScalarMappable(norm=color_normalizer,cmap=jet)
for idx in range(len(curves)):
    # get the np.array
    line = curves[idx]
    # map it to a color value
    color_val = scalar_map.to_rgba(idx)
    # make the legend text
    color_text = (
        'color {0}: ({1:4.2f},{2:4.2f},{3:4.2f})'.format(idx,color_val[0],color_val[1],color_val[2])
    )
    # use Axes.plot to plot it
    _ = ax.plot(line,
                color=color_val,
                label=color_text)
# legend stuff
handles,labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc='upper right')
ax.grid()
#plt.show()
plt.tight_layout()
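The normalizer and the scalar map do their work independently of any figure, so they can be examined in isolation. A small sketch, reusing the six-curve setup from the cell above; converting each result to a tuple of plain floats is only there to make the values easy to print and compare.

```python
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx

n_curves = 6

# Step 1: the normalizer maps the index range [0, n_curves - 1] onto [0, 1].
color_normalizer = colors.Normalize(vmin=0, vmax=n_curves - 1)
midpoint = float(color_normalizer(2.5))   # halfway between 0 and 5

# Step 2: the scalar map normalizes an index, then looks it up in the colormap.
scalar_map = cmx.ScalarMappable(norm=color_normalizer, cmap=plt.get_cmap("jet"))
rgba_values = [
    tuple(float(channel) for channel in scalar_map.to_rgba(idx))
    for idx in range(n_curves)
]

# Every curve index should get its own (red, green, blue, alpha) value.
all_distinct = len(set(rgba_values)) == n_curves

print(midpoint)
for idx, rgba in enumerate(rgba_values):
    print(idx, rgba)
```

If the number of curves changes, only n_curves needs to change; the normalizer stretches the same colormap over however many indices there are.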
Artist objects include graphical primitives (lines, text), container objects (axes), and helper classes defined in auxiliary modules (like DateFormatter in matplotlib.dates).