Matplotlib: Introduction
DS Data manipulation, analysis and visualisation in Python
December, 2019© 2016, Joris Van den Bossche and Stijn Van Hoey (mailto:jorisvandenbossche@gmail.com, mailto:stijnvanhoey@gmail.com). Licensed under CC BY 4.0 Creative Commons
Matplotlib is a Python package used widely throughout the scientific Python community to produce high quality 2D publication graphics. It transparently supports a wide range of output formats including PNG (and other raster formats), PostScript/EPS, PDF and SVG and has interfaces for all of the major desktop GUI (graphical user interface) toolkits. It is a great package with lots of options.
However, matplotlib is...
The 800-pound gorilla — and like most 800-pound gorillas, this one should probably be avoided unless you genuinely need its power, e.g., to make a custom plot or produce a publication-ready graphic.
(As we’ll see, when it comes to statistical visualization, the preferred tack might be: “do as much as you easily can in your convenience layer of choice [nvdr e.g. directly from Pandas, or with seaborn], and then use matplotlib for the rest.”)
(quote used from this blogpost)
And that's we mostly did, just use the .plot
function of Pandas. So, why do we learn matplotlib? Well, for the ...then use matplotlib for the rest.; at some point, somehow!
Matplotlib comes with a convenience sub-package called pyplot
which, for consistency with the wider matplotlib community, should always be imported as plt
:
In [1]:
import numpy as np
import matplotlib.pyplot as plt
Figure
, axes
and axis
At the heart of every plot is the figure object. The "Figure" object is the top level concept which can be drawn to one of the many output formats, or simply just to screen. Any object which can be drawn in this way is known as an "Artist" in matplotlib.
Lets create our first artist using pyplot, and then show it:
In [2]:
fig = plt.figure()
plt.show()
On its own, drawing the figure artist is uninteresting and will result in an empty piece of paper (that's why we didn't see anything above).
By far the most useful artist in matplotlib is the Axes artist. The Axes artist represents the "data space" of a typical plot, a rectangular axes (the most common, but not always the case, e.g. polar plots) will have 2 (confusingly named) Axis artists with tick labels and tick marks.
There is no limit on the number of Axes artists which can exist on a Figure artist. Let's go ahead and create a figure with a single Axes artist, and show it using pyplot:
In [3]:
ax = plt.axes()
Matplotlib's pyplot
module makes the process of creating graphics easier by allowing us to skip some of the tedious Artist construction. For example, we did not need to manually create the Figure artist with plt.figure
because it was implicit that we needed a figure when we created the Axes artist.
Under the hood matplotlib still had to create a Figure artist, its just we didn't need to capture it into a variable. We can access the created object with the "state" functions found in pyplot called gcf
and gca
.
Some example data:
In [4]:
x = np.linspace(0, 5, 10)
y = x ** 2
Observe the following difference:
1. pyplot style: plt... (you will see this a lot for code online!)
In [5]:
plt.plot(x, y, '-')
Out[5]:
2. creating objects
In [6]:
fig, ax = plt.subplots()
ax.plot(x, y, '-')
Out[6]:
Although a little bit more code is involved, the advantage is that we now have full control of where the plot axes are placed, and we can easily add more than one axis to the figure:
In [7]:
fig, ax1 = plt.subplots()
ax1.plot(x, y, '-')
ax1.set_ylabel('y')
ax2 = fig.add_axes([0.2, 0.5, 0.4, 0.3]) # inset axes
ax2.set_xlabel('x')
ax2.plot(x, y*2, 'r-')
Out[7]:
fig, ax = plt.subplots()
In [8]:
fig, ax = plt.subplots()
ax.plot(x, y, '-')
# ...
Out[8]:
In [9]:
x = np.linspace(-1, 0, 100)
fig, ax = plt.subplots(figsize=(10, 7))
# Adjust the created axes so that its topmost extent is 0.8 of the figure.
fig.subplots_adjust(top=0.9)
ax.plot(x, x**2, color='0.4', label='power 2')
ax.plot(x, x**3, color='0.8', linestyle='--', label='power 3')
ax.vlines(x=-0.75, ymin=0., ymax=0.8, color='0.4', linestyle='-.')
ax.axhline(y=0.1, color='0.4', linestyle='-.')
ax.fill_between(x=[-1, 1.1], y1=[0.65], y2=[0.75], color='0.85')
fig.suptitle('Figure title', fontsize=18,
fontweight='bold')
ax.set_title('Axes title', fontsize=16)
ax.set_xlabel('The X axis')
ax.set_ylabel('The Y axis $y=f(x)$', fontsize=16)
ax.set_xlim(-1.0, 1.1)
ax.set_ylim(-0.1, 1.)
ax.text(0.5, 0.2, 'Text centered at (0.5, 0.2)\nin data coordinates.',
horizontalalignment='center', fontsize=14)
ax.text(0.5, 0.5, 'Text centered at (0.5, 0.5)\nin Figure coordinates.',
horizontalalignment='center', fontsize=14,
transform=ax.transAxes, color='grey')
ax.legend(loc='upper right', frameon=True, ncol=2, fontsize=14)
Out[9]:
For more information on legend positioning, check this post on stackoverflow!
...understandable
Matplotlib had a bad reputation in terms of its default styling as figures created with earlier versions of Matplotlib were very Matlab-lookalike and mostly not really catchy.
Since Matplotlib 2.0, this has changed: https://matplotlib.org/users/dflt_style_changes.html!
However...
Des goûts et des couleurs, on ne discute pas...
(check this link if you're not french-speaking)
To account different tastes, Matplotlib provides a number of styles that can be used to quickly change a number of settings:
In [10]:
plt.style.available
Out[10]:
In [11]:
x = np.linspace(0, 10)
with plt.style.context('seaborn'): # 'seaborn', ggplot', 'bmh', 'grayscale', 'seaborn-whitegrid', 'seaborn-muted'
fig, ax = plt.subplots()
ax.plot(x, np.sin(x) + x + np.random.randn(50))
ax.plot(x, np.sin(x) + 0.5 * x + np.random.randn(50))
ax.plot(x, np.sin(x) + 2 * x + np.random.randn(50))
We should not start discussing about colors and styles, just pick your favorite style!
In [12]:
plt.style.use('seaborn-whitegrid')
or go all the way and define your own custom style, see the official documentation or this tutorial.
plt.style.use('...')
)What we have been doing while plotting with Pandas:
In [13]:
import pandas as pd
In [14]:
flowdata = pd.read_csv('../data/vmm_flowdata.csv',
index_col='Time',
parse_dates=True)
In [15]:
flowdata.plot()
Out[15]:
In [16]:
flowdata.plot(figsize=(16, 6)) # shift tab this!
Out[16]:
Making this with matplotlib...
In [17]:
fig, ax = plt.subplots(figsize=(16, 6))
ax.plot(flowdata)
ax.legend(["L06_347", "LS06_347", "LS06_348"])
Out[17]:
is still ok!
In [18]:
axs = flowdata.plot(subplots=True, sharex=True,
figsize=(16, 8), colormap='viridis', # Dark2
fontsize=15, rot=0)
Mimicking this in matplotlib (just as a reference):
In [19]:
from matplotlib import cm
import matplotlib.dates as mdates
colors = [cm.viridis(x) for x in np.linspace(0.0, 1.0, len(flowdata.columns))] # list comprehension to set up the colors
fig, axs = plt.subplots(3, 1, figsize=(16, 8))
for ax, col, station in zip(axs, colors, flowdata.columns):
ax.plot(flowdata.index, flowdata[station], label=station, color=col)
ax.legend()
if not ax.is_last_row():
ax.xaxis.set_ticklabels([])
ax.xaxis.set_major_locator(mdates.YearLocator())
else:
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.set_xlabel('Time')
ax.tick_params(labelsize=15)
Is already a bit harder ;-)
In [20]:
fig, ax = plt.subplots() #prepare a matplotlib figure
flowdata.plot(ax=ax) # use pandas for the plotting
Out[20]:
In [21]:
fig, ax = plt.subplots(figsize=(15, 5)) #prepare a matplotlib figure
flowdata.plot(ax=ax) # use pandas for the plotting
# Provide further adaptations with matplotlib:
ax.set_xlabel("")
ax.grid(which="major", linewidth='0.5', color='0.8')
fig.suptitle('Flow station time series', fontsize=15)
Out[21]:
In [22]:
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(16, 6)) #provide with matplotlib 2 axis
flowdata[["L06_347", "LS06_347"]].plot(ax=ax1) # plot the two timeseries of the same location on the first plot
flowdata["LS06_348"].plot(ax=ax2, color='0.2') # plot the other station on the second plot
# further adapt with matplotlib
ax1.set_ylabel("L06_347")
ax2.set_ylabel("LS06_348")
ax2.legend()
Out[22]:
An example of such a reusable function to plot data:
In [23]:
%%file plotter.py
#this writes a file in your directory, check it(!)
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib import cm
from matplotlib.ticker import MaxNLocator
def vmm_station_plotter(flowdata, label="flow (m$^3$s$^{-1}$)"):
colors = [cm.viridis(x) for x in np.linspace(0.0, 1.0, len(flowdata.columns))] # list comprehension to set up the color sequence
fig, axs = plt.subplots(3, 1, figsize=(16, 8))
for ax, col, station in zip(axs, colors, flowdata.columns):
ax.plot(flowdata.index, flowdata[station], label=station, color=col) # this plots the data itself
ax.legend(fontsize=15)
ax.set_ylabel(label, size=15)
ax.yaxis.set_major_locator(MaxNLocator(4)) # smaller set of y-ticks for clarity
if not ax.is_last_row(): # hide the xticklabels from the none-lower row x-axis
ax.xaxis.set_ticklabels([])
ax.xaxis.set_major_locator(mdates.YearLocator())
else: # yearly xticklabels from the lower x-axis in the subplots
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.tick_params(axis='both', labelsize=15, pad=8) # enlarge the ticklabels and increase distance to axis (otherwise overlap)
return fig, axs
In [24]:
from plotter import vmm_station_plotter
# fig, axs = vmm_station_plotter(flowdata)
In [25]:
fig, axs = vmm_station_plotter(flowdata,
label="NO$_3$ (mg/l)")
fig.suptitle('Ammonium concentrations in the Maarkebeek', fontsize='17')
fig.savefig('ammonium_concentration.pdf')
For more in-depth material: