Matplotlib: Introduction

DS Data manipulation, analysis and visualisation in Python
December, 2019

© 2016, Joris Van den Bossche and Stijn Van Hoey (mailto:jorisvandenbossche@gmail.com, mailto:stijnvanhoey@gmail.com). Licensed under CC BY 4.0 Creative Commons

Matplotlib

Matplotlib is a Python package used widely throughout the scientific Python community to produce high quality 2D publication graphics. It transparently supports a wide range of output formats including PNG (and other raster formats), PostScript/EPS, PDF and SVG and has interfaces for all of the major desktop GUI (graphical user interface) toolkits. It is a great package with lots of options.

However, matplotlib is...

The 800-pound gorilla — and like most 800-pound gorillas, this one should probably be avoided unless you genuinely need its power, e.g., to make a custom plot or produce a publication-ready graphic.

(As we’ll see, when it comes to statistical visualization, the preferred tack might be: “do as much as you easily can in your convenience layer of choice [nvdr e.g. directly from Pandas, or with seaborn], and then use matplotlib for the rest.”)

(quote used from this blogpost)

And that's we mostly did, just use the .plot function of Pandas. So, why do we learn matplotlib? Well, for the ...then use matplotlib for the rest.; at some point, somehow!

Matplotlib comes with a convenience sub-package called pyplot which, for consistency with the wider matplotlib community, should always be imported as plt:



In [1]:

    
import numpy as np
import matplotlib.pyplot as plt

- dry stuff - The matplotlib `Figure`, `axes` and `axis`

At the heart of every plot is the figure object. The "Figure" object is the top level concept which can be drawn to one of the many output formats, or simply just to screen. Any object which can be drawn in this way is known as an "Artist" in matplotlib.

Lets create our first artist using pyplot, and then show it:



In [2]:

    
fig = plt.figure()
plt.show()









    





<Figure size 432x288 with 0 Axes>

On its own, drawing the figure artist is uninteresting and will result in an empty piece of paper (that's why we didn't see anything above).

By far the most useful artist in matplotlib is the Axes artist. The Axes artist represents the "data space" of a typical plot, a rectangular axes (the most common, but not always the case, e.g. polar plots) will have 2 (confusingly named) Axis artists with tick labels and tick marks.

There is no limit on the number of Axes artists which can exist on a Figure artist. Let's go ahead and create a figure with a single Axes artist, and show it using pyplot:



In [3]:

    
ax = plt.axes()

Matplotlib's pyplot module makes the process of creating graphics easier by allowing us to skip some of the tedious Artist construction. For example, we did not need to manually create the Figure artist with plt.figure because it was implicit that we needed a figure when we created the Axes artist.

Under the hood matplotlib still had to create a Figure artist, its just we didn't need to capture it into a variable. We can access the created object with the "state" functions found in pyplot called gcf and gca.

- essential stuff - `pyplot` versus Object based

Some example data:



In [4]:

    
x = np.linspace(0, 5, 10)
y = x ** 2

Observe the following difference:

1. pyplot style: plt... (you will see this a lot for code online!)



In [5]:

    
plt.plot(x, y, '-')









    Out[5]:





[<matplotlib.lines.Line2D at 0x7fbf1aa11090>]

2. creating objects



In [6]:

    
fig, ax = plt.subplots()
ax.plot(x, y, '-')









    Out[6]:





[<matplotlib.lines.Line2D at 0x7fbf1a2599d0>]

Although a little bit more code is involved, the advantage is that we now have full control of where the plot axes are placed, and we can easily add more than one axis to the figure:



In [7]:

    
fig, ax1 = plt.subplots()
ax1.plot(x, y, '-')
ax1.set_ylabel('y')

ax2 = fig.add_axes([0.2, 0.5, 0.4, 0.3]) # inset axes
ax2.set_xlabel('x')
ax2.plot(x, y*2, 'r-')









    Out[7]:





[<matplotlib.lines.Line2D at 0x7fbf1a146b90>]

REMEMBER:

Use the object oriented power of Matplotlib!
Get yourself used to writing fig, ax = plt.subplots()



In [8]:

    
fig, ax = plt.subplots()
ax.plot(x, y, '-')
# ...









    Out[8]:





[<matplotlib.lines.Line2D at 0x7fbf1a264050>]

An small cheat-sheet reference for some common elements



In [9]:

    
x = np.linspace(-1, 0, 100)

fig, ax  = plt.subplots(figsize=(10, 7))

# Adjust the created axes so that its topmost extent is 0.8 of the figure.
fig.subplots_adjust(top=0.9)

ax.plot(x, x**2, color='0.4', label='power 2')
ax.plot(x, x**3, color='0.8', linestyle='--', label='power 3')

ax.vlines(x=-0.75, ymin=0., ymax=0.8, color='0.4', linestyle='-.') 
ax.axhline(y=0.1, color='0.4', linestyle='-.')
ax.fill_between(x=[-1, 1.1], y1=[0.65], y2=[0.75], color='0.85')

fig.suptitle('Figure title', fontsize=18, 
             fontweight='bold')
ax.set_title('Axes title', fontsize=16)

ax.set_xlabel('The X axis')
ax.set_ylabel('The Y axis $y=f(x)$', fontsize=16)

ax.set_xlim(-1.0, 1.1)
ax.set_ylim(-0.1, 1.)

ax.text(0.5, 0.2, 'Text centered at (0.5, 0.2)\nin data coordinates.',
        horizontalalignment='center', fontsize=14)

ax.text(0.5, 0.5, 'Text centered at (0.5, 0.5)\nin Figure coordinates.',
        horizontalalignment='center', fontsize=14, 
        transform=ax.transAxes, color='grey')

ax.legend(loc='upper right', frameon=True, ncol=2, fontsize=14)









    Out[9]:





<matplotlib.legend.Legend at 0x7fbf1a061190>

For more information on legend positioning, check this post on stackoverflow!

I do not like the style...

...understandable

Matplotlib had a bad reputation in terms of its default styling as figures created with earlier versions of Matplotlib were very Matlab-lookalike and mostly not really catchy.

Since Matplotlib 2.0, this has changed: https://matplotlib.org/users/dflt_style_changes.html!

However...

Des goûts et des couleurs, on ne discute pas...

(check this link if you're not french-speaking)

To account different tastes, Matplotlib provides a number of styles that can be used to quickly change a number of settings:



In [10]:

    
plt.style.available









    Out[10]:





['seaborn-poster',
 'seaborn-dark',
 'grayscale',
 'dark_background',
 'seaborn-dark-palette',
 'classic',
 'fast',
 'seaborn-paper',
 'tableau-colorblind10',
 'seaborn-pastel',
 'seaborn-white',
 'seaborn-bright',
 'seaborn-notebook',
 'seaborn-whitegrid',
 'bmh',
 'fivethirtyeight',
 'seaborn-colorblind',
 'seaborn-talk',
 '_classic_test',
 'seaborn-deep',
 'Solarize_Light2',
 'ggplot',
 'seaborn-ticks',
 'seaborn-muted',
 'seaborn-darkgrid',
 'seaborn']



In [11]:

    
x = np.linspace(0, 10)

with plt.style.context('seaborn'):  # 'seaborn', ggplot', 'bmh', 'grayscale', 'seaborn-whitegrid', 'seaborn-muted'
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x) + x + np.random.randn(50))
    ax.plot(x, np.sin(x) + 0.5 * x + np.random.randn(50))
    ax.plot(x, np.sin(x) + 2 * x + np.random.randn(50))

We should not start discussing about colors and styles, just pick your favorite style!



In [12]:

    
plt.style.use('seaborn-whitegrid')

or go all the way and define your own custom style, see the official documentation or this tutorial.

REMEMBER:

If you just want quickly a good-looking plot, use one of the available styles (plt.style.use('...'))
Otherwise, the object-oriented way of working makes it possible to change everything!

Interaction with Pandas

What we have been doing while plotting with Pandas:



In [13]:

    
import pandas as pd



In [14]:

    
flowdata = pd.read_csv('../data/vmm_flowdata.csv', 
                       index_col='Time', 
                       parse_dates=True)



In [15]:

    
flowdata.plot()









    Out[15]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fbf18144210>

Pandas versus matplotlib

Comparison 1: single plot



In [16]:

    
flowdata.plot(figsize=(16, 6)) # shift tab this!









    Out[16]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fbf17bac6d0>

Making this with matplotlib...



In [17]:

    
fig, ax = plt.subplots(figsize=(16, 6))
ax.plot(flowdata)
ax.legend(["L06_347", "LS06_347", "LS06_348"])









    Out[17]:





<matplotlib.legend.Legend at 0x7fbf1783a890>

is still ok!

Comparison 2: with subplots



In [18]:

    
axs = flowdata.plot(subplots=True, sharex=True,
                    figsize=(16, 8), colormap='viridis', # Dark2
                    fontsize=15, rot=0)

Mimicking this in matplotlib (just as a reference):



In [19]:

    
from matplotlib import cm
import matplotlib.dates as mdates

colors = [cm.viridis(x) for x in np.linspace(0.0, 1.0, len(flowdata.columns))] # list comprehension to set up the colors

fig, axs = plt.subplots(3, 1, figsize=(16, 8))

for ax, col, station in zip(axs, colors, flowdata.columns):
    ax.plot(flowdata.index, flowdata[station], label=station, color=col)
    ax.legend()
    if not ax.is_last_row():
        ax.xaxis.set_ticklabels([])
        ax.xaxis.set_major_locator(mdates.YearLocator())
    else:
        ax.xaxis.set_major_locator(mdates.YearLocator())
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
        ax.set_xlabel('Time')
    ax.tick_params(labelsize=15)

Is already a bit harder ;-)

Best of both worlds...



In [20]:

    
fig, ax = plt.subplots() #prepare a matplotlib figure

flowdata.plot(ax=ax) # use pandas for the plotting









    Out[20]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fbf1781add0>



In [21]:

    
fig, ax = plt.subplots(figsize=(15, 5)) #prepare a matplotlib figure

flowdata.plot(ax=ax) # use pandas for the plotting

# Provide further adaptations with matplotlib:
ax.set_xlabel("")
ax.grid(which="major", linewidth='0.5', color='0.8')
fig.suptitle('Flow station time series', fontsize=15)









    Out[21]:





Text(0.5, 0.98, 'Flow station time series')



In [22]:

    
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(16, 6)) #provide with matplotlib 2 axis

flowdata[["L06_347", "LS06_347"]].plot(ax=ax1) # plot the two timeseries of the same location on the first plot
flowdata["LS06_348"].plot(ax=ax2, color='0.2') # plot the other station on the second plot

# further adapt with matplotlib
ax1.set_ylabel("L06_347")
ax2.set_ylabel("LS06_348")
ax2.legend()









    Out[22]:





<matplotlib.legend.Legend at 0x7fbf16f73a50>

Remember:

You can do anything with matplotlib, but at a cost... stackoverflow
The preformatting of Pandas provides mostly enough flexibility for quick analysis and draft reporting. It is not for paper-proof figures or customization

If you take the time to make your perfect/spot-on/greatest-ever matplotlib-figure: Make it a reusable function!

An example of such a reusable function to plot data:



In [23]:

    
%%file plotter.py  
#this writes a file in your directory, check it(!)

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from matplotlib import cm
from matplotlib.ticker import MaxNLocator

def vmm_station_plotter(flowdata, label="flow (m$^3$s$^{-1}$)"):
    colors = [cm.viridis(x) for x in np.linspace(0.0, 1.0, len(flowdata.columns))] # list comprehension to set up the color sequence

    fig, axs = plt.subplots(3, 1, figsize=(16, 8))

    for ax, col, station in zip(axs, colors, flowdata.columns):
        ax.plot(flowdata.index, flowdata[station], label=station, color=col) # this plots the data itself
        
        ax.legend(fontsize=15)
        ax.set_ylabel(label, size=15)
        ax.yaxis.set_major_locator(MaxNLocator(4)) # smaller set of y-ticks for clarity
        
        if not ax.is_last_row():  # hide the xticklabels from the none-lower row x-axis
            ax.xaxis.set_ticklabels([])
            ax.xaxis.set_major_locator(mdates.YearLocator())
        else:                     # yearly xticklabels from the lower x-axis in the subplots
            ax.xaxis.set_major_locator(mdates.YearLocator())
            ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
        ax.tick_params(axis='both', labelsize=15, pad=8) # enlarge the ticklabels and increase distance to axis (otherwise overlap)
    return fig, axs









    



Writing plotter.py



In [24]:

    
from plotter import vmm_station_plotter
# fig, axs = vmm_station_plotter(flowdata)



In [25]:

    
fig, axs = vmm_station_plotter(flowdata, 
                               label="NO$_3$ (mg/l)")
fig.suptitle('Ammonium concentrations in the Maarkebeek', fontsize='17')
fig.savefig('ammonium_concentration.pdf')

NOTE:

Let your hard work pay off, write your own custom functions!

Remember: `fig.savefig()` to save your Figure object!

Need more matplotlib inspiration?

For more in-depth material:

http://www.labri.fr/perso/nrougier/teaching/matplotlib/
notebooks in matplotlib section: http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb#4.-Visualization-with-Matplotlib
main reference: matplotlib homepage

Remember(!)

matplotlib gallery is an important resource to start from