Imports


In [ ]:
# pandas is useful for quick data parsing
import pandas as pd
import numpy as np

# Small trick to get a wider notebook display
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

pyplot is Matplotlib's state-based plotting interface; the %matplotlib inline magic displays the graphs directly in the notebook


In [ ]:
import matplotlib.pyplot as pl
%matplotlib inline

Alternatively, you can use pylab, which imports matplotlib.pyplot and numpy into a single namespace to shorten calls (Matplotlib now discourages pylab in favor of explicit imports)


In [ ]:
import pylab as pl
%pylab inline

We can define a default size for all plots that will be generated by matplotlib


In [ ]:
pl.rcParams['figure.figsize'] = (20, 7)

Introduction to plotting with matplotlib

  • 2D plotting library that produces high-quality figures
  • Full integration in Jupyter
  • Can generate line plots, histograms, power spectra, bar charts, error bar charts, scatter plots, ... with just a few lines of code
  • For the power user, full control over line styles, font properties, axes properties, ...

Before we start

  • Many different marker types are available for plot points (keyword: marker)

  • pyplot also provides stylesheets that give high-quality rendering with little effort
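A small sketch for looking up the accepted codes, assuming a recent Matplotlib where the class attributes Line2D.markers and Line2D.lineStyles map each marker and linestyle code to an internal name. The variable names are illustrative only.

```python
from matplotlib.lines import Line2D

# Line2D.markers maps every accepted marker code to a descriptive name
print(Line2D.markers["o"], Line2D.markers["d"], Line2D.markers[">"])

# Line2D.lineStyles maps linestyle codes to drawing methods; drop the "draw nothing" aliases
visible_styles = [code for code, method in Line2D.lineStyles.items()
                  if method != "_draw_nothing"]
print(visible_styles)
```

Note that 'd' is the thin diamond and 'D' the regular diamond.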

Default parameters can be changed for the whole session through pl.rcParams


In [ ]:
pl.rcParams['figure.figsize'] = 20, 7
pl.rcParams['font.family'] = 'sans-serif'
pl.rcParams['font.sans-serif'] = ['DejaVu Sans']
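Changes to rcParams last for the whole session. When a setting should apply to a single figure only, a sketch using the context manager pl.rc_context (as available in recent Matplotlib releases) looks like this; the sizes chosen are arbitrary.

```python
import matplotlib.pyplot as pl

default_size = tuple(pl.rcParams["figure.figsize"])

# The settings below apply only inside the with-block
with pl.rc_context({"figure.figsize": (8, 3), "lines.linewidth": 3}):
    fig = pl.figure()
    pl.plot([0, 1, 2], [0, 1, 4])
    inside_size = tuple(fig.get_size_inches())

# On exit, rcParams are restored to their previous values
restored_size = tuple(pl.rcParams["figure.figsize"])
print(inside_size, restored_size)
```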

A stylesheet can also be applied as the default; pl.style.available lists the available stylesheets


In [ ]:
pl.style.available

Let's use the ggplot style (modelled on R's ggplot2) for this notebook


In [ ]:
pl.style.use('ggplot')
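pl.style.use changes the style globally. A minimal sketch of applying a stylesheet to one block of code only, assuming the pl.style.context context manager of recent Matplotlib versions:

```python
import matplotlib.pyplot as pl

print("ggplot" in pl.style.available)

# The stylesheet is active only inside the with-block
with pl.style.context("ggplot"):
    inside_facecolor = pl.rcParams["axes.facecolor"]
    fig, ax = pl.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

# Outside the block, the previous style is back in effect
outside_facecolor = pl.rcParams["axes.facecolor"]
print(inside_facecolor, outside_facecolor)
```

If ggplot is already the global style (as after the cell above), both printed colors are the same; in a fresh session they differ.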

Line plot

Plot lines and/or markers to the Axes

Takes two sequences of coordinates, one for the x axis and one for the y axis (or a single sequence for the y axis, in which case x defaults to 0, 1, ..., N-1)


In [ ]:
# Create random datasets with numpy random module
x = np.arange(50)
y = np.random.rand(50)

# Plot y with the default line style and color; x is inferred as 0..49
pl.plot(y)

# Plot y + 1 against x as purple thin-diamond markers with no connecting line
pl.plot(x, y + 1, marker='d', linewidth=0, color="purple")

# Plot y + 2 against x as a dashed dodgerblue line
pl.plot(x, y + 2, color='dodgerblue', linestyle='--')

# Plot y + 3 against x as a green dash-dot line with right-pointing triangle markers
pl.plot(x, y + 3, color='green', linewidth=2, marker='>', linestyle="-.")

# Plot y + 4 against x as a thick solid green line with circle markers
pl.plot(x, y + 4, color='green', linewidth=4, marker='o', linestyle="-")
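To check the automatic x coordinates described above, a short sketch using the object-oriented API (fig, ax = pl.subplots()) reads the data back from the returned Line2D object. The seeded generator np.random.default_rng is an assumption added for reproducibility, not part of the original cell.

```python
import matplotlib.pyplot as pl
import numpy as np

rng = np.random.default_rng(0)
y = rng.random(50)

fig, ax = pl.subplots()
# A single positional argument is treated as y; x is generated as 0, 1, ..., N-1
(line,) = ax.plot(y)
implicit_x = line.get_xdata()
print(implicit_x[:5])
```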

Scatter plot

Make a scatter plot of x vs y, where x and y are sequence-like objects of the same length.

Requires 2 lists of coordinates for the x and the y axis


In [ ]:
# Three point clouds with different centers and spreads
pl.scatter(np.random.randn(200), np.random.randn(200), color="coral")
pl.scatter(np.random.randn(100) + 2, np.random.randn(100) + 3, color="lightgreen")
pl.scatter(np.random.randn(100) - 2, np.random.randn(100) * 4, color="dodgerblue")
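scatter also accepts per-point colors and sizes. A sketch with illustrative data from a seeded generator: c is mapped through a colormap (viridis, chosen arbitrarily), and the returned PathCollection keeps the plotted coordinates.

```python
import matplotlib.pyplot as pl
import numpy as np

# A seeded generator makes the point cloud identical on every run
rng = np.random.default_rng(42)
x = rng.standard_normal(200)
y = rng.standard_normal(200)
distance = np.hypot(x, y)

fig, ax = pl.subplots()
# c and s take one value per point; c is mapped to colors through the colormap
points = ax.scatter(x, y, c=distance, s=10 + 40 * distance, cmap="viridis")
fig.colorbar(points, ax=ax, label="distance from origin")

# The returned PathCollection stores the plotted (x, y) pairs as its offsets
offsets = points.get_offsets()
print(offsets.shape)
```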

Bar plot

Make a bar plot with rectangles

Requires the x coordinates of the bars (by default the bar centers, since align='center'), a list of heights, and optionally the width of the bars

Now plot the data as a bar plot


In [ ]:
# Create random datasets with numpy random module
x = np.arange(10)

# Bars that share the same x coordinates are drawn at the same position
h1 = np.random.rand(10)
pl.bar(x=x, height=h1, width=0.2, color="dodgerblue")

# To stack a series, set its bottom to the heights of the series below it
h2 = np.random.rand(10)
pl.bar(x=x, height=h2, bottom=h1, width=0.2, color="lightblue")

# Offset the x coordinates to place a new series beside the first, with custom fill and edge colors
h3 = np.random.rand(10)
pl.bar(x=x + 0.2, height=h3, width=0.2, color='salmon', linewidth=2, edgecolor="red")

# Add vertical error bars with yerr (error values must be non-negative)
h4 = np.random.rand(10)
pl.bar(x=x + 0.4, height=h4, width=0.2, color='green', yerr=np.random.rand(10) / 10, ecolor="black")
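To confirm how stacking and the default center alignment work, this sketch (series names and sizes are illustrative) inspects the rectangles stored in the BarContainer that bar returns.

```python
import matplotlib.pyplot as pl
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(5)
h1 = rng.random(5)
h2 = rng.random(5)
width = 0.4

fig, ax = pl.subplots()
# bar() returns a BarContainer holding one Rectangle per bar
lower = ax.bar(x, h1, width=width, color="dodgerblue", label="h1")
# Stacking: each bar of the second series starts at the top of the first
upper = ax.bar(x, h2, width=width, bottom=h1, color="lightblue", label="h2")
ax.legend()

# With the default align="center", each rectangle's left edge is x - width / 2
lefts = np.array([rect.get_x() for rect in lower])
upper_starts = np.array([rect.get_y() for rect in upper])
upper_tops = np.array([rect.get_y() + rect.get_height() for rect in upper])
print(lefts)
```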

Histogram

Compute and draw the histogram of x

Requires a list of values and a number of bins to split the data into

Possible types of histogram to draw (histtype):

  • bar : a traditional bar-type histogram. If multiple data are given, the bars are arranged side by side.
  • barstacked : a bar-type histogram where multiple data are stacked on top of each other.
  • step : a lineplot that is by default unfilled.
  • stepfilled : a lineplot that is by default filled.

The return value is a tuple containing the following:

  • n = The values of the histogram bins, after optional normalisation (e.g. density=True)
  • bins = The edges of the bins
  • patches = List of individual patches used to create the histogram
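A quick way to see what these return values contain: the sketch below draws a histogram and compares n and bins with NumPy's np.histogram on the same data. For a single dataset they should agree, since Matplotlib bins the data with NumPy; the seeded data is illustrative.

```python
import matplotlib.pyplot as pl
import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal(1000)

fig, ax = pl.subplots()
n, bins, patches = ax.hist(data, bins=30, histtype="bar")

# np.histogram bins the same data without drawing anything
counts, edges = np.histogram(data, bins=30)

print(len(n), len(bins), n.sum())
```

There is always one more bin edge than there are bins.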

In [ ]:
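# Generate 2 series of 1000 values following a normal distribution
# With histtype='bar', the bars of the two series are drawn side by side
x = np.random.randn(1000, 2)
n, bins, patches = pl.hist(x=x, bins=30, histtype='bar')
print(n)
print(bins)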

In [ ]:
# Generate 2 series of 1000 values following a normal distribution
# Unlike the first plot, the two series are stacked this time

x = np.random.randn(1000, 2)
n, bins, patches = pl.hist(x=x, bins=30, histtype='barstacked')

In [ ]:
# Generate 1000 values following a normal distribution
# The plot is cumulative and drawn in step style

x = np.random.randn(1000)
n, bins, patches = pl.hist(x=x, bins=30, histtype='step', cumulative=True)
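With cumulative=True the last bin counts every sample; adding density=True normalises the curve so it ends at 1, which makes it an empirical cumulative distribution. A sketch checking both, with illustrative seeded data:

```python
import matplotlib.pyplot as pl
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(1000)

fig, (ax_counts, ax_cdf) = pl.subplots(1, 2)
# Cumulative counts: the last bin includes every sample
n_counts, _, _ = ax_counts.hist(x, bins=30, histtype="step", cumulative=True)
# With density=True as well, the curve is normalised and ends at 1 (an empirical CDF)
n_cdf, _, _ = ax_cdf.hist(x, bins=30, histtype="step", cumulative=True, density=True)

print(n_counts[-1], n_cdf[-1])
```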

In [ ]:
# Generate 1000 values following a normal distribution
# The plot is rotated to horizontal orientation and drawn in stepfilled style

x = np.random.randn(1000)
n, bins, patches = pl.hist(x=x, bins=30, histtype='stepfilled', orientation="horizontal")

Customize the plotting area

The plotting area can be customized easily as shown below


In [ ]:
# Size of the plotting area, in inches (width, height)
pl.figure(figsize=(15,10))

# Customize X and Y limits
pl.xlim(-1,10)
pl.ylim(-0.5,1.5)

# Add X label, y label and a title
pl.xlabel("this is my x label", fontsize=15)
pl.ylabel("this is my Y label", fontsize=15)
pl.title("this is my title", fontsize=20)

# Add a grid
pl.grid(True, color="grey", linewidth=0.5, linestyle="--")

# Finally, plot the data series
pl.plot(np.arange(10), np.random.rand(10), color="coral", marker=">", label = "series1")
pl.plot(np.arange(10), np.random.rand(10), color="dodgerblue", marker="<", label = "series2")

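# Place the legend outside the plotting area: loc=2 ('upper left') puts the legend's upper-left corner at the anchor (1, 1), the top-right corner of the axes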
pl.legend(bbox_to_anchor=(1, 1), loc=2, frameon=False, fontsize=15)
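The same customisations can be written with the object-oriented API, where each pl.xxx call has an ax.set_xxx (or ax.xxx) counterpart. A sketch assuming a recent Matplotlib; saving to an in-memory buffer with bbox_inches="tight" is an added step that keeps the outside legend from being clipped in the saved image.

```python
import io

import matplotlib.pyplot as pl
import numpy as np

rng = np.random.default_rng(7)

fig, ax = pl.subplots(figsize=(15, 10))

# Axis limits, labels, title and grid: ax.set_* counterparts of the pl.* calls
ax.set_xlim(-1, 10)
ax.set_ylim(-0.5, 1.5)
ax.set_xlabel("this is my x label", fontsize=15)
ax.set_ylabel("this is my Y label", fontsize=15)
ax.set_title("this is my title", fontsize=20)
ax.grid(True, color="grey", linewidth=0.5, linestyle="--")

ax.plot(np.arange(10), rng.random(10), color="coral", marker=">", label="series1")
ax.plot(np.arange(10), rng.random(10), color="dodgerblue", marker="<", label="series2")

# loc="upper left" is equivalent to loc=2
ax.legend(bbox_to_anchor=(1, 1), loc="upper left", frameon=False, fontsize=15)

# bbox_inches="tight" grows the saved image so the outside legend is not cut off
buffer = io.BytesIO()
fig.savefig(buffer, format="png", bbox_inches="tight")
print(len(buffer.getvalue()) > 0)
```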

The figure area can also be divided to plot several graphs side by side with the subplot command


In [ ]:
pl.figure()

# First plot in the left half
pl.subplot(121)
pl.plot(np.arange(10), np.random.rand(10), label="1")
pl.plot(np.arange(10), np.random.rand(10), label="2")
pl.title("Series1")
pl.legend()

# Second plot in the right half
pl.subplot(122)
pl.plot(np.arange(10), np.random.rand(10), label="3")
pl.plot(np.arange(10), np.random.rand(10), label="4")
pl.title("Series2")
pl.legend()
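pl.subplots offers a shorter way to build the same side-by-side layout and returns the Axes objects directly. In this sketch, sharey=True (an addition not in the cell above) gives both panels the same y scale.

```python
import matplotlib.pyplot as pl
import numpy as np

rng = np.random.default_rng(11)

# One call creates the figure and both Axes; sharey=True gives them a common y scale
fig, (ax_left, ax_right) = pl.subplots(1, 2, sharey=True, figsize=(12, 4))

panels = [(ax_left, ("1", "2"), "Series1"), (ax_right, ("3", "4"), "Series2")]
for ax, labels, title in panels:
    for label in labels:
        ax.plot(np.arange(10), rng.random(10), label=label)
    ax.set_title(title)
    ax.legend()

fig.tight_layout()
```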

In [ ]:
pl.figure(figsize=(15,15))

# First plot in the top left corner
pl.subplot(221)
pl.plot(np.arange(10), np.random.rand(10))

# Top-right corner: left empty here (uncomment the two lines below to fill it)
#pl.subplot(222)
#pl.plot(np.arange(10), np.random.rand(10))

# Third plot in the bottom left corner
pl.subplot(223)
pl.plot(np.arange(10), np.random.rand(10))

# Fourth plot in the bottom right corner
pl.subplot(224)
pl.plot(np.arange(10), np.random.rand(10))
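With pl.subplots(2, 2) the Axes come back as a 2x2 array indexed [row, column]. This sketch fills three slots and removes the unused top-right Axes with fig.delaxes, so no empty frame is drawn there; the titles are illustrative.

```python
import matplotlib.pyplot as pl
import numpy as np

rng = np.random.default_rng(13)

# subplots(2, 2) returns the Axes as a 2x2 NumPy array indexed [row, column]
fig, axes = pl.subplots(2, 2, figsize=(10, 10))
print(axes.shape)

for (row, col), ax in np.ndenumerate(axes):
    if (row, col) == (0, 1):
        continue  # keep the top-right slot unused
    ax.plot(np.arange(10), rng.random(10))
    ax.set_title(f"row {row}, column {col}")

# Remove the unused Axes so no empty frame is drawn in the top-right corner
fig.delaxes(axes[0, 1])
print(len(fig.axes))
```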

Further reading on Matplotlib

Python plotting beyond Matplotlib

  • Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
  • Plotly is a collaborative, browser-based plotting and analytics platform. You can generate graphs and analyze data directly in the browser.
  • ggplot is a plotting system for Python based on R's ggplot2 and the Grammar of Graphics. It is built for making professional-looking plots quickly with minimal code.