In the last lesson of the term, we'll take the data manipulation we learnt last week with NumPy and Pandas and apply it to data visualisation with:
* Matplotlib
* Seaborn
In [ ]:
# Run this if you are using a Mac machine or have multiple versions of Python installed
!pip3 install numpy==1.11.1 pandas==0.19.1 matplotlib==1.5.3 seaborn==0.7.1 --upgrade
In [ ]:
# Run this if you are using a Windows machine
!pip install numpy==1.11.1 pandas==0.19.0 matplotlib==1.5.3 seaborn==0.7.1 --upgrade
In [ ]:
# Everyone run this block
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Matplotlib is the standard package for generating 2D plots and graphs in Python. Much of its interface is modelled on MATLAB, a powerful scripting language for scientific computing that I use every day.
The matplotlib library, shortened to mpl, provides tools to create highly customisable figures and plots. The pyplot library, or plt, is the interface section of mpl, which provides easy-to-use interactions with the rich mpl tools.
The Matplotlib project publishes some official beginner tutorials if you want to try something more complex than what we will do today.
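To make the split between mpl and plt concrete, here is a minimal sketch that draws a curve twice: once through the plt functions, which keep track of a "current" figure for you, and once by holding the underlying Figure and Axes objects directly. The variable names and the title are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256)

# pyplot style: plt keeps track of the current figure and axes for you
plt.figure()
plt.plot(x, np.cos(x))

# Object-oriented style: hold the Figure and Axes objects explicitly
fig, ax = plt.subplots()
line, = ax.plot(x, np.sin(x), label='sin')
ax.set_title('Object-oriented interface')
ax.legend()
```

In a notebook running `%matplotlib inline`, both figures should render when the cell finishes, so no explicit `plt.show()` is included here.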
In [ ]:
# Create a vector of 256 linearly spaced points from -pi to pi
X = np.linspace(-np.pi,np.pi,256,endpoint=True)
# Create sine and cosine wave vectors from the X vector
C,S = np.cos(X),np.sin(X)
In [ ]:
# Create a figure object. This is an abstract frame to plot things in
plt.figure()
# Plot the Cosine against X, and Sine against X
plt.plot(X,C)
plt.plot(X,S)
# Force the display of the figure
plt.show()
In [ ]:
X = np.linspace(-np.pi,np.pi,256,endpoint=True)
C,S = np.cos(X),np.sin(X)
# Create a figure object. This is an abstract frame to plot things in
plt.figure(figsize=(10,8),dpi=80)
# Plot the Cosine against X, and Sine against X
plt.plot(X,C,color='m',linewidth=2.3,linestyle=':',label='cos')
plt.plot(X,S,color='c',linewidth=5,linestyle='--',label='sin')
# Set the x and y axis limits
plt.xlim(X.min()*1.1,X.max()*1.1)
plt.ylim(C.min()*1.1,C.max()*1.1)
# Add figure labels
plt.title(r'Sine and Cosine waves from -$\pi$ to $\pi$',fontsize=20)
plt.xlabel('X axis')
plt.ylabel('Y axis')
# Set the x and y ticks
plt.xticks([-np.pi,-np.pi/2,0,np.pi/2,np.pi],
           [r'$-\pi$',r'$-\pi/2$',r'$0$',r'$+\pi/2$',r'$+\pi$'],
           fontsize=20)
plt.yticks([-1,0,1],fontsize=20)
# Add a legend
plt.legend(loc='upper left',fontsize=20)
# Force the display of the figure
plt.show()
Add the tan(x) function to the figure we just created. Things to consider:
In [ ]:
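n = 1024
# Draw n random (x, y) points from a standard normal distribution
X = np.random.normal(0, 1, n)
Y = np.random.normal(0, 1, n)
# Angle of each point about the origin
T = np.arctan2(Y, X)
colors = mpl.cm.rainbow(np.linspace(0, 1,n)) # replace the rainbow arg with T
# Let the axes fill almost the whole figure
plt.axes([0.025, 0.025, 0.95, 0.95])
plt.scatter(X, Y, s=50, c=colors, alpha=0.5)
# Zoom in on the centre and hide the tick marks
plt.xlim(-1.5, 1.5)
plt.xticks(())
plt.ylim(-1.5, 1.5)
plt.yticks(())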
plt.show()
In [ ]:
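# Define a normal distribution with mean mu and standard deviation sigma
mu, sigma = 0,1
# Draw 100 million samples (reduce this number if you run short of memory)
samples = np.random.normal(mu,sigma,100000000)
# normed=True scales the bars to a probability density
# (later Matplotlib releases rename this argument to density=True)
count,bins,ignored = plt.hist(samples,30,normed=True)
# Overlay the analytical normal density evaluated at the bin edges
plt.plot(bins,1/(sigma*np.sqrt(2*np.pi)) *
         np.exp(-(bins-mu)**2 / (2*sigma**2)),
         linewidth=3,linestyle=':',color='r')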
plt.show()
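The red dotted curve in the cell above is the normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. As a rough numerical check of why a normalised histogram lines up with it, the sketch below uses np.histogram with density=True in place of plt.hist, together with a much smaller, seeded sample. Both choices are assumptions made here to keep the check quick and repeatable; they are not part of the cell above.

```python
import numpy as np

mu, sigma = 0, 1
rng = np.random.RandomState(0)           # fixed seed (assumption) so the run is repeatable
samples = rng.normal(mu, sigma, 100000)  # far fewer samples than the cell above

# density=True rescales the counts so that the bar areas sum to 1
density, edges = np.histogram(samples, bins=30, density=True)
centres = (edges[:-1] + edges[1:]) / 2

# Analytical normal density evaluated at the bin centres
pdf = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(centres - mu) ** 2 / (2 * sigma ** 2))

area = np.sum(density * np.diff(edges))
max_gap = np.max(np.abs(density - pdf))
print('area under histogram:', area)
print('largest gap to the density curve:', max_gap)
```

The area should come out as 1 up to rounding, and the largest gap should be small compared with the peak height of about 0.4.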
In [ ]:
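from mpl_toolkits.mplot3d import Axes3D
# Create a figure and attach a 3D axes object to it
fig = plt.figure()
ax = Axes3D(fig)
# Build a 2D grid of (x, y) points from -4 up to (but not including) 4 in steps of 0.25
X = np.arange(-4, 4, 0.25)
Y = np.arange(-4, 4, 0.25)
X, Y = np.meshgrid(X, Y)
# Height of the surface: the sine of the distance from the origin
R = np.sqrt(X ** 2 + Y ** 2)
Z = np.sin(R)
# Draw the surface, then project filled contours onto the plane z = -2
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=plt.cm.rainbow)
ax.contourf(X, Y, Z, zdir='z', offset=-2, cmap=plt.cm.rainbow)
ax.set_zlim(-2, 2)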
plt.show()
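The surface cell above leans on np.meshgrid, which turns two 1-D coordinate vectors into a pair of 2-D arrays covering every (x, y) combination, so that Z can be computed for the whole grid in one vectorised expression. A tiny sketch with made-up values shows the shapes involved:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # 3 illustrative x coordinates
y = np.array([10.0, 20.0])      # 2 illustrative y coordinates

# With the default indexing='xy', both outputs have shape (len(y), len(x))
XX, YY = np.meshgrid(x, y)
print(XX.shape, YY.shape)

# Each row of XX repeats x; each column of YY repeats y
print(XX)
print(YY)

# Any elementwise formula now evaluates over the whole grid at once
R = np.sqrt(XX ** 2 + YY ** 2)
```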
Seaborn is an alternative data visualisation library with more of a focus on statistical graphs. It works really well with Pandas: once your data is in a Pandas DataFrame, you can easily generate a Seaborn plot from it.
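As a minimal sketch of that DataFrame-to-plot workflow, the block below builds a small DataFrame of made-up measurements (the column names group and value are hypothetical) and hands it straight to sns.boxplot, which picks the columns out by name.

```python
import pandas as pd
import seaborn as sns

# Made-up data purely for illustration
df_demo = pd.DataFrame({
    'group': ['a', 'a', 'a', 'b', 'b', 'b'],
    'value': [1.0, 2.0, 3.0, 2.5, 3.5, 4.5],
})

# Seaborn looks up the x and y columns in the DataFrame by name
# and returns the Matplotlib Axes it drew on
ax_box = sns.boxplot(x='group', y='value', data=df_demo)
```

Because the result is an ordinary Matplotlib Axes, it can be customised afterwards with the same methods used earlier in the lesson, such as set_title.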
In [ ]:
sns.set() # Set up fig with default stylings
# Everybody help me fill in this dictionary!
birthdays = {
'jan':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'feb':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'mar':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'apr':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'may':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'jun':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'jul':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'aug':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'sep':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'oct':{18:0,19:0,20:0,21:0,22:1,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'nov':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'dec':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0}
}
df = pd.DataFrame.from_dict(birthdays)
df
In [ ]:
sns.heatmap(df)
In [ ]:
sns.heatmap(df,annot=True,fmt='d',linewidths=.5)
The Seaborn tutorial page has a lot of resources to help you get started with this package.
In [ ]:
# TO DO
sns.set()
# END TO DO