In the last lesson of the term, we'll apply the data manipulation skills we learnt last week with NumPy and Pandas to data visualisation with:
* Matplotlib
* Seaborn
In [ ]:
# Run this if you are using a Mac machine or have multiple versions of Python installed
!pip3 install numpy==1.11.1 pandas==0.19.1 matplotlib==1.5.3 seaborn==0.7.1 --upgrade
In [ ]:
# Run this if you are using a Windows machine
!pip install numpy==1.11.1 pandas==0.19.0 matplotlib==1.5.3 seaborn==0.7.1 --upgrade
In [ ]:
# Everyone run this block
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Matplotlib is the standard package for 2D plot and graph generation in Python. Much of its interface is based on that of MATLAB, a powerful scripting language for scientific computing that I use every day.
The matplotlib library, shortened to mpl, provides tools to create highly customisable figures and plots. The pyplot, or plt, library is the interface section of mpl which provides easy-to-use interactions with the rich mpl tools.
Here is a link to some official beginner tutorials from the Matplotlib project if you want to try something more complex than what we will do today.
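To make the relationship between plt and the underlying mpl objects a little more concrete, here is a minimal sketch. It assumes the standard plt.subplots helper; the function name make_line_plot is hypothetical and only used for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def make_line_plot(x, y):
    """Plot y against x and return the Figure and Axes (hypothetical helper)."""
    # plt.subplots() creates a Figure and a single Axes in one call
    fig, ax = plt.subplots()
    # ax.plot draws a line on that specific Axes
    ax.plot(x, y)
    return fig, ax


x = np.linspace(0, 1, 5)
fig, ax = make_line_plot(x, x ** 2)

# plt.gcf() and plt.gca() return the "current" figure and axes that the
# plt.* functions act on; here they are the objects we just created
print(plt.gcf() is fig, plt.gca() is ax)
```

Calls such as plt.plot and plt.title act on this "current" figure and axes, which is why the cells in this lesson never need to name them explicitly.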
In [ ]:
# Create a linearly spaced vector of 256 values from -pi to pi (inclusive)
X = np.linspace(-np.pi,np.pi,256,endpoint=True)
# Create sine and cosine wave vectors from the X vector
C,S = np.cos(X),np.sin(X)
In [ ]:
# Create a figure object. This is an abstract frame to plot things in
plt.figure()
# Plot the Cosine against X, and Sine against X
plt.plot(X,C)
plt.plot(X,S)
# Force the display of the figure
plt.show()
In [ ]:
X = np.linspace(-np.pi,np.pi,256,endpoint=True)
C,S = np.cos(X),np.sin(X)
# Create a figure object. This is an abstract frame to plot things in
plt.figure(figsize=(10,8),dpi=80)
# Plot the Cosine against X, and Sine against X
plt.plot(X,C,color='m',linewidth=2.3,linestyle=':',label='cos')
plt.plot(X,S,color='c',linewidth=5,linestyle='--',label='sin')
# Set the x and y axis limits
plt.xlim(X.min()*1.1,X.max()*1.1)
plt.ylim(C.min()*1.1,C.max()*1.1)
# Add figure labels
# Raw string (r'...') so the backslash in \pi is not treated as an escape sequence
plt.title(r'Sine and Cosine waves from -$\pi$ to $\pi$',fontsize=20)
plt.xlabel('X axis')
plt.ylabel('Y axis')
# Set the x and y ticks
plt.xticks([-np.pi,-np.pi/2,0,np.pi/2,np.pi],
           [r'$-\pi$',r'$-\pi/2$',r'$0$',r'$+\pi/2$',r'$+\pi$'],
           fontsize=20)
plt.yticks([-1,0,1],fontsize=20)
# Add a legend
plt.legend(loc='upper left',fontsize=20)
# Force the display of the figure
plt.show()
Add a tan(x) function to the figure we just created.
Things to consider:
In [ ]:
n = 1024
X = np.random.normal(0, 1, n)
Y = np.random.normal(0, 1, n)
T = np.arctan2(Y, X)
colors = mpl.cm.rainbow(np.linspace(0, 1,n)) # replace the rainbow arg with T
plt.axes([0.025, 0.025, 0.95, 0.95])
plt.scatter(X, Y, s=50, c=colors, alpha=0.5)
plt.xlim(-1.5, 1.5)
plt.xticks(())
plt.ylim(-1.5, 1.5)
plt.yticks(())
plt.show()
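The comment in the cell above suggests colouring the points by their angle T instead of by position in the array. One possible sketch, assuming we rescale T from [-pi, pi] to [0, 1] before handing it to the colormap (the helper name angle_to_unit is hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt


def angle_to_unit(theta):
    """Rescale angles in [-pi, pi] to the [0, 1] range a colormap expects."""
    return (theta + np.pi) / (2 * np.pi)


rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
X = rng.normal(0, 1, 1024)
Y = rng.normal(0, 1, 1024)
T = np.arctan2(Y, X)  # angle of each point in radians, within [-pi, pi]

# Calling a colormap with floats in [0, 1] returns one RGBA row per value
colors = plt.cm.rainbow(angle_to_unit(T))
print(colors.shape)
```

The resulting colors array can be passed to plt.scatter through the c argument, exactly as in the cell above.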
In [ ]:
# Define a normal distribution
mu, sigma = 0,1
# One million samples is plenty for a smooth 30-bin histogram and keeps memory use modest
samples = np.random.normal(mu,sigma,1000000)
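# normed=True scales the bars so their total area is 1.
# Newer Matplotlib releases replaced this argument with density=True.
count,bins,ignored = plt.hist(samples,30,normed=True)
# Overlay the theoretical normal probability density function
plt.plot(bins,1/(sigma*np.sqrt(2*np.pi)) *
         np.exp(-(bins-mu)**2 / (2*sigma**2)),
         linewidth=3,linestyle=':',color='r')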
plt.show()
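The red curve in the cell above is the normal probability density function evaluated at the bin edges. As a rough check that a normalised histogram really does approximate it, the sketch below does the same comparison numerically with np.histogram. The sample size, the seed, and the helper name normal_pdf are assumptions made for this illustration.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))


mu, sigma = 0, 1
rng = np.random.RandomState(42)  # fixed seed so the sketch is repeatable
samples = rng.normal(mu, sigma, 100000)

# density=True rescales the bin counts so the bars integrate to 1
heights, edges = np.histogram(samples, bins=30, density=True)
area = np.sum(heights * np.diff(edges))

# Compare the bar heights with the PDF evaluated at the bin centres
centres = (edges[:-1] + edges[1:]) / 2
max_gap = np.max(np.abs(heights - normal_pdf(centres, mu, sigma)))
print(area, max_gap)
```

The area should come out at 1 (up to floating point error), and the largest gap between a bar and the curve should be small compared with the peak height of about 0.4.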
In [ ]:
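# Importing Axes3D registers the '3d' projection (needed on older Matplotlib versions)
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
# Ask the figure for a 3D axes; this also works on recent Matplotlib releases
ax = fig.add_subplot(111, projection='3d')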
X = np.arange(-4, 4, 0.25)
Y = np.arange(-4, 4, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X ** 2 + Y ** 2)
Z = np.sin(R)
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=plt.cm.rainbow)
ax.contourf(X, Y, Z, zdir='z', offset=-2, cmap=plt.cm.rainbow)
ax.set_zlim(-2, 2)
plt.show()
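The surface plot above relies on np.meshgrid to turn two 1D coordinate vectors into a pair of 2D grids. A small sketch of what those grids look like, using the same ranges as the cell above:

```python
import numpy as np

x = np.arange(-4, 4, 0.25)  # 32 values; the stop value 4 is excluded
y = np.arange(-4, 4, 0.25)

# Each row of X is a copy of x, and each column of Y is a copy of y
X, Y = np.meshgrid(x, y)

# Element-wise maths on the grids evaluates the function at every (x, y) pair
R = np.sqrt(X ** 2 + Y ** 2)
Z = np.sin(R)
print(x.shape, X.shape, Z.shape)
```

Because Z has one entry per grid point, plot_surface and contourf can use X, Y and Z together to draw the height at each location.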
Seaborn is a data visualisation library built on top of Matplotlib, with more of a focus on statistical graphs. It works really well with Pandas: once your data is in a Pandas DataFrame, you can usually pass it straight to a Seaborn plotting function.
In [ ]:
sns.set() # Set up fig with default stylings
# Everybody help me fill in this dictionary!
birthdays = {
'jan':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'feb':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'mar':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'apr':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'may':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'jun':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'jul':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'aug':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'sep':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'oct':{18:0,19:0,20:0,21:0,22:1,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'nov':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0},
'dec':{18:0,19:0,20:0,21:0,22:0,23:0,24:0,25:0,26:0,27:0,28:0,29:0,30:0}
}
df = pd.DataFrame.from_dict(birthdays)
df
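To see how the nested dictionary maps onto the DataFrame, here is a minimal sketch with a made-up two-month example. The counts and the name df_small are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical example with the same nested layout as `birthdays`
counts = {
    'jan': {18: 2, 19: 0, 20: 1},
    'feb': {18: 0, 19: 3, 20: 0},
}
df_small = pd.DataFrame.from_dict(counts)

# Outer keys become the column labels, inner keys become the row index
print(sorted(df_small.columns), list(df_small.index))

# Look up one cell by row label and column label
print(df_small.loc[19, 'feb'])
```

In the heatmap, each column of the DataFrame becomes a column of cells and each index value becomes a row.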
In [ ]:
sns.heatmap(df)
In [ ]:
sns.heatmap(df,annot=True,fmt='d',linewidths=.5)
The Seaborn Tutorial Page has a lot of resources to help you get started with this package.
In [ ]:
# TO DO
sns.set()
# END TO DO