Visualisations are a powerful way for humans to draw inferences from data. They let us condense huge amounts of information into easily digestible graphs.
Python has a wonderful tool called Matplotlib, whose plotting interface was inspired by MATLAB's. Let's begin with a few basic plots.
We will also incorporate more and more data visualisations in the next two sections, so they are not restricted to just toy problems.
In [1]:
"""
We begin by using a built-in IPython magic function to display plots
within the notebook window.
"""
%matplotlib inline
import matplotlib.pyplot as plt
In [2]:
import matplotlib
print(matplotlib.__version__)
Writing import matplotlib.pyplot as plt is the Python convention.
If you want, you can write import matplotlib.pyplot as chuck_norris instead, as shown below.
'as plt' is the accepted convention though, and it keeps your code short and familiar to other readers.
| Colour Code | Colour |
|---|---|
| r | Red |
| b | Blue |
| g | Green |
| c | Cyan |
| m | Magenta |
| y | Yellow |
| k | Black |
| w | White |
| Linestyle Code | Displayed Line Style |
|---|---|
| - | Solid Line |
| -- | Dashed Line |
| : | Dotted Line |
| -. | Dash-Dotted Line |
| None | No Connecting Lines |
| Marker Code | Marker Displayed |
|---|---|
| + | Plus Sign |
| . | Dot |
| o | Circle |
| ^ | Triangle |
| p | Pentagon |
| s | Square |
| x | X Character |
| D | Diamond |
| h | Hexagon |
| * | Asterisk |
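The codes in the three tables above can also be combined into a single format string, passed to the plotting call right after the data. The following is a minimal sketch with made-up data values; it draws the same style twice, once with a format string and once with keyword arguments.

```python
import matplotlib.pyplot as plt

# Illustrative data, not taken from the examples in this notebook
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()

# 'm' = magenta, 'D' = diamond marker, '-.' = dash-dotted line
(line,) = ax.plot(x, y, 'mD-.')

# The same styling spelled out with keyword arguments
(line_kw,) = ax.plot(x, [v + 5 for v in y], color='m', marker='D', linestyle='-.')
```

The format string is quicker to type, while the keyword form is easier to read when you come back to the code later.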
British spellings often give errors like these:
AttributeError: Unknown property colour
To be on the safe side, use color, unless you're using R packages written by Hadley Wickham.
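To see the difference for yourself, the sketch below tries both spellings on throwaway data. The exact exception type and wording may differ between Matplotlib versions, so the example catches both AttributeError and TypeError as a precaution.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

error_message = None
try:
    # 'colour' is not a recognised line property, so this call fails
    ax.plot([1, 2, 3], [1, 4, 9], colour='r')
except (AttributeError, TypeError) as err:
    # Keep the message so we can look at it
    error_message = str(err)

# The American spelling is accepted
(line,) = ax.plot([1, 2, 3], [1, 4, 9], color='r')

print(error_message)
```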
In [3]:
%matplotlib inline
import matplotlib.pyplot as chuck_norris
In [4]:
y = [1,2,3,4,5,4,3,2,1]
x = [2,4,6,8,10,12,10,8,6]
chuck_norris.plot(x, y, marker='D', linestyle='-.', color='m')
chuck_norris.plot([1,2,3,4,5,4,3,2,1], marker='^', linestyle='-', color='r')
chuck_norris.ylabel('Numbers')
#chuck_norris.show()
Out[4]:
So as you see, the convention plt can save you from typing chuck_norris every single time.
Back to business though. Let's reimport matplotlib.
In [5]:
%matplotlib inline
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
# We have two lists (arrays, in more mathematical terms): x and y
In [6]:
plt.plot(x, y)
Out[6]:
Let's break down what's happening.
In [7]:
# Import libraries
import matplotlib.pyplot as plt
%matplotlib inline
In [8]:
# Prepare the data
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
In [9]:
# Plot the data
plt.plot(x,y, label='Sales')
# Add a legend
plt.legend()
# Add more information
plt.xlabel('Adwords Spending (ZIM $)')
plt.ylabel('Monthly Sales (Oranges)')
plt.title('Effect of Adwords Spending on Monthly Sales')
Out[9]:
But this is too small. Let's specify the size of the plot. Note that you can either set it once at the very top, right after you import your libraries, or change it each time you plot a graph.
In [10]:
plt.rcParams["figure.figsize"] = (15,7)
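Changing plt.rcParams affects every figure created afterwards. If you only want to size a single figure, one alternative is the figsize argument of plt.subplots (or plt.figure). A minimal sketch, with the size (8, 4) chosen arbitrarily for illustration:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# figsize is (width, height) in inches and applies to this figure only
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y, label='Sales')
ax.legend()

# Read the size back to confirm it
width, height = fig.get_size_inches()
print(width, height)
```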
In [11]:
# Plot the data
plt.plot(x, y, label='Sales')
# Add a legend
plt.legend()
# Add more information
plt.xlabel('Adwords Spending (ZIM $)')
plt.ylabel('Monthly Sales (Oranges)')
plt.title('Effect of Adwords Spending on Monthly Sales')
Out[11]:
In [12]:
%matplotlib inline
import matplotlib.pyplot as plt
y = [1,4,9,16,25,36,49,64,81,100]
x1 = [5,10,15,20,25,30,35,40,45,47]
x2 = [1,1,2,3,5,8,13,21,34,53]
In [13]:
plt.rcParams["figure.figsize"] = (15,7)
plt.plot(y,x1, marker='+', linestyle='--', color='b',label='Blue Shift')
plt.plot(y,x2, marker='o', linestyle='-', color='r', label='Red Shift')
plt.xlabel('Days to Election')
plt.ylabel('Popularity')
plt.title('Candidate Popularity')
plt.legend(loc='lower right')
Out[13]:
In [14]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (15,7)
# Declare Values
vals = [10, 5, 3, 5, 7,6]
xval = [1, 2, 3, 4, 5,6]
# Bar Plot
plt.bar(xval, vals)
plt.title('Sales per Executive')
plt.xlabel('ID Number')
plt.ylabel('Weekly Sales')
Out[14]:
In [15]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (15,7)
# Draw one million samples from the standard normal distribution
Y = []
for x in range(0,1000000):
    Y.append(np.random.randn())
In [17]:
# Here 500 is the number of bins. Try playing around with 10, 100, 200 etc. and see how it affects the shape of the graph
plt.hist(Y, 500)
plt.title('Distribution of Random Numbers')
Out[17]:
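The second argument to plt.hist is the number of bins. To compare several bin counts side by side, something like the sketch below could be used. It assumes a smaller, seeded sample than the cell above, purely so that it runs quickly and gives the same picture every time.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # seeded so the sketch is repeatable
samples = rng.standard_normal(10_000)   # smaller sample than above, for speed

bin_counts = [10, 100, 200]
fig, axes = plt.subplots(1, len(bin_counts), figsize=(15, 4))

results = {}
for ax, bins in zip(axes, bin_counts):
    # hist returns the count per bin, the bin edges, and the drawn patches
    counts, edges, _ = ax.hist(samples, bins=bins)
    results[bins] = (counts, edges)
    ax.set_title(f'{bins} bins')
```

With few bins the bell shape looks blocky; with many bins it gets smoother in outline but noisier from bar to bar.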
In [18]:
radius = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
# We import the math library.
# This can also be done as from math import pi
# Then instead of math.pi, we simply use pi
import math
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (15,7)
# How awesome is list comprehension!!
area = [round((r**2)*math.pi,2) for r in radius]
print(area)
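The list comprehension works well here. For larger datasets, the same calculation can also be written with NumPy, which applies the arithmetic to every element at once. A sketch of that alternative (the names area_list and area_array are only used in this example):

```python
import math
import numpy as np

radius = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# List comprehension, as in the cell above
area_list = [round((r ** 2) * math.pi, 2) for r in radius]

# NumPy version: square every radius, multiply by pi, round to 2 decimals
area_array = np.round(np.array(radius) ** 2 * np.pi, 2)

print(area_array)
```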
In [20]:
plt.xlabel('Radius')
plt.ylabel('Area')
plt.title('Radius of Circle v Area')
plt.scatter(radius, area, color='g', s=30)
Out[20]:
In [24]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams["figure.figsize"] = (15,7)
x = np.random.randn(1, 500)
y = np.random.randn(1,500)
plt.scatter(x, y, color='b', s=50) # s = marker area in points squared
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.title('Scatter Plot')
Out[24]:
In [25]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams["figure.figsize"] = (15,7)
fig = plt.figure()
# 121 = 1 row, 2 columns, subplot number 1
# Plot for Left Hand Side
imgage1 = fig.add_subplot(121)
N=500
x = np.random.randn(N)
y = np.random.randn(N)
colors = np.random.rand(N)
size =(20 * np.random.rand(N))**2
plt.scatter(x, y, s=size, c=colors, alpha=0.4)
# Plot for Right Hand Side
imgage2 = fig.add_subplot(122)
N=1000
x1 = np.random.randn(N)
y1 = np.random.randn(N)
area= (5 * np.random.rand(N))**3
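# Repeat the four colours so that there is one colour per point
palette = ['magenta', 'blue', 'black', 'yellow']
colors = [palette[i % len(palette)] for i in range(N)]
plt.scatter(x1, y1, s=area, c=colors, alpha=0.6)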
imgage2.grid(True)
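The three digits passed to add_subplot are the number of rows, the number of columns, and the position of the subplot, counted from 1. The sketch below writes them out as separate arguments, and also shows plt.subplots, which creates the figure and the whole grid in one call. The titles 'Left' and 'Right' are placeholders.

```python
import matplotlib.pyplot as plt

fig = plt.figure()
left = fig.add_subplot(1, 2, 1)    # same as add_subplot(121)
right = fig.add_subplot(1, 2, 2)   # same as add_subplot(122)

# plt.subplots builds the figure and all the axes together
fig2, axes = plt.subplots(nrows=1, ncols=2)
axes[0].set_title('Left')
axes[1].set_title('Right')
```

Working with the returned axes objects (left, right, axes[0], axes[1]) makes it explicit which subplot each call draws on.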
In [26]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (15,7)
y = [1,4,9,16,25,36,49,64,81,100]
x1 = [5,10,15,20,25,30,35,40,45,47]
x2 = [1,1,2,3,5,8,13,21,34,53]
fig = plt.figure()
fig.suptitle("Candidate Popularity", fontsize="x-large")
# 121 = 1 row, 2 columns, subplot number 1
# Plot for Left Hand Side
imgage011 = fig.add_subplot(121)
plt.xlabel('Days to Election')
plt.plot(y,x1, marker='+', linestyle='--', color='b')
# Plot for Right Hand Side
imgage2 = fig.add_subplot(122)
plt.xlabel('Days to Election')
plt.plot(y,x2, marker='o', linestyle='-', color='r')
#imgage2.grid(True)
Out[26]:
In [27]:
## Alternate Method
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (15,7)
fig = plt.figure()
fig.suptitle("Candidate Popularity", fontsize="x-large")
ax1 = fig.add_subplot(121)
ax1.plot(y, x1, 'r-')
ax1.set_title("Candidate 1")
ax2 = fig.add_subplot(122)
ax2.plot(y, x2, 'k-')
ax2.set_title("Candidate 2")
plt.tight_layout()
fig = plt.gcf()
In [28]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (15,7)
y = [1,4,9,16,25,36,49,64,81,100]
x1 = [5,10,15,20,25,30,35,40,45,47]
x2 = [1,1,2,3,5,8,13,21,34,53]
fig = plt.figure()
fig.suptitle("Candidate Popularity", fontsize="x-large")
# 121 = 1 row, 2 columns, subplot number 1
# Plot for Left Hand Side
imgage011 = fig.add_subplot(121)
plt.xlabel('Days to Election')
plt.plot(y,x1, marker='+', linestyle='--', color='b')
# Plot for Right Hand Side
imgage2 = fig.add_subplot(122)
plt.xlabel('Days to Election')
plt.plot(y,x2, marker='o', linestyle='-', color='r')
#imgage2.grid(True)
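# Save Figure (the target directory must exist, so create it if needed)
import os
os.makedirs("images", exist_ok=True)
plt.savefig("images/pop.png")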
# Save Transparent Figure
plt.savefig("images/pop2.png", transparent=True)
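savefig takes a few other useful options as well. The sketch below saves a higher-resolution image with the surrounding whitespace trimmed. It writes to a temporary directory rather than the images folder, and the file name pop_hires.png is made up for this example.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 4, 9, 16, 25], [5, 10, 15, 20, 25],
        marker='+', linestyle='--', color='b')
ax.set_xlabel('Days to Election')

# A temporary directory stands in for the "images" folder used above
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'pop_hires.png')

# dpi sets the resolution; bbox_inches='tight' trims surrounding whitespace
fig.savefig(out_path, dpi=200, bbox_inches='tight')

print(os.path.exists(out_path))
```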