Let's construct a simple stacked bar chart to visualize our Titanic analysis.

Import the training data.


In [1]:
import csv
import numpy as np

nfile_ref = open('train.csv', 'r')
csv_file = csv.reader(nfile_ref)                       # Load the csv file.
header = next(csv_file)                                # Skip the first line as it is a header.
data = []                                              # Create a variable to hold the data.

for row in csv_file:                                   # Step through each row in the csv file,
    data.append(row)                                   # adding each row to the data variable.
data = np.array(data)                                  # Then convert from a list to a Numpy array.
nfile_ref.close()
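
The later cells pick out columns by position (1 for survival, 4 for sex). As a minimal sketch, the header row can be used to confirm those positions. The rows below are made-up stand-ins rather than real passenger records, and the column names are an assumption about the layout of train.csv.

```python
import csv
import io

# Hypothetical sample in the assumed layout of train.csv (not real passenger data).
sample = io.StringIO(
    'PassengerId,Survived,Pclass,Name,Sex\n'
    '1,0,3,"Doe, Mr. John",male\n'
    '2,1,1,"Roe, Mrs. Jane",female\n'
)

reader = csv.reader(sample)
header = next(reader)                  # The first line holds the column names.
rows = list(reader)                    # The remaining lines are the data rows.

survived_col = header.index('Survived')
sex_col = header.index('Sex')
print(survived_col, sex_col)
```

Looking the positions up by name guards against indexing the wrong column if the file layout differs from what the notebook expects.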



Import the matplotlib library and Numpy:


In [2]:
# Import matplotlib and allow it to plot in the notebook.
import matplotlib.pyplot as plt
%matplotlib inline                                     

# Import Numpy
import numpy as np

Set some configurations for the plot:

The locations along the x-axis where the bars will sit.

In [5]:
bottom_locs = np.array([1., 2.])

The width of the bars.

In [4]:
width = 0.3

Generate the actual values to plot:

The number of men who died and who survived, as a (died, survived) pair.

In [7]:
men_only_stats = data[:, 4] != "female"                     # Boolean mask marking the rows that hold men.
men_onboard = data[men_only_stats, 1].astype(float)         # Column 1 of data (survived = 0 or 1), but only men.
men = (np.size(men_onboard)-np.sum(men_onboard), np.sum(men_onboard))   # (died, survived)

The number of women who died and who survived, as a (died, survived) pair.


In [6]:
women_only_stats = data[:, 4] == "female"                   # Boolean mask marking the rows that hold women.
women_onboard = data[women_only_stats, 1].astype(float)     # Column 1 of data (survived = 0 or 1), but only women.
women = (np.size(women_onboard)-np.sum(women_onboard), np.sum(women_onboard))   # (died, survived)
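
The counting step can be checked on a small array. This is a sketch with hypothetical toy data (column 0 is the survived flag, column 1 is the sex), and `died_survived` is a helper name introduced here for illustration only.

```python
import numpy as np

# Hypothetical toy data, stored as strings just like the array loaded from the csv file.
toy = np.array([['0', 'male'],
                ['1', 'male'],
                ['1', 'female'],
                ['0', 'male'],
                ['1', 'female']])

is_female = toy[:, 1] == 'female'                      # Boolean mask, one entry per row.
survived_women = toy[is_female, 0].astype(float)       # Survived flags for women only.
survived_men = toy[~is_female, 0].astype(float)        # ~ inverts the mask to select men.

def died_survived(flags):
    """Return (died, survived) counts from an array of 0/1 flags."""
    return (flags.size - flags.sum(), flags.sum())

men_counts = died_survived(survived_men)
women_counts = died_survived(survived_women)
print(men_counts, women_counts)
```

Because the flags are 0 or 1, their sum is the number of survivors, and the size minus the sum is the number who died.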

Generate the plot.


In [8]:
# Add the values to the plot.
plt.bar(bottom_locs, men, label='Male', width=width, align='center')
plt.bar(bottom_locs, women, color='m', label='Female', width=width, align='center', bottom=men)

# Decorate the plot.
plt.ylabel('Count')
plt.title('Who Survived the Titanic?')
plt.legend(loc='best')
plt.xticks(bottom_locs, ('Died', 'Survived'))          # The bars are centered on bottom_locs, so the ticks go there too.


Out[8]:
([<matplotlib.axis.XTick at 0x774c4a8>, <matplotlib.axis.XTick at 0x774c048>],
 <a list of 2 Text xticklabel objects>)
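
The stacking in the plot cell comes from the `bottom` argument: each bar of the second series starts where the first series ends. A minimal sketch with hypothetical counts (not the Titanic figures) that reads the bar positions back from matplotlib to confirm this:

```python
import matplotlib
matplotlib.use('Agg')                  # Non-interactive backend, so this runs without a display.
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical (died, survived) counts, for illustration only.
men = np.array([5., 2.])
women = np.array([1., 4.])
locs = np.array([1., 2.])

fig, ax = plt.subplots()
lower = ax.bar(locs, men, width=0.3, align='center')
upper = ax.bar(locs, women, width=0.3, align='center', bottom=men)

# Each upper bar starts at the top of the lower bar beneath it.
starts = [rect.get_y() for rect in upper]
tops = [rect.get_y() + rect.get_height() for rect in upper]
print(starts, tops)
plt.close(fig)
```

The top of each stack is therefore the combined count for that outcome, which is what makes the chart readable as both a total and a split by sex.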