In this guided project, you'll practice using Matplotlib to recreate some of the plots that Seaborn and Pandas generate with high-level functions. This deliberate practice will prepare you to build new kinds of plots that these libraries don't provide.
We'll continue to work with the dataset from the American Community Survey on job outcomes for recent college graduates. Here are some of the columns in the dataset:
Before we start creating data visualizations, let's import the libraries we need and remove rows that contain null values.
In [1]:
# Set up the environment by importing the libraries we need
import pandas as pd
import matplotlib.pyplot as plt
# Run the Jupyter magic that renders plots (interactively) inside the notebook
%matplotlib notebook
In [2]:
# Read the dataset into a DataFrame
recent_grads = pd.read_csv('../data/recent-grads.csv')
# Start exploring the beginning and end of the data
recent_grads.head()
Out[2]:
In [3]:
recent_grads.tail()
Out[3]:
In [4]:
# Look at some summary statistics
recent_grads.describe()
Out[4]:
In [5]:
# Use shape to see how many rows and columns we have
recent_grads.shape
Out[5]:
In [6]:
# Create a new DataFrame with rows containing NaN values dropped
filtered_recent = recent_grads.dropna()
# And make sure we didn't drop too many rows
filtered_recent.shape
Out[6]:
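Comparing shapes tells you how many rows were dropped, but not which columns caused it. Counting nulls per column first answers that. A minimal sketch on a small DataFrame whose values are invented, not taken from the survey data:

```python
import numpy as np
import pandas as pd

# A tiny, made-up stand-in for recent_grads (not the real dataset)
df = pd.DataFrame({
    "Major": ["A", "B", "C", "D"],
    "ShareWomen": [0.5, np.nan, 0.8, 0.3],
    "Unemployment_rate": [0.05, 0.07, np.nan, 0.04],
})

# Count missing values per column before dropping anything
null_counts = df.isnull().sum()

# dropna() removes every row with at least one NaN (how='any' is the default)
filtered = df.dropna()

# Compare row counts to see how many rows were lost
rows_dropped = df.shape[0] - filtered.shape[0]
print(null_counts.to_dict())
print(rows_dropped)
```

If only a few columns matter for a given plot, `dropna(subset=[...])` restricts the null check to those columns.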
In [8]:
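# Create a scatter matrix with pandas, using the filtered data like the Matplotlib version
# (the old top-level pd.scatter_matrix alias is deprecated; the function lives in pd.plotting)
pd.plotting.scatter_matrix(filtered_recent[['ShareWomen', 'Unemployment_rate']], figsize=(8,8))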
Out[8]:
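Before recreating the scatter matrix by hand, it helps to look at what pandas actually builds: `scatter_matrix()` returns a 2-D NumPy array of Axes, with histograms on the diagonal and scatter plots elsewhere, and it hides inner axes the same way the next cell does. A sketch that inspects this on random demo data (not the survey columns):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random demo data standing in for the two columns plotted above
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ShareWomen": rng.uniform(0, 1, 50),
    "Unemployment_rate": rng.uniform(0, 0.2, 50),
})

# One Axes per (row, column) pair of the selected columns
axes = pd.plotting.scatter_matrix(df, figsize=(8, 8), diagonal="hist")
print(axes.shape)

# Diagonal cells contain histogram bars (patches); off-diagonal cells a scatter collection
print(len(axes[0, 0].patches), len(axes[0, 1].collections))

# Only the bottom row shows its x-axis and only the left column its y-axis
print(axes[0, 0].xaxis.get_visible(), axes[1, 1].yaxis.get_visible())
```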
In [9]:
# Create a Figure instance and create 4 Axes instances
fig = plt.figure(figsize=(8,8))
ax11 = fig.add_subplot(2,2,1)
ax12 = fig.add_subplot(2,2,2)
ax21 = fig.add_subplot(2,2,3)
ax22 = fig.add_subplot(2,2,4)
# Now that we have 4 Axes instances, we can generate graphs for each
ax11.hist(filtered_recent['ShareWomen'])
ax22.hist(filtered_recent['Unemployment_rate'])
ax12.scatter(filtered_recent['Unemployment_rate'], filtered_recent['ShareWomen'])
ax21.scatter(filtered_recent['ShareWomen'], filtered_recent['Unemployment_rate'])
# Now let's tweak the appearance.
# To change how an axis and its ticks look, grab the subplot's XAxis or
# YAxis instance and call its methods. The Axes attributes xaxis and yaxis
# (or the equivalent methods get_xaxis() and get_yaxis()) return these objects.
# Hide the x-axis of the two top-row subplots and the y-axis of the two right-column subplots
ax11.xaxis.set_visible(False)
ax12.xaxis.set_visible(False)
ax12.yaxis.set_visible(False)
ax22.yaxis.set_visible(False)
# Use the column names as the x-axis and y-axis labels
ax11.set_ylabel('ShareWomen')
ax21.set_ylabel('Unemployment_rate')
ax21.set_xlabel('ShareWomen')
ax22.set_xlabel('Unemployment_rate')
# Remove the spacing between subplots to match the Pandas scatter matrix
fig.subplots_adjust(wspace=0, hspace=0)
# The last remaining piece is to customize the x-axis and y-axis ticks
# Set the data limits with set_xlim() and set_ylim(), here via the shorthand ax.set(xlim=..., ylim=...)
ax11.set(ylim=(0,30))
ax12.set(ylim=(0.0,1.0))
ax21.set(xlim=(0.0,1.0), ylim=(0.0,0.20))
ax22.set(xlim=(0.0,0.20))
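# Set explicit tick positions with set_xticks() and set_yticks(); calling
# set_*ticklabels() alone only relabels whatever ticks Matplotlib picked
ax11.set_yticks([0, 5, 10, 15, 20, 25, 30])
ax21.set_yticks([0.0, 0.05, 0.10, 0.15])
ax21.set_xticks([0.0, 0.2, 0.4, 0.6, 0.8])
ax22.set_xticks([0.00, 0.05, 0.10, 0.15, 0.20])
# Rotate the bottom-row tick labels so they don't overlap
ax21.tick_params(axis='x', labelrotation=90)
ax22.tick_params(axis='x', labelrotation=90)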
Out[9]:
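The cell above hard-codes four Axes. As further practice, the same steps can be wrapped in a loop so that any list of columns produces an n-by-n grid. A minimal sketch: `manual_scatter_matrix` is a hypothetical helper (not a pandas or Matplotlib function) and the demo data is random:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def manual_scatter_matrix(df, columns, figsize=(8, 8)):
    """Sketch of a pandas-style scatter matrix built from plain Matplotlib calls."""
    n = len(columns)
    # squeeze=False keeps axes 2-D even when only one column is passed
    fig, axes = plt.subplots(n, n, figsize=figsize, squeeze=False)
    for i, row_col in enumerate(columns):
        for j, col_col in enumerate(columns):
            ax = axes[i, j]
            if i == j:
                # Diagonal: distribution of a single column
                ax.hist(df[row_col].dropna())
            else:
                # Off-diagonal: column j on the x-axis, column i on the y-axis
                ax.scatter(df[col_col], df[row_col])
            # Only the bottom row keeps its x-axis, only the left column its y-axis
            if i != n - 1:
                ax.xaxis.set_visible(False)
            else:
                ax.set_xlabel(col_col)
            if j != 0:
                ax.yaxis.set_visible(False)
            else:
                ax.set_ylabel(row_col)
    # Remove the gaps between subplots, as in the pandas version
    fig.subplots_adjust(wspace=0, hspace=0)
    return fig, axes

# Usage on random demo data with the same two column names as above
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "ShareWomen": rng.uniform(0, 1, 40),
    "Unemployment_rate": rng.uniform(0, 0.2, 40),
})
fig, axes = manual_scatter_matrix(demo, ["ShareWomen", "Unemployment_rate"])
```

With two columns this reproduces the layout of the cell above: `axes[0, 1]` plots Unemployment_rate against ShareWomen, and `axes[1, 0]` the reverse.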
In [10]:
# Add a ShareMen column
recent_grads['ShareMen'] = 1 - recent_grads['ShareWomen']
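Column arithmetic in pandas is element-wise, so each row gets its own complement. Because the new column is computed from `recent_grads` rather than `filtered_recent`, any row with a missing ShareWomen value also gets a missing ShareMen value. A sketch with invented shares:

```python
import numpy as np
import pandas as pd

# Invented shares; the NaN stands for a row with a missing ShareWomen value
df = pd.DataFrame({"ShareWomen": [0.25, 0.6, np.nan]})

# Subtracting a Series from a scalar is applied row by row
df["ShareMen"] = 1 - df["ShareWomen"]

# The two shares sum to 1 wherever ShareWomen is known; NaN propagates otherwise
totals = df["ShareWomen"] + df["ShareMen"]
print(df["ShareMen"].tolist())
print(totals.tolist())
```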
In [11]:
# First filter the DataFrame down to the rows you want to visualize (the Arts majors),
# then index it by major name; chaining set_index() avoids modifying a slice in place
arts = recent_grads[recent_grads['Major_category'] == 'Arts'].set_index('Major')
arts.head()
Out[11]:
In [12]:
# Create a Grouped Bar Plot using Pandas
arts[['ShareMen', 'ShareWomen']].plot(kind='bar', figsize=(8,8))
Out[12]:
In [13]:
# Import NumPy and use arange() to generate an array of integer bar positions, one per major
import numpy as np
locs = np.arange(len(arts))
locs
Out[13]:
In [14]:
# Create a Figure instance and add a single subplot
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(1,1,1)
# Generate the bars for the ShareMen column (bars are centered on locs by default).
# Recent Matplotlib versions no longer accept left=; pass the positions as the first argument (x)
bar_width = 0.35
bar_1 = ax.bar(locs, arts['ShareMen'], width=bar_width)
# We need a list of placement values for the new bars, offset by one bar width
offset_locs = locs + bar_width
# Generate the bars for the ShareWomen column
bar_2 = ax.bar(offset_locs, arts['ShareWomen'], width=bar_width, color='green')
# Place each tick midway between the two bars of its group, then label it with the major name
ax.set_xticks(locs + bar_width / 2)
ax.set_xticklabels(arts.index, rotation=90)
# Create a legend
ax.legend((bar_1, bar_2), ('ShareMen', 'ShareWomen'), loc='upper left')
# Display the background grid
ax.grid(True)
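The offset trick generalizes to any number of columns: shift series i right by i bar widths, then center each tick under its group. A sketch using a hypothetical `grouped_bars` helper and invented values; with two columns and the default `total_width=0.7` it produces the same 0.35-wide bars as the cell above:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def grouped_bars(ax, df, columns, total_width=0.7):
    """Draw one group of side-by-side bars per row of df (a sketch)."""
    locs = np.arange(len(df))
    bar_width = total_width / len(columns)
    bars = []
    for i, col in enumerate(columns):
        # Shift each series right by i bar widths; bars are centered on x by default
        bars.append(ax.bar(locs + i * bar_width, df[col], width=bar_width, label=col))
    # Put each tick under the middle of its group of bars
    ax.set_xticks(locs + bar_width * (len(columns) - 1) / 2)
    ax.set_xticklabels(df.index, rotation=90)
    ax.legend(loc="upper left")
    return bars

# Usage with invented shares for three hypothetical majors
demo = pd.DataFrame(
    {"ShareMen": [0.4, 0.7, 0.2], "ShareWomen": [0.6, 0.3, 0.8]},
    index=["Major A", "Major B", "Major C"],
)
fig, ax = plt.subplots(figsize=(8, 8))
bars = grouped_bars(ax, demo, ["ShareMen", "ShareWomen"])
```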
Here are some ideas to continue practicing what you've learned:
In [15]:
# Gender ratios stacked bar plot
arts[['ShareMen', 'ShareWomen']].plot.bar(figsize=(8,8), stacked=True)
Out[15]:
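pandas stacks the bars for you. In Matplotlib, the same effect comes from the `bottom` parameter of `Axes.bar()`, which starts each bar at a given height instead of at zero. A sketch with invented values; the width of 0.5 is chosen to resemble the default bar width pandas uses:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Invented shares for three hypothetical majors
demo = pd.DataFrame(
    {"ShareMen": [0.4, 0.7, 0.2], "ShareWomen": [0.6, 0.3, 0.8]},
    index=["Major A", "Major B", "Major C"],
)

locs = np.arange(len(demo))
fig, ax = plt.subplots(figsize=(8, 8))
# The first series starts at the x-axis
men = ax.bar(locs, demo["ShareMen"], width=0.5, label="ShareMen")
# The second series starts where the first one ends, via bottom=
women = ax.bar(locs, demo["ShareWomen"], width=0.5, bottom=demo["ShareMen"], label="ShareWomen")
ax.set_xticks(locs)
ax.set_xticklabels(demo.index, rotation=90)
ax.legend(loc="upper left")
```

Since the two shares are complementary, every stacked bar reaches a height of 1.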
In [16]:
# Box plot
arts[['ShareMen', 'ShareWomen']].plot.box(figsize=(8,8))
Out[16]:
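The box plot can be recreated with `Axes.boxplot()`, which takes one sequence of values per box and places the boxes at x = 1, 2, and so on. pandas removes missing values before drawing its boxes; to be safe, this sketch drops them explicitly too. The data is random and only mimics the two share columns:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random shares that mimic the ShareMen/ShareWomen columns (not the real data)
rng = np.random.default_rng(2)
share_women = rng.uniform(0.3, 0.9, 30)
demo = pd.DataFrame({"ShareMen": 1 - share_women, "ShareWomen": share_women})

columns = ["ShareMen", "ShareWomen"]
fig, ax = plt.subplots(figsize=(8, 8))
# One box per column, with missing values removed first
result = ax.boxplot([demo[col].dropna().to_numpy() for col in columns])
# Label the box positions 1..n with the column names
ax.set_xticks(range(1, len(columns) + 1))
ax.set_xticklabels(columns)
```

`boxplot()` returns a dict of the drawn artists (`boxes`, `medians`, `whiskers`, and others), which is handy for styling individual parts afterwards.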