Exercise 03

Analyze the baby names dataset using pandas


In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

In [2]:
# Load the dataset straight from the zip archive
import zipfile
with zipfile.ZipFile('../datasets/baby-names2.csv.zip', 'r') as z:
    with z.open('baby-names2.csv') as f:
        names = pd.read_csv(f)

In [3]:
names.head()


Out[3]:
year name prop sex soundex
0 1880 John 0.081541 boy J500
1 1880 William 0.080511 boy W450
2 1880 James 0.050057 boy J520
3 1880 Charles 0.045167 boy C642
4 1880 George 0.043292 boy G620

In [4]:
names[names.year == 1993].head()


Out[4]:
year name prop sex soundex
113000 1993 Michael 0.024010 boy M240
113001 1993 Christopher 0.018572 boy C623
113002 1993 Matthew 0.017332 boy M300
113003 1993 Joshua 0.016268 boy J200
113004 1993 Tyler 0.014439 boy T460

Segment the data into boy and girl names


In [5]:
# .copy() makes independent frames, so later edits do not touch `names`
boys = names[names.sex == 'boy'].copy()
girls = names[names.sex == 'girl'].copy()

Analyzing the popularity of a name over time


In [6]:
william = boys[boys['name'] == 'William']

# Plot the yearly proportion, labelling every fifth year on the x-axis
plt.plot(range(william.shape[0]), william['prop'])
plt.xticks(range(william.shape[0])[::5], william['year'].values[::5], rotation='vertical')
plt.ylim([0, 0.1])
plt.show()



In [7]:
daniel = boys[boys['name'] == 'Daniel']

# Same plot for Daniel, on the same y-scale so the two names are comparable
plt.plot(range(daniel.shape[0]), daniel['prop'])
plt.xticks(range(daniel.shape[0])[::5], daniel['year'].values[::5], rotation='vertical')
plt.ylim([0, 0.1])
plt.show()


Exercise 03.1

Which boy name has been the most popular in each decade?
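
Hint: one possible approach, sketched on a tiny made-up frame rather than on the real data. It assumes that "most popular" means the highest average `prop` within a decade; summing `prop` instead would be an equally defensible choice. The names `sample`, `decade`, `decade_prop` and `top_by_decade` are illustrative and not part of the exercise.

```python
import pandas as pd

# Made-up sample with the same columns as `boys` (values are illustrative only)
sample = pd.DataFrame({
    'year': [1880, 1880, 1885, 1885, 1890, 1890],
    'name': ['John', 'William', 'John', 'William', 'John', 'William'],
    'prop': [0.08, 0.07, 0.06, 0.08, 0.09, 0.05],
})

# Label each row with its decade, e.g. 1885 -> 1880
sample['decade'] = sample['year'] // 10 * 10

# Average proportion of each name within each decade
decade_prop = sample.groupby(['decade', 'name'], as_index=False)['prop'].mean()

# For each decade, keep the row with the highest average proportion
top_rows = decade_prop.loc[decade_prop.groupby('decade')['prop'].idxmax()]
top_by_decade = top_rows.set_index('decade')['name']
print(top_by_decade)
```

Applied to the real data, the same steps would start from `boys` instead of `sample`.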


In [ ]:

Exercise 03.2

Which girl name has been the most popular?
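
Hint: a minimal sketch on a made-up frame, assuming popularity is measured by the total `prop` a name accumulates over all years. The exercise does not say which measure to use, so this is an assumption. `girls_sample` and `total_prop` are illustrative names.

```python
import pandas as pd

# Made-up sample with the same columns as `girls` (values are illustrative only)
girls_sample = pd.DataFrame({
    'year': [1880, 1880, 1881, 1881, 1882],
    'name': ['Mary', 'Anna', 'Mary', 'Anna', 'Emma'],
    'prop': [0.07, 0.03, 0.06, 0.03, 0.02],
})

# Total proportion per name across all years
total_prop = girls_sample.groupby('name')['prop'].sum()

# idxmax returns the index label (here: the name) of the largest total
most_popular = total_prop.idxmax()
print(most_popular)
```

Applied to the real data, `girls` would take the place of `girls_sample`.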


In [ ]:

Exercise 03.3

What is the most popular new girl name? (A new name is one that appears only in the 2000s.)
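
Hint: a sketch on a made-up frame. It treats a name as new when its earliest year in the data is 2000 or later, and ranks new names by summed `prop`; both are assumptions about how to read the question. `girls_2000s_sample`, `first_year` and `new_names` are illustrative names.

```python
import pandas as pd

# Made-up sample with the same columns as `girls` (values are illustrative only)
girls_2000s_sample = pd.DataFrame({
    'year': [1995, 1995, 2001, 2001, 2001, 2005, 2005],
    'name': ['Emily', 'Hannah', 'Emily', 'Nevaeh', 'Addison', 'Nevaeh', 'Addison'],
    'prop': [0.012, 0.010, 0.011, 0.002, 0.001, 0.004, 0.003],
})

# Earliest year in which each name appears
first_year = girls_2000s_sample.groupby('name')['year'].min()

# Names whose first appearance is in 2000 or later
new_names = first_year[first_year >= 2000].index

# Restrict to those names and rank them by total proportion
new_rows = girls_2000s_sample[girls_2000s_sample['name'].isin(new_names)]
most_popular_new = new_rows.groupby('name')['prop'].sum().idxmax()
print(most_popular_new)
```

Applied to the real data, `girls` would take the place of `girls_2000s_sample`.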


In [ ]: