Exercise 02.1

Create a function that receives two inputs a and b, and returns the product of the a decimal of pi and the b decimal of pi.

i.e, 
pi = 3.14159
if a = 2 and b = 4
result = 4 * 5
result = 20

Caveats:

a and b are between 1 and 15
decimals positions 1 and 2 are 1 and 4, respectively. (remember that python start indexing in 0)



In [ ]:

    
from math import pi
def mult_dec_pi(a, b):
    
    # Add the solution here
    
    result = ''
    return result



In [ ]:

    
mult_dec_pi(a=2, b=4)
# 20.0



In [ ]:

    
mult_dec_pi(a=5, b=10)
# 45.0



In [ ]:

    
mult_dec_pi(a=14, b=1)
# 9.0



In [ ]:

    
mult_dec_pi(a=6, b=8)
# 10.0



In [ ]:

    
# Bonus
mult_dec_pi(a=16, b=4)
# 'Error'

Exercise 02.2

Using the given dataset. Estimate a linear regression between Employed and GNP.

$$Employed = b_0 + b_1 * GNP $$$$\hat b = (X^TX)^{-1}X^TY$$$$Y = Employed$$$$X = [1 \quad GNP]$$



In [13]:

    
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Import data
raw_data = """
Year,Employed,GNP
1947,60.323,234.289
1948,61.122,259.426
1949,60.171,258.054
1950,61.187,284.599
1951,63.221,328.975
1952,63.639,346.999
1953,64.989,365.385
1954,63.761,363.112
1955,66.019,397.469
1956,67.857,419.18
1957,68.169,442.769
1958,66.513,444.546
1959,68.655,482.704
1960,69.564,502.601
1961,69.331,518.173
1962,70.551,554.894"""

data = []
for line in raw_data.splitlines()[2:]:
    words = line.split(',')
    data.append(words)
data = np.array(data, dtype=np.float)
n_obs = data.shape[0]
plt.plot(data[:, 2], data[:, 1], 'bo')
plt.xlabel("GNP")
plt.ylabel("Employed")









    Out[13]:





Text(0,0.5,'Employed')

Exercise 02.3

Analyze the baby names dataset using pandas



In [7]:

    
import pandas as pd
# Load dataset
import zipfile
with zipfile.ZipFile('../datasets/baby-names2.csv.zip', 'r') as z:
    f = z.open('baby-names2.csv')
    names = pd.io.parsers.read_table(f, sep=',')



In [8]:

    
names.head()



In [9]:

    
names[names.year == 1993].head()









    Out[9]:







  
    
      
      year
      name
      prop
      sex
      soundex
    
  
  
    
      113000
      1993
      Michael
      0.024010
      boy
      M240
    
    
      113001
      1993
      Christopher
      0.018572
      boy
      C623
    
    
      113002
      1993
      Matthew
      0.017332
      boy
      M300
    
    
      113003
      1993
      Joshua
      0.016268
      boy
      J200
    
    
      113004
      1993
      Tyler
      0.014439
      boy
      T460

segment the data into boy and girl names



In [11]:

    
boys = names[names.sex == 'boy'].copy()    
girls = names[names.sex == 'girl'].copy()

Analyzing the popularity of a name over time



In [14]:

    
william = boys[boys['name']=='William']

plt.plot(range(william.shape[0]), william['prop'])
plt.xticks(range(william.shape[0])[::5], william['year'].values[::5], rotation='vertical')
plt.ylim([0, 0.1])
plt.show()



In [15]:

    
Daniel = boys[boys['name']=='Daniel']

plt.plot(range(Daniel.shape[0]), Daniel['prop'])
plt.xticks(range(Daniel.shape[0])[::5], Daniel['year'].values[::5], rotation='vertical')
plt.ylim([0, 0.1])
plt.show()

Exercise 02.3

Which has been the most popular boy name every decade?



In [ ]:

Exercise 02.4

Which has been the most popular girl name?



In [ ]:

Exercise 02.5

What is the most popular new girl name? (new is a name that appears only in the 2000's)



In [ ]:

	year	name	prop	sex	soundex
0	1880	John	0.081541	boy	J500
1	1880	William	0.080511	boy	W450
2	1880	James	0.050057	boy	J520
3	1880	Charles	0.045167	boy	C642
4	1880	George	0.043292	boy	G620

	year	name	prop	sex	soundex
113000	1993	Michael	0.024010	boy	M240
113001	1993	Christopher	0.018572	boy	C623
113002	1993	Matthew	0.017332	boy	M300
113003	1993	Joshua	0.016268	boy	J200
113004	1993	Tyler	0.014439	boy	T460