Goals

This notebook focuses on baby names that have been given to both male and female babies.



In [1]:

    
%matplotlib inline



In [2]:

    
import matplotlib.pyplot as plt
import numpy as np

from pylab import figure, show

from pandas import DataFrame, Series
import pandas as pd



In [3]:

    
try:
    import mpld3
    from mpld3 import enable_notebook
    from mpld3 import plugins
    enable_notebook()
except Exception as e:
    print("Attempt to import and enable mpld3 failed: %s" % e)



In [4]:

    
# what would seaborn do?
try:
    import seaborn as sns
except Exception as e:
    print("Attempt to import seaborn failed: %s" % e)









    



/Users/prabha/anaconda/lib/python2.7/site-packages/numpy/oldnumeric/__init__.py:11: ModuleDeprecationWarning: The oldnumeric module will be dropped in Numpy 1.9
  warnings.warn(_msg, ModuleDeprecationWarning)

Preliminaries: Assumed location of pydata-book files

To make it more practical for me to look at your homework, I'm again going to assume a relative placement of files. I placed the files from

https://github.com/pydata/pydata-book

in a local directory, which in my case is "/Users/raymondyee/D/Document/Working_with_Open_Data/pydata-book/"

and then symbolically linked (ln -s) to the pydata-book directory from the root directory of the working-open-data folder, i.e., on OS X

cd /Users/raymondyee/D/Document/Working_with_Open_Data/working-open-data
ln -s /Users/raymondyee/D/Document/Working_with_Open_Data/pydata-book/ pydata-book

That way the files from the pydata-book repository look like they sit in the working-open-data directory -- without having to actually copy the files.

With this arrangement, I should then be able to drop your notebook into my own notebooks directory and run it without having to mess around with paths.



In [5]:

    
import os

NAMES_DIR = os.path.join(os.pardir, "pydata-book", "ch02", "names")

assert os.path.exists(NAMES_DIR)

Please make sure the above assertion works.
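If the assertion fails, it helps to see which absolute path was actually checked. Here is a minimal sketch, assuming the same relative layout as above; `describe_names_dir` is a hypothetical helper name, not part of the course materials.

```python
import os

def describe_names_dir(names_dir):
    """Return a short status message for the expected data directory.

    Hypothetical helper: it reports the absolute path that was checked,
    which is easier to debug than a bare AssertionError.
    """
    full_path = os.path.abspath(names_dir)
    if not os.path.isdir(full_path):
        return "missing: %s" % full_path
    # Count the per-year files (named yobYYYY.txt in the pydata-book repo)
    yob_files = [f for f in os.listdir(full_path)
                 if f.startswith("yob") and f.endswith(".txt")]
    return "found %d yob files in %s" % (len(yob_files), full_path)

print(describe_names_dir(os.path.join(os.pardir, "pydata-book", "ch02", "names")))
```

A "missing" message usually means the symbolic link was created from a different directory than the one the notebook runs in.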

Baby names dataset

discussed on p. 35 of the PfDA book

To download all the data, including that for 2011 and 2012: Popular Baby Names --> includes state by state data.

Loading all data into Pandas



In [6]:

    
# show the first five files in the NAMES_DIR

import glob
glob.glob(NAMES_DIR + "/*")[:5]









    Out[6]:





['../pydata-book/ch02/names/NationalReadMe.pdf',
 '../pydata-book/ch02/names/yob1880.txt',
 '../pydata-book/ch02/names/yob1881.txt',
 '../pydata-book/ch02/names/yob1882.txt',
 '../pydata-book/ch02/names/yob1883.txt']



In [7]:

    
# 2010 is the last available year in the pydata-book repo
import os

years = range(1880, 2011)

pieces = []
columns = ['name', 'sex', 'births']

for year in years:
    path = os.path.join(NAMES_DIR, 'yob%d.txt' % year)
    frame = pd.read_csv(path, names=columns)

    frame['year'] = year
    pieces.append(frame)

# Concatenate everything into a single DataFrame
names = pd.concat(pieces, ignore_index=True)

# why floats? describe() reports mean, std and quantiles, which are floats even for integer columns
names.describe()









    Out[7]:






  
    
      
               births            year
count  1690784.000000  1690784.000000
mean       190.682386     1969.454384
std       1615.899711       32.823526
min          5.000000     1880.000000
25%          7.000000     1946.000000
50%         12.000000     1979.000000
75%         32.000000     1997.000000
max      99651.000000     2010.000000
    
  

8 rows × 2 columns
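The loading loop above hard-codes `range(1880, 2011)`. If you add the newer files (2011, 2012) to the same directory, one option is to discover the years from the file names instead. This is only a sketch, assuming the files keep the `yobYYYY.txt` naming seen in the listing above; `available_years` is a hypothetical helper.

```python
import glob
import os
import re

def available_years(names_dir):
    """Return the sorted list of years that have a yobYYYY.txt file."""
    pattern = re.compile(r"^yob(\d{4})\.txt$")
    years = []
    for path in glob.glob(os.path.join(names_dir, "yob*.txt")):
        match = pattern.match(os.path.basename(path))
        if match:  # skip anything that is not exactly yob + 4 digits
            years.append(int(match.group(1)))
    return sorted(years)

# Usage in the notebook would be: years = available_years(NAMES_DIR)
```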



In [8]:

    
# how many people, names, males and females  represented in names?

names.births.sum()









    Out[8]:





322402727



In [9]:

    
# F vs M

names.groupby('sex')['births'].sum()









    Out[9]:





sex
F      159990140
M      162412587
Name: births, dtype: int64



In [10]:

    
# total number of names

len(names.groupby('name'))









    Out[10]:





88496



In [11]:

    
# use pivot_table to collect records by year (rows) and sex (columns)

# note: pandas 0.14 renamed the rows=/cols= arguments to index=/columns=
total_births = names.pivot_table('births', index='year', columns='sex', aggfunc=sum)
total_births.head()









    Out[11]:






  
    
sex        F       M
year
1880   90993  110493
1881   91955  100748
1882  107851  113687
1883  112322  104632
1884  129021  114445
    
  

5 rows × 2 columns



In [12]:

    
# You can use groupby to get the equivalent of the pivot_table calculation

names.groupby('year').apply(lambda s: s.groupby('sex').agg('sum')).unstack()['births']









    Out[12]:






  
    
sex         F        M
year
1880    90993   110493
1881    91955   100748
1882   107851   113687
1883   112322   104632
1884   129021   114445
1885   133056   107802
1886   144538   110785
1887   145983   101412
1888   178631   120857
1889   178369   110590
1890   190377   111026
1891   185486   101198
1892   212350   122038
1893   212908   112319
1894   222923   115775
1895   233632   117398
1896   237924   119575
1897   234199   112760
1898   258771   122703
1899   233022   106218
1900   299873   150554
1901   239351   106478
1902   264079   122660
1903   261976   119240
1904   275375   128129
1905   291641   132319
1906   295301   133159
1907   318558   146838
1908   334277   154339
1909   347191   163983
1910   396416   194198
1911   418180   225936
1912   557939   429926
1913   624317   512482
1914   761376   654746
1915   983824   848647
1916  1044249   890142
1917  1081194   925512
1918  1157585  1013720
1919  1130149   980215
1920  1198214  1064468
1921  1232845  1101374
1922  1200796  1088380
1923  1206239  1096227
1924  1248821  1132671
1925  1217217  1115798
1926  1185078  1110440
1927  1192207  1126259
1928  1152836  1107113
1929  1116284  1074833
1930  1125521  1096663
1931  1064233  1038586
1932  1066930  1043512
1933  1007523   990677
1934  1043879  1031962
1935  1048264  1040649
1936  1040068  1036662
1937  1063722  1065964
1938  1103173  1108480
1939  1096394  1106328
...       ...      ...
    
  

131 rows × 2 columns
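The nested groupby/apply above works, but the same table can be built by grouping on both keys at once and then moving `sex` into the columns. A small sketch on a toy frame (the names and counts in `toy` are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the `names` frame; the counts are made up.
toy = pd.DataFrame({
    "name":   ["Ann", "Bea", "Cal", "Ann", "Cal"],
    "sex":    ["F",   "F",   "M",   "F",   "M"],
    "births": [10,    5,     7,     12,    9],
    "year":   [1880,  1880,  1880,  1881,  1881],
})

# Group on both keys, sum births, then pivot the `sex` level into columns.
by_year_sex = toy.groupby(["year", "sex"])["births"].sum().unstack("sex")
print(by_year_sex)
```

Selecting the `births` column before summing also avoids aggregating the other columns, which the nested version does and then discards.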



In [13]:

    
# how to calculate the total births / year

names.groupby('year').sum().plot(title="total births by year")









    Out[13]:





<matplotlib.axes.AxesSubplot at 0x115fe4e90>



In [14]:

    
names.groupby('year').apply(lambda s: s.groupby('sex').agg('sum')).unstack()['births'].plot(title="births (M/F) by year")









    Out[14]:





<matplotlib.axes.AxesSubplot at 0x10ea330d0>



In [15]:

    
# from book: add prop to names

def add_prop(group):
    # Cast to float first: integer division floors in Python 2
    births = group.births.astype(float)

    group['prop'] = births / births.sum()
    return group

names = names.groupby(['year', 'sex']).apply(add_prop)



In [16]:

    
# verify prop --> all adds up to 1

np.allclose(names.groupby(['year', 'sex']).prop.sum(), 1)









    Out[16]:





True
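An alternative to the groupby/apply version of `add_prop` is to use `transform`, which returns the group total aligned to every row, so the proportion becomes one vectorized division. A sketch on a toy frame (made-up counts), not a replacement for the book's approach:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `names` frame; the counts are made up.
toy = pd.DataFrame({
    "name":   ["Ann", "Bea", "Cal", "Ann", "Cal"],
    "sex":    ["F",   "F",   "M",   "F",   "M"],
    "births": [10,    5,     7,     12,    9],
    "year":   [1880,  1880,  1880,  1881,  1881],
})

# Total births of each (year, sex) group, repeated on every row of the group.
group_totals = toy.groupby(["year", "sex"])["births"].transform("sum")

# Cast to float so the division never floors, as in add_prop above.
toy["prop"] = toy["births"].astype(float) / group_totals

# Same sanity check as in the notebook: every group should sum to 1.
print(np.allclose(toy.groupby(["year", "sex"])["prop"].sum(), 1))
```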



In [17]:

    
# number of records in full names dataframe

len(names)









    Out[17]:





1690784

How to do top1000 calculation

This section on the top1000 calculation is kept here as inspiration for how to work with the baby names data.



In [18]:

    
#  from book: useful to work with top 1000 for each year/sex combo
# can use groupby/apply

# note: sort_index(by=...) was replaced by sort_values(by=...) in pandas 0.17
names.groupby(['year', 'sex']).apply(lambda g: g.sort_values(by='births', ascending=False)[:1000])









    Out[18]:






  
    
      
      
      
                  name sex  births  year      prop
year sex
1880 F   0        Mary   F    7065  1880  0.077643
         1        Anna   F    2604  1880  0.028618
         2        Emma   F    2003  1880  0.022013
         3   Elizabeth   F    1939  1880  0.021309
         4      Minnie   F    1746  1880  0.019188
         5    Margaret   F    1578  1880  0.017342
         6         Ida   F    1472  1880  0.016177
         7       Alice   F    1414  1880  0.015540
         8      Bertha   F    1320  1880  0.014507
         9       Sarah   F    1288  1880  0.014155
         10      Annie   F    1258  1880  0.013825
         11      Clara   F    1226  1880  0.013474
         12       Ella   F    1156  1880  0.012704
         13   Florence   F    1063  1880  0.011682
         14       Cora   F    1045  1880  0.011484
         15     Martha   F    1040  1880  0.011429
         16      Laura   F    1012  1880  0.011122
         17     Nellie   F     995  1880  0.010935
         18      Grace   F     982  1880  0.010792
         19     Carrie   F     949  1880  0.010429
         20      Maude   F     858  1880  0.009429
         21      Mabel   F     808  1880  0.008880
         22     Bessie   F     794  1880  0.008726
         23     Jennie   F     793  1880  0.008715
         24   Gertrude   F     787  1880  0.008649
         25      Julia   F     783  1880  0.008605
         26     Hattie   F     769  1880  0.008451
         27      Edith   F     768  1880  0.008440
         28     Mattie   F     704  1880  0.007737
         29       Rose   F     700  1880  0.007693
         30  Catherine   F     688  1880  0.007561
         31    Lillian   F     672  1880  0.007385
         32        Ada   F     652  1880  0.007165
         33     Lillie   F     647  1880  0.007110
         34      Helen   F     636  1880  0.006990
         35     Jessie   F     635  1880  0.006979
         36     Louise   F     635  1880  0.006979
         37      Ethel   F     633  1880  0.006957
         38       Lula   F     621  1880  0.006825
         39     Myrtle   F     615  1880  0.006759
         40        Eva   F     614  1880  0.006748
         41    Frances   F     605  1880  0.006649
         42       Lena   F     603  1880  0.006627
         43       Lucy   F     591  1880  0.006495
         44       Edna   F     588  1880  0.006462
         45     Maggie   F     582  1880  0.006396
         46      Pearl   F     569  1880  0.006253
         47      Daisy   F     564  1880  0.006198
         48     Fannie   F     560  1880  0.006154
         49  Josephine   F     544  1880  0.005978
         50       Dora   F     524  1880  0.005759
         51       Rosa   F     507  1880  0.005572
         52  Katherine   F     502  1880  0.005517
         53      Agnes   F     473  1880  0.005198
         54      Marie   F     471  1880  0.005176
         55       Nora   F     471  1880  0.005176
         56        May   F     462  1880  0.005077
         57      Mamie   F     436  1880  0.004792
         58    Blanche   F     427  1880  0.004693
         59     Stella   F     414  1880  0.004550
         ...       ... ...     ...   ...       ...
    
  

261877 rows × 5 columns



In [19]:

    
def get_top1000(group):
    # sort_index(by=...) was replaced by sort_values(by=...) in pandas 0.17
    return group.sort_values(by='births', ascending=False)[:1000]

grouped = names.groupby(['year', 'sex'])
top1000 = grouped.apply(get_top1000)
top1000.head()









    Out[19]:






  
    
      
      
      
                  name sex  births  year      prop
year sex
1880 F   0        Mary   F    7065  1880  0.077643
         1        Anna   F    2604  1880  0.028618
         2        Emma   F    2003  1880  0.022013
         3   Elizabeth   F    1939  1880  0.021309
         4      Minnie   F    1746  1880  0.019188
    
  

5 rows × 5 columns
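Another way to get the top N rows per group, without `apply`, is to sort the whole frame once and then take the head of each group. A sketch on a toy frame with N = 2 (the notebook uses 1000; the names and counts here are made up):

```python
import pandas as pd

# Toy stand-in for the `names` frame; the counts are made up.
toy = pd.DataFrame({
    "name":   ["Ann", "Bea", "Dee", "Cal", "Ann", "Cal"],
    "sex":    ["F",   "F",   "F",   "M",   "F",   "M"],
    "births": [10,    5,     3,     7,     12,    9],
    "year":   [1880,  1880,  1880,  1880,  1881,  1881],
})

TOP_N = 2  # the notebook uses 1000

# Sort once by births, then keep the first TOP_N rows of each (year, sex) group.
top_n = (toy.sort_values("births", ascending=False)
            .groupby(["year", "sex"])
            .head(TOP_N))
print(top_n.sort_values(["year", "sex", "births"], ascending=[True, True, False]))
```

Note that this keeps the original flat row index, whereas the `apply` version above produces a hierarchical (year, sex, row) index.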



In [20]:

    
# Do pivot table: row: year and cols= names for top 1000

# note: pandas 0.14 renamed the rows=/cols= arguments to index=/columns=
top_births = top1000.pivot_table('births', index='year', columns='name', aggfunc=np.sum)
top_births.tail()









    Out[20]:






  
    
name  Aaden  Aaliyah  Aarav  Aaron  Aarush   Ab  Abagail  Abb  Abbey  Abbie  Abbigail  Abbott  Abby  Abdiel  Abdul  Abdullah  Abe  Abel  Abelardo  Abigail  ...
year
2006    NaN     3737    NaN   8279     NaN  NaN      297  NaN    404    440       630     NaN  1682     NaN    NaN       219  NaN   922       NaN    15615  ...
2007    NaN     3941    NaN   8914     NaN  NaN      313  NaN    349    468       651     NaN  1573     NaN    NaN       224  NaN   939       NaN    15447  ...
2008    955     4028    219   8511     NaN  NaN      317  NaN    344    400       608     NaN  1328     199    NaN       210  NaN   863       NaN    15045  ...
2009   1265     4352    270   7936     NaN  NaN      296  NaN    307    369       675     NaN  1274     229    NaN       256  NaN   960       NaN    14342  ...
2010    448     4628    438   7374     226  NaN      277  NaN    295    324       585     NaN  1140     264    NaN       225  NaN  1119       NaN    14124  ...
    
  

5 rows × 6865 columns



In [21]:

    
# is your name in the top_births list?

top_births['Raymond'].plot(title='plot for Raymond')









    Out[21]:





<matplotlib.axes.AxesSubplot at 0x113ac1390>



In [22]:

    
# for Aaden, which shows up at the end

top_births.Aaden.plot(xlim=[1880,2010])









    Out[22]:





<matplotlib.axes.AxesSubplot at 0x113aca5d0>



In [23]:

    
# number of names represented in top_births

len(top_births.columns)









    Out[23]:





6865



In [24]:

    
# how to get the most popular name of all time in top_births?

# Series.sort() sorted in place; sort_values() returns the sorted Series
most_common_names = top_births.sum().sort_values(ascending=False)

most_common_names.head()









    Out[24]:





name
James      5071647
John       5060953
Robert     4787187
Michael    4263083
Mary       4117746
dtype: float64



In [25]:

    
# as of mpld3 v0.1 (2014.03.04), the name labeling doesn't work -- so disable mpld3 for this figure

mpld3.disable_notebook()
plt.figure()
most_common_names[:50][::-1].plot(kind='barh', figsize=(10,10))









    Out[25]:





<matplotlib.axes.AxesSubplot at 0x112c5cc10>



In [26]:

    
# turn mpld3 back on

mpld3.enable_notebook()

all_births pivot table



In [27]:

    
# instead of top_births -- get all_births
# note: pandas 0.14 renamed the rows=/cols= arguments to index=/columns=

all_births = names.pivot_table('births', index='year', columns='name', aggfunc=sum)



In [28]:

    
all_births = all_births.fillna(0)
all_births.tail()









    Out[28]:






  
    
name  Aaban  Aabid  Aabriella  Aadam  Aadan  Aadarsh  Aaden  Aadesh  Aadhav  Aadhavan  Aadhya  Aadi  Aadil  Aadin  Aadison  Aadit  Aadith  Aaditri  Aaditya  Aadon  ...
year
2006      0      0          0      9      0       14     55       0       5         0       0    74     11      0        0     17       0        0       42      7  ...
2007      5      0          0      8      8       13    155       0       0         0      10    72     15     10        0     31       7        0       43     10  ...
2008      0      0          5      6     22       13    955       0       0         0       9    76     20     22        0     24       5        0       51     10  ...
2009      6      0          0      9     23       16   1270       5       5         0      18    76     17     25        6     12       0        0       38     23  ...
2010      9      0          0      7     11        0    448       0      13         5      19    54     11     18        0     23       0        5       37      0  ...
    
  

5 rows × 88496 columns



In [29]:

    
# set up to do start/end calculation

all_births_cumsum = all_births.apply(lambda s: s.cumsum(), axis=0)



In [30]:

    
all_births_cumsum.tail()









    Out[30]:






  
    
name  Aaban  Aabid  Aabriella  Aadam  Aadan  Aadarsh  Aaden  Aadesh  Aadhav  Aadhavan  Aadhya  Aadi  Aadil  Aadin  Aadison  Aadit  Aadith  Aaditri  Aaditya  Aadon  ...
year
2006      0      5          0    103      5       67    149       5      11         0       0   171    175     10        0     67       5        0      153     18  ...
2007      5      5          0    111     13       80    304       5      11         0      10   243    190     20        0     98      12        0      196     28  ...
2008      5      5          5    117     35       93   1259       5      11         0      19   319    210     42        0    122      17        0      247     38  ...
2009     11      5          5    126     58      109   2529      10      16         0      37   395    227     67        6    134      17        0      285     61  ...
2010     20      5          5    133     69      109   2977      10      29         5      56   449    238     85        6    157      17        5      322     61  ...
    
  

5 rows × 88496 columns
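One way the cumulative sums can feed a start/end calculation: a name "starts" in the first year its running total is above zero, and "ends" in the first year the running total reaches its final value. A sketch on a toy table shaped like `all_births` (made-up counts); it assumes every name has at least one birth, which holds for this dataset since each record has at least 5 births.

```python
import pandas as pd

# Toy year-by-name table shaped like `all_births`; the counts are made up.
toy_births = pd.DataFrame(
    {"Ann": [0, 4, 6, 0], "Cal": [3, 0, 0, 0]},
    index=pd.Index([1880, 1881, 1882, 1883], name="year"),
)
toy_cumsum = toy_births.cumsum()

# First year with any births: first year where the running total is > 0.
# idxmax returns the first index label at which the maximum (here 1) occurs.
first_year = (toy_cumsum > 0).astype(int).idxmax()

# Last year with any births: first year where the running total
# has already reached its final value.
last_year = (toy_cumsum == toy_cumsum.iloc[-1]).astype(int).idxmax()

print(pd.DataFrame({"first": first_year, "last": last_year}))
```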

Names that are both M and F



In [31]:

    
# remind ourselves of what's in names

names.head()









    Out[31]:






  
    
      
        name sex  births  year      prop
0       Mary   F    7065  1880  0.077643
1       Anna   F    2604  1880  0.028618
2       Emma   F    2003  1880  0.022013
3  Elizabeth   F    1939  1880  0.021309
4     Minnie   F    1746  1880  0.019188
    
  

5 rows × 5 columns



In [32]:

    
# columns in names

names.columns









    Out[32]:





Index([u'name', u'sex', u'births', u'year', u'prop'], dtype='object')

Approach to exploring ambigendered names

Some things to think about:

calculate a set of ambi_names -- names that are both M and F in the database: names_ambi
calculate a pivot table ambi_names_pt that uses a hierarchical index name/sex vs years
for a specific name, make a plot of male vs female population to validate your approach
think of using cumulative vs year-by-year instantaneous populations
think about metrics for measuring the sex shift of names
think about how to calculate how ambigendered a name is
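To make the first few hints concrete, here is one possible sketch on a toy frame. The counts are made up, and the two metrics (`frac_female` and `ambi_score`) are my own assumptions about what a reasonable measure could look like, not the expected answer to the exercise.

```python
import pandas as pd

# Toy stand-in for the `names` frame; the counts are made up.
toy = pd.DataFrame({
    "name":   ["Leslie", "Leslie", "Leslie", "Leslie", "Mary", "John"],
    "sex":    ["M",      "F",      "M",      "F",      "F",    "M"],
    "births": [80,       20,       30,       70,       100,    90],
    "year":   [1900,     1900,     2000,     2000,     1900,   1900],
})

# Names recorded under both sexes: count the distinct sexes per name.
sexes_per_name = toy.groupby("name")["sex"].nunique()
ambi_names = set(sexes_per_name[sexes_per_name == 2].index)

# Pivot table with a hierarchical (name, sex) index and years as columns.
ambi_rows = toy[toy["name"].isin(ambi_names)]
ambi_names_pt = ambi_rows.pivot_table(
    values="births", index=["name", "sex"], columns="year", aggfunc="sum"
).fillna(0)

# Assumed metric 1: fraction of births recorded as female, per name and year.
female = ambi_names_pt.xs("F", level="sex")
male = ambi_names_pt.xs("M", level="sex")
frac_female = female / (female + male)

# Assumed metric 2: 1 when the split is even, 0 when only one sex appears.
ambi_score = 1 - (2 * frac_female - 1).abs()

print(frac_female)
print(ambi_score)
```

Plotting one row of `frac_female` against the year is a quick way to see whether a name has shifted from one sex to the other. Years with no births for a name come out as NaN because the denominator is zero.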

Exercise

Submit a notebook that describes what you've learned about the nature of ambigendered names in the baby names database. (Due date: Monday, March 10 at 11:5pm --> bCourses assignment to come.) I'm interested in seeing what you do with the data set in this regard. At the minimum, show that you are able to run Day_13_C_Baby_Names_MF_Completed. Be creative and have fun.


