Goals

This notebook focuses on baby names that have been given to both male and female babies.



In [1]:

    
%matplotlib inline



In [2]:

    
import matplotlib.pyplot as plt
import numpy as np

from pylab import figure, show

from pandas import DataFrame, Series
import pandas as pd



In [3]:

    
try:
    import mpld3
    from mpld3 import enable_notebook
    from mpld3 import plugins
    enable_notebook()
except Exception as e:
    print "Attempt to import and enable mpld3 failed", e



In [4]:

    
# what would seaborn do?
try:
    import seaborn as sns
except Exception as e:
    print "Attempt to import and enable seaborn failed", e









    




Preliminaries: Assumed location of pydata-book files

To make it more practical for me to look at your homework, I'm again going to assume a relative placement of files. I placed the files from

https://github.com/pydata/pydata-book

in a local directory, which in my case is "/Users/raymondyee/D/Document/Working_with_Open_Data/pydata-book/"

and then symbolically linked (ln -s) to the pydata-book directory from the root directory of the working-open-data folder, i.e., on OS X:

cd /Users/raymondyee/D/Document/Working_with_Open_Data/working-open-data
ln -s /Users/raymondyee/D/Document/Working_with_Open_Data/pydata-book/ pydata-book

That way the files from the pydata-book repository look like they sit in the working-open-data directory -- without having to actually copy the files.

With this arrangement, I should be able to drop your notebook into my own notebooks directory and run it without having to mess around with paths.



In [5]:

    
import os

NAMES_DIR = os.path.join(os.pardir, "pydata-book", "ch02", "names")

assert os.path.exists(NAMES_DIR)

Please make sure the above assertion passes.

Baby names dataset

Discussed on p. 35 of the PfDA book (Python for Data Analysis).

To download all the data, including that for 2011 and 2012, see Popular Baby Names, which also includes state-by-state data.

Loading all data into Pandas



In [6]:

    
# show the first five files in the NAMES_DIR

import glob
glob.glob(NAMES_DIR + "/*")[:5]









    Out[6]:





['../pydata-book/ch02/names/NationalReadMe.pdf',
 '../pydata-book/ch02/names/yob1880.txt',
 '../pydata-book/ch02/names/yob1881.txt',
 '../pydata-book/ch02/names/yob1882.txt',
 '../pydata-book/ch02/names/yob1883.txt']



In [7]:

    
# 2010 is the last available year in the pydata-book repo
import os

years = range(1880, 2011)

pieces = []
columns = ['name', 'sex', 'births']

for year in years:
    path = os.path.join(NAMES_DIR, 'yob%d.txt' % year)
    frame = pd.read_csv(path, names=columns)

    frame['year'] = year
    pieces.append(frame)

# Concatenate everything into a single DataFrame
names = pd.concat(pieces, ignore_index=True)

# why floats? describe() reports summary statistics such as mean and std, which are floats
names.describe()









    Out[7]:






  
    
      
               births            year
count  1690784.000000  1690784.000000
mean       190.682386     1969.454384
std       1615.899711       32.823526
min          5.000000     1880.000000
25%          7.000000     1946.000000
50%         12.000000     1979.000000
75%         32.000000     1997.000000
max      99651.000000     2010.000000
    
  

8 rows × 2 columns
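
The loading loop can be tried without the pydata-book checkout. Below is a minimal sketch, assuming a recent pandas on Python 3 and two hypothetical in-memory "files" (`fake_files` and its toy rows are made up for illustration and are not SSA data). It follows the same read, tag with year, and concatenate pattern as the cell above.

```python
import io

import pandas as pd

# Hypothetical stand-ins for yob1880.txt and yob1881.txt (toy rows, not real data).
fake_files = {
    1880: "Mary,F,10\nJohn,M,8\n",
    1881: "Mary,F,12\nJohn,M,9\nAnna,F,5\n",
}

columns = ['name', 'sex', 'births']
pieces = []
for year in sorted(fake_files):
    # The files have no header row, so the column names are passed explicitly.
    frame = pd.read_csv(io.StringIO(fake_files[year]), names=columns)
    frame['year'] = year
    pieces.append(frame)

# ignore_index=True replaces each file's own 0..n-1 row labels with one running index.
toy_names = pd.concat(pieces, ignore_index=True)
print(toy_names)
```

Without `ignore_index=True`, the row labels would restart at 0 for every year, which makes label-based lookups ambiguous.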



In [8]:

    
# how many people, names, males and females are represented in names?

names.births.sum()









    Out[8]:





322402727



In [9]:

    
# F vs M

names.groupby('sex')['births'].sum()









    Out[9]:





sex
F      159990140
M      162412587
Name: births, dtype: int64



In [10]:

    
# total number of names

len(names.groupby('name'))









    Out[10]:





88496



In [11]:

    
# use pivot_table to collect records by year (rows) and sex (columns)

total_births = names.pivot_table('births', rows='year', cols='sex', aggfunc=sum)
total_births.head()









    Out[11]:






  
    
sex        F       M
year
1880   90993  110493
1881   91955  100748
1882  107851  113687
1883  112322  104632
1884  129021  114445
    
  

5 rows × 2 columns
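
In later pandas releases the `rows` and `cols` keywords of `pivot_table` were renamed to `index` and `columns`. The following is a minimal sketch of the same year-by-sex pivot under that newer spelling, assuming a recent pandas; the `toy` frame and its values are invented for illustration.

```python
import pandas as pd

# Toy records in the same layout as `names` (illustrative values only).
toy = pd.DataFrame({
    'name':   ['Mary', 'John', 'Mary', 'John', 'Anna'],
    'sex':    ['F', 'M', 'F', 'M', 'F'],
    'births': [10, 8, 12, 9, 5],
    'year':   [1880, 1880, 1881, 1881, 1881],
})

# Rows are years, columns are sexes, and each cell is the sum of births.
total_births_toy = toy.pivot_table('births', index='year', columns='sex', aggfunc='sum')
print(total_births_toy)
```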



In [12]:

    
# You can use groupby to get the equivalent pivot_table calculation

names.groupby('year').apply(lambda s: s.groupby('sex').agg('sum')).unstack()['births']









    Out[12]:






  
    
sex         F        M
year
1880    90993   110493
1881    91955   100748
1882   107851   113687
1883   112322   104632
1884   129021   114445
1885   133056   107802
1886   144538   110785
1887   145983   101412
1888   178631   120857
1889   178369   110590
1890   190377   111026
1891   185486   101198
1892   212350   122038
1893   212908   112319
1894   222923   115775
1895   233632   117398
1896   237924   119575
1897   234199   112760
1898   258771   122703
1899   233022   106218
1900   299873   150554
1901   239351   106478
1902   264079   122660
1903   261976   119240
1904   275375   128129
1905   291641   132319
1906   295301   133159
1907   318558   146838
1908   334277   154339
1909   347191   163983
1910   396416   194198
1911   418180   225936
1912   557939   429926
1913   624317   512482
1914   761376   654746
1915   983824   848647
1916  1044249   890142
1917  1081194   925512
1918  1157585  1013720
1919  1130149   980215
1920  1198214  1064468
1921  1232845  1101374
1922  1200796  1088380
1923  1206239  1096227
1924  1248821  1132671
1925  1217217  1115798
1926  1185078  1110440
1927  1192207  1126259
1928  1152836  1107113
1929  1116284  1074833
1930  1125521  1096663
1931  1064233  1038586
1932  1066930  1043512
1933  1007523   990677
1934  1043879  1031962
1935  1048264  1040649
1936  1040068  1036662
1937  1063722  1065964
1938  1103173  1108480
1939  1096394  1106328
...       ...      ...
    
  

131 rows × 2 columns
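
The nested groupby/apply above can usually be collapsed into a single groupby on both keys followed by `unstack`. This is a sketch on an invented toy frame (the values are illustrative), assuming a recent pandas, and is offered as one alternative rather than the notebook's own method.

```python
import pandas as pd

# Toy records in the same layout as `names` (illustrative values only).
toy = pd.DataFrame({
    'name':   ['Mary', 'John', 'Mary', 'John', 'Anna'],
    'sex':    ['F', 'M', 'F', 'M', 'F'],
    'births': [10, 8, 12, 9, 5],
    'year':   [1880, 1880, 1881, 1881, 1881],
})

# Sum births per (year, sex) pair, then move the 'sex' level into the columns.
by_year_sex = toy.groupby(['year', 'sex'])['births'].sum().unstack('sex')
print(by_year_sex)
```

Selecting `['births']` before aggregating avoids summing columns that are not needed, such as the name strings.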



In [13]:

    
import seaborn



In [14]:

    
# how to calculate the total births / year

names.groupby('year').sum().plot(title="total births by year")









    Out[14]:





<matplotlib.axes.AxesSubplot at 0x114842450>



In [15]:

    
names.groupby('year').apply(lambda s: s.groupby('sex').agg('sum')).unstack()['births'].plot(title="births (M/F) by year")









    Out[15]:





<matplotlib.axes.AxesSubplot at 0x110af8ed0>



In [16]:

    
# from book: add prop to names

def add_prop(group):
    # Integer division floors
    births = group.births.astype(float)

    group['prop'] = births / births.sum()
    return group

names = names.groupby(['year', 'sex']).apply(add_prop)



In [17]:

    
# verify prop --> all adds up to 1

np.allclose(names.groupby(['year', 'sex']).prop.sum(), 1)









    Out[17]:





True
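
The proportion column can also be computed without a Python-level function per group, using `transform` to broadcast each group's total back onto its rows. A minimal sketch on invented toy data, assuming a recent pandas on Python 3 (where `/` is true division, so no cast to float is needed):

```python
import numpy as np
import pandas as pd

# Toy records in the same layout as `names` (illustrative values only).
toy = pd.DataFrame({
    'name':   ['Mary', 'John', 'Mary', 'John', 'Anna'],
    'sex':    ['F', 'M', 'F', 'M', 'F'],
    'births': [10, 8, 12, 9, 5],
    'year':   [1880, 1880, 1881, 1881, 1881],
})

# transform('sum') returns a Series aligned with `toy`, holding each row's group total.
group_totals = toy.groupby(['year', 'sex'])['births'].transform('sum')
toy['prop'] = toy['births'] / group_totals

# Same sanity check as in the notebook: proportions sum to 1 within each group.
props_ok = np.allclose(toy.groupby(['year', 'sex'])['prop'].sum(), 1)
print(toy)
print(props_ok)
```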



In [18]:

    
# number of records in full names dataframe

len(names)









    Out[18]:





1690784

How to do top1000 calculation

This section on the top1000 calculation is kept here to provide some inspiration on how to work with the baby names data.



In [19]:

    
#  from book: useful to work with top 1000 for each year/sex combo
# can use groupby/apply

names.groupby(['year', 'sex']).apply(lambda g: g.sort_index(by='births', ascending=False)[:1000])









    Out[19]:






  
    
      
      
      
                  name sex  births  year      prop
year sex
1880 F   0        Mary   F    7065  1880  0.077643
         1        Anna   F    2604  1880  0.028618
         2        Emma   F    2003  1880  0.022013
         3   Elizabeth   F    1939  1880  0.021309
         4      Minnie   F    1746  1880  0.019188
         5    Margaret   F    1578  1880  0.017342
         6         Ida   F    1472  1880  0.016177
         7       Alice   F    1414  1880  0.015540
         8      Bertha   F    1320  1880  0.014507
         9       Sarah   F    1288  1880  0.014155
         10      Annie   F    1258  1880  0.013825
         11      Clara   F    1226  1880  0.013474
         12       Ella   F    1156  1880  0.012704
         13   Florence   F    1063  1880  0.011682
         14       Cora   F    1045  1880  0.011484
         15     Martha   F    1040  1880  0.011429
         16      Laura   F    1012  1880  0.011122
         17     Nellie   F     995  1880  0.010935
         18      Grace   F     982  1880  0.010792
         19     Carrie   F     949  1880  0.010429
         20      Maude   F     858  1880  0.009429
         21      Mabel   F     808  1880  0.008880
         22     Bessie   F     794  1880  0.008726
         23     Jennie   F     793  1880  0.008715
         24   Gertrude   F     787  1880  0.008649
         25      Julia   F     783  1880  0.008605
         26     Hattie   F     769  1880  0.008451
         27      Edith   F     768  1880  0.008440
         28     Mattie   F     704  1880  0.007737
         29       Rose   F     700  1880  0.007693
         30  Catherine   F     688  1880  0.007561
         31    Lillian   F     672  1880  0.007385
         32        Ada   F     652  1880  0.007165
         33     Lillie   F     647  1880  0.007110
         34      Helen   F     636  1880  0.006990
         35     Jessie   F     635  1880  0.006979
         36     Louise   F     635  1880  0.006979
         37      Ethel   F     633  1880  0.006957
         38       Lula   F     621  1880  0.006825
         39     Myrtle   F     615  1880  0.006759
         40        Eva   F     614  1880  0.006748
         41    Frances   F     605  1880  0.006649
         42       Lena   F     603  1880  0.006627
         43       Lucy   F     591  1880  0.006495
         44       Edna   F     588  1880  0.006462
         45     Maggie   F     582  1880  0.006396
         46      Pearl   F     569  1880  0.006253
         47      Daisy   F     564  1880  0.006198
         48     Fannie   F     560  1880  0.006154
         49  Josephine   F     544  1880  0.005978
         50       Dora   F     524  1880  0.005759
         51       Rosa   F     507  1880  0.005572
         52  Katherine   F     502  1880  0.005517
         53      Agnes   F     473  1880  0.005198
         54      Marie   F     471  1880  0.005176
         55       Nora   F     471  1880  0.005176
         56        May   F     462  1880  0.005077
         57      Mamie   F     436  1880  0.004792
         58    Blanche   F     427  1880  0.004693
         59     Stella   F     414  1880  0.004550
                   ... ...     ...   ...       ...
    
  

261877 rows × 5 columns



In [20]:

    
def get_top1000(group):
    return group.sort_index(by='births', ascending=False)[:1000]

grouped = names.groupby(['year', 'sex'])
top1000 = grouped.apply(get_top1000)
top1000.head()









    Out[20]:






  
    
      
      
      
                 name sex  births  year      prop
year sex
1880 F   0       Mary   F    7065  1880  0.077643
         1       Anna   F    2604  1880  0.028618
         2       Emma   F    2003  1880  0.022013
         3  Elizabeth   F    1939  1880  0.021309
         4     Minnie   F    1746  1880  0.019188
    
  

5 rows × 5 columns
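
`sort_index(by=...)` is the spelling used by the pandas version in this notebook; newer releases use `sort_values`. As a sketch of one alternative way to take the top N rows per year/sex group, assuming a recent pandas, the frame can be sorted once and then cut with `groupby(...).head(N)`. The `toy` frame and `TOP_N` are hypothetical names with invented values.

```python
import pandas as pd

# Toy records in the same layout as `names` (illustrative values only).
toy = pd.DataFrame({
    'name':   ['Mary', 'Anna', 'Emma', 'John', 'William', 'Mary', 'Anna'],
    'sex':    ['F', 'F', 'F', 'M', 'M', 'F', 'F'],
    'births': [10, 7, 3, 8, 6, 12, 5],
    'year':   [1880, 1880, 1880, 1880, 1880, 1881, 1881],
})

TOP_N = 2  # stands in for the 1000 used in the notebook

# Sort all rows by births (descending), then keep the first TOP_N rows of each group.
top_toy = (toy.sort_values('births', ascending=False)
              .groupby(['year', 'sex'])
              .head(TOP_N))
print(top_toy)
```

Unlike the `apply`-based version, this keeps the original flat row index instead of adding `year` and `sex` index levels, so downstream code that relies on those levels would need adjusting.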



In [21]:

    
# pivot table for the top 1000: rows = year, columns = name

top_births = top1000.pivot_table('births', rows='year', cols='name', aggfunc=np.sum)
top_births.tail()









    Out[21]:






  
    
name  Aaden  Aaliyah  Aarav  Aaron  Aarush   Ab  Abagail  Abb  Abbey  Abbie  \
year
2006    NaN     3737    NaN   8279     NaN  NaN      297  NaN    404    440
2007    NaN     3941    NaN   8914     NaN  NaN      313  NaN    349    468
2008    955     4028    219   8511     NaN  NaN      317  NaN    344    400
2009   1265     4352    270   7936     NaN  NaN      296  NaN    307    369
2010    448     4628    438   7374     226  NaN      277  NaN    295    324

name  Abbigail  Abbott  Abby  Abdiel  Abdul  Abdullah  Abe  Abel  Abelardo  Abigail  ...
year
2006       630     NaN  1682     NaN    NaN       219  NaN   922       NaN    15615  ...
2007       651     NaN  1573     NaN    NaN       224  NaN   939       NaN    15447  ...
2008       608     NaN  1328     199    NaN       210  NaN   863       NaN    15045  ...
2009       675     NaN  1274     229    NaN       256  NaN   960       NaN    14342  ...
2010       585     NaN  1140     264    NaN       225  NaN  1119       NaN    14124  ...
    
  

5 rows × 6865 columns



In [22]:

    
# is your name in the top_births list?

top_births['Raymond'].plot(title='plot for Raymond')









    Out[22]:





<matplotlib.axes.AxesSubplot at 0x113383910>



In [23]:

    
# for Aaden, which shows up at the end

top_births.Aaden.plot(xlim=[1880,2010])









    Out[23]:





<matplotlib.axes.AxesSubplot at 0x112435690>



In [24]:

    
# number of names represented in top_births

len(top_births.columns)









    Out[24]:





6865



In [25]:

    
# how to get the most popular name of all time in top_births?

most_common_names = top_births.sum()
most_common_names.sort(ascending=False)

most_common_names.head()









    Out[25]:





name
James      5071647
John       5060953
Robert     4787187
Michael    4263083
Mary       4117746
dtype: float64
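
The in-place `Series.sort` used above belongs to the pandas version of this notebook; later releases removed it in favour of `sort_values`, which returns a new Series. A minimal sketch with an invented `totals` Series, assuming a recent pandas:

```python
import pandas as pd

# Hypothetical per-name totals (illustrative values only).
totals = pd.Series({'James': 30, 'Mary': 45, 'John': 40, 'Anna': 12})

# sort_values returns a sorted copy; the original Series is left unchanged.
ranked = totals.sort_values(ascending=False)

# nlargest is a shortcut when only the top few entries are needed.
top2 = totals.nlargest(2)
print(ranked)
print(top2)
```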



In [26]:

    
# as of mpl v 0.1 (2014.03.04), the name labeling doesn't work -- so disable mpld3 for this figure

mpld3.disable_notebook()
plt.figure()
most_common_names[:50][::-1].plot(kind='barh', figsize=(10,10))









    Out[26]:





<matplotlib.axes.AxesSubplot at 0x112a25b90>



In [27]:

    
# turn mpld3 back on

mpld3.enable_notebook()

all_births pivot table



In [28]:

    
# instead of top_births -- get all_births

all_births = names.pivot_table('births', rows='year', cols='name', aggfunc=sum)



In [29]:

    
all_births = all_births.fillna(0)
all_births.tail()









    Out[29]:






  
    
name  Aaban  Aabid  Aabriella  Aadam  Aadan  Aadarsh  Aaden  Aadesh  Aadhav  Aadhavan  \
year
2006      0      0          0      9      0       14     55       0       5         0
2007      5      0          0      8      8       13    155       0       0         0
2008      0      0          5      6     22       13    955       0       0         0
2009      6      0          0      9     23       16   1270       5       5         0
2010      9      0          0      7     11        0    448       0      13         5

name  Aadhya  Aadi  Aadil  Aadin  Aadison  Aadit  Aadith  Aaditri  Aaditya  Aadon  ...
year
2006       0    74     11      0        0     17       0        0       42      7  ...
2007      10    72     15     10        0     31       7        0       43     10  ...
2008       9    76     20     22        0     24       5        0       51     10  ...
2009      18    76     17     25        6     12       0        0       38     23  ...
2010      19    54     11     18        0     23       0        5       37      0  ...
    
  

5 rows × 88496 columns



In [31]:

    
# set up to do start/end calculation

all_births_cumsum = all_births.apply(lambda s: s.cumsum(), axis=0)



In [32]:

    
all_births_cumsum.tail()









    Out[32]:






  
    
name  Aaban  Aabid  Aabriella  Aadam  Aadan  Aadarsh  Aaden  Aadesh  Aadhav  Aadhavan  \
year
2006      0      5          0    103      5       67    149       5      11         0
2007      5      5          0    111     13       80    304       5      11         0
2008      5      5          5    117     35       93   1259       5      11         0
2009     11      5          5    126     58      109   2529      10      16         0
2010     20      5          5    133     69      109   2977      10      29         5

name  Aadhya  Aadi  Aadil  Aadin  Aadison  Aadit  Aadith  Aaditri  Aaditya  Aadon  ...
year
2006       0   171    175     10        0     67       5        0      153     18  ...
2007      10   243    190     20        0     98      12        0      196     28  ...
2008      19   319    210     42        0    122      17        0      247     38  ...
2009      37   395    227     67        6    134      17        0      285     61  ...
2010      56   449    238     85        6    157      17        5      322     61  ...
    
  

5 rows × 88496 columns
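
The notebook does not show the start/end calculation that this cumulative table sets up. One possible reading, given here purely as an assumption, is "first and last year in which a name was recorded". Under that assumption, a minimal sketch on an invented `toy_births` table (recent pandas assumed) could look like this:

```python
import pandas as pd

# Toy year-by-name table in the same layout as all_births (illustrative values only).
toy_births = pd.DataFrame(
    {'Aaden': [0, 0, 5, 9], 'Mary': [7, 6, 0, 0], 'Leslie': [0, 4, 3, 0]},
    index=pd.Index([2007, 2008, 2009, 2010], name='year'),
)

# DataFrame.cumsum works column by column, like apply(lambda s: s.cumsum(), axis=0).
toy_cumsum = toy_births.cumsum()

# First year: the first row where the running total becomes positive.
first_year = toy_cumsum.gt(0).idxmax()

# Last year: the first row where the running total reaches its final value.
last_year = toy_cumsum.eq(toy_cumsum.iloc[-1]).idxmax()

print(first_year)
print(last_year)
```

`idxmax` on a boolean table returns the label of the first `True` in each column. A column that is zero throughout would report the first year for both results, so such columns would need to be filtered out first; in the real data every listed name has at least 5 births in some year.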

Names that are both M and F



In [33]:

    
# remind ourselves of what's in names

names.head()









    Out[33]:






  
    
      
        name sex  births  year      prop
0       Mary   F    7065  1880  0.077643
1       Anna   F    2604  1880  0.028618
2       Emma   F    2003  1880  0.022013
3  Elizabeth   F    1939  1880  0.021309
4     Minnie   F    1746  1880  0.019188
    
  

5 rows × 5 columns



In [34]:

    
# columns in names

names.columns









    Out[34]:





Index([u'name', u'sex', u'births', u'year', u'prop'], dtype='object')

Calculating ambigendered names



In [35]:

    
# calculate set of male_only, female_only, ambigender names

def calc_of_sex_of_names():

    k = names.groupby('sex').apply(lambda s: set(list(s['name'])))
    male_only_names = k['M'] - k['F']
    female_only_names = k['F'] - k['M']
    ambi_names = k['F'] & k['M'] # intersection of two 
    return {'male_only_names': male_only_names, 
            'female_only_names': female_only_names,
            'ambi_names': ambi_names }
    
names_by_sex = calc_of_sex_of_names() 
ambi_names_array = np.array(list(names_by_sex['ambi_names']))

[(k, len(v)) for (k,v) in names_by_sex.items()]









    Out[35]:





[('female_only_names', 51754),
 ('male_only_names', 27090),
 ('ambi_names', 9652)]
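
The set arithmetic in this cell is easier to follow on a handful of rows. The sketch below uses an invented `toy` frame (illustrative values only) and assumes a recent pandas; it builds the three sets with boolean masks and then filters the frame to the shared names with `Series.isin`, which is one alternative to `np.in1d`.

```python
import pandas as pd

# Toy records in the same layout as `names` (illustrative values only).
toy = pd.DataFrame({
    'name':   ['Leslie', 'Leslie', 'Mary', 'John', 'Mary'],
    'sex':    ['F', 'M', 'F', 'M', 'F'],
    'births': [4, 6, 10, 8, 12],
    'year':   [1880, 1880, 1880, 1880, 1881],
})

female = set(toy.loc[toy['sex'] == 'F', 'name'])
male = set(toy.loc[toy['sex'] == 'M', 'name'])

ambi = female & male          # names recorded for both sexes
female_only = female - male   # names recorded only for F
male_only = male - female     # names recorded only for M

# Keep only the rows whose name appears in the shared set.
toy_ambi = toy[toy['name'].isin(sorted(ambi))]
print(ambi, female_only, male_only)
print(toy_ambi)
```

A name counts as shared here if it appears under both sexes in any year, even once, which is why the shared set in the real data covers most births.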



In [36]:

    
# total number of people in names
names.births.sum()









    Out[36]:





322402727



In [37]:

    
# pivot table of ambigendered names to aggregate 

names_ambi = names[np.in1d(names.name,ambi_names_array)]
ambi_names_pt = names_ambi.pivot_table('births',
                            rows='year', 
                            cols=['name','sex'], 
                            aggfunc='sum')



In [38]:

    
# total number of people in ambi_names_pt -- almost everyone! (about 93% of the 322402727 births in names)

ambi_names_pt.sum().sum()









    Out[38]:





299879378.0



In [40]:

    
# fill n/a with 0 and look at the table at the end

ambi_names_pt=ambi_names_pt.fillna(0L)
ambi_names_pt.tail()









    Out[40]:






  
    
name   Aaden            Aadi           Aadyn         Aalijah         Aaliyah          \
sex        F       M       F       M       F       M       F       M       F       M
year
2006       0      55       5      69       0      16       5       5    3737       0
2007       0     155       0      72       0      27       8      10    3941       0
2008       0     955       0      76       9      56       5      15    4028       0
2009       5    1265       0      76       7      76       7      12    4352       0
2010       0     448       0      54       0      38       0      15    4628       6

name  Aamari           Aaren          Aareon          Aarian           Aarin          ...
sex        F       M       F       M       F       M       F       M       F       M  ...
year
2006       5       0       0      26       0       0       0       0      10      12  ...
2007      10      10       0      26       0       5       0       6       6      20  ...
2008       5       9       0      29       0       0       0       0       9      16  ...
2009       0       8       0      28       0       6       6       7       0      19  ...
2010       8       8       5      30       0      11       0       5       7      21  ...
    
  

5 rows × 19304 columns



In [41]:

    
# cumulative M births among ambigendered names, by year
ambi_names_pt.T.xs('M',level='sex').sum().cumsum()









    Out[41]:





year
1880     106651
1881     204087
1882     313916
1883     415179
1884     525828
1885     630369
1886     737903
1887     836292
1888     953442
1889    1060938
1890    1168749
1891    1267012
1892    1385329
1893    1494578
1894    1606974
...
1996    130749651
1997    132512323
1998    134292862
1999    136076209
2000    137891972
2001    139680202
2002    141462692
2003    143272468
2004    145083717
2005    146897504
2006    148753457
2007    150617962
2008    152440177
2009    154199909
2010    155887704
Length: 131, dtype: float64



In [42]:

    
ambi_names_pt.T.xs('F',level='sex').sum().cumsum()









    Out[42]:





year
1880      85843
1881     172815
1882     274572
1883     380536
1884     501868
1885     626787
1886     762777
1887     899953
1888    1067632
1889    1235155
1890    1413925
1891    1588115
1892    1787380
1893    1987334
1894    2196434
...
1996    123666074
1997    125136589
1998    126618849
1999    128096077
2000    129593542
2001    131064314
2002    132524950
2003    133994760
2004    135461519
2005    136920409
2006    138398341
2007    139872628
2008    141304691
2009    142678345
2010    143991674
Length: 131, dtype: float64



In [43]:

    
# don't know why the pivot table has type float; possibly because missing cells become NaN, which forces float64
# https://github.com/pydata/pandas/issues/3283
ambi_names_pt['Raymond', 'M'].dtype









    Out[43]:





dtype('float64')



In [44]:

    
# calculate proportion of males for given name

def prop_male(name):
    return (ambi_names_pt[name]['M']/ \
    ((ambi_names_pt[name]['M'] + ambi_names_pt[name]['F'])))

def prop_c_male(name):
    return (ambi_names_pt[name]['M'].cumsum()/ \
    ((ambi_names_pt[name]['M'].cumsum() + ambi_names_pt[name]['F'].cumsum())))



In [45]:

    
prop_c_male('Leslie').plot()









    Out[45]:





<matplotlib.axes.AxesSubplot at 0x112a0c7d0>



In [46]:

    
# I couldn't figure out a way of iterating over the names rather than names/sex combo in
# a vectorized way.  

from itertools import islice

names_to_calc = list(islice(list(ambi_names_pt.T.index.levels[0]),None))

m = [(name_, ambi_names_pt[name_]['M']/(ambi_names_pt[name_]['F'] + ambi_names_pt[name_]['M']))  \
     for name_ in names_to_calc]
p_m_instant = DataFrame(dict(m))
p_m_instant.tail()









    Out[46]:






  
    
      
         Aaden      Aadi     Aadyn   Aalijah   Aaliyah    Aamari     Aaren    Aareon    Aarian     Aarin  \
year
2006  1.000000  0.932432  1.000000  0.500000  0.000000  0.000000  1.000000       NaN       NaN  0.545455
2007  1.000000  1.000000  1.000000  0.555556  0.000000  0.500000  1.000000         1  1.000000  0.769231
2008  1.000000  1.000000  0.861538  0.750000  0.000000  0.642857  1.000000       NaN       NaN  0.640000
2009  0.996063  1.000000  0.915663  0.631579  0.000000  1.000000  1.000000         1  0.538462  1.000000
2010  1.000000  1.000000  1.000000  1.000000  0.001295  0.500000  0.857143         1  1.000000  0.750000

        Aarion     Aaris     Aaron     Aarya     Aaryn       Aba      Abba     Abbey     Abbie  Abbigail  ...
year
2006  0.730769  1.000000  0.997109  0.481481  0.595745         1       NaN         0         0         0  ...
2007  0.794118  0.454545  0.997426  0.240506  0.518519       NaN       NaN         0         0         0  ...
2008  0.666667       NaN  0.996604  0.213333  0.480519       NaN       NaN         0         0         0  ...
2009  0.750000       NaN  0.995984  0.247312  0.406250       NaN       NaN         0         0         0  ...
2010  1.000000       NaN  0.996891  0.265306  0.340000       NaN       NaN         0         0         0  ...
    
  

5 rows × 9652 columns



In [47]:

    
# similar calculation except instead of looking at the proportions for a given year only,
# we look at the cumulative number of male/female babies for given name

from itertools import islice

names_to_calc = list(islice(list(ambi_names_pt.T.index.levels[0]),None))

m = [(name_, ambi_names_pt[name_]['M'].cumsum()/(ambi_names_pt[name_]['F'].cumsum() + ambi_names_pt[name_]['M'].cumsum()))  \
     for name_ in names_to_calc]
p_m_cum = DataFrame(dict(m))
p_m_cum.tail()









    Out[47]:






  
    
      
         Aaden      Aadi     Aadyn   Aalijah   Aaliyah    Aamari     Aaren    Aareon    Aarian     Aarin  \
year
2006  1.000000  0.970760  1.000000  0.461538  0.001677  0.289474  0.650694  0.500000  0.238095  0.500000
2007  1.000000  0.979424  1.000000  0.477064  0.001494  0.362069  0.661783  0.600000  0.407407  0.512727
2008  1.000000  0.984326  0.934783  0.519380  0.001344  0.416667  0.673349  0.600000  0.407407  0.518261
2009  0.998023  0.987342  0.927602  0.533784  0.001213  0.475000  0.683790  0.677419  0.450000  0.533670
2010  0.998320  0.988864  0.938224  0.576687  0.001221  0.479167  0.690450  0.761905  0.511111  0.543408

        Aarion     Aaris     Aaron     Aarya     Aaryn       Aba      Abba     Abbey     Abbie  Abbigail  ...
year
2006  0.714667   0.52381  0.991825  0.481818  0.391437  0.185185  0.666667  0.002404  0.017656  0.000761  ...
2007  0.721271   0.50000  0.991925  0.418060  0.398068  0.185185  0.666667  0.002348  0.017220  0.000693  ...
2008  0.718245   0.50000  0.992003  0.377005  0.403777  0.185185  0.666667  0.002295  0.016863  0.000639  ...
2009  0.719647   0.50000  0.992064  0.351178  0.403912  0.185185  0.666667  0.002250  0.016547  0.000588  ...
2010  0.729787   0.50000  0.992131  0.336283  0.401305  0.185185  0.666667  0.002208  0.016280  0.000550  ...
    
  

5 rows × 9652 columns
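To make the difference between the yearly and the cumulative proportion concrete, here is a minimal sketch on made-up numbers. The birth counts and the names `births`, `p_m_yearly` and `p_m_cumulative` are hypothetical and are not taken from the dataset; only the formula mirrors the cell above.

```python
import pandas as pd

# Hypothetical yearly birth counts for a single name (illustration only)
births = pd.DataFrame(
    {"F": [10, 10, 30, 50], "M": [30, 10, 10, 0]},
    index=pd.Index([2001, 2002, 2003, 2004], name="year"),
)

# Year-by-year proportion of boys
p_m_yearly = births["M"] / (births["F"] + births["M"])

# Cumulative proportion: running total of boys over running total of all babies
cum = births.cumsum()
p_m_cumulative = cum["M"] / (cum["F"] + cum["M"])

print(pd.DataFrame({"yearly": p_m_yearly, "cumulative": p_m_cumulative}))
```

In this toy example the yearly proportion drops to 0 by 2004, while the cumulative proportion only falls to one third, because it still carries the earlier boys. The cumulative series is therefore smoother and reacts more slowly to a shift.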



In [48]:

    
p_m_cum['Donnie'].plot()









    Out[48]:





<matplotlib.axes.AxesSubplot at 0x11bdbd490>



In [54]:

    
# some metrics that attempt to measure how a time series s has changed

def min_max_range(s):
    """signed range of s: max(s) - min(s), positive if the max occurs after
    the min, negative if it occurs before, and 0 if the two coincide"""
    # note: np.argmax and np.argmin return the location of the first occurrence
    # of the global max and min
    sign = np.sign(np.argmax(s) - np.argmin(s))
    if sign == 0:
        return 0.0
    else:
        return sign*(np.max(s) - np.min(s))

def last_first_diff(s):
    """difference between the latest and the earliest non-missing value"""
    s0 = s.dropna()
    return (s0.iloc[-1] - s0.iloc[0])
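The two metrics can disagree, which is easiest to see on small made-up series. The sketch below restates both functions so that it runs on its own; dropping missing values before locating the max and min is an assumption of this sketch, and the series `decline` and `rise_then_dip` are hypothetical.

```python
import numpy as np
import pandas as pd

def min_max_range(s):
    """Signed max-min range: + if the max falls after the min, - if before."""
    values = s.dropna().values  # assumption of this sketch: ignore missing years
    sign = np.sign(np.argmax(values) - np.argmin(values))
    return float(sign * (values.max() - values.min()))

def last_first_diff(s):
    """Latest non-missing value minus earliest non-missing value."""
    s0 = s.dropna()
    return float(s0.iloc[-1] - s0.iloc[0])

years = [2000, 2001, 2002, 2003, 2004]
# Hypothetical cumulative male shares, not taken from the dataset
decline = pd.Series([np.nan, 0.90, 0.95, 0.40, 0.10], index=years)
rise_then_dip = pd.Series([0.20, 0.90, 0.60, 0.60, 0.60], index=years)

for label, s in [("decline", decline), ("rise_then_dip", rise_then_dip)]:
    print(label, round(min_max_range(s), 3), round(last_first_diff(s), 3))
```

For `decline` the two metrics are close (about -0.85 and -0.8). For `rise_then_dip` the signed range is about 0.7 while the last-minus-first difference is only about 0.4, because the series peaks in the middle and then settles lower.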



In [55]:

    
# population distribution of the ambi-names
# might want to drop names whose overall male/female ratio is too lopsided
# or for which a name/sex combination exists over too short a range of years

total_pop_ambiname = all_births.sum()[np.in1d(all_births.sum().index, ambi_names_array)]
total_pop_ambiname.sort(ascending=False)
total_pop_ambiname.plot(logy=True)









    Out[55]:





<matplotlib.axes.AxesSubplot at 0x138c1c290>



In [56]:

    
# now calculate a DataFrame to visualize results

# calculate the total population, the change in p_m from last to first appearance, 
# the change from max to min in p_m, and the percentage of males overall for name

df = DataFrame()
df['total_pop'] = total_pop_ambiname
df['last_first_diff'] = p_m_cum.apply(last_first_diff)
df['min_max_range'] = p_m_cum.apply(min_max_range)
df['abs_min_max_range'] = np.abs(df.min_max_range)
df['p_m'] = p_m_cum.iloc[-1]

# distance from full ambigender -- p_m=0.5 leads to 1, p_m=1 or 0 -> 0
df['ambi_index'] = df.p_m.apply(lambda p: 1 - 2* np.abs(p-0.5))

df.head()









    Out[56]:






  
    
      
	total_pop	last_first_diff	min_max_range	abs_min_max_range	p_m	ambi_index
name
James	5072771	-0.000845	-0.002123	0.002123	0.995457	0.009085
John	5061897	0.000479	-0.001921	0.001921	0.995737	0.008526
Robert	4788050	0.000344	0.002027	0.002027	0.995811	0.008377
Michael	4265373	-0.005034	-0.006425	0.006425	0.994966	0.010067
Mary	4119074	-0.000132	-0.000829	0.000829	0.003675	0.007351
    
  

5 rows × 6 columns
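The `ambi_index` column is a simple rescaling of `p_m`. As a small dependency-free sketch of the same formula, the helper below (a stand-alone restatement, not the notebook's own function) maps a male share of 0.5 to 1 and a share of 0 or 1 to 0.

```python
def ambi_index(p_m):
    """1 when boys and girls are equally represented, 0 when one sex has the name to itself."""
    return 1 - 2 * abs(p_m - 0.5)

# 0.995457 is the p_m shown for James in the table above; the other values are illustrative
for p in (0.0, 0.25, 0.5, 0.995457, 1.0):
    print(p, round(ambi_index(p), 6))
```

With the rounded `p_m` shown for James, the result is roughly 0.009, in line with the `ambi_index` column above. Any small discrepancy in the last digit comes from using the rounded value.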



In [57]:

    
# plot: x -> log10 of total population, y->how p_m has changed from first to last
# turn off d3 for this plot

mpld3.disable_notebook()
plt.scatter(np.log10(df.total_pop), df.last_first_diff, s=1)









    Out[57]:





<matplotlib.collections.PathCollection at 0x111e57f50>



In [58]:

    
# turn d3 back on

mpld3.enable_notebook()



In [59]:

    
# general directionality counts -- looking for overall asymmetry

df.groupby(np.sign(df.last_first_diff)).count()









    Out[59]:






  
    
      
	total_pop	last_first_diff	min_max_range	abs_min_max_range	p_m	ambi_index
last_first_diff
-1	4890	4890	4890	4890	4890	4890
0	24	24	24	24	24	24
1	4738	4738	4738	4738	4738	4738
    
  

3 rows × 6 columns



In [60]:

    
# let's concentrate on more populous names that have seen big swings in the cumulative p_m

# you can play with the population and range filter
popular_names_with_shifts = df[(df.total_pop>5000) & (df.abs_min_max_range >0.7)]
popular_names_with_shifts.sort_index(by="abs_min_max_range", ascending=False)









    Out[60]:






  
    
      
	total_pop	last_first_diff	min_max_range	abs_min_max_range	p_m	ambi_index
name
Hailey	123318	-0.998151	-0.998151	0.998151	0.001849	0.003698
Abbey	15854	-0.997792	-0.997802	0.997802	0.002208	0.004415
Summer	64702	-0.997002	-0.997002	0.997002	0.002998	0.005997
Raegan	9744	-0.990148	-0.995873	0.995873	0.009852	0.019704
Bria	11160	-0.995072	-0.995072	0.995072	0.004928	0.009857
Fallon	7476	-0.972311	-0.994122	0.994122	0.027689	0.055377
Chanel	14087	-0.993966	-0.993966	0.993966	0.006034	0.012068
Star	6684	-0.983543	-0.993738	0.993738	0.016457	0.032914
Holly	196587	-0.992161	-0.992161	0.992161	0.007839	0.015678
Nigel	10501	0.991906	0.991906	0.991906	0.991906	0.016189
Michele	225226	-0.988372	-0.991136	0.991136	0.011628	0.023257
Nova	6899	-0.930135	-0.990991	0.990991	0.069865	0.139730
Ronda	34628	-0.989633	-0.989633	0.989633	0.010367	0.020735
Paige	122569	-0.989198	-0.989198	0.989198	0.010802	0.021604
Brooke	173658	-0.988489	-0.988489	0.988489	0.011511	0.023022
Beverly	380492	-0.987824	-0.987824	0.987824	0.012176	0.024353
Lauren	450853	-0.987302	-0.987302	0.987302	0.012698	0.025396
Alexus	17835	-0.987272	-0.987286	0.987286	0.012728	0.025456
Allison	262727	-0.985826	-0.985826	0.985826	0.014174	0.028349
Cordell	9464	0.984362	0.984362	0.984362	0.984362	0.031276
Lauri	11199	-0.983302	-0.983302	0.983302	0.016698	0.033396
Joy	131572	-0.981827	-0.981862	0.981862	0.018173	0.036345
Ashley	832350	-0.981496	-0.981496	0.981496	0.018504	0.037008
Lyric	8899	-0.838409	-0.980916	0.980916	0.161591	0.323182
Christy	99452	-0.980734	-0.980760	0.980760	0.019266	0.038531
Kenna	7979	-0.980323	-0.980659	0.980659	0.019677	0.039353
Tyrese	7582	0.980480	0.980480	0.980480	0.980480	0.039040
Robby	10399	0.979229	0.979229	0.979229	0.979229	0.041542
Mallory	48990	-0.977648	-0.977648	0.977648	0.022352	0.044703
Madison	308970	-0.976341	-0.976341	0.976341	0.023659	0.047319
Jermaine	39286	0.975386	0.975386	0.975386	0.975386	0.049229
Shelly	86081	-0.974570	-0.974570	0.974570	0.025430	0.050859
Carley	14885	-0.972455	-0.972455	0.972455	0.027545	0.055089
Lacey	47635	-0.969770	-0.969770	0.969770	0.030230	0.060460
Ainsley	8817	-0.968357	-0.968357	0.968357	0.031643	0.063287
Santana	7399	0.422760	0.966667	0.966667	0.422760	0.845520
Kelsey	144166	-0.964811	-0.964811	0.964811	0.035189	0.070377
Ansley	7202	-0.964315	-0.964315	0.964315	0.035685	0.071369
Ronnie	186260	0.960781	0.964046	0.964046	0.960781	0.078439
Kay	101704	-0.962479	-0.962605	0.962605	0.037521	0.075041
Delaney	27608	-0.962402	-0.962402	0.962402	0.037598	0.075196
Lindsay	131956	-0.961351	-0.961351	0.961351	0.038649	0.077298
Lesly	12407	-0.959942	-0.959942	0.959942	0.040058	0.080116
Marquise	9308	0.957886	0.957886	0.957886	0.957886	0.084229
Kenzie	8793	-0.956443	-0.956443	0.956443	0.043557	0.087115
Hillary	28763	-0.956159	-0.956159	0.956159	0.043841	0.087682
Mckenzie	44315	-0.953988	-0.953988	0.953988	0.046012	0.092023
Linsey	5138	-0.953095	-0.953095	0.953095	0.046905	0.093811
Lindsey	159977	-0.952274	-0.952274	0.952274	0.047726	0.095451
Shamar	5093	0.951109	0.951109	0.951109	0.951109	0.097781
Kinsey	5800	-0.951034	-0.951034	0.951034	0.048966	0.097931
Sydney	156602	-0.943922	-0.943922	0.943922	0.056078	0.112157
Kimber	5455	-0.943538	-0.943538	0.943538	0.056462	0.112924
Raven	37100	-0.927143	-0.943274	0.943274	0.072857	0.145714
Meredith	73898	-0.942502	-0.942502	0.942502	0.057498	0.114996
Cassidy	49871	-0.941349	-0.941349	0.941349	0.058651	0.117303
Whitney	98164	-0.940701	-0.940701	0.940701	0.059299	0.118597
Richie	6540	0.938532	0.938532	0.938532	0.938532	0.122936
Diamond	32377	-0.936776	-0.936776	0.936776	0.063224	0.126448
Gay	19363	-0.928678	-0.928678	0.928678	0.071322	0.142643
...	...	...	...	...	...	...
    
  

150 rows × 6 columns



In [61]:

    
popular_names_with_shifts.groupby(np.sign(df.last_first_diff)).count()









    Out[61]:






  
    
      
	total_pop	last_first_diff	min_max_range	abs_min_max_range	p_m	ambi_index
last_first_diff
-1	116	116	116	116	116	116
1	34	34	34	34	34	34
    
  

2 rows × 6 columns
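A negative `last_first_diff` means the cumulative male share fell between a name's first and last recorded value, so the 116 versus 34 split above says that most populous names with a large swing moved toward female usage. The filter-then-count step can be sketched on a few rows copied from the tables above. The frame `df_small` is a hypothetical stand-in for `df`, and `.size()` is used here instead of `.count()` to get one number per group.

```python
import numpy as np
import pandas as pd

# Four rows copied from the tables above; df_small is a hypothetical stand-in for df
df_small = pd.DataFrame(
    {
        "total_pop": [5072771, 123318, 10501, 7399],
        "last_first_diff": [-0.000845, -0.998151, 0.991906, 0.422760],
        "abs_min_max_range": [0.002123, 0.998151, 0.991906, 0.966667],
    },
    index=pd.Index(["James", "Hailey", "Nigel", "Santana"], name="name"),
)

# Same thresholds as the notebook: populous names with a large swing
shifted = df_small[(df_small.total_pop > 5000) & (df_small.abs_min_max_range > 0.7)]

# Count names by direction of change: -1 = male share fell, +1 = male share rose
counts = shifted.groupby(np.sign(shifted.last_first_diff)).size()
print(counts)
```

James is populous but drops out because its range is tiny. Of the three remaining names, one (Hailey) moved toward female and two (Nigel, Santana) moved toward male.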



In [ ]:

    
#popular_names_with_shifts.to_pickle('popular_names_with_shifts.pickle')



In [62]:

    
fig, ax = plt.subplots(subplot_kw=dict(axisbg='#EEEEEE'))
x = np.log10(popular_names_with_shifts.total_pop)
y = popular_names_with_shifts.min_max_range 

scatter = ax.scatter(x, y)

ax.grid(color='white', linestyle='solid')
ax.set_title("Populous Names with Major Sex Shift", size=20)
ax.set_xlabel('log10(total_pop)')
ax.set_ylabel('min_max_range')

#labels = ['point {0}'.format(i + 1) for i in range(len(x))]
labels = list(popular_names_with_shifts.index)
tooltip = plugins.PointLabelTooltip(scatter, labels=labels)
plugins.connect(fig, tooltip)



In [63]:

    
prop_c_male('Ronnie').plot()









    Out[63]:





<matplotlib.axes.AxesSubplot at 0x138c2c910>
