GP01: Birth Dates In The United States

This is the raw data behind the story "Some People Are Too Superstitious To Have A Baby On Friday The 13th".

We'll be working with a data set from the Centers for Disease Control and Prevention's National Center for Health Statistics.

The data set has the following structure:

  • year - Year
  • month - Month
  • date_of_month - Day number of the month
  • day_of_week - Day of week, where 1 is Monday and 7 is Sunday
  • births - Number of births
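The `day_of_week` encoding (1 = Monday, 7 = Sunday) appears to match the ISO convention returned by Python's `datetime.date.isoweekday()`. A quick sanity check against a few rows from the file preview printed below, copied by hand into a list (`sample_rows` is just an illustrative name), might look like this:

```python
from datetime import date

# A few (year, month, date_of_month, day_of_week) rows copied from the
# file preview shown in this notebook.
sample_rows = [
    (1994, 1, 1, 6),
    (1994, 1, 2, 7),
    (1994, 1, 3, 1),
    (1994, 1, 9, 7),
]

# isoweekday() numbers Monday as 1 and Sunday as 7, the same scheme as the data set
matches = [date(y, m, d).isoweekday() == w for (y, m, d, w) in sample_rows]

print(all(matches))
```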

In [1]:
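# Read the whole CSV file into one string; the with block closes the file afterwards
with open("../data/GP01/births.csv", 'r') as f:
    text = f.read()
# Show the header and the first nine data rows (193 characters)
print(text[:193])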


year,month,date_of_month,day_of_week,births
1994,1,1,6,8096
1994,1,2,7,7772
1994,1,3,1,10142
1994,1,4,2,11248
1994,1,5,3,11053
1994,1,6,4,11406
1994,1,7,5,11251
1994,1,8,6,8653
1994,1,9,7,7910
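Splitting on commas by hand works here because no field contains a comma or quotes. A sketch of the same parsing with the standard-library `csv` module, which reads the header for us and yields one dict per row, is shown below. So that it runs on its own, it uses an in-memory copy of the first few preview lines (via `io.StringIO`) instead of the real file; `sample_text` and `rows` are illustrative names.

```python
import csv
import io

# First lines of births.csv as printed above, held in memory for this sketch.
sample_text = (
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
    "1994,1,3,1,10142\n"
)

reader = csv.DictReader(io.StringIO(sample_text))
# DictReader uses the header row as keys; every value is still a string,
# so each column is converted to int explicitly.
rows = [
    {key: int(value) for key, value in row.items()}
    for row in reader
]

print(rows[0])
```

With the real file, the open file object can be passed to `csv.DictReader` directly; the `csv` documentation recommends opening it with `newline=''`.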


In [2]:
lines_list = text.split("\n")
lines_list[:10]


Out[2]:
['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']
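One thing to watch with `text.split("\n")`: if the file ends with a newline, the last element of the list is an empty string, and a loop that indexes into `line.split(",")` will raise an `IndexError` on it. `str.splitlines()` does not produce that trailing empty element. A small comparison on a hypothetical two-row file:

```python
# Hypothetical two-row file that ends with a newline
text = "year,births\n1994,8096\n1994,7772\n"

by_split = text.split("\n")
by_splitlines = text.splitlines()

print(by_split)       # the last element is ''
print(by_splitlines)  # no empty trailing element

# Indexing the empty line the way the aggregation loop does raises IndexError
try:
    "".split(",")[4]
except IndexError:
    print("empty line -> IndexError")
```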

In [3]:
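# Skip the header row
data_no_header = lines_list[1:]
# Running total of births for each day-of-week code ('1' = Monday ... '7' = Sunday)
days_counts = dict()

for line in data_no_header:
    # Skip empty lines, such as the one produced by a trailing newline
    if not line:
        continue
    split_line = line.split(",")
    day_of_week = split_line[3]
    num_births = int(split_line[4])

    if day_of_week in days_counts:
        days_counts[day_of_week] = days_counts[day_of_week] + num_births
    else:
        days_counts[day_of_week] = num_births

days_counts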


Out[3]:
{'1': 5789166,
 '2': 6446196,
 '3': 6322855,
 '4': 6288429,
 '5': 6233657,
 '6': 4562111,
 '7': 4079723}
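The keys in `days_counts` are the raw string codes from the file. A sketch that attaches weekday names (using the 1 = Monday ... 7 = Sunday encoding from the column description) and ranks the totals is below. The numbers are copied from the output above rather than recomputed, and `day_names` and `named_totals` are illustrative names. It uses `collections.Counter`, which could also replace the manual if/else accumulation in the loop (`counts[day_of_week] += num_births` works on a fresh `Counter()`).

```python
from collections import Counter

# Totals copied from the days_counts output above (keys are the file's string codes).
days_counts = {
    '1': 5789166, '2': 6446196, '3': 6322855, '4': 6288429,
    '5': 6233657, '6': 4562111, '7': 4079723,
}

# Column description: 1 is Monday ... 7 is Sunday.
day_names = {
    '1': 'Monday', '2': 'Tuesday', '3': 'Wednesday', '4': 'Thursday',
    '5': 'Friday', '6': 'Saturday', '7': 'Sunday',
}

named_totals = Counter({day_names[code]: total for code, total in days_counts.items()})
total_births = sum(named_totals.values())

# most_common() sorts from the largest total to the smallest
for name, total in named_totals.most_common():
    share = 100 * total / total_births
    print(f"{name:<9} {total:>9,} ({share:.1f}%)")
```

These are raw sums; dividing each by the number of days carrying that code would give a per-day average that is less sensitive to unequal day counts.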