Baby names



In [ ]:

    
import sys
import re

Define the extract_names() function below and change baby_names() to call it.

When writing a regex, it helps to keep a copy of the target text nearby for reference.

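Here's what the HTML looks like in the baby.html files:

...
<h3 align="center">Popularity in 1990</h3>
....
<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>
<tr align="right"><td>2</td><td>Christopher</td><td>Ashley</td>
<tr align="right"><td>3</td><td>Matthew</td><td>Brittany</td>
...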
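The first two milestones (extract the year, then the names and ranks) come down to one `re.search` and one `re.findall`. The sketch below is a minimal example, assuming each table row is marked up as `<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>` and the year appears as "Popularity in 1990". Both are assumptions about the file format, so check the patterns against a real baby.html file. `SAMPLE_HTML` is a hypothetical stand-in for the file contents.

```python
import re

# Hypothetical sample, assumed to match the markup of the real baby.html files
SAMPLE_HTML = """
<h3 align="center">Popularity in 1990</h3>
<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>
<tr align="right"><td>2</td><td>Christopher</td><td>Ashley</td>
<tr align="right"><td>3</td><td>Matthew</td><td>Brittany</td>
"""

# Milestone 1: the four-digit year follows "Popularity in"
year_match = re.search(r'Popularity\sin\s(\d\d\d\d)', SAMPLE_HTML)
year = year_match.group(1) if year_match else None

# Milestone 2: each row holds a rank, a boy's name and a girl's name.
# With three groups, findall returns a list of (rank, boy, girl) tuples.
rows = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', SAMPLE_HTML)

print(year)
for rank, boy, girl in rows:
    print(rank, boy, girl)
```

Because the pattern uses groups, `findall` hands back tuples rather than whole matches, which is exactly the shape the next milestone needs.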

Suggested milestones for incremental development:

Extract the year and print it
Extract the names and rank numbers and just print them
Get the names data into a dict and print it
Build the [year, 'name rank', ... ] list and print it
Fix baby_names() to use the extract_names list
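Milestones 3 and 4 (put the names into a dict, then build the `[year, 'name rank', ...]` list) can be sketched as follows. `names_to_list` is a hypothetical helper that takes the `(rank, boy, girl)` tuples a regex like the one above would produce. It also assumes that when a name appears more than once, the smaller rank should be kept. This notebook does not state that rule, so it is an assumption.

```python
def names_to_list(year, rows):
    """Hypothetical helper: turn (rank, boy, girl) tuples into [year, 'name rank', ...]."""
    names_to_rank = {}
    for rank, boy, girl in rows:
        for name in (boy, girl):
            # Assumption: if a name appears more than once, keep the smaller rank
            if name not in names_to_rank or int(rank) < int(names_to_rank[name]):
                names_to_rank[name] = rank
    # Year first, then the 'name rank' strings sorted alphabetically by name
    return [year] + [name + ' ' + names_to_rank[name] for name in sorted(names_to_rank)]


rows = [('1', 'Michael', 'Jessica'), ('2', 'Christopher', 'Ashley'),
        ('3', 'Matthew', 'Brittany')]
print(names_to_list('1990', rows))
# ['1990', 'Ashley 2', 'Brittany 3', 'Christopher 2', 'Jessica 1', 'Matthew 3', 'Michael 1']
```

Comparing ranks with `int()` avoids string comparison pitfalls such as `'10' < '9'` being true.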



In [ ]:

    
def extract_names(filename):
    """
    Given a file name for baby.html, returns a list starting with the year string
    followed by the name-rank strings in alphabetical order.
    ['2006', 'Aaliyah 91', 'Aaron 57', 'Abagail 895', ...]
    """
    # +++your code here+++
    return



In [ ]:

    
def baby_names(file_list, summary=False):
    # +++your code here+++
    # For each filename, get the names, then either print the text output
    # or write it to a summary file
    pass
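The comments in `baby_names()` describe the output step: join the names one per line, then either print them or write them to a file. A minimal sketch of that step follows, using a hypothetical helper `emit_names()`. Naming the output file `filename + '.summary'` is an assumption, not something this notebook specifies. The demo writes into a temporary directory so it leaves no files behind.

```python
import os
import tempfile


def emit_names(filename, names, summary=False):
    """Hypothetical helper: print the names, or write them to filename + '.summary'."""
    text = '\n'.join(names) + '\n'
    if summary:
        # Assumed naming convention for the summary file
        with open(filename + '.summary', 'w') as outf:
            outf.write(text)
    else:
        print(text, end='')
    return text


names = ['1990', 'Ashley 2', 'Jessica 1']
emit_names('baby1990.html', names)  # prints one entry per line

# Write a summary file inside a throwaway directory, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'baby1990.html')
    emit_names(path, names, summary=True)
    with open(path + '.summary') as f:
        summary_text = f.read()
print(summary_text, end='')
```

Inside `baby_names()`, a loop over `file_list` would call `extract_names(filename)` and pass the result to a helper like this one.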



In [ ]:

    
baby_names(['data/babynames/baby1990.html'])



In [ ]:

    
baby_names(['data/babynames/baby1996.html'], summary=True)



In [ ]:

    
baby_names(['data/babynames/baby2000.html', 'data/babynames/baby2002.html'])




Note: This notebook is an adaptation of Google's Python tutorial: https://developers.google.com/edu/python