Baby names


In [ ]:
import sys
import re

Define the extract_names() function below and change baby_names() to call it.

For writing regex, it's nice to include a copy of the target text for inspiration.

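Here's what the HTML looks like in the baby.html files:

    ...
    <h3 align="center">Popularity in 1990</h3>
    ....
    <tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>
    <tr align="right"><td>2</td><td>Christopher</td><td>Ashley</td>
    <tr align="right"><td>3</td><td>Matthew</td><td>Brittany</td>
    ...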
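
As a starting point for the regex, here is a minimal sketch run against an inline copy of the sample above rather than a real file. It assumes the year sits in a `Popularity in YYYY` heading and that each table row holds a rank, a boy's name, and a girl's name in consecutive `<td>` cells. The `sample` string and both patterns are illustrative, not the exercise's official solution.

```python
import re

# Inline stand-in for a baby.html file (assumed markup, trimmed to three rows).
sample = '''
<h3 align="center">Popularity in 1990</h3>
<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>
<tr align="right"><td>2</td><td>Christopher</td><td>Ashley</td>
<tr align="right"><td>3</td><td>Matthew</td><td>Brittany</td>
'''

# The year is the four digits following "Popularity in".
year_match = re.search(r'Popularity\sin\s(\d\d\d\d)', sample)
year = year_match.group(1) if year_match else None

# Each row yields a (rank, boy_name, girl_name) tuple of strings.
rows = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', sample)

print(year)
print(rows)
```

Checking `year_match` before calling `group()` avoids an `AttributeError` on a file that lacks the heading.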

Suggested milestones for incremental development:

  • Extract the year and print it
  • Extract the names and rank numbers and just print them
  • Get the names data into a dict and print it
  • Build the [year, 'name rank', ... ] list and print it
  • Fix baby_names() to use the extract_names list
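
The dict and list milestones above can be sketched as follows, starting from (rank, boy, girl) tuples like those a `re.findall()` call might return. The `rows` data and the helper name `build_name_list` are made up for illustration. Keeping the first rank seen when a name appears in both columns is one reasonable choice, not something the exercise mandates.

```python
def build_name_list(year, rows):
    """Return [year, 'name rank', ...] with the name-rank strings sorted by name."""
    ranks = {}
    for rank, boy, girl in rows:
        for name in (boy, girl):
            # Rows arrive in rank order, so the first rank seen is the best one.
            if name not in ranks:
                ranks[name] = rank
    return [year] + [f'{name} {ranks[name]}' for name in sorted(ranks)]


# Hypothetical sample data; 'Jordan' appears in both columns.
rows = [('1', 'Michael', 'Jessica'), ('2', 'Jordan', 'Ashley'), ('3', 'Matthew', 'Jordan')]
result = build_name_list('1990', rows)
print(result)
```

Sorting the dict keys rather than the finished strings keeps the order purely alphabetical by name, independent of the rank digits.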

In [ ]:
def extract_names(filename):
    """
    Given a file name for baby.html, returns a list starting with the year string
    followed by the name-rank strings in alphabetical order.
    ['2006', 'Aaliyah 91', 'Aaron 57', 'Abagail 895', ...]
    """
    # +++your code here+++
    return

In [ ]:
def baby_names(file_list, summary=False):
    # +++your code here+++
    # For each filename, get the names, then either print the text output
    # or write it to a summary file
    pass
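
One possible shape for the print-or-write step is sketched below. The helper `emit_names` and the `.summary` suffix for the output file are assumptions made for this example. The names list is passed in directly so the sketch runs without `extract_names()` being finished.

```python
import os
import tempfile


def emit_names(filename, names, summary=False):
    """Print the names, or write them to filename + '.summary' (assumed naming)."""
    text = '\n'.join(names) + '\n'
    if summary:
        with open(filename + '.summary', 'w') as out:
            out.write(text)
    else:
        print(text, end='')


# Demo in a temporary directory so no real data files are touched.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'baby1990.html')
    emit_names(target, ['1990', 'Ashley 2', 'Jessica 1'], summary=True)
    with open(target + '.summary') as f:
        written = f.read()

print(written, end='')
```

Inside `baby_names()`, a call like this would sit in the loop over `file_list`, once per filename.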

In [ ]:
baby_names(['data/babynames/baby1990.html'])

In [ ]:
wordcount('topcount', 'data/wiki.txt')

In [ ]:
baby_names(['data/babynames/baby1996.html'], summary=True)

In [ ]:
baby_names(['data/babynames/baby2000.html', 'data/babynames/baby2002.html'])


Note: This notebook is an adaptation of Google's Python tutorial: https://developers.google.com/edu/python