Homework #4

These problem sets focus on list comprehensions, string operations and regular expressions.

Problem set #1: List slices and list comprehensions

Let's start with some data. The following cell contains a string with comma-separated integers, assigned to a variable called numbers_str:


In [1]:
numbers_str = '496,258,332,550,506,699,7,985,171,581,436,804,736,528,65,855,68,279,721,120'

In the following cell, complete the code with an expression that evaluates to a list of integers derived from the raw numbers in numbers_str, assigning the value of this expression to a variable numbers. If you do everything correctly, executing the cell should produce the output 985 (not '985').


In [2]:
number_list = numbers_str.split(",") 

numbers = [int(item) for item in number_list]

max(numbers)


Out[2]:
985

Great! We'll be using the numbers list you created above in the next few problems. In the cell below, fill in the square brackets so that the expression evaluates to a list of the ten largest values in numbers. Expected output: [506, 528, 550, 581, 699, 721, 736, 804, 855, 985]

(Hint: use a slice.)


In [3]:
# Sort ascending, then take the last ten items (the ten largest values).
sorted(numbers)[-10:]


Out[3]:
[506, 528, 550, 581, 699, 721, 736, 804, 855, 985]

In the cell below, write an expression that evaluates to a list of the integers from numbers that are evenly divisible by three, sorted in numerical order. Expected output:

[120, 171, 258, 279, 528, 699, 804, 855]
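One possible approach, sketched here rather than given as the official answer: filter with the modulo operator inside a list comprehension, then sort the result. The numbers list is rebuilt inside the block only so that it runs on its own, and the name divisible_by_three is made up for this illustration.

```python
numbers_str = '496,258,332,550,506,699,7,985,171,581,436,804,736,528,65,855,68,279,721,120'
numbers = [int(item) for item in numbers_str.split(',')]

# Keep only the values that leave no remainder when divided by three,
# then sort what is left in ascending order.
divisible_by_three = sorted([item for item in numbers if item % 3 == 0])
print(divisible_by_three)
```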

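Now write an expression that evaluates to a list of the square roots of all the integers in numbers that are less than 100, using the sqrt function from the math module. (These outputs might vary slightly depending on your platform.)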


In [5]:
from math import sqrt

# Square roots of the values below 100, written as a single list comprehension.
[sqrt(item) for item in numbers if item < 100]


Out[5]:
[2.6457513110645907, 8.06225774829855, 8.246211251235321]

Problem set #2: Still more list comprehensions

Still looking good. Let's do a few more with some different data. In the cell below, I've defined a data structure and assigned it to a variable planets. It's a list of dictionaries, with each dictionary describing the characteristics of a planet in the solar system. Make sure to run the cell before you proceed.


In [6]:
planets = [
 {'diameter': 0.382,
  'mass': 0.06,
  'moons': 0,
  'name': 'Mercury',
  'orbital_period': 0.24,
  'rings': 'no',
  'type': 'terrestrial'},
 {'diameter': 0.949,
  'mass': 0.82,
  'moons': 0,
  'name': 'Venus',
  'orbital_period': 0.62,
  'rings': 'no',
  'type': 'terrestrial'},
 {'diameter': 1.00,
  'mass': 1.00,
  'moons': 1,
  'name': 'Earth',
  'orbital_period': 1.00,
  'rings': 'no',
  'type': 'terrestrial'},
 {'diameter': 0.532,
  'mass': 0.11,
  'moons': 2,
  'name': 'Mars',
  'orbital_period': 1.88,
  'rings': 'no',
  'type': 'terrestrial'},
 {'diameter': 11.209,
  'mass': 317.8,
  'moons': 67,
  'name': 'Jupiter',
  'orbital_period': 11.86,
  'rings': 'yes',
  'type': 'gas giant'},
 {'diameter': 9.449,
  'mass': 95.2,
  'moons': 62,
  'name': 'Saturn',
  'orbital_period': 29.46,
  'rings': 'yes',
  'type': 'gas giant'},
 {'diameter': 4.007,
  'mass': 14.6,
  'moons': 27,
  'name': 'Uranus',
  'orbital_period': 84.01,
  'rings': 'yes',
  'type': 'ice giant'},
 {'diameter': 3.883,
  'mass': 17.2,
  'moons': 14,
  'name': 'Neptune',
  'orbital_period': 164.8,
  'rings': 'yes',
  'type': 'ice giant'}]

Now, in the cell below, write a list comprehension that evaluates to a list of names of the planets that have a diameter greater than four times Earth's diameter (the diameter values are expressed relative to Earth's). Expected output:

['Jupiter', 'Saturn', 'Uranus']

In [7]:
[item['name'] for item in planets if item['diameter'] > 4]


Out[7]:
['Jupiter', 'Saturn', 'Uranus']

In the cell below, write a single expression that evaluates to the sum of the mass of all planets in the solar system. Expected output: 446.79


In [8]:
sum([item['mass'] for item in planets])


Out[8]:
446.79
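A small variation, offered as a sketch: sum() also accepts a generator expression, so the square brackets are optional and no intermediate list is built. The block below uses a cut-down copy of planets (names and masses only, copied from the data above) so that it runs on its own; total_mass is a name chosen just for this example. Because the masses are floats, rounding before display is a reasonable precaution.

```python
# Cut-down copy of the planets data: only the keys this example needs.
planets = [
    {'name': 'Mercury', 'mass': 0.06},
    {'name': 'Venus', 'mass': 0.82},
    {'name': 'Earth', 'mass': 1.00},
    {'name': 'Mars', 'mass': 0.11},
    {'name': 'Jupiter', 'mass': 317.8},
    {'name': 'Saturn', 'mass': 95.2},
    {'name': 'Uranus', 'mass': 14.6},
    {'name': 'Neptune', 'mass': 17.2},
]

# A generator expression feeds the masses to sum() one at a time.
total_mass = sum(item['mass'] for item in planets)

# Round for display, since float addition can leave tiny trailing errors.
print(round(total_mass, 2))
```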

Good work. Last one with the planets. Write an expression that evaluates to the names of the planets that have the word giant anywhere in the value for their type key. Expected output: ['Jupiter', 'Saturn', 'Uranus', 'Neptune']


In [9]:
import re

In [10]:
planet_with_giant = [item['name'] for item in planets if re.search(r'\bgiant\b', item['type'])]

planet_with_giant


Out[10]:
['Jupiter', 'Saturn', 'Uranus', 'Neptune']
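For a fixed word like this, a regular expression is not strictly needed. As a sketch of an alternative, Python's in operator tests for a substring directly. The block uses a cut-down copy of planets (names and types only) so that it runs on its own; giant_names is a name invented for this example.

```python
# Cut-down copy of the planets data: only the keys this example needs.
planets = [
    {'name': 'Mercury', 'type': 'terrestrial'},
    {'name': 'Venus', 'type': 'terrestrial'},
    {'name': 'Earth', 'type': 'terrestrial'},
    {'name': 'Mars', 'type': 'terrestrial'},
    {'name': 'Jupiter', 'type': 'gas giant'},
    {'name': 'Saturn', 'type': 'gas giant'},
    {'name': 'Uranus', 'type': 'ice giant'},
    {'name': 'Neptune', 'type': 'ice giant'},
]

# 'giant' in some_string is True whenever the substring appears anywhere in it.
giant_names = [item['name'] for item in planets if 'giant' in item['type']]
print(giant_names)
```

Note that in matches the substring anywhere, even inside a longer word, whereas the \b anchors in the regular expression require giant to stand as a whole word. For this data both give the same result.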

EXTREME BONUS ROUND: Write an expression below that evaluates to a list of the names of the planets in ascending order by their number of moons. (The easiest way to do this involves using the key parameter of the sorted function, which we haven't yet discussed in class! That's why this is an EXTREME BONUS question.) Expected output:

    ['Mercury', 'Venus', 'Earth', 'Mars', 'Neptune', 'Uranus', 'Saturn', 'Jupiter']
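One way this could be approached, as a sketch and not the only answer: pass a small function as the key parameter of sorted, so that each dictionary is compared by its moons value. The block uses a cut-down copy of planets (names and moon counts only) so that it runs on its own; names_by_moons is a name invented for this example.

```python
# Cut-down copy of the planets data: only the keys this example needs.
planets = [
    {'name': 'Mercury', 'moons': 0},
    {'name': 'Venus', 'moons': 0},
    {'name': 'Earth', 'moons': 1},
    {'name': 'Mars', 'moons': 2},
    {'name': 'Jupiter', 'moons': 67},
    {'name': 'Saturn', 'moons': 62},
    {'name': 'Uranus', 'moons': 27},
    {'name': 'Neptune', 'moons': 14},
]

# key= tells sorted() which value to compare for each item;
# here the lambda pulls out the moon count from each dictionary.
names_by_moons = [item['name'] for item in sorted(planets, key=lambda p: p['moons'])]
print(names_by_moons)
```

Because sorted is stable, planets with the same number of moons (Mercury and Venus) keep their original relative order.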

In [ ]:

Problem set #3: Regular expressions

In the following section, we're going to do a bit of digital humanities. (I guess this could also be journalism if you were... writing an investigative piece about... early 20th century American poetry?) We'll be working with the following text, Robert Frost's The Road Not Taken. Make sure to run the following cell before you proceed.


In [11]:
import re
poem_lines = ['Two roads diverged in a yellow wood,',
 'And sorry I could not travel both',
 'And be one traveler, long I stood',
 'And looked down one as far as I could',
 'To where it bent in the undergrowth;',
 '',
 'Then took the other, as just as fair,',
 'And having perhaps the better claim,',
 'Because it was grassy and wanted wear;',
 'Though as for that the passing there',
 'Had worn them really about the same,',
 '',
 'And both that morning equally lay',
 'In leaves no step had trodden black.',
 'Oh, I kept the first for another day!',
 'Yet knowing how way leads on to way,',
 'I doubted if I should ever come back.',
 '',
 'I shall be telling this with a sigh',
 'Somewhere ages and ages hence:',
 'Two roads diverged in a wood, and I---',
 'I took the one less travelled by,',
 'And that has made all the difference.']

In the cell above, I defined a variable poem_lines which has a list of lines in the poem, and imported the re library.

In the cell below, write a list comprehension (using re.search()) that evaluates to a list of lines that contain two words next to each other (separated by a space) that have exactly four characters. (Hint: use the \b anchor. Don't overthink the "two words in a row" requirement.) Expected result:

['Then took the other, as just as fair,', 'Had worn them really about the same,', 'And both that morning equally lay', 'I doubted if I should ever come back.', 'I shall be telling this with a sigh']


In [12]:
[item for item in poem_lines if re.search(r'\b[a-zA-Z]{4}\b \b[a-zA-Z]{4}\b', item)]


Out[12]:
['Then took the other, as just as fair,',
 'Had worn them really about the same,',
 'And both that morning equally lay',
 'I doubted if I should ever come back.',
 'I shall be telling this with a sigh']

Good! Now, in the following cell, write a list comprehension that evaluates to a list of lines in the poem that end with a five-letter word, regardless of whether or not there is punctuation following the word at the end of the line. (Hint: Try using the ? quantifier. Is there an existing character class, or a way to write a character class, that matches non-alphanumeric characters?) Expected output:

['And be one traveler, long I stood', 'And looked down one as far as I could', 'And having perhaps the better claim,', 'Though as for that the passing there', 'In leaves no step had trodden black.', 'Somewhere ages and ages hence:']


In [14]:
[item for item in poem_lines if re.search(r'\b[a-zA-Z]{5}\b\W?$', item)]


Out[14]:
['And be one traveler, long I stood',
 'And looked down one as far as I could',
 'And having perhaps the better claim,',
 'Though as for that the passing there',
 'In leaves no step had trodden black.',
 'Somewhere ages and ages hence:']

Okay, now a slightly trickier one. In the cell below, I've created a string all_lines which evaluates to the entire text of the poem in one string. Execute this cell.


In [15]:
all_lines = " ".join(poem_lines)

Now, write an expression that evaluates to all of the words in the poem that follow the word 'I'. (The strings in the resulting list should not include the I.) Hint: Use re.findall() and grouping! Expected output:

['could', 'stood', 'could', 'kept', 'doubted', 'should', 'shall', 'took']


In [17]:
re.findall(r'\bI (\w+)', all_lines)


Out[17]:
['could', 'stood', 'could', 'kept', 'doubted', 'should', 'shall', 'took']

Finally, something super tricky. Here's a list of strings that contains a restaurant menu. Your job is to wrangle this plain text, slightly-structured data into a list of dictionaries.


In [18]:
entrees = [
    "Yam, Rosemary and Chicken Bowl with Hot Sauce $10.95",
    "Lavender and Pepperoni Sandwich $8.49",
    "Water Chestnuts and Peas Power Lunch (with mayonnaise) $12.95 - v",
    "Artichoke, Mustard Green and Arugula with Sesame Oil over noodles $9.95 - v",
    "Flank Steak with Lentils And Tabasco Pepper With Sweet Chilli Sauce $19.95",
    "Rutabaga And Cucumber Wrap $8.49 - v"
]

You'll need to pull out the name of the dish and the price of the dish. The v after the hyphen indicates that the dish is vegetarian---you'll need to include that information in your dictionary as well. I've included the basic framework; you just need to fill in the contents of the for loop.

Expected output:

[{'name': 'Yam, Rosemary and Chicken Bowl with Hot Sauce ', 'price': 10.95, 'vegetarian': False}, {'name': 'Lavender and Pepperoni Sandwich ', 'price': 8.49, 'vegetarian': False}, {'name': 'Water Chestnuts and Peas Power Lunch (with mayonnaise) ', 'price': 12.95, 'vegetarian': True}, {'name': 'Artichoke, Mustard Green and Arugula with Sesame Oil over noodles ', 'price': 9.95, 'vegetarian': True}, {'name': 'Flank Steak with Lentils And Tabasco Pepper With Sweet Chilli Sauce ', 'price': 19.95, 'vegetarian': False}, {'name': 'Rutabaga And Cucumber Wrap ', 'price': 8.49, 'vegetarian': True}]

Great work! You are done. Go cavort in the sun, or whatever it is you students do when you're done with your homework.


In [20]:
menu = []

for item in entrees:
    # Capture the dish name, the price after the dollar sign,
    # and the optional " - v" vegetarian marker at the end of the line.
    match = re.search(r'(.*) \$(\d+\.\d{2})( - v)?$', item)

    if match:
        entrees_dictionary = {
            'name': match.group(1),
            'price': float(match.group(2)),  # store the price as a number, not a string
            'vegetarian': match.group(3) is not None,
        }
        menu.append(entrees_dictionary)

menu


Out[20]:
[{'name': 'Yam, Rosemary and Chicken Bowl with Hot Sauce',
  'price': 10.95,
  'vegetarian': False},
 {'name': 'Lavender and Pepperoni Sandwich',
  'price': 8.49,
  'vegetarian': False},
 {'name': 'Water Chestnuts and Peas Power Lunch (with mayonnaise)',
  'price': 12.95,
  'vegetarian': True},
 {'name': 'Artichoke, Mustard Green and Arugula with Sesame Oil over noodles',
  'price': 9.95,
  'vegetarian': True},
 {'name': 'Flank Steak with Lentils And Tabasco Pepper With Sweet Chilli Sauce',
  'price': 19.95,
  'vegetarian': False},
 {'name': 'Rutabaga And Cucumber Wrap', 'price': 8.49, 'vegetarian': True}]
