We've covered lists, tuples, sets, and dictionaries. These are the foundational data structures in Python. In this lecture, we'll go over some more advanced topics related to these data structures. By the end of this lecture, you should be able to build lists with list comprehensions, work with generators, loop through multiple lists at once with zip(), and index them with enumerate().
Here's the bad news: a list comprehension is a different, and possibly harder-to-understand, but much more concise way of creating lists. We'll go over it bit by bit.
Let's look at an example from a previous lecture: creating a list of squares.
In [1]:
squares = []
for element in range(10):
    squares.append(element ** 2)
print(squares)
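I know this is repetitive, but let's break down what we have.
for element in range(10): is a standard loop header. The thing we're looping over is range(10), or a list[-like thing] of numbers [0, 10) by 1s. On each iteration, the current item from range(10) is stored in element.
squares.append(element ** 2) is the loop body. It appends a new item to the list squares, computed by taking the current item, element, and computing its square.
We'll see these same pieces show up again, just in a slightly different order.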
In [2]:
squares = [element ** 2 for element in range(10)]
print(squares)
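There it is: a list comprehension. Let's break it down.
The entire expression is wrapped in the square brackets [ ] of a list. This is for the exact reason you'd think: we're building a list!
The loop header, for element in range(10), is completely intact.
The loop body, element ** 2, has moved to the front, right after the opening bracket. Whatever it evaluates to on each iteration becomes the next item of the list.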
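As a quick sanity check, the sketch below builds the squares both ways and confirms that the results match. The names squares_loop and squares_comp are introduced here only for the comparison.

```python
# Build the list of squares with an explicit loop.
squares_loop = []
for element in range(10):
    squares_loop.append(element ** 2)

# Build the same list with a list comprehension.
squares_comp = [element ** 2 for element in range(10)]

print(squares_loop == squares_comp)  # True
print(squares_comp)                  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```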
Let's say we have some dictionary of word counts:
In [3]:
word_counts = {
    'the': 10,
    'race': 2,
    'is': 3,
    'on': 5
}
and we want to generate a list of sentences:
In [4]:
sentences = ['"{}" appears {} times.'.format(word, count) for word, count in word_counts.items()]
print(sentences)
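Reading the comprehension piece by piece:
The loop header, for word, count in word_counts.items(), sits on the far right.
The loop body, '"{}" appears {} times.'.format(word, count), comes first.
Everything is wrapped in the square brackets [ ] of a list.
The finished list is assigned to the variable sentences.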
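If the comprehension is hard to read, it can help to expand it back into the loop it replaces. This sketch repeats the word_counts dictionary so that it runs on its own; sentences_loop is a name introduced here just for the comparison.

```python
word_counts = {'the': 10, 'race': 2, 'is': 3, 'on': 5}

# The explicit loop: header first, then the body.
sentences_loop = []
for word, count in word_counts.items():
    sentences_loop.append('"{}" appears {} times.'.format(word, count))

# The comprehension: body first, then the header, all inside [ ].
sentences = ['"{}" appears {} times.'.format(word, count)
             for word, count in word_counts.items()]

print(sentences == sentences_loop)  # True
print(sentences[0])                 # "the" appears 10 times.
```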
That said, if you ever get confused about generators, just think of them as lists. This can potentially get you in trouble with weird errors, but 90% of the time it'll work every time.
Let's start with an example you're probably already quite familiar with: range()
In [5]:
x = range(10)
As we know, this will create a list[-like thing] with the numbers 0 through 9, inclusive, and assign it to the variable x.
Now you'll see why I've been using the "list[-like thing]" notation: it's not really a list!
In [6]:
print(x)
print(type(x))
To get a list, we've been casting the generator to a list:
In [7]:
list(x)
Out[7]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
and we get a vanilla Python list.
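For reference, here is a sketch of what those cells report under Python 3. The name converted is introduced here for illustration only.

```python
x = range(10)

print(x)                    # range(0, 10)
print(type(x))              # <class 'range'>
print(isinstance(x, list))  # False

# list() walks through x and stores every value in a real list.
converted = list(x)
print(isinstance(converted, list))  # True
print(len(converted))               # 10
```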
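So range() gives us a generator! (Strictly speaking, Python 3 calls it a range object: it computes its values on demand the way a generator does, but it also supports indexing and len().) Great! ...what does that mean, exactly?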
For most practical purposes, generators and lists are indistinguishable. However, there are some key differences to be aware of:
Generators are lazy. When you call range(10), not all 10 numbers are immediately computed; in fact, none of them are. They're computed on-the-fly in the loop itself! This really comes in handy if, say, you wanted to loop through 1 trillion numbers, or call range(1000000000000). With vanilla lists, this would immediately create 1 trillion numbers in memory and store them, taking up a whole lot of space. With generators, only 1 number is ever computed at a given loop iteration. Huge memory savings!
How do we build generators? Aside from range(), that is.
Remember list comprehensions? Just replace the brackets of a list comprehension [ ] with parentheses ( ).
In [8]:
x = [i for i in range(10)] # Brackets -> list
print(x)
In [9]:
x = (i for i in range(10)) # Parentheses -> generator
print(x)
Also--where have we seen parentheses before? TUPLES! You can think of a generator as a sort of tuple. After all, like a tuple, a generator is immutable (cannot be changed once created). Be careful with this, though: all generators are very much like tuples, but not all tuples are like generators.
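To make the laziness concrete, here is a small sketch comparing a list comprehension with the equivalent generator expression. It uses sys.getsizeof from the standard library as a rough measure of each object's own footprint. The exact byte counts depend on the Python version and platform, so treat the printed numbers as illustrative; n, as_list, and as_gen are names chosen for this example.

```python
import sys

n = 100000  # arbitrary size, chosen only for illustration

as_list = [i for i in range(n)]  # every value is created and stored right now
as_gen = (i for i in range(n))   # nothing has been computed yet

# The list's footprint grows with n; the generator's does not.
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_gen))

# Values are produced one at a time, only when asked for.
print(next(as_gen))  # 0
print(next(as_gen))  # 1
```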
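In sum, use lists if you need to jump straight to arbitrary elements, e.g. some_list[431].
On the other hand, use generators if you only need to loop through the values one at a time, especially when there are too many to comfortably hold in memory.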
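The indexing point can be checked directly. In this sketch, some_list and some_gen are illustrative names: the list hands over element 431 immediately, while asking a generator for an element by position raises a TypeError.

```python
some_list = [i ** 2 for i in range(1000)]
some_gen = (i ** 2 for i in range(1000))

print(some_list[431])  # 185761

try:
    some_gen[431]
except TypeError as err:
    # Generators can only be walked through in order, not indexed.
    print("Can't index a generator:", err)
```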
We've already seen something like this before: the items() method in dictionaries. Dictionaries are more or less two lists stacked right up against each other: one list holds the keys, and the corresponding elements of the other list hold the values for each key. items() lets us loop through both simultaneously, giving us the corresponding elements from each list, one at a time:
In [10]:
d = {
    'uga': 'University of Georgia',
    'gt': 'Georgia Tech',
    'upitt': 'University of Pittsburgh',
    'cmu': 'Carnegie Mellon University'
}
for key, value in d.items():
    print("'{}' stands for '{}'.".format(key, value))
zip() does pretty much the same thing, but on steroids: rather than just "zipping" together two lists, it can zip together as many as you want.
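The "two lists stacked together" picture can be sketched with zip() itself. Using the same dictionary d (repeated here so the block runs on its own), zipping its keys with its values produces the same pairs as items(). The names pairs_from_zip and pairs_from_items are introduced only for this comparison.

```python
d = {
    'uga': 'University of Georgia',
    'gt': 'Georgia Tech',
    'upitt': 'University of Pittsburgh',
    'cmu': 'Carnegie Mellon University'
}

# Zip the "key list" and the "value list" together by hand.
pairs_from_zip = list(zip(d.keys(), d.values()))
pairs_from_items = list(d.items())

print(pairs_from_zip == pairs_from_items)  # True
print(pairs_from_zip[0])                   # ('uga', 'University of Georgia')
```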
Here's an example: first names, last names, and favorite programming languages.
In [11]:
first_names = ['Shannon', 'Jen', 'Natasha', 'Benjamin']
last_names = ['Quinn', 'Benoit', 'Romanov', 'Button']
fave_langs = ['Python', 'Java', 'Assembly', 'Go']
I want to loop through these three lists simultaneously, so I can print out the person's first name, last name, and their favorite language on the same line. Since I know they're the same length, I could just loop over range(len(first_names)) and index into each list, but this is arguably more elegant:
In [12]:
for fname, lname, lang in zip(first_names, last_names, fave_langs):
    print("{} {}'s favorite language is {}.".format(fname, lname, lang))
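For comparison, here is a sketch of the index-based version next to the zip() version. The lists are repeated so the block is self-contained, and template, by_index, and by_zip are names introduced for this example. One caveat worth knowing: if the inputs differ in length, zip() quietly stops at the shortest one.

```python
first_names = ['Shannon', 'Jen', 'Natasha', 'Benjamin']
last_names = ['Quinn', 'Benoit', 'Romanov', 'Button']
fave_langs = ['Python', 'Java', 'Assembly', 'Go']

template = "{} {}'s favorite language is {}."

# Index-based: works, but every access has to go through i.
by_index = [template.format(first_names[i], last_names[i], fave_langs[i])
            for i in range(len(first_names))]

# zip-based: each iteration unpacks one element from every list.
by_zip = [template.format(fname, lname, lang)
          for fname, lname, lang in zip(first_names, last_names, fave_langs)]

print(by_index == by_zip)  # True

# zip() stops as soon as the shortest input runs out.
print(list(zip([1, 2, 3], ['a', 'b'])))  # [(1, 'a'), (2, 'b')]
```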
This is great if all I want to do is loop through the lists simultaneously. But what if the ordering of the elements matters? For example, I want to prefix each sentence with the line number. How can I track what index I'm on in a loop if I don't use range()?
enumerate() handles this. By wrapping the object we loop over inside enumerate(), on each loop iteration we not only get the next object of interest, but also the index of that object. To wit:
In [14]:
x = ['a', 'list', 'of', 'strings']
for index, element in enumerate(x):
    print("Found '{}' at index {}.".format(element, index))
This comes in handy anytime you need to loop through a list or generator, but also need to know what index you're on.
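As a small extension, enumerate() also accepts an optional start argument for the first index, and it works on generators as well as lists. The sketch below reuses the list x; shouted and indexed are illustrative names.

```python
x = ['a', 'list', 'of', 'strings']

# Count from 1 instead of 0.
for number, element in enumerate(x, start=1):
    print("{}: {}".format(number, element))

# enumerate() over a generator expression works the same way.
shouted = (element.upper() for element in x)
indexed = list(enumerate(shouted))
print(indexed)  # [(0, 'A'), (1, 'LIST'), (2, 'OF'), (3, 'STRINGS')]
```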
Some questions to discuss and consider:
1: I want a list of all possible combinations of (x, y) values, with x and y each in the range [0, 9]. Show how this can be done with a list comprehension using two for-loops.
2: Without consulting Google University, consider how generators might work under the hood. How do you think they're implemented?
3: Go back to the example with three lists (first names, last names, and programming languages). Show how you could use enumerate to prepend a line number (the current index of the lists) to the sentence printed for each person, e.g.: "17: Joe Schmo's favorite language is C++."
"Cell was changed and shouldn't have" errors on your assignments. If you're getting these errors, it's because you put your code in the wrong cell. Make sure you edit only the cells that say # YOUR CODE HERE or YOUR ANSWER HERE. Also, be sure to delete or comment out the line that says raise NotImplementedError().
If you need to re-fetch an assignment, you have to delete the entire directory of the old version. For example, in the case where errors are found in the assignment and a new version needs to be pushed, you'll have to delete your current version as well as the folder it's in--so, everything--in order to re-fetch a new version.
Feedback is welcome! This is a new course being taught a new way using nascent technology. Let me know if you're running into problems or something should be improved!