zip()
and index them with enumerate()
itertools
moduleHere's some good news: if we get right down to it, having done loops and lists already, there's nothing new here.
Here's the bad news: it's a different, and possibly less-easy-to-understand, but much more concise way of creating lists. We'll go over it bit by bit.
Let's look at an example from a previous lecture: creating a list of squares.
In [1]:
squares = []
for element in range(10):
squares.append(element ** 2)
print(squares)
Let's break it down.
for element in range(10):
range(10)
, or a list[-like thing] of numbers [0, 10) by 1s.range(10)
is stored in element
.squares.append(element ** 2)
squares
element
, and computing its squareWe'll see these same pieces show up again, just in a slightly different order.
In [2]:
squares = [element ** 2 for element in range(10)]
print(squares)
There it is: a list comprehension. Let's break it down.
[ ]
of a list. This is for the exact reason you'd think: we're building a list!element
Let's say we have some dictionary of word counts:
In [1]:
word_counts = {
'the': 10,
'race': 2,
'is': 3,
'on': 5
}
and we want to generate a list of sentences:
In [4]:
sentences = ['"{}" appears {} times.'.format(word, count) for word, count in word_counts.items()]
print(sentences)
for word, count in word_counts.items()
'"{}" appears {} times.'.format(word, count)
[ ]
sentences
Here's another example: going from a dictionary of word counts to a list of squared counts.
In [2]:
squared_counts = [value ** 2 for key, value in word_counts.items()]
print(squared_counts)
items()
method on the dictionary again, which gives us a list of tuplesvalue ** 2
, squaring the valueGenerators are cool twists on lists (see what I did there). They've been around since Python 2 but took on a whole new life in Python 3.
That said, if you ever get confused about generators, just think of them as lists. This can potentially get you in trouble with weird errors, but 90% of the time it'll work every time.
Let's start with an example you're probably already quite familiar with: range()
In [5]:
x = range(10)
As we know, this will create a list[-like thing] with the numbers 0 through 9, inclusive, and assign it to the variable x
.
Now you'll see why I've been using the "list[-like thing]" notation: it's not really a list!
In [6]:
print(x)
print(type(x))
To get a list, we've been casting the generator to a list:
In [7]:
list(x)
Out[7]:
and we get a vanilla Python list.
So range()
gives us a generator! Great! ...what does that mean, exactly?
For most practical purposes, generators and lists are indistinguishable. However, there are some key differences to be aware of:
range(10)
, not all 10 numbers are immediately computed; in fact, none of them are. They're computed on-the-fly in the loop itself! This really comes in handy if, say, you wanted to loop through 1 trillion numbers, or call range(1000000000000)
. With vanilla lists, this would immediately create 1 trillion numbers in memory and store them, taking up a whole lot of space. With generators, only 1 number is ever computed at a given loop iteration. Huge memory savings!How do we build generators? Aside from range()
, that is.
Remember list comprehensions? Just replace the brackets of a list comprehension [ ]
with parentheses ( )
.
In [8]:
x = [i for i in range(10)] # Brackets -> list
print(x)
In [9]:
x = (i for i in range(10)) # Parentheses -> generator
print(x)
Also--where have we seen parentheses before? TUPLES! You can think of a generator as a sort of tuple. After all, like a tuple, a generator is immutable (cannot be changed once created). Be careful with this, though: all generators are very like tuples, but not all tuples are like generators.
In sum, use lists if:
some_list[431]
On the other hand, use generators if:
There are a few other advanced looping mechanisms in Python that are a little complex, but can make your life a lot easier when used correctly (especially if you're a convert from something like C++ or Java).
zip()
is a small method that packs a big punch. It "zips" multiple lists together into something of one big mega-list for the sole purpose of being able to iterate through them all simultaneously.
We've already seen something like this before: the items()
method in dictionaries. Dictionaries are more or less two lists stacked right up against each other: one list holds the keys, and the corresponding elements of the other list holds the values for each key. items()
lets us loop through both simultaneously, giving us the corresponding elements from each list, one at a time:
In [10]:
d = {
'uga': 'University of Georgia',
'gt': 'Georgia Tech',
'upitt': 'University of Pittsburgh',
'cmu': 'Carnegie Mellon University'
}
for key, value in d.items():
print("'{}' stands for '{}'.".format(key, value))
zip()
does pretty much the same thing, but on steroids: rather than just "zipping" together two lists, it can zip together as many as you want.
Here's an example: first names, last names, and favorite programming languages.
In [11]:
first_names = ['Shannon', 'Jen', 'Natasha', 'Benjamin']
last_names = ['Quinn', 'Benoit', 'Romanov', 'Button']
fave_langs = ['Python', 'Java', 'Assembly', 'Go']
I want to loop through these three lists simultaneously, so I can print out the person's first name, last name, and their favorite language on the same line. Since I know they're the same length, I could just do a range(len(fname))
, but this is arguably more elegant:
In [12]:
for fname, lname, lang in zip(first_names, last_names, fave_langs):
print("{} {}'s favorite language is {}.".format(fname, lname, lang))
Of course, there are always those situations where it's really, really nice to have an index variable in the loop. Let's take a look at that previous example:
In [13]:
for fname, lname, lang in zip(first_names, last_names, fave_langs):
print("{} {}'s favorite language is {}.".format(fname, lname, lang))
This is great if all I want to do is loop through the lists simultaneously. But what if the ordering of the elements matters? For example, I want to prefix each sentence with the line number. How can I track what index I'm on in a loop if I don't use range()
?
enumerate()
handles this. By wrapping the object we loop over inside enumerate()
, on each loop iteration we not only get the next object of interest, but also the index of that object. To wit:
In [14]:
x = ['a', 'list', 'of', 'strings']
for index, element in enumerate(x):
print("Found '{}' at index {}.".format(element, index))
This comes in handy anytime you need to loop through a list or generator, but also need to know what index you're on.
for
loops, you specify how many times to run the loop.while
loops, you iterate until some condition is met.For the vast majority of cases, this works well. But sometimes you need just a little more control for extenuating circumstances.
Take the example of a web server:
This is essentially implemented using a purposefully-infinite loop:
In [ ]:
while True:
# Listen for incoming requests
# Handle the request
How do you get out of this infinite loop? With a break
statement.
In [2]:
import numpy as np
def handle_request():
return np.random.randint(100)
loops = 0
while True:
loops += 1
req = handle_request()
break
print("Exiting program after {} loop.".format(loops))
Just break
. That will snap whatever loop you're currently in and immediately dump you out just after it.
Same thing with for
loops:
In [3]:
for i in range(100000): # Loop 100,000 times!
break
print(i)
Similar to break
is continue
, though you use this when you essentially want to "skip" certain iterations.
continue
will also halt the current iteration, but instead of ending the loop entirely, it basically skips you on to the next iteration of the loop without executing any code that may be below it.
In [4]:
for i in range(100):
continue
print("This will never be printed.")
print(i)
Notice how the print
statement inside the loop is never executed, but our loop counter i
is still incremented through the very end.
Aside: Welcome to the big wide world of Beyond Core Python.
Technically we're still in base Python, but in order to use more advanced iterating tools, we have to actually pull in an external package--itertools
.
Justin Duke has an excellent web tutorial, which I'll reproduce in part here. Let's say we have a couple of lists we want to operate on:
In [3]:
letters = ['a', 'b', 'c', 'd', 'e', 'f']
booleans = [1, 0, 1, 0, 0, 1]
numbers = [23, 20, 44, 32, 7, 12]
decimals = [0.1, 0.7, 0.4, 0.4, 0.5]
Now: I want you to string all four of these lists together, end-to-end, into one long list. How would you do it?
Here's a way simpler way, though it requires pulling in an external package. You can do this with the keyword import
:
In [4]:
import itertools
Now, we have access to all the functions available in the itertools
package--to use them, just type the package name, a dot ".", and the function you want to call.
In this example, we want to use the itertools.chain()
function:
In [6]:
monster = itertools.chain(letters, booleans, numbers, decimals)
print(monster)
Err, what's an itertools.chain
object?
Don't panic--any thoughts as to what kind of object this might be?
It's an iterable, and we know how to handle those!
In [7]:
for item in monster:
print(item, end = " ")
And there they are--all four lists, joined at the hip.
Another phenomenal function is combinations
.
If you've ever taken a combinatorics class, or are at all interested in the idea of finding all the possible combinations of a certain collection of things, this is your function.
A common task in data science is finding combinations of configuration values that work well together--e.g., plotting your data in two dimensions. Which two dimensions will give the nicest plot?
Here's a list of numbers. How many possible pairings are there?
In [8]:
items = ['one', 'two', 'three', 'four', 'five', 'six']
In [9]:
combos = itertools.combinations(items, 2) # The "2" means pairs.
for combo in combos:
print(combo)
It doesn't have to be pairs; we can also try to find all the triplets of items.
In [10]:
combos = itertools.combinations(items, 3) # Now it's a 3.
for combo in combos:
print(combo)
1: I want a list of all possible combinations of (x, y)
values for the range [0, 9]. Show how this can be done with a list comprehension using two for-loops.
2: Without consulting Google University, consider how generators might work under the hood. How do you think they're implemented?
3: Go back to the example with three lists (first names, last names, and programming languages). Show how you could use enumerate
to prepend a line number (the current index of the lists) to the sentence printed for each person, e.g.: "17: Joe Schmo's favorite language is C++.
"
4: Consider that you have a list of lists--representing a matrix--and you want to convert each "row" of the matrix to a tuple, where the first element is an integer row index, and the second element is the row itself (the original list). Assuming the list-of-lists matrix already exists, how could you add the list of indices to it?
5: How would you implement itertools.combinations
on your own, just using loops?
"Cell was changed and shouldn't have" errors on your assignments. If you're getting these errors, it's because you put your code in the wrong cell. Make sure you edit only the cells that say # YOUR CODE HERE
or YOUR ANSWER HERE
. Also, be sure to delete or comment out the line that says raise NotImplementedError()
.
If you need to re-fetch an assignment, you have to delete the entire directory of the old version. For example, in the case where errors are found in the assignment and a new version needs to be pushed, you'll have to delete your current version as well as the folder it's in--so, everything--in order to re-fetch a new version.
First review session on Thursday! Come with questions. I'll have some exercises for everyone to work on as well--exercises that could very likely end up on the midterm or final in some form.
How is A1 going? A2 will be out on Thursday as well.
itertools
. http://jmduke.com/posts/a-gentle-introduction-to-itertools/