Lecture 4: Collections and loops

CSCI 1360E: Foundations for Informatics and Analytics

Overview and Objectives

In this lecture, we'll go over the different collections of objects that Python natively supports. In previous lectures we dealt only with single strings, ints, and floats; now we'll be able to handle arbitrarily large numbers of them. By the end of the lecture, you should be able to

Describe the differences between sets, tuples, lists, and dictionaries.
Perform basic arithmetic operations using arbitrary-length collections.
Use loops and arrays in conjunction to manage lists of data.

Part 1: Lists

Lists are probably the most basic data structure in Python; they contain ordered elements and can be of arbitrary length. Other languages may refer to this basic structure as an "array", and indeed there are similarities. But for our purposes and anytime you're coding in Python, we'll use the term "list."

(Aside)

When I say "data structures," I mean any type that is more sophisticated than the ints and floats we've been working with up until now. I purposefully omitted strs, because they are in fact a data structure unto themselves: they build on the single character, and are effectively a "list of characters." In much the same way, we'll see in this lecture how to define a "list of ints", a "list of floats", and even a "list of strs"!

Lists in Python have a few core properties:

Ordered. This means the list structure maintains an intrinsic ordering of the elements held inside.
Mutable. This means the structure of the list can change; elements can be added, removed, or changed in-place.

Enough talk, let's see a list in action!



In [1]:

    
x = list()

Here I've defined an empty list, called x. Like our previous variables, this has both a name (x) and a type (list). However, it doesn't have any actual value beyond that; it's just an empty list. Imagine a filing cabinet with nothing in it.

So how do we add things? Lists, as it turns out, have a few methods we can invoke. Methods are functions that belong to a particular object, and we'll cover them more when we get to functions; print(), which you've already used, is a plain function. Here's a useful method:



In [2]:

    
x.append(1)

The append() method takes whatever argument I supply to the function, and inserts it into the next available position in the list. Which, in this case, is the very first position (since the list was previously empty).

What does our list look like now?



In [3]:

    
print(x)

[1]

It's tough to tell that there's really anything going on, but those square brackets [ and ] are the key: those denote a list, and anything inside those brackets is an element of the list.

Let's look at another list, a bit more interesting this time.



In [4]:

    
y = list()
y.append(1)
y.append(2)
y.append(3)
print(y)

[1, 2, 3]

In this example, I've created a new list y, initially empty, and added three integer values. Notice the ordering of the elements when I print the list at the end: from left-to-right, you'll see the elements in the order that they were added.

You can put any elements you want into a list. If you wanted, you could add strings



In [5]:

    
y.append("this is perfectly legal")

and floats



In [6]:

    
y.append(4.2)

and even other lists!



In [7]:

    
y.append(list())  # Inception BWAAAAAAAAAA
print(y)

[1, 2, 3, 'this is perfectly legal', 4.2, []]

Indexing

So I have these lists and I've stored some things in them. I can print them out and see what I've stored...but so far they seem pretty unwieldy. How do I remove things? If someone asks me for whatever was added 3$^{rd}$, how do I give that to them without giving them the whole list?

Glad you asked! The answers to these questions involve indexing.

Indexing is what happens when you refer to an existing element in a list. For example, in our hybrid list y with lots of random stuff in it, what's the first element?



In [8]:

    
first_element = y[1]
print(first_element)
print(y)

2
[1, 2, 3, 'this is perfectly legal', 4.2, []]

In this code example, I've used the number 1 as an index to y. In doing so, I took out the value at index 1 and put it into a variable named first_element. I then printed it, as well as the list y, and voi--

--wait, 2 is the second element. o_O

Python and its spiritual progenitors C and C++ are known as zero-indexed languages. This means when you're dealing with lists or arrays, the index of the first element is always 0.

This stands in contrast with languages such as Julia and Matlab, where the index of the first element of a list or array is, indeed, 1. Preference for one or the other tends to covary with whatever you were first taught, though in scientific circles it's generally preferred that languages be 0-indexed$^{[citation\ needed]}$.

So what is in the 0 index of our list?



In [9]:

    
print(y[0])
print(y)

1
[1, 2, 3, 'this is perfectly legal', 4.2, []]

Much better.

This little caveat is usually the main culprit of errors for new programmers. Give yourself some time to get used to Python's 0-indexed lists. You'll see what I mean when we get to loops.
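
As a rough illustration of the off-by-one trap: with zero-indexing, a list of length n has valid indices 0 through n - 1, so asking for index n fails. The sketch below uses a stand-in list named items (a hypothetical name, with the same contents as y) and a try/except block, which we haven't covered yet, purely so the error can be printed instead of halting the program.

```python
# A stand-in list with the same contents as y above.
items = [1, 2, 3, "this is perfectly legal", 4.2, []]

# Six elements, so the valid indices run from 0 to 5.
last_valid_index = len(items) - 1
print(last_valid_index)
print(items[last_valid_index])  # the last element, the empty list

try:
    items[len(items)]  # index 6 is one past the end
except IndexError as err:
    # Python refuses to read past the end of the list.
    caught_message = str(err)
    print("IndexError:", caught_message)
```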

In addition to elements 0 and 1, we can also directly index elements at the end of the list.



In [10]:

    
print(y[-1])
print(y)

[]
[1, 2, 3, 'this is perfectly legal', 4.2, []]

Yep, there's our inception-list, the last element of y.

You can think of this indexing strategy as "wrapping around" the list to the end of it. Similarly, you can also negate other numbers to access the second-to-last element, third-to-last element...



In [11]:

    
print(y[-2])
print(y[-3])
print(y)

4.2
this is perfectly legal
[1, 2, 3, 'this is perfectly legal', 4.2, []]
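
One way to read negative indices, sketched below on a stand-in list named items (a hypothetical name): index -k refers to the same element as index len(items) - k. The "wrapping" only goes around once, so -len(items) is the first element, and anything more negative than that raises an IndexError.

```python
items = [1, 2, 3, "this is perfectly legal", 4.2, []]
n = len(items)  # 6

# Counting back k places from the end is the same as index n - k.
print(items[-1] == items[n - 1])  # True
print(items[-2] == items[n - 2])  # True
print(items[-3] == items[n - 3])  # True

# The most negative valid index lands on the first element.
print(items[-n] == items[0])      # True
```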

Using more indexing voodoo, you can also index slices of lists. Let's say we want to create a new list that consists of the integer elements of y, which are the first three. We could pull them out one by one, or use slicing:



In [12]:

    
int_elements = y[0:3]  # Slicing!
print(y)
print(int_elements)

[1, 2, 3, 'this is perfectly legal', 4.2, []]
[1, 2, 3]

That y[0:3] notation is the slicing. The first number, 0, indicates the first index of values we want to keep. The colon : indicates slicing, and the second number, 3, indicates the last index of values.

You could even say this out loud: "With list y, slice starting at index 0 to index 3." The colon is the "to".

The astute reader will notice that index 3 in y is actually the string!



In [13]:

    
print(y[3])

this is perfectly legal

Python, why do you torment me so?!

When you slice a list, the first (starting) index is inclusive; the second (ending) index, however, is exclusive. In mathematical notation, it would look something like this:

$[ starting : ending )$

Therefore, the end index is one after the last index you want to keep.
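
Here is a small sketch of some consequences of the inclusive-start, exclusive-end rule, using a stand-in list named items (a hypothetical name). It also uses two shorthand forms not shown above: leaving out the start means "from the beginning," and leaving out the end means "through the last element."

```python
items = [1, 2, 3, "this is perfectly legal", 4.2, []]

first_three = items[0:3]  # indices 0, 1, 2 (index 3 is excluded)
same_thing = items[:3]    # omitting the start means "start at index 0"
the_rest = items[3:]      # omitting the end means "go to the end"
middle = items[1:4]       # indices 1, 2, 3

print(first_three)        # [1, 2, 3]
print(same_thing)         # [1, 2, 3]
print(len(middle))        # 3, which is 4 - 1

# Because the end is exclusive, two slices that share a boundary
# index fit together with no overlap and no gap.
print(first_three + the_rest == items)  # True
```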

One more thing about lists

You don't always have to start with empty lists. You can pre-define a full list; just use brackets!



In [14]:

    
z = [42, 502.4, "some string", 0]
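
The indexing section opened by asking how to remove things from a list. As a brief sketch, here are three standard ways to do that: the del statement removes by index, the pop() method removes and hands back the last element, and the remove() method removes the first element equal to a given value.

```python
z = [42, 502.4, "some string", 0]

del z[0]                 # delete by index; z is now [502.4, 'some string', 0]
last = z.pop()           # remove the last element and return it
z.remove("some string")  # remove the first element equal to this value

print(last)  # 0
print(z)     # [502.4]
```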

Part 2: Sets and Tuples

If you understood lists, sets and tuples are easy-peasy. They're both exactly the same as lists...except:

Tuples:

Immutable. Once you construct a tuple, it cannot be changed.

Sets:

Distinct. Sets cannot contain two identical elements.
Unordered. Sets don't index the same way lists do.

Other than these rules, most of what you can do with lists can also be done with tuples and sets.

Tuples

Whereas we used square brackets to create a list



In [15]:

    
x = [3, 64.2, "some list"]
print(type(x))

<class 'list'>

we use regular parentheses to create a tuple!



In [16]:

    
y = (3, 64.2, "some tuple")
print(type(y))

<class 'tuple'>

With lists, if you wanted to change the item at index 2, you could go right ahead:



In [17]:

    
x[2] = "a different string"
print(x)

[3, 64.2, 'a different string']

Can't do that with tuples, sorry.



In [50]:

    
y[2] = "does this work?"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-1347e878f386> in <module>()
----> 1 y[2] = "does this work?"

TypeError: 'tuple' object does not support item assignment

Like list(), there is a function for building an empty tuple. Any guesses?



In [19]:

    
z = tuple()

And like lists, you have (almost) all of the other operations at your disposal, such as slicing and len:



In [20]:

    
print(y[0:2])
print(len(y))

(3, 64.2)
3

Sets

Sets are interesting buggers, in that they only allow you to store a particular element once.



In [21]:

    
x = list()
x.append(1)
x.append(2)
x.append(2)  # Add the same thing twice.

s = set()
s.add(1)
s.add(2)
s.add(2)  # Add the same thing twice...again.



In [22]:

    
print(x)

[1, 2, 2]



In [23]:

    
print(s)

{1, 2}

There are certain situations where this can be very useful. It should be noted that sets can actually be built from lists, so you can build a list and then turn it into a set:



In [24]:

    
x = [1, 2, 3, 3]
s = set(x)  # Take the list x as the starting point.
print(s)

{1, 2, 3}
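
One situation where this might come in handy, sketched with made-up data: counting how many distinct values appear in a list, and checking whether a value is present at all. The names visitors and unique_visitors are hypothetical, and the in operator used here is a membership test that works on lists, tuples, and sets alike.

```python
# A hypothetical sign-in sheet where some people signed in more than once.
visitors = ["ann", "bob", "ann", "cy", "bob", "ann"]
unique_visitors = set(visitors)  # duplicates collapse to a single entry

print(len(visitors))         # 6 sign-ins
print(len(unique_visitors))  # 3 distinct people

# Membership tests with "in" evaluate to True or False.
print("bob" in unique_visitors)  # True
print("dee" in unique_visitors)  # False
```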

Sets also don't index the same way lists and tuples do:



In [49]:

    
print(s[0])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-919b5fe16240> in <module>()
----> 1 print(s[0])

TypeError: 'set' object does not support indexing

If you want to add elements to a set, you can use the add method.

If you want to remove elements from a set, you can use the discard or remove methods.

But you can't index or slice a set.
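
A short sketch of those three methods, including the one difference between discard and remove: discard quietly does nothing if the element isn't in the set, while remove raises a KeyError. The try/except block is only there to catch that error so the example keeps running, and sorted() is used for printing because a set makes no promise about display order.

```python
s = set([1, 2, 3])

s.add(4)       # s now holds 1, 2, 3, 4
s.discard(2)   # 2 is present, so it is removed
s.discard(99)  # 99 is absent: nothing happens, and no error
s.remove(3)    # 3 is present, so it is removed

try:
    s.remove(99)  # 99 is absent: remove() raises a KeyError
except KeyError:
    remove_failed = True
    print("remove() complained about a missing element")

print(sorted(s))  # [1, 4]
```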

Part 3: Dictionaries

Dictionaries deserve a section all to themselves.

Are you familiar with key-value stores? Associative arrays? Hash maps?

The basic idea of all these data type abstractions is to map a key to a value, in such a way that if you have a certain key, you always get back the value associated with that key.

You can also think of dictionaries as lists with more interesting indices.

A few important points on dictionaries before we get into examples:

Mutable. Dictionaries can be changed and updated.
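Not indexed by position. Elements in dictionaries are looked up by key, not by integer position. (Since Python 3.7, dictionaries do remember the order in which keys were inserted, but you still can't ask for "the third element.")
Keys are distinct. The keys of dictionaries are unique; no key is ever duplicated. The values, however, can be repeated as many times as you want.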

Dictionaries are created using the dict() function, or using curly braces:



In [26]:

    
d = dict()
# Or...
d = {}

New elements can be added to the dictionary using the same square-bracket notation as list indexing, except with a key in place of the integer index:



In [27]:

    
d["some_key"] = 14.3

Yes, you can use strings as keys! In fact, string is probably the most common data type to use as a key in dictionaries. That way, you can treat dictionaries as "look up" tables--maybe you're storing information on people in a beta testing program. You can store their information by name:



In [28]:

    
d["shannon_quinn"] = ["some", "personal", "information"]
print(d)

{'shannon_quinn': ['some', 'personal', 'information'], 'some_key': 14.3}
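
Getting a value back out uses the same square brackets. The sketch below rebuilds the same dictionary and shows a lookup, an overwrite, and two ways of handling a key that might not exist: the in operator and the get() method. Asking for a missing key directly, as in d["nobody"], would raise a KeyError.

```python
d = {}
d["some_key"] = 14.3
d["shannon_quinn"] = ["some", "personal", "information"]

print(d["some_key"])  # 14.3

# Assigning to an existing key replaces its value; keys stay unique.
d["some_key"] = 99.9
print(d["some_key"])  # 99.9
print(len(d))         # still 2 key-value pairs

# Check for a key before using it...
print("shannon_quinn" in d)  # True
print("nobody" in d)         # False

# ...or use get(), which returns None rather than raising an error.
print(d.get("nobody"))       # None
```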

Since dictionaries look elements up by key rather than by position, using integers as indices won't give us anything useful (unless those integers happen to be keys). However, dictionaries do have a keys() method that gives us a list-like view of all the keys in the dictionary:



In [29]:

    
print(d.keys())

dict_keys(['shannon_quinn', 'some_key'])

and a values() method for (you guessed it) the values in the dictionary:



In [30]:

    
print(d.values())

dict_values([['some', 'personal', 'information'], 14.3])

To further induce Inception-style headaches, dictionaries also have an items() method that returns a list-like collection of tuples, where each tuple is a key-value pair in the dictionary!



In [31]:

    
print(d.items())

dict_items([('shannon_quinn', ['some', 'personal', 'information']), ('some_key', 14.3)])

(it's basically the entire dictionary, but this method is useful for looping)

Isn't this fun?!

Part 4: Loops

Looping, like lists, is a critical component in programming and data science. When we're training models on data, we'll need to loop over each data point, examining it in turn and adjusting our model accordingly regardless of how many data points there are. This kind of repetitive task is ideal for looping.

Let's define for ourselves the following list:



In [32]:

    
ages = [21, 22, 19, 19, 22, 21, 22, 31]

This is a list containing the ages of some group of students. Any group. Any group of students. And we want to compute the average. How do we compute averages?

We know an average is some total quantity divided by number of elements. Well, the latter is easy enough to compute:



In [33]:

    
number_of_elements = len(ages)
print(number_of_elements)

The total quantity is a bit trickier. You could certainly sum them all manually--



In [34]:

    
age_sum = ages[0] + ages[1] + ages[2] # + ... and so on

...but that seems really, really tedious. Plus, how do you even know how many elements your list has?

Loop structure

Here's the full block of code to compute the average age:



In [35]:

    
age_sum = 0.0  # Set age_sum to be a float. Why a float? This becomes important at the end.
for age in ages:  # 1
    age_sum += age  # 2
avg = age_sum / number_of_elements  # Compute the average using the formula we know and love!
print("Average age: {:.2f}".format(avg))

Average age: 22.12

1: This is the loop header. It specifies what, exactly, we're looping through--in this case, the list ages. Each element of ages is pulled out and, in turn, stored in the variable age. The loop automatically terminates when there are no more elements of ages to pull out.

2: This is the loop body. This part specifies what happens on each loop. In this case, we want to increment our sum variable age_sum with the age we were given in the current loop, age. We can do this with the fancy += operator we saw in Lecture 2.
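
To watch the header and body at work, here is the same loop with one extra print inside the body, so each pass shows the current age next to the running total. As a sanity check it compares the result with Python's built-in sum() function, which isn't otherwise used in this lecture.

```python
ages = [21, 22, 19, 19, 22, 21, 22, 31]

age_sum = 0.0
for age in ages:
    age_sum += age
    print(age, age_sum)  # current element, then the running total so far

# The built-in sum() adds up a list in one step.
print(age_sum == sum(ages))  # True
print(age_sum / len(ages))   # 22.125
```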

You can loop through sets and tuples the same way.



In [36]:

    
s = set([1, 1, 2, 3, 5])
for item in s:
    print(item)



In [37]:

    
t = tuple([1, 1, 2, 3, 5])
for item in t:
    print(item)

IMPORTANT: INDENTATION MATTERS

You'll notice in these loops that the loop body is distinctly indented relative to the loop header. This is intentional and is indeed how it works! If you fail to indent the body of the loop, Python will complain:



In [48]:

    
some_list = [3.14159, "random stuff", 4200]
for item in some_list:
print(item)

  File "<ipython-input-48-e6ab552bd1f0>", line 3
    print(item)
        ^
IndentationError: expected an indented block

With loops, whitespace in Python really starts to matter. If you want many things to happen inside of a loop, you'll need to indent every line!
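
A minimal sketch of a loop with more than one line in its body. Every indented line runs once per element; the first line that goes back to the original indentation level marks the end of the loop and runs only once. The count variable is just a hypothetical tally added for illustration.

```python
some_list = [3.14159, "random stuff", 4200]

count = 0
for item in some_list:
    print(item)  # indented: runs once per element
    count += 1   # indented: also part of the loop body
print("Done!")   # not indented: runs once, after the loop finishes
print(count)     # 3
```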

Looping with indices

Let's say in some future homework assignment, I ask you to write a loop computing the squares of the numbers 1-10. How would you do it?

Well, you could manually write it out, I suppose...



In [39]:

    
squares = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

...but that's awfully boring. Plus, it's of no help if I come back with a "Part 2" that asks for the squares for numbers 11-20. And a "Part C" for 21-30. Can we make this loop work for any numbers?

No. At least, not yet; that will have to wait for functions.

But we can define a list automatically within a certain range! Think of it as slicing, but on the infinite, ethereal range of integers rather than a concrete Python object.

To do this, we use the range() function:



In [40]:

    
numbers = range(10)
print(numbers)

range(0, 10)

You can think of this as a kinda-sorta list, or even convert it to a list if you want:



In [41]:

    
numbers = list(numbers)
print(numbers)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

range(end): A useful function which hands back a list[-like thing] containing all the numbers from 0 (inclusive) to end (exclusive!). Note that this gives us 0 through 9 rather than the 1 through 10 we originally asked for. Still, using this list of numbers, I can compute the square of each:



In [42]:

    
squares = []  # Empty list for all our squares
for num in numbers:
    squared_number = num ** 2  # Exponent operation!
    squares.append(squared_number)  # Add to our list.
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
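
Those are the squares of 0 through 9, not quite the 1 through 10 we set out to compute. As a sketch of one fix: range() also accepts a starting value, range(start, end), with the same inclusive-start, exclusive-end rule as slicing. The second loop shows what "looping with indices" can look like on an existing list, using range(len(...)) to generate each valid index; the short ages list is only a stand-in.

```python
# Squares of 1 through 10: start at 1 (inclusive), stop before 11 (exclusive).
squares = []
for num in range(1, 11):
    squares.append(num ** 2)
print(squares)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# Looping over the indices of a list rather than over its elements.
ages = [21, 22, 19]
for i in range(len(ages)):  # i takes the values 0, 1, 2
    print(i, ages[i])
```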

Looping through dictionaries

This gets its own subsection because it pulls together pretty much all the concepts we've discussed so far: lists, tuples, dictionaries, and looping.

Let's start by defining a dictionary. In this case, we'll set up a dictionary that maps people to their favorite programming language.



In [43]:

    
favorite_languages = {
    'jen': 'python',
    'sarah': 'c',
    'edward': 'ruby',
    'shannon': 'python'
}
# Notice the indentation, if you decide to define a dictionary this way!

Remember the super-useful methods for iterating through dictionaries? keys gives you all the keys, values all the values, and items the key-value pairs as tuples, each in a list-like form you can loop over. Here's the loop:



In [44]:

    
for key, value in favorite_languages.items():  # 1
    print("{} prefers {}.".format(key, value)) # 2

sarah prefers c.
shannon prefers python.
jen prefers python.
edward prefers ruby.

1: Notice how key, value are just out there floating! This is called unpacking and is a very useful technique in Python. If I have a list of a few items, and (critically) I know how many items there are, I can do this



In [45]:

    
some_list = ['a', 'b']
a, b = some_list

instead of this



In [46]:

    
some_list = ['a', 'b']
a = some_list[0]
b = some_list[1]

In the same vein, I could have just as easily written the loop like this:



In [47]:

    
for keyvalue in favorite_languages.items():  # 1
    key = keyvalue[0]
    value = keyvalue[1]
    print("{} prefers {}.".format(key, value)) # 2

sarah prefers c.
shannon prefers python.
jen prefers python.
edward prefers ruby.

and indeed, if that is easier for you to understand, by all means do it! This is to illustrate all the concepts at play at once:

the loop header iterates through a list provided by favorite_languages.items()
each iteration, items() provides a tuple: a key-value pair from the dictionary
we can "unpack" these variables using shorthand, but it's also perfectly valid to do it the "regular" way
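
To underline the "critically, I know how many items there are" caveat, here is a small sketch. Unpacking works on any list of tuples (the pairs list is a made-up stand-in for what items() provides), but the number of names on the left has to match the number of items on the right. If it doesn't, Python raises a ValueError, which the try/except block catches here just so the message can be printed.

```python
# Unpacking each tuple from a plain list of tuples.
pairs = [("jen", "python"), ("sarah", "c")]
for name, language in pairs:
    print("{} prefers {}.".format(name, language))

# Two names on the left, three items on the right: this fails.
try:
    a, b = ["a", "b", "c"]
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```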

Review Questions

Some questions to discuss and consider:

1: Without knowing the length of the list some_list, how would you slice it so only the first and last elements are removed?

2: Provide an example use-case where the properties of sets and tuples would come in handy over lists.

3: Would it be possible to convert a list to a dictionary? How? Would anything change?

4: Write a loop that computes the maximum element of a list. Write a loop that computes the minimum element of a list.

5: Create a dictionary of lists, where the lists contain numbers. For each key-value pair, compute an average.

Course Administrivia

How's A1 going?

Ready for some list / set / tuple / dictionary / looping awesomeness in A2?

Additional Resources

Matthes, Eric. Python Crash Course. 2016. ISBN-13: 978-1593276034
Grus, Joel. Data Science from Scratch. 2015. ISBN-13: 978-1491901427