Lecture 4: Data Structures, Loops, and Conditionals

CBIO (CSCI) 4835/6835: Introduction to Computational Biology

Overview and Objectives

In this lecture, we'll go over the different collections of objects that Python natively supports. In previous lectures we dealt only with single strings, ints, and floats; now we'll be able to handle arbitrarily large numbers of them. By the end of the lecture, you should be able to

Describe the differences between sets, tuples, lists, and dictionaries
Perform basic arithmetic operations using arbitrary-length collections
Describe the differences between the separate kinds of loops
Build arbitrary conditional hierarchies to test a variety of possible circumstances
Construct elementary boolean logic statements

Part 1: Beyond Numerical Variables

Lists are probably the most basic data structure in Python; they contain ordered elements and can be of arbitrary length. Other languages may refer to this basic structure as an "array", and indeed there are similarities. But for our purposes and anytime you're coding in Python, we'll use the term "list."

When I say "data structures," I mean any type that is more sophisticated than the ints and floats we've been working with up until now. I purposefully omitted strs, because they are in fact a data structure unto themselves: they build on the single character, and are effectively a "list of characters." In much the same way, we'll see in this lecture how to define a "list of ints", a "list of floats", and even a "list of strs"!

Anything in Python that holds an arbitrary number of simpler objects is known as a collection. Today we'll discuss four different kinds of collections: lists, sets, tuples, and dictionaries.

Lists in Python have a few core properties:

Ordered. This means the list structure maintains an intrinsic ordering of the elements held inside.
Mutable. This means the structure of the list can change; elements can be added, removed, or changed in-place.
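
As a rough sketch of what "ordered" and "mutable" look like in practice, here is a made-up list being changed in place. The name bases and its values are purely illustrative; append, remove, pop, and reverse are standard list methods.

```python
# Hypothetical example list; the name and values are illustrative only.
bases = ["A", "C", "G"]

bases.append("T")    # add an element to the end
bases.remove("C")    # remove the first element equal to "C"
last = bases.pop()   # remove and return the last element
bases.reverse()      # reverse the remaining elements in place

print(last)   # T
print(bases)  # ['G', 'A']
```

Every one of these calls changes the same list object rather than building a new one, which is what "in-place" means here.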



In [1]:

    
x = list()
print(x)

[]

Here I've defined an empty list, called x. Like our previous variables, this has both a name (x) and a type (list). However, it doesn't have any actual value beyond that; it's just an empty list. Imagine a filing cabinet with nothing in it.

So how do we add things? Lists, as it turns out, have a few methods we can invoke (methods are pieces of functionality that we'll cover more when we get to functions). Here's a useful one:



In [2]:

    
x.append(1)

The append() method takes whatever argument I supply to the function, and inserts it into the next available position in the list. Which, in this case, is the very first position (since the list was previously empty).

What does our list look like now?



In [3]:

    
print(x)

[1]

It's tough to tell that there's really anything going on, but those square brackets [ and ] are the key: those denote a list, and anything inside those brackets is an element of the list.

Let's look at another list, a bit more interesting this time.



In [4]:

    
y = list()
y.append(1)
y.append(2)
y.append(3)
print(y)









    



[1, 2, 3]

In this example, I've created a new list y, initially empty, and added three integer values. Notice the ordering of the elements when I print the list at the end: from left-to-right, you'll see the elements in the order that they were added.

You can put any elements you want into a list. If you wanted, you could add strings, floats, and even other lists!



In [5]:

    
y.append("this is perfectly legal")
print(y)

y.append(4.2)
print(y)

y.append(list())  # Inception BWAAAAAAAAAA
print(y)









    



[1, 2, 3, 'this is perfectly legal']
[1, 2, 3, 'this is perfectly legal', 4.2]
[1, 2, 3, 'this is perfectly legal', 4.2, []]

Indexing

So I have these lists and I've stored some things in them. I can print them out and see what I've stored...but so far they seem pretty unwieldy. How do I remove things? If someone asks me for whatever was added 3$^{rd}$, how do I give that to them without giving them the whole list?

The answers to these questions involve indexing. Indexing is what happens when you refer to an existing element in a list. For example, in our hybrid list y with lots of random stuff in it, what's the first element?



In [6]:

    
first_element = y[0]
print(first_element)
print(y)









    



1
[1, 2, 3, 'this is perfectly legal', 4.2, []]

Python and its spiritual progenitors C and C++ are known as zero-indexed languages. This means when you're dealing with lists or arrays, the index of the first element is always 0.

This stands in contrast with languages such as Julia and MATLAB, where the index of the first element of a list or array is, indeed, 1. Preference for one or the other tends to covary with whatever you were first taught, though in scientific circles it's generally preferred that languages be 0-indexed$^{[\text{citation needed}]}$.

This little caveat is usually the main culprit of errors for new programmers. Give yourself some time to get used to Python's 0-indexed lists. You'll see what I mean when we get to loops.

In addition to indexing from the front of the list with 0, 1, and so on, we can also directly index elements at the end of the list.



In [7]:

    
print(y[-1])
print(y)









    



[]
[1, 2, 3, 'this is perfectly legal', 4.2, []]

You can think of this indexing strategy as "wrapping around" the list to the end of it. Similarly, you can also negate other numbers to access the second-to-last element, third-to-last element...



In [8]:

    
print(y[-2])
print(y[-3])
print(y)









    



4.2
this is perfectly legal
[1, 2, 3, 'this is perfectly legal', 4.2, []]

Another very useful built-in function is len, which tells you how many elements are in your list.



In [9]:

    
num = len(y)
print(num)

Because lists in Python are 0-indexed, what would the positive integer index of the last element in a list be?

Slicing

Using more indexing voodoo, you can also index slices of lists. Let's say we want to create a new list that consists of the integer elements of y, which are the first three. We could pull them out one by one, or use slicing:



In [10]:

    
int_elements = y[0:3]  # Slicing!
print(y)
print(int_elements)









    



[1, 2, 3, 'this is perfectly legal', 4.2, []]
[1, 2, 3]




That y[0:3] notation is the slicing. The first number, 0, indicates the first index of values we want to keep. The colon : indicates slicing, and the second number, 3, indicates the index at which the slice stops.

You could even say this out loud: "With list y, slice starting at index 0 to index 3." The colon is the "to".

When you slice a list, the first (starting) index is inclusive; the second (ending) index, however, is exclusive. In mathematical notation, it would look something like this:

$[ starting : ending )$

Therefore, the end index is one after the last index you want to keep.
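
To make the inclusive-start, exclusive-end rule concrete, here is a small sketch using a made-up list of letters. Leaving out the start or the end of a slice is standard Python: an omitted start means "from the beginning" and an omitted end means "through the end."

```python
# Illustrative list; any list slices the same way.
letters = ["a", "b", "c", "d", "e"]

print(letters[1:4])  # indices 1, 2, 3 -> ['b', 'c', 'd']
print(letters[:2])   # omitted start defaults to 0 -> ['a', 'b']
print(letters[3:])   # omitted end runs to the end -> ['d', 'e']

# Because the end is exclusive, a slice holds (end - start) elements.
print(len(letters[1:4]) == 4 - 1)  # True
```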

One more thing about lists

You don't always have to start with empty lists. You can pre-define a full list; just use brackets!



In [12]:

    
z = [42, 502.4, "some string", 0]
print(z)









    



[42, 502.4, 'some string', 0]

Sets and Tuples

If you understood lists, sets and tuples are easy-peasy. They're both exactly the same as lists...except:

Tuples:

Immutable. Once you construct a tuple, it cannot be changed.

Sets:

Distinct. Sets cannot contain two identical elements.
Unordered. Sets don't index the same way lists do.

Other than these rules, pretty much anything you can do with lists can also be done with tuples and sets.

Whereas we used square brackets to create a list, we use regular parentheses to create a tuple!



In [13]:

    
x = [3, 64.2, "some list"]
print(type(x))

y = (3, 64.2, "some tuple")
print(type(y))









    



<class 'list'>
<class 'tuple'>

With lists, if you wanted to change the item at index 2, you could go right ahead. But with tuples, you'll get an error.



In [14]:

    
x[2] = "a different string"
print(x)

# Uncommenting the next line raises a TypeError, because tuples are immutable:
#y[2] = "does this work?"









    



[3, 64.2, 'a different string']
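
If you'd like to see the tuple error without crashing your notebook, here is one possible sketch. The try/except wrapper is something we haven't covered yet; it's only there to catch the error so the rest of the code keeps running. The names point and assignment_failed are made up for this example.

```python
point = (3, 64.2, "some tuple")  # illustrative tuple
assignment_failed = False

try:
    point[2] = "does this work?"
except TypeError as err:
    # Tuples do not support item assignment.
    assignment_failed = True
    print("TypeError:", err)

print(assignment_failed)  # True
print(point)              # the tuple is unchanged
```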

As with list(), there is a function, tuple(), for building an empty tuple. And like lists, you have (almost) all of the other operations at your disposal, such as slicing and len:



In [15]:

    
z = tuple()

print(y[0:2])
print(len(y))









    



(3, 64.2)
3

Sets

Sets are interesting buggers, in that they only allow you to store a particular element once.



In [16]:

    
x = list()
x.append(1)
x.append(2)
x.append(2)  # Add the same thing twice.

s = set()
s.add(1)
s.add(2)
s.add(2)  # Add the same thing twice...again.



In [17]:

    
print(x)

print(s)









    



[1, 2, 2]
{1, 2}

There are certain situations where this can be very useful. It should be noted that sets can actually be built from lists, so you can build a list and then turn it into a set:



In [18]:

    
x = [1, 2, 3, 3]
s = set(x)  # Take the list x as the starting point.
print(s)









    



{1, 2, 3}

Sets also don't index the same way lists and tuples do:



In [19]:

    
# Uncommenting the next line raises a TypeError, because sets are not subscriptable:
#s[0]

If you want to add elements to a set, you can use the add method.

If you want to remove elements from a set, you can use the discard or remove methods.

But you can't index or slice a set.
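
Here is a brief sketch of add, discard, and remove side by side, using a made-up set called seen. The one difference worth knowing: discard quietly does nothing if the element is missing, while remove raises a KeyError. The try/except is only there to catch that error.

```python
seen = {1, 2, 3}   # curly braces with values also create a set

seen.add(4)        # add a new element
seen.add(4)        # adding it again has no effect
seen.discard(2)    # remove 2
seen.discard(99)   # 99 is not in the set: nothing happens

try:
    seen.remove(99)  # remove on a missing element raises KeyError
except KeyError:
    print("99 was not in the set")

print(seen == {1, 3, 4})  # True
```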

So why is a `set` useful?

It's useful for checking if you've seen a particular kind of thing at least once. This is known as membership testing--and we'll wait just a few more slides before delving into it. First, one more data structure.

Dictionaries

The basic idea of a dictionary is to map a key to a value, in such a way that if you have a certain key, you always get back the value associated with that key.

You can also think of dictionaries as lists with more interesting indices: instead of looking things up by integer position, you look them up by key.

A few important points on dictionaries before we get into examples:

Mutable. Dictionaries can be changed and updated.
Keyed, not positional. Elements are looked up by key, not by position. Older versions of Python made no ordering guarantee at all; since Python 3.7, dictionaries preserve the order in which keys were inserted.
Keys are distinct. The keys of dictionaries are unique; no key ever appears twice. The values, however, can be repeated as many times as you want.

Dictionaries are created using the dict() function, or using curly braces:



In [20]:

    
d = dict()
# Or...
d = {}

New elements are added by assigning a value to a key, using the same square-bracket notation as list indexing:



In [21]:

    
d["some_key"] = 14.3

d["shannon_quinn"] = ["some", "personal", "information"]
print(d)









    



{'shannon_quinn': ['some', 'personal', 'information'], 'some_key': 14.3}
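
Getting a value back out uses the same square brackets. The sketch below uses a made-up dictionary of read counts (the gene names and numbers are hypothetical) to show lookup, what happens when you assign to a key that already exists, and the standard get method for keys that might be missing.

```python
read_counts = {}              # hypothetical dictionary for illustration
read_counts["gene_a"] = 10
read_counts["gene_b"] = 4
read_counts["gene_a"] = 12    # same key: the old value is replaced, not duplicated

print(read_counts["gene_a"])      # 12
print(len(read_counts))           # 2
print(read_counts.get("gene_c"))  # None; get() avoids a KeyError for missing keys
```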

Since dictionaries are looked up by key rather than by position, using integers as indices won't give us anything useful (unless those integers happen to be keys). However, dictionaries do have a keys() method that gives us a view of all the keys in the dictionary, and a values() method for (you guessed it) the values in the dictionary:



In [22]:

    
print(d.keys())

print(d.values())









    



dict_keys(['shannon_quinn', 'some_key'])
dict_values([['some', 'personal', 'information'], 14.3])

To further induce Inception-style headaches, dictionaries also have an items() method that returns a collection of tuples, where each tuple is a key-value pair in the dictionary!

(It's basically the entire dictionary, but this method is useful for looping.)



In [23]:

    
print(d.items())









    



dict_items([('shannon_quinn', ['some', 'personal', 'information']), ('some_key', 14.3)])

Now, back to why sets--or any data structure, really--are useful for testing if we've seen something before.



In [24]:

    
s = set([1, 3, 6, 2, 5, 8, 8, 3, 2, 3, 10])

print(10 in s)  # Basically asking: is 10 in our set?

print(11 in s)









    



True
False

Membership testing

in and not in can be used to see if an object or particular value is in a collection.



In [25]:

    
l = [1,2,3]



In [26]:

    
1 in l









    Out[26]:





True



In [27]:

    
345 in l









    Out[27]:





False



In [28]:

    
"1" not in l









    Out[28]:





True



In [29]:

    
"good" in "goodness"  # Yep, strings are considered "collections"!









    Out[29]:





True
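
One thing that trips people up: with a dictionary, in checks the keys, not the values. A minimal sketch, using a small dictionary like the one from earlier in the lecture:

```python
d = {"some_key": 14.3}

print("some_key" in d)     # True: "some_key" is a key
print(14.3 in d)           # False: 14.3 is a value, not a key
print(14.3 in d.values())  # True: ask the values explicitly
```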

Questions on Data Structures?

Lists
Sets
Tuples
Dictionaries
Membership testing

Part 2: Loops

Looping, like lists, is a critical component in programming and data science. When we're training models on data, we'll need to loop over each data point, examining it in turn and adjusting our model accordingly regardless of how many data points there are. This kind of repetitive task is ideal for looping.

The structure of loops is pretty simple:

some collection of "things" to iterate over
a placeholder for the current "thing" we're working on
a chunk of code describing what to do with the current "thing"



In [30]:

    
letters = ['a','b','c','d','e','f','g']
for i in letters:  # for every item in this collection...
    print(i)  # ...execute this block of code with i set to the object









    



a
b
c
d
e
f
g

There are two main parts to the loop: the header and the body.

The header contains 1) the collection we're iterating over (in this example, the list), and 2) the "placeholder" we're using to hold the current value (in this example, i).
The body is the chunk of code under the header (indented!) that executes on each iteration.
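
The same header-and-body structure works for the other collections too. As a sketch, here is a loop over a made-up dictionary of base counts. The items() method hands back key-value tuples, and the header unpacks each one into two placeholders (base and count are names chosen just for this example).

```python
counts = {"A": 3, "C": 1, "G": 2}  # made-up values for illustration

total = 0
for base, count in counts.items():  # header: collection plus two placeholders
    print(base, count)              # body: runs once per key-value pair
    total += count

print(total)  # 6
```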

`for` and `while`

There are two types of loops. for loops iterate over a collection of items; in contrast, while loops iterate while some condition evaluates to True.



In [31]:

    
i = 0
while i < 3:  # as long as this condition is true...
    print(i)  # ...execute this block of code
    i += 1

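Word of warning with while loops: Don't forget to update the condition variable! If i never changed in the loop above, the condition i < 3 would stay True and the loop would run forever.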
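
while loops earn their keep when you don't know ahead of time how many iterations you need. A minimal sketch with made-up numbers: keep halving a value until it drops to 1 or below, counting the steps along the way.

```python
value = 100.0  # illustrative starting value
steps = 0

while value > 1.0:
    value = value / 2  # this update is what eventually makes the condition False
    steps += 1

print(steps)  # 7
print(value)  # 0.78125
```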

Looping with indices (indexes?)

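The preferred method of looping is to use a for loop. There are a number of built-in functions that help create collections to iterate over. One of them is range, which produces a sequence of integers; in Python 3 it returns a compact range object rather than a list, which is why the outputs below simply echo the call.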



In [32]:

    
range(3)









    Out[32]:





range(0, 3)



In [33]:

    
range(1,10)









    Out[33]:





range(1, 10)



In [34]:

    
range(1,10,3)









    Out[34]:





range(1, 10, 3)
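
To see the numbers a range actually produces, one option is to wrap it in list(). The sketch below does that for the three calls above, then uses range together with len to loop over a made-up list by index.

```python
print(list(range(3)))         # [0, 1, 2]
print(list(range(1, 10)))     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(1, 10, 3)))  # [1, 4, 7]

letters = ["a", "b", "c"]  # illustrative list
for i in range(len(letters)):  # i takes the values 0, 1, 2
    print(i, letters[i])
```

Just like slicing, range includes its starting value and excludes its ending value.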

What's the value of val?



In [35]:

    
val = 0
for i in range(100):
    val += i



In [36]:

    
print(val)

`break` and `continue`

break will exit out of the entire loop, while continue will skip to the next iteration.



In [37]:

    
i = 0
while True:
    i += 1
    if i < 6:
        continue
    print(i)
    if i > 4:
        break
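
break and continue work the same way inside for loops. Here is a small sketch with made-up numbers: skip the odd values with continue, and stop the whole loop with break at the first even value above 12.

```python
numbers = [4, 7, 10, 13, 16, 19]  # illustrative values
evens_seen = []

for n in numbers:
    if n % 2 == 1:
        continue  # odd: skip the rest of the body, move to the next n
    if n > 12:
        break     # even and above 12: leave the loop entirely
    evens_seen.append(n)

print(evens_seen)  # [4, 10]
```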

Part 3: Conditionals

Also known as "if statements". These conditionally execute a block of code.

The condition is some Boolean value
A block of code is delineated by consistent indentation.
Blank lines and comments will not end a block
Whitespace is significant - do not mix tabs and spaces

In an if..elif..else statement, only one block of code will be executed: the first one whose condition is True, or the else block if none are.



In [38]:

    
if False:
    print("1")
elif True:
    print("2")
else:
    print("3")
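
Because only the first matching block runs, the order of the conditions matters. A sketch with a made-up GC fraction and arbitrary cutoffs (0.4 and 0.6 are chosen only for illustration):

```python
gc_fraction = 0.62  # made-up value

if gc_fraction < 0.4:
    label = "low"
elif gc_fraction < 0.6:  # only checked if the first condition was False
    label = "medium"
else:                    # runs only if nothing above matched
    label = "high"

print(label)  # high
```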

Comparison Operators

Remember these? They return a Boolean value: <, >, !=, ==, <=, >=



In [39]:

    
1 < 3









    Out[39]:





True



In [40]:

    
"hello" != "hi"









    Out[40]:





True



In [41]:

    
[1,2,3] == [1,2,3]









    Out[41]:





True



In [42]:

    
x = 3
y = 4
x >= y









    Out[42]:





False

What is the value of val?



In [43]:

    
val = 0
if val >= 0:
    val += 1
elif val < 1:
    val += 2
elif True:
    val += 3
else:
    val += 5
    
    val += 7



In [44]:

    
print(val)

Logical Operators

Logical operators are used to join multiple "sub-conditions" together, each evaluating to True or False, into one large condition that evaluates to one final True or False for the whole thing.

and, or, and not

and: All sub-conditions joined with and must be True for the full condition to be True.
or: Only one sub-condition joined with or needs to be True for the full condition to be True.
not: Flips the condition from True to False or from False to True.
Use parentheses to avoid confusion!



In [45]:

    
if x > 3 and y == 4:
    print("a")
elif x > 3 or y == 4:
    print("b")
elif not (x > 3 or y == 4):
    print("c")
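
To see why the parentheses advice matters, here is a short sketch that evaluates each sub-condition from the example above (with x = 3 and y = 4, as set earlier), followed by a case where leaving out the parentheses changes the answer. Python evaluates not first, then and, then or.

```python
x = 3
y = 4

print(x > 3 and y == 4)       # False: x > 3 is False
print(x > 3 or y == 4)        # True: y == 4 is True
print(not (x > 3 or y == 4))  # False: flips the True above

# and binds more tightly than or, so these two differ:
print(True or False and False)    # True, read as True or (False and False)
print((True or False) and False)  # False
```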