Chapter 02: Data Structures and Functions

This lab sheet will move on to understanding how to write functions (very similar to mathematical functions) and lists (a way of holding data).

Tutorial

Work through the following:


1. Lists

A video describing the concept.

A video demo.

Another type of variable in Python (we have already seen numeric and character variables) is the list. This type acts as a container that can hold multiple other items in an ordered way.

Here is a list of my favourite numbers:


In [1]:
favourite_numbers = [9, 12, 13, 7]
favourite_numbers


Out[1]:
[9, 12, 13, 7]

In [2]:
type(favourite_numbers)


Out[2]:
list

We can do various things to the items in the list:


In [3]:
sum(favourite_numbers)  # Adding all the elements of our list


Out[3]:
41

In [4]:
max(favourite_numbers)  # Getting the largest element of a list


Out[4]:
13

In [5]:
min(favourite_numbers)  # Getting the minimum element of a list


Out[5]:
7

In [6]:
favourite_numbers.append(-100)  # Add another element to a list
favourite_numbers


Out[6]:
[9, 12, 13, 7, -100]

We can also go in to our lists and get specific items. This works just as it did with strings:


In [7]:
favourite_numbers[0]  # Getting the first element of a list


Out[7]:
9

In [8]:
favourite_numbers[1]  # Getting the second element of a list


Out[8]:
12

In [9]:
favourite_numbers[-1]  # Getting the last element of a list


Out[9]:
-100

In [10]:
favourite_numbers[2:4]  # Getting the 3rd till the 4th element of a list


Out[10]:
[13, 7]

Strings (see in in the previous lab sheet) and lists are similar in that they are 'containers' for items. A lot of the manipulation that works with lists also works with strings:


In [11]:
firstname = "Vince"
len(firstname)  # How many characters are in the variable


Out[11]:
5

In [12]:
firstname[0]  # We can point at individual characters, 0 is the first


Out[12]:
'V'

In [13]:
firstname[4]


Out[13]:
'e'

In [14]:
firstname[-1]  # We can use negative number to start counting from the end


Out[14]:
'e'

In [15]:
firstname[0:4]  # We can 'slice' strings


Out[15]:
'Vinc'

Experiment by creating lists and manipulating them.

2. For loops.

A video describing the concept.

A video demo.

In the previous sheet, we saw that a while loop can be used to repeat a code chunk until a boolean condition is False. It is also possible to repeat an action for all elements of a list.


In [16]:
items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # Creating a list
total = 0
for item in items:
    total += item
total


Out[16]:
55

A quick way to get a list of integers is the range command:


In [17]:
for k in range(10):
    print(k)


0
1
2
3
4
5
6
7
8
9

Experiment by summing (or perhaps multiplying?) over a different list of items.

3. List comprehension

A video describing the concept.

A video demo.

Here is a list of squares, done using the append method and a for loop.


In [18]:
total = 0
square_of_favourite_numbers = []  # Create an empty list
for n in favourite_numbers:
    square_of_favourite_numbers.append(n ** 2)
square_of_favourite_numbers


Out[18]:
[81, 144, 169, 49, 10000]

We can however do this using some nice Python syntax called a list comprehension:


In [19]:
square_of_favourite_numbers = [x ** 2 for x in favourite_numbers]
square_of_favourite_numbers


Out[19]:
[81, 144, 169, 49, 10000]

This is familiar, as it replicates mathematical notation. For example here is how to get the sum of the elements in the following set:

$$ \{n \in S \;| \text{ if } n \text{ is divisible by 2}\} $$

(This is mathematical notation for "the set of all things in $S$ that are divisible by 2.)


In [20]:
sum([n for n in favourite_numbers if n % 2 == 0])


Out[20]:
-88

Experiment by modifying this code to create a different list.

4. Writing simple functions.

A video describing the concept.

A video demo.

Often we want to be able to use code more than once. The way to do this is to write a function. Here is a very simple example. This creates a function that returns a string saying "Hello world":


In [21]:
def say_hi():
    return "Hello world"

Now, to use that function we need to call the function, which we do using a pair of brackets: ().


In [22]:
say_hi()


Out[22]:
'Hello world'

It is good practice to break down code in to smaller functions that make it easier to read.

Experiment with changing what the say_hi function says.

5. Functions with variables.

A video describing the concept.

A video demo.

It is more useful to include variables in our functions (in the exact same way as for mathematical functions!).

Let us revisit the mathematical function we described in the previous lab sheet:

$$ f(n)=\begin{cases} 1&\text{ if } n\leq 5\\ 2&\text{ if } n> 5\text{ and } n \text{ even}\\ 3&\text{ otherwise }\\ \end{cases} $$

Here is the code that defines this function (compare it to the code we wrote in the previous lab sheet):


In [23]:
def f(n):
    """
    This is text in between triple quoatation marks is a doc string.
    We use it to describe what a function does. For example here we would
    write: This function returns f(n) as described above.
    """
    if n <= 5:
        return 1
    elif n % 2 == 0:  # Otherwise if (else if)
        return 2
    else:  # Otherwise
        return 3
f(11)


Out[23]:
3

We can also have functions with more than 1 variable:


In [24]:
def simple_sum(a, b):
    """Returns the sum of a and b"""
    return a + b
simple_sum(5, 7)


Out[24]:
12

It is also possible to have default variables:


In [25]:
def simple_sum(a=5, b=1):
    """Returns the sum of a and b"""   
    return a + b

In [26]:
simple_sum()


Out[26]:
6

In [27]:
simple_sum(b=2)


Out[27]:
7

In [28]:
simple_sum(a=3)


Out[28]:
4

Experiment by creating your own function.


Worked example

A video describing the concept.

A video demo.

The Fibonacci numbers are defined by the following mathematical formula:

$$ f_n = \begin{cases} 1,\text{ if } n \in \{0, 1\}\\ f_{n - 1} + f_{n - 2}, \text{ otherwise} \end{cases} $$

The goal of this question is to verify the following theorem about the sum of the first Fibonacci numbers:

$$ \sum_{i=0}^n f_i = f_{n + 2} - 1 $$

As an example with $n=3$ we have: $f_0=f_1=1, f_2=2, f_3=3$ and $f_5=8$. We indeed have:

$$ \sum_{i=0}^nf_i = 1 + 1 + 2 + 3 = 7 = 8 - 1 = f_{3 + 2} - 1 $$

We are going to automate checking the formula using Python. Let us first write a function for the Fibonacci sequence itself:


In [29]:
def fibonacci(n):
    """Returns the n th fibonacci number"""
    if n in [0, 1]:  # The special case of n being 0 or 1
        return 1
    a = 1  # Setting the starting values
    b = 1
    for i in range(n - 1):  # Going through the first n - 1 terms
        temp = b  # A temporary variable to remember the value of b
        b = a + b  # New value of b
        a = temp  # New value of a (old value of b)
    return b

Let us now call this function to check that it is correct:


In [30]:
for n in range(5):
    print(fibonacci(n))


1
1
2
3
5

We can now obtain a list of the first $K$ fibonacci numbers and check the theorem:


In [31]:
K = 3
list_of_fib = [fibonacci(n) for n in range(K + 1)]
sum(list_of_fib) == fibonacci(K + 2) - 1


Out[31]:
True

We can put this code that checks if a relationship is true in to a new function:


In [32]:
def check_theorem(K):
    """
    Check the relationship for the sum of the first K fibonacci numbers
    """
    list_of_fib = [fibonacci(n) for n in range(K + 1)]
    return  sum(list_of_fib) == fibonacci(K + 2) - 1

Let us now check that our theorem can be checked for the first 200 values of $K$:


In [33]:
checks = [check_theorem(K) for K in range(200)]
all(checks)  # `all` combines all booleans in a list


Out[33]:
True

Exercises

Here are a number of exercises that are possible to carry out using the code concepts discussed:

  • Lists, for example: odd_numbers = [1, 3, 5, 7]
  • For loops, for example:
    for i in odd_numbers:
          print(i ** 2)
    
  • List comprehensions, for example: square_of_odd_numbers = [n ** 2 for n in odd_numbers]
  • Functions, for example:
    def division(a=4, b=1):
          return a / b
    

Exercise 1

Debugging exercise

The following is an attempt to write $n!$ as a function. Find and fix all the bugs.

def factorial(n):
       """A function that returns factorial n"""
       for i in range(n)
           prod *= i

In [34]:
def factorial(n):
    """A function that returns factorial n"""
    prod = 1  # Need to initiate this dummy variable.
    #for i in range(n)  # Missing the `:`, not going in correct range
    for i in range(1, n + 1):
        prod *= i
    return prod  # Not returning

A quick check:


In [35]:
for n in range(5):
    print(n, factorial(n))


0 1
1 1
2 2
3 6
4 24

Exercise 2

Write a function that returns the triangular numbers $T_n$:

$$ T_n=\frac{n(n+1)}{2} $$

In [36]:
def triangular(n):
    """Return the nth triangular number"""
    return n * (n + 1) / 2

In [37]:
for n in range(10):
    print(triangular(n))


0.0
1.0
3.0
6.0
10.0
15.0
21.0
28.0
36.0
45.0

Exercise 3

Use code to check that the following relationship is true:

$$ \sum_{i=0}^{n}T_i=\frac{n(n+1)(n+2)}{6} $$

In [38]:
def check_theorem(K):
    """
    Check the relationship for the sum of the first K triangular numbers
    """
    list_of_tri = [triangular(n) for n in range(K + 1)]
    return  sum(list_of_tri) == K * (K + 1) * (K + 2) / 6

In [39]:
check_theorem(4)


Out[39]:
True

Here we are checking the first 200 numbers:


In [40]:
checks = [check_theorem(K) for K in range(200)]
all(checks)  # `all` combines all booleans in a list


Out[40]:
True

Exercise 4

Create a list with the first 1300 integers divisible by 3. What is the largest such number? What is the smallest such number? What is the mean of these numbers?


In [41]:
divisibles = []
upper_limit = 1300
n = 0
while len(divisibles) < upper_limit:
    n += 1
    if n % 3 == 0:
        divisibles.append(n)

In [42]:
max(divisibles)  # largest number


Out[42]:
3900

In [43]:
min(divisibles)  # smallest number


Out[43]:
3

In [44]:
sum(divisibles) / len(divisibles)  # mean number


Out[44]:
1951.5

Exercise 5

Investigate the use of the Python random library. Use this to simulate the Monty hall problem:

  • Write a function that simulates the play of the game when you 'stick' with the initial choice.
  • Write a function that simulates the play of the game when you 'change' your choice.
  • Repeat the play of the game using both those functions and compare the probability of winning.

In [45]:
import random  # importing the random library

In [46]:
random.random()  # Thi gives a random number between 0 and 1


Out[46]:
0.3174674416502159

Note that these are not true random numbers: they actually follow a sequence that is not easy to predict and behaves similarly to random numbers. It is possible to state that we want the same sequence, we do this by setting a random seed:


In [47]:
random.seed(1)  # This takes any integer
random.random(), random.random()


Out[47]:
(0.13436424411240122, 0.8474337369372327)

Let us call the above again:


In [48]:
random.seed(1)  # We use the same seed as before
random.random(), random.random()  # We get the same numbers


Out[48]:
(0.13436424411240122, 0.8474337369372327)

We can also use this to make random choices from a list:


In [49]:
doors = ['A', 'B', 'C']
random.choice(doors)


Out[49]:
'A'

In [50]:
def stick():
    """A function to simulate a play of the game when we stick"""
    wins = 0 

    doors = ['Car', 'Goat', 'Goat']
        
    initial_choice = random.choice(doors)  # make a choice
    return initial_choice == 'Car'

In [51]:
stick()


Out[51]:
False

In [52]:
def switch():
    """A function to simulate a play of the game when we swap"""
    doors = ['Car', 'Goat', 'Goat']
    
    initial_choice = random.choice(doors)  # make a choice
    doors.remove(initial_choice)  # Switching: remove initial choice
    doors.remove('Goat')  # The game show host shows us a goat
    
    new_choice = doors[0]  # We choose our one remaining option
            
    return new_choice == 'Car'

In [53]:
switch()


Out[53]:
False

We can now run a number of repetitions of the above and get the probability of winning:


In [54]:
random.seed(0)
repetitions = 10000
prob_win_stick = sum([stick() for rep in range(repetitions)]) / repetitions
prob_win_switch = sum([switch() for rep in range(repetitions)]) / repetitions
prob_win_stick, prob_win_switch


Out[54]:
(0.3346, 0.6649)

Exercise 6

A data type that we have not considered are dictionaries. These are a specific type of what is generally called a 'hash table'. Find information about dictionaries and experiment with them.

Dictionaries are a data structure that let us map keys to values. This is useful when need to go for one set of values to another. This is in effect an programming implementation of a bijective function.

As an example let's say we wanted a data structure that mapped names of people to their student number:


In [55]:
names_to_numbers = {"Vince": 1, "Zoe": 2, "Julien": 3, "Marie": 4}
names_to_numbers


Out[55]:
{'Vince': 1, 'Zoe': 2, 'Julien': 3, 'Marie': 4}

We can get the value of a particular key:


In [56]:
names_to_numbers['Marie']


Out[56]:
4

We can manipulate that value as always:


In [57]:
names_to_numbers['Marie'] += 1
names_to_numbers


Out[57]:
{'Vince': 1, 'Zoe': 2, 'Julien': 3, 'Marie': 5}

If we try and get a key that is not in the dictionary we get an error but we can add key/value pairs to dictionaries:


In [58]:
names_to_numbers["Jon"] = 5
names_to_numbers


Out[58]:
{'Vince': 1, 'Zoe': 2, 'Julien': 3, 'Marie': 5, 'Jon': 5}