Chapter 03: Reading and writing to file, recursion and algorithms

This lab sheet will allow us to read and write to text files (which lets us store information even after we close down Python) and introduce recursion.

Tutorial

Work through the following:


1. Recursion.

A video describing the concept.

A video demo.

It is possible to define functions recursively. This is similar to inductive proofs in mathematics. To do this we need to consider two things:

  1. A recursive rule;
  2. A base case (that the recursive rule reduces to).

As an example let us code the following function:

$$ f(n) = \sum_{i=0}^ni $$

We can see the 'recursive rule':

$$ f(n) = f(n-1) + n $$

We also know that the 'base case' is:

$$ f(0) = 0 $$

Now let us program this function using recursion:


In [1]:
def sum_of_integers(n):
    """Sum of integers from 0 to n"""
    if n == 0:  # Base case
        return 0
    return sum_of_integers(n - 1) + n  # Recursive rule
sum_of_integers(3)


Out[1]:
6

Here is another example. The factorial function $n!=n\times(n-1)\times(n-2)...2\times 1$ can be defined as:

  1. The base case: $0!=1$
  2. The recursive rule: $n!=n\times (n-1)!$

Here is how to code this:


In [2]:
def factorial(n):
    """Returns n! using recursion"""
    if n == 0:  # Base case
        return 1
    return n * factorial(n - 1)

In [3]:
for n in range(6):
    print(factorial(n))


1
1
2
6
24
120

Experiment with modifying the above code.

2. Writing to file.

A video describing the concept.

A video demo.

All of the data we handle with variables, lists and dictionaries lives in the ‘memory’ of a computer when our Python code is running. When the program stops running the data is lost. There will be occasions when we want to write our data to a file on the hard drive of a computer (so that it is always available even when we turn the computer off).

To do this we need Python to open a file (usually a basic text file), write strings to the text file and then close the file. The following code opens (or creates a) text file in ‘write mode’ (that’s what the w is for) and writes the numbers 1 to 10 to it:


In [4]:
textfile = open('mytextfile.txt', 'w')  # Open the file in write mode
for i in range(1, 11):
    textfile.write(str(i) + "\n")
textfile.close()  # Close the file

It is also possible to do it as below (so that we don't have to worry about closing the file).


In [5]:
with open('mytextfile.txt', 'w') as textfile:
    for i in range(1, 11):
        textfile.write(str(i) + "\n")

Modify the above code to write something different to file.

3. Reading files.

A video describing the concept.

A video demo.

To read data from a file, we need to open the file in ‘read mode’:


In [6]:
with open('mytextfile.txt', 'r') as textfile:
    string = textfile.read()

In [7]:
string


Out[7]:
'1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n'

This string is not particularly helpful. To transform the string to a list we can use the split method which separates a string on a given character:


In [8]:
data = string.split('\n')
data


Out[8]:
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '']

We see that we have a list of strings and also have a last empy item. Here we clean that up (converting the strings to integers and ignoring the last item):


In [9]:
data = [int(e) for e in data[:-1]]
data


Out[9]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Worked example

A video describing the concept.

A video demo.

The Fibonacci sequence can be thought of as:

  1. The base case: $f(0)=f(1)=1$
  2. The recursive rule: $f(n)=f(n-1)+f(n-2)$

Here is how to code this:


In [10]:
def fibonacci(n):
    """Returns the nth fibonacci number using recursion"""
    if n in [0, 1]:  # Base case
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)  # Recursive rule

In [11]:
for n in range(10):
     print(fibonacci(n))


1
1
2
3
5
8
13
21
34
55

It can be shown that the Fibonacci sequence can also be given by:

$$ f(n) = \frac{\phi^{(n + 1)} - (-\phi)^{-(n + 1)}}{\sqrt{5}} $$

Where $\phi$ is the golden ratio:

$$ \phi = \frac{1 + \sqrt{5}}{2} $$

Let us use the recursive function to verify that this formula is correct for the first 30 values of $n$:


In [12]:
import math  # This is a library that has various helpful function
phi = (1 + math.sqrt(5)) / 2
def alt_fibonacci(n):
    """Return the nth fibonacci number using the golden ratio formula"""
    return (phi ** (n + 1) - (-phi) ** (-(n + 1))) / (math.sqrt(5))

As we did in the previous sheet let us write a function to check the formula:


In [13]:
def check_formula(K):
    """Check the golden ratio formula"""
    return fibonacci(K) == round(alt_fibonacci(K), 1)
check_formula(3)


Out[13]:
True

Let us check this for the first 30 values:


In [14]:
checks = [check_formula(K) for K in range(30)]
all(checks)  # `all` combines all booleans in a list


Out[14]:
True

We can then use the alternative formula to write a large file with the first 500 Fibonacci numbers:


In [15]:
with open('fibonacci.txt', 'w') as textfile:
    for n in range(500):
        fib_number = int(round(alt_fibonacci(n), 0))  # Rounding the number
        textfile.write(str(fib_number) + "\n")  # Writing it

Exercises

Here are a number of exercises that are possible to carry out using the code concepts discussed:

  • Recursion, writing code that matchs mathematical recursive definitions. For example $a(0)=1$ and $a(n)=2a(n-1)$:

    def sequence_a(n):
          if n == 0:  # base case
             return 1  
          return 2 * a(n - 1)  # recursive relationship
    
  • Writing to file with open("file.txt", "w") and reading from file with open("file.txt", "r")


Exercise 1

Debugging exercise

The following is an attempt to write $n!$ as a recursive function. Find and fix all the bugs.

def factorial(n):
    """A function that returns factorial n"""
    return n * factoial(n)

In [16]:
def factorial(n):
    """A function that returns factorial n"""
    # Need a base case:
    if n == 0:
        return 1
    #return n * factoial(n)  # Typo in 'factorial' and not correct value.
    return n * factorial(n - 1)

A quick check:


In [17]:
for n in range(5):
    print(n, factorial(n))


0 1
1 1
2 2
3 6
4 24

Exercise 2

Write a recursive function that returns the Catalan numbers $C(n)$ defined by the following:

  1. Base case: $C(0)=1$;
  2. Recursive rule: $C(n)=\sum_{i=0}^{n-1}C(i)C(n-1-i)$

In [18]:
def catalan(n):
    """Returns the nth Catalan number using recursion"""
    if n == 0:
        return 1
    return sum([catalan(i) * catalan(n - i - 1) for i in range(n)])

Here is the first 5 terms:


In [19]:
for n in range(5):
    print(catalan(n))


1
1
2
5
14

Exercise 3

Verify for the first 15 values of $n$ that the following alternative formula also gives the Catalan numbers (this is in fact the more commonly used formula):

$$ C(n) = \frac{(2n)!}{(n+1)!n!} $$

Write the first 500 catalan numbers to file.

You can use the factorial function we defined in this lab sheet (Exercise 1) or you can use the math library: math.factorial.


In [20]:
import math  # Use the factorial from the math library

def alt_catalan(n):
    """An alternative formula for the Catalan numbers"""
    return math.factorial(2 * n) / (math.factorial(n + 1) * math.factorial(n))

Here is a function to check the formula:


In [21]:
def check_formula(n):
    """
    Check the value for the nth Cataln number
    """
    return  catalan(n) == alt_catalan(n)

Here we are checking the first 15 numbers:


In [22]:
checks = [check_formula(n) for n in range(15)]
all(checks)  # `all` combines all booleans in a list


Out[22]:
True

Exercise 4

The file primes.csv (download) contains the first million prime numbers. Use it to attempt to verify Goldbach's conjecture.

Let us first check Goldbachs conjecture using a small list of primes.


In [23]:
primes = [2, 3, 5, 7, 11]

In [24]:
def check_goldbach(n, primes):
    """
    Checks that n (if it is even) can be written as 
    the sum of two numbers in the list `primes`
    """
    if n % 2 == 1:
        # If n is odd than the conjecture is still true
        return True
    for p1 in primes:
        for p2 in primes:
            if p1 + p2 == n:
                return True, p1, p2
    # If we loop over all primes and do not obtain 
    # the required sum then the conjecture is false
    return False

In [25]:
check_goldbach(6, primes)


Out[25]:
(True, 3, 3)

In [26]:
check_goldbach(10, primes)


Out[26]:
(True, 3, 7)

We can check a few numbers as above but after a while we won't have enough prime numbers, for example with $n=20$ we get a failure:


In [27]:
check_goldbach(20, primes)


Out[27]:
False

Let us read in the large number of primes:


In [28]:
with open("primes.csv", 'r') as outfile:
    data = [int(prime) for prime in outfile.read().split()]

Let us look at the last 10 entries:


In [29]:
data[-10:]


Out[29]:
[15485761,
 15485773,
 15485783,
 15485801,
 15485807,
 15485837,
 15485843,
 15485849,
 15485857,
 15485863]

We can now confirm that $n=20$ is not a counter example to Goldbach's conjecture:


In [30]:
check_goldbach(20, data)


Out[30]:
(True, 3, 17)

Let us check the first 200 integers that are even:


In [31]:
upper_count = 200
count = 0 
n = 2
bools = []
while count < upper_count:
    n += 1
    if n % 2 == 0:
        count += 1
        bools.append(check_goldbach(n, data)[0])  # Appending the first element

Let us check all our booleans:


In [32]:
all(bools), len(bools)


Out[32]:
(True, 200)

The approach we have taken is called a 'brute force' approach. Can you think of a smarter way of writing check_goldbach?