Lesson 3: Loops and File Reading/Writing


Table of Contents

  1. Conditionals II: Loops
  2. The while loop
  3. The for loop
  4. File reading
  5. File writing
  6. Test your understanding: practice set 3

1. Conditionals II: Loops


In the last lesson, we looked at one way our code can make decisions for us using conditional if / else statements. Today, we'll look at the other main type of conditional available in Python: loops. Loops do pretty much exactly what you'd expect based on the name -- they let you "loop" over a piece of code over and over until a certain condition has been met, or we run out of things to compute on.

Let's look at an example to start off. Let's say you wanted to generate and print 10 random numbers (here, each one either 0 or 1) -- how would you do this? Based on what we've learned so far, all we can really do is copy and paste print random.randint(0,1) ten times, like so:


In [ ]:
import random
print random.randint(0,1)
print random.randint(0,1)
print random.randint(0,1)
print random.randint(0,1)
print random.randint(0,1)
print random.randint(0,1)
print random.randint(0,1)
print random.randint(0,1)
print random.randint(0,1)
print random.randint(0,1)

But as you'll see today, we can accomplish the same thing like this:


In [ ]:
import random

count = 0
while count < 10:
    print random.randint(0,1)
    count = count + 1

...or this:


In [ ]:
import random

for i in range(10):
    print random.randint(0,1)

These two code blocks are called the while loop and for loop, respectively. Much like the if / else statement, these blocks allow our code to make decisions for us based on some condition. We have two different types of loops because they are designed with different situations in mind. The for loop is specialized for cases where you know in advance exactly how many times you want to loop, and it makes it very easy to "loop over" the elements of almost any iterable object (more on that in a minute). while loops, on the other hand, are more general and are better for when you want to loop indefinitely until some condition is met.

2. The while loop


The while loop allows us to repeatedly execute the same code until some condition is met. The syntax is similar to the if / else statement, in that we're going to test some conditional statement to see if it is True or False. As long as the conditional is True, we keep looping!


[ Definition ] The while loop

Purpose: execute a block of code until the conditional statement becomes False.

Syntax:

while conditional:
    this indented code will execute until the conditional becomes False

Example:


In [ ]:
x = 0
while x < 4:
    print "Looped!"
    x = x + 1

Any kind of conditional that works with an if statement is also valid for a while loop. However, we have to be careful when we design our loops to make sure that at some point the conditional will become False. If it doesn't, then our code will literally keep looping until the end of the universe (or we manually terminate the program!). This is called an "endless loop" (see the Lesson3 extra material notebook for some examples).
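
While experimenting, one possible safeguard against an accidental endless loop is a second counter that caps the number of iterations. The block below is only a sketch: the names loopsDone and maxLoops are made up for this illustration, it assumes you have already seen the and operator, and it uses print(...) with parentheses so it runs under either Python 2 or Python 3.

```python
x = 0
loopsDone = 0
maxLoops = 1000  # hypothetical safety cap, not part of the lesson

# Deliberate bug: x never changes, so "x < 4" on its own would never become False.
# The second condition stops the loop once the cap is reached.
while x < 4 and loopsDone < maxLoops:
    loopsDone = loopsDone + 1

print("Stopped after " + str(loopsDone) + " loops")
```

If the cap is what ends the loop, that is a hint that the main conditional is never becoming False.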

while loops in action

Below are several examples of code using while loops. For each code block, first try to guess what the output will be, and then run the block to see the answer.


In [ ]:
x = 0
while x < 2:
    print "pizza"
    x = x + 1

In [ ]:
x = 0
while x < 4:
    print x
    x = x + 1

In [ ]:
x = 0
while x < 4:
    x = x + 1
    print x

In [ ]:
x = 0
while x < 4:
    x = x + 1
print x

In [ ]:
import random

x = True
while x:
    print "x is still True..."
    
    if random.randint(0,4) == 0:
        x = False

A more useful example: the number guessing game

Here's an example of how we could use a while loop to create a number guessing game. A while loop is useful here because we don't know ahead of time how many guesses the user will need before they guess the number. The loop makes it easy to handle this:


In [ ]:
secretNumber = 56
notGuessed = True

while (notGuessed):
    guess = int(raw_input("What number am I thinking of (between 0 and 100)? "))
    if (guess == secretNumber):
        print "Wow, you got it!"
        notGuessed = False
    else:
        print "Wrong, guess again."

Let's break down what's happening here:

  • notGuessed is initially True, so we enter the loop.
  • We ask the user to input a number, which we convert to an int and compare against our "secret" number.
  • If the user guessed correctly, we simply set notGuessed to False. This makes the loop's conditional False, so we will stop looping once the conditional is checked again (this will happen once the rest of the code in the block is completed).
  • If the user guessed wrong, we leave notGuessed as True, and therefore repeat the loop again.
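
To watch the same flag-controlled loop run without typing any guesses, here is a sketch in which the computer guesses at random instead of the user. The variable numGuesses is an addition for illustration, random.randint() stands in for raw_input(), and print(...) is written with parentheses so the block runs under Python 2 or 3.

```python
import random

secretNumber = 56
notGuessed = True
numGuesses = 0  # illustrative counter, not in the original game

while notGuessed:
    # Stand-in for the user's input: a random guess between 0 and 100
    guess = random.randint(0, 100)
    numGuesses = numGuesses + 1
    if guess == secretNumber:
        # Flip the flag; the loop ends the next time the conditional is checked
        notGuessed = False

print("Guessed it after " + str(numGuesses) + " tries")
```

The number of tries changes from run to run, which is exactly the situation where a while loop fits better than a for loop.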

3. The for loop


The for loop works fairly differently. The main purpose of the for loop is to provide an easier way to (A) loop a specific number of times, and (B) loop over some iterable object and run the same code on each item in the iterable (e.g. every character in a string, every element in a list, or every line in a file). In fact, we can do both of these things with a while loop, too, but the for loop makes it a little more convenient.
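
To make the comparison concrete, here is a rough sketch of the same task written both ways. It assumes len() and square-bracket indexing (sequence[i]), which may not have been covered yet, and the variable names are only illustrative. print(...) is written with parentheses so the block runs under Python 2 or 3.

```python
sequence = "ATG"

# Version 1: the for loop hands us each character directly
forResult = ""
for nt in sequence:
    forResult = forResult + nt + "-"

# Version 2: the while loop needs a manual position counter
whileResult = ""
i = 0
while i < len(sequence):
    nt = sequence[i]      # look up the character at position i
    whileResult = whileResult + nt + "-"
    i = i + 1             # forgetting this line would cause an endless loop

print(forResult)    # A-T-G-
print(whileResult)  # A-T-G-
```

Both versions build the same string, but the for loop has less bookkeeping to get wrong.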


[ Definition ] The for loop

Purpose: execute a block of code a specific number of times or loop over an iterable object.

Syntax:

for var in iterable:
    this indented code will execute until we run out of things in the iterable

Notes:

  • iterable - Anything that you can iterate over, including lists, strings, files, and dictionaries. The actual unit of iteration (i.e. the subunits of the iterable) is pre-defined by Python and depends on the type of the iterable. For strings, the unit is characters. For lists, the unit is elements of the list. We'll talk about others as they come up.
  • var - A variable. You can name this anything you want, and you do not need to declare/define it beforehand.
  • What happens here is that var takes on each value in the iterable, one at a time. We can then use that value in the code of the loop by using var. Each time the loop repeats, the previous value of var is overwritten with the next value in the iterable. When there are no more values in the iterable, the loop ends.

Examples:


In [ ]:
for i in [1, "A", 45, True]:
    print i

In [ ]:
for i in "Hello!":
    print i

In [ ]:
for i in range(5):
    print i

Ways of using the for loop

Unfortunately, we can't really make full use of for loops just yet, since we've only actually discussed one type of iterable so far -- strings. So for now, we'll stick to a discussion of using strings and a simple new function called range() to build our for loops. Later in this lesson, we'll also talk about how to use for loops to read from files, and in Lesson 4 we'll talk about the helpful data structures called lists (arrays) and dictionaries (hash tables).

Iterating over strings

As mentioned above, Python considers any string to be a valid "iterable" object. The unit of iteration for a string is the character. This gives us a super easy way to operate on the characters of any string. For example, in the context of genomics, we often need to look at each nucleotide in a DNA string:


In [ ]:
for nt in "ATGCCTAG":
    print nt

Or, to give a more useful example, we may want to do something like count the number of times a certain nucleotide appears:


In [ ]:
sequence = "ATGGTCGATCGGTCGGGCTCGGGATATTACCGCGCGCGCGATGGCTAGGGGGG"
count = 0

for nt in sequence:
    if nt == "G":
        count = count + 1

print "Found", count, "G's"

We can also use this as a roundabout way of just looping a certain number of times:


In [ ]:
stringOfLength5 = "AAAAA"

for char in stringOfLength5:
    print "This will print 5 times."

Here, we ignored the value of char in the loop, and simply made use of the fact that the loop would repeat a pre-defined number of times (equal to the number of characters in the string).

There are a lot of cases where doing this sort of thing could be quite useful. For example, at the very beginning of the lesson, we wanted to generate a certain number of random numbers. Luckily, there's an easier way to do this that doesn't involve generating strings of different lengths!

Iterating a certain number of times

Now it's time to introduce a new function called range(). You've already seen it in action a few times in this lesson, but now we'll look at it a little more closely. On the most superficial level, all you need to know is that "range(x)" will cause the for loop to repeat x times. For example:


In [ ]:
for i in range(5):
    print "This will print 5 times."

Digging down a little deeper, though, what range(x) actually does is create a list of numbers from 0 to x-1:


In [ ]:
print range(5)

We haven't talked about lists yet, so don't worry about this too much for now. All you need to know is that a list is an iterable, so we can use it in a for loop. The unit of iteration in a list is the elements of the list. So this means that in addition to just using range() to loop a certain number of times, we can also use it to generate numbers to use in our loop:


In [ ]:
for i in range(5):
    print i

for loops in action

Below are several examples of code using for loops. For each code block, first try to guess what the output will be, and then run the block to see the answer.


In [ ]:
for i in range(4):
    print i

In [ ]:
for i in range(4):
    print i * 2

In [ ]:
count = 0
for i in range(4):
    count = count + 1
print count

Here we've essentially created a counter. We tend to want to do a lot of counting in programming, so keep this example in mind.


In [ ]:
count = 0
for i in range(4):
    count = count + i
print count

This is similar to the counter example above, but instead of incrementing by 1 every time, we're summing up various numbers. This is sometimes called an accumulator. Like the counter, it's useful in many different contexts, so keep it in mind.


In [ ]:
count = 0
for nt in "CTCCAGGG":
    if nt == "C":
        count = count + 1
print count

In [ ]:
oldSeq = "ATG"
newSeq = ""
for nt in oldSeq:
    newSeq = newSeq + nt + "*"
print newSeq

This is sort of like an accumulator for strings. We can build up a string in a loop by repeatedly concatenating characters to an existing string.

Important: Don't concatenate onto the original string as you iterate over it. This is bad form and could cause weird results. Just create a new string. (So in this example, we should not modify oldSeq while inside the loop. Modifying newSeq is fine, though.)
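
As one more illustration of building a new string while leaving the original alone, the sketch below converts a DNA string to RNA by swapping each T for a U. The sequence and variable names are made up for this example, and print(...) is written with parentheses so the block runs under Python 2 or 3.

```python
dnaSeq = "ATGCTT"
rnaSeq = ""  # the new string we build up; dnaSeq is never modified

for nt in dnaSeq:
    if nt == "T":
        rnaSeq = rnaSeq + "U"
    else:
        rnaSeq = rnaSeq + nt

print(dnaSeq)  # ATGCTT
print(rnaSeq)  # AUGCUU
```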

Loop wrap-up: So which kind of loop should I use???

You can usually use either type of loop, but one will feel a lot more natural and be easier to code. That's the one you should go with. In general:

Use a for loop when:

  • You know exactly how many times you need to loop (use range())
  • You want to process each character of a string, item in a list, or line of a file (as we'll see next!)

Use a while loop when:

  • You need to loop until some condition is fulfilled, but you don't know when that will happen

4. File reading


Reading from and writing to files is something you'll probably be doing a lot in your work. Luckily, Python makes it super easy using the for loop.

There are 3 basic steps of file reading:

  1. Open the input file
  2. Read in data line by line, do some processing
  3. Close the input file

Let's start with an example.

In the same directory as this notebook, you should have a file called "genes.txt". This file contains a simple list of 7 gene names, as follows:

uc007afd.1
uc007aln.1
uc007afr.1
uc007atn.1
uc007bcd.1
uc007bmh.1
uc007byr.1

Often, we will want to read a file like this into our code so that we can use the data. This can be done with a simple for loop, like so:


In [ ]:
# Read and print genes.txt
fileName = "genes.txt"

inFile = open(fileName, 'r')
for line in inFile:
    print line
inFile.close()

A little explanation of what this code is doing:

inFile = open(fileName, 'r')

open() returns a file object, which acts as a link to the indicated file. We store this link in a variable (here, we called this variable inFile, but you could call it anything) so that we can use it to read from the file. The 'r' indicates that we want to open this file in read mode (as opposed to write mode, which is 'w' as we'll see in a minute).

for line in inFile:
    print line

A file is considered an iterable by Python, so we can loop over it directly with a for loop! The unit of iteration in files is a line, so each time we loop, a single line is copied to the loop variable (here, "line"). We can then do some processing of that line before we move on to the next one, but for now we'll just print it out to keep things simple. The lines of the file will always be read from first to last.

inFile.close()

This closes the link to the file. It is considered good programming practice to always close files when you are done with them. Make sure to put this outside of the for loop block -- otherwise you'll close the file while still looping through it!
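
The three steps can also be combined with a counter to do something other than printing, such as counting lines. The sketch below is self-contained: it first creates a small stand-in file (using file writing, which is covered in the next section) so that it does not depend on genes.txt being present. The file name genes_demo.txt is made up for this example, and print(...) is written with parentheses so the block runs under Python 2 or 3.

```python
# Create a small stand-in file so this example runs on its own
fileName = "genes_demo.txt"  # hypothetical file name
outFile = open(fileName, 'w')
outFile.write("uc007afd.1\nuc007aln.1\nuc007afr.1\n")
outFile.close()

# Step 1: open the input file
inFile = open(fileName, 'r')

# Step 2: read line by line, counting as we go
numLines = 0
for line in inFile:
    numLines = numLines + 1

# Step 3: close the file (outside the loop!)
inFile.close()

print("Read " + str(numLines) + " lines")  # Read 3 lines
```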

Important side note: The newline character (\n)

You may have noticed that in the output of the code above, there was an extra blank line printed between each line. Why did this happen?

To understand this, we first need to introduce the newline character, '\n'.


[ Definition ] \n

Purpose: The newline character \n marks the end of each line of text in a file.

Notes:

  • Although \n is made up of two characters ( \ and n), when put together they are considered a single character by most programming languages and text editors.
  • When viewing a file in a text editor, the \n characters will usually all be invisible, so you won't see them. However, the editor will use the locations of the \n characters to figure out where the end of each line should be. If you somehow removed all the \n from your original file, your editor would think everything should be on the same line!
  • Different operating systems use different line endings. Linux and modern Macs use \n, so this is what you'll see most often. Windows uses \r\n as the line ending, however, which can cause issues sometimes.
  • Important: When we use the Python print statement, it automatically adds a \n character to the end of whatever we print. That's why every print statement always prints to a separate line!

Examples:


In [ ]:
print "Hello\nWorld"

In [ ]:
print "Hello\n\nWorld"

Now that we understand \n... what was going on in the previous example?

When we read each line of the file, there is a \n on the end of each line. Like a text editor, Python uses the location of each \n to figure out where one line ends and the next begins. Importantly, though, when Python reads in a given line, it includes the \n (or \r\n) with it as part of the string. Then, when we go to print out the line using print, the print statement adds another \n on the end! This is what causes the double spacing – we technically have \n\n on the end of each string.

As we'll see later, it can be very problematic to have these pesky \n characters at the end of every line we read in. Therefore, we'll almost always want to remove them before we do any computations or processing of the line. To do this, we'll add the following line to our code:

line = line.rstrip('\r\n')

What rstrip() does is strip the indicated characters from the right side of the string, if and only if they are present (so if for some reason there was no \r or \n, this would do nothing). It will strip all the characters indicated, even if they are not in the indicated order. We'll strip off both \r and \n, just to be safe.
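
To see this behavior without needing a file, the sketch below applies rstrip('\r\n') to three hand-made strings: one with a Linux/Mac ending, one with a Windows ending, and one with no line ending at all. The variable names are only illustrative, and print(...) is written with parentheses so the block runs under Python 2 or 3.

```python
unixLine = "uc007afd.1\n"       # Linux/Mac line ending
windowsLine = "uc007afd.1\r\n"  # Windows line ending
noNewline = "uc007afd.1"        # no line ending at all

a = unixLine.rstrip('\r\n')
b = windowsLine.rstrip('\r\n')
c = noNewline.rstrip('\r\n')    # nothing to strip, so the string is unchanged

print(unixLine == noNewline)  # False: the \n counts as part of the string
print(a == c)                 # True
print(b == c)                 # True
print(len(windowsLine))       # 12
print(len(b))                 # 10
```

The first comparison hints at why leftover \n characters cause trouble: a gene name read from a file would not match the same name typed in your code.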

Here's the same code, but with \n removal. You can use this as a general template for almost all file reading!


In [ ]:
# Read and print genes.txt
fileName = "genes.txt"

inFile = open(fileName, 'r')
for line in inFile:
    line = line.rstrip('\r\n')
    print line
inFile.close()

5. File writing


File writing follows a similar pattern to reading. The steps are:

  1. Open the output file (automatically created if it doesn't exist)
  2. Write to the file
  3. Close the output file

Here is a simple example of file writing:


In [ ]:
# print some text to a new file
fileName = "output.txt"
outFile = open(fileName, 'w')

outFile.write("This is me,")
outFile.write("printing to \n a file.")

outFile.close()

If you run this code, it should create a new file called output.txt in your current directory (probably the directory where this notebook is stored). Go and take a look at it now. It should look like this:

This is me,printing to 
 a file.

Let's go over what this code is doing, line by line.

fileName = "output.txt"
outFile = open(fileName, 'w')

Here, we opened an output file called "output.txt". Note that this looks just like file reading, except we used 'w' instead of 'r' in the open() function. This tells the function that you want this to be an output file. Note that if this file does not yet exist, Python will create it for you. More importantly, though, if the file does already exist, Python will completely overwrite it! So be careful.

outFile.write("This is me,")
outFile.write("printing to \n a file.")

The .write() function is what allows us to specify what text should be written. We can call this function as many times as we want while the output file is open to print things. Important note: unlike print, .write() does not automatically add a newline \n to the end of each line you print! You have to manually add these yourself. If you look at the output of this code, you'll see that the line break occurs only where we specified the \n, and nowhere else!

outFile.close()

Finally, as with file reading, it's good practice to always close your file connection when you are finished. Strange things can happen if you don't, such as incomplete printing.
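
Putting writing together with a for loop, the sketch below writes each nucleotide of a short sequence on its own line and then reads the file back to check the result. The file name output_demo.txt and the variable names are made up for this example, and print(...) is written with parentheses so the block runs under Python 2 or 3.

```python
fileName = "output_demo.txt"  # hypothetical file name

# Write one nucleotide per line; we must add the \n ourselves
outFile = open(fileName, 'w')
for nt in "ATG":
    outFile.write(nt + "\n")
outFile.close()

# Read the file back to confirm what was written
numLines = 0
lastLine = ""
inFile = open(fileName, 'r')
for line in inFile:
    line = line.rstrip('\r\n')
    numLines = numLines + 1
    lastLine = line
    print(line)
inFile.close()
```

This should print A, T, and G on three separate lines. Leaving out the "\n" in the write() call would instead put ATG on a single line of the file.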

You can only print strings to a file!

If you want to print numerical data (ints or floats), you will need to convert them to strings before using .write(). For example, the first block below fails with a TypeError because 25 is not a string:


In [ ]:
fileName = "output2.txt"
outFile = open(fileName, 'w')

outFile.write(25)  # raises a TypeError: .write() only accepts strings

outFile.close()

In [ ]:
# The simple fix: use str()
fileName = "output2.txt"
outFile = open(fileName, 'w')

outFile.write(str(25))

outFile.close()
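
The same fix works inside a loop. As a sketch, the block below writes a small two-column table of numbers, converting each value with str() and separating the columns with a tab character (\t, which has not been introduced in this lesson). The file name numbers_demo.txt is made up for this example.

```python
fileName = "numbers_demo.txt"  # hypothetical file name
outFile = open(fileName, 'w')

for i in range(3):
    half = i * 0.5  # a float, just to show that str() handles floats too
    outFile.write(str(i) + "\t" + str(half) + "\n")

outFile.close()
```

After running this, numbers_demo.txt should contain three lines: 0 and 0.0, then 1 and 0.5, then 2 and 1.0, with a tab between the two values on each line.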

6. Test your understanding: practice set 3


For the following blocks of code, first try to guess what the output will be, and then run the code yourself. These examples may introduce some ideas and common pitfalls that were not explicitly covered in the text above, so be sure to complete this section.


In [ ]:
for i in range(1, 10, 2):
    print i

In [ ]:
for i in range(5, 1, -1):
    print i

In [ ]:
count = 0
while (count < 5):
    print count
    count = count + 1

In [ ]:
total = 0
for i in range(4):
    total = total + i
print total

In [ ]:
name = "Mits"
for letter in name:
    print letter

In [ ]:
name = "Wilfred"
newName = ""
for letter in name:
    newName = newName + letter
print newName

In [ ]:
name = "Wilfred"
newName = ""
for letter in name:
    newName = letter + newName
print newName