Unit 2: Programming Design

Lesson 13: Data Files & File Input and Output (I/O)

Notebook Authors

(fill in your two names here)

Facilitator: (fill in name)
Spokesperson: (fill in name)
Process Analyst: (fill in name)
Quality Control: (fill in name)

If there are only three people in your group, have one person serve as both spokesperson and process analyst for the rest of this activity.

At the end of this Lesson, you will be asked to record how long each Model required for your team. The Facilitator should keep track of time for your team.

Computational Focus: File I/O

File input and output (file I/O) is often the best way to have multiple programs work together on a large amount of data, whether that data is stored in one large file or spread across many smaller files. In other applications, a scientific instrument might automatically store data in a file, which your program will need to open, read, and analyze.

Context Question

0. After checking that everyone in your group agrees on the same answers for the pre-activity (has the same file in the same format), share what other scientific data might be written to a file to be manipulated by a program. Write at least three examples here (besides DNA/RNA/protein data).

Model 1: Formatting Strings

In the past, you may have felt limited in controlling the format of Python output, especially the output of the print() function. This model introduces the format method, which specifies a particular output format for a value. The { } operators mark the places in a string where the corresponding arguments to the format method are substituted.

Type the following in separate Jupyter code cells:

print("Carbon =", 12, "grams/mole")
sentence = "Carbon = {:d} grams/mole"
print(sentence)
print(sentence.format(12))
print("Carbon = {:d} grams/mole".format(12))
print("Carbon = {:f} grams/mole".format(12))
print("Carbon = {:E} grams/mole".format(12))
print("{:d} moles of carbon = {:f} g".format(2,24.0))

volume = 22.8
print("This number, 123456789")
print("This number, {:6.0f}, has width of 6".format(volume))
print("This number, {:6.2f}, has width of 6 and a precision of 2".format(volume))
print("This number, {:8.4f}, lines up at the first digit".format(volume))
print(104/volume)
print("Density = mass {:.4f} / volume {:.3f} = {:0.2f}".format(104, volume, 104/volume))

Critical Thinking Questions

NOTE: For each answer below be sure to provide justification for your answer (reason, specifying a particular line of code, example output…)

1. Examine how the format method is used in the above model:
1a. What data type must precede the format method?

1b. What does the argument of the format method replace in the printed output?

1c. What data type is the variable volume?

1d. Notice that the format method is being called, and its return value is being printed by the print function. What is the return type of the format method?

2. Now consider the {:d}, {:f} and {:E} operators:
2a. Describe how each of the three operators will display a given number differently from the other operators.

2b. What is the default number of decimal places specified by the {:f} or the {:E} operators?

2c. Does the {:f} operator round or truncate? Explain and provide evidence.

2d. When the {:d} operator is used, can the format method argument type be a float?

3. How does the variable volume display when formatted as {:6.0f}, when volume = 22.8? Explain both the 6 and the 0.

4. How many spaces are in front of the variable volume formatted as {:6.0f}, when volume = 22.8?

5. Explain why the number of spaces in front of the variable volume is the same when it is formatted as {:6.2f} and as {:8.4f}, when volume = 22.8.

6. If volume were 2.8, how many spaces are in front of the value of volume formatted as {:10.2f}?

More info on String Formatting
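
As a recap of Model 1, here is a minimal sketch of how a fixed width and precision can line values up in columns. The element symbols and molar masses are illustrative values chosen for this example; they are not part of the activity's data.

```python
# Illustrative data (assumed for this sketch): element symbol and molar mass in g/mol
elements = [("H", 1.008), ("C", 12.011), ("Fe", 55.845)]

rows = []
for symbol, mass in elements:
    # {:4s} pads the symbol to a width of 4 (strings are left-aligned by default);
    # {:8.3f} pads the number to a width of 8 with 3 decimal places
    # (numbers are right-aligned by default)
    rows.append("{:4s}{:8.3f}".format(symbol, mass))

for row in rows:
    print(row)
```

Because every row uses the same widths, each row has the same total length and the decimal points fall in the same column.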

Model 2: Writing Output to a File

In order for the computer to access the contents of a text file, you must first open the file and assign the resulting file object to a variable. The open function creates and returns this file object. In this model we will demonstrate how to write output to a file by calling methods on the file object.

Define model_two()
Run the next two Jupyter code cells to define the model_two() function and execute it.


def model_two():
    outfile = open("out.txt", "w")
    outfile.write("Example ")
    outfile.write("output ")
    outfile.write("text file\n")
    outfile.write("xyz Coordinates\n")
    outfile.write("MODEL\n")
    xyz = range(3)
    outfile.write("ATOM {:3d}".format(1))
    seq = "N {:5.1f}{:5.1f}{:5.1f}".format(xyz[0],xyz[1],xyz[2])
    outfile.write(seq)
    outfile.write("\n")
    outfile.close()

model_two()

After running this function, you will notice that a new file, out.txt, appears in your current working folder. Use your new command line skills to open and view the file created by the model_two() function. (OK, you can also just find it and double-click it the easy way if you must.)

Critical Thinking Questions

Examine the code of the model_two() function...

7. How many arguments are passed to the open function and what are their types?

8. What is the variable name of the file object returned by the open function?

9. Identify the names of all methods used on this file object in the model_two function.

10. What data type (integer, string, etc.) does the write method require for its argument?

11. Consider the text in the out.txt file:
11a. How many times was the write method called to create the first line of text?

11b. How many times was the write method called to create the second line of text?

11c. What does the "\n" character do?

11d. What type of program structure would be useful for writing a PDB file with a large number of atom lines? Explain your answer.

Model 3: Appending File Output

When writing output to a file, there are two basic file modes:

  • the append (“a”) mode will add data to the end of an existing file, while
  • the write (“w”) mode will overwrite the file.

Either mode will create the file automatically if it does not already exist.
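
The difference between the two modes can be sketched as follows. This example works in a temporary folder so it does not touch out.txt; the file name demo.txt is hypothetical, and the read method used to check the contents is one of the reading tools covered in Model 4.

```python
import os
import tempfile

# Work in a temporary folder so no existing files are touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")  # hypothetical file name

f = open(path, "w")   # "w" creates the file because it does not exist yet
f.write("first\n")
f.close()

f = open(path, "a")   # "a" keeps the existing contents and adds to the end
f.write("second\n")
f.close()

f = open(path, "r")
after_append = f.read()   # read returns the whole file as one string
f.close()

f = open(path, "w")   # reopening in "w" mode discards the earlier contents
f.write("third\n")
f.close()

f = open(path, "r")
after_overwrite = f.read()
f.close()

print(repr(after_append))
print(repr(after_overwrite))
```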

Type the following in separate Jupyter code cells (it won't work correctly if you type them in a single Jupyter cell):

afile.write("new line\n")
afile = open("out.txt", "a")
afile.write("new line\n")
two = 2.0
afile.write(two)
afile.write(str(two))
afile.close()
afile.write("new line\n")

Critical Thinking Questions

12a. Explain what happens as a result of this line of code:

afile = open("out.txt", "a")

12b. How do the arguments passed to the open function differ for writing a new file and appending to an existing file?

12c. What code would you run if you wanted to overwrite a file called stuff.txt that already existed? (answer in text, no need to run code)

13. Explain the reason for the error observed after entering the command:
13a. the first time at the start of the model:

afile.write("new line\n")

13b. the same command, the last time at the end of the model.

14. Explain the reason for the error observed after entering the command:

afile.write(two)

Model 4: Reading Input From A File

In many cases your program may require input data from an external file (e.g. a PDB file). In this case we can use methods to “read” the contents of the file.

Type the following in separate Jupyter code cells:

infile = open("out.txt", "r")
infile.readline()
infile.readline()
infile.readlines()
infile.readline()
last = infile.readline()
last == ""
infile.close()
infile = open("out.txt", "r")

# these 2 lines need to go together in a cell
for i in range(3):
    infile.readline()

line1 = infile.readline()
print(line1[0])
print(line1[0:5])
line1
line2 = line1.split()
print(line2[0])
print(line2)
infile.close()

Critical Thinking Questions

15. Is it possible to read the same line twice with two sequential calls to the readline method?

16. Consider the type of the return values for the three file methods used in this model:
16a. What data type (integer, string, etc.) does the readline method return?

16b. What data type (integer, string, etc.) does the readlines method return?

16c. What data type (integer, string, etc.) does the close method return? How do you know?

17. Now consider line1 and line2.
17a. How do the values of line1 and line2 differ?

17b. What are the data types of line1 and line2?

18. Type code that reads past the last line of infile.

18a. What happens if you don't open the file again?

18b. What happens when we try to read past the end of the file?

18c. If you didn’t know how many lines there were in the file, what boolean condition could you use to read each line of the file (and then stop when you reach the end of the file)?

More info on File I/O
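
As a recap of Model 4, here is a minimal sketch of one way to read a file of unknown length and convert its text to numbers. The file name sample.txt and its contents are made up for this example, and the loop shown is only one possible approach.

```python
import os
import tempfile

# Create a small sample file to read (name and contents are made up for this sketch)
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
outfile = open(path, "w")
outfile.write("1.0 2.0 3.0\n")
outfile.write("4.0 5.0 6.0\n")
outfile.close()

totals = []
infile = open(path, "r")
line = infile.readline()
while line != "":              # readline returns "" only at the end of the file
    fields = line.split()      # list of strings, split on whitespace
    total = 0.0
    for field in fields:
        total = total + float(field)   # convert each string to a float
    totals.append(total)
    line = infile.readline()   # advance to the next line
infile.close()

print(totals)
```

Note that the values read from the file are strings until float is applied; the split method only separates the fields.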

Pseudocode

19. Write pseudocode for a function that will read in a PDB file and return three different lists corresponding to the x, y, and z atomic coordinates, respectively. The contents of each list should be floating point numbers for all atoms in the first model (only the first if there is more than one).
For example, the x-coordinate list for trpcage.pdb would be [11.030, 9.640, 8.650, 8.185, ...].
Do not proceed to the next step without showing your pseudocode to the instructor.

Temporal Analysis Report

How much time did it require for your team to complete each Model?

Model 1:

Model 2:

Model 3:

Model 4: