(fill in your two names here)
Facilitator: (fill in name)
Spokesperson: (fill in name)
Process Analyst: (fill in name)
Quality Control: (fill in name)
If there are only three people in your group, have one person serve as both spokesperson and process analyst for the rest of this activity.
At the end of this Lesson, you will be asked to record how long each Model required for your team. The Facilitator should keep track of time for your team.
File input and output (file I/O) is often the best way to have multiple programs work together on a large amount of data, whether that data is stored in one large file or in many smaller files. In other applications, a scientific instrument might automatically store data in a file, which your program will need to open, read, and analyze.
0. After checking that everyone in your group agrees on the same answers for the pre-activity (has the same file in the same format), share what other scientific data might be written to a file to be manipulated by a program. Write at least three examples here (besides DNA/RNA/protein data).
In the past, you may have felt limited in controlling the format of Python output, especially the output of the print() function. This model introduces the format method, which specifies a particular output format for a value. The { } operators (formally called replacement fields) mark the parts of a string that are replaced by the corresponding arguments to the format method.
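As a quick illustration of the idea, the sketch below uses the format method to line up a small table of values. The data list and the column widths are assumptions chosen for this example and are not part of the model.

```python
# Hypothetical data: (atomic number, molar mass in g/mol)
elements = [(1, 1.008), (6, 12.011), (26, 55.845)]

rows = []
for number, mass in elements:
    # {:3d} shows an integer in a field 3 characters wide;
    # {:8.3f} shows a float in a field 8 characters wide with 3 decimals
    row = "Z = {:3d}  mass = {:8.3f} g/mol".format(number, mass)
    rows.append(row)
    print(row)
```

Because every value is padded to the same width, the columns line up even though the numbers have different lengths.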
Type the following in separate Jupyter code cells:
print("Carbon =", 12, "grams/mole")
sentence = "Carbon = {:d} grams/mole"
print(sentence)
print(sentence.format(12))
print("Carbon = {:d} grams/mole".format(12))
print("Carbon = {:f} grams/mole".format(12))
print("Carbon = {:E} grams/mole".format(12))
print("{:d} moles of carbon = {:f} g".format(2,24.0))
volume = 22.8
print("This number, 123456789")
print("This number, {:6.0f}, has width of 6".format(volume))
print("This number, {:6.2f}, has width of 6 and a precision of 2".format(volume))
print("This number, {:8.4f}, lines up at the first digit".format(volume))
print(104/volume)
print("Density = mass {:.4f} / volume {:.3f} = {:0.2f}".format(104, volume, 104/volume))
NOTE: For each answer below be sure to provide justification for your answer (reason, specifying a particular line of code, example output…)
1. Examine how the format method is used in the above model:
1a. What data type must precede the format method?
1b. What does the argument of the format method replace in the printed output?
1c. What data type is the variable volume?
1d. Notice that the format method is being called, and its return value is being printed by the print function. What is the return type of the format method?
2. Now consider the {:d}, {:f}, and {:E} operators:
2a. Describe how each of the three operators will display a given number differently from the other operators.
2b. What is the default number of decimal places specified by the {:f} or the {:E} operators?
2c. Does the {:f} operator round or truncate? Explain and provide evidence.
2d. When the {:d} operator is used, can the format method argument type be a float?
3. How does the variable volume display when formatted as {:6.0f}, when volume = 22.8? Explain both the 6 and the 0.
4. How many spaces are in front of the variable volume formatted as {:6.0f}, when volume = 22.8?
5. Explain why the number of spaces in front of the variable volume formatted as {:6.2f} and {:8.4f} is the same when volume = 22.8.
6. If volume were 2.8, how many spaces would be in front of the value of volume formatted as {:10.2f}?
.format(): https://docs.python.org/3.5/library/string.html#format-string-syntax
In order for the computer to access the contents of a text file, you must first open the file and assign the resulting file object to a variable. The open function creates and returns a file object. In this model we will demonstrate how to write output to a file by calling methods on the file object.
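A minimal sketch of the open, write, close pattern is shown here. The file name sketch.txt and the temporary folder are assumptions, made so that the example does not touch the files in your working folder.

```python
import os
import tempfile

# Assumption: write into a throwaway folder rather than the working folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sketch.txt")  # hypothetical file name

outfile = open(path, "w")       # open returns a file object
outfile.write("first line\n")   # write sends a string to the file
outfile.write("{:5.1f}\n".format(3.14159))  # format builds the string first
outfile.close()                 # close finishes the output and releases the file

print(outfile.closed)           # True once the file has been closed
```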
Define model_two()
Run the next two Jupyter code cells to define the model_two() function and execute it.
In [1]:
def model_two():
    outfile = open("out.txt", "w")
    outfile.write("Example ")
    outfile.write("output ")
    outfile.write("text file\n")
    outfile.write("xyz Coordinates\n")
    outfile.write("MODEL\n")
    xyz = range(3)
    outfile.write("ATOM {:3d}".format(1))
    seq = "N {:5.1f}{:5.1f}{:5.1f}".format(xyz[0],xyz[1],xyz[2])
    outfile.write(seq)
    outfile.write("\n")
    outfile.close()
In [2]:
model_two()
After running this function, you will notice that a new file, out.txt, appears in your current working folder. Use your new command line skills to open and view the file created by the model_two() function. (OK, you can also just find it and double-click it the easy way if you must.)
7. How many arguments are passed to the open function, and what are their types?
8. What is the variable name of the file object returned by the open function?
9. Identify the names of all methods used on this file object in the model_two function.
10. What data type (integer, string, etc.) does the write method require for its argument?
11. Consider the text in the out.txt file:
11a. How many times was the write method called to create the first line of text?
11b. How many times was the write method called to create the second line of text?
11c. What does the "\n" character do?
11d. What type of program structure would be useful to write a pdb file with a large number of atom lines? Explain your answer.
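When writing output to a file, there are two basic file modes:
"w" (write): starts a new, empty file, replacing the contents of any existing file with the same name
"a" (append): adds new output to the end of an existing file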
Either mode will create the file automatically if it does not already exist.
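The difference between the "w" and "a" modes can be checked with a short experiment like the one sketched below. The file name modes.txt and the temporary folder are assumptions for this example, and the read method (which returns the whole file as one string) is used here only to inspect the result.

```python
import os
import tempfile

# Assumption: a throwaway folder and a hypothetical file name
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

f = open(path, "w")      # the file does not exist yet, so it is created
f.write("one\n")
f.close()

f = open(path, "a")      # append mode keeps the existing text
f.write("two\n")
f.close()

f = open(path, "r")
after_append = f.read()  # contents after writing and then appending
f.close()

f = open(path, "w")      # write mode on an existing file discards its old text
f.write("three\n")
f.close()

f = open(path, "r")
after_write = f.read()   # contents after reopening in write mode
f.close()

print(repr(after_append))
print(repr(after_write))
```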
Type the following in separate Jupyter code cells (it won't work correctly if you type them in a single Jupyter cell):
afile.write("new line\n")
afile = open("out.txt", "a")
afile.write("new line\n")
two = 2.0
afile.write(two)
afile.write(str(two))
afile.close()
afile.write("new line\n")
12a. Explain what happens as a result of this line of code:
afile = open("out.txt", "a")
12b. How do the arguments passed to the open function differ for writing a new file and appending to an existing file?
12c. What code would you run if you wanted to overwrite a file called stuff.txt that already existed? (Answer in text; no need to run code.)
13. Explain the reason for the error observed after entering the command:
13a. the first time at the start of the model:
afile.write("new line\n")
13b. the same command, the last time at the end of the model.
14. Explain the reason for the error observed after entering the command:
afile.write(two)
In many cases, your program may require input data from an external file (e.g., a PDB file). In this case, we can use methods to “read” the contents of the file.
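As one possible sketch of reading data back in, the example below creates a small data file and then converts the second field of each line to a floating point number. The file name measurements.txt, its two-column layout, and the temporary folder are all assumptions made for this example. Looping directly over the file object is an alternative to calling readline repeatedly.

```python
import os
import tempfile

# Assumption: a small hypothetical data file created just for this sketch
path = os.path.join(tempfile.mkdtemp(), "measurements.txt")
outfile = open(path, "w")
outfile.write("sample1 0.52\n")
outfile.write("sample2 1.75\n")
outfile.close()

infile = open(path, "r")
values = []
for line in infile:            # a file object can be looped over line by line
    fields = line.split()      # split on whitespace, which also drops the "\n"
    values.append(float(fields[1]))  # convert the text "0.52" to the number 0.52
infile.close()

print(values)
```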
Type the following in separate Jupyter code cells:
infile = open("out.txt", "r")
infile.readline()
infile.readline()
infile.readlines()
infile.readline()
last = infile.readline()
last == ""
infile.close()
infile = open("out.txt", "r")
# these 2 lines need to go together in a cell
for i in range(3):
    infile.readline()
line1 = infile.readline()
print(line1[0])
print(line1[0:5])
line1
line2 = line1.split()
print(line2[0])
print(line2)
infile.close()
15. Is it possible to read the same line twice with two sequential calls to the readline method?
16. Consider the type of the return values for the three file methods used in this model:
16a. What data type (integer, string, etc.) does the readline method return?
16b. What data type (integer, string, etc.) does the readlines method return?
16c. What data type (integer, string, etc.) does the close method return? How do you know?
17. Now consider line1 and line2.
17a. How do the values of line1 and line2 differ?
17b. What are the data types of line1 and line2?
18. Type code that reads past the last line of infile.
18a. What happens if you don't open the file again?
18b. What happens when we try to read past the end of the file?
18c. If we didn’t know how many lines there were in the file, what boolean condition could you use to read each line of the file (and then stop when you reach the end of the file)?
open() function: https://docs.python.org/3.5/library/functions.html#open
readline() method (and readlines() just below it): https://docs.python.org/3.5/library/io.html#io.IOBase.readline
19. Write pseudocode for a function that will read in a PDB file and return three different lists corresponding to the x, y, and z atomic coordinates, respectively. The contents of each list should be floating point numbers for all atoms in the first model (only the first if there is more than one).
For example, the x-coordinate list for trpcage.pdb would be [11.030, 9.640, 8.650, 8.185, ...].
Do not proceed to the next step without showing your pseudocode to the instructor.
How much time did it require for your team to complete each Model?
Model 1:
Model 2:
Model 3:
Model 4: