This notebook builds upon the Python for Biologists and Software Carpentry materials.
A text file is a file that contains character or string data. Opening a text file in a text editor will display its contents. Some examples of bioinformatics text documents include:
In contrast, many other files will be binary files – ones which are not made up of characters and lines, but of bytes. Examples include:
In [1]:
sequence_description = ">gb:AF333238|A/Brevig Mission/1/1918(H1N1)|Segment:8|Subtype:H1N1|Host:Human"
sequence = "ATGGATTCCAACACTGTGTCAAGCTTTCAGGTAGACTGCTTTCTTTGGCATGTCCGCAAACGGTTTGCAG\n\
ACCAAGAACTGGGTGATGCCCCATTCCTTGATCGGCTTCGCCGAGATCAGAAGTCCCTAAGAGGAAGAGG\n\
CAGCACTCTTGGTCTGGACATCGAGACAGCCACCCGTGCTGGAAAGCAGATAGTGGAGCGGATTCTGAAG\n\
GAAGAATCCGATGAGGCACTTAAAATGACCATTGCCTCTGTACCTGCTTCGCGCTACCTAACTGACATGA\n\
CTCTTGAGGAGATGTCAAGGGACTGGTTCATGCTCATGCCCAAGCAGAAAGTGGCAGGCTCTCTTTGTAT\n\
CAGAATGGACCAGGCGATCATGGATAAGAACATCATACTGAAAGCGAACTTCAGTGTGATTTTCGACCGG\n\
CTGGAGACTCTAATACTACTAAGGGCTTTCACCGAAGAGGGAGCAATTGTTGGCGAAATTTCACCATTGC\n\
CTTCTCTTCCAGGACATACTGATGAGGATGTCAAAAATGCAGTTGGGGTCCTCATCGGAGGACTTGAATG\n\
GAATGATAACACAGTTCGAGTCTCTGAAACTCTACAGAGATTCGCTTGGAGAAGCAGTAATGAGAATGGG\n\
AGACCTCCACTCCCTCCAAAACAGAAACGGAAAATGGCGAGAACAATTAAGTCAGAAGTTTGAAGAAATA\n\
AGATGGTTGATTGAAGAAGTGAGACATAGACTGAAGATAACAGAGAATAGTTTTGAGCAAATAACATTTA\n\
TGCAAGCCTTACAACTATTGCTTGAAGTGGAGCAAGAGATAAGAACTTTCTCGTTTCAGCTTATTTAA"
Let's say that we want to save the information above into a FASTA file. How do we do it?
In [ ]:
help(open)
So, these Python commands are equivalent, because 'r' (read) is the default mode:
open("new_file.txt")
open("new_file.txt", 'r')
open(file="new_file.txt", mode='r')
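As a rough illustration of how the mode argument changes what open does, here is a minimal sketch that uses the three most common text modes. It works in a temporary directory so no real file is touched; the line contents are made up for the example.

```python
import os
import tempfile

# Work in a temporary directory; the name "new_file.txt" mirrors the text above.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "new_file.txt")

# 'w' creates the file, or empties it if it already exists.
handle = open(path, "w")
handle.write("first line\n")
handle.close()

# 'a' appends to the end of the file instead of overwriting it.
handle = open(path, "a")
handle.write("second line\n")
handle.close()

# 'r' is the default, so these two calls open the file the same way.
handle_default = open(path)
handle_explicit = open(path, "r")
modes = (handle_default.mode, handle_explicit.mode)
contents = handle_explicit.read()
handle_default.close()
handle_explicit.close()

print(modes)
print(contents)
```

Note that 'w' silently discards any existing contents, which is why reading is the safer default.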
In [3]:
# Now that we have the file name we will create a new file,
# which RETURNS a _file handle_
my_file_connector = open("flu_seg_8.fasta", 'w')
In [38]:
# Now that we have a file handle we can write to the file handle
my_file_connector.write
Out[38]:
In [19]:
my_file_connector.write(sequence)
Out[19]:
In [20]:
len(sequence)
Out[20]:
What is the number that is returned? It is the number of characters written by the write() function.
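A small sketch can confirm this: the value returned by write() should equal len() of the string that was written. The short sequence and the temporary file below are stand-ins chosen for illustration, not the notebook's data.

```python
import os
import tempfile

# A temporary file keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Nine bases, a newline character, then nine more bases.
text = "ATGGATTCC\nAACACTGTG"

handle = open(path, "w")
chars_written = handle.write(text)  # write() returns the character count
handle.close()

# The newline counts as one character, just like any base.
print(chars_written, len(text))
```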
In [27]:
# Now let's close the file that we have written to
my_file_connector.close()
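Forgetting to call close() is an easy mistake to make. One common alternative, sketched here with a made-up record and a temporary file, is the with statement, which closes the file automatically when the indented block ends.

```python
import os
import tempfile

# Temporary location so the sketch does not overwrite flu_seg_8.fasta.
path = os.path.join(tempfile.mkdtemp(), "demo.fasta")

# The file is open only inside the indented block.
with open(path, "w") as out_handle:
    out_handle.write(">demo_sequence\n")
    out_handle.write("ATGC\n")

# No explicit close() call is needed: the handle is already closed here.
closed_after_block = out_handle.closed
print(closed_after_block)
```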
In [ ]:
# %load flu_seg_8.fasta
>gb:AF333238|A/Brevig Mission/1/1918(H1N1)|Segment:8|Subtype:H1N1|Host:HumanATGGATTCCAACACTGTGTCAAGCTTTCAGGTAGACTGCTTTCTTTGGCATGTCCGCAAACGGTTTGCAG
ACCAAGAACTGGGTGATGCCCCATTCCTTGATCGGCTTCGCCGAGATCAGAAGTCCCTAAGAGGAAGAGG
CAGCACTCTTGGTCTGGACATCGAGACAGCCACCCGTGCTGGAAAGCAGATAGTGGAGCGGATTCTGAAG
GAAGAATCCGATGAGGCACTTAAAATGACCATTGCCTCTGTACCTGCTTCGCGCTACCTAACTGACATGA
CTCTTGAGGAGATGTCAAGGGACTGGTTCATGCTCATGCCCAAGCAGAAAGTGGCAGGCTCTCTTTGTAT
CAGAATGGACCAGGCGATCATGGATAAGAACATCATACTGAAAGCGAACTTCAGTGTGATTTTCGACCGG
CTGGAGACTCTAATACTACTAAGGGCTTTCACCGAAGAGGGAGCAATTGTTGGCGAAATTTCACCATTGC
CTTCTCTTCCAGGACATACTGATGAGGATGTCAAAAATGCAGTTGGGGTCCTCATCGGAGGACTTGAATG
GAATGATAACACAGTTCGAGTCTCTGAAACTCTACAGAGATTCGCTTGGAGAAGCAGTAATGAGAATGGG
AGACCTCCACTCCCTCCAAAACAGAAACGGAAAATGGCGAGAACAATTAAGTCAGAAGTTTGAAGAAATA
AGATGGTTGATTGAAGAAGTGAGACATAGACTGAAGATAACAGAGAATAGTTTTGAGCAAATAACATTTA
TGCAAGCCTTACAACTATTGCTTGAAGTGGAGCAAGAGATAAGAACTTTCTCGTTTCAGCTTATTTAA
In [25]:
my_file_connector.write(sequence_description + '\n')
my_file_connector.write(sequence)
Out[25]:
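write() does not add a line ending for us, which is why the description needs an explicit '\n' to keep it on its own line. The steps can be wrapped in a small helper; write_fasta_record is a hypothetical name introduced here, and the demo record is invented for the example.

```python
import os
import tempfile


def write_fasta_record(path, description, sequence):
    """Write one FASTA record: a header line, then the sequence lines."""
    with open(path, "w") as handle:
        # The explicit newline stops the header fusing with the sequence.
        handle.write(description + "\n")
        handle.write(sequence + "\n")


demo_description = ">demo|Segment:8"
demo_sequence = "ATGGATTCCAAC\nACTGTGTCAAGC"
demo_path = os.path.join(tempfile.mkdtemp(), "demo.fasta")
write_fasta_record(demo_path, demo_description, demo_sequence)

# Read the file back to check that the header has its own line.
with open(demo_path) as handle:
    lines = handle.read().splitlines()
print(lines)
```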
In [39]:
# We have closed the stream to our file, so we open it again
my_file_connector = open('flu_seg_8.fasta', 'w')
In [4]:
%load flu_seg_8.fasta
In Python, as in the physical world, we have to open a file before we can read what’s inside it. The Python function that carries out the job of opening a file is very sensibly called open. It takes one argument – a string which contains the name of the file – and returns a file object:
sequence_file = open("flu_seg_8.fasta")
In [40]:
# Type the above command here and run it
When learning to work with files it’s very easy to get confused between a file handle, a file name, and the contents of a file. Take a look at the following bit of code:
my_file_name = "flu_seg_8.fasta"
my_file_handle = open(my_file_name)
my_file_contents = my_file_handle.read()
In [29]:
# Type the above commands here and run them
What type of object is each variable above? Try:
type(my_file_name)
type(my_file_handle)
type(my_file_contents)
In [41]:
# Type the above command here and run it
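For reference, here is a sketch of what those three type() calls report in Python 3. It writes a small temporary file first so it can run anywhere; the path and contents are assumptions made for the example.

```python
import io
import os
import tempfile

# Create a small file to read, standing in for flu_seg_8.fasta.
my_file_name = os.path.join(tempfile.mkdtemp(), "flu_seg_8.fasta")
with open(my_file_name, "w") as handle:
    handle.write(">demo\nATGC\n")

my_file_handle = open(my_file_name)       # a file object (the handle)
my_file_contents = my_file_handle.read()  # the text stored in the file
my_file_handle.close()

print(type(my_file_name))      # the name is an ordinary string
print(type(my_file_handle))    # the handle is a text file object
print(type(my_file_contents))  # the contents are also a string
```

The name and the contents are both strings, which is part of why they are easy to mix up; only the handle has a read method.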
A common error is to try to use the read method on the wrong thing. Recall that read is a method that only works on file objects. If we try to use the read method on the file name:
my_file_name = "flu_seg_8.fasta"
my_contents = my_file_name.read()
we’ll get an AttributeError, because strings don’t have a read method:
AttributeError: 'str' object has no attribute 'read'
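The error can be reproduced safely by catching it, as in this short sketch. No file is needed, since the mistake happens before any file is opened.

```python
my_file_name = "flu_seg_8.fasta"

try:
    # Wrong: read() is called on the name (a string), not on a file object.
    my_contents = my_file_name.read()
except AttributeError as error:
    message = str(error)

print(message)
```

The fix is to pass the name to open() first and call read() on the object that open() returns.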
Another common error is to use the file object when we meant to use the file contents. If we try to print the file object:
my_file_name = "flu_seg_8.fasta"
my_file = open(my_file_name)
print(my_file)
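we won’t get an error, but we’ll get an odd-looking description of the file object rather than its contents. The line below is the Python 2 form; Python 3 prints something along the lines of <_io.TextIOWrapper name='flu_seg_8.fasta' mode='r' encoding='UTF-8'>: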
<open file 'flu_seg_8.fasta', mode 'r' at 0x7fc5ff7784b0>
What happens if we try to read a file that doesn’t exist?
my_file = open("nonexistent.txt")
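We get a new type of error that we’ve not seen before. Python 2 reports it as an IOError, as shown below; Python 3 raises FileNotFoundError with the same message: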
IOError: [Errno 2] No such file or directory: 'nonexistent.txt'
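Rather than letting the program stop, we can catch the error and decide what to do. This is a minimal sketch for Python 3; the path points into a fresh temporary directory so that the file is guaranteed not to exist.

```python
import os
import tempfile

# A path inside a brand-new, empty directory cannot exist yet.
missing_path = os.path.join(tempfile.mkdtemp(), "nonexistent.txt")

try:
    my_file = open(missing_path)
except FileNotFoundError as error:
    # Python 3 raises FileNotFoundError, a subclass of OSError.
    my_file = None
    error_number = error.errno
    print("Could not open file:", error.strerror)
```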
Ideally, we’d like to be able to check if a file exists before we try to open it. To do this we use
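One option in the standard library is os.path.exists, which returns True when a path exists. Whether this is the tool the original text had in mind is an assumption, and the helper name read_if_present is hypothetical.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
existing_path = os.path.join(tmp_dir, "flu_seg_8.fasta")
missing_path = os.path.join(tmp_dir, "nonexistent.txt")

# Create one of the two files so both outcomes can be shown.
with open(existing_path, "w") as handle:
    handle.write(">demo\nATGC\n")


def read_if_present(path):
    """Return the file contents, or None if the path does not exist."""
    if os.path.exists(path):
        with open(path) as handle:
            return handle.read()
    return None


found = read_if_present(existing_path)
not_found = read_if_present(missing_path)
print(found)
print(not_found)
```

A file can still disappear between the check and the call to open, so catching the error is often used as well.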
We now have almost everything we need to process all our data files. The only thing that’s missing is a library with a rather unpleasant name:
import glob
The glob library contains a function, also called glob, that finds files and directories whose names match a pattern. We provide those patterns as strings: the character * matches zero or more characters, while ? matches any one character. We can use this to get the names of all the files in the current directory, or, with a pattern such as *.fasta, only the FASTA files:
print(glob.glob('*'))
In [42]:
import glob
print(glob.glob('*.fasta'))
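To see how * and ? differ, here is a sketch that creates a few empty files in a temporary directory and matches them with both wildcards. The file names are invented for the example.

```python
import glob
import os
import tempfile

# Build a small directory of empty files to search.
tmp_dir = tempfile.mkdtemp()
for name in ["seg_1.fasta", "seg_2.fasta", "seg_10.fasta", "notes.txt"]:
    with open(os.path.join(tmp_dir, name), "w"):
        pass

# '*' matches any run of characters, so every .fasta file is found.
star_paths = glob.glob(os.path.join(tmp_dir, "*.fasta"))
star_matches = sorted(os.path.basename(p) for p in star_paths)

# '?' matches exactly one character, so seg_10.fasta is left out.
question_paths = glob.glob(os.path.join(tmp_dir, "seg_?.fasta"))
question_matches = sorted(os.path.basename(p) for p in question_paths)

print(star_matches)
print(question_matches)
```

glob does not promise any particular order, which is why the results are sorted before use.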
In [ ]: