To read a file, you must first open it. open() returns a file handle, which you can then use to get the contents of the file. If the file doesn't exist, open() raises an error (a FileNotFoundError).
file_handle = open('filename.txt')
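Once you are done with a file, you need to close it. Files left open tie up operating-system resources, may leave written data sitting unsaved in a buffer, and cause problems on systems that lock open files, where other programs can be blocked from using them.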
file_handle.close()
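The open/read/close cycle above can be sketched end to end. This is a minimal, self-contained example. It first creates a throwaway file in a temporary folder; the names example.txt and missing.txt are hypothetical and exist only for the demo. It then uses the handle's closed attribute to show the handle's state before and after close(), and finally catches the FileNotFoundError that open() raises for a missing file.

```python
import os
import tempfile

# Create a small sample file in a temporary folder so this runs anywhere
# ("example.txt" is a hypothetical name used only for this demo).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
out = open(path, "w")
out.write("first line\nsecond line\n")
out.close()

file_handle = open(path)         # open() returns a file handle
print(file_handle.closed)        # False: the handle is open
first = file_handle.readline()   # read one line through the handle
file_handle.close()              # release the file
print(file_handle.closed)        # True: the handle is now closed

# Opening a path that doesn't exist raises FileNotFoundError
missing = os.path.join(os.path.dirname(path), "missing.txt")
try:
    open(missing)
except FileNotFoundError as err:
    print("Could not open:", err.filename)
```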
# Run these to get some of the files we will be using today.
# These are the salaries of public workers in California from the Transparent California website (transparentcalifornia.com).
# The last line downloads a short story for the project.
import urllib.request
urllib.request.urlretrieve("http://transparentcalifornia.com/export/san-francisco-2014.csv", "san-francisco-2014.csv")
urllib.request.urlretrieve("http://transparentcalifornia.com/export/san-francisco-2013.csv", "san-francisco-2013.csv")
urllib.request.urlretrieve("http://www.gutenberg.org/cache/epub/1952/pg1952.txt", "theyellowwallpaper.txt")
# Opening a file
fh = open('san-francisco-2014.csv')
print(fh)
fh.close()
# Opening a non-existent file raises FileNotFoundError, so the print() and close() lines below never run
fh = open('i_dont_exist.txt')
print(fh)
fh.close()
A text file is just a sequence of lines. In fact, if you read it all at once with readlines(), you get back a list of strings, one per line.
Lines are separated by the newline character "\n". This is the special character that is inserted into text files when you hit Enter. You can also put it into strings deliberately by typing \n.
print("Golden\nGate\nBridge")
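Because "\n" is just a character inside a string, you can split a string on it to get the lines back. Here is a small sketch using the same Golden Gate Bridge string. It also shows that str.splitlines() handles a trailing newline more gracefully than split("\n").

```python
text = "Golden\nGate\nBridge"

# Splitting on "\n" recovers the individual lines
parts = text.split("\n")
print(parts)                        # ['Golden', 'Gate', 'Bridge']

# With a trailing newline, split("\n") leaves an empty string at the end,
# while splitlines() drops it
with_trailing = text + "\n"
print(with_trailing.split("\n"))    # ['Golden', 'Gate', 'Bridge', '']
print(with_trailing.splitlines())   # ['Golden', 'Gate', 'Bridge']
```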
There are two common ways to read through a file. The first (and usually better) way is to loop through the lines in the file.
for line in file_handle:
    print(line)
The second is to read the whole file at once and store it as a single string or as a list of lines.
lines = file_handle.read() # stores as a single string
lines = file_handle.readlines() # stores as a list of strings, one per line (each keeps its trailing "\n")
Unless you are going to process the lines in a file several times, use the first method. It reads one line at a time instead of loading the whole file into memory, which matters when you work with big files.
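Here is a minimal sketch that compares the two approaches side by side. It writes a hypothetical three-line file to a temporary folder so it runs anywhere, then reads it by looping over the handle, with read(), and with readlines(). Notice that readlines() keeps the trailing "\n" on every line, which is why the loop version strips it.

```python
import os
import tempfile

# Hypothetical sample file, created here so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
fh = open(path, "w")
fh.write("Golden\nGate\nBridge\n")
fh.close()

# Method 1: loop over the handle, one line at a time
fh = open(path)
looped = []
for line in fh:
    looped.append(line.rstrip("\n"))
fh.close()

# Method 2a: read() returns the whole file as one string
fh = open(path)
as_string = fh.read()
fh.close()

# Method 2b: readlines() returns a list; each item keeps its "\n"
fh = open(path)
as_list = fh.readlines()
fh.close()

print(looped)           # ['Golden', 'Gate', 'Bridge']
print(repr(as_string))  # 'Golden\nGate\nBridge\n'
print(as_list)          # ['Golden\n', 'Gate\n', 'Bridge\n']
```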
fh = open('thingstodo.txt')
for line in fh:
    print(line.rstrip())
fh.close()
fh = open('thingstodo.txt')
contents = fh.read()
fh.close()
print(contents)
print(type(contents))
fh = open('thingstodo.txt')
lines = fh.readlines()
fh.close()
print(lines)
print(type(lines))
# Looking for a line that starts with something
# I want to see salary data of women with my first name
fh = open('san-francisco-2014.csv')
for line in fh:
    if line.startswith('Charlotte'):
        print(line)
fh.close()
# Looking for lines that contain a specific string
fh = open('san-francisco-2014.csv')
# Looking for all the department heads
for line in fh:
    # Remember if find doesn't find the string, it returns -1
    if line.find('Dept Head') != -1:
        print(line)
fh.close()
# Counting lines that match criteria
fh = open('san-francisco-2014.csv')
num_trainees = 0
for line in fh:
    # Remember if find doesn't find the string, it returns -1
    if line.find('Trainee') != -1:
        num_trainees += 1
fh.close()
print("There are {0} trainees".format(num_trainees))
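Checking find() against -1 works, but Python's in operator answers the same question ("does this line contain the text?") and returns True or False directly. The sketch below compares the two. It uses a few hypothetical lines in place of the real salary file, so it runs without the download.

```python
# Hypothetical sample lines standing in for rows of the salary file
lines = [
    "Alex Example,Police Trainee\n",
    "Sam Example,Dept Head I\n",
    "Pat Example,Nurse Trainee\n",
]

count_find = 0
count_in = 0
for line in lines:
    if line.find('Trainee') != -1:   # find() returns -1 when the text is absent
        count_find += 1
    if 'Trainee' in line:            # "in" gives True/False directly
        count_in += 1

# The same count written as a one-liner with sum() over a generator
count_sum = sum(1 for line in lines if 'Trainee' in line)
print(count_find, count_in, count_sum)  # 2 2 2
```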
# Splitting lines, this is great for excel like data (tsv, csv)
# I want to see salary data of women with my name
fh = open('san-francisco-2014.csv')
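for line in fh:
    if line.startswith('Emily'):
        # Strip the trailing newline so the last column comes out clean
        cols = line.rstrip('\n').split(',')
        print(cols)
        # Job title is the 2nd column (index 1), salary the 3rd (index 2)
        print(cols[1], cols[2], cols[-1])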
fh.close()
* Note that sometimes you get pieces of a quoted cell instead of the title and salary. If a cell in a CSV file contains a comma, that cell is wrapped in quotes, and splitting on every comma breaks it apart. Thus, splitting is not the proper way to read a CSV file, but it will work in a pinch. We'll learn about the csv module, as well as other ways to read in tabular (Excel-like) data, in the second half of the class.
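As a quick preview of the csv module, here is a sketch that shows why plain split(',') breaks on a quoted cell while csv.reader does not. The two-line CSV is hypothetical and is not taken from the salary data. It is wrapped in io.StringIO so that it behaves like an open file.

```python
import csv
import io

# Hypothetical CSV: the job title cell contains a comma, so it is quoted
data = 'Employee Name,Job Title,Base Pay\nEmily Example,"Manager, Dept Head",100\n'

# Naive splitting cuts the quoted cell in two
naive = data.splitlines()[1].split(',')
print(naive)    # ['Emily Example', '"Manager', ' Dept Head"', '100']

# csv.reader understands the quotes and keeps the cell whole
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])  # ['Emily Example', 'Manager, Dept Head', '100']
```

To read a real file this way, the csv module documentation recommends opening it with newline='' and passing the file handle straight to csv.reader.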
# Skipping lines
fh = open('thingstodo.txt')
for line in fh:
    if line.startswith('Golden'):
        continue
    print(line)
fh.close()
# Opening a non-existent file
try:
    fh = open('i_dont_exist.txt')
    print(fh)
    fh.close()
except FileNotFoundError:
    # Catch only the "missing file" error instead of using a bare except
    print("File does not exist")
    #exit()
You can write to files very easily. Give open a second argument, 'w', to open the file in write mode. Be aware that 'w' creates the file if it doesn't exist and erases its contents if it does.
fh_write = open('new_file.txt', 'w')
Then call the write method on the file handle, passing it the string you want to write to the file. Be careful: unlike print, write doesn't add a newline character to the end of the string.
fh_write.write('line to write\n')
Just like reading files, you need to close your file when you are done.
fh_write.close()
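This short sketch covers two write-mode behaviors to watch for. First, write() never adds a newline, while print() with its file argument does. Second, reopening an existing file with 'w' erases what was there. The file lives in a temporary folder, so the name new_file.txt is only illustrative.

```python
import os
import tempfile

# Temporary location so the demo doesn't touch your real files
path = os.path.join(tempfile.mkdtemp(), "new_file.txt")

fh_write = open(path, 'w')
fh_write.write('Golden')            # write() adds no newline...
fh_write.write('Gate\n')            # ...so this lands on the same line
print('Bridge', file=fh_write)      # print() adds the newline for you
fh_write.close()

fh = open(path)
first_version = fh.read()
fh.close()
print(repr(first_version))          # 'GoldenGate\nBridge\n'

# Reopening with 'w' erases the old contents before writing
fh_write = open(path, 'w')
fh_write.write('replaced\n')
fh_write.close()

fh = open(path)
second_version = fh.read()
fh.close()
print(repr(second_version))         # 'replaced\n'
```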
fh = open('numbers.txt', 'w')
for i in range(10):
    fh.write(str(i) + '\n')
fh.close()
# Now let's prove that we actually made a file
fh = open('numbers.txt')
lines = fh.readlines()
print(lines)
fh.close()
You can open a file with a with statement, and Python will automatically close the file at the end of the with block, even if an error occurs inside it. This is the preferred way to open files in Python. (Sorry it took me so long to show you.)
with open('filename.txt') as file_handle:
    for line in file_handle:
        print(line)
# You don't have to close the file
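You can check that with really closes the file by looking at the handle's closed attribute afterwards, including when something goes wrong inside the block. This minimal sketch uses a temporary file and a deliberately raised ValueError to simulate an error mid-read.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, 'w') as fh:
    fh.write('Golden\nGate\n')
print(fh.closed)   # True: closed as soon as the with block ended

try:
    with open(path) as fh:
        first = fh.readline()
        # Simulate something going wrong partway through reading
        raise ValueError("something went wrong mid-read")
except ValueError:
    pass
print(fh.closed)   # True: still closed, even though the block raised
```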
with open('thingstodo.txt') as fh:
    for line in fh:
        print(line.rstrip())
You can also use with statements to write files.
with open('numbers2.txt', 'w') as fh:
    for i in range(5):
        fh.write(str(i) + '\n')
with open('numbers2.txt') as fh:
    for line in fh:
        print(line.rstrip())
Refactor this code to use a with statement:
# Counting lines that match criteria
fh = open('san-francisco-2014.csv')
num_trainees = 0
for line in fh:
    # Remember if find doesn't find the string, it returns -1
    if line.find('Trainee') != -1:
        num_trainees += 1
fh.close()
print("There are {0} trainees".format(num_trainees))
We will calculate the average length of the first word of each sentence in the short story "The Yellow Wallpaper" by Charlotte Perkins Gilman. (Feel free to use a different story; Project Gutenberg has many free ones: https://www.gutenberg.org/) This method works because each sentence in the text file is on a separate line. If you are using another story, you may want to go by paragraph instead, or you can try splitting sentences on punctuation.
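Here is one possible sketch of the calculation. It assumes, as the paragraph above does, that each non-blank line holds one sentence. The function name average_first_word_length and the sample lines are hypothetical. With this simple split(), any punctuation attached to a first word (an opening quotation mark, for example) counts toward its length.

```python
def average_first_word_length(lines):
    """Average length of the first word on each non-blank line.

    Assumes each line holds one sentence; blank lines are skipped.
    Returns 0.0 if there are no non-blank lines.
    """
    total = 0
    count = 0
    for line in lines:
        words = line.split()       # split on any whitespace
        if not words:              # skip blank lines
            continue
        total += len(words[0])
        count += 1
    return total / count if count else 0.0


# Hypothetical sample: first words "The" (3) and "Birds" (5)
sample = ["The cat sat.\n", "\n", "Birds sing.\n"]
print(average_first_word_length(sample))  # 4.0
```

Because the function accepts any iterable of lines, you could pass it an open file handle, such as the one from with open('theyellowwallpaper.txt') as fh:. Depending on your platform, you may need to add encoding='utf-8' to open(). The Project Gutenberg header and license text at the top and bottom of the file will also be counted unless you skip those lines.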
in
keyword):