Reading (and Writing) Files (without Pandas)

This notebook introduces how to work with text files in Python without relying on the underlying magic of Pandas. In addition to being interesting and useful in its own right, understanding what is happening behind the scenes inside Pandas can help you identify what may be going wrong if you run into trouble while reading csv or txt files.

We will proceed as follows:

  • First, we will show how to read a file into Python.
  • Secondly, we will show how to interact (read/write) with its contents.
  • Finally, we will show how this can be useful in getting and cleaning data from the internet.

Interacting with Files in Python

Python uses the built-in function open to open a file so that your code can work with it. Many of the things you can do to the file correspond directly to what you might do if you had opened it in a text editor or in Excel. When you open a file, you need to specify the level of access that you need. Python has the following access levels:

  • Reading (r): Open the file with only enough permissions to read it. This is the default level of permission and will probably be the most used.
  • Writing (w or a): Open the file with enough permissions to change it. Both letters create the file if it does not already exist.
  • Creating and Writing (x): Create a new file with the specified name and open it with write access. This fails if a file with that name already exists.
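
As a rough illustration of what these access levels mean in practice, the sketch below opens a file for reading and then tries to write to it. The file name modes_demo.txt and the variable write_allowed are made up for this example, and the file is placed in a temporary directory so that nothing of yours is touched.

```python
import io
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not part of the notebook).
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# Create the file with write access so that there is something to open.
f = open(path, 'w')
f.write('some text\n')
f.close()

# Re-open it with read access only and attempt to write to it.
f = open(path, 'r')
try:
    f.write('more text\n')
    write_allowed = True
except io.UnsupportedOperation:
    # Python refuses to write to a file that was opened for reading.
    write_allowed = False
f.close()

print(write_allowed)
```

The write attempt is rejected, which is the sense in which r gives "only enough permissions to read".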

When you finish interacting with a file, it is important to remember to close it so that you don't accidentally do anything to the file after you're done with it. A workflow for interacting with a file should look like this:

f = open('myfile.txt', 'r')

# Do Stuff to file

f.close()
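
A common alternative to calling close yourself is Python's with statement, which closes the file automatically when the indented block ends. The sketch below assumes a scratch file created in a temporary directory; the path and the variable contents are illustrative names only.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch your own files.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

# Create the file so that there is something to read.
with open(path, 'w') as f:
    f.write('hello\n')

# The file is closed automatically when the with block ends,
# even if the code inside the block raises an error.
with open(path, 'r') as f:
    contents = f.read()

print(contents)
print(f.closed)  # True: the file was closed for us
```

The rest of this notebook keeps the explicit open and close calls so that each step stays visible.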

We will now give an example of how to use each of these access levels.

Reading

We will first illustrate how we can read a file. Python offers several ways to read a file's contents once it is open. Our typical use will be to read a file line-by-line, so that is where we will start.

Python will allow us to iterate over the lines of a file within a for-loop. We illustrate this below.


In [12]:
f = open('filespython.txt', 'r')

for line in f:
    print(line)
    
f.close()


First line of file

Second line of file

Third line of file

Last line of file

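Each iteration of the for-loop yields a string which contains one entire line of the file, including the newline character at its end. Because print adds a newline of its own, a blank line appears between the lines in the output above.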
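
The sketch below shows one way to drop that trailing newline while iterating, alongside two other reading methods that file objects provide, read and readlines. It writes its own small scratch file first; the file name lines_demo.txt and the variable names are assumptions made for this example.

```python
import os
import tempfile

# Hypothetical scratch file with two lines of text.
path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')
f = open(path, 'w')
f.write('First line of file\nSecond line of file\n')
f.close()

# Iterate line-by-line, removing the newline at the end of each line.
f = open(path, 'r')
stripped = [line.rstrip('\n') for line in f]
f.close()

# read() returns the whole file as a single string.
f = open(path, 'r')
whole = f.read()
f.close()

# readlines() returns a list of lines, each still ending in a newline.
f = open(path, 'r')
lines = f.readlines()
f.close()

print(stripped)
print(repr(whole))
print(lines)
```

Iterating is usually the gentlest option for large files, since it does not need to hold the whole file in memory at once.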

Writing

We gave two letters for writing access: a and w. The difference between them lies in what happens to the existing contents and where new text is placed. If we use w, anything already in the file is deleted as soon as the file is opened, so the file ends up containing only what we write. If we use a (for append), the existing contents are kept and anything we write is placed at the end of the file. We illustrate this below.


In [15]:
f = open('filespython.txt', 'w')
f.write('This is another line\n')
f.close()

f = open('filespython.txt', 'r')
print(f.read())
f.close()


This is another line

Notice that all of the file contents that were printed before ("First line of file", "Second line of file", etc.) were deleted because we opened the file with w. Now notice what happens when we append to our file instead.


In [17]:
f = open('filespython.txt', 'a')
f.write('This is another line\n')
f.close()

f = open('filespython.txt', 'r')
print(f.read())
f.close()


This is another line
This is another line

Notice that when we opened the file with a, writing simply added the new text to the end of the file. We will now return the file to its original state so that we can run this notebook again.


In [19]:
# Open file
f = open('filespython.txt', 'w')

# Will use this string in each line so create it first
lof = " line of file\n"
for currline in ["First", "Second", "Third", "Last"]:
    f.write(currline + lof)

f.close()

f = open('filespython.txt', 'r')
print(f.read())
f.close()


First line of file
Second line of file
Third line of file
Last line of file

Creating and Writing

The steps here are very similar to writing; the only difference is that x creates the file that you want to write to, and it raises an error if a file with that name already exists. There isn't much use in repeating exactly what we just did with an x in place of the a or w.
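
For completeness, here is a minimal sketch of the one behavior that sets x apart: opening an existing file with x fails rather than overwriting it. The file name brand_new.txt and the variable second_open_succeeded are hypothetical, and the file lives in a temporary directory.

```python
import os
import tempfile

# Hypothetical path to a file that does not exist yet.
path = os.path.join(tempfile.mkdtemp(), 'brand_new.txt')

# The first open with x creates the file and gives us write access.
f = open(path, 'x')
f.write('Created with x\n')
f.close()

# A second open with x fails because the file now exists.
try:
    f = open(path, 'x')
    f.close()
    second_open_succeeded = True
except FileExistsError:
    second_open_succeeded = False

# The original contents are untouched.
f = open(path, 'r')
contents = f.read()
f.close()

print(second_open_succeeded)
print(contents)
```

This makes x a reasonable choice when accidentally overwriting an existing file would be costly.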

Requests

