Up to now we have worked with small amounts of data, either created in our programs or entered by the user, and stored it in variables or lists. In this lecture you will be introduced to file handling in Python.

After this lecture you will be able to:

  • Open/Close files
  • Read/Write files
  • Utilize text files in Python
  • Perform string processing in Python

Text File vs Binary File

A text file is a file containing characters, structured as individual lines of text. In addition to printable characters, text files also contain the nonprinting newline character, \n, to denote the end of each text line.

Text files can be directly viewed and created using a text editor.

In contrast, binary files can contain various types of data, such as numerical values, and are therefore not structured as lines of text. Such files can only be read and written via a computer program.

Handling Files

Fundamental operations of all types of files include opening a file, reading from a file, writing to a file, and closing a file. Next we discuss each of these operations when using text files in Python.

All files must first be opened before they can be used. In Python, when a file is opened, a file object is created that provides methods for accessing the file.

The open function opens the given file. The second argument, 'r', requests read access.


In [2]:
input_file = open('sample.txt', 'r') # IOError occurs because the path is wrong: the file is in the data folder.


---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-2-dd7c263673c5> in <module>()
----> 1 input_file = open('file.txt', 'r')

IOError: [Errno 2] No such file or directory: 'file.txt'
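
A program does not have to stop when a file is missing. The sketch below shows one way to catch the error with try/except; it is an illustration, not part of the original lecture, and the folder and file names are hypothetical ones chosen so that the open call fails.

```python
import os

# Hypothetical path that is assumed not to exist.
file_name = os.path.join('no_such_folder_for_this_example', 'no_such_file.txt')

try:
    input_file = open(file_name, 'r')
except IOError as error:
    # In Python 3, IOError is an alias of OSError, so this also catches
    # FileNotFoundError.
    opened = False
    print('Could not open', file_name, '-', error)
else:
    # This branch runs only when open succeeded.
    opened = True
    input_file.close()
```

The else branch keeps the close() call away from the case where no file object was created.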

In [11]:
input_file = open('data/sample.txt', 'r') # Correct path, including the data folder.
print(input_file)
input_file.close()


<open file 'data/sample.txt', mode 'r' at 0x7ff40c1fa540>

Printing the variable does not show the contents of the file. The variable holds a file object, which later statements use to access the file.


To open a file for writing, the open function is used with the mode 'w'.


In [9]:
output_file = open('data/mynewfile.txt', 'w') # The data/ prefix saves the file in the data folder.

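Opening a file for writing normally does not raise an error, because a new file is created if it does not already exist (an existing file with the same name is overwritten). Errors can still come from the system, for example when the disk is full. When we are done with the file, we have to close it with the close() method.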


In [10]:
output_file.close()
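
Forgetting to call close() is an easy mistake to make. Python also offers the with statement, which closes the file automatically when the indented block ends, even if an error occurs inside it. The following is a minimal sketch, not part of the original lecture; the temporary folder and the file name are assumptions made so that the example runs anywhere.

```python
import os
import tempfile

# Hypothetical output location; the lecture's own examples use the data folder.
file_name = os.path.join(tempfile.mkdtemp(), 'mynewfile.txt')

with open(file_name, 'w') as output_file:
    output_file.write('Line one\n')
    # The file is still open inside the with block.
    open_inside = not output_file.closed

# Leaving the block closed the file; no explicit close() call is needed.
closed_after = output_file.closed
print(open_inside, closed_after)  # True True
```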

Reading Files

The readline method returns the next line of a text file as a string, including the end-of-line character, \n. When the end of the file is reached, it returns an empty string:


In [12]:
input_file = open('data/sample.txt', 'r')
empty_str = ''
line = input_file.readline() 
while line != empty_str:
    print(line)
    line = input_file.readline()

input_file.close()


Line one

Line two

Line three

I used a while loop to show the logic behind reading; a for loop gives us a more elegant way.


In [13]:
input_file = open('data/sample.txt', 'r')
for line in input_file:  # the file object yields one line per iteration
    print(line)

input_file.close()


Line one

Line two

Line three
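
The blank lines in the output appear because each line read from the file already ends with \n and print adds a newline of its own. One way to avoid them is to strip the trailing newline before printing. The sketch below assumes a small sample file that it first creates in a temporary folder; the path is hypothetical.

```python
import os
import tempfile

# Build a small sample file so the example is self-contained (hypothetical path).
file_name = os.path.join(tempfile.mkdtemp(), 'sample.txt')
output_file = open(file_name, 'w')
output_file.write('Line one\nLine two\nLine three\n')
output_file.close()

lines = []
input_file = open(file_name, 'r')
for line in input_file:
    clean_line = line.rstrip('\n')  # remove only the end-of-line character
    lines.append(clean_line)
    print(clean_line)
input_file.close()
```

This prints the three lines with no empty lines between them.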

Writing Files

The write method is used to write strings to a file:


In [16]:
empty_str = ''
input_file = open('data/sample.txt', 'r')
output_file = open('data/newfile.txt', 'w')
line = input_file.readline()

while line != empty_str:
    output_file.write(line)
    line = input_file.readline()
    
output_file.close()

The write method does not add a newline character to the output string. A newline is written only if it is part of the string being written. In the example above, each line read from the input file already ends with \n, so the copied lines stay on separate lines.
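
The effect is easiest to see with strings that do not end in \n. This is a small sketch with a hypothetical file in a temporary folder; it reads the result back with the read method, which returns the whole file as one string and is not otherwise covered in this lecture.

```python
import os
import tempfile

file_name = os.path.join(tempfile.mkdtemp(), 'newfile.txt')  # hypothetical path

output_file = open(file_name, 'w')
output_file.write('one')
output_file.write('two')      # no newline was written, so this continues the same line
output_file.write('\n')       # the newline has to be written explicitly
output_file.write('three\n')
output_file.close()

input_file = open(file_name, 'r')
contents = input_file.read()  # the whole file as a single string
input_file.close()
print(repr(contents))         # 'onetwo\nthree\n'
```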

String Processing

The information in a text file, as with all information, is most likely going to be searched, analyzed, and/or updated. Collectively, the operations performed on strings are called string processing.

We have learned some basic operations on strings, such as:

  • accessing elements: name[k]
  • getting the length: len(str)

String Traversal: The characters in a string can be traversed with an explicit index variable, as in the cell below, or more simply with the for chr in string form of the for statement, which needs no index variable.


In [17]:
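space = ' '
num_spaces = 0
line = input_file.readline()  # input_file is already at end of file after the previous cell, so this returns ''
for k in range(0, len(line)):  # k is the explicit index variable
    if line[k] == space:
        num_spaces = num_spaces + 1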

In [18]:
num_spaces


Out[18]:
0
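
The for chr in string form can be sketched as follows. The line used here is a hypothetical string literal that stands in for a line returned by readline(), so the example does not depend on the state of an open file. The loop variable is called char to avoid hiding Python's built-in chr function.

```python
line = 'Line one of the file\n'  # hypothetical line, as readline() might return it
space = ' '

num_spaces = 0
for char in line:  # visits each character in turn, no index variable needed
    if char == space:
        num_spaces = num_spaces + 1

print(num_spaces)         # 4
print(line.count(space))  # 4, the built-in string method gives the same answer
```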

There are a number of methods specific to strings in addition to the general sequence operations.

Checking the Contents of a String


In [19]:
s = 'Hello World!'

str.isalpha(): Returns True if str contains only letters.


In [21]:
s.isalpha() # False: s contains a space and '!'


Out[21]:
False

str.isdigit(): Returns True if str contains only digits.


In [22]:
s.isdigit()


Out[22]:
False

In [26]:
"1".isdigit()


Out[26]:
True

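str.islower() and str.isupper(): Return True if all the letters in str are lower/upper case and str contains at least one letter. Characters that are not letters, such as spaces and punctuation, are ignored.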


In [27]:
s.islower()


Out[27]:
False

In [28]:
s


Out[28]:
'Hello World!'

In [29]:
s.isupper()


Out[29]:
False

In [31]:
"HELLO WORLD".isupper()


Out[31]:
True
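
The space in "HELLO WORLD" does not prevent isupper() from returning True, because only letters are checked. A few more cases, given here as an illustrative sketch with made-up strings:

```python
all_lower = 'hello world!'.islower()   # True: every letter is lower case
mixed_case = 'Hello World!'.islower()  # False: 'H' and 'W' are upper case
no_letters = '1234'.islower()          # False: there are no letters at all
digits_upper = '1234'.isupper()        # False for the same reason

print(all_lower, mixed_case, no_letters, digits_upper)
```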

str.lower() and str.upper(): Return a lower/upper case copy of str.


In [32]:
s


Out[32]:
'Hello World!'

In [33]:
s.upper()


Out[33]:
'HELLO WORLD!'

In [34]:
s.lower()


Out[34]:
'hello world!'

In [36]:
s # s itself does not change. Assign the result to a new variable or back to s to keep it.


Out[36]:
'Hello World!'

In [37]:
s = s.lower()

In [38]:
s


Out[38]:
'hello world!'

Searching the Contents of a String

str.find(w): Returns the index of the first occurrence of w in str. Returns -1 if w is not found.


In [45]:
s


Out[45]:
'hello world!'

In [41]:
s.find('d')


Out[41]:
10

In [43]:
s.find('x')


Out[43]:
-1
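
find reports only the first occurrence. It also accepts an optional second argument, the index at which to start searching, which makes it possible to collect every occurrence. The loop below is one possible sketch; the variable names are illustrative.

```python
s = 'hello world!'
target = 'o'

positions = []
index = s.find(target)
while index != -1:                     # -1 means there are no more occurrences
    positions.append(index)
    index = s.find(target, index + 1)  # continue just after the last match

print(positions)  # [4, 7]
```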

Replacing the Contents of a String

str.replace(w,t): Returns a copy of str with all occurrences of w replaced with t.


In [46]:
s


Out[46]:
'hello world!'

In [51]:
s.replace("l", "*")


Out[51]:
'he**o wor*d!'
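
Two variations are worth knowing, shown here as a short sketch: replace takes an optional third argument that limits the number of replacements, and replacing with an empty string removes the characters altogether.

```python
s = 'hello world!'

first_only = s.replace('l', '*', 1)  # replace at most one occurrence
removed = s.replace('l', '')         # replacing with '' deletes the matches

print(first_only)  # he*lo world!
print(removed)     # heo word!
print(s)           # hello world!  (the original string is unchanged)
```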

Removing the Contents of a String

str.strip(w): Returns a copy of str with all leading and trailing characters that appear in w removed.


In [55]:
s


Out[55]:
'hello world!'

In [57]:
s.strip('!')


Out[57]:
'hello world'
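
strip is often used to clean up lines read from a file. Called without an argument it removes leading and trailing whitespace, including \n. The related rstrip method removes trailing characters only. The strings below are made-up examples.

```python
line = '  Line one\n'

stripped = line.strip()             # whitespace removed from both ends
right_only = line.rstrip('\n')      # only the trailing newline removed
several = '--hello!!'.strip('-!')   # any of the listed characters, at both ends
inner_kept = 'xhexllox'.strip('x')  # characters inside the string are kept

print(repr(stripped))    # 'Line one'
print(repr(right_only))  # '  Line one'
print(several)           # hello
print(inner_kept)        # hexllo
```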

Splitting a String

str.split(w): Returns a list containing all strings in str delimited by w:


In [58]:
s


Out[58]:
'hello world!'

In [59]:
s.strip('!').split(" ")


Out[59]:
['hello', 'world']

In [62]:
s[:-4]


Out[62]:
'hello wo'
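
strip and split are often combined when processing the lines of a text file, for example to count words. The sketch below uses a hypothetical list of lines in place of an open file. It also shows that split without an argument splits on any run of whitespace, while split(' ') produces empty strings when there are consecutive spaces.

```python
# Hypothetical lines, as they would be read from a text file.
lines = ['Line one\n', 'Line two\n', 'Line three\n']

word_count = 0
for line in lines:
    words = line.strip().split()  # e.g. ['Line', 'one']
    word_count = word_count + len(words)

print(word_count)  # 6

with_space_arg = 'a  b'.split(' ')  # two spaces between a and b
without_arg = 'a  b'.split()
print(with_space_arg)  # ['a', '', 'b']
print(without_arg)     # ['a', 'b']
```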

Apply It!

Write a program that removes all occurrences of the letter ‘e’ from a text file. To create the text file, copy a paragraph from the internet and paste it into a new file. The output should be similar to this:

    This program will display the contents of a provided text file
    with all occurrences of the letter 'e' removed
    Enter file name (including file extension): data/totc_1.txt


    Th Priod

    It was th bst of tims, it was th worst of tims,
    it was th ag of wisdom, it was th ag of foolishnss,
    it was th poch of blif, it was th poch of incrdulity,
    it was th sason of Light, it was th sason of Darknss,
    it was th spring of hop, it was th wintr of dspair,
    w had vrything bfor us, w had nothing bfor us,
    w wr all going dirct to Havn, w wr all going dirct
    th othr way--in short, th priod was so far lik th prsnt
    priod, that som of its noisist authoritis insistd on its
    bing rcivd, for good or for vil, in th suprlativ dgr
    of comparison only.

    Thr wr a king with a larg jaw and a qun with a plain fac,
    on th thron of ngland; thr wr a king with a larg jaw and
    a qun with a fair fac, on th thron of Franc.  In both
    countris it was clarr than crystal to th lords of th Stat
    prsrvs of loavs and fishs, that things in gnral wr
    sttld for vr.


    379 occurrences of the letter 'e' removed
    Percentage of data lost: 6%
    Modified text in file data/totc_1_e.txt