Reading and writing files is one of the essential functions of computers, as it is the connection between volatile data in the computer's memory and permanent data on disk or in the cloud.
Three ways of reading text files are shown below, and we'll illustrate how to handle the contents. We'll take the README.md file as an example, as it is readily available on this site.
The first step is finding the file in the directory structure. We'll start with an illustration of how this can be done using some functions in the os module that deal with directories.
In [1]:
import os
In [2]:
os.listdir() # make a list of the files in the current directory, so that we may handle them.
Out[2]:
Turn the relative reference to the current directory, '.', into an absolute path.
Then specify the name of the file that we're looking for.
And in a while-loop look upward, in an ever higher directory in the directory tree, until we have found the one that holds our file.
Walking upward in the tree is done by cutting off the tail of the path in each step.
In [3]:
apth = os.path.abspath('.')
fname = 'README.md'
print("Starting in folder <{}>,\n to look for file <{}> ...".format(apth, fname))
print()
print("I'm searching: ...")
while fname not in os.listdir(apth):
    apth = apth.rpartition(os.sep)[0]  # cut off the last folder to step one level up
    print(apth)
    if fname in os.listdir(apth):
        print("... Yep, got'm!")
    else:
        print("... H'm, missed him")
print("\nOk, file <{}> is in folder: ".format(fname), apth)
print('\nHere is the list of files in this folder:')
os.listdir(apth)
Out[3]:
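A more defensive variant of this upward search can be sketched as follows. It is only a sketch, not the code used above: the function name find_upward is hypothetical, it uses os.path.dirname instead of rpartition to step up one level, os.path.isfile instead of os.listdir to test for the file, and it returns None when the root of the file system is reached, so that the loop cannot fail on an empty path. The demonstration runs in a temporary directory tree created just for this purpose.

```python
import os
import tempfile

def find_upward(fname, start='.'):
    """Return the nearest folder at or above `start` that holds `fname`, else None.

    Hypothetical helper, written for illustration only.
    """
    folder = os.path.abspath(start)
    while True:
        if os.path.isfile(os.path.join(folder, fname)):
            return folder
        parent = os.path.dirname(folder)   # one level up
        if parent == folder:               # the root has no higher level
            return None
        folder = parent

# Throw-away directory tree, so nothing depends on the layout of this site
with tempfile.TemporaryDirectory() as top:
    deep = os.path.join(top, 'a', 'b')
    os.makedirs(deep)
    with open(os.path.join(top, 'README.md'), 'w') as writer:
        writer.write('# demo\n')

    found = find_upward('README.md', start=deep)
    same_folder = os.path.samefile(found, top)
    missing = find_upward('no-such-file-for-this-demo.txt', start=deep)

print(same_folder, missing)
```

Returning None for a file that is never found leaves it to the caller to decide what to do in that case.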
When we know where the file is, we can open it for reading.
Opening the file yields a reader object, through which we can read the file:
reader = open(path, 'r')
s = reader.read()
reader.close()
The problem with this is that, while exploring the reader, we easily reach the end of the file, after which nothing more is read and s is an empty string. Furthermore, when we experiment, we may easily open the same file many times and forget to close it.
The with statement is a solution to that, because it automatically closes the file when we finish its block.
With the with statement we may read the entire file into a string like so.
In [4]:
with open(os.path.join(apth, fname), 'r') as reader:
    s = reader.read()
It's the read that swallows the entire file at once and dumps its contents in the string s.
Check whether the reader is indeed closed after we finished the with block:
In [5]:
reader.closed
Out[5]:
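The end-of-file behaviour mentioned earlier can be seen in a small sketch. It assumes a scratch file demo.txt in a temporary folder (both hypothetical, created only for this demonstration): a second read() on the same reader returns an empty string, and seek(0) moves the reader back to the start of the file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical scratch file
    with open(path, 'w') as writer:
        writer.write('first line\nsecond line\n')

    with open(path, 'r') as reader:
        first = reader.read()    # the whole file
        second = reader.read()   # already at the end of the file: empty string
        reader.seek(0)           # move back to the start of the file
        third = reader.read()    # the whole file again

print(repr(second))
print(first == third)
```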
Then show the content of the string s:
In [6]:
print(s)
In [7]:
print("There are {} words in file {}".format(len(s.split()), fname))  # split() without arguments splits on any whitespace
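The choice of separator matters for such a word count: split(sep=' ') splits on single spaces only, whereas split() without arguments splits on any run of whitespace, newlines included. A minimal sketch on a made-up sample string (not the README) shows the difference.

```python
# Made-up sample text: a double space after 'one', and newlines between words
text = "one  two\nthree\nfour five\n"

by_space = text.split(sep=' ')   # splits on every single space only
by_whitespace = text.split()     # splits on any run of whitespace

print(len(by_space), by_space)
print(len(by_whitespace), by_whitespace)
```

With sep=' ' the double space produces an empty string and the newlines stay inside the "words", so the count is off.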
We might estimate the number of sentences by counting the number of periods '.'
One way is to use the . as a separator:
In [8]:
nPhrases = len(s.split(sep='.')) # also works without the keyword sep
print("We find {} phrases in file {}".format(nPhrases, fname))
We could just as well count the number of dots in s directly, using one of the string methods, in this case s.count():
In [9]:
print("There are {} dots in file {}".format(s.count('.'), fname))
In [14]:
s1 = "".join(s.split()).lower() # remove all whitespace and make all letters lowercase
print(s1)
In [15]:
list(s1)
Out[15]:
In [16]:
set(s1)
Out[16]:
To count the frequency of each character we could use those from the set as keys in a dict. We can generate the dict with the frequency of each character in a dict comprehension that combines each unique character as a key with the method count(key) applied to s1, the string without whitespace:
In [17]:
from pprint import pprint  # pretty-printer from the standard library
ccnt = {c : s1.count(c) for c in set(s1)}
pprint(ccnt)
Let's order the letters by their frequency of occurrence in the file.
We can do so in one line, but this needs some explanation.
First we generate a list from the dict in which each item is a list of 2 items, namely [char, number].
Second we apply sorted to that list to get a sorted list. But we don't want it to be sorted by character, but by number. Therefore, we use the key argument. It tells sorted that the items have to be compared on their second value (lambda x: x[1]).
Finally, this yields the list that we want, but with the largest frequency at the bottom. So we turn this list upside down by using the slice [::-1] at the end.
Here it is:
In [159]:
sorted([[k, ccnt[k]] for k in ccnt.keys()], key=lambda x: x[1])[::-1]
Out[159]:
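As an alternative sketch, not the approach used above, the standard library offers collections.Counter, which counts the characters in one pass and can list them by frequency with most_common(). The sample string below is made up for the demonstration. The second variant keeps sorted() but uses its reverse=True argument instead of the [::-1] slice; characters with equal counts may then come out in a different order than with the slice.

```python
from collections import Counter

# Made-up sample text instead of the README: strip whitespace, make lowercase
s1 = "".join("Hello hello world".split()).lower()

ccnt = Counter(s1)                   # maps each character to its count
by_frequency = ccnt.most_common()    # list of (char, count), most frequent first

# Same idea with sorted(): reverse=True puts the largest count first
by_sorted = sorted(ccnt.items(), key=lambda item: item[1], reverse=True)

print(by_frequency)
print(by_sorted)
```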
In [18]:
with open(os.path.join(apth, fname), 'r') as reader:
    s = reader.readlines()
type(s)
Out[18]:
In [19]:
pprint(s)
From this point onward, you can analyse each line in sequence, pick out lines, etc.
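What such line-by-line handling may look like is sketched here on a small made-up list that stands in for the result of reader.readlines(). The names lines, stripped, headings and non_empty are chosen for this illustration only.

```python
# Made-up sample lines standing in for the result of reader.readlines()
lines = ["# Title\n", "\n", "Some text.\n", "## Section\n", "More text.\n"]

stripped = [line.rstrip("\n") for line in lines]                 # drop the trailing newline
headings = [line for line in stripped if line.startswith("#")]   # pick out heading lines
non_empty = [line for line in stripped if line.strip()]          # skip blank lines

# Number the lines, starting at 1
for number, line in enumerate(stripped, start=1):
    print(number, line)
```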
Often you don't want to read the entire file into memory (into a single string) at once. It might exhaust the computer's memory if the file size were gigabytes, as can easily be the case with the output of some models. And even if it doesn't crash, your PC may still become very slow with large files. So a better and more generally applied way to read a file is line by line, based on the newline characters that are embedded in it.
In that case you can read the file one line at a time, using not reader.read() or reader.readlines() but reader.readline():
In [191]:
with open(os.path.join(apth, fname), 'r') as reader:
    s = reader.readline()
type(s)
Out[191]:
In [192]:
print(s)
This yields a string, the first line of the file in this case.
The problem now is that no more lines can be read from this file, because with the with statement the file closes automatically as soon as Python reaches the end of its block:
In [193]:
s = reader.readline()
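Reading from a closed file raises a ValueError. The sketch below reproduces this on a hypothetical scratch file in a temporary folder and catches the error so that its message can be printed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical scratch file
    with open(path, 'w') as writer:
        writer.write('first line\nsecond line\n')

    with open(path, 'r') as reader:
        s = reader.readline()      # fine: the file is still open here

    try:
        reader.readline()          # the with block has already closed the file
    except ValueError as err:
        message = str(err)

print(message)
```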
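Therefore, we should either not use the with statement and close the file by hand when we're done, or put everything that we do with the strings that we read inside the with block.
We may be tempted to put the reader in a while-loop like so
s = []
while True:
    s.append(reader.readline())
But don't do that, because the while-loop will never end: at the end of the file readline() does not raise an error but keeps returning an empty string. So we test for the empty string and break out of the loop when we see it: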
In [194]:
with open(os.path.join(apth, fname), 'r') as reader:
    lines = []
    while True:
        s = reader.readline()
        if s=="":
            break
        lines.append(s)
pprint(lines)
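A shorter way to get the same result is to loop over the reader itself: a file object opened for reading text yields one line per iteration and stops at the end of the file, so no explicit test for the empty string is needed. The sketch uses a hypothetical scratch file in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')   # hypothetical scratch file
    with open(path, 'w') as writer:
        writer.write('first line\nsecond line\nthird line\n')

    lines = []
    with open(path, 'r') as reader:
        for line in reader:        # one line at a time, stops at end of file
            lines.append(line)

print(lines)
```

As with readline(), only one line at a time needs to be in memory if each line is processed inside the loop instead of being collected in a list.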
In [195]:
reader.readline?
Of course, there is much, much more, but this is probably the most important basic knowledge about file reading. Writing text files is straightforward. You open a file with open(fname, 'w') for writing (which overwrites an existing file) or open(fname, 'a') for appending, and you can start writing lines to it. Don't forget to close it when done. Better still, use the with statement to make sure that the file is automatically closed when its block is done.
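Writing and appending can be sketched as follows. The file name notes.txt is hypothetical and the file lives in a temporary folder, so nothing on disk is overwritten. Note that write() does not add a newline by itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')   # hypothetical file name

    with open(path, 'w') as writer:            # 'w' creates or overwrites the file
        writer.write('first line\n')           # newline must be written explicitly
        writer.writelines(['second line\n', 'third line\n'])

    with open(path, 'a') as writer:            # 'a' appends to the existing file
        writer.write('appended line\n')

    with open(path, 'r') as reader:            # read back to check the result
        content = reader.read()

print(content)
```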