Reading and Writing Data

Basic I/O

Reading from and writing to files is commonly referred to as I/O (Input/Output).

In Python, the open function opens a file for reading or writing. The file object's write method writes strings to the file, and its close method closes the file.


In [1]:
f = open("basicOutput.txt", 'w') # Open/create the basicOutput.txt file for writing ('w')
f.write("Hello World\n") # Write the string to the basicOutput.txt file.
f.write("Goodbye World\n")
f.close() # Close the file.

It's a good idea to close every file you open. The recommended way to work with files is the with keyword, which ensures that the file is closed after use.


In [2]:
with open('withFile.txt', 'w') as f:
    f.write("A better way of opening a file.\n")
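File objects expose a closed attribute, which gives a quick way to check the behaviour described above. The following is a minimal sketch; the file name withCheck.txt and its location in the system temporary directory are assumptions made for this example only.

```python
import os
import tempfile

# Hypothetical file name, placed in the temporary directory for this example.
path = os.path.join(tempfile.gettempdir(), "withCheck.txt")

with open(path, 'w') as f:
    f.write("Checking that the file gets closed.\n")
    inside = f.closed   # still open while inside the with block

outside = f.closed      # closed automatically on leaving the with block
print(inside, outside)
```

The file is closed on leaving the block even if an error is raised inside it, which is the main advantage over calling close by hand.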

Opening a file with 'w' will overwrite the existing contents of a file. If you want to append more information to the file use 'a'.


In [3]:
with open("basicOutput.txt", 'a') as f:
    f.write("Are you... ")
    f.write("still here?\n")
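To see the difference between the two modes side by side, here is a small sketch that writes to the same file three times. The file name modesDemo.txt is hypothetical and the file is placed in the temporary directory purely for illustration.

```python
import os
import tempfile

# Hypothetical file name used only for this demonstration.
path = os.path.join(tempfile.gettempdir(), "modesDemo.txt")

with open(path, 'w') as f:   # creates the file (or empties an existing one)
    f.write("first\n")

with open(path, 'w') as f:   # 'w' again: the previous contents are discarded
    f.write("second\n")

with open(path, 'a') as f:   # 'a': new text is added to the end
    f.write("third\n")

with open(path, 'r') as f:
    result = f.read()

print(result)
```

Only "second" and "third" survive, because the second 'w' open wiped out "first".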

You can read the contents of a file as a single string using the read method.


In [4]:
with open('basicOutput.txt', 'r') as f:  # Use 'r' for reading.
    contents = f.read()
    print(contents)


Hello World
Goodbye World
Are you... still here?

Notice how Are you... still here? is all on a single line even though it was written using two separate calls to write. This happened because the "Are you... " string didn't terminate in \n (known as the newline character).
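The sketch below illustrates the same point in isolation: write never adds a newline for you. It also shows one alternative, assuming Python 3, where print accepts a file argument and appends the newline itself. The file name newlineDemo.txt is made up for this example.

```python
import os
import tempfile

# Hypothetical file name used only for this demonstration.
path = os.path.join(tempfile.gettempdir(), "newlineDemo.txt")

with open(path, 'w') as f:
    f.write("one ")          # no newline, so the next write continues this line
    f.write("two\n")         # the explicit \n ends the line
    print("three", file=f)   # print adds the newline automatically (Python 3)

with open(path, 'r') as f:
    text = f.read()

print(repr(text))            # repr makes the \n characters visible
```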

You can also read each line of a file into a list directly.


In [5]:
with open('basicOutput.txt', 'r') as f:
    lines = f.readlines()
    print(lines)


['Hello World\n', 'Goodbye World\n', 'Are you... still here?\n']

Alternatively, you can read the lines one at a time in a for loop.


In [6]:
with open('basicOutput.txt', 'r') as f:
    for line in f:
        print(len(line))


12
14
23

This last method is useful when dealing with very large files. In such cases, reading the whole file into memory at once may be very slow or may use up the available memory.
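As a rough sketch of this line-by-line approach, the example below collects some simple statistics about a file while only ever holding one line in memory. The file manyLines.txt is a small, hypothetical stand-in for a genuinely large file.

```python
import os
import tempfile

# Hypothetical file standing in for a very large one.
path = os.path.join(tempfile.gettempdir(), "manyLines.txt")

with open(path, 'w') as f:
    for i in range(1000):
        f.write("line %d\n" % i)

lineCount = 0
longest = 0
with open(path, 'r') as f:
    for line in f:                       # only one line is held at a time
        lineCount += 1
        # rstrip removes the trailing newline before measuring the length
        longest = max(longest, len(line.rstrip('\n')))

print(lineCount, longest)
```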

Comma Separated Values (CSV)

CSV files are a very basic way of storing data. For example, the following mock file has two columns, one for Time and another for Temperature. Each row represents an entry, with commas used to separate values.

Time (s), Temperature (K)
0,300
10,314
20,323
30,331

Python has a csv library that can be used to read and write CSV files. It's a very general-purpose library and works in many different scenarios. For our purposes, though, it's more straightforward to use two NumPy functions: np.savetxt and np.loadtxt.
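For comparison, here is a minimal sketch of how the mock data could be handled with the csv library instead. It assumes Python 3 (the newline='' argument to open) and uses a hypothetical file name; the conversion to float is done by hand because the csv reader returns every value as a string.

```python
import csv
import os
import tempfile

# Hypothetical file name used only for this demonstration.
path = os.path.join(tempfile.gettempdir(), "temperatureCsvModule.csv")

time = [0, 10, 20, 30]
temperature = [300, 314, 323, 331]

# newline='' lets the csv module control line endings itself.
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["Time (s)", "Temperature (K)"])   # header row
    writer.writerows(zip(time, temperature))            # one row per entry

with open(path, 'r', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)                                # first row is the header
    rows = [[float(value) for value in row] for row in reader]

print(header)
print(rows)
```

For purely numeric tables like this one, the NumPy functions save the manual conversion step.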


In [7]:
import numpy as np

time = [0, 10, 20, 30]
temperature = [300, 314, 323, 331]
zippedData = list(zip(time, temperature))  # list() is needed on Python 3, where zip returns an iterator

np.savetxt('temperatureData.csv', zippedData,
           delimiter=',', header="Time (s), Temperature (K)")
zippedData


Out[7]:
[(0, 300), (10, 314), (20, 323), (30, 331)]

The above writes the time and temperature data to a file, along with a header describing the data. We can verify this by opening and reading the file.


In [8]:
with open("temperatureData.csv", 'r') as f:
    print(f.read())


# Time (s), Temperature (K)
0.000000000000000000e+00,3.000000000000000000e+02
1.000000000000000000e+01,3.140000000000000000e+02
2.000000000000000000e+01,3.230000000000000000e+02
3.000000000000000000e+01,3.310000000000000000e+02
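The values are written in scientific notation because that is the default format used by np.savetxt ('%.18e'). If a more compact file is preferred, the fmt keyword accepts a printf-style format string. The sketch below uses '%d' to write integers; the file name is hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name used only for this demonstration.
path = os.path.join(tempfile.gettempdir(), "temperatureIntegers.csv")

time = [0, 10, 20, 30]
temperature = [300, 314, 323, 331]
data = list(zip(time, temperature))

# fmt='%d' formats every value as an integer instead of the default '%.18e'.
np.savetxt(path, data, delimiter=',', fmt='%d',
           header="Time (s), Temperature (K)")

with open(path, 'r') as f:
    contents = f.read()

print(contents)
```

Note that '%d' is only appropriate for whole numbers; a format such as '%.3f' would be a better fit for data with a fractional part.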

The np.loadtxt function can be used to read in CSV files.


In [9]:
data = np.loadtxt("temperatureData.csv", delimiter=",", skiprows=1)
data


Out[9]:
array([[   0.,  300.],
       [  10.,  314.],
       [  20.,  323.],
       [  30.,  331.]])

The skiprows keyword tells loadtxt to skip the first line (which contains the header). It's also possible to read the columns in directly to variables.


In [10]:
timeFromFile, temperatureFromFile = np.loadtxt("temperatureData.csv", delimiter=",", skiprows=1,
                                        unpack=True)
timeFromFile, temperatureFromFile


Out[10]:
(array([  0.,  10.,  20.,  30.]), array([ 300.,  314.,  323.,  331.]))
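A side note on skiprows: by default np.savetxt prefixes the header with '# ', and np.loadtxt ignores lines starting with '#', so a file written this way can be loaded even without skiprows. The keyword becomes necessary when the header is plain text, which is common in files produced by other programs. The sketch below shows both cases; the two file names are hypothetical, and comments='' is used to write a header without the '# ' prefix.

```python
import os
import tempfile

import numpy as np

# Hypothetical file names used only for this demonstration.
tmp = tempfile.gettempdir()
pathHash = os.path.join(tmp, "headerWithHash.csv")
pathPlain = os.path.join(tmp, "headerPlain.csv")

data = list(zip([0, 10, 20, 30], [300, 314, 323, 331]))

# Default behaviour: the header line is written as "# Time (s), ..."
np.savetxt(pathHash, data, delimiter=',',
           header="Time (s), Temperature (K)")
# Lines starting with '#' are treated as comments, so no skiprows is needed.
loaded = np.loadtxt(pathHash, delimiter=',')

# comments='' writes the header as plain text, without the '# ' prefix.
np.savetxt(pathPlain, data, delimiter=',',
           header="Time (s), Temperature (K)", comments='')
# Now the first line is not a comment and has to be skipped explicitly.
timePlain, temperaturePlain = np.loadtxt(pathPlain, delimiter=',',
                                         skiprows=1, unpack=True)

print(loaded.shape)
print(timePlain, temperaturePlain)
```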

Delimiters

The CSV format doesn't have a single standard that every program follows. This means that some CSV files you encounter won't use commas as the separator between columns. The delimiter keyword in both np.savetxt and np.loadtxt can be used to set the separator. Common separators include:

  • commas (',')
  • spaces (' ')
  • tabs ('\t')
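As an illustration, the sketch below writes the same mock data with tabs as the separator and reads it back. The file name is hypothetical. The only requirement is that the delimiter used for reading matches the one used for writing.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name used only for this demonstration.
path = os.path.join(tempfile.gettempdir(), "temperatureData.tsv")

data = list(zip([0, 10, 20, 30], [300, 314, 323, 331]))

# Write the columns separated by tabs instead of commas.
np.savetxt(path, data, delimiter='\t',
           header="Time (s)\tTemperature (K)")

# Read the file back using the same delimiter.
loaded = np.loadtxt(path, delimiter='\t')

with open(path, 'r') as f:
    firstDataLine = f.readlines()[1]   # line 0 is the header

print(repr(firstDataLine))
print(loaded)
```

If no delimiter is given, np.savetxt separates values with a single space and np.loadtxt splits on any whitespace, so space-separated files work with the defaults.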