COP3990C - Python Programming




In [4]:

    
# file separator - your program needs to be multiplatform
import os

file_name = 'data' + os.sep + 'test'
print file_name
fin = open(file_name,'rb')
lines = fin.readlines()
fin.close()









    



data\test



In [5]:

    
print lines









    



[' Volume in drive C has no label.\r\n', ' Volume Serial Number is 92E0-AA74\r\n', '\r\n', ' Directory of C:\\Users\\User\\Dropbox\\UWF\\spring2014\\Python\\UWF_2014_spring_COP3990C-2507\\notebooks\\data\r\n', '\r\n', '01/29/2014  03:02 PM    <DIR>          .\r\n', '01/29/2014  03:02 PM    <DIR>          ..\r\n', '01/27/2014  05:07 PM                29 csv_file.csv\r\n', '01/27/2014  05:13 PM               565 csv_file2.csv\r\n', '01/26/2014  10:06 AM               567 csv_file3.csv\r\n', '01/27/2014  05:13 PM                33 csv_out.csv\r\n', '01/27/2014  05:18 PM               115 csv_output_2.csv\r\n', '01/27/2014  05:04 PM               105 out_file.txt\r\n', '01/29/2014  03:02 PM                 0 test\r\n', '01/27/2014  04:31 PM               807 test-linux\r\n', '01/27/2014  04:22 PM               140 test.py\r\n', '01/26/2014  09:18 AM                45 text_file2.txt\r\n', '              10 File(s)          2,406 bytes\r\n', '               2 Dir(s)  312,003,645,440 bytes free\r\n']



In [6]:

    
#file_name = 'data' + os.sep + 'test'
fin = open(r'data\test','r')
lines = fin.readlines()
fin.close()



In [7]:

    
lines









    Out[7]:





[' Volume in drive C has no label.\n',
 ' Volume Serial Number is 92E0-AA74\n',
 '\n',
 ' Directory of C:\\Users\\User\\Dropbox\\UWF\\spring2014\\Python\\UWF_2014_spring_COP3990C-2507\\notebooks\\data\n',
 '\n',
 '01/29/2014  03:02 PM    <DIR>          .\n',
 '01/29/2014  03:02 PM    <DIR>          ..\n',
 '01/27/2014  05:07 PM                29 csv_file.csv\n',
 '01/27/2014  05:13 PM               565 csv_file2.csv\n',
 '01/26/2014  10:06 AM               567 csv_file3.csv\n',
 '01/27/2014  05:13 PM                33 csv_out.csv\n',
 '01/27/2014  05:18 PM               115 csv_output_2.csv\n',
 '01/27/2014  05:04 PM               105 out_file.txt\n',
 '01/29/2014  03:02 PM                 0 test\n',
 '01/27/2014  04:31 PM               807 test-linux\n',
 '01/27/2014  04:22 PM               140 test.py\n',
 '01/26/2014  09:18 AM                45 text_file2.txt\n',
 '              10 File(s)          2,406 bytes\n',
 '               2 Dir(s)  312,003,645,440 bytes free\n']



In [8]:

    
len(lines[0])









    Out[8]:





33
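
The two reads above differ only in the mode passed to open(): 'rb' returns the bytes exactly as stored (including '\r'), while 'r' translates the line ending to '\n', which is why the first line has length 33 rather than 34. The following is a minimal sketch of that difference, assuming Python 3 (the notebook itself uses Python 2 print statements) and a hypothetical scratch file created with tempfile instead of the notebook's data/test file.

```python
import os
import tempfile

# Hypothetical scratch file; not part of the notebook's data folder.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "newline_demo.txt")

# Write raw bytes so the file contains Windows-style line endings.
with open(path, "wb") as fout:
    fout.write(b"first\r\nsecond\r\n")

# Binary mode returns the bytes exactly as they are stored.
with open(path, "rb") as fin:
    raw_lines = fin.readlines()

# Text mode (default newline handling in Python 3) translates '\r\n' to '\n'.
with open(path, "r") as fin:
    text_lines = fin.readlines()

print(raw_lines)
print(text_lines)
# The binary line is one byte longer because '\r' is kept.
print(len(raw_lines[0]) - len(text_lines[0]))
```

The sketch also builds the path with os.path.join, which inserts the platform's separator for you and is a common alternative to concatenating with os.sep.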

Types of files we'll look at

In this course we'll look at the following file types:

plain text

comma separated values (CSV)

binary

pickled

pandas DataFrames, HDF5, MATLAB .mat files, etc.

Later we'll use numpy functions to read text/CSV/binary files directly into arrays and other data structures.

Plain Text Files

Data can be stored in plain text files with arbitrary delimiters (spaces, tabs, commas). The data does not need to have the same number of elements in every "record" or line. Plain text files (including CSV files) are structured as a set of lines, each terminated by an end-of-line character, '\n' (on Windows the file stores '\r\n', as the binary read above shows). We don't need to import any libraries to start working with files. The following functions are used frequently:

open(). This creates a file object (handle) that makes the file's contents accessible to the program. A file can be opened in different modes. The default mode is "read" ('r'). Other modes are "write" ('w'), "append" ('a'), "read binary" ('rb'), "write binary" ('wb'), and "read and write" ('r+').
close(). This function closes access to the file. If you forget to call it, Python closes the stream when the file object is garbage collected. It is nevertheless strongly encouraged to close the stream yourself as soon as you're done reading the necessary data.
read(). This function reads the contents of the file as one (huge) string. You can also specify the number of characters to read by passing that number to the function.
read(n). This reads $n$ bytes from the file, starting at the current pointer location.
readline(). This function reads the contents of a file one line at a time as a string. Remember that lines are terminated by the end-of-line character '\n'.
readlines(). This function reads the whole file as a list of lines, each line being a string.
write(). This function writes one string at a time to the file.
seek(offset, reference point). This function moves the pointer by the specified offset from the reference point. The reference point can take on 3 values: 0 - the beginning of the file, 1 - the current location, 2 - the end of the file.
tell(). This function returns the current location of the pointer in the file.
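
The three reference points of seek() can be sketched as follows. This is a minimal illustration, assuming Python 3 and a hypothetical 16-byte scratch file; it opens the file in binary mode because Python 3 text files only allow relative seeks with a zero offset.

```python
import os
import tempfile

# Hypothetical scratch file with 16 known bytes.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "seek_demo.bin")
with open(path, "wb") as fout:
    fout.write(b"0123456789abcdef")

with open(path, "rb") as fin:
    first_four = fin.read(4)      # reads offsets 0..3
    pos_after_read = fin.tell()   # pointer now sits at offset 4

    fin.seek(2, 1)                # reference 1: skip 2 bytes from the current location
    next_byte = fin.read(1)       # byte at offset 6

    fin.seek(-3, 2)               # reference 2: 3 bytes before the end of the file
    tail = fin.read()             # the last 3 bytes
    at_end = fin.read()           # nothing left: an empty bytes object

    fin.seek(0, 0)                # reference 0: back to the beginning
    pos_rewound = fin.tell()

print(first_four, pos_after_read, next_byte, tail, at_end, pos_rewound)
```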

Examples



In [9]:

    
# open an existing  file for reading

import os # needed for the file separation

file_name = 'data' + os.sep + 'text_file2.txt'
print file_name
my_file = open(file_name, 'rb')

# read the content of the whole file then close
file_content = my_file.read()
my_file.close()

# print the content of the file
print file_content









    



data\text_file2.txt
123456789
123456789!
123456789*
123456789-



In [10]:

    
my_file = open(file_name, 'rb')



In [11]:

    
# read 10 bytes
characters = my_file.read(10)
print 'the first 10 bytes:', characters









    



the first 10 bytes: 123456789



In [12]:

    
# read the next 10 bytes - recall that the end of line character is counted as 
# part of the string
characters = my_file.read(10)
print 'the second 10 bytes', characters
ord_list = [ord(x) for x in characters]
print 'length of list is: ', len(ord_list)
print 'list is ', ord_list









    



the second 10 bytes 
123456789
length of list is:  10
list is  [10, 49, 50, 51, 52, 53, 54, 55, 56, 57]



In [13]:

    
characters = my_file.read(3)
print [ord(x) for x in characters]









    



[33, 13, 10]



In [14]:

    
# the pointer's location after reading 23 bytes (10 + 10 + 3)
print 'the pointer is at position ', my_file.tell()









    



the pointer is at position  23



In [15]:

    
# read the next byte
print 'the 24th byte is ', my_file.read(1)









    



the 24th byte is  1



In [16]:

    
# move to the beginning of the file by using the seek function
my_file.seek(0,0)



In [17]:

    
# skip 20 bytes (characters) and read the one right after, i.e. the 21st byte
print 'after seeking to the beginning of the file we are at location:', my_file.tell()
my_file.seek(20, 1)
print 'the 21st byte is:', my_file.read(1)









    



after seeking to the beginning of the file we are at location: 0
the 21st byte is: !



In [18]:

    
# skip the first 3 bytes from the beginning of the file
my_file.seek(3, 0)
print 'i am @ ', my_file.tell()









    



i am @  3



In [19]:

    
# read the 4th byte
print 'the 4th byte is ', my_file.read(1)
print 'we are @ : ', my_file.tell()









    



the 4th byte is  4
we are @ :  4



In [20]:

    
# skip 10 bytes from this location
my_file.seek(10, 1)



In [21]:

    
# get the current location and the character there
print 'we are at ', my_file.tell()
print my_file.read(1)









    



we are at  14
4



In [22]:

    
# read the rest of the file from this point on and print it
print 'the rest of the file is:\n', my_file.read()









    



the rest of the file is:
56789!
123456789*
123456789-



In [23]:

    
# read the last 3 bytes before the end of the file
my_file.seek(-3, 2)
print my_file.read()

89-



In [24]:

    
# try to read now that we've reached the end of the file - we get an empty string
print 'the character at the end of the file is ', my_file.read()









    



the character at the end of the file is



In [25]:

    
# close the file
my_file.close()



In [26]:

    
# now let's read one line at a time

# open an existing  file for reading
file_name = 'data' + os.sep + 'text_file2.txt'
my_file = open(file_name, 'rb')

# read the first line
first_line = my_file.readline()
print first_line

# read the second line
second_line = my_file.readline()
print second_line

# close file
my_file.close()



In [27]:

    
# note: when readline() is called, a line is read and the "pointer" is 
# moved to the next line within the file; reading is done sequentially. 
# So if you want the third line you have to read the first two and ignore them.

# open an existing  file for reading
file_name = 'data' + os.sep + 'text_file2.txt'
my_file = open(file_name, 'r')

# read the first and second lines and ignore them - this can go in a loop if you like
line = my_file.readline()
line = my_file.readline()

# read the third line
line = my_file.readline()
print line

# close the file
my_file.close()
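
Skipping lines by calling readline() repeatedly can also be written with itertools.islice, which consumes the unwanted lines for you. This is a sketch only, assuming Python 3; read_nth_line and the scratch file are hypothetical names introduced for illustration, not part of the notebook.

```python
import itertools
import os
import tempfile

# Hypothetical scratch file with four lines.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "lines_demo.txt")
with open(path, "w") as fout:
    fout.write("line one\nline two\nline three\nline four\n")


def read_nth_line(file_path, n):
    """Return line number n (1-based), or None if the file has fewer lines."""
    with open(file_path, "r") as fin:
        # islice still reads the first n - 1 lines sequentially, then yields line n.
        return next(itertools.islice(fin, n - 1, n), None)


third = read_nth_line(path, 3)
missing = read_nth_line(path, 10)
print(repr(third))
print(missing)
```

The file is still read sequentially; the helper only hides the loop that discards the earlier lines.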



In [28]:

    
# read the whole file as a list containing each line as an element


# open an existing  file for reading
file_name = 'data' + os.sep + 'text_file2.txt'
my_file = open(file_name, 'rb')

# read the file as a list and close
data = my_file.readlines()
my_file.close()


# show the list - notice the end of line character in each element
print data









    



['123456789\r\n', '123456789!\r\n', '123456789*\r\n', '123456789-']



In [29]:

    
# loop through the list and print the lines - the EOL character will be printed as a 
# new line, so in addition to the new line provided by the print statement we also 
# get one from the string itself
for line in data:
    print line



In [30]:

    
# loop through the list and print the lines without the extra EOL character
for line in data:
    print line.strip()
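
Note that strip() removes whitespace from both ends of the string, not only the EOL character. If leading spaces matter, rstrip('\r\n') removes just the line ending. A small sketch, using a made-up sample line:

```python
# Made-up sample line with leading spaces and a Windows-style line ending.
line = "  indented value\r\n"

stripped = line.strip()                # removes leading AND trailing whitespace
only_eol_removed = line.rstrip("\r\n") # removes only trailing '\r' and '\n' characters

print(repr(stripped))
print(repr(only_eol_removed))
```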



In [31]:

    
# write some data to a file - if the file does not exist then it gets created 
# otherwise it gets overwritten
out_file = 'data' + os.sep + 'out_file.txt'
test_out = open(out_file, 'w')



In [32]:

    
# define a string
str1 = 'my first line'
str2 = 'my second line'

# write the strings
test_out.write(str1)
test_out.write(str2)

# close the file
test_out.close()



In [33]:

    
# if you open the file you'd see that the lines are written back to back without 
# an EOL character - to avoid this append an EOL to the string
out_file = 'data' + os.sep + 'out_file.txt'
test_out = open(out_file, 'w')

# define a string
str1 = 'my first line\n'
str2 = 'my second line'

# write the strings
test_out.write(str1)
test_out.write(str2)

# close the file
test_out.close()



In [34]:

    
# another way to write the two lines
out_file = 'data' + os.sep + 'out_file.txt'
test_out = open(out_file, 'w')

# define a string
str1 = 'my first line'
str2 = 'my second line'

# write the strings
test_out.write(str1 + '\n' +  str2)

# close the file
test_out.close()



In [35]:

    
# append a line to the file
out_file = 'data' + os.sep + 'out_file.txt'
test_out = open(out_file, 'a')

# define a string
str1 = '\nmy 3rd line\n'
str2 = 'my 4th line'

# write the strings
test_out.write(str1 + str2)

# close the file
test_out.close()
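
The difference between the 'w' and 'a' modes used in the cells above can be checked directly: 'w' discards whatever the file held, while 'a' keeps it and writes at the end. A minimal sketch, assuming Python 3 and a hypothetical scratch file:

```python
import os
import tempfile

# Hypothetical scratch file; not the notebook's out_file.txt.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")

# 'w' creates the file, and truncates it if it already exists.
with open(path, "w") as fout:
    fout.write("one\n")
with open(path, "w") as fout:
    fout.write("two\n")
with open(path, "r") as fin:
    content_after_w = fin.read()

# 'a' keeps the existing content and appends to it.
with open(path, "a") as fout:
    fout.write("three\n")
with open(path, "r") as fin:
    content_after_a = fin.read()

print(repr(content_after_w))
print(repr(content_after_a))
```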



In [36]:

    
# another way to open and parse a file is by using the "with" keyword. the 
# advantage of this is that it handles file closing automatically.

with open(out_file, 'rb') as fin:
    read_data = fin.readlines()

    
# no closing the file as we did above
print read_data

for line in read_data:
    print line.strip().split()









    



['my first line\r\n', 'my second line\r\n', 'my 3rd line\r\n', 'my 4th line']
['my', 'first', 'line']
['my', 'second', 'line']
['my', '3rd', 'line']
['my', '4th', 'line']
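
The automatic closing done by "with" also applies when the block is left because of an error. The sketch below, assuming Python 3 and a hypothetical scratch file, checks the closed attribute after a normal exit and after a simulated exception.

```python
import os
import tempfile

# Hypothetical scratch file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "with_demo.txt")

with open(path, "w") as fout:
    fout.write("some text\n")
    closed_inside = fout.closed   # still open inside the block
closed_after = fout.closed        # closed as soon as the block ends

handle = None
try:
    with open(path, "r") as fin:
        handle = fin
        raise ValueError("simulated failure while processing the file")
except ValueError:
    pass
closed_after_error = handle.closed  # closed even though the block raised

print(closed_inside, closed_after, closed_after_error)
```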

CSV files

CSV files are popular because they can be opened in a spreadsheet or a text editor. They are organized as a table. Here we'll work with rectangular CSV files (no missing data); later we'll work with general text files where some data is missing, where data and comments are mixed together, and where headers are part of the file to tell the user what kind of data is being processed.

CSV file access functions

csv.reader
csv.writer
csv.DictReader
csv.DictWriter
csv.register_dialect
csv.unregister_dialect
csv.get_dialect
csv.list_dialects
csv.field_size_limit
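
The dialect functions in this list are not exercised in the cells that follow, so here is a small sketch of how they fit together. It assumes Python 3 (io.StringIO stands in for a real file), and the dialect name "pipes" is made up for illustration.

```python
import csv
import io

# Register a hypothetical dialect that uses '|' as the delimiter.
csv.register_dialect("pipes", delimiter="|", quoting=csv.QUOTE_MINIMAL)
registered = "pipes" in csv.list_dialects()
delimiter_in_use = csv.get_dialect("pipes").delimiter

# Write one row with the dialect; the field containing '|' gets quoted.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect="pipes")
writer.writerow(["a", "b|c", "d"])
written = buffer.getvalue()

# Read it back with the same dialect.
rows = list(csv.reader(io.StringIO(written), dialect="pipes"))

# Remove the dialect again so it does not leak into other code.
csv.unregister_dialect("pipes")
still_registered = "pipes" in csv.list_dialects()

print(repr(written))
print(rows)
```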



In [37]:

    
# create a csv file the old fashioned way (just like another text file)


# file name - look for it in the data folder
import os
csv_file = 'data' + os.sep + 'csv_file.csv'

# define lines
line1 = 'a,b,c,d,e\n'
line2 = 'f,g,h,i,j\n'
line3 = 'h,i,j,k,l'

# write the data
with open(csv_file, 'wb') as fout:
    fout.write(line1 + line2 + line3)



In [38]:

    
# let's read the file
import csv

with open(csv_file, 'rb') as fin:
    csv_reader = csv.reader(fin)
    
    for row in csv_reader:
        print row









    



['a', 'b', 'c', 'd', 'e']
['f', 'g', 'h', 'i', 'j']
['h', 'i', 'j', 'k', 'l']



In [39]:

    
# this shows the difference between reading the file using the previous methods
# we learned above and the csv way
with open(csv_file, 'rb') as fin:
    read_data = fin.readlines()
  
# no closing the file as we did above
print read_data









    



['a,b,c,d,e\n', 'f,g,h,i,j\n', 'h,i,j,k,l']



In [40]:

    
# as you can see, we got one list with each row as a string element. 
# this is not what we want from a csv file.



In [41]:

    
# let's read the file and store the data in a list that persists after the file
# is closed
data_list = []
with open(csv_file, 'rb') as fin:
    csv_reader = csv.reader(fin)
    
    for row in csv_reader:
        data_list.append(row)



In [42]:

    
# show the data
print data_list









    



[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j'], ['h', 'i', 'j', 'k', 'l']]



In [43]:

    
# here is what we have when we reverse the list - we're going to write 
# this back to another file

for row in reversed(data_list):
    print row









    



['h', 'i', 'j', 'k', 'l']
['f', 'g', 'h', 'i', 'j']
['a', 'b', 'c', 'd', 'e']



In [44]:

    
# let's write the rows to a file in reverse order (the order of the elements 
# within each row is unchanged)

import csv
import os

csv_out_file = 'data' + os.sep + 'csv_out.csv'

fout = open(csv_out_file, 'wb')

csv_writer = csv.writer(fout)

for row in reversed(data_list):
    csv_writer.writerow(row)
    
fout.close()
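
The 'wb' mode used with csv.writer above is the Python 2 convention. Under Python 3 the csv module works with text files, and its documentation recommends opening them with newline='' so that the writer's own '\r\n' line terminator is not translated a second time. A sketch under that assumption, with a hypothetical scratch file and sample rows:

```python
import csv
import os
import tempfile

# Hypothetical scratch file and sample rows.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "csv_py3_demo.csv")
rows = [["a", "b"], ["c", "d"]]

# Python 3: text mode with newline='' instead of 'wb'.
with open(path, "w", newline="") as fout:
    csv.writer(fout).writerows(rows)

# Inspect the raw bytes: one '\r\n' per row, regardless of platform.
with open(path, "rb") as fin:
    raw = fin.read()

# Read the rows back, again with newline=''.
with open(path, "r", newline="") as fin:
    read_back = list(csv.reader(fin))

print(raw)
print(read_back)
```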



In [45]:

    
# here is another example
# create another file with headers

# file name - look for it in the data folder
csv_file = 'data' + os.sep + 'csv_file2.csv'

# for generating random numbers
import random


# define a random 2d array
random_nums = [[str(random.random()) for _ in range(6)] for _ in range(6)]

# generate a string with all the numbers and new line characters
line = 'lon,lat,alt,roll,pitch,yaw\n'
for row in random_nums:
    line += ','.join(row)+'\n'


# strip the last EOL character 
line = line.strip() # line = line[:-1] is another way of doing it   


# write the file
with open(csv_file, 'wb') as fout:
    fout.write(line)



In [46]:

    
# now that we have the file, we can read it in one swoop into a dictionary


import csv

# the csv file we want to read
csv_file = 'data' + os.sep + 'csv_file2.csv'


# read the file into a dictionary
fin = open(csv_file, 'rb')
csv_data = csv.DictReader(fin, delimiter=',')

# let's see how the structure looks
for line in csv_data:
    print line
    

    
# check if the file is closed    
print fin.closed

# close it
fin.close()

# check again
print fin.closed

# let's see if we can access the csv_data again
for line in csv_data:
    print line









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-c3fea55e0bea> in <module>()
     28 
     29 # let's see if we can access the csv_data again
---> 30 for line in csv_data:
     31     print line

C:\Python27\lib\csv.pyc in next(self)
    102             # Used only for its side effect.
    103             self.fieldnames
--> 104         row = self.reader.next()
    105         self.line_num = self.reader.line_num
    106 

ValueError: I/O operation on closed file





    



{'yaw': '0.528824226743', 'lon': '0.263092270025', 'pitch': '0.148076408617', 'lat': '0.662308453532', 'alt': '0.332629503006', 'roll': '0.1783775897'}
{'yaw': '0.238819730976', 'lon': '0.867796935112', 'pitch': '0.56615998845', 'lat': '0.371069637451', 'alt': '0.849238153107', 'roll': '0.305663415603'}
{'yaw': '0.690815139565', 'lon': '0.362820091896', 'pitch': '0.340039066539', 'lat': '0.181811986991', 'alt': '0.864088615179', 'roll': '0.65358821448'}
{'yaw': '0.867859846349', 'lon': '0.0257943331321', 'pitch': '0.7771720543', 'lat': '0.224272358026', 'alt': '0.558781266787', 'roll': '0.553860900387'}
{'yaw': '0.858457154515', 'lon': '0.319157113471', 'pitch': '0.839665099121', 'lat': '0.935834737188', 'alt': '0.485786356564', 'roll': '0.891366550168'}
{'yaw': '0.923332637644', 'lon': '0.172463762615', 'pitch': '0.0590637085953', 'lat': '0.268639628313', 'alt': '0.591748494576', 'roll': '0.876590865835'}
False
True



In [47]:

    
# as you can see above, the DictReader parsed the file and organized the data 
# into a dictionary per line with the header as the key and the data as the value
# however this data is live: the reader only pulls rows from the open file, so as 
# soon as the file is closed we can no longer read from it
# it also means that once we have looped through the structure once we cannot 
# access the data again without reopening the file
# read the file into a dictionary
fin = open(csv_file, 'rb')
csv_data = csv.DictReader(fin, delimiter=',')

# let's see how the structure looks
for line in csv_data:
    print line
    
# this will give us nothing
for line in csv_data:
    print line['yaw']

    
fin.close()









    



{'yaw': '0.528824226743', 'lon': '0.263092270025', 'pitch': '0.148076408617', 'lat': '0.662308453532', 'alt': '0.332629503006', 'roll': '0.1783775897'}
{'yaw': '0.238819730976', 'lon': '0.867796935112', 'pitch': '0.56615998845', 'lat': '0.371069637451', 'alt': '0.849238153107', 'roll': '0.305663415603'}
{'yaw': '0.690815139565', 'lon': '0.362820091896', 'pitch': '0.340039066539', 'lat': '0.181811986991', 'alt': '0.864088615179', 'roll': '0.65358821448'}
{'yaw': '0.867859846349', 'lon': '0.0257943331321', 'pitch': '0.7771720543', 'lat': '0.224272358026', 'alt': '0.558781266787', 'roll': '0.553860900387'}
{'yaw': '0.858457154515', 'lon': '0.319157113471', 'pitch': '0.839665099121', 'lat': '0.935834737188', 'alt': '0.485786356564', 'roll': '0.891366550168'}
{'yaw': '0.923332637644', 'lon': '0.172463762615', 'pitch': '0.0590637085953', 'lat': '0.268639628313', 'alt': '0.591748494576', 'roll': '0.876590865835'}
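
One way around the one-pass behaviour shown above is to copy the rows into a list while the file is open; the list can then be looped over as often as needed. The sketch below assumes Python 3 and uses io.StringIO with made-up values in place of csv_file2.csv.

```python
import csv
import io

# Made-up stand-in for the CSV file; io.StringIO behaves like an open text file.
fin = io.StringIO("lon,lat\n1.0,2.0\n3.0,4.0\n")

reader = csv.DictReader(fin)
first_pass = list(reader)    # consumes every row
second_pass = list(reader)   # the reader is exhausted, so this is empty

# Rewinding the file and creating a new reader gives the rows again.
fin.seek(0)
fresh = list(csv.DictReader(fin))
fin.close()

# first_pass is an ordinary list, so it is still usable after the file is closed.
for row in first_pass:
    print(row["lat"])
```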



In [48]:

    
# so to get the data by column, we do this
fin = open(csv_file, 'rb')
csv_data = csv.DictReader(fin, delimiter=',')

# print the 'lat' value of every row
for line in csv_data:
    print line['lat']

# close the file when done
fin.close()









    



0.662308453532
0.371069637451
0.181811986991
0.224272358026
0.935834737188
0.268639628313
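
The values printed above are strings, because the csv module does not convert types. A possible way to collect a column as numbers is sketched below, assuming Python 3; the three-column sample data is made up and stands in for csv_file2.csv.

```python
import csv
import io

# Made-up sample with the same kind of header as the file above.
text = "lon,lat,alt\n0.1,0.2,0.3\n0.4,0.5,0.6\n"
rows = list(csv.DictReader(io.StringIO(text)))

# One column as a list of floats.
lat = [float(row["lat"]) for row in rows]

# All columns at once: a dictionary mapping each header to its list of floats.
columns = {name: [float(row[name]) for row in rows] for name in rows[0]}

print(lat)
print(columns["alt"])
```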



In [49]:

    
# let's write a csv file with headers using DictWriter

# from above we saw that DictReader yields one dictionary per row,
# so build a list of dictionaries to write

import os
import csv

# define the list
rows = []
rows.append({'name': 'mal', 'dob': 2468, 'role': 'captain'})
rows.append({'name': 'zoe', 'dob': 2484, 'role': 'first mate'})
rows.append({'name': 'wash', 'dob': 2468, 'role': 'pilot'})
rows.append({'name': 'inara', 'dob': 2460, 'role': 'companion'})
rows.append({'name': 'jayne', 'dob': 2463, 'role': 'mercenary'})

# define the header
header = ['name', 'dob', 'role']

# open the file
with open('data' + os.sep + 'csv_output_2.csv', 'wb') as fout:
    csv_writer = csv.DictWriter(fout, header)
    csv_writer.writeheader()
    csv_writer.writerows(rows)
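
To check what DictWriter produced, the file can be read back with DictReader. The sketch below assumes Python 3 (hence 'w' with newline='' rather than 'wb'), writes to a hypothetical scratch file, and reuses two of the rows defined above. Note that the integer dob values come back as strings.

```python
import csv
import os
import tempfile

# Hypothetical scratch file; two of the rows from the cell above.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "dictwriter_demo.csv")
header = ["name", "dob", "role"]
rows = [
    {"name": "mal", "dob": 2468, "role": "captain"},
    {"name": "zoe", "dob": 2484, "role": "first mate"},
]

with open(path, "w", newline="") as fout:
    csv_writer = csv.DictWriter(fout, header)
    csv_writer.writeheader()
    csv_writer.writerows(rows)

with open(path, "r", newline="") as fin:
    read_back = [dict(row) for row in csv.DictReader(fin)]

# Everything read from a CSV file is a string, so convert dob back to int.
restored = [dict(row, dob=int(row["dob"])) for row in read_back]

print(read_back)
print(restored == rows)
```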


