File handling

Input/Output


In [2]:
# raw_input reads one line from standard input and returns it as a string
answer = raw_input("enter your name:")
print answer

# input evaluates whatever the user types as Python code (Python 2)
# attention: never use it on untrusted input, it can run arbitrary code
answer = input("python input:")
print answer


enter your name:Jannis
Jannis
python input:range(12)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
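Because Python 2's input() evaluates whatever the user types, it can run arbitrary code. A safer pattern is to read the text with raw_input() and convert it yourself. The sketch below uses the standard library's ast.literal_eval, which accepts only literals (numbers, strings, lists, tuples, dicts and the like) and rejects function calls such as range(12). The helper name parse_literal is hypothetical, and the text is passed in as an argument instead of being typed so the example runs unattended.

```python
import ast

def parse_literal(text):
    """Convert user-typed text into a Python literal, or return None if it is not one."""
    try:
        # literal_eval only accepts literals; function calls raise ValueError
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

numbers = parse_literal("[1, 2, 3]")   # a list literal is accepted
call = parse_literal("range(12)")      # a function call is rejected -> None
```

In an interactive script you would call it as parse_literal(raw_input("python input:")).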

Simple file operations

writing to files

In [4]:
# files are opened with the open function, which returns a file object
f = open('new_file.txt', 'w')  # attention: mode 'w' overwrites an existing file
# important methods: read, write, readlines, writelines
# dir(f) lists all attributes and methods of the file object
f.write("Hallo Welt!")
f.close()
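Two details of the cell above are easy to miss: write does not append a newline, and mode 'w' truncates an existing file. A minimal sketch that demonstrates both, writing into a throwaway temporary directory (the file name demo.txt is made up for the example):

```python
import os
import tempfile

folder = tempfile.mkdtemp()                  # throwaway directory for the demo
path = os.path.join(folder, 'demo.txt')

with open(path, 'w') as f:
    f.write("first version")                 # no newline is added automatically

with open(path, 'w') as f:                   # 'w' truncates: the old content is lost
    f.write("second version\n")              # add '\n' yourself if you want one

with open(path, 'r') as f:
    content = f.read()
```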

In [5]:
# writelines writes a list of strings; it does not add newlines itself
lines  = []
for i in range(12):
    lines.append("Number: " + str(i) + '\n')
print lines


f = open( 'new_file.txt', 'a' )  # open file and append to it
f.writelines(lines)
f.close()


['Number: 0\n', 'Number: 1\n', 'Number: 2\n', 'Number: 3\n', 'Number: 4\n', 'Number: 5\n', 'Number: 6\n', 'Number: 7\n', 'Number: 8\n', 'Number: 9\n', 'Number: 10\n', 'Number: 11\n']
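writelines simply writes the strings of the list one after the other without inserting line separators, which is why the loop above appends '\n' to every entry. Calling writelines(lines) therefore produces the same file as write(''.join(lines)); this small sketch checks that, again in a temporary directory with made-up file names:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
lines = ["Number: " + str(i) + "\n" for i in range(3)]

path_a = os.path.join(folder, 'a.txt')
with open(path_a, 'w') as f:
    f.writelines(lines)                  # writes the strings back to back

path_b = os.path.join(folder, 'b.txt')
with open(path_b, 'w') as f:
    f.write("".join(lines))              # join the list into one string first

with open(path_a) as fa, open(path_b) as fb:
    same = fa.read() == fb.read()        # True: both files are identical
```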
reading from files

In [6]:
# using with to open files is recommended: the file is closed automatically
# when the block ends, even if an error occurs
with open('new_file.txt', 'r') as f:  # open file for reading
    content = f.read()  # read the whole content of the file into one string

print content


Hallo Welt!Number: 0
Number: 1
Number: 2
Number: 3
Number: 4
Number: 5
Number: 6
Number: 7
Number: 8
Number: 9
Number: 10
Number: 11
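The output shows "Hallo Welt!" and "Number: 0" on the same line because the first write call added no newline. For large files, reading everything with read() can be wasteful; iterating over the file object yields one line at a time instead. A sketch, assuming a small sample file created in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write("Hallo Welt!\nNumber: 0\nNumber: 1\n")

numbers = []
with open(path, 'r') as f:
    for line in f:                       # the file object yields one line per iteration
        line = line.rstrip('\n')         # each line still ends with '\n'; drop it
        if line.startswith("Number:"):
            numbers.append(int(line.split(":")[1]))   # int() ignores the leading space
```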


In [7]:
with open('new_file.txt', 'r') as f:  # open file for reading
    lines = f.readlines()  # list of lines, each still ending in '\n'
print lines


['Hallo Welt!Number: 0\n', 'Number: 1\n', 'Number: 2\n', 'Number: 3\n', 'Number: 4\n', 'Number: 5\n', 'Number: 6\n', 'Number: 7\n', 'Number: 8\n', 'Number: 9\n', 'Number: 10\n', 'Number: 11\n']
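readlines keeps the trailing '\n' on every element. If you want the lines without it, read().splitlines() is one option. A minimal comparison on a small temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
with open(path, 'w') as f:
    f.write("Number: 0\nNumber: 1\n")

with open(path) as f:
    with_newlines = f.readlines()             # ['Number: 0\n', 'Number: 1\n']

with open(path) as f:
    without_newlines = f.read().splitlines()  # ['Number: 0', 'Number: 1']
```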

Parsing data


In [8]:
with open('data.txt', 'r') as f:
    lines = f.readlines()
#print lines

data = {}

# iterate over all lines in the file
for line in lines:
    if not line.strip() or line.startswith('#'):  # skip blank lines and comments
        continue
    left, right = line.split(':', 1)  # split the string at the first occurrence of ':'
    data[left.strip()] = float(right)  # strip removes leading and trailing whitespace
print data


---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-8-e2931ceb2032> in <module>()
----> 1 with open('data.txt', 'r') as f:
      2     lines = f.readlines()
      3 #print lines
      4 
      5 data = {}

IOError: [Errno 2] No such file or directory: 'data.txt'
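The cell fails because data.txt is not present in the working directory. Judging from the parsing code, the file is expected to contain comment lines starting with '#' and "name: value" pairs. The sketch below writes such a file with made-up contents (the keys and numbers are purely illustrative) and runs the same parsing logic, packaged as a hypothetical parse_data function that also skips blank lines:

```python
import os
import tempfile

def parse_data(path):
    """Read 'name: value' lines into a dict of floats, skipping comments and blanks."""
    data = {}
    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):   # skip blank lines and comments
                continue
            left, right = line.split(':', 1)       # split at the first colon only
            data[left.strip()] = float(right)
    return data

# hypothetical example file in the expected format
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w') as f:
    f.write("# example parameters\n"
            "alpha: 0.5\n"
            "\n"
            "beta : 2\n")

data = parse_data(path)   # {'alpha': 0.5, 'beta': 2.0}
```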

Object serialization using pickle


In [9]:
# the pickle module is used to serialize Python objects (convert them to byte strings)
# cPickle is written in C and much faster than pickle, but its Pickler and Unpickler cannot be subclassed
import cPickle as pickle
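In Python 3 the separate cPickle module no longer exists; the standard pickle module uses its C implementation automatically. A common way to import the fast version under both Python 2 and 3 is a try/except ImportError fallback. The sketch also checks that loads inverts dumps:

```python
try:
    import cPickle as pickle    # Python 2: C implementation
except ImportError:
    import pickle               # Python 3: pickle uses its C accelerator automatically

d = {1: 'green', 2: 'blue', 3: 'red'}
s = pickle.dumps(d)             # serialize to a byte string
restored = pickle.loads(s)      # deserialize into a new, equal dict
```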
how pickle converts objects to strings

In [10]:
d = { 1: 'green', 2: 'blue', 3: 'red' }
pickle.dumps( d )


Out[10]:
"(dp1\nI1\nS'green'\np2\nsI2\nS'blue'\np3\nsI3\nS'red'\np4\ns."
how to save objects to a file

In [11]:
with open('save.p', 'wb') as f:  # pickle data should be written in binary mode
    pickle.dump(d, f)

In [12]:
with open('save.p', 'rb') as f:  # and read back in binary mode
    loaded_data = pickle.load(f)
print loaded_data


{1: 'green', 2: 'blue', 3: 'red'}
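Pickle data is binary in general (protocol 0, the Python 2 default, happens to be ASCII, but higher protocols are not), so opening the file with 'wb' and 'rb' is the safe choice. A sketch that stores several objects in one file with pickle.HIGHEST_PROTOCOL and reads them back in the same order; the file name is made up:

```python
import os
import tempfile
try:
    import cPickle as pickle
except ImportError:
    import pickle

path = os.path.join(tempfile.mkdtemp(), 'objects.p')
colors = {1: 'green', 2: 'blue', 3: 'red'}
values = [0.5, 1.5, 2.5]

with open(path, 'wb') as f:                       # binary mode for pickle data
    pickle.dump(colors, f, pickle.HIGHEST_PROTOCOL)
    pickle.dump(values, f, pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as f:
    first = pickle.load(f)                        # objects come back in write order
    second = pickle.load(f)
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.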

Organizing files in folders


In [13]:
# create a folder for the data files
import os

# get current working directory
work_path = os.getcwd()
print work_path


/home/uhlendorf/Documents/python_course/fachkurs_2015/hu_bp_python_course/05_dataio_matplotlib

In [14]:
# define path for data files
data_path = os.path.join(work_path, 'data/')

# check if folder exists already
if not os.path.exists(data_path): 
    os.mkdir(data_path)
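Once the folder exists, paths to files inside it are best built with os.path.join rather than string concatenation. A sketch that uses a temporary directory in place of the notebook's working directory and a hypothetical results.txt; unlike os.mkdir, os.makedirs also creates intermediate folders:

```python
import os
import tempfile

work_path = tempfile.mkdtemp()                      # stands in for os.getcwd()
data_path = os.path.join(work_path, 'data', 'raw')  # nested folder

if not os.path.exists(data_path):
    os.makedirs(data_path)                          # creates 'data' and 'data/raw'

file_path = os.path.join(data_path, 'results.txt')
with open(file_path, 'w') as f:
    f.write("42\n")

files = os.listdir(data_path)                       # names only, not full paths
```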

Comma separated value (CSV) files

reading tabular data using pandas


In [17]:
with open('real_estate.csv', 'r') as f:
    lines = f.readlines()  # raw text lines; pandas does the parsing in the next cell

In [18]:
import pandas as pd
df = pd.read_csv( 'real_estate.csv' )
print df
#print df.values.tolist()


                              street             city    zip state  beds  \
0                       3526 HIGH ST       SACRAMENTO  95838    CA     2   
1                        51 OMAHA CT       SACRAMENTO  95823    CA     3   
2                     2796 BRANCH ST       SACRAMENTO  95815    CA     2   
3                   2805 JANETTE WAY       SACRAMENTO  95815    CA     2   
4                    6001 MCMAHON DR       SACRAMENTO  95824    CA     2   
5                 5828 PEPPERMILL CT       SACRAMENTO  95841    CA     3   
6                6048 OGDEN NASH WAY       SACRAMENTO  95842    CA     3   
7                      2561 19TH AVE       SACRAMENTO  95820    CA     3   
8    11150 TRINITY RIVER DR Unit 114   RANCHO CORDOVA  95670    CA     2   
9                       7325 10TH ST        RIO LINDA  95673    CA     3   
10                  645 MORRISON AVE       SACRAMENTO  95838    CA     3   
11                     4085 FAWN CIR       SACRAMENTO  95823    CA     3   
12                   2930 LA ROSA RD       SACRAMENTO  95815    CA     1   
13                     2113 KIRK WAY       SACRAMENTO  95822    CA     3   
14               4533 LOCH HAVEN WAY       SACRAMENTO  95842    CA     2   
15                    7340 HAMDEN PL       SACRAMENTO  95842    CA     2   
16                       6715 6TH ST        RIO LINDA  95673    CA     2   
17           6236 LONGFORD DR Unit 1   CITRUS HEIGHTS  95621    CA     2   
18                   250 PERALTA AVE       SACRAMENTO  95833    CA     2   
19                   113 LEEWILL AVE        RIO LINDA  95673    CA     3   
20                6118 STONEHAND AVE   CITRUS HEIGHTS  95621    CA     3   
21                 4882 BANDALIN WAY       SACRAMENTO  95823    CA     4   
22                   7511 OAKVALE CT  NORTH HIGHLANDS  95660    CA     4   
23                      9 PASTURE CT       SACRAMENTO  95834    CA     3   
24                3729 BAINBRIDGE DR  NORTH HIGHLANDS  95660    CA     3   
25                3828 BLACKFOOT WAY         ANTELOPE  95843    CA     3   
26                   4108 NORTON WAY       SACRAMENTO  95820    CA     3   
27                  1469 JANRICK AVE       SACRAMENTO  95832    CA     3   
28                     9861 CULP WAY       SACRAMENTO  95827    CA     4   
29             7825 CREEK VALLEY CIR       SACRAMENTO  95828    CA     3   
..                               ...              ...    ...   ...   ...   
955                  2100 BEATTY WAY        ROSEVILLE  95747    CA     3   
956              6920 GILLINGHAM WAY  NORTH HIGHLANDS  95660    CA     3   
957                 82 WILDFLOWER DR             GALT  95632    CA     3   
958                  8652 BANTON CIR        ELK GROVE  95624    CA     4   
959              8428 MISTY PASS WAY         ANTELOPE  95843    CA     3   
960                7958 ROSEVIEW WAY       SACRAMENTO  95828    CA     3   
961                    9020 LUKEN CT        ELK GROVE  95624    CA     3   
962              7809 VALLECITOS WAY       SACRAMENTO  95828    CA     3   
963               8445 OLD AUBURN RD   CITRUS HEIGHTS  95610    CA     3   
964                  10085 ATKINS DR        ELK GROVE  95757    CA     3   
965              9185 CERROLINDA CIR        ELK GROVE  95758    CA     3   
966                 9197 CORTINA CIR        ROSEVILLE  95678    CA     3   
967                  5429 HESPER WAY       CARMICHAEL  95608    CA     4   
968                 1178 WARMWOOD CT             GALT  95632    CA     4   
969                    4900 ELUDE CT       SACRAMENTO  95842    CA     4   
970                    3557 SODA WAY       SACRAMENTO  95834    CA     0   
971             3528 SAINT GEORGE DR       SACRAMENTO  95821    CA     3   
972                7381 WASHBURN WAY  NORTH HIGHLANDS  95660    CA     3   
973             2181 WINTERHAVEN CIR     CAMERON PARK  95682    CA     3   
974                 7540 HICKORY AVE       ORANGEVALE  95662    CA     3   
975              5024 CHAMBERLIN CIR        ELK GROVE  95757    CA     3   
976                2400 INVERNESS DR          LINCOLN  95648    CA     3   
977                  5 BISHOPGATE CT       SACRAMENTO  95823    CA     4   
978                 5601 REXLEIGH DR       SACRAMENTO  95823    CA     4   
979                 1909 YARNELL WAY        ELK GROVE  95758    CA     3   
980               9169 GARLINGTON CT       SACRAMENTO  95829    CA     4   
981                  6932 RUSKUT WAY       SACRAMENTO  95823    CA     3   
982                7933 DAFFODIL WAY   CITRUS HEIGHTS  95610    CA     3   
983                 8304 RED FOX WAY        ELK GROVE  95758    CA     4   
984              3882 YELLOWSTONE LN  EL DORADO HILLS  95762    CA     3   

     baths  sq__ft         type                     sale_date   price  \
0        1     836  Residential  Wed May 21 00:00:00 EDT 2008   59222   
1        1    1167  Residential  Wed May 21 00:00:00 EDT 2008   68212   
2        1     796  Residential  Wed May 21 00:00:00 EDT 2008   68880   
3        1     852  Residential  Wed May 21 00:00:00 EDT 2008   69307   
4        1     797  Residential  Wed May 21 00:00:00 EDT 2008   81900   
5        1    1122        Condo  Wed May 21 00:00:00 EDT 2008   89921   
6        2    1104  Residential  Wed May 21 00:00:00 EDT 2008   90895   
7        1    1177  Residential  Wed May 21 00:00:00 EDT 2008   91002   
8        2     941        Condo  Wed May 21 00:00:00 EDT 2008   94905   
9        2    1146  Residential  Wed May 21 00:00:00 EDT 2008   98937   
10       2     909  Residential  Wed May 21 00:00:00 EDT 2008  100309   
11       2    1289  Residential  Wed May 21 00:00:00 EDT 2008  106250   
12       1     871  Residential  Wed May 21 00:00:00 EDT 2008  106852   
13       1    1020  Residential  Wed May 21 00:00:00 EDT 2008  107502   
14       2    1022  Residential  Wed May 21 00:00:00 EDT 2008  108750   
15       2    1134        Condo  Wed May 21 00:00:00 EDT 2008  110700   
16       1     844  Residential  Wed May 21 00:00:00 EDT 2008  113263   
17       1     795        Condo  Wed May 21 00:00:00 EDT 2008  116250   
18       1     588  Residential  Wed May 21 00:00:00 EDT 2008  120000   
19       2    1356  Residential  Wed May 21 00:00:00 EDT 2008  121630   
20       2    1118  Residential  Wed May 21 00:00:00 EDT 2008  122000   
21       2    1329  Residential  Wed May 21 00:00:00 EDT 2008  122682   
22       2    1240  Residential  Wed May 21 00:00:00 EDT 2008  123000   
23       2    1601  Residential  Wed May 21 00:00:00 EDT 2008  124100   
24       2     901  Residential  Wed May 21 00:00:00 EDT 2008  125000   
25       2    1088  Residential  Wed May 21 00:00:00 EDT 2008  126640   
26       1     963  Residential  Wed May 21 00:00:00 EDT 2008  127281   
27       2    1119  Residential  Wed May 21 00:00:00 EDT 2008  129000   
28       2    1380  Residential  Wed May 21 00:00:00 EDT 2008  131200   
29       2    1248  Residential  Wed May 21 00:00:00 EDT 2008  132000   
..     ...     ...          ...                           ...     ...   
955      2    1371  Residential  Thu May 15 00:00:00 EDT 2008  208250   
956      1    1310  Residential  Thu May 15 00:00:00 EDT 2008  208318   
957      2    1262  Residential  Thu May 15 00:00:00 EDT 2008  209347   
958      2    1740  Residential  Thu May 15 00:00:00 EDT 2008  211500   
959      2    1517  Residential  Thu May 15 00:00:00 EDT 2008  212000   
960      2    1450  Residential  Thu May 15 00:00:00 EDT 2008  213000   
961      2    1416  Residential  Thu May 15 00:00:00 EDT 2008  216000   
962      1     888  Residential  Thu May 15 00:00:00 EDT 2008  216021   
963      2    1882  Residential  Thu May 15 00:00:00 EDT 2008  219000   
964      2    1302  Residential  Thu May 15 00:00:00 EDT 2008  219794   
965      2    1418  Residential  Thu May 15 00:00:00 EDT 2008  220000   
966      2       0        Condo  Thu May 15 00:00:00 EDT 2008  220000   
967      2    1319  Residential  Thu May 15 00:00:00 EDT 2008  220000   
968      2    1770  Residential  Thu May 15 00:00:00 EDT 2008  220000   
969      2    1627  Residential  Thu May 15 00:00:00 EDT 2008  223000   
970      0       0  Residential  Thu May 15 00:00:00 EDT 2008  224000   
971      1    1040  Residential  Thu May 15 00:00:00 EDT 2008  224000   
972      1     960  Residential  Thu May 15 00:00:00 EDT 2008  224252   
973      2       0  Residential  Thu May 15 00:00:00 EDT 2008  224500   
974      1    1456  Residential  Thu May 15 00:00:00 EDT 2008  225000   
975      2    1450  Residential  Thu May 15 00:00:00 EDT 2008  228000   
976      2    1358  Residential  Thu May 15 00:00:00 EDT 2008  229027   
977      2    1329  Residential  Thu May 15 00:00:00 EDT 2008  229500   
978      2    1715  Residential  Thu May 15 00:00:00 EDT 2008  230000   
979      2    1262  Residential  Thu May 15 00:00:00 EDT 2008  230000   
980      3    2280  Residential  Thu May 15 00:00:00 EDT 2008  232425   
981      2    1477  Residential  Thu May 15 00:00:00 EDT 2008  234000   
982      2    1216  Residential  Thu May 15 00:00:00 EDT 2008  235000   
983      2    1685  Residential  Thu May 15 00:00:00 EDT 2008  235301   
984      2    1362  Residential  Thu May 15 00:00:00 EDT 2008  235738   

      latitude   longitude  
0    38.631913 -121.434879  
1    38.478902 -121.431028  
2    38.618305 -121.443839  
3    38.616835 -121.439146  
4    38.519470 -121.435768  
5    38.662595 -121.327813  
6    38.681659 -121.351705  
7    38.535092 -121.481367  
8    38.621188 -121.270555  
9    38.700909 -121.442979  
10   38.637663 -121.451520  
11   38.470746 -121.458918  
12   38.618698 -121.435833  
13   38.482215 -121.492603  
14   38.672914 -121.359340  
15   38.700051 -121.351278  
16   38.689591 -121.452239  
17   38.679776 -121.314089  
18   38.612099 -121.469095  
19   38.689999 -121.463220  
20   38.707851 -121.320707  
21   38.468173 -121.444071  
22   38.702792 -121.382210  
23   38.628631 -121.488097  
24   38.701499 -121.376220  
25   38.709740 -121.373770  
26   38.537526 -121.478315  
27   38.476472 -121.501711  
28   38.558423 -121.327948  
29   38.472122 -121.404199  
..         ...         ...  
955  38.737882 -121.308142  
956  38.694279 -121.373395  
957  38.259708 -121.311616  
958  38.444000 -121.370993  
959  38.722959 -121.347115  
960  38.467836 -121.410366  
961  38.451398 -121.366614  
962  38.508217 -121.411207  
963  38.715423 -121.246743  
964  38.390893 -121.437821  
965  38.424497 -121.426595  
966  38.793152 -121.290025  
967  38.665104 -121.315901  
968  38.289544 -121.284607  
969  38.696740 -121.350519  
970  38.631026 -121.501879  
971  38.629468 -121.376445  
972  38.703550 -121.375103  
973  38.697570 -120.995739  
974  38.703056 -121.235221  
975  38.389756 -121.446246  
976  38.897814 -121.324691  
977  38.467936 -121.445477  
978  38.445342 -121.441504  
979  38.417382 -121.484325  
980  38.457679 -121.359620  
981  38.499893 -121.458890  
982  38.708824 -121.256803  
983  38.417000 -121.397424  
984  38.655245 -121.075915  

[985 rows x 12 columns]
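Without pandas, the standard library's csv module can read the same kind of file. csv.DictReader uses the header row as keys and returns every value as a string, so numeric columns have to be converted by hand. The sketch below feeds it three rows taken from the output above (only three of the twelve columns) instead of the real file:

```python
import csv

sample = [
    "street,city,price",
    "3526 HIGH ST,SACRAMENTO,59222",
    "51 OMAHA CT,SACRAMENTO,68212",
    "7325 10TH ST,RIO LINDA,98937",
]

# DictReader accepts any iterable of strings, e.g. an open file or a list
rows = list(csv.DictReader(sample))
prices = [int(row['price']) for row in rows]      # values arrive as strings
mean_price = sum(prices) / float(len(prices))
```

For the real file, pass the open file object instead: with open('real_estate.csv') as f: rows = list(csv.DictReader(f)).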

JavaScript Object Notation (JSON)

JSON (/ˈdʒeɪsən/ JAY-sən),[1] or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.

Although originally derived from the JavaScript scripting language, JSON is a language-independent data format. Code for parsing and generating JSON data is readily available in many programming languages.

https://en.wikipedia.org/wiki/JSON
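In Python, the json module maps JSON objects to dicts and arrays to lists. json.dumps turns a Python object into a JSON string and json.loads parses one back. JSON object keys are always strings, so, unlike with pickle, integer keys do not survive a round trip unchanged:

```python
import json

employee = {"firstName": "John", "lastName": "Doe"}
text = json.dumps(employee, sort_keys=True)   # dict -> JSON string
back = json.loads(text)                       # JSON string -> dict

colors = {1: 'green', 2: 'blue'}
colors_back = json.loads(json.dumps(colors))  # int keys come back as strings
```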


In [34]:
cat employees.json


{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}

In [41]:
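import json

with open('employees.json', 'r') as f:  # with closes the file again afterwards
    d = json.load(f)  # parse the JSON content into nested dicts and lists

d['employees'][1]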


Out[41]:
{u'firstName': u'Anna', u'lastName': u'Smith'}
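In Python 2, json returns strings as unicode objects, which is why the output above shows u'Anna'. Writing goes the other way with json.dump; its indent argument produces a human-readable file like employees.json. A sketch that writes the same structure to a temporary file (the name is made up) and collects the full names after reading it back:

```python
import json
import os
import tempfile

employees = {"employees": [
    {"firstName": "John", "lastName": "Doe"},
    {"firstName": "Anna", "lastName": "Smith"},
    {"firstName": "Peter", "lastName": "Jones"},
]}

path = os.path.join(tempfile.mkdtemp(), 'employees_copy.json')
with open(path, 'w') as f:
    json.dump(employees, f, indent=4)    # pretty-printed, indented output

with open(path, 'r') as f:
    loaded = json.load(f)

names = [e['firstName'] + ' ' + e['lastName'] for e in loaded['employees']]
```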