% Lecture 2 Python % Jonah M. Duckles (jduckles@ou.edu)

Review

  • Python is an interpreted language
  • Python is gaining ground as a "glue" language to connect scientific software tools.
  • Python is free and Open Source software

Core data types

  • Numbers
  • Strings
  • Lists
  • Dictionaries
  • Tuples
  • Files
  • Sets
  • Boolean
  • None

In Python, types are "objects" with associated type-specific methods that perform common, useful operations on data of that type.

Lists

Lists are a basic data structure in python

Creating a list

Lists are created by placing comma-separated elements between square brackets [].


In [4]:
names = ['Tom', 'Dick', 'Hary']

In [9]:
names[0]


Out[9]:
'Tom'

In [14]:
names[1:3]


Out[14]:
['Dick', 'Hary']

Slices use [start:end] notation, where end is the index of the first element NOT in the slice (a bit different from R's notation). Also notice that list indexes start at 0, another difference from R. More on slicing at Software Carpentry.
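The indexing and slicing rules can be tried out in a few lines. This is only a small sketch that reuses the names list from the cells above; the variable names second, middle, first_two and last are made up for illustration.

```python
names = ['Tom', 'Dick', 'Hary']

# Indexing starts at 0, so index 1 is the second element.
second = names[1]

# [start:end] stops just before end, so the slice has end - start elements.
middle = names[1:3]

# Leaving out start or end means "from the beginning" or "to the end".
first_two = names[:2]

# Negative indexes count back from the end of the list.
last = names[-1]

print(second)     # Dick
print(middle)     # ['Dick', 'Hary']
print(first_two)  # ['Tom', 'Dick']
print(last)       # Hary
```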

Caution

Be careful when making "copies" of mutable objects in Python.


In [120]:
a = [1,2,3]
b = a
a[1] = 10
print(b)


[1, 10, 3]

Assignment with = in Python does not copy an object; it only makes the name on the left refer to the same object in memory. Here a and b are two names for one list, so a change made through a is visible through b. This matters whenever the object is mutable (changeable in place).
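If an independent copy is what you want, you have to ask for one. The sketch below shows one common way to do it, assuming a plain list; the names alias, shallow, nested, shallow_nested and deep_nested are illustrative only. list(a) makes a new list, and copy.deepcopy from the standard library also copies any nested lists.

```python
import copy

a = [1, 2, 3]
alias = a            # same list object, two names
shallow = list(a)    # new list holding the same elements (a[:] also works)

a[1] = 10
print(alias)         # [1, 10, 3], changed along with a
print(shallow)       # [1, 2, 3], an independent copy

# A shallow copy still shares nested mutable objects; deepcopy does not.
nested = [[1, 2], [3, 4]]
shallow_nested = list(nested)
deep_nested = copy.deepcopy(nested)
nested[0][0] = 99
print(shallow_nested[0][0])  # 99
print(deep_nested[0][0])     # 1
```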

Iterating over lists


In [5]:
for name in names:
    print "Current name is " + name


Current name is Tom
Current name is Dick
Current name is Hary

Iterators

Notice that we don't have to use explicit indexes in for loops.

This is because a list is an "iterable" in Python. Iterables are objects that carry their own built-in definition of what iterating over them means.
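To see roughly what a for loop does behind the scenes, the built-in functions iter() and next() can be called by hand. This is a sketch for illustration, not something you would normally write; the variable names are made up. enumerate() is shown as the usual answer when the position is needed as well.

```python
names = ['Tom', 'Dick', 'Hary']

iterator = iter(names)    # ask the list for an iterator
first = next(iterator)    # 'Tom'
second = next(iterator)   # 'Dick'
third = next(iterator)    # 'Hary'

# One more next() raises StopIteration; a for loop catches this and stops.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# enumerate() yields (position, element) pairs, so no manual index is needed.
for position, name in enumerate(names):
    print("{} {}".format(position, name))
```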

List objects have a series of methods that are useful for operating on lists.

Inspecting objects

In Python everything is an object, and objects have methods. It can be helpful to see what the methods of an object are; the dir() function lists them.


In [6]:
dir(names)


Out[6]:
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__delslice__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__setslice__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

We see that the object has a series of __xxx__ methods and a few named methods. Advanced: the __xxx__ methods actually reveal how the object responds to various operators and built-in functions. Iteration, for example, is handled by the __iter__ method.

The methods that don't begin with __ are the ones we want to look at.
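One way to get just those names is to filter the output of dir(). A minimal sketch, with public_methods as a made-up variable name; the exact list printed depends on the Python version, so no output is shown.

```python
names = ['Tom', 'Dick', 'Hary']

# Keep only the attribute names that do not start with a double underscore.
public_methods = [m for m in dir(names) if not m.startswith('__')]
print(public_methods)
```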

Let's try a few.


In [7]:
names.append('John')

In [8]:
names


Out[8]:
['Tom', 'Dick', 'Hary', 'John']

In [11]:
names.sort() # in place sort

In [12]:
names


Out[12]:
['Dick', 'Hary', 'John', 'Tom']

In [15]:
names.remove('John')

In [16]:
names


Out[16]:
['Dick', 'Hary', 'Tom']

Getting Help


In [18]:
help(list.remove)   # help() prints the text itself, so no print is needed


Help on method_descriptor:

remove(...)
    L.remove(value) -- remove first occurrence of value.
    Raises ValueError if the value is not present.

In [19]:
help(list.sort)   # help() prints the text itself, so no print is needed


Help on method_descriptor:

sort(...)
    L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
    cmp(x, y) -> -1, 0, 1

Making Help

Documenting functions in Python is very easy. You place a "docstring" on the line following the function definition statement. Docstrings are delimited by triple quotes (""" or ''' are both valid).


In [74]:
def somefunction(someparam):
    """ 
    A function that returns the parameter with some descriptive text
    """
    return('You sent the parameter {}'.format(someparam))

In [75]:
help(somefunction)


Help on function somefunction in module __main__:

somefunction(someparam)
    A function that returns the parameter with some descriptive text


In [77]:
somefunction('woooweee')


Out[77]:
'You sent the parameter woooweee'

Tuples

Tuples are like lists except they are immutable. Immutable just means they can't be changed once they're created.


In [23]:
names_tuple = ('Tom', 'Dick', 'Harry')

In [24]:
names_tuple


Out[24]:
('Tom', 'Dick', 'Harry')

Tuples swap [] for (), but their behavior is slightly different.

  1. Order never changes
  2. Can't be changed at all in-place
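A quick way to convince yourself of the two points above is to try changing a tuple. This sketch reuses names_tuple from the cell above; the names changed and longer are illustrative. Item assignment raises TypeError, while building a new tuple from the old one is allowed.

```python
names_tuple = ('Tom', 'Dick', 'Harry')

try:
    names_tuple[0] = 'Thomas'   # item assignment is not allowed on a tuple
    changed = True
except TypeError as error:
    changed = False
    print("TypeError: {}".format(error))

# Concatenation creates a brand new tuple; the original is untouched.
longer = names_tuple + ('John',)
print(longer)        # ('Tom', 'Dick', 'Harry', 'John')
print(names_tuple)   # ('Tom', 'Dick', 'Harry')
```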

In [22]:
dir(names_tuple)


Out[22]:
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index']

In [27]:
names_tuple.index('Dick')


Out[27]:
1

Dictionaries

Python dictionaries are an implementation of what is called a hash table. A hash table is a particular (and efficient) way of creating an associative data structure of key/value pairs. Each key maps to one value, which is returned when you look up that key. Importantly, each key must be unique within the dictionary.


In [43]:
bob1 = {'height': 76, 'first_name': 'Bob', 'last_name': 'Waldron', 'gender': 'M'}

Dictionaries are created by placing 'key': value pairs inside {}, for example { 'key1': value1, 'key2': value2 }. Each key must be hashable and unique within the dictionary. In practice keys are usually strings, though numbers and tuples of immutable values also work.
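What "hashable" means in practice can be shown with a few different key types. This is a small sketch; the dictionary lookup and its contents are invented for illustration. Immutable values such as strings, numbers and tuples work as keys, while a mutable list is rejected with a TypeError.

```python
# Strings, numbers and tuples of immutable values are hashable.
lookup = {
    'height': 76,
    42: 'a number as a key',
    (1, 2): 'a tuple as a key',
}
print(lookup[(1, 2)])   # a tuple as a key

# A list is mutable, so it is not hashable and cannot be used as a key.
try:
    lookup[[1, 2]] = 'will not work'
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)      # False
```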


In [33]:
bob2 = {'height': 76, 'height': 62 }

In [34]:
bob2


Out[34]:
{'height': 62}

Be careful with dictionary keys. A duplicate key will clobber the previous value without warning.

Dictionaries deconstructed


In [19]:
print bob1.values() # Method to return a list of values
print bob1.keys() # Method to return list of keys


['M', 'Bob', 'Waldron', 76]
['gender', 'first_name', 'last_name', 'height']

You can put a dictionary's values and keys back together using zip() and dict()


In [22]:
tmp = zip(bob1.keys(),bob1.values()) # putting a dictionary back together
print tmp
print dict(tmp)


[('gender', 'M'), ('first_name', 'Bob'), ('last_name', 'Waldron'), ('height', 76)]
{'gender': 'M', 'first_name': 'Bob', 'last_name': 'Waldron', 'height': 76}

We see that another way to create a dictionary is to call dict() on a list of (key, value) tuples.


In [28]:
for k,v in bob1.items():
    print "Key is: {} Value is: {}".format(k,v)


Key is: gender Value is: M
Key is: first_name Value is: Bob
Key is: last_name Value is: Waldron
Key is: height Value is: 76

The items() method returns a list containing one (key, value) tuple for each pair in the dictionary.


In [45]:
bob1.items()


Out[45]:
[('gender', 'M'),
 ('first_name', 'Bob'),
 ('last_name', 'Waldron'),
 ('height', 76)]

String formatting

Python strings have a small formatting language for filling in values (kind of like a templating language, but somewhat limited).


In [46]:
'Name: {first_name} {last_name}  gender: {gender}'.format(first_name='Bob', last_name='Waldron', gender='M')


Out[46]:
'Name: Bob Waldron  gender: M'

In [52]:
template = 'Name: {first_name} {last_name}  gender: {gender}'
template.format(**bob1) # Fill values with dictionary


Out[52]:
'Name: Bob Waldron  gender: M'

In [30]:
# Pretty numbers
population = 309000000
print '{:,}'.format(population)


309,000,000
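The part after the colon inside {} is a format specification, and the thousands separator is only one of several options. A few more are sketched below; population comes from the cell above, while fraction and the other variable names are made-up illustration values.

```python
population = 309000000
fraction = 0.12345   # illustrative value only

with_commas = '{:,}'.format(population)      # thousands separator
two_decimals = '{:.2f}'.format(fraction)     # fixed number of decimals
as_percent = '{:.1%}'.format(fraction)       # multiply by 100 and add %
right_aligned = '[{:>8}]'.format('right')    # pad on the left to width 8
left_aligned = '[{:<8}]'.format('left')      # pad on the right to width 8
by_position = '{0} and {1}, then {0} again'.format('a', 'b')

print(with_commas)     # 309,000,000
print(two_decimals)    # 0.12
print(as_percent)      # 12.3%
print(right_aligned)   # [   right]
print(left_aligned)    # [left    ]
print(by_position)     # a and b, then a again
```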

Dictionaries can have their own internal structure as well: a value can itself be a dictionary, giving a dictionary inside of a dictionary.


In [125]:
client1 = {'name': "James Goodyear", 
          'address': { 'street': '100 Goodguy lane', 
                      'city': 'Great Place',
                      'state': 'PA',
                      'zip': 15512
                      }
          }

In [126]:
client2 = {'name': "Paul Gibson", 
          'address': { 'street': '204 Shady lane', 
                      'city': 'Orlando',
                      'state': 'FL',
                      'zip': 68924
                      }
          }
customers = [client1, client2]

In [128]:
print customers


[{'name': 'James Goodyear', 'address': {'city': 'Great Place', 'state': 'PA', 'street': '100 Goodguy lane', 'zip': 15512}}, {'name': 'Paul Gibson', 'address': {'city': 'Orlando', 'state': 'FL', 'street': '204 Shady lane', 'zip': 68924}}]

Notice that reusing the same keys ('name', 'address') is not a problem across independent dictionaries. Keys only need to be unique within a single dictionary.

Extracting data from Dictionaries

Dictionary values can be accessed by putting the key in [] notation. Chain a second [] to pull values out of a nested dictionary by its subkey.


In [129]:
client1['address']


Out[129]:
{'city': 'Great Place',
 'state': 'PA',
 'street': '100 Goodguy lane',
 'zip': 15512}

In [130]:
client1['address']['street']


Out[130]:
'100 Goodguy lane'
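The [] notation raises a KeyError when the key is missing. When a key may or may not be present, the get() method and the in operator are gentler alternatives. The sketch below reuses client1 from above; the 'phone' and 'country' keys are hypothetical and deliberately absent from the data.

```python
client1 = {'name': "James Goodyear",
           'address': {'street': '100 Goodguy lane',
                       'city': 'Great Place',
                       'state': 'PA',
                       'zip': 15512}}

# [] raises KeyError for a key that is not in the dictionary.
try:
    client1['phone']
    has_phone = True
except KeyError:
    has_phone = False

# get() returns a default value (None unless you supply one) instead.
phone = client1.get('phone', 'unknown')

# "in" tests for a key without fetching its value.
has_address = 'address' in client1

# Chaining get() with an empty-dict default protects nested lookups.
country = client1.get('address', {}).get('country', 'not recorded')

print(has_phone)     # False
print(phone)         # unknown
print(has_address)   # True
print(country)       # not recorded
```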

File IO


In [133]:
writer = open('example.txt', 'w')
writer.write('This is some text')
writer.write('Some more text')
writer.close()

In [135]:
reader = open('example.txt', 'r')
print reader.readline()
reader.close()


This is some textSome more text

Oops, the two strings were written one right after the other. write() does not add a line ending, so we need to send the newline character \n ourselves.


In [136]:
writer = open('example.txt','w')
writer.write('This is some text\n')
writer.write('Some more text\n')
writer.close()

In [137]:
reader = open('example.txt', 'r')
for line in reader:
    print line,


This is some text
Some more text
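The cells above call close() by hand. A common alternative is the with statement, which closes the file automatically when the block ends, even if an error occurs inside it. The sketch below writes to a temporary directory so it does not touch your working directory; path and lines are illustrative names.

```python
import os
import tempfile

# Put the example file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(path, 'w') as writer:
    writer.write('This is some text\n')
    writer.write('Some more text\n')
# writer has been closed at this point

with open(path, 'r') as reader:
    # Strip the trailing newline from each line as it is read.
    lines = [line.rstrip('\n') for line in reader]

print(lines)           # ['This is some text', 'Some more text']
print(writer.closed)   # True
```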

The standard library includes a csv module that should be used whenever you're reading or writing CSV data.


In [99]:
import csv
help(csv)


Help on module csv:

NAME
    csv - CSV parsing and writing.

FILE
    /usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py

DESCRIPTION
    This module provides classes that assist in the reading and writing
    of Comma Separated Value (CSV) files, and implements the interface
    described by PEP 305.  Although many CSV files are simple to parse,
    the format is not formally defined by a stable specification and
    is subtle enough that parsing lines of a CSV file with something
    like line.split(",") is bound to fail.  The module supports three
    basic APIs: reading, writing, and registration of dialects.
    
    
    DIALECT REGISTRATION:
    
    Readers and writers support a dialect argument, which is a convenient
    handle on a group of settings.  When the dialect argument is a string,
    it identifies one of the dialects previously registered with the module.
    If it is a class or instance, the attributes of the argument are used as
    the settings for the reader or writer:
    
        class excel:
            delimiter = ','
            quotechar = '"'
            escapechar = None
            doublequote = True
            skipinitialspace = False
            lineterminator = '\r\n'
            quoting = QUOTE_MINIMAL
    
    SETTINGS:
    
        * quotechar - specifies a one-character string to use as the 
            quoting character.  It defaults to '"'.
        * delimiter - specifies a one-character string to use as the 
            field separator.  It defaults to ','.
        * skipinitialspace - specifies how to interpret whitespace which
            immediately follows a delimiter.  It defaults to False, which
            means that whitespace immediately following a delimiter is part
            of the following field.
        * lineterminator -  specifies the character sequence which should 
            terminate rows.
        * quoting - controls when quotes should be generated by the writer.
            It can take on any of the following module constants:
    
            csv.QUOTE_MINIMAL means only when required, for example, when a
                field contains either the quotechar or the delimiter
            csv.QUOTE_ALL means that quotes are always placed around fields.
            csv.QUOTE_NONNUMERIC means that quotes are always placed around
                fields which do not parse as integers or floating point
                numbers.
            csv.QUOTE_NONE means that quotes are never placed around fields.
        * escapechar - specifies a one-character string used to escape 
            the delimiter when quoting is set to QUOTE_NONE.
        * doublequote - controls the handling of quotes inside fields.  When
            True, two consecutive quotes are interpreted as one during read,
            and when writing, each quote character embedded in the data is
            written as two quotes

CLASSES
    Dialect
        excel
            excel_tab
    DictReader
    DictWriter
    Sniffer
    exceptions.Exception(exceptions.BaseException)
        _csv.Error
    
    class Dialect
     |  Describe an Excel dialect.
     |  
     |  This must be subclassed (see csv.excel).  Valid attributes are:
     |  delimiter, quotechar, escapechar, doublequote, skipinitialspace,
     |  lineterminator, quoting.
     |  
     |  Methods defined here:
     |  
     |  __init__(self)
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  delimiter = None
     |  
     |  doublequote = None
     |  
     |  escapechar = None
     |  
     |  lineterminator = None
     |  
     |  quotechar = None
     |  
     |  quoting = None
     |  
     |  skipinitialspace = None
    
    class DictReader
     |  Methods defined here:
     |  
     |  __init__(self, f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
     |  
     |  __iter__(self)
     |  
     |  next(self)
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  fieldnames
    
    class DictWriter
     |  Methods defined here:
     |  
     |  __init__(self, f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
     |  
     |  writeheader(self)
     |  
     |  writerow(self, rowdict)
     |  
     |  writerows(self, rowdicts)
    
    class Error(exceptions.Exception)
     |  Method resolution order:
     |      Error
     |      exceptions.Exception
     |      exceptions.BaseException
     |      __builtin__.object
     |  
     |  Data descriptors defined here:
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from exceptions.Exception:
     |  
     |  __init__(...)
     |      x.__init__(...) initializes x; see help(type(x)) for signature
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from exceptions.Exception:
     |  
     |  __new__ = <built-in method __new__ of type object>
     |      T.__new__(S, ...) -> a new object with type S, a subtype of T
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from exceptions.BaseException:
     |  
     |  __delattr__(...)
     |      x.__delattr__('name') <==> del x.name
     |  
     |  __getattribute__(...)
     |      x.__getattribute__('name') <==> x.name
     |  
     |  __getitem__(...)
     |      x.__getitem__(y) <==> x[y]
     |  
     |  __getslice__(...)
     |      x.__getslice__(i, j) <==> x[i:j]
     |      
     |      Use of negative indices is not supported.
     |  
     |  __reduce__(...)
     |  
     |  __repr__(...)
     |      x.__repr__() <==> repr(x)
     |  
     |  __setattr__(...)
     |      x.__setattr__('name', value) <==> x.name = value
     |  
     |  __setstate__(...)
     |  
     |  __str__(...)
     |      x.__str__() <==> str(x)
     |  
     |  __unicode__(...)
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors inherited from exceptions.BaseException:
     |  
     |  __dict__
     |  
     |  args
     |  
     |  message
    
    class Sniffer
     |  "Sniffs" the format of a CSV file (i.e. delimiter, quotechar)
     |  Returns a Dialect object.
     |  
     |  Methods defined here:
     |  
     |  __init__(self)
     |  
     |  has_header(self, sample)
     |  
     |  sniff(self, sample, delimiters=None)
     |      Returns a dialect (or None) corresponding to the sample
    
    class excel(Dialect)
     |  Describe the usual properties of Excel-generated CSV files.
     |  
     |  Data and other attributes defined here:
     |  
     |  delimiter = ','
     |  
     |  doublequote = True
     |  
     |  lineterminator = '\r\n'
     |  
     |  quotechar = '"'
     |  
     |  quoting = 0
     |  
     |  skipinitialspace = False
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from Dialect:
     |  
     |  __init__(self)
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from Dialect:
     |  
     |  escapechar = None
    
    class excel_tab(excel)
     |  Describe the usual properties of Excel-generated TAB-delimited files.
     |  
     |  Method resolution order:
     |      excel_tab
     |      excel
     |      Dialect
     |  
     |  Data and other attributes defined here:
     |  
     |  delimiter = '\t'
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from excel:
     |  
     |  doublequote = True
     |  
     |  lineterminator = '\r\n'
     |  
     |  quotechar = '"'
     |  
     |  quoting = 0
     |  
     |  skipinitialspace = False
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from Dialect:
     |  
     |  __init__(self)
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from Dialect:
     |  
     |  escapechar = None

FUNCTIONS
    field_size_limit(...)
        Sets an upper limit on parsed fields.
            csv.field_size_limit([limit])
        
        Returns old limit. If limit is not given, no new limit is set and
        the old limit is returned
    
    get_dialect(...)
        Return the dialect instance associated with name.
        dialect = csv.get_dialect(name)
    
    list_dialects(...)
        Return a list of all know dialect names.
        names = csv.list_dialects()
    
    reader(...)
        csv_reader = reader(iterable [, dialect='excel']
                                [optional keyword args])
            for row in csv_reader:
                process(row)
        
        The "iterable" argument can be any object that returns a line
        of input for each iteration, such as a file object or a list.  The
        optional "dialect" parameter is discussed below.  The function
        also accepts optional keyword arguments which override settings
        provided by the dialect.
        
        The returned object is an iterator.  Each iteration returns a row
        of the CSV file (which can span multiple input lines):
    
    register_dialect(...)
        Create a mapping from a string name to a dialect class.
        dialect = csv.register_dialect(name, dialect)
    
    unregister_dialect(...)
        Delete the name/dialect mapping associated with a string name.
        csv.unregister_dialect(name)
    
    writer(...)
        csv_writer = csv.writer(fileobj [, dialect='excel']
                                    [optional keyword args])
            for row in sequence:
                csv_writer.writerow(row)
        
            [or]
        
            csv_writer = csv.writer(fileobj [, dialect='excel']
                                    [optional keyword args])
            csv_writer.writerows(rows)
        
        The "fileobj" argument can be any object that supports the file API.

DATA
    QUOTE_ALL = 1
    QUOTE_MINIMAL = 0
    QUOTE_NONE = 3
    QUOTE_NONNUMERIC = 2
    __all__ = ['QUOTE_MINIMAL', 'QUOTE_ALL', 'QUOTE_NONNUMERIC', 'QUOTE_NO...
    __version__ = '1.0'

VERSION
    1.0
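The help text warns that line.split(",") is bound to fail on real CSV data. A short sketch shows why, using a made-up two-line CSV held in a list (the help text notes that csv.reader accepts any iterable of lines, including a list). The quoted field contains a comma, which split() cuts in two and csv.reader keeps whole.

```python
import csv

# Two lines of CSV; the first field of the second line contains a comma.
lines = ['name,city', '"Goodyear, James",Great Place']

naive = lines[1].split(',')      # splits inside the quoted field
rows = list(csv.reader(lines))   # understands the quoting rules

print(naive)     # ['"Goodyear', ' James"', 'Great Place']
print(rows[1])   # ['Goodyear, James', 'Great Place']
```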


Functions

As we've seen, Python functions are declared with def, a name, and zero or more parameters.


In [138]:
def add(a,b):
    return a + b
add(1,10)


Out[138]:
11

Arguments are either positional or keyword arguments. Positional arguments are matched in the order the parameters are declared in the function definition. Keyword arguments are specified using keyword=value notation.

Functions can declare keyword parameters with default values as well.


In [87]:
def gravitationalforce(m1,m2,r,G=6.67384e-11):
    """ 
        Compute gravitational force between two masses 
        using Newton's Law of Gravitation
        F = G * (m1 * m2) / r^2
    """
    return( G * (m1 * m2) / r**2 )

In [117]:
# Current gravitational force between Earth and Mars
mars_kg = 639e21
earth_kg = 5.972e24
mars_earth_102013 = 2.9772864e11
gravitationalforce(mars_kg,earth_kg,mars_earth_102013)


Out[117]:
2873129627542957.0

We could provide an alternate gravitational constant for an alternate universe if necessary. Not likely, but possible.

  • Positional arguments must come first
  • Keyword parameters can't be left blank in the definition; they must have a default value. None is an acceptable default value.

In [118]:
alternateG = 9.65e-11
gravitationalforce(mars_kg,earth_kg,mars_earth_102013,G=alternateG)


Out[118]:
4154385017589504.0
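The two rules above can be sketched with a simplified variant of the function. Here force is a hypothetical name, the inputs 2.0, 3.0 and 4.0 are arbitrary numbers rather than physical values, and None is used as the default to mean "use the standard constant".

```python
def force(m1, m2, r, G=None):
    """Newton's law of gravitation; G=None means use the standard constant."""
    if G is None:
        G = 6.67384e-11
    return G * (m1 * m2) / r**2

# Positional arguments first; keyword arguments may follow in any order.
by_position = force(2.0, 3.0, 4.0)
by_keyword = force(2.0, 3.0, G=6.67384e-11, r=4.0)

# Overriding the default, as with alternateG above.
custom = force(2.0, 3.0, 4.0, G=1.0)

print(by_position == by_keyword)   # True
print(custom)                      # 0.375

# Putting a keyword argument before a positional one, as in
# force(m1=2.0, 3.0, 4.0), is a SyntaxError.
```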

You can also allow functions to take an arbitrary number of arguments and/or keyword arguments using a special syntax.


In [103]:
def passthrough(*args,**kwargs):
    """ A function that takes an arbitrary number of arguments and kwargs and returns them """
    return({'args':args, 'kwargs': kwargs })

In [102]:
passthrough(1,2,3,4,5,foo="something",bar="another thing")


Out[102]:
{'args': (1, 2, 3, 4, 5),
 'kwargs': {'bar': 'another thing', 'foo': 'something'}}
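The same * and ** syntax also works in the other direction, at the call site, to unpack a sequence into positional arguments and a dictionary into keyword arguments. This is what template.format(**bob1) did in the string formatting section. A minimal sketch, with numbers, options and pair as illustrative names:

```python
def passthrough(*args, **kwargs):
    """Return the positional and keyword arguments that were received."""
    return {'args': args, 'kwargs': kwargs}

def add(a, b):
    return a + b

numbers = [1, 2, 3]
options = {'foo': 'something', 'bar': 'another thing'}

# *numbers becomes three positional arguments, **options two keyword ones.
result = passthrough(*numbers, **options)
print(result['args'])   # (1, 2, 3)

# Unpacking also works with ordinary functions.
pair = (1, 10)
total = add(*pair)
print(total)            # 11
```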

Numpy

Numpy gives Python arrays/matrices and associated methods for manipulating arrays. It has to be installed separately (it isn't part of the Python standard library). Numpy is installed on the EOMF systems we'll be using for this class.


In [90]:
import numpy
values = [1,2,3]
array = numpy.array(values)

In [91]:
array


Out[91]:
array([1, 2, 3])

Arrays are forced to have homogeneous data types to improve performance and memory management. In the next cell the integers are converted to strings to match the one string element.


In [95]:
values = [1,"2",3]
numpy.array(values)


Out[95]:
array(['1', '2', '3'], 
      dtype='|S1')

In [105]:
numpy.zeros((10,10))


Out[105]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [106]:
numpy.ones((10,5))


Out[106]:
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

In [108]:
numpy.identity(5)


Out[108]:
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

In [116]:
emptya = numpy.empty((4,4)) # Values are whatever was in memory location previously
print emptya
print emptya.shape


[[ -1.28822975e-231  -2.00389819e+000   9.43679304e-096   7.77663117e+160]
 [  4.76660077e+180   5.80836406e+180   7.26613223e+223   3.03667987e-152]
 [  9.08367203e+223   4.71651679e+257   7.35874515e+223   3.57872291e+161]
 [  1.33358377e+179   1.40020754e+195   5.43239540e-311   1.11253910e-308]]
(4, 4)

In [139]:
emptya.ravel() # flatten to a 1-dimensional array


Out[139]:
array([ -1.28822975e-231,  -2.00389819e+000,   9.43679304e-096,
         7.77663117e+160,   4.76660077e+180,   5.80836406e+180,
         7.26613223e+223,   3.03667987e-152,   9.08367203e+223,
         4.71651679e+257,   7.35874515e+223,   3.57872291e+161,
         1.33358377e+179,   1.40020754e+195,   5.43239540e-311,
         1.11253910e-308])

Next time

  • Numpy with GDAL
  • Remote sensing data
  • Python multiprocessing