Core language

A. Variables

Variables are used to store and modify values.


In [3]:
a = 5
b = a + 3.1415
c = a / b

print(a, b, c)


5 8.1415 0.6141374439599582

Note that we did not need to declare variable types (as in Fortran); we can assign anything to a variable and it works. This is the power of a dynamically typed language, in which types are checked while the code runs rather than declared ahead of time. We can also combine different types: a is an integer, and we add the float 3.1415 to get b. The result is 'upcast' to a data type that can hold it, so adding a float and an int gives a float.
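
As a small sketch of the upcasting rule described above (the variable names here are made up for illustration), we can check the type of each result directly:

```python
n_int = 5            # an integer
x_float = 3.1415     # a float

total = n_int + x_float      # int + float is upcast to float
ratio = 6 / 3                # true division returns a float in Python 3
whole = 7 // 2               # floor division of two ints stays an int

print(type(total), type(ratio), type(whole))
```

Note that 6 / 3 gives 2.0 rather than 2, even though both inputs are integers.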

Variables can store many different kinds of data:


In [4]:
s = 'Ice cream'            # A string
f = [1, 2, 3, 4]           # A list
d = 3.1415928              # A floating point number
i = 5                      # An integer
b = True                   # A boolean value

Side note: Anything following a # on a line is a comment, and is not considered part of the code. Comments are useful for explaining what a bit of code does. USE COMMENTS

You can see what type a variable has by using the type function, like


In [5]:
type(s)


Out[5]:
str

Exercise

Use type to see the types of the other variables



In [ ]:


Exercise

What happens when you add variables of the same type? What about adding variables of different types?



In [ ]:

You can test to see if a variable is a particular type by using the isinstance(var, type) function.


In [6]:
isinstance(s, str)  # is s a string?


Out[6]:
True

In [7]:
isinstance(f, int)  # is f an integer?


Out[7]:
False
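
isinstance also accepts a tuple of types, which is handy when several types are acceptable. A minimal sketch, using a made-up variable called value:

```python
value = 3.0

is_number = isinstance(value, (int, float))   # True if value is an int OR a float
is_text = isinstance(value, str)              # False, value is not a string

print(is_number, is_text)

# Careful: booleans count as integers in Python.
bool_is_int = isinstance(True, int)
print(bool_is_int)
```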

C. Tests for equality and inequality

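We can test the values of variables using different operators. These tests return a Boolean value, either True or False. In numerical contexts False behaves like 0 and True behaves like 1; when other numbers are tested, zero counts as false and any nonzero number counts as true. Note that assignment (=) is different from a test of equality (==).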


In [8]:
a < 99


Out[8]:
True

In [9]:
a > 99


Out[9]:
False

In [10]:
a == 5.


Out[10]:
True

These statements have returned "booleans", which are True and False only. These are commonly used to check for conditions within a script or function to determine the next course of action.

NOTE: booleans are NOT equivalent to a string that says "True" or "False". We can test this:


In [11]:
True == 'True'


Out[11]:
False

There are other things that can be tested, not just mathematical equalities. For example, to test if an element is inside of a list or string (or any sequence; more on sequences below), use the in operator:


In [12]:
foo = [1, 2, 3, 4, 5, 6]
5 in foo


Out[12]:
True

In [13]:
'this' in 'What is this?'


Out[13]:
True

In [14]:
'that' in 'What is this?'


Out[14]:
False
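
The following sketch (variable names are ours, for illustration only) shows how booleans relate to numbers and strings, and how the built-in bool function reports the way a value is treated in a test:

```python
true_is_one = (True == 1)          # the boolean True compares equal to 1
true_sum = True + True             # booleans act like integers in arithmetic
string_check = (True == 'True')    # a boolean is not a string

print(true_is_one, true_sum, string_check)   # True 2 False

# Zero and empty strings count as false; other numbers and any
# non-empty string (even the text 'False') count as true.
print(bool(0), bool(42), bool(''), bool('False'))   # False True False True
```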

D. Intro to functions

We will discuss functions in more detail later in this notebook, but here is a quick view to help with the homework.

Functions allow us to write code that we can use in the future. When we take a series of code statements and put them in a function, we can reuse that code to take in inputs, perform calculations or other manipulations, and return outputs, just like a function in math.

Almost all of the code you submit in your homework will be within functions so that I can use and test the functionality of your code.

Here we have a function called display_and_capitalize_string which takes in a string, prints that string, and then returns the same string but with it capitalized.


In [15]:
def display_and_capitalize_string(input_str):
    '''Documentation for this function, which can span
    multiple 
    lines since triple quotes are used for this.
    
    Takes in a string, prints that string, and then returns the same string but with it capitalized.'''
    
    print(input_str)  # print out to the screen the string that was input, called `input_str`
    
    new_string = input_str.capitalize()  # use built-in method for a string to capitalize it
    
    return new_string

In [16]:
display_and_capitalize_string('hi')


hi
Out[16]:
'Hi'

This is analogous to the relationship between a variable and a function in math. The variable is $x$, and the function is $f(x)$, which changes the input $x$ in some way, then returns a new value. To access that returned value, you have to use the function -- not just define the function.


In [17]:
# input variable, x. Internal to the function itself, it is called 
# input_str.
x = 'hi'  

# function f(x) is `display_and_capitalize_string`
# the value returned by the function is stored in `output_string`
output_string = display_and_capitalize_string(x)


hi

Exercise

Write your own functions that do the following:

1. Take in a number and return that number plus 10.
2. Take in a variable and return the `type` of the variable.


In [ ]:


In [ ]:

Equality checks are commonly used to test the outcome of a function to make sure it is performing as expected and desired. We can test the function we wrote before to see if it works the way we expect and want it to. Here are three different ways to test the outcome of the same input/output pair.


In [18]:
out_string = display_and_capitalize_string('banana')
assert(out_string == 'Banana')


banana

In [19]:
from nose.tools import assert_equal
assert_equal(out_string, "Banana")

In [20]:
assert(out_string[0].isupper())

We know that the assert statements passed because no error was thrown. On the other hand, the following test does not run successfully:


In [21]:
assert(out_string=='BANANA')


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-21-0182adb04c3b> in <module>
----> 1 assert(out_string=='BANANA')

AssertionError: 
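
An assert statement can also carry a message, which is shown when the test fails. The sketch below uses a made-up function called double; the try/except is only there so that the cell runs to completion.

```python
def double(number):
    '''Return twice the input (a made-up function for illustration).'''
    return 2 * number

result = double(4)
assert result == 8, 'double(4) should be 8, got ' + str(result)   # passes silently

# A failing assertion raises AssertionError with the message attached.
try:
    assert double(4) == 9, 'double(4) is not 9'
except AssertionError as err:
    message = str(err)
    print('Caught:', message)
```

When adding a message, leave off the parentheses: assert(condition, 'message') builds a two-element tuple, which always counts as true, so that test can never fail.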

Exercise

Write tests using assertions to check how well your functions from the previous exercise are working.



In [ ]:

E. Conditionals

Conditionals have a syntax similar to that of for loops (covered later in this notebook). Generally, conditionals look like

if <test>:
    <Code run if...>
    <...test is valid>

or

if <first test>:
    <Code run if...>
    <...the first test is valid>
elif <second test>:
    <Code run if...>
    <...the second test is valid>
else:
    <Code run if...>
    <...neither test is valid>

In both cases the test statements are code segments that return a boolean value, often a test for equality or inequality. The elif and else statements are always optional; you can include both, either one, or neither.


In [22]:
x = 20
    
if x < 10:
    print('x is less than 10')
else:
    print('x is more than 10')


x is more than 10
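
Conditionals are often placed inside functions. Here is a minimal sketch with all three branches; the function name and the temperature thresholds are arbitrary choices for illustration.

```python
def describe_temperature(temp_c):
    '''Return a short label for a temperature in degrees Celsius.'''
    if temp_c < 0:
        label = 'freezing'
    elif temp_c < 20:        # only checked if the first test was False
        label = 'cool'
    else:                    # runs if none of the tests above were True
        label = 'warm'
    return label

print(describe_temperature(-5), describe_temperature(10), describe_temperature(20))
```

The tests are checked in order and only the first matching branch runs, so a value of 20 falls through to the else branch.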

Exercise

Rerun the code block above using different values for x. What happens if x=10?

Add an elif statement to the second block of code that will print something if x==10.



In [ ]:

F. Strings

Strings are made using various kinds of (matching) quotes. Examples:


In [23]:
s1 = 'hello'
s2 = "world"
s3 = '''strings can 
also go 'over'
multiple "lines".'''
s2


Out[23]:
'world'

In [24]:
print(s3)


strings can 
also go 'over'
multiple "lines".

You can also 'add' strings using 'operator overloading', meaning that the plus sign can take on different meanings depending on the data types of the variables you are using it on.


In [25]:
print( s1 + ' ' + s2)  # note, we need the space otherwise we would get 'helloworld'


hello world

We can include special characters in strings. For example, \n gives a newline, \t a tab, etc. Notice in the output below that the multi-line string s3 is displayed as a single-quoted string, with the newlines shown as the escape sequence \n and the single quotes escaped as \'.


In [26]:
s3.upper()


Out[26]:
'STRINGS CAN \nALSO GO \'OVER\'\nMULTIPLE "LINES".'

Strings are 'objects' in that they have 'methods'. Methods are functions that act on the particular instance of a string object. You can access the methods by putting a dot after the variable name and then the method name with parentheses (and any arguments to the method within the parentheses). Methods always have to have parentheses, even if they are empty.


In [27]:
s3.capitalize()


Out[27]:
'Strings can \nalso go \'over\'\nmultiple "lines".'

One of the most useful string methods is split, which returns a list of the words in a string, with all of the whitespace (actual spaces, newlines, and tabs) removed. More on lists next.


In [28]:
s3.split()


Out[28]:
['strings', 'can', 'also', 'go', "'over'", 'multiple', '"lines".']

Another common string operation is the join method. It joins a sequence of strings into one string, using the string it is called on as the separator:


In [29]:
words = s3.split()
'_'.join(words)        # Here, we are using a method directly on the string '_' itself.


Out[29]:
'strings_can_also_go_\'over\'_multiple_"lines".'
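
split can also be given a separator other than whitespace, which is useful for simple delimited text. A small sketch, where the record string is made-up data:

```python
record = 'station_7,2020-01-15,12.5'   # made-up data for illustration

fields = record.split(',')             # split on commas instead of whitespace
print(fields)                          # ['station_7', '2020-01-15', '12.5']

rejoined = ' | '.join(fields)          # join the pieces with a new separator
print(rejoined)                        # station_7 | 2020-01-15 | 12.5
```

Note that every field is still a string after splitting; '12.5' would need float('12.5') to be used as a number.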

G. Containers

Often you need lists or sequences of different values (e.g., a timeseries of temperature: a list of values representing the temperature on sequential days). We cover three containers from the core Python language here: lists, tuples, and dictionaries. There are a few more specialized containers (e.g., numpy arrays and pandas dataframes) for use in scientific computing that we will learn much more about later; they behave much like the containers we will learn about here.

Lists

Lists are perhaps the most common container type. They are used for sequential data. Create them with square brackets with comma separated values within:


In [30]:
foo = [1., 2., 3, 'four', 'five', [6., 7., 8], 'nine']
type(foo)


Out[30]:
list

Note that lists (unlike arrays, as we will later learn) can be heterogeneous. That is, the elements in the list don't have to have the same kind of data type. Here we have a list with floats, ints, strings, and even another (nested) list!

We can retrieve the individual elements of a list by 'indexing' the list. We do this with square brackets, using zero-based indexes – that is 0 is the first element – as such:


In [31]:
foo[0]


Out[31]:
1.0

In [32]:
foo[5]


Out[32]:
[6.0, 7.0, 8]

In [33]:
foo[5][1]  # Python is sequential, we can access an element within an element using sequential indexing.


Out[33]:
7.0

In [34]:
foo[-1]    # This is the way to access the last element.


Out[34]:
'nine'

In [35]:
foo[-3]    # ...and the third to last element


Out[35]:
'five'

In [36]:
foo[-3][2]   # we can also index strings.


Out[36]:
'v'

We can get a sub-sequence from the list by giving a range of the data to extract. This is done by using the format

start:stop:stride

where start is the index of the first element to include, stop is the index to go up to (but not include), and stride is the step between elements. The defaults are to start at the beginning, include through the end, and include every element.

The up-to-but-not-including part is confusing to first time Python users, but makes sense given the zero-based indexing. For example, foo[:10] gives the first ten elements of a sequence.


In [37]:
# create a sequence of 10 elements, starting with zero, up to but not including 10.
bar = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [38]:
bar[2:5]


Out[38]:
[2, 3, 4]

In [39]:
bar[:4]


Out[39]:
[0, 1, 2, 3]

In [40]:
bar[:]


Out[40]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [41]:
bar[::2]


Out[41]:
[0, 2, 4, 6, 8]
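
Negative indices and negative strides work in slices too. The sketch below uses a made-up list called letters:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

last_three = letters[-3:]        # ['e', 'f', 'g']
all_but_last = letters[:-1]      # ['a', 'b', 'c', 'd', 'e', 'f']
backwards = letters[::-1]        # ['g', 'f', 'e', 'd', 'c', 'b', 'a']
odd_positions = letters[1::2]    # ['b', 'd', 'f']

# With a stride of 1, stop - start is the number of elements returned.
middle = letters[2:5]            # ['c', 'd', 'e'], which has 5 - 2 = 3 elements

print(last_three, odd_positions, middle)
```

Slicing always returns a new list, so letters itself is unchanged by any of these lines.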

Exercise

Use the list

bar = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

use indexing to get the following sequences:

[3, 4, 5]

[9]        # note this is different than just the last element. 
           # It is a sequence with only one element, but still a sequence

[2, 5, 8]

What happens when you exceed the limits of the list?

bar[99]
bar[-99]
bar[5:99]



In [ ]:

You can assign values to list elements by putting the indexed list on the left side of the assignment, as


In [42]:
bar[5] = -99
bar


Out[42]:
[0, 1, 2, 3, 4, -99, 6, 7, 8, 9]

This works for sequences as well,


In [43]:
bar[2:7] = [1, 1, 1, 1, 1, 1, 1, 1]
bar


Out[43]:
[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 8, 9]

Lists are also 'objects'; they also have 'methods'. Methods are functions that are designed to be applied to the data contained in the list. You can access them by putting a dot and the method name after the variable (called an 'object instance')


In [44]:
bar.insert(5, 'here')
bar


Out[44]:
[0, 1, 1, 1, 1, 'here', 1, 1, 1, 1, 1, 7, 8, 9]

In [45]:
bar = [4, 5, 6, 7, 3, 6, 7, 3, 5, 7, 9]
bar.sort()    # Note that we don't do 'bar = bar.sort()'. The sorting is done in place.
bar


Out[45]:
[3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 9]
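
To see why we don't write 'bar = bar.sort()', compare the sort method with the built-in sorted function. A minimal sketch with a made-up list called temps:

```python
temps = [21, 18, 25, 19]

ordered = sorted(temps)      # returns a NEW sorted list
untouched = list(temps)      # copy of temps, still in the original order

result = temps.sort()        # sorts temps in place and returns None

print(ordered)               # [18, 19, 21, 25]
print(untouched)             # [21, 18, 25, 19]
print(result)                # None
print(temps)                 # [18, 19, 21, 25]
```

Use sorted when you want to keep the original order around, and the sort method when you do not.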

Exercise

What other methods are there? Type bar. and then <TAB>. This will show the possible completions, which in this case is a list of the methods and attributes. You can get help on a method by typing, for example, bar.pop?. The text in the help file is called a docstring; as we will see below, you can write these for your own functions.

See if you can use these four methods of the list instance bar:

        1. append
        2. pop
        3. index
        4. count




In [ ]:

Tuples

Tuples (pronounced too'-puls) are sequences that can't be modified; they have only a couple of methods (count and index), neither of which changes the tuple. Thus, they are designed to be immutable sequences. They are created like lists, but with parentheses instead of square brackets.


In [46]:
foo = (3, 5, 7, 9)
# foo[2] = -999  # gives an assignment error. Commented so that all cells run.

Tuples are often used when a function has multiple outputs, or as a lightweight storage container. For convenience, the parentheses can be left off, and you can assign multiple values at a time.


In [47]:
a, b, c = 1, 2, 3   # Equivalent to '(a, b, c) = (1, 2, 3)'
print(b)


2
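
The same unpacking works on the tuple returned by a function. A small sketch, where min_and_max is a made-up helper:

```python
def min_and_max(values):
    '''Return the smallest and largest value as a two-element tuple.'''
    return min(values), max(values)     # the comma makes this a tuple

pair = min_and_max([3, 9, 1, 7])        # keep the tuple whole...
low, high = min_and_max([3, 9, 1, 7])   # ...or unpack it into two variables

print(pair, low, high)                  # (1, 9) 1 9

# Unpacking also gives a compact way to swap two variables.
low, high = high, low
print(low, high)                        # 9 1
```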

Dictionaries

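Dictionaries are collections of values that are referenced by arbitrary 'keys' instead of by a (sequential) index. (As of Python 3.7, dictionaries remember the order in which keys were inserted, but elements are still looked up by key, not by position.) Dictionaries are created using curly braces with keys and values separated by a colon, and key:value pairs separated by commas, as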


In [48]:
foobar = {'a':3, 'b':4, 'c':5}

Elements are referenced and assigned by keys:


In [49]:
foobar['b']


Out[49]:
4

In [50]:
foobar['c'] = -99
foobar


Out[50]:
{'a': 3, 'b': 4, 'c': -99}

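The keys and values can be extracted using methods of the dictionary class. In Python 3 these return 'view' objects, which can be looped over directly or converted to lists with list().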


In [51]:
foobar.keys()


Out[51]:
dict_keys(['a', 'b', 'c'])

In [52]:
foobar.values()


Out[52]:
dict_values([3, 4, -99])

New values can be assigned simply by assigning a value to a key that does not exist yet


In [53]:
foobar['spam'] = 'eggs'
foobar


Out[53]:
{'a': 3, 'b': 4, 'c': -99, 'spam': 'eggs'}
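
A few other common dictionary operations are sketched below. The dictionary depths and its values are made up for illustration.

```python
depths = {'shelf': 200, 'slope': 2000, 'abyss': 4000}   # made-up values, in meters

# Loop over key/value pairs together.
for name, depth in depths.items():
    print(name, depth)

has_shelf = 'shelf' in depths           # `in` tests the keys, not the values
trench = depths.get('trench', -1)       # get returns a default instead of raising KeyError
names = list(depths.keys())             # convert the keys view into a real list

print(has_shelf, trench, names)
```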

Exercise

Create a dictionary variable with at least 3 entries. The entry keys should be the first name of people around you in the class, and the value should be their favorite food.

Explore the methods of the dictionary object, as was done with the list instance in the previous exercise.



In [ ]:

You can make an empty dictionary or list by using the dict and list functions respectively.


In [54]:
empty_dict = dict()
empty_list = list()
print(empty_dict, empty_list)


{} []

H. Logical Operators

You can combine statements that evaluate to a boolean value with the logical operators and and or. We can first think about this with boolean values directly:


In [5]:
True and True, True and False


Out[5]:
(True, False)

In [6]:
True or True, True or False


Out[6]:
(True, True)

Note that you can also use the word not to switch the meaning of a boolean:


In [18]:
not True, not False


Out[18]:
(False, True)

Now let's look at this with actual test examples instead of direct boolean values:


In [23]:
word = 'the'
sentence1 = 'the big brown dog'
sentence2 = 'I stand at the fridge'
sentence3 = 'go outside'

(word in sentence1) and (word in sentence2)


Out[23]:
True

In [24]:
(word in sentence1) and (word in sentence2) and (word in sentence3)


Out[24]:
False

In [25]:
(word in sentence1) or (word in sentence2) or (word in sentence3)


Out[25]:
True

In [20]:
x = 20
5 < x < 30, 5 < x and x < 30


Out[20]:
(True, True)

I. Loops

For loops

Loops are one of the fundamental structures in programming. Loops allow you to iterate over each element in a sequence, one at a time, and do something with those elements.

Loop syntax: Loops have a very particular syntax in Python; this syntax is one of the most notable features to Python newcomers. The format looks like

for *element* in *sequence*:                # NOTE the colon at the end
    <some code that uses the *element*>     # the block of code that is looped over for each element
    <more code that uses the *element*>     # is indented four spaces (yes four! yes spaces!)

<the code after the loop continues>         # the end of the loop is marked simply by unindented code

Thus, indentation is significant to the code. This design reflects good coding practice: in almost all languages (C, FORTRAN, MATLAB), loops, functions, etc. are typically indented anyway. Making indentation significant removes the need for end-of-loop syntax, giving more compact code.

Some important notes on indentation: Indentation in Python is typically 4 spaces. Most programming text editors will be smart about indentation, and will also convert TABs to four spaces. Jupyter notebooks are smart about indentation, and will do the right thing, i.e., autoindent a line below a line with a trailing colon, and convert TABs to spaces. If you are in another editor remember: TABS AND SPACES DO NOT MIX. See PEP-8 for more information on the correct formatting of Python code.

A simple example is to find the sum of the squares of the sequence 0 through 99,


In [55]:
sum_of_squares = 0

for n in range(100):              # range yields a sequence of numbers from 0 up to but not including 100
    sum_of_squares += n**2        # '+=' is equivalent to 'sum_of_squares = sum_of_squares + n**2', 
                                  # the '**' operator is a power, like '^' in other languages

print(sum_of_squares)


328350

You can iterate over any sequence, and in Python (like MATLAB) it is better to iterate over the sequence you want than to loop over the indices of that sequence. The following two examples give the same result, but the first is much more readable and easily understood than the second. Do the first whenever possible.


In [56]:
# THIS IS BETTER THAN THE NEXT CODE BLOCK. DO IT THIS WAY.
words = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

sentence = ''  # this initializes a string which we can then add onto
for word in words:
    sentence += word + ' '

sentence


Out[56]:
'the quick brown fox jumped over the lazy dog '

In [57]:
# DON'T DO IT THIS WAY IF POSSIBLE, DO IT THE WAY IN THE PREVIOUS CODE BLOCK.
words = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

sentence = ''
for i in range(len(words)):
    sentence += words[i] + ' '

sentence


Out[57]:
'the quick brown fox jumped over the lazy dog '

Sometimes you want to iterate over a sequence but you also want the indices of those elements. One way to do that is the enumerate function:

enumerate(<sequence>)

This returns a sequence of two element tuples, the first element in each tuple is the index, the second the element. It is commonly used in for loops, like


In [58]:
for idx, word in enumerate(words):
    print('The index is', idx, '...')
    print('...and the word is', word)


The index is 0 ...
...and the word is the
The index is 1 ...
...and the word is quick
The index is 2 ...
...and the word is brown
The index is 3 ...
...and the word is fox
The index is 4 ...
...and the word is jumped
The index is 5 ...
...and the word is over
The index is 6 ...
...and the word is the
The index is 7 ...
...and the word is lazy
The index is 8 ...
...and the word is dog

List comprehension

There is a short way to make a list from a simple rule by using list comprehensions. The syntax is like

[<element(item)> for item in sequence]

for example, we can calculate the squares of the first 10 integers


In [59]:
[n**2 for n in range(10)]


Out[59]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
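
A comprehension can also filter items with a trailing if clause. As a quick sketch, here are the squares of only the even numbers, next to the equivalent for loop:

```python
even_squares = [n**2 for n in range(10) if n % 2 == 0]
print(even_squares)              # [0, 4, 16, 36, 64]

# The same result written as an explicit loop.
loop_result = []
for n in range(10):
    if n % 2 == 0:               # keep only the even numbers
        loop_result.append(n**2)

print(loop_result == even_squares)   # True
```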

The element can be any code snippet that depends on the item. This example gives a sequence of boolean values that determine if the element in a list is a string.


In [60]:
random_list = [1, 2, 'three', 4.0, ['five',]]
[isinstance(item, str) for item in random_list]


Out[60]:
[False, False, True, False, False]

In [61]:
random_list = [1, 2, 'three', 4.0, ['five',]]

foo = []
for item in random_list:
    foo.append(isinstance(item, str))
foo


Out[61]:
[False, False, True, False, False]

Exercise

Modify the previous list comprehension to test if the elements are integers.


While loops

The majority of loops that you will write will be for loops. These are loops that have a defined number of iterations, over a specified sequence. However, there may be times when it is not clear when the loop should terminate. In this case, you use a while loop. This has the syntax

while <condition>:
    <code>

condition should be something that can be evaluated when the loop is started, and the variables that determine the conditional should be modified in the loop.

This kind of loop should be used carefully: it is relatively easy to accidentally create an infinite loop, where the condition never becomes false and so the loop continues forever. This is especially important to avoid given that we are using shared resources in our class, and a while loop that never ends can cause the computer to crash.


In [36]:
n = 5         # starting value
while n > 0:
    n -= 1    # subtract 1 each loop
    print(n)  # look at value of n


4
3
2
1
0
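
One way to guard against an accidental infinite loop is to add an iteration cap to the condition. This is only a sketch of one possible safeguard; the names value, n_steps, and max_steps are our own.

```python
value = 100.0
n_steps = 0
max_steps = 1000     # safety cap so the loop cannot run forever

# Halve value until it is no larger than 1, or until the cap is reached.
while value > 1.0 and n_steps < max_steps:
    value = value / 2.0
    n_steps += 1

print(n_steps, value)    # 7 0.78125
```

If the loop ever exits with n_steps equal to max_steps, that is a sign the stopping condition was never met and the logic needs another look.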

Flow control

There are a few commands that allow you to control the flow of any iterative loop: continue, break, and pass.

  • continue stops the current iteration and continues to the next element, if there is one.

  • break stops the current iteration, and leaves the loop.

  • pass does nothing; it is just a placeholder for when the syntax requires some code to be present.


In [64]:
# print all the numbers, except 5

for n in range(10):
    if n == 5:
        continue
    print(n)


0
1
2
3
4
6
7
8
9

In [65]:
# print all the numbers up to (but not including) 5, then break out of the loop.

for n in range(10):
    print('.')
    if n == 5:
        break
    print(n)

print('done')


.
0
.
1
.
2
.
3
.
4
.
done

In [66]:
# pass can be used for empty functions or classes, 
# or in loops (in which case it is usually a placeholder for future code)

def foo(x):
    pass

class Foo(object):
    pass

x = 2
if x == 1:
    pass        # could just leave this part of the code out entirely...
elif x == 2:
    print(x)


2

J. Functions

Functions are ways to create reusable blocks of code that can be run with different variable values – the input variables to the function. Functions are defined using the syntax

def <function name> (var1, var2, ...):
    <block of code...>
    <...defining the function>
    return <return variable(s)>

Functions can be defined at any point in the code, and called at any subsequent point.


In [67]:
def addfive(x):
    return x+5

addfive(3.1415)


Out[67]:
8.1415

Function inputs and outputs

Functions can have multiple input and output values. The documentation for the function can (and should) be provided as a string at the beginning of the function.


In [68]:
def sasos(a, b, c):
    '''return the sum of a, b, and c and the sum of the squares of a, b, and c'''
    res1 = a + b + c
    res2 = a**2 + b**2 + c**2
    return res1, res2

s, ss = sasos(3, 4, 5)
print(s)
print(ss)


12
50

Functions can have variables with default values. You can also pass arguments in any order if you label them explicitly by name (these are called keyword arguments).


In [69]:
def powsum(x, y, z, a=1, b=2, c=3):
    return x**a + y**b + z**c

print( powsum(2., 3., 4.) )
print( powsum(2., 3., 4., b=5) )
print( powsum(z=2., c=2, x=3., y=4.) )


75.0
309.0
23.0
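
Here is a smaller sketch of the same ideas, using a made-up function called scale with two default values:

```python
def scale(value, factor=2, offset=0):
    '''Return value * factor + offset.'''
    return value * factor + offset

r1 = scale(3)                              # both defaults used: 3*2 + 0
r2 = scale(3, factor=10)                   # override one default: 3*10 + 0
r3 = scale(3, offset=1)                    # skip factor, set offset: 3*2 + 1
r4 = scale(offset=1, value=3, factor=10)   # all by name, in any order: 3*10 + 1

print(r1, r2, r3, r4)                      # 6 30 7 31
```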

Exercise

Verify powsum(z=2., x=3., y=4., c=2) is the same as powsum(3., 4., 2., c=2)

What happens when you do powsum(3., 4., 2., x=2)? Why?



In [ ]:


Exercise

Write a function that takes in a list of numbers and returns two lists of numbers: the odd numbers in the list and the even numbers in the list. That is, if your function is called odds_evens(), it should work as follows:

odds, evens = odds_evens([1,5,2,8,3,4])
odds, evens
([1, 5, 3], [2, 8, 4])

Note that x % y gives the remainder of x/y.

How would you change the code to make a counter (the index) available each loop?



In [ ]:

Docstrings

You can add 'help' text to functions (and classes) by adding a 'docstring', which is just a regular string, right below the definition of the function. This should be considered a mandatory step in your code writing.


In [70]:
def addfive(x):
    '''Return the argument plus five
    
    Input : x
            A number
    
    Output: foo
            The number x plus five
    
    '''
    return x+5


# now, try addfive?
addfive?

See PEP-257 for guidelines about writing good docstrings.

Scope

Variables within the function are treated as 'local' variables, and do not affect variables outside of the 'scope' of the function. That is, all of the variables that are changed within the block of code inside a function are only changed within that block, and do not affect similarly named variables outside the function.


In [71]:
x = 5

def changex(x):      # This x is local to the function
    x += 10.         # here the local variable x is changed
    print('Inside changex, x=', x)
    return x

res = changex(x)    # supply the value of x in the 'global' scope.
print(res)          
print(x)            # The global x is unchanged


Inside changex, x= 15.0
15.0
5

Variables from the 'global' scope can be used within a function, as long as those variables are only read and not assigned to inside the function. This technique should generally only be used when it is very clear what value the global variable has, for example, in very short helper functions.


In [72]:
x = 5

def dostuffwithx(y):
    res = y + x       # Here, the global value of x is used, since it is not defined inside the function.
    return res

print(dostuffwithx(3.0))
print(x)


8.0
5

Packing and unpacking function arguments

You can provide a sequence of arguments to a function by placing a * in front of the sequence, like

foo(*args)

This unpacks the elements of the sequence into the arguments of the function, in order.


In [73]:
list(range(3, 6))            # normal call with separate arguments


Out[73]:
[3, 4, 5]

In [74]:
args = [3, 6]
list(range(*args))            # call with arguments unpacked from a list


Out[74]:
[3, 4, 5]

You can also unpack dictionaries as keyword arguments by placing ** in front of the dictionary, like

bar(**kwargs)

These can be mixed, to an extent. E.g., foo(*args, **kwargs) works.

Using our function from earlier, here we call powsum first with keyword arguments written in and second by unpacking a dictionary.


In [75]:
x = 5; y = 6; z = 7
powdict = {'a': 1, 'b': 2, 'c': 3}

print(powsum(x, y, z, a=1, b=2, c=3))
print(powsum(x, y, z, **powdict))


384
384
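
The same * and ** syntax can be used when defining a function, to accept any number of arguments. A minimal sketch, where describe_call is a made-up function:

```python
def describe_call(*args, **kwargs):
    '''Return the positional arguments (a tuple) and keyword arguments (a dict).'''
    return args, kwargs

positional, keywords = describe_call(1, 2, units='m', name='depth')
print(positional)    # (1, 2)
print(keywords)      # {'units': 'm', 'name': 'depth'}

# Unpacking a list and a dictionary in the call gives the same kind of result.
values = [1, 2]
options = {'units': 'm'}
same_positional, same_keywords = describe_call(*values, **options)
print(same_positional, same_keywords)    # (1, 2) {'units': 'm'}
```

Inside the function, args is always a tuple and kwargs is always a dictionary, whatever was passed in.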

One common usage is using the builtin zip function to take a 'transpose' of a set of points.


In [76]:
list(zip((1, 2, 3, 4, 5), ('a', 'b', 'c', 'd', 'e'), (6, 7, 8, 9, 10)))


Out[76]:
[(1, 'a', 6), (2, 'b', 7), (3, 'c', 8), (4, 'd', 9), (5, 'e', 10)]

In [77]:
pts = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))
x, y = list(zip(*pts))

print(x)
print(y)

# and back again,
print(list(zip(*(x,y))))


(1, 3, 5, 7, 9)
(2, 4, 6, 8, 10)
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

K. Classes

We won't cover classes in this class, but these notes are here for your reference in case you are interested.

Classes are used to define generic objects. The 'instances' of the class are supplied with specific data. Classes define a data structure, 'methods' to work with this data, and 'attributes' that define the data.

The computer science way to think of classes

Think of a sentence. The nouns would be the classes, the associated verbs the class methods, and the associated adjectives the class attributes. For example, take the sentence

The white car signals and makes a left turn.

In this case the object is a car, a generic kind of vehicle. We see in the sentence that we have a particular instance of a car, a white car. Obviously, there can be many instances of the class car. White is a defining or distinguishing 'attribute' of the car. There are two 'methods' noted: signaling and turning. We might write the code for a car object like this:

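class Car(object):

    def __init__(self, color):
        self.color = color

    def signal(self, direction):
        # placeholder for the signalling code
        print('The', self.color, 'car signals', direction)

    def turn(self, direction):
        # placeholder for the turning code
        print('The', self.color, 'car turns', direction)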

The scientific way to think about classes

Generally, in science we use objects to store and work with complicated data sets, so it is natural to think of the data structure first, and use that to define the class. The methods are functions that work on this data. The attributes hold the data, and other defining characteristics about the dataset (i.e., metadata). The primary advantage of this approach is that the data are in a specified structure, so that the methods can assume this structure and are thereby more efficient.

For example, consider a (atmospheric, oceanic, geologic) profile of temperature in the vertical axis. We might create a class that would look like:

class Profile(object):
    '''
    Documentation describing the object, in particular how it is instantiated.
    '''
    def __init__(self, z, temp, lat, lon, time):
        self.z = z            # A sequence of values defining the vertical positions of the samples
        self.temp = temp      # A corresponding sequence of temperature values
        self.lat = lat        # The latitude at which the profile was taken
        self.lon = lon        # The longitude at which the profile was taken
        self.time = time      # The time at which the profile was taken

    def mean(self):
        'return the mean of the profile'
        # one simple choice: the unweighted average of the temperature samples
        return sum(self.temp) / len(self.temp)

Note, there could be a number of different choices for how the data are stored, more variables added to the profile, etc. Designing good classes is essential to the art of computer programming. Make classes as small and agile as possible, building up your code from small, flexible building blocks. Classes should be parsimonious and cogent. Avoid bloat.

Classes are traditionally named with a capital letter, sometimes CamelCase, sometimes underlined_words_in_a_row, as opposed to functions, which are traditionally lower case (there are many exceptions to these rules, though). When a class instance is created, the special __init__ function is called to initialize the new instance. Within the class, the attributes are stored in self with a dot and the attribute name. Methods are defined like normal functions, but within the class block, and the first argument is always self.

There are many other special functions that allow you to, for example, overload the addition operator (__add__) or have a representation of the class that resembles the command used to create it (__repr__).

Consider the example of a class defining a point on a 2D plane:


In [78]:
from math import sqrt     # more on importing external packages below

class Point(object):
    
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def norm(self):
        'The distance of the point from the origin'
        return sqrt(self.x**2 + self.y**2)
    
    def dist(self, other):
        'The distance to another point'
        dx = self.x - other.x
        dy = self.y - other.y
        return sqrt(dx**2 + dy**2)
        
    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)
    
    def __repr__(self):
        return 'Point(%f, %f)' % (self.x, self.y)
    

p1 = Point(3.3, 4.)    # a point at location (3.3, 4)
p2 = Point(6., 8.)    # another point, we can have as many as we want..

res = p1.norm()
print('p1.norm() = ', res)

res = p2.norm()
print('p2.norm() = ', res)

res = p1.dist(p2)
res2 = p2.dist(p1)
print('The distance between p1 and p2 is', res)
print('The distance between p2 and p1 is', res2)

p3 = p1+p2
p1


p1.norm() =  5.185556864985669
p2.norm() =  10.0
The distance between p1 and p2 is 4.825971404805461
The distance between p2 and p1 is 4.825971404805461
Out[78]:
Point(3.300000, 4.000000)

Notice that we don't require other to be a Point class instance; it could be any object with x and y attributes. This is known as 'duck typing' and is a useful approach for using multiple different kinds of objects with similar data in the same functions.
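
As a minimal sketch of the point above, the dist method works with an object that is not a Point at all, as long as it has x and y attributes. The Station record type here is hypothetical, and the Point class is trimmed down to what the example needs.

```python
from collections import namedtuple
from math import sqrt

class Point(object):

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def dist(self, other):
        'The distance to any object that has x and y attributes'
        dx = self.x - other.x
        dy = self.y - other.y
        return sqrt(dx**2 + dy**2)

# Station is a hypothetical record type; it is not a Point, but it has x and y.
Station = namedtuple('Station', ['name', 'x', 'y'])

origin = Point(0., 0.)
mooring = Station('mooring_A', 3., 4.)

distance = origin.dist(mooring)
print(distance)    # 5.0
```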

L. Packages

Functions and classes represent code that is intended to be reused over and over. Packages are a way to store and manage this code. Python has a number of 'built-in' classes and functions that we have discussed above. Lists, tuples, and dictionaries; for and while loops; and standard data types are part of every Python session.

There is also a very wide range of packages that you can import that extend the abilities of core Python. There are packages that deal with file input and output, internet communication, numerical processing, etc. One of the nice features about Python is that you only import the packages you need, so that the memory footprint of your code remains lean. Also, there are ways to import code that keep your 'namespace' organized.

Namespaces are one honking great idea -- let's do more of those!

In the same way directories keep your files organized on your computer, namespaces organize your Python environment. There are a number of ways to import packages, for example:


In [79]:
import math     # This imports the math module. Here 'math' is like a subdirectory 
                # in your namespace that holds all of the math functions

In [80]:
math.e
e = 15.7
print(math.e, e)


2.718281828459045 15.7

Exercise

After importing the math package, type math. and hit <TAB> to see all the possible completions. These are the functions available in the math package. Use the math package to calculate the square root of 2.

There are a number of other ways to import things from the math package. Experiment with these commands

from math import tanh  # Import just the `tanh` function. Called as `tanh(x)`
import math as m       # Import the math package, but rename it to `m`. Functions called like `m.sin(x)`
from math import *     # All the functions imported to top level namespace. Functions called like `sin(x)`

This last example makes things easier to use, but is frowned on as it is less clear where different functions come from.
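
A short sketch comparing the first two styles side by side (the variable names are ours): whichever way the import is written, the same underlying module and functions are used.

```python
import math                 # functions are called as math.cos(x)
import math as m            # the same module under a shorter name
from math import tanh       # only tanh is added to the top-level namespace

one = math.cos(0.0)
also_one = m.cos(0.0)
same_module = m is math     # both names point at the very same module object
zero = tanh(0.0)

print(one, also_one, same_module, zero)   # 1.0 1.0 True 0.0
```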

For the rest of the 'Zen of Python' type import this



In [ ]:

One particular package that is central to scientific Python is the numpy package (Numerical Python). We will talk about this package much more in the future, but will outline a few things about the package now. The standard way to import this package is


In [81]:
import numpy as np

The numpy package has many of the same math functions as the math package, but these functions are designed to work with numpy arrays. Arrays are the backbone of the numpy package. For now, just think of them as homogeneous, multidimensional lists.


In [82]:
a = np.array([[1., 2., 3], [4., 5., 6.]])
a


Out[82]:
array([[1., 2., 3.],
       [4., 5., 6.]])

In [83]:
np.sin(a)


Out[83]:
array([[ 0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ]])

Note that we can have two sin functions at the same time, one from the math package and one from the numpy package. This is one of the advantages of namespaces.


In [84]:
math.sin(2.0) == np.sin(2.0)


Out[84]:
True