Programming with Python

Main goal: learning programming principles. Once you master the basic principles of programming, your research workflow can be greatly simplified. Learning the Python language itself is NOT the main goal.


In [ ]:
!git clone https://github.com/sungeunbae/python-files.git

In [ ]:
cd python-files

In [ ]:
!ls

Enter "ipython notebook" to get started

Lesson 1: Analyzing Patient Data


In [ ]:
import numpy

We import modules to make use of other people's work. numpy lets us do fancy things with numbers and matrices.


In [ ]:
numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

We use "." to ask Python to run the function loadtxt that belongs to the numpy library. This dot notation is used everywhere in Python to refer to the parts of things, as in thing.component.


In [ ]:
weight_kg = 55

This assigns 55 to the variable weight_kg.


In [ ]:
print weight_kg

In [ ]:
print 'weight in pounds:', 2.2 * weight_kg

In [ ]:
weight_kg = 57

In [ ]:
weight_lb = 2.2*weight_kg

In [ ]:
print 'weight in kilograms:',weight_kg, ' and in pounds:', weight_lb

Hands-on

Draw diagrams showing what variables refer to what values after each statement in the following program:


In [ ]:
weight = 70.5
age = 35
# Take a trip to the planet Neptune
weight = weight * 1.14
age = age + 20

What does the following program print out?


In [ ]:
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print third, fourth

In [ ]:
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

In [ ]:
print data

In [ ]:
print type(data)

In [ ]:
print data.shape

In [ ]:
data[0,0] #top-left corner

In [ ]:
data[1,1]

In [ ]:
data[1,2]

In [ ]:
print data[0:4, 0:10]

In [ ]:
print data[5:10, 0:10]

In [ ]:
data[:3,36:] #:x is used to indicate from 0 to x (excluding x). x: indicates from x to the end.

In [ ]:
small = data[:3,36:]
print 'small is'
print small

Hands-on


In [ ]:
element = 'oxygen'
print 'first three characters:', element[0:3]
print 'last three characters:', element[3:6]

What is the value of element[:4]? What about element[4:]? Or element[:]? What is element[-1]? What is element[-2]? Given those answers, explain what element[1:-1] does.


In [ ]:
doubledata = data*2.0

In [ ]:
print 'original:'
print data[:3, 36:]
print 'doubledata:'
print doubledata[:3, 36:]

In [ ]:
tripledata = doubledata+data
print tripledata[:3,36:]

In [ ]:
print doubledata+2 #adds 2 to every element; the result has the same shape
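Arithmetic on a numpy array is applied element by element, and a single number is combined with every element. A rough plain-Python sketch of the same idea follows. It uses a small made-up nested list (grid and the other names are illustrative, not part of numpy) and calls print as a function so that it also runs under Python 3:

```python
# Hypothetical 2x3 table standing in for the inflammation array
grid = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0]]

# Like data * 2.0: multiply every element by 2
doubled = [[value * 2.0 for value in row] for row in grid]

# Like doubledata + data: add the matching elements of two tables
tripled = [[d + g for d, g in zip(drow, grow)]
           for drow, grow in zip(doubled, grid)]

# Like doubledata + 2: add 2 to every element
shifted = [[value + 2 for value in row] for row in doubled]

print(doubled)
print(tripled)
print(shifted)
```

numpy does the same work without explicit loops, which is why the array expressions above are so short.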

In [ ]:
print data.mean()

data.shape : attribute (noun)

data.mean() : method, an action you can do with/to "data".


In [ ]:
print 'maximum inflammation:', data.max()
print 'minimum inflammation:', data.min()
print 'standard deviation:',data.std()

How do I know what is available? Try help(data) or look at the Python documentation.


In [ ]:
help(data) #gives the full documentation; sometimes too much!

In [ ]:
dir(data) #gives a list of functions and attributes. Less information

The things listed are either "functions" or "attributes". Don't worry about the names of the form __XXX__; these are system variables or methods, not really meant to be used in everyday programming.

How do you know if it is a function or an attribute?


In [ ]:
type(data.size)

In [ ]:
type(data.shape)

In [ ]:
type(data.std)

Or try help(data.xxxx) for more details.


In [ ]:
help(data.std)
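Another quick check is the built-in callable, which reports whether something can be called with (). The sketch below uses a complex number as a small stand-in object rather than the data array, so the names are only illustrative:

```python
z = 3 + 4j  # a complex number, used only as a small example object

# real is an attribute (a stored value), so it is not callable
is_attribute = not callable(z.real)

# conjugate is a method, so it is callable and needs ()
is_method = callable(z.conjugate)

print(is_attribute)
print(is_method)
print(z.conjugate())
```

The same test should work on the names listed by dir(data), for example callable(data.std).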

Hands-on

Looking at the output of dir(data), can you figure out the function that computes the total of all elements? Can you figure out how to use the function?


In [ ]:
patient_0 = data[0,:] #row 0 and every column, extract everything from row 0
print 'maximum inflammation for patient 0:', patient_0.max()

In [ ]:
print data.mean(axis=0) #average inflammation per day for all patients

In [ ]:
print data.mean(axis=0).shape


In [ ]:
print data.mean(axis=1)

In [ ]:
print data.mean(axis=1).shape
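The axis argument decides which direction is collapsed: axis=0 averages down each column (one value per day), while axis=1 averages across each row (one value per patient). A plain-Python sketch of the two calculations, using a made-up table (table, per_day and per_patient are illustrative names) and print as a function so it also runs under Python 3:

```python
# Hypothetical table: 2 patients (rows) x 3 days (columns)
table = [[1.0, 2.0, 3.0],
         [3.0, 4.0, 8.0]]

n_rows = len(table)
n_cols = len(table[0])

# Like data.mean(axis=0): one average per column (per day)
per_day = [sum(row[c] for row in table) / n_rows for c in range(n_cols)]

# Like data.mean(axis=1): one average per row (per patient)
per_patient = [sum(row) / n_cols for row in table]

print(per_day)
print(per_patient)
```

The lengths of the two results (3 and 2 here) correspond to the shapes printed in the cells above.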

In [ ]:
#next line is very important - otherwise, your notebook will hang forever!!!!
%matplotlib inline 
from matplotlib import pyplot
pyplot.imshow(data) 
pyplot.show() #create a heatmap of our data and show

In [ ]:
ave_inflammation = data.mean(axis=0) # average inflammation over time.  per day for all patients
pyplot.plot(ave_inflammation)  # create a line graph of these values
pyplot.show()

In [ ]:
import numpy as np #alias to reduce typing

from matplotlib import pyplot as plt

data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')

plt.figure(figsize=(10.0, 3.0))

plt.subplot(1, 3, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))

plt.subplot(1, 3, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))

plt.subplot(1, 3, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))

plt.tight_layout()
plt.show()

Challenge

1. Moving plots around

Modify the program to display the three plots on top of one another instead of side by side.


In [ ]:
import numpy as np
from matplotlib import pyplot as plt

data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')

plt.figure(figsize=(3.0, 10.0))

plt.subplot(3, 1, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))

plt.subplot(3, 1, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))

plt.subplot(3, 1, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))

plt.tight_layout()
plt.show()

2. What's inside the box?

Draw diagrams showing what variables refer to what values after each statement in the following program. What does it print?


In [ ]:
mass = 47.5
age=122
mass = mass *2.0
age=age-20
print mass, age

(a) mass, age

(b) 47.5 122

(c) 95.0 102

(d) 95.0
     102

3. Sorting out references

What does the following program print out?


In [ ]:
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print third, fourth

(a) causes an error

(b) Grace Hopper

(c) Hopper Grace

(d) third, fourth

4. Make your own plot

Create a plot showing the standard deviation of the inflammation data for each day across all patients.


In [ ]:
import numpy as np
from matplotlib import pyplot as plt

data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
plt.ylabel('std')
plt.plot(data.std(axis=0))
plt.show()

Lesson 2: Analyzing Multiple Data Sets


In [ ]:
ls *.csv

We have a dozen data sets to process and analyse. Of course, we can repeat the process over and over - but it will be frustrating to do so. If you want to become a good programmer, you have to become lazy. (If you are already lazy, chances are high you are one of those exceptional, natural-born programming talents.) We can teach the computer how to repeat things and let it do the boring job.


In [ ]:
s='christchurch'
print s[0]
print s[1]
print s[2]
print s[3]
print s[4]
print s[5]
print s[6]
print s[7]
print s[8]
print s[9]
print s[10]
print s[11]

In [ ]:
s='taumatawhakatangihangakoauauotamateapokaiwhenuakitanatahu'

In [ ]:
for c in s: #remember indentation
    print c

In [ ]:
length=0 #repeatedly updating this variable
for c in s:
    print c
    length=length+1
print "There are",length,'characters'

In [ ]:
len(s)

Challenge

1. for loop through a string

What does the following code print out?


In [ ]:
s1='Newton'
s2=''
for c in s1:
    s2=c+s2
print s2

(a) (empty)

(b) N+e+w+t+o+n

(c) Newton

(d) notweN

2. Slicing strings

A section of an array is called a slice. We can take slices of character strings as well:


In [ ]:
element = 'oxygen'
print 'first three characters:', element[0:3]
print 'last three characters:', element[3:6]

What is the value of element[:4]? What about element[4:]? Or element[:]?

What is element[-1]? What is element[-2]? Given those answers, explain what element[1:-1] does.


In [ ]:
print element[:4]
print element[4:]
print element[:]
print element[-1]
print element[-2]
print element[1:-1]

========================================================================================================================

List


In [ ]:
odds = [1,3,5,7]
print 'odds are:', odds

In [ ]:
print 'first and last:', odds[0], odds[-1] #last element

In some ways, a list is similar to a string.


In [ ]:
for number in odds:
    print number

In [ ]:
names = ['Newton', 'Darwing','Turing']
print 'names is originally:',names
names[1]='Darwin' #we can update an element of list
print 'final value of names:',names

In [ ]:
name = 'Bell'
name[0]='b' #we can't update a character in a string this way

In [ ]:
odds.append(11)
print 'odds after adding a value:',odds

In [ ]:
del odds[0]
print 'odds after removing the first element:',odds

In [ ]:
odds.reverse()
print 'odds after reversing:',odds

In [ ]:
odds.sort()
print 'odds after sorting:',odds

Challenge

Python has a built-in function called range that creates a list of numbers:

range(3) produces [0, 1, 2], range(2, 5) produces [2, 3, 4].

Write a loop that uses range to print

(1). the first 10 natural numbers

(2). the sum of the first 10 natural numbers


In [ ]:
for i in range(1,11):
    print i

In [ ]:
total = 0 #named total to avoid hiding the built-in sum
for i in range(1,11):
    total += i
print total
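In Python 2, which this lesson uses, range returns a list directly. In Python 3 it returns a lazy range object instead, so wrapping it in list() shows the numbers in either version. A small sketch (the variable names are illustrative):

```python
first_three = list(range(3))     # the numbers 0, 1, 2
two_to_four = list(range(2, 5))  # the numbers 2, 3, 4; the end value is excluded

# The built-in sum adds up the numbers without an explicit loop
total = sum(range(1, 11))

print(first_three)
print(two_to_four)
print(total)
```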

========================================================================================================================

We have covered almost everything we need to know to process multiple data sets. One more thing!

glob : To collect file names matching a pattern


In [ ]:
import glob
print glob.glob('*.csv') #collects files that match the pattern
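glob.glob makes no promise about the order of the names it returns, so it is safer to sort the list before picking "the first few" files. The sketch below builds a throwaway folder with made-up file names so that it runs anywhere; the folder and names are assumptions for illustration only:

```python
import glob
import os
import shutil
import tempfile

# Create a temporary folder holding a few empty, made-up files
folder = tempfile.mkdtemp()
for name in ['b-02.csv', 'a-01.csv', 'c-03.csv', 'notes.txt']:
    open(os.path.join(folder, name), 'w').close()

# Collect only the .csv files and sort them for a predictable order
matches = sorted(glob.glob(os.path.join(folder, '*.csv')))

# Keep the first two and strip the folder part for display
first_two = [os.path.basename(path) for path in matches[:2]]
print(first_two)

# Remove the temporary folder again
shutil.rmtree(folder)
```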

Challenge

1. Draw multiple graphs

Write code that collects the first 3 filenames from the list obtained by


In [ ]:
glob.glob('*.csv')

Convert the following code (copied from above) to process the first 3 inflammation files and draw 3 graphs (i.e. mean, max, min) for each file using a "for" loop.


In [ ]:
%matplotlib inline
#don't forget the line above!
import glob
import numpy as np

from matplotlib import pyplot as plt

#do something here to take the first 3 files from the list.
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
plt.figure(figsize=(10.0, 3.0))

plt.subplot(1, 3, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))

plt.subplot(1, 3, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))

plt.subplot(1, 3, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))

plt.tight_layout()
plt.show()

Challenge

What is the output of the following program?


In [ ]:
v=3
p=1
for i in range(2):
    p = p*v
    
print p

(1) 3

(2) 6

(3) 9

(4) 27

Lesson 3: Making Choices


In [ ]:
num=37
if num > 100:
    print 'greater'
else:
    print 'not greater'
print 'done'

We can add some intelligence to our program to make decisions


In [ ]:
s='taumatawhakatangihangakoauauotamateapokaiwhenuakitanatahu'
for c in s:
    if c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'

In [ ]:
s='msoffice2013'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'

In [ ]:
if (1>0) and (-1>0):
    print 'both parts are true'
else:
    print 'one part is not true'

In [ ]:
if (1<0) or (-1 < 0):
    print 'at least one test is true'
else:
    print 'failed all tests'

Challenge

Look at the output from the code below


In [ ]:
if '': print 'empty string is true'
if 'word': print 'non-empty string is true'
if []: print 'empty list is true'
if [1,2,3]: print 'non-empty list is true'
if 0: print 'zero is true'
if 1: print 'non-zero is true'
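The same rule can be inspected directly with the built-in bool, which converts any value to True or False the way an if statement would. A short sketch, written so that it also runs under Python 3 (samples and truth are illustrative names):

```python
samples = ['', 'word', [], [1, 2, 3], 0, 1]

# bool() applies the same test that "if value:" uses
truth = [bool(value) for value in samples]

for value, flag in zip(samples, truth):
    print('%r -> %s' % (value, flag))
```

Empty strings, empty lists and zero count as false; everything else in this list counts as true.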

What do you think this code will output?


In [ ]:
a=[1,2]
del a[0]
a.append(0)
del a[0]
x= a[0]
if x : print "x is",x

(a) x is 1

(b) x is 0

(c) x is 2

(d) nothing is printed

In-place operators

Python (and most other languages in the C family) provides in-place operators that work like this:


In [ ]:
x=1
x+=1
x*=3
print x

Write some code that sums the positive and negative numbers in a list separately, using in-place operators.


In [ ]:
l=[-3,-1,-2,1,2,3]
psum=0
nsum=0
for v in l:
    pass #do something here (replace pass with your code)
    
print "total of positive numbers is",psum,"and total of negative numbers is",nsum

swap


In [ ]:
left=1
right=2
temp=left
left=right
right=temp
print left, right

In [ ]:
left=1
right=2
left,right = right, left
print left, right

Lesson 4: Creating Functions

Same operation for different inputs


In [ ]:
s='msoffice2013'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'

In [ ]:
s='windows7'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'

In [ ]:
def classify_char(s):
    print s
    for c in s:
        if c in ['1','2','3','4','5','6','7','8','9','0']:
            print c,'is a digit'
        elif c in ['a','e','i','o','u']:
            print c,'is a vowel'
        else:
            print c, 'is a consonant'

classify_char('msoffice2013')
classify_char('windows7')

return

To send a result back to whoever asked for it


In [ ]:
def double(v):
    return 2*v

print double(2)
print double(3) #what happens if you do double([1,2,3])?

In [ ]:
def say_hello(name):
    return "Hello "+name+", how are you?"

print say_hello("Sung")
print say_hello("John")

Challenge

Combining strings

"Adding" two strings produces their concatenation: 'a' + 'b' is 'ab'. Write a function called fence that takes two parameters called original and wrapper and returns a new string that has the wrapper character at the beginning and end of the original:


In [ ]:
def fence(original, wrapper):
    result = '' #do something here: build the fenced string
    return result
print fence('name','*') #expecting *name*

Selecting characters from strings

If the variable s refers to a string, then s[0] is the string's first character and s[-1] is its last. Write a function called outer that returns a string made up of just the first and last characters of its input:


In [ ]:
def outer(a):
    pass #do something here (replace pass with a return statement)
print outer('helium') #expecting hm

Fahrenheit to Kelvin and to Celsius


In [ ]:
def fahr_to_kelvin(temp):
    return ((temp-32)*(5/9))+273.15

In [ ]:
print 'freezing point of water:',fahr_to_kelvin(32)
print 'boiling point of water:',fahr_to_kelvin(212)

Something is wrong! Why?


In [ ]:
(212-32)*(5/9)

In [ ]:
5/9

In [ ]:
5/9.
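The cells above point at the cause: in Python 2, dividing two integers with / discards the fractional part, so 5/9 is 0 and the conversion collapses to 273.15 for every input. The sketch below shows both behaviours. It uses // (floor division) to reproduce the Python 2 integer result, so it gives the same output under Python 3; the variable names are illustrative:

```python
integer_ratio = 5 // 9  # what Python 2 computes for 5/9
float_ratio = 5 / 9.    # a float operand forces true division

# With the integer ratio, the temperature term vanishes
wrong_kelvin = (212 - 32) * integer_ratio + 273.15

# With the float ratio, the conversion behaves as intended
right_kelvin = (212 - 32) * float_ratio + 273.15

print(integer_ratio)
print(wrong_kelvin)
print(right_kelvin)
```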

In [ ]:
def fahr_to_kelvin(temp):
    return ((temp-32)*(5/9.))+273.15

In [ ]:
print 'freezing point of water:',fahr_to_kelvin(32)
print 'boiling point of water:',fahr_to_kelvin(212)

In [ ]:
def kelvin_to_celcius(temp):
    return temp- 273.15

In [ ]:
print 'absolute zero in Celsius:', kelvin_to_celcius(0.0)

In [ ]:
def fahr_to_celcius(temp):
    temp_k = fahr_to_kelvin(temp)
    result = kelvin_to_celcius(temp_k)
    return result

print 'freezing point of water in Celsius', fahr_to_celcius(32.0)

Challenge

Copy the program above that generated 3 graphs for each inflammation file (using a for loop). Create a function "analyze" that accepts one filename as a parameter, loads the data from that file, then displays 3 graphs for it. (No return value is needed.)


In [ ]:
def analyze(filename):
    pass #do something here (replace pass with your code)

Test the function by


In [ ]:
analyze('inflammation-01.csv')

and you should see 3 graphs for inflammation-01.csv. Then complete the code below so that it calls the "analyze" function inside the for loop, each time with a different filename.


In [ ]:
import numpy as np
from matplotlib import pyplot as plt
import glob

filenames = glob.glob('*.csv')

filenames= filenames[0:3]
for f in filenames:
    print f
    #do something here

Lesson 5: Defensive Programming

In an ideal world, you write a program and it works out of the box. Sadly, that doesn't happen often in reality. How do we know if our program is working correctly? And how do we know if it is still working correctly after we make changes to it?

One strategy for writing a correct program is to assume that mistakes will happen and to guard against them.

This is called "defensive programming", which is analogous to "defensive driving": we assume there will be crazy drivers and take extra care to protect ourselves.

The most common way to do this is to add "assertions" to our code so that it checks itself as it runs.


In [17]:
def avg_age(ages):
    sum = 0.0
    for v in ages:
        sum= sum+v
    print sum/len(ages)

avg_age([10,30,20])
avg_age([10,-30,20])


20.0
0.0

An assertion is simply a statement that something must be true at a certain point in a program. Python evaluates the assertion's condition: if it is true, Python does nothing; if it is false, Python halts the program immediately and prints the error message.

When you drive defensively through an intersection, you check left and right. If there is a crazy driver at the intersection, what do you do? Press the brake, make sure you are OK first, then roll down the window and say some nice words to the offender. :) An assertion works just like that: it stops the program and reports an error message.


In [18]:
def avg_age(ages):
    sum = 0.0
    for v in ages:
        assert v >= 0, "invalid age:"+str(v)
        sum= sum+v
    print sum/len(ages)

avg_age([10,30,20])
avg_age([10,-30,20])


20.0
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-18-45ef73a187e8> in <module>()
      7 
      8 avg_age([10,30,20])
----> 9 avg_age([10,-30,20])

<ipython-input-18-45ef73a187e8> in avg_age(ages)
      2     sum = 0.0
      3     for v in ages:
----> 4         assert v >= 0, "invalid age:"+str(v)
      5         sum= sum+v
      6     print sum/len(ages)

AssertionError: invalid age:-30
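An AssertionError is an ordinary exception, so the message after the comma can be captured and inspected. A minimal sketch, using a hypothetical helper check_age that is not part of the lesson files, written so that it also runs under Python 3:

```python
def check_age(age):
    # Hypothetical helper: refuse negative ages
    assert age >= 0, "invalid age:" + str(age)
    return age

try:
    check_age(-30)
    message = None
except AssertionError as error:
    # str(error) is the text given after the comma in the assert
    message = str(error)

print(message)
```

Note that Python skips assert statements entirely when it is started with the -O option, so assertions suit self-checks rather than checks the program cannot work without.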

Challenge

Expected output?


In [ ]:
def seq_check(s):
    for c in s:
        assert c in ['a','t','c','g'], "invalid input:"+str(c)
    print "all good:"+s
    
seq_check('atcg')
seq_check('bcgt')
seq_check('ctga')

(a)

all good:atcg
all good:bcgt
all good:ctga


(b)

(AssertionError)

(c)

all good:atcg
(AssertionError)

(d)

all good:atcg
(AssertionError)
all good:ctga

Lesson 6: Command-Line Programs

So far we have used the IPython notebook, which is a great tool for prototyping code and for training. But for more serious work in Python, we need to learn how to write a standalone program. For example, open a Bash terminal and run the following command (in the same directory as the .csv files):


In [ ]:
"""""""""""""""""""""
$ python readings-02.py inflammation-01.csv
"""""""""""""""""""""

If you can run your Python program in the terminal, you can combine it with shell commands!


In [ ]:
"""""""""""""""""""""
$ python readings-02.py inflammation-01.csv |head -4
5.45
5.425
6.1
5.9
"""""""""""""""""""""

Now let's learn how to write a command-line Python program. Using a text editor, open argv-list.py, then run the commands below in a Bash terminal.


In [ ]:
#argv-list.py 
import sys
print 'sys.argv is', sys.argv

In [ ]:
"""""""""""""""""""""
$ python argv-list.py 

sys.argv is ['argv-list.py']
"""""""""""""""""""""

In [ ]:
"""""""""""""""""""""
$ python argv-list.py first second third

sys.argv is ['argv-list.py', 'first', 'second', 'third']
"""""""""""""""""""""

sys.argv[0] is always the program file name when a script is run this way; sys.argv[1:] holds all the arguments you pass to the program.
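Because sys.argv is an ordinary list, the script name and the arguments can be separated with indexing and slicing. A small sketch with a hypothetical helper split_argv and a made-up argument list shaped like the run above:

```python
def split_argv(argv):
    # Hypothetical helper: separate the script name from its arguments
    script = argv[0]
    arguments = argv[1:]
    return script, arguments

# A made-up list standing in for sys.argv
script, arguments = split_argv(['argv-list.py', 'first', 'second', 'third'])

print(script)
print(arguments)
```

When no arguments are given, the slice is simply an empty list, so a loop over it does nothing instead of failing.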

Open readings-01.py from the text editor and examine the code.


In [ ]:
#readings-01.py 
import sys
import numpy as np

def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = np.loadtxt(filename, delimiter=',')
    for m in data.mean(axis=1): #mean in the same row
        print m

Let's play with the data files small-01.csv to small-03.csv.


In [ ]:
"""""""""""""""""""""
$ cat small-01.csv
0,0,1
0,1,2
$ cat small-02.csv
9,17,15
20,8,5
$ cat small-03.csv
0,2,0
1,1,0
"""""""""""""""""""""

Let's run our program with one of these files as input.


In [ ]:
"""""""""""""""""""""
$ python readings-01.py small-01.csv
"""""""""""""""""""""

No output - because the main() function was defined, but not called.


In [ ]:
#readings-02.py 
import sys
import numpy as np

def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = np.loadtxt(filename, delimiter=',')
    for m in data.mean(axis=1):
        print m

main() # <==== Call the function to do some action!!
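readings-02.py calls main() unconditionally. A common alternative convention, shown here only as a sketch and not taken from the lesson files, is to guard the call so that it runs when the file is executed as a program but not when it is imported from another file. In this sketch main takes the argument list as a parameter, which is an assumption made to keep the example easy to test:

```python
import sys

def main(argv):
    # Return the arguments that follow the script name
    return argv[1:]

# __name__ is '__main__' only when this file is run directly
if __name__ == '__main__':
    print(main(sys.argv))
```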

In [ ]:
"""""""""""""""""""""
$ python readings-02.py small-01.csv
0.333333333333
1.0
$ python readings-02.py small-02.csv
13.6666666667
11.0
$ python readings-02.py small-03.csv
0.666666666667
0.666666666667
"""""""""""""""""""""

Challenge

Write a program that behaves like this


In [ ]:
"""""""""""""""""""""
$ python readings.py small-01.csv small-02.csv small-03.csv
0.333333333333
1.0
13.6666666667
11.0
0.666666666667
0.666666666667
"""""""""""""""""""""

Note

For more advanced interaction with a command-line Python program, consider using the argparse library. (https://docs.python.org/2/howto/argparse.html)
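As a taste of argparse, the sketch below accepts one or more file names. The description text and the argument name filenames are illustrative choices, and the list passed to parse_args stands in for what would normally come from the command line:

```python
import argparse

parser = argparse.ArgumentParser(description='Print the mean of each row.')

# nargs='+' collects one or more file names into a list
parser.add_argument('filenames', nargs='+', help='CSV files to read')

# With no argument, parse_args() reads sys.argv[1:] instead
args = parser.parse_args(['small-01.csv', 'small-02.csv'])

print(args.filenames)
```

Compared with reading sys.argv by hand, argparse also generates a usage message and reports an error when no file name is given.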

Wrapping up

Find the Python program "show_graphs.py" and examine the code.


In [ ]:
import sys
import numpy as np
from matplotlib import pyplot as plt
import glob

def display(files):
  plt.figure(figsize=(10.0, 3.0))
  for f in files:
    data = np.loadtxt(fname=f, delimiter=',')
    plt.subplot(1, 3, 1)
    plt.ylabel('average')
    plt.plot(data.mean(axis=0))

    plt.subplot(1, 3, 2)
    plt.ylabel('max')
    plt.plot(data.max(axis=0))

    plt.subplot(1, 3, 3)
    plt.ylabel('min')
    plt.plot(data.min(axis=0))

  plt.tight_layout()
  plt.show()


files = sys.argv[1:]
display(files)

Run the program with the following command and examine the output.


In [ ]:
"""""""""""""""""""""
$ python show_graphs.py inflammation-01.csv inflammation-02.csv inflammation-03.csv inflammation-04.csv 
"""""""""""""""""""""