Programming with Python

Main goal: learning programming principles. Once you master the basic principles of programming, your research workflow can be greatly simplified. Learning the Python language itself is NOT the main goal.



In [ ]:

    
git clone https://github.com/sungeunbae/python-files.git



In [ ]:

    
cd python-files



In [ ]:

    
!ls

Enter "ipython notebook" to get started

Lesson 1: Analyzing Patient Data



In [ ]:

    
import numpy

We import modules to make use of other people's work. numpy lets us do fancy things with numbers and matrices.



In [ ]:

    
numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

We use "." to ask Python to run the function loadtxt that belongs to the numpy library. The dot is used everywhere in Python to refer to the parts of things, as thing.component.
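
The same thing.component pattern works for any module, not just numpy. Here is a minimal sketch using the standard-library math module (chosen only for illustration; it is not part of the patient data analysis). Like the other sketches added to these notes, it is written to run under both Python 2 and Python 3.

```python
import math

# module.function: ask the math module to run its sqrt function
root = math.sqrt(16.0)             # 4.0

# module.constant: a value stored inside the module
circumference = 2 * math.pi * 1.0  # circle of radius 1

# The dot also reaches into values: a string has its own components
shout = 'numpy'.upper()            # 'NUMPY'
```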



In [ ]:

    
weight_kg = 55

Assigning 55 to a variable weight_kg



In [ ]:

    
print weight_kg



In [ ]:

    
print 'weight in pounds:', 2.2 * weight_kg



In [ ]:

    
weight_kg = 57



In [ ]:

    
weight_lb = 2.2*weight_kg



In [ ]:

    
print 'weight in kilograms:',weight_kg, ' and in pounds:', weight_lb

Hands-on

Draw diagrams showing what variables refer to what values after each statement in the following program:



In [ ]:

    
weight = 70.5
age = 35
# Take a trip to the planet Neptune
weight = weight * 1.14
age = age + 20

What does the following program print out?



In [ ]:

    
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print third, fourth



In [ ]:

    
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')



In [ ]:

    
print data



In [ ]:

    
print type(data)



In [ ]:

    
print data.shape



In [ ]:

    
data[0,0] #top-left corner



In [ ]:

    
data[1,1]



In [ ]:

    
data[1,2]



In [ ]:

    
print data[0:4, 0:10]



In [ ]:

    
print data[5:10, 0:10]



In [ ]:

    
data[:3,36:] #:x is used to indicate from 0 to x (excluding x). x: indicates from x to the end.



In [ ]:

    
small = data[:3,36:]
print 'small is'
print small
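
The slice rules are easy to check on a short Python list, where the result can be read at a glance. A minimal sketch with made-up values (the list readings is hypothetical, not taken from the inflammation files):

```python
readings = [10, 11, 12, 13, 14, 15]

first_three = readings[:3]    # from index 0 up to, but not including, 3
from_three = readings[3:]     # from index 3 to the end
middle = readings[2:4]        # indices 2 and 3 only

middle_length = len(middle)   # 2, i.e. 4 - 2
```

The same start:stop rule applies to each of the two positions inside data[rows, columns].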

Hands-on



In [ ]:

    
element = 'oxygen'
print 'first three characters:', element[0:3]
print 'last three characters:', element[3:6]

What is the value of element[:4]? What about element[4:]? Or element[:]? What is element[-1]? What is element[-2]? Given those answers, explain what element[1:-1] does.



In [ ]:

    
doubledata = data*2.0



In [ ]:

    
print 'original:'
print data[:3, 36:]
print 'doubledata:'
print doubledata[:3, 36:]



In [ ]:

    
tripledata = doubledata+data
print tripledata[:3,36:]



In [ ]:

    
print doubledata+2 #adds 2 to every element; the result has the same shape as doubledata



In [ ]:

    
print data.mean()

data.shape : attribute (noun)

data.mean() : method, an action you can do with/to "data".



In [ ]:

    
print 'maximum inflammation:', data.max()
print 'minimum inflammation:', data.min()
print 'standard deviation:',data.std()

How do I know what is available? Try help(data) or look at the documentation.



In [ ]:

    
help(data) #gives the full documentation. Sometimes too much!



In [ ]:

    
dir(data) #gives a list of functions and attributes. Less information

Things listed are either "functions" or "attributes". Don't worry about the names of the form __xxx__; these are special variables or methods used by Python itself, not really meant for everyday programming.

How do you know if it is a function or an attribute?



In [ ]:

    
type(data.size)



In [ ]:

    
type(data.shape)



In [ ]:

    
type(data.std)

Or try help(data.xxxx) for more details.



In [ ]:

    
help(data.std)
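
Another quick check is the built-in callable, which reports whether something can be called with parentheses. A minimal sketch, using a complex number instead of the numpy array (an assumption made so the example needs no data file):

```python
z = 3 + 4j

# An attribute is a stored value: no parentheses
real_part = z.real                 # 3.0

# A method is an action: call it with parentheses
mirrored = z.conjugate()           # (3-4j)

is_real_callable = callable(z.real)            # False: attribute
is_conjugate_callable = callable(z.conjugate)  # True: method
```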

Hands-on

Looking at the output of dir(data), can you figure out the function that computes the total of all elements? Can you figure out how to use the function?



In [ ]:

    
patient_0 = data[0,:] #row 0 and every column, extract everything from row 0
print 'maximum inflammation for patient 0:', patient_0.max()



In [ ]:

    
print data.mean(axis=0) #average inflammation per day across all patients



In [ ]:

    
print data.mean(axis=0).shape



In [ ]:

    
print data.mean(axis=1)



In [ ]:

    
print data.mean(axis=1).shape
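
To see what the axis argument computes, the same averages can be worked out by hand on a tiny table. This is a pure-Python sketch with made-up numbers (the table below is hypothetical: 2 patients by 3 days); it only mimics the result and is not how numpy works internally. It uses for loops, which Lesson 2 introduces.

```python
table = [[0.0, 1.0, 2.0],    # patient 0, days 0..2
         [2.0, 3.0, 4.0]]    # patient 1, days 0..2

n_rows = len(table)
n_cols = len(table[0])

# Like data.mean(axis=0): average down each column, one value per day
mean_per_day = []
for col in range(n_cols):
    total = 0.0
    for row in range(n_rows):
        total += table[row][col]
    mean_per_day.append(total / n_rows)

# Like data.mean(axis=1): average across each row, one value per patient
mean_per_patient = []
for row in table:
    total = 0.0
    for value in row:
        total += value
    mean_per_patient.append(total / n_cols)
```

So axis=0 gives one entry per column and axis=1 gives one entry per row, which is what the two .shape cells above report.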



In [ ]:

    
#next line is very important - otherwise, your notebook will hang forever!!!!
%matplotlib inline 
from matplotlib import pyplot
pyplot.imshow(data) #create a heatmap of our data
pyplot.show() #display the heatmap



In [ ]:

    
ave_inflammation = data.mean(axis=0) # average inflammation over time.  per day for all patients
pyplot.plot(ave_inflammation)  # create a line graph of these values
pyplot.show()



In [ ]:

    
import numpy as np #alias to reduce typing

from matplotlib import pyplot as plt

data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')

plt.figure(figsize=(10.0, 3.0))

plt.subplot(1, 3, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))

plt.subplot(1, 3, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))

plt.subplot(1, 3, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))

plt.tight_layout()
plt.show()

Challenge

1. Moving plots around

Modify the program to display the three plots on top of one another instead of side by side.



In [ ]:

    
import numpy as np
from matplotlib import pyplot as plt

data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')

plt.figure(figsize=(3.0, 10.0))

plt.subplot(3, 1, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))

plt.subplot(3, 1, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))

plt.subplot(3, 1, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))

plt.tight_layout()
plt.show()

2. What's inside the box?

Draw diagrams showing what variables refer to what values after each statement in the following program



In [ ]:

    
mass = 47.5
age=122
mass = mass *2.0
age=age-20
print mass, age

(a) mass, age

(b) 47.5 122

(d) 95.0
102

3. Sorting out references

What does the following program print out?



In [ ]:

    
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print third, fourth

(a) causes an error

(b) Grace Hopper

(d) third, fourth

4. Make your own plot

Create a plot showing the standard deviation of the inflammation data for each day across all patients.



In [ ]:

    
import numpy as np
from matplotlib import pyplot as plt

data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
plt.ylabel('std')
plt.plot(data.std(axis=0))
plt.show()

Lesson 2: Analyzing Multiple Data Sets



In [ ]:

    
ls *.csv

We have a dozen data sets to process and analyse. Of course, we could repeat the same steps over and over, but that would be frustrating. If you want to become a good programmer, you have to become lazy. (If you are already lazy, chances are high that you are one of those exceptional, natural-born programming talents.) We can teach the computer how to repeat things and let it do the boring job.



In [ ]:

    
s='christchurch'
print s[0]
print s[1]
print s[2]
print s[3]
print s[4]
print s[5]
print s[6]
print s[7]
print s[8]
print s[9]
print s[10]
print s[11]

Only in New Zealand



In [ ]:

    
s='taumatawhakatangihangakoauauotamateapokaiwhenuakitanatahu'



In [ ]:

    
for c in s: #remember indentation
    print c



In [ ]:

    
length=0 #repeatedly updating this variable
for c in s:
    print c
    length=length+1
print "There are",length,'characters'



In [ ]:

    
len(s)

Challenge

1. for loop through a string

What does the following code print out?



In [ ]:

    
s1='Newton'
s2=''
for c in s1:
    s2=c+s2
print s2

(a) (empty)

(b) N+e+w+t+o+n

(d) notweN

2. Slicing strings

A section of an array is called a slice. We can take slices of character strings as well:



In [ ]:

    
element = 'oxygen'
print 'first three characters:', element[0:3]
print 'last three characters:', element[3:6]

What is the value of element[:4]? What about element[4:]? Or element[:]?

What is element[-1]? What is element[-2]? Given those answers, explain what element[1:-1] does.



In [ ]:

    
print element[:4]
print element[4:]
print element[:]
print element[-1]
print element[-2]
print element[1:-1]

========================================================================================================================

List



In [ ]:

    
odds = [1,3,5,7]
print 'odds are:', odds



In [ ]:

    
print 'first and last:', odds[0], odds[-1] #last element

In some ways, a list is similar to a string: both can be indexed and looped over



In [ ]:

    
for number in odds:
    print number



In [ ]:

    
names = ['Newton', 'Darwing','Turing']
print 'names is originally:',names
names[1]='Darwin' #we can update an element of list
print 'final value of names:',names



In [ ]:

    
name = 'Bell'
name[0]='b' #we can't update a character in a string this way
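
Strings are immutable, so the assignment above raises a TypeError. The usual workaround is to build a new string and give it the same name. A minimal sketch (the try/except is there only so the failed assignment does not stop the cell):

```python
name = 'Bell'

# Assigning to one character raises TypeError, because strings are immutable
try:
    name[0] = 'b'
    changed_in_place = True
except TypeError:
    changed_in_place = False

# Instead, build a new string from the pieces and rebind the variable
name = 'b' + name[1:]
```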



In [ ]:

    
odds.append(11)
print 'odds after adding a value:',odds



In [ ]:

    
del odds[0]
print 'odds after removing the first element:',odds



In [ ]:

    
odds.reverse()
print 'odds after reversing:',odds



In [ ]:

    
odds.sort()
print 'odds after sorting:',odds

Challenge

Python has a built-in function called range that creates a list of numbers:

range(3) produces [0, 1, 2], range(2, 5) produces [2, 3, 4].

Using range, write a loop that prints

(1). the first 10 natural numbers

(2). the total sum of first 10 natural numbers



In [ ]:

    
for i in range(1,11):
    print i



In [ ]:

    
total = 0 #named total rather than sum, so Python's built-in sum function is not hidden
for i in range(1,11):
    total += i
print total
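
range also accepts an optional third argument, the step between numbers. A short sketch of the three forms; list(...) is wrapped around each call only so the values are also visible as a list under Python 3, where range no longer returns a list:

```python
first_three = list(range(3))          # [0, 1, 2]
two_to_four = list(range(2, 5))       # [2, 3, 4]
odd_numbers = list(range(1, 10, 2))   # [1, 3, 5, 7, 9]: start, stop, step
```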

========================================================================================================================

We have covered almost everything we need to know to process multiple data sets. One more thing!

glob : To collect file names matching a pattern



In [ ]:

    
import glob
print glob.glob('*.csv') #collects files that match the pattern
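
glob does not promise any particular order for its results, so sorting them makes runs repeatable. A self-contained sketch: it creates a temporary directory with three empty files, whose names are made up for the example, and then matches them.

```python
import glob
import os
import shutil
import tempfile

# Build a throwaway directory with hypothetical file names
workdir = tempfile.mkdtemp()
for fname in ['data-02.csv', 'data-01.csv', 'notes.txt']:
    open(os.path.join(workdir, fname), 'w').close()

# '*' matches any run of characters, so only the .csv files are collected
pattern = os.path.join(workdir, '*.csv')
matches = sorted(glob.glob(pattern))
basenames = [os.path.basename(m) for m in matches]

shutil.rmtree(workdir)  # clean up the temporary files
```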

Challenge

1. Draw multiple graphs

Write code that collects the first 3 filenames from the list obtained by



In [ ]:

    
glob.glob('*.csv')

Convert the following code (copied from above) to process the first 3 inflammation files and draw 3 graphs (i.e. mean, max, min) for each file using a "for" loop



In [ ]:

    
%matplotlib inline
#don't forget the line above!
import glob
import numpy as np

from matplotlib import pyplot as plt

#do something here to take the first 3 files from the list.
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
plt.figure(figsize=(10.0, 3.0))

plt.subplot(1, 3, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))

plt.subplot(1, 3, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))

plt.subplot(1, 3, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))

plt.tight_layout()
plt.show()

Challenge

What is the output of the following program?



In [ ]:

    
v=3
p=1
for i in range(2):
    p = p*v
    
print p

(1) 3

(2) 6

(3) 9

(4) 27

Lesson 3: Making Choices



In [ ]:

    
num=37
if num > 100:
    print 'greater'
else:
    print 'not greater'
print 'done'

We can add some intelligence to our program to make decisions



In [ ]:

    
s='taumatawhakatangihangakoauauotamateapokaiwhenuakitanatahu'
for c in s:
    if c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'



In [ ]:

    
s='msoffice2013'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'
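
The membership tests can be written more compactly, because in also works on strings and every string has an isdigit method. A sketch of the same three-way choice on a shorter, made-up input; like the cell above, it labels anything that is neither a digit nor a vowel as a consonant:

```python
labels = []
for c in 'so2013':
    if c.isdigit():            # True for '0'..'9'
        labels.append('digit')
    elif c in 'aeiou':         # membership test on a string, one character at a time
        labels.append('vowel')
    else:
        labels.append('consonant')
```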



In [ ]:

    
if (1>0) and (-1>0):
    print 'both parts are true'
else:
    print 'one part is not true'



In [ ]:

    
if (1<0) or (-1 < 0):
    print 'at least one test is true'
else:
    print 'failed all tests'

Challenge

Look at the output from the code below



In [ ]:

    
if '': print 'empty string is true'
if 'word': print 'non-empty string is true'
if []: print 'empty list is true'
if [1,2,3]: print 'non-empty list is true'
if 0: print 'zero is true'
if 1: print 'non-zero is true'

What do you think this code will output?



In [ ]:

    
a=[1,2]
del a[0]
a.append(0)
del a[0]
x= a[0]
if x : print "x is",x

(a) x is 1

(b) x is 0

(d) <- nothing

In-place operators

Python (and most other languages in the C family) provides in-place operators that work like this:



In [ ]:

    
x=1
x+=1
x*=3
print x
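
Each in-place operator is shorthand for an ordinary assignment, and the same shorthand works on strings and lists. A small sketch with made-up values:

```python
total = 10
total -= 4        # same as total = total - 4
total /= 2.0      # same as total = total / 2.0

word = 'ab'
word += 'c'       # strings: builds the new string 'abc'

items = [1, 2]
items += [3]      # lists: extends the list with the new element
```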

Write some code that sums the positive and negative numbers in a list separately, using in-place operators.



In [ ]:

    
l=[-3,-1,-2,1,2,3]
psum=0
nsum=0
for v in l:
    pass #do something here (replace pass with your code)
    
print "total of positive numbers is",psum,"and total of negative numbers is",nsum

swap



In [ ]:

    
left=1
right=2
temp=left
left=right
right=temp
print left, right



In [ ]:

    
left=1
right=2
left,right = right, left
print left, right

Lesson 4: Creating Functions

Same operation for different inputs



In [ ]:

    
s='msoffice2013'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'



In [ ]:

    
s='windows7'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'



In [ ]:

    
def classify_char(s):
    print s
    for c in s:
        if c in ['1','2','3','4','5','6','7','8','9','0']:
            print c,'is a digit'
        elif c in ['a','e','i','o','u']:
            print c,'is a vowel'
        else:
            print c, 'is a consonant'

classify_char('msoffice2013')
classify_char('windows7')

return

To send a result back to whoever asked for it



In [ ]:

    
def double(v):
    return 2*v

print double(2)
print double(3) #what happens if you do double([1,2,3])?



In [ ]:

    
def say_hello(name):
    return "Hello "+name+", how are you?"

print say_hello("Sung")
print say_hello("John")
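
A function that computes a value but never returns it hands back None, which is an easy mistake to make. A minimal sketch contrasting the two cases (both function names are hypothetical):

```python
def double_and_return(v):
    return 2 * v           # hands the value back to the caller

def double_without_return(v):
    result = 2 * v         # computed, but never sent back

kept = double_and_return(4)       # 8
lost = double_without_return(4)   # None: nothing was returned
```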

Challenge

Combining strings

"Adding" two strings produces their concatenation: 'a' + 'b' is 'ab'. Write a function called fence that takes two parameters called original and wrapper and returns a new string that has the wrapper character at the beginning and end of the original:



In [ ]:

    
def fence(original, wrapper):
    #do something here
    return result
print fence('name','*') #expecting *name*

Selecting characters from strings

If the variable s refers to a string, then s[0] is the string's first character and s[-1] is its last. Write a function called outer that returns a string made up of just the first and last characters of its input:



In [ ]:

    
def outer(a):
    pass #do something here (replace pass with your code)
print outer('helium') #expecting hm

Fahrenheit to Kelvin and to Celsius



In [ ]:

    
def fahr_to_kelvin(temp):
    return ((temp-32)*(5/9))+273.15



In [ ]:

    
print 'freezing point of water:',fahr_to_kelvin(32)
print 'boiling point of water:',fahr_to_kelvin(212)

Something is wrong! Why?



In [ ]:

    
(212-32)*(5/9)



In [ ]:

    
5/9



In [ ]:

    
5/9.



In [ ]:

    
def fahr_to_kelvin(temp):
    return ((temp-32)*(5/9.))+273.15



In [ ]:

    
print 'freezing point of water:',fahr_to_kelvin(32)
print 'boiling point of water:',fahr_to_kelvin(212)
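
The cause is integer division: in Python 2, dividing one integer by another with / throws away the fractional part, so 5/9 is 0. Python 3 changed / to always give a float. The sketch below uses //, which truncates in both versions, to reproduce the Python 2 behaviour portably; the two function names are hypothetical variants of fahr_to_kelvin.

```python
truncated = 5 // 9        # 0: what 5/9 gives in Python 2
exact = 5 / 9.0           # one float operand forces real division

def fahr_to_kelvin_buggy(temp):
    # Reproduces the bug: the scale factor is 0, so temp has no effect
    return ((temp - 32) * (5 // 9)) + 273.15

def fahr_to_kelvin_fixed(temp):
    return ((temp - 32) * (5 / 9.0)) + 273.15

boiling_buggy = fahr_to_kelvin_buggy(212)   # 273.15, same as freezing
boiling_fixed = fahr_to_kelvin_fixed(212)   # about 373.15
```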



In [ ]:

    
def kelvin_to_celcius(temp):
    return temp- 273.15



In [ ]:

    
print 'absolute zero in Celsius:', kelvin_to_celcius(0.0)



In [ ]:

    
def fahr_to_celcius(temp):
    temp_k = fahr_to_kelvin(temp)
    result = kelvin_to_celcius(temp_k)
    return result

print 'freezing point of water in Celsius', fahr_to_celcius(32.0)

Challenge

Copy the program above that generated 3 graphs for each inflammation file (using for loop). Create a function "analyze" that accepts one filename as a parameter, opens the file and stores the value, then displays 3 graphs for that file. (no return needed)



In [ ]:

    
def analyze(filename):
    pass #do something here (replace pass with your code)

Test the function by



In [ ]:

    
analyze('inflammation-01.csv')

and you should see 3 graphs for inflammation-01.csv. Then complete the code below so that it calls the "analyze" function inside the for loop, each time with a different filename



In [ ]:

    
import numpy as np
from matplotlib import pyplot as plt
import glob

filenames = glob.glob('*.csv')

filenames= filenames[0:3]
for f in filenames:
    print f
    #do something here

Lesson 5: Defensive Programming

In an ideal world, you write a program and it works out of the box. Sadly, that doesn't happen often in reality. How do we know if our program is working correctly? And how do we know if it is still working correctly after we make changes to it?

One strategy for writing a correct program starts with the assumption that mistakes will happen and guards against them.

This is called "defensive programming", by analogy with "defensive driving": we assume there will be crazy drivers and take extra care to protect ourselves.

The most common way to do it is to add "assertions" to our code so that it checks itself as it runs.



In [17]:

    
def avg_age(ages):
    sum = 0.0
    for v in ages:
        sum= sum+v
    print sum/len(ages)

avg_age([10,30,20])
avg_age([10,-30,20])

An assertion is simply a statement that something must be true at a certain point in a program. Python evaluates the assertion's condition: if it is true, Python does nothing, but if it is false, Python halts the program immediately and prints the error message.

When you cross an intersection and drive defensively, you check left and right. If there is a crazy driver at the intersection, what do you do? Press the brake, make sure you are OK first, then roll down the window and say some nice words to the offender. :) An assertion works just like that: it stops the program and prints an error message.



In [18]:

    
def avg_age(ages):
    sum = 0.0
    for v in ages:
        assert v >= 0, "invalid age:"+str(v)
        sum= sum+v
    print sum/len(ages)

avg_age([10,30,20])
avg_age([10,-30,20])

20.0

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-18-45ef73a187e8> in <module>()
      7 
      8 avg_age([10,30,20])
----> 9 avg_age([10,-30,20])

<ipython-input-18-45ef73a187e8> in avg_age(ages)
      2     sum = 0.0
      3     for v in ages:
----> 4         assert v >= 0, "invalid age:"+str(v)
      5         sum= sum+v
      6     print sum/len(ages)

AssertionError: invalid age:-30
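
The text after the comma in an assert statement becomes the message of the AssertionError. The sketch below catches the error only so that the message can be inspected; in normal use you let it stop the program. The helper check_age is hypothetical.

```python
def check_age(v):
    # The message after the comma is attached to the AssertionError
    assert v >= 0, "invalid age:" + str(v)
    return v

valid = check_age(20)          # passes silently and returns 20

try:
    check_age(-30)
    message = None
except AssertionError as err:
    message = str(err)         # 'invalid age:-30'
```

Note that Python skips assert statements entirely when it is started with the -O option, so assertions are a safety net for development rather than a replacement for input validation.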

Challenge

Expected output?



In [ ]:

    
def seq_check(s):
    for c in s:
        assert c in ['a','t','c','g'], "invalid input:"+str(c)
    print "all good:"+s
    
seq_check('atcg')
seq_check('bcgt')
seq_check('ctga')

(a)

all good:atcg
all good:bcgt
all good:ctga

(b)

(AssertionError)

(c)

all good:atcg
(AssertionError)

(d)

all good:atcg
(AssertionError)
all good:ctga

Lesson 6: Command-Line Programs

We have been using the IPython notebook, a great tool for prototyping code and for training. But if we want to do more serious work in Python, we need to learn how to write a standalone program. For example, open a Bash terminal and run the following command (in the same directory as all the .csv files)



In [ ]:

    
"""""""""""""""""""""
$ python readings-02.py inflammation-01.csv
"""""""""""""""""""""

If you can run your Python program in the terminal, you can combine it with shell commands!



In [ ]:

    
"""""""""""""""""""""
$ python readings-02.py inflammation-01.csv |head -4
5.45
5.425
6.1
5.9
"""""""""""""""""""""

Now let's learn how to write a command-line Python program. Using a text editor, open argv-list.py, then run the commands shown below in a Bash terminal.



In [ ]:

    
#argv-list.py 
import sys
print 'sys.argv is', sys.argv



In [ ]:

    
"""""""""""""""""""""
$ python argv-list.py 

sys.argv is ['argv-list.py']
"""""""""""""""""""""



In [ ]:

    
"""""""""""""""""""""
$ python argv-list.py first second third

sys.argv is ['argv-list.py', 'first', 'second', 'third']
"""""""""""""""""""""

When a script is run this way, sys.argv[0] is ALWAYS the program file name; sys.argv[1:] holds all the arguments you passed to the program.
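
The split between the script name and its arguments can be tried without leaving the notebook by passing a list in place of sys.argv. A minimal sketch; the helper split_argv is hypothetical:

```python
def split_argv(argv):
    # argv[0] is the script name; everything after it is an argument
    script = argv[0]
    arguments = argv[1:]
    return script, arguments

# Simulates: python argv-list.py first second third
script, arguments = split_argv(['argv-list.py', 'first', 'second', 'third'])

# With no arguments the slice is simply empty, whereas argv[1] would raise IndexError
_, no_arguments = split_argv(['argv-list.py'])
```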

Open readings-01.py from the text editor and examine the code.



In [ ]:

    
#readings-01.py 
import sys
import numpy as np

def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = np.loadtxt(filename, delimiter=',')
    for m in data.mean(axis=1): #mean in the same row
        print m

Let's experiment with the data files small-01.csv, small-02.csv and small-03.csv.



In [ ]:

    
"""""""""""""""""""""
$ cat small-01.csv
0,0,1
0,1,2
$ cat small-02.csv
9,17,15
20,8,5
$ cat small-03.csv
0,2,0
1,1,0
"""""""""""""""""""""

Let's run our program with one of these files as input



In [ ]:

    
"""""""""""""""""""""
$ python readings-01.py small-01.csv
"""""""""""""""""""""

No output - because the main() function was defined, but not called.
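
The difference between defining and calling can be seen in a few lines. In this sketch, main records that it ran in a list (the list calls is an assumption made so the effect is easy to observe):

```python
calls = []

def main():
    calls.append('main ran')

# Defining main does not run its body
calls_after_def = len(calls)      # 0

main()                            # only a call executes the body
calls_after_call = len(calls)     # 1
```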



In [ ]:

    
#readings-02.py 
import sys
import numpy as np

def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = np.loadtxt(filename, delimiter=',')
    for m in data.mean(axis=1):
        print m

main() # <==== Call the function to do some action!!



In [ ]:

    
"""""""""""""""""""""
$ python readings-02.py small-01.csv
0.333333333333
1.0
$ python readings-02.py small-02.csv
13.6666666667
11.0
$ python readings-02.py small-03.csv
0.666666666667
0.666666666667
"""""""""""""""""""""

Challenge

Write a program that behaves like this



In [ ]:

    
"""""""""""""""""""""
$ python readings.py small-01.csv small-02.csv small-03.csv
0.333333333333
1.0
13.6666666667
11.0
0.666666666667
0.666666666667
"""""""""""""""""""""

Note

For more advanced interaction with a command-line Python program, consider using the argparse library. (https://docs.python.org/2/howto/argparse.html)
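
As a rough idea of what that looks like, here is a minimal argparse sketch that accepts one or more file names. The argument name filenames and the description text are assumptions made for illustration, not part of the lesson's scripts.

```python
import argparse

parser = argparse.ArgumentParser(description='Print per-row means of CSV files.')
# nargs='+' collects one or more file names into a list
parser.add_argument('filenames', nargs='+', help='CSV files to process')

# Passing a list parses it instead of the real command line (handy for trying things out)
args = parser.parse_args(['small-01.csv', 'small-02.csv'])
filenames = args.filenames
```

In a real script you would call parser.parse_args() with no argument so that it reads the actual command line, and argparse would then print a usage message when no file name is given.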

Wrapping up

Find the Python program "show_graphs.py" and examine the code.



In [ ]:

    
import sys
import numpy as np
from matplotlib import pyplot as plt
import glob

def display(files):
  plt.figure(figsize=(10.0, 3.0))
  for f in files:
    data = np.loadtxt(fname=f, delimiter=',')
    plt.subplot(1, 3, 1)
    plt.ylabel('average')
    plt.plot(data.mean(axis=0))

    plt.subplot(1, 3, 2)
    plt.ylabel('max')
    plt.plot(data.max(axis=0))

    plt.subplot(1, 3, 3)
    plt.ylabel('min')
    plt.plot(data.min(axis=0))

  plt.tight_layout()
  plt.show()


files = sys.argv[1:]
display(files)

Run the program with the following command and examine the output.



In [ ]:

    
"""""""""""""""""""""
$ python show_graphs.py inflammation-01.csv inflammation-02.csv inflammation-03.csv inflammation-04.csv 
"""""""""""""""""""""