Main goal: learning programming principles. Once you master the basic principles of programming, your research workflow can be greatly simplified. Learning the Python language is NOT the main goal.
In [ ]:
git clone https://github.com/sungeunbae/python-files.git
In [ ]:
cd python-files
In [ ]:
!ls
Enter "ipython notebook" to get started
In [ ]:
import numpy
We import modules to make use of other people's work. numpy lets us do fancy things with numbers and matrices.
In [ ]:
numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
We use "." to ask Python to run the function loadtxt, which belongs to the numpy library. The dot is used everywhere in Python to refer to the parts of things, as in thing.component.
In [ ]:
weight_kg = 55
This assigns the value 55 to the variable weight_kg.
In [ ]:
print weight_kg
In [ ]:
print 'weight in pounds:', 2.2 * weight_kg
In [ ]:
weight_kg = 57
In [ ]:
weight_lb = 2.2*weight_kg
In [ ]:
print 'weight in kilograms:',weight_kg, ' and in pounds:', weight_lb
Draw diagrams showing what variables refer to what values after each statement in the following program:
In [ ]:
weight = 70.5
age = 35
# Take a trip to the planet Neptune
weight = weight * 1.14
age = age + 20
What does the following program print out?
In [ ]:
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print third, fourth
In [ ]:
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
In [ ]:
print data
In [ ]:
print type(data)
In [ ]:
print data.shape
In [ ]:
data[0,0] #top-left corner
In [ ]:
data[1,1]
In [ ]:
data[1,2]
In [ ]:
print data[0:4, 0:10]
In [ ]:
print data[5:10, 0:10]
In [ ]:
data[:3,36:] # :x means from 0 up to (but excluding) x; x: means from x to the end
In [ ]:
small = data[:3,36:]
print 'small is'
print small
In [ ]:
element = 'oxygen'
print 'first three characters:', element[0:3]
print 'last three characters:', element[3:6]
What is the value of element[:4]? What about element[4:]? Or element[:]? What is element[-1]? What is element[-2]? Given those answers, explain what element[1:-1] does
In [ ]:
doubledata = data*2.0
In [ ]:
print 'original:'
print data[:3, 36:]
print 'doubledata:'
print doubledata[:3, 36:]
In [ ]:
tripledata = doubledata+data
print tripledata[:3,36:]
In [ ]:
print doubledata+2 # adds 2 to every element, as if adding an array of the same shape filled with 2's
In [ ]:
print data.mean()
data.shape : attribute (noun)
data.mean() : method, an action you can do with/to "data".
In [ ]:
print 'maximum inflammation:', data.max()
print 'minimum inflammation:', data.min()
print 'standard deviation:',data.std()
How do I know which methods exist? Try help(data) or look at the documentation.
In [ ]:
help(data) # shows the full documentation; sometimes too much!
In [ ]:
dir(data) #gives a list of functions and attributes. Less information
The things listed are either "functions" or "attributes". Don't worry about the names of the form __xxx__; these are system variables or methods, not really meant for everyday programming.
How do you know if it is a function or an attribute?
In [ ]:
type(data.size)
In [ ]:
type(data.shape)
In [ ]:
type(data.std)
Or try help(data.xxxx) for more details.
In [ ]:
help(data.std)
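Another quick way to answer the function-or-attribute question is Python's built-in callable(), which returns True for anything that can be called with parentheses. The sketch below uses a plain string and a complex number instead of the inflammation array so that it runs without any data file; it sticks to print(...) with a single argument, which behaves the same under Python 2 and Python 3.

```python
# callable() distinguishes methods (actions) from plain attributes (values).
word = 'oxygen'
number = 3 + 4j  # a complex number has .real and .imag attributes

upper_is_callable = callable(word.upper)   # a method: call it as word.upper()
real_is_callable = callable(number.real)   # an attribute: just a stored value

print(upper_is_callable)
print(real_is_callable)
print(word.upper())
print(number.real)
```

The same check works on the array: callable(data.std) is True, while callable(data.shape) is False.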
Looking at the output of dir(data), can you figure out the function that computes the total of all elements? Can you figure out how to use the function?
In [ ]:
patient_0 = data[0,:] #row 0 and every column, extract everything from row 0
print 'maximum inflammation for patient 0:', patient_0.max()
In [ ]:
print data.mean(axis=0) #average inflammation per day for all patients
In [ ]:
print data.mean(axis=0).shape
In [ ]:
print data.mean(axis=1)
In [ ]:
print data.mean(axis=1).shape
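To see what the two axis values mean, here is a pure-Python sketch that reproduces both averages on a tiny table typed in by hand (the same values as small-01.csv, which appears later in this notebook). It only illustrates the idea and is not how numpy computes the result internally; the names rows, row_means and column_means are made up for this example.

```python
# Rows are patients, columns are days.
rows = [[0, 0, 1],
        [0, 1, 2]]

# Like data.mean(axis=1): one average per row (per patient).
row_means = [sum(row) / float(len(row)) for row in rows]

# Like data.mean(axis=0): one average per column (per day).
# zip(*rows) regroups the values column by column.
column_means = [sum(col) / float(len(col)) for col in zip(*rows)]

print(row_means)
print(column_means)
```

There is one row mean per patient (2 values) and one column mean per day (3 values), which is why the two .shape results above differ.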
In [ ]:
#next line is very important - otherwise, your notebook will hang forever!!!!
%matplotlib inline
from matplotlib import pyplot
pyplot.imshow(data)
pyplot.show() #create a heatmap of our data and show
In [ ]:
ave_inflammation = data.mean(axis=0) # average inflammation over time. per day for all patients
pyplot.plot(ave_inflammation) # create a line graph of these values
pyplot.show()
In [ ]:
import numpy as np #alias to reduce typing
from matplotlib import pyplot as plt
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
plt.figure(figsize=(10.0, 3.0))
plt.subplot(1, 3, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))
plt.subplot(1, 3, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))
plt.subplot(1, 3, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))
plt.tight_layout()
plt.show()
Modify the program to display the three plots on top of one another instead of side by side.
In [ ]:
import numpy as np
from matplotlib import pyplot as plt
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
plt.figure(figsize=(3.0, 10.0))
plt.subplot(3, 1, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))
plt.subplot(3, 1, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))
plt.subplot(3, 1, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))
plt.tight_layout()
plt.show()
What does the following program print out?
In [ ]:
mass = 47.5
age=122
mass = mass *2.0
age=age-20
print mass, age
(a) mass, age
(b) 47.5 122
(c) 95.0 102
(d) 95.0
102
What does the following program print out?
In [ ]:
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print third, fourth
(a) causes an error
(b) Grace Hopper
(c) Hopper Grace
(d) third, fourth
Create a plot showing the standard deviation of the inflammation data for each day across all patients.
In [ ]:
import numpy as np
from matplotlib import pyplot as plt
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
plt.ylabel('std')
plt.plot(data.std(axis=0))
plt.show()
In [ ]:
ls *.csv
We have a dozen data sets to process and analyse. Of course, we can repeat the process over and over - but it will be frustrating to do so. If you want to become a good programmer, you have to become lazy. (If you are already lazy, chances are high you are one of those exceptional, natural-born programming talents.) We can teach the computer how to repeat things and let it do the boring job.
In [ ]:
s='christchurch'
print s[0]
print s[1]
print s[2]
print s[3]
print s[4]
print s[5]
print s[6]
print s[7]
print s[8]
print s[9]
print s[10]
print s[11]
In [ ]:
s='taumatawhakatangihangakoauauotamateapokaiwhenuakitanatahu'
In [ ]:
for c in s: #remember indentation
    print c
In [ ]:
length=0 #repeatedly updating this variable
for c in s:
    print c
    length=length+1
print "There are",length,'characters'
In [ ]:
len(s)
What does the following code print out?
In [ ]:
s1='Newton'
s2=''
for c in s1:
    s2=c+s2
print s2
(a) (empty)
(b) N+e+w+t+o+n
(c) Newton
(d) notweN
A section of an array is called a slice. We can take slices of character strings as well:
In [ ]:
element = 'oxygen'
print 'first three characters:', element[0:3]
print 'last three characters:', element[3:6]
What is the value of element[:4]? What about element[4:]? Or element[:]?
What is element[-1]? What is element[-2]? Given those answers, explain what element[1:-1] does.
In [ ]:
print element[:4]
print element[4:]
print element[:]
print element[-1]
print element[-2]
print element[1:-1]
========================================================================================================================
In [ ]:
odds = [1,3,5,7]
print 'odds are:', odds
In [ ]:
print 'first and last:', odds[0], odds[-1] #last element
In some ways, a list is similar to a string.
In [ ]:
for number in odds:
    print number
In [ ]:
names = ['Newton', 'Darwing','Turing']
print 'names is originally:',names
names[1]='Darwin' #we can update an element of list
print 'final value of names:',names
In [ ]:
name = 'Bell'
name[0]='b' #we can't update a character in a string this way
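The cell above fails because strings are immutable. A minimal sketch of what happens, and of one way around it: catch the TypeError, then build a new string from a replacement character plus a slice. The variable names assignment_failed and lower_name are made up for this example.

```python
name = 'Bell'

# Assigning to one character of a string raises TypeError (strings are immutable).
try:
    name[0] = 'b'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead: replacement character + the rest of the old string.
lower_name = 'b' + name[1:]

print(assignment_failed)
print(lower_name)
```

The original string is left untouched; lower_name is a separate, new string.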
In [ ]:
odds.append(11)
print 'odds after adding a value:',odds
In [ ]:
del odds[0]
print 'odds after removing the first element:',odds
In [ ]:
odds.reverse()
print 'odds after reversing:',odds
In [ ]:
odds.sort()
print 'odds after sorting:',odds
Python has a built-in function called range that creates a list of numbers:
range(3) produces [0, 1, 2], range(2, 5) produces [2, 3, 4].
Using range, write loops that print
(1). the first 10 natural numbers
(2). the total sum of the first 10 natural numbers
In [ ]:
for i in range(1,11):
    print i
In [ ]:
sum = 0
for i in range(1,11):
    sum += i
print sum
========================================================================================================================
We have covered almost everything we need to know to process multiple data sets. One more thing!
In [ ]:
import glob
print glob.glob('*.csv') #collects files that match the pattern
Write code that collects the first 3 filenames from the list obtained by
In [ ]:
glob.glob('*.csv')
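One detail worth knowing before taking "the first 3" files: glob.glob returns its matches in arbitrary order, so sorting the list first makes the selection predictable. The sketch below creates a throwaway directory with hypothetical files named demo-01.csv to demo-05.csv, so it does not depend on the inflammation data.

```python
import glob
import os
import shutil
import tempfile

# Make a throwaway directory with five empty, hypothetical CSV files.
workdir = tempfile.mkdtemp()
for i in range(1, 6):
    open(os.path.join(workdir, 'demo-%02d.csv' % i), 'w').close()

# Sort the matches, then slice off the first three.
matches = sorted(glob.glob(os.path.join(workdir, '*.csv')))
first_three = [os.path.basename(path) for path in matches[0:3]]
print(first_three)

shutil.rmtree(workdir)  # clean up the throwaway directory
```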
Convert the following code (copied from above) to process the first 3 inflammation files and draw 3 graphs (i.e. mean, max, min) for each file using a "for" loop.
In [ ]:
%matplotlib inline
#don't forget the line above!
import glob
import numpy as np
from matplotlib import pyplot as plt
#do something here to take the first 3 files from the list.
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
plt.figure(figsize=(10.0, 3.0))
plt.subplot(1, 3, 1)
plt.ylabel('average')
plt.plot(data.mean(axis=0))
plt.subplot(1, 3, 2)
plt.ylabel('max')
plt.plot(data.max(axis=0))
plt.subplot(1, 3, 3)
plt.ylabel('min')
plt.plot(data.min(axis=0))
plt.tight_layout()
plt.show()
What is the output of the following program?
In [ ]:
v=3
p=1
for i in range(2):
    p = p*v
print p
(1) 3
(2) 6
(3) 9
(4) 27
In [ ]:
num=37
if num > 100:
    print 'greater'
else:
    print 'not greater'
print 'done'
We can add some intelligence to our program to make decisions
In [ ]:
s='taumatawhakatangihangakoauauotamateapokaiwhenuakitanatahu'
for c in s:
    if c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'
In [ ]:
s='msoffice2013'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'
In [ ]:
if (1>0) and (-1>0):
    print 'both parts are true'
else:
    print 'one part is not true'
In [ ]:
if (1<0) or (-1 < 0):
    print 'at least one test is true'
else:
    print 'failed all tests'
Look at the output from the code below
In [ ]:
if '': print 'empty string is true'
if 'word': print 'non-empty string is true'
if []: print 'empty list is true'
if [1,2,3]: print 'non-empty list is true'
if 0: print 'zero is true'
if 1: print 'non-zero is true'
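The rule behind the cell above can be checked directly with the built-in bool(), which reports how Python treats a value when it appears in an if statement. This is a small sketch using the same six values; the list names are made up for the example.

```python
# Empty strings, empty lists and zero count as false; everything else here counts as true.
values = ['', 'word', [], [1, 2, 3], 0, 1]
truthiness = [bool(v) for v in values]

for v, t in zip(values, truthiness):
    print(repr(v) + ' -> ' + str(t))
```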
What do you think this code will output?
In [ ]:
a=[1,2]
del a[0]
a.append(0)
del a[0]
x= a[0]
if x : print "x is",x
(a) x is 1
(b) x is 0
(c) x is 2
(d) <- nothing
Python (like most languages in the C family) provides in-place operators that work like this:
In [ ]:
x=1
x+=1
x*=3
print x
Write some code that sums the positive and negative numbers in a list separately, using in-place operators.
In [ ]:
l=[-3,-1,-2,1,2,3]
psum=0
nsum=0
for v in l:
    #do something here (replace pass with your code)
    pass
print "total of positive numbers is",psum,"and total of negative numbers is",nsum
In [ ]:
left=1
right=2
temp=left
left=right
right=temp
print left, right
In [ ]:
left=1
right=2
left,right = right, left
print left, right
Same operation for different inputs
In [ ]:
s='msoffice2013'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'
In [ ]:
s='windows7'
for c in s:
    if c in ['1','2','3','4','5','6','7','8','9','0']:
        print c,'is a digit'
    elif c in ['a','e','i','o','u']:
        print c,'is a vowel'
    else:
        print c, 'is a consonant'
In [ ]:
def classify_char(s):
    print s
    for c in s:
        if c in ['1','2','3','4','5','6','7','8','9','0']:
            print c,'is a digit'
        elif c in ['a','e','i','o','u']:
            print c,'is a vowel'
        else:
            print c, 'is a consonant'
classify_char('msoffice2013')
classify_char('windows7')
To send a result back to whoever asked for it
In [ ]:
def double(v):
    return 2*v
print double(2)
print double(3) #what happens if you do double([1,2,3])?
In [ ]:
def say_hello(name):
    return "Hello "+name+", how are you?"
print say_hello("Sung")
print say_hello("John")
"Adding" two strings produces their concatenation: 'a' + 'b' is 'ab'. Write a function called fence that takes two parameters called original and wrapper and returns a new string that has the wrapper character at the beginning and end of the original:
In [ ]:
def fence(original, wrapper):
    #do something here
    return result
print fence('name','*') #expecting *name*
If the variable s refers to a string, then s[0] is the string's first character and s[-1] is its last. Write a function called outer that returns a string made up of just the first and last characters of its input:
In [ ]:
def outer(a):
    #do something here (replace pass with your code)
    pass
print outer('helium') #expecting hm
In [ ]:
def fahr_to_kelvin(temp):
    return ((temp-32)*(5/9))+273.15
In [ ]:
print 'freezing point of water:',fahr_to_kelvin(32)
print 'boiling point of water:',fahr_to_kelvin(212)
Something is wrong! Why?
In [ ]:
(212-32)*(5/9)
In [ ]:
5/9
In [ ]:
5/9.
In [ ]:
def fahr_to_kelvin(temp):
    return ((temp-32)*(5/9.))+273.15
In [ ]:
print 'freezing point of water:',fahr_to_kelvin(32)
print 'boiling point of water:',fahr_to_kelvin(212)
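The bug came from integer division: in Python 2, dividing two integers with / drops the fraction, so 5/9 is 0. The sketch below shows two spellings that behave the same under Python 2 and Python 3: // always floors, and dividing by a float always keeps the fraction. It redefines fahr_to_kelvin with 9.0 purely for illustration.

```python
# // always floors; dividing by a float always keeps the fraction.
floor_result = 5 // 9      # 0 in both Python 2 and Python 3
float_result = 5 / 9.0     # about 0.5556 in both

def fahr_to_kelvin(temp):
    # 9.0 forces floating-point division regardless of Python version
    return (temp - 32) * (5 / 9.0) + 273.15

print(floor_result)
print(float_result)
print(fahr_to_kelvin(32))
print(fahr_to_kelvin(212))
```

Python 2 also offers "from __future__ import division" at the top of a file, which makes / keep the fraction everywhere in that file.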
In [ ]:
def kelvin_to_celsius(temp):
    return temp - 273.15
In [ ]:
print 'absolute zero in Celsius:', kelvin_to_celsius(0.0)
In [ ]:
def fahr_to_celsius(temp):
    temp_k = fahr_to_kelvin(temp)
    result = kelvin_to_celsius(temp_k)
    return result
print 'freezing point of water in Celsius', fahr_to_celsius(32.0)
Copy the program above that generated 3 graphs for each inflammation file (using a for loop). Create a function "analyze" that accepts one filename as a parameter, opens the file and stores the values, then displays 3 graphs for that file. (No return value is needed.)
In [ ]:
def analyze(filename):
    #do something here (replace pass with your code)
    pass
Test the function by
In [ ]:
analyze('inflammation-01.csv')
and you should see 3 graphs for inflammation-01.csv. Then complete the code below so that it calls the "analyze" function inside the for loop, each time with a different filename.
In [ ]:
import numpy as np
from matplotlib import pyplot as plt
import glob
filenames = glob.glob('*.csv')
filenames= filenames[0:3]
for f in filenames:
    print f
    #do something here
In an ideal world, you write a program and it works out of the box. Sadly, that doesn't happen often in reality. How do we know if our program is working correctly? And how do we know if it is still working correctly after we make changes to it?
One strategy for writing correct programs starts with the assumption that mistakes will happen and guards against them.
This is called "defensive programming", and it is analogous to "defensive driving": we assume there will be crazy drivers and take extra care to protect ourselves.
The most common way to do it is to add "assertions" to our code so that it checks itself as it runs.
In [17]:
def avg_age(ages):
    sum = 0.0
    for v in ages:
        sum= sum+v
    print sum/len(ages)
avg_age([10,30,20])
avg_age([10,-30,20])
An assertion is simply a statement that something must be true at a certain point in a program. Python evaluates the assertion's condition: if it is true, Python does nothing, but if it is false, Python halts the program immediately and prints the error message.
When you drive defensively through an intersection, you check left and right. If there is a crazy driver at the intersection, what do you do? Press the brake, make sure you are OK first, then roll down the window and say some nice words to the offender. :) An assertion works just like that: terminate, then give an error message.
In [18]:
def avg_age(ages):
    sum = 0.0
    for v in ages:
        assert v >= 0, "invalid age:"+str(v)
        sum= sum+v
    print sum/len(ages)
avg_age([10,30,20])
avg_age([10,-30,20])
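To inspect what a failed assertion carries without stopping the whole notebook, the error can be caught with try/except. The sketch below is a variation on avg_age that returns the average instead of printing it (an assumption made so the result is easy to check) and uses total rather than sum to avoid hiding the built-in function.

```python
def avg_age(ages):
    total = 0.0
    for v in ages:
        # Stop immediately if an impossible value sneaks in.
        assert v >= 0, 'invalid age: ' + str(v)
        total = total + v
    return total / len(ages)

good = avg_age([10, 30, 20])

try:
    avg_age([10, -30, 20])
    message = 'no error'
except AssertionError as err:
    message = str(err)   # the text given after the comma in the assert

print(good)
print(message)
```

Note that assert statements are skipped entirely when Python is started with the -O option, so they are a development aid rather than a substitute for input validation.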
Expected output?
In [ ]:
def seq_check(s):
    for c in s:
        assert c in ['a','t','c','g'], "invalid input:"+str(c)
    print "all good:"+s
seq_check('atcg')
seq_check('bcgt')
seq_check('ctga')
(a)
all good:atcg
all good:bcgt
all good:ctga
(b)
(AssertionError)
(c)
all good:atcg
(AssertionError)
(d)
all good:atcg
(AssertionError)
all good:ctga
We have been using the IPython notebook, which is a great tool for prototyping code and for training. But if we want to do more serious work in Python, we need to learn how to write a standalone Python program. For example, open a Bash terminal and run the following command (in the same directory as all the .csv files):
In [ ]:
"""""""""""""""""""""
$ python readings-02.py inflammation-01.csv
"""""""""""""""""""""
If you can run your Python program in the terminal, you can combine it with shell commands!
In [ ]:
"""""""""""""""""""""
$ python readings-02.py inflammation-01.csv |head -4
5.45
5.425
6.1
5.9
"""""""""""""""""""""
Now let's learn how to write a command-line Python program. Using a text editor, open argv-list.py, then run the commands shown below in a Bash terminal.
In [ ]:
#argv-list.py
import sys
print 'sys.argv is', sys.argv
In [ ]:
"""""""""""""""""""""
$ python argv-list.py
sys.argv is ['argv-list.py']
"""""""""""""""""""""
In [ ]:
"""""""""""""""""""""
$ python argv-list.py first second third
sys.argv is ['argv-list.py', 'first', 'second', 'third']
"""""""""""""""""""""
sys.argv[0] is ALWAYS the program file name; sys.argv[1:] holds all the arguments you passed to the program.
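The split between the script name and its arguments can be sketched with an ordinary list standing in for sys.argv. The helper split_argv is hypothetical, written only for this illustration.

```python
def split_argv(argv):
    # argv[0] is the script name; everything after it is an argument.
    script = argv[0]
    arguments = argv[1:]
    return script, arguments

# A hand-written list standing in for what the shell would pass.
script, arguments = split_argv(['argv-list.py', 'first', 'second', 'third'])
print(script)
print(arguments)

# With no arguments the slice is simply empty, so no IndexError occurs.
script_only, no_arguments = split_argv(['argv-list.py'])
print(no_arguments)
```

In a real script you would pass sys.argv itself. Slicing with [1:] is safer than indexing sys.argv[1] directly, which raises IndexError when the program is run without arguments.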
Open readings-01.py from the text editor and examine the code.
In [ ]:
#readings-01.py
import sys
import numpy as np
def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = np.loadtxt(filename, delimiter=',')
    for m in data.mean(axis=1): #mean in the same row
        print m
Let's play with the data files small-01.csv to small-03.csv.
In [ ]:
"""""""""""""""""""""
$ cat small-01.csv
0,0,1
0,1,2
$ cat small-02.csv
9,17,15
20,8,5
$ cat small-03.csv
0,2,0
1,1,0
"""""""""""""""""""""
Let's run our program with one of these files as input.
In [ ]:
"""""""""""""""""""""
$ python readings-01.py small-01.csv
"""""""""""""""""""""
No output - because the main() function was defined, but not called.
In [ ]:
#readings-02.py
import sys
import numpy as np
def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = np.loadtxt(filename, delimiter=',')
    for m in data.mean(axis=1):
        print m
main() # <==== Call the function to do some action!!
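A common variation on the bare main() call, shown here as a sketch rather than as part of readings-02.py, is to guard the call so it only runs when the file is executed directly. The main() below is a stand-in that returns a string so the behaviour is easy to see.

```python
def main():
    # Stand-in for the real work; returns a value so it is easy to check.
    return 'main ran'

# Runs main() only when the file is executed directly (python thisfile.py),
# not when another program imports it.
if __name__ == '__main__':
    print(main())
```

With this guard, other programs can import the file to reuse its functions without triggering the command-line behaviour.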
In [ ]:
"""""""""""""""""""""
$ python readings-02.py small-01.csv
0.333333333333
1.0
$ python readings-02.py small-02.csv
13.6666666667
11.0
$ python readings-02.py small-03.csv
0.666666666667
0.666666666667
"""""""""""""""""""""
Write a program that behaves like this
In [ ]:
"""""""""""""""""""""
$ python readings.py small-01.csv small-02.csv small-03.csv
0.333333333333
1.0
13.6666666667
11.0
0.666666666667
0.666666666667
"""""""""""""""""""""
For more advanced interaction with a command-line Python program, consider using the argparse library. (https://docs.python.org/2/howto/argparse.html)
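As a small taste of argparse, the sketch below accepts one or more filenames, similar to the readings.py exercise above. The description text and the argument name filenames are choices made for this example, and a hand-written list stands in for the real command line.

```python
import argparse

parser = argparse.ArgumentParser(description='Print per-row means of CSV files.')
# nargs='+' collects one or more filenames into a list.
parser.add_argument('filenames', nargs='+', help='CSV files to process')

# Passing a list here stands in for the real command line (sys.argv[1:]).
args = parser.parse_args(['small-01.csv', 'small-02.csv'])
print(args.filenames)
```

Calling parser.parse_args() with no argument reads sys.argv[1:] instead, and argparse prints a usage message and exits if no filename is given.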
Find the Python program "show_graphs.py" and examine the code.
In [ ]:
import sys
import numpy as np
from matplotlib import pyplot as plt
import glob
def display(files):
    plt.figure(figsize=(10.0, 3.0))
    for f in files:
        data = np.loadtxt(fname=f, delimiter=',')
        plt.subplot(1, 3, 1)
        plt.ylabel('average')
        plt.plot(data.mean(axis=0))
        plt.subplot(1, 3, 2)
        plt.ylabel('max')
        plt.plot(data.max(axis=0))
        plt.subplot(1, 3, 3)
        plt.ylabel('min')
        plt.plot(data.min(axis=0))
    plt.tight_layout()
    plt.show()
files = sys.argv[1:]
display(files)
Run the program with the following command. Examine the output
In [ ]:
"""""""""""""""""""""
$ python show_graphs.py inflammation-01.csv inflammation-02.csv inflammation-03.csv inflammation-04.csv
"""""""""""""""""""""