Procedural programming in Python


  • Flow control, part 2
    • Functions
    • In class exercise:
      • Functionalize this!
    • From nothing to something:
      • Pairwise correlation between rows in a pandas dataframe
      • Sketch of the process
      • In class exercise:
        • Write the code!
      • Rejoining, sharing ideas, problems, thoughts

Flow control

Flow control figure

Flow control refers to how programs handle loops, conditional execution, and the order of function operations.


If statements can be used to execute a line or block of code only when a particular condition is satisfied. E.g. let's print something based on the entries in the list.

instructors = ['Dave', 'Jim', 'Dorkus the Clown']

if 'Dorkus the Clown' in instructors:
    print('Dorkus the Clown is on the instructor list')  # assumed message

There is a special do-nothing keyword, pass, that fills an arm of a conditional where no action is needed, e.g.

if 'Jim' in instructors:
    print("Congratulations!  Jim is teaching, your class won't stink!")
else:
    pass


For loops are the standard loop, though while is also common. For has the general form:

for items in list:
    do stuff

For loops and collections like tuples, lists and dictionaries are natural friends.

for instructor in instructors:
    print(instructor)

You can combine loops and conditionals:

for instructor in instructors:
    if instructor.endswith('Clown'):
        print(instructor + " doesn't sound like a real instructor name!")
    else:
        print(instructor + " is so smart... all those gooey brains!")


Since for operates over lists, it is common to want to do something like:

NOTE: C-like
for (i = 0; i < 3; ++i) {

The Python equivalent is:

for i in [0, 1, 2]:
    do something with i

What happens when the range you want to sample is big, e.g.

NOTE: C-like
for (i = 0; i < 1000000000; ++i) {

It would be a real pain to write out the entire list from 0 to 999999999.

Enter the range() function. E.g. range(3) produces the values 0, 1, 2. In Python 3 it is a lazy sequence rather than a list, so list(range(3)) is [0, 1, 2].

In [1]:
total = 0  # named total to avoid shadowing the built-in sum()
for i in range(10):
    total += i
print(total)
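
A few more things range() can do, as a minimal sketch. The variable name big is only for illustration; the point is that a very large range is cheap because its values are produced on demand rather than stored.

```python
# range(stop) yields 0 .. stop-1; list() materializes the values
print(list(range(3)))          # [0, 1, 2]

# range(start, stop, step) is also available
print(list(range(2, 11, 2)))   # [2, 4, 6, 8, 10]

# A huge range is cheap: no list of a billion numbers is ever built
big = range(1000000000)
print(len(big))                # 1000000000
print(big[-1])                 # 999999999
```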


Now, use your code from above for the following URLs and filenames

URL filename csv_filename HCEPDB_moldata_set1.csv HCEPDB_moldata_set2.csv HCEPDB_moldata_set3.csv

What pieces of the data structures and flow control that we talked about earlier can you use?

How did you solve this problem?


For loops let you repeat some code for every item in a list. Functions are similar in that they run the same lines of code for new values of some variable. They are different in that functions are not limited to looping over items.

Functions are a critical part of writing easy to read, reusable code.

Create a function like:

def function_name(parameters):
    function expressions
    return [variable]

Note: Sometimes I use the word argument in place of parameter.

Here is a simple example. It prints a string that was passed in and returns nothing.

In [20]:
def print_string(str):
    """This prints out a string passed as the parameter."""
    print(str)
    for c in str:
        if c == 'r':
            break  # assumed body: stop scanning at the first 'r'


To call the function, use:

print_string("Dave is awesome!")

Note: The function has to be defined before you can call it!

print_string("Dave is awesome!")

If you provide no argument, or too many, you get an error.

TypeError                                 Traceback (most recent call last)
<ipython-input-22-ad26026057f7> in <module>()
----> 1 print_string()

TypeError: print_string() missing 1 required positional argument: 'str'

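Parameters (or arguments) in Python are passed as references to objects. This means that if you modify a mutable argument, such as a list, inside the function, the change is visible outside of the function. Assigning a new object to the parameter name, however, does not affect the caller's variable.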

See the following example:

In [23]:
def change_list(my_list):
    """This changes a passed list into this function"""
    my_list.append('four')
    print('list inside the function: ', my_list)

my_list = [1, 2, 3]
print('list before the function: ', my_list)
change_list(my_list)
print('list after the function: ', my_list)

list before the function:  [1, 2, 3]
list inside the function:  [1, 2, 3, 'four']
list after the function:  [1, 2, 3, 'four']
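
A small sketch of where this behavior stops: changes made to a mutable object such as a list are visible to the caller, but assigning a new object to the parameter name inside the function is not. The function names append_item and rebind_name are made up for this illustration.

```python
def append_item(items):
    """Mutate the list object the caller passed in."""
    items.append('four')


def rebind_name(items):
    """Rebind the local name only; the caller's list is untouched."""
    items = ['something', 'else']
    return items


numbers = [1, 2, 3]
append_item(numbers)
print(numbers)  # [1, 2, 3, 'four']

rebind_name(numbers)
print(numbers)  # [1, 2, 3, 'four']
```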

Variables have scope: global and local

In a function, new variables that you create are not saved when the function returns; these are local variables. Variables defined outside of the function can be read inside it but not reassigned; these are global variables. (Reassignment is possible with the global keyword.) Generally, the use of global variables is discouraged; use parameters instead.

In [25]:
my_global_1 = 'bad idea'
my_global_2 = 'another bad one'
my_global_3 = 'better idea'

def my_function():
    print(my_global_1)                         # globals can be read
    my_global_2 = 'broke your global, man!'    # creates a new local variable
    print(my_global_2)
    global my_global_3                         # allow reassignment of the global
    my_global_3 = 'still a better idea'

my_function()
print(my_global_2)
print(my_global_3)


bad idea
broke your global, man!
another bad one
still a better idea

In general, you want to use parameters to provide data to a function and return a result with the return statement. E.g.

def add(x, y):
    my_sum = x + y
    return my_sum

If you are going to return multiple objects, what data structure that we talked about can be used? Give an example below.

In [30]:
def a_function(parameter):
    return None

In [31]:
foo = a_function('bar')
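
One possible answer, as a sketch: a tuple is a natural fit for returning several objects at once, and the caller can unpack it into separate names. The function min_and_max is hypothetical and only serves as an illustration.

```python
def min_and_max(values):
    """Return the smallest and largest value together as a tuple."""
    return min(values), max(values)


result = min_and_max([3, 1, 4, 1, 5])
print(result)  # (1, 5)

# Tuple unpacking gives each returned object its own name
lowest, highest = min_and_max([3, 1, 4, 1, 5])
print(lowest, highest)  # 1 5
```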


Parameters have three different types:

type behavior
required positional, must be present or error, e.g. my_func(first_name, last_name)
keyword position independent, e.g. my_func(first_name, last_name) can be called my_func(first_name='Dave', last_name='Beck') or my_func(last_name='Beck', first_name='Dave')
default keyword params that default to a value if not provided

In [32]:
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))

Take a minute and play around with the above function. Which are required? Keyword? Default?
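
A few calls worth trying, as a sketch. The function is repeated here so that the cell runs on its own; first is required, last has a default, and both can be passed by keyword in any order.

```python
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))


print_name('Dorkus')                   # Your name is Dorkus the Clown
print_name('Dave', 'Beck')             # Your name is Dave Beck
print_name(last='Beck', first='Dave')  # Your name is Dave Beck

# Leaving out the required parameter raises a TypeError
try:
    print_name()
except TypeError as err:
    print('TypeError:', err)
```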

In [34]:
def massive_correlation_analysis(data, method='pearson'):
    pass  # placeholder body; method defaults to 'pearson'

Functions can contain any code that you put anywhere else including:

  • if...elif...else
  • for...else
  • while
  • other function calls

In [39]:
def print_name_age(first, last, age):
    print_name(first, last)
    print('Your age is %d' % (age))
    print('Your age is ' + str(age))
    if age > 35:
        print('You are really old.')

In [40]:
print_name_age(age=40, last='Beck', first='Dave')

Your name is Dave Beck
Your age is 40
Your age is 40
You are really old.

Once you have some code that is functionalized and not going to change, you can move it to a file that ends in .py, check it into version control, import it into your notebook and use it!

Let's do this now for the above two functions.


See you after the break!

Import the function...

Call them!
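
A minimal sketch of the whole round trip. The module name name_utils is an assumption, and the file is written to a temporary directory from code only so that the example runs on its own; in class you would save the .py file by hand next to the notebook and skip straight to the import.

```python
import sys
import tempfile
from pathlib import Path

# Source of the hypothetical module name_utils.py
module_source = '''
def print_name(first, last='the Clown'):
    print('Your name is %s %s' % (first, last))


def print_name_age(first, last, age):
    print_name(first, last)
    print('Your age is %d' % age)
    if age > 35:
        print('You are really old.')
'''

# Write the module to a temporary directory and make it importable
module_dir = Path(tempfile.mkdtemp())
(module_dir / 'name_utils.py').write_text(module_source)
sys.path.insert(0, str(module_dir))

# Import the module, then call the functions through its name
import name_utils

name_utils.print_name_age(first='Dave', last='Beck', age=40)
```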

Hacky Hack Time with Functions!

Notes from last class:

  • The os package has tools for checking if a file exists: os.path.exists
    import os
    filename = ''
    if os.path.exists(filename):
        print('file already exists')
  • Use the requests package to get the file given a url (got this from the requests docs)
    import requests
    url = ''
    req = requests.get(url)
    assert req.status_code == 200 # if the download failed, this line will generate an error
    with open(filename, 'wb') as f:
        f.write(req.content)  # save the downloaded bytes to disk
  • Use the zipfile package to decompress the file while reading it into pandas
    import pandas as pd
    import zipfile
    csv_filename = 'HCEPDB_moldata.csv'
    zf = zipfile.ZipFile(filename)
    data = pd.read_csv(zf.open(csv_filename))

Here was my solution

import os
import requests
import pandas as pd
import zipfile

filename = ''
url = ''
csv_filename = 'HCEPDB_moldata.csv'

if not os.path.exists(filename):  # download only when the file is missing
    req = requests.get(url)
    assert req.status_code == 200 # if the download failed, this line will generate an error
    with open(filename, 'wb') as f:
        f.write(req.content)

zf = zipfile.ZipFile(filename)
data = pd.read_csv(zf.open(csv_filename))

In class exercise

5-10 minutes

Objective: How would you functionalize the code for downloading, unzipping, and making a dataframe?

Bonus! Add the code to a file and import it!

def download_if_not_exists(url, filename):
    if not os.path.exists(filename):
        req = requests.get(url)
        assert req.status_code == 200 # if the download failed, this line will generate an error
        with open(filename, 'wb') as f:
            f.write(req.content)

How many functions did you use?

Why did you choose to use functions for these pieces?
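
One way the pieces could be split up, as a sketch rather than the reference solution. The function names load_zipped_csv and get_dataframe are assumptions, and requests is imported inside the download function so that loading an already-downloaded file does not need it.

```python
import os
import zipfile

import pandas as pd


def download_if_not_exists(url, filename):
    """Download url to filename unless the file is already on disk."""
    if os.path.exists(filename):
        return False  # nothing to do
    import requests  # only needed when a download actually happens
    req = requests.get(url)
    assert req.status_code == 200  # fail loudly if the download did not succeed
    with open(filename, 'wb') as f:
        f.write(req.content)
    return True


def load_zipped_csv(filename, csv_filename):
    """Read one CSV member of a zip archive into a dataframe."""
    with zipfile.ZipFile(filename) as zf:
        with zf.open(csv_filename) as csv_file:
            return pd.read_csv(csv_file)


def get_dataframe(url, filename, csv_filename):
    """Download (if needed), unzip, and load the data in one call."""
    download_if_not_exists(url, filename)
    return load_zipped_csv(filename, csv_filename)
```

Keeping each step in its own function means each one can be tested separately, and get_dataframe can be called in a loop over several URL, filename, and csv_filename combinations.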

From nothing to something

Task: Compute the pairwise Pearson correlation between rows in a dataframe.

Let's say we have three molecules (A, B, C) with three measurements each (v1, v2, v3). So for each molecule we have a vector of measurements:

$$X=\begin{bmatrix} X_{v_{1}} \\ X_{v_{2}} \\ X_{v_{3}} \\ \end{bmatrix} $$

Where X is a molecule and the components are the values for each of the measurements. These make up the rows in our matrix.

Often, we want to compare molecules to determine how similar or different they are. One measure is the Pearson correlation.

Pearson correlation:
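
For reference, the standard definition of the Pearson correlation coefficient between two molecules X and Y with n measurements each, written in the notation used above (the symbols r, n, and the bars for the mean are the usual conventions, not taken from this notebook):

$$r_{XY} = \frac{\sum_{i=1}^{n} (X_{v_{i}} - \bar{X})(Y_{v_{i}} - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_{v_{i}} - \bar{X})^{2}} \sqrt{\sum_{i=1}^{n} (Y_{v_{i}} - \bar{Y})^{2}}}$$

As a worked example, take X = (-1, 0, 1) and Y = (1, 0, -1). Both means are 0, the numerator is (-1)(1) + (0)(0) + (1)(-1) = -2, and the denominator is $\sqrt{2}\sqrt{2} = 2$, so $r_{XY} = -1$.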

Expressed graphically, when you plot the paired measurements for two samples (in this case molecules) against each other, you can see positive correlation, no correlation, or negative correlation.

Simple input dataframe (note when you are writing code it is always a good idea to have a simple test case where you can readily compute by hand or know the output):

index v1 v2 v3
A -1 0 1
B 1 0 -1
C .5 0 .5
  • If the above is a dataframe, what shape and size is the output?
  • What are some unique features of the output?

For our test case, what will the output be?

index A B C
A 1 -1 0
B -1 1 0
C 0 0 1

Let's sketch the idea...

In class exercise

20-30 minutes


  1. Write code using functions to compute the pairwise Pearson correlation between rows in a pandas dataframe. You will have to use for and possibly if.
  2. Use a cell to test each function with an input that yields an expected output. Think about the shape and values of the outputs.
  3. Put the code in a .py file in the directory with the Jupyter notebook, import and run!
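
One possible shape of a solution, as a sketch under a few assumptions: the function names pearson and pairwise_row_correlation are made up, every row is assumed to be numeric, and no row may have zero variance (that would divide by zero). It is checked against the simple three-molecule test case from above.

```python
import math

import pandas as pd


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_row_correlation(df):
    """Return a square dataframe of correlations between every pair of rows."""
    result = pd.DataFrame(index=df.index, columns=df.index, dtype=float)
    for row_i in df.index:
        for row_j in df.index:
            result.loc[row_i, row_j] = pearson(list(df.loc[row_i]),
                                               list(df.loc[row_j]))
    return result


# The simple test case: molecules A, B, C with measurements v1, v2, v3
test_df = pd.DataFrame(
    {'v1': [-1, 1, 0.5], 'v2': [0, 0, 0], 'v3': [1, -1, 0.5]},
    index=['A', 'B', 'C'],
)
corr = pairwise_row_correlation(test_df)
print(corr)
```

The output is square (one row and one column per molecule), symmetric, and has ones on the diagonal, so a faster version could compute only half of the pairs.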

In [1]:
import pandas as pd
import math

In [7]:
df = pd.read_csv('HCEPDB_moldata.csv')

/Users/Mandy/anaconda/lib/python3.6/site-packages/IPython/core/ DtypeWarning: Columns (0,3,4,5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)

In [4]:
def typ(x,y):
    sol = x.mean() + y.mean()
    return sol

