Python: Loops, File I/O and Functions

In the previous lesson we learned the very basics of python: variable types, how to work with arrays, and how to use python modules and their built-in functions. In this lesson we are going to take this one step further and learn about control flow using loops, how to read and write files, and how to write our own functions in python scripts.

Loops and Conditional Statements

There are a few basic types of control flow statements that we will learn. These are as follows:

  • for: executes the indented block once FOR each item (in order) in a list or array.
  • while: executes the indented block repeatedly WHILE the statement is true.
  • if: executes the indented block IF the statement is true. The statement must be something that can evaluate to true or false.

Let's look at some very basic examples. Notice that indentation (tabbing) matters in python! Blocks of code following a for, while or if statement must be indented to run properly.

If Statements and While Loops


In [ ]:
x = 5. # assign value of 5 to variable x 
if x < 10: # the statement we are testing. IF this statement is true...
    print(x) # this block of text will execute. the next line after if, while, or for should always be indented.

In [ ]:
if x == 5.: # recall the "==" is used to TEST whether two values are equal to one another 
    print("True")
else: # if, elif, else statements can be used to test multiple cases
    print("False")
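The comment above mentions elif, so here is a minimal sketch of a full if / elif / else chain. The variable name label is an assumption made for this illustration; it is not used elsewhere in the lesson.

```python
# a minimal sketch of an if / elif / else chain
# "label" is a hypothetical variable name used only for this illustration
x = 5.
if x < 5:
    label = "less than five"
elif x == 5:
    label = "equal to five"
else:
    label = "greater than five"
print(label)
```

Only the first branch whose statement is true runs; the remaining branches are skipped.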

In [ ]:
y = 5. # assign value of 5 to variable y 
while y < 10: # this is the statement we are testing. WHILE this statement is true...
    print(y) # print the value of y 
    y += 1 # then add one to y. so this will only execute WHILE y < 10, as we can see from the output

Iteration with For Loops


In [ ]:
# here we iterate over the items in a list 
z = range(5) # make a sequence of integers (0 through 4) using the function range
for i in z: # for each value "i" in my sequence z...
    print(i) # print that value i 
    
# this will take the value i and set it equal to the first value in the list. then it will
# execute the indentation block and set i equal to the NEXT value in the list, and so on.

In [ ]:
# we could just as easily do the following, which is equivalent to the above:
for i in range(5):
    print(i)

In [ ]:
# we can also combine range() and len() to iterate over the indices of a list or array 
solar_system = ['sun', 'moon', 'earth', 'pluto']
for i in range(len(solar_system)):
    print(i, solar_system[i])
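As an aside, python's built-in enumerate function gives the index and the item together, which is one possible alternative to combining range() and len(). This is only a sketch; the list pairs is a hypothetical name added so the result can be inspected.

```python
# a sketch of the same loop using the built-in enumerate function
solar_system = ['sun', 'moon', 'earth', 'pluto']
pairs = []  # hypothetical list, used here only to collect the results
for i, name in enumerate(solar_system):
    print(i, name)           # i is the index, name is the item at that index
    pairs.append((i, name))
```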

In [ ]:
# what is this for loop doing? 
for i in solar_system:
    print(i, len(i))

In [ ]:
# perhaps a more useful example: what if i want to make everything in my list a float? 
magnitudes = ['-1.46', '-0.04', '0.12', '0.5', '1.25']
for i in range(len(magnitudes)):
    magnitudes[i] = float(magnitudes[i])
print(magnitudes)

List Comprehension in Python

List comprehensions are a fancy way to make lists "on-the-fly" in python. We can look at an example of how to simplify the above statement. Let's translate the for loop into a list comprehension:


In [ ]:
magnitudes_lc = [float(i) for i in magnitudes] # for each value i in magnitudes, make that value i into a float. 
print(magnitudes_lc)
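A list comprehension can also carry an if condition to keep only some of the items. The sketch below assumes we want only the negative magnitudes; the name negative_mags is made up for this illustration.

```python
# a sketch of a list comprehension with a condition
magnitudes = ['-1.46', '-0.04', '0.12', '0.5', '1.25']
# convert each string to a float, but keep it only if the value is below zero
negative_mags = [float(m) for m in magnitudes if float(m) < 0]
print(negative_mags)
```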

In [ ]:
# Try it yourself. Turn the following for loop into a list comprehension:
empty_list = [] # create an empty list 
for num in range(50):
    empty_list.append(num)

In [ ]:
# Hint: You do not actually need to do append! A list comprehension will build up the list without using that function.

Combining Control Flow Loops


In [ ]:
num_array = range(20)
for i in num_array:
    if i < 1: # for multiple if statements use if, elif, else. 
        print("Zero")
    elif (i <= 1) & (i > 0): # you can combine comparison operators! 
        print("One")
    elif i == 2:
        print("Two")
    elif (i > 2) & (i <= 3):
        pass # pass does nothing 
    elif i > 3:
        break # break quits the loop

In [ ]:
# use a for loop to pick out even numbers! 
for num in range(2,10):
    if num % 2 == 0: # note the "==" this is for checking whether two values are equal, i.e. it is doing "?="
        print("Even number {}".format(num))
        continue # this tells the loop to continue, although it is not *required* that you use this
    elif num % 2 != 0:
        print("Odd number {}".format(num))
        break # try commenting out the break to see what happens!

In [ ]:
# try it for yourself! write a loop that takes the following list and prints only the 
# words that are longer than four letters. 
# helpful hint: you will need a for loop and an if statement. 
solar_system_objects = ['sun', 'moon', 'earth', 'mars', 'pluto', 'neptune', 'venus', 'mercury']

File I/O: Reading Files

Since we are already familiar with the module "numpy" in python, let's use some of its functions to read and write data files. There are many other ways to read and write files in python; this is just a little something to get us started.


In [ ]:
import numpy as np
# numpy has a function called "loadtxt" which will read in text files. you just need to give it the path to the file!
# note that this will actually give you an error. why? because loadtxt doesn't like that there are different 
# variable types in different columns, i.e. a mix of strings and floats; it wants to assume everything is float.
# there are a couple of ways around this
stars_table = np.loadtxt('data/BrightStars.dat')

In [ ]:
# one option is to simply skip the first column, the star names, since these are the string values that are throwing
# loadtxt off. for this we use the loadtxt option "usecols" where we can feed it a list of column numbers we want to
# use. remember that python is zero-indexed, so to have it skip the first column we omit the number zero. 
# just like with the bash "sort" command loadtxt sees spaces as separating one column from a new column.
stars_table = np.loadtxt('data/BrightStars.dat', usecols=[1,2,3,4,5,6,7,8,9,10,11])

# the output of this is an array that contains 11 columns and 5016 lines.
type(stars_table), stars_table.shape
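If you do not have the data file at hand, the usecols behavior can be tried on a tiny in-memory stand-in. The sketch below uses io.StringIO to mimic a file; the three rows are made up for illustration and are NOT values from BrightStars.dat.

```python
import io
import numpy as np

# three made-up rows (not real catalog values): a name followed by two numbers
fake_file = io.StringIO("StarA 1.5 10.0\nStarB 2.5 20.0\nStarC 3.5 30.0\n")

# skip column 0 (the strings) and read only the two numeric columns
small_table = np.loadtxt(fake_file, usecols=[1, 2])
print(small_table.shape)
print(small_table)
```

Here loadtxt returns a two-dimensional array with one row per line and one column per entry in usecols.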

In [ ]:
# another option is to explicitly tell it all the column types 
# we will do this with dictionaries and lists! we make the "dtype" argument of 
# loadtxt into a dictionary which has the keys "names" and "formats," each of which
# have tuples that make up their values.
stars_table = np.loadtxt('data/BrightStars.dat', dtype={'names': ('starname', 'ra_hr', 'ra_min', 'ra_s', 'dec_deg', 
                                                                  'dec_arcmin', 'deg_arcsec', 'Vmag', 'B-V', 
                                                                  'Parallax', 'uRA', 'uDec'),
                                                        'formats': ('|S15', float, float, float, float, 
                                                                    float, float, float, float, float, 
                                                                    float, float)}) # np.float was removed from recent numpy versions; the built-in float is equivalent

In [ ]:
# the neat thing about doing this (although it is tedious) is that we get all our columns, and now we can refer
# to them by name! 
stars_table['starname'], stars_table['Vmag']

In [ ]:
# let's look at one last simple option. numpy has another built-in function called "genfromtxt," which can more 
# easily handle different data types. If I set the "dtype" argument here to be "None" it will try to guess 
# all the types of the columns. This is pretty powerful! 
stars_table = np.genfromtxt('data/BrightStars.dat', dtype=None)

In [ ]:
# try doing np.genfromtxt? to see all the other arguments that the function genfromtxt can take 
# find the argument that will help us define column names, and rerun the above command using it. 
stars_table = np.genfromtxt('data/BrightStars.dat', dtype=None, names=('starname', 'ra_hr', 'ra_min', 'ra_s', 'dec_deg', 
                                                                  'dec_arcmin', 'deg_arcsec', 'Vmag', 'B-V', 
                                                                  'Parallax', 'uRA', 'uDec'))
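The same call pattern can be tried on a small in-memory stand-in for the file. This is a sketch under stated assumptions: the rows are made up (not catalog values), only three columns are used, and the encoding argument is passed so that the string column comes back as ordinary text.

```python
import io
import numpy as np

# made-up rows for illustration only: name, magnitude, parallax
fake_file = io.StringIO("StarA 1.5 10.0\nStarB 2.5 20.0\nStarC 3.5 30.0\n")

# dtype=None asks genfromtxt to guess each column's type
small_table = np.genfromtxt(fake_file, dtype=None, encoding='utf-8',
                            names=('starname', 'Vmag', 'Parallax'))
print(small_table['starname'])
print(small_table['Vmag'])
```

Each named column can then be used like a regular array, for example small_table['Vmag'].min().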

In [ ]:
stars_table['Parallax'] / 1000

In [ ]:
# now we can work with the contents of this file! let's say we want to find the brightest star.
brightest_star_mag = stars_table['Vmag'].min()
print(brightest_star_mag)

In [ ]:
# how would you find all the stars that have Vmag < 3? we did this in bash with a series of commands, but 
# np.where makes it much easier! 
Vlessthan_three = np.where(stars_table['Vmag'] < 3.) # recall that where only gives you the INDICES 
len(stars_table[Vlessthan_three]) # we can easily find that it is 172 lines, just like we found on our bash homework
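To see what np.where actually returns, here is a small sketch on a made-up array of magnitudes (not catalog values). It also shows that indexing with the boolean comparison directly selects the same elements.

```python
import numpy as np

vmag = np.array([-1.46, 0.5, 3.2, 2.9, 4.1])  # made-up magnitudes

indices = np.where(vmag < 3.)  # a tuple holding an array of INDICES
mask = vmag < 3.               # a boolean array with the same length as vmag

print(indices[0])     # positions where the condition is true
print(vmag[indices])  # values selected using the indices
print(vmag[mask])     # the same values selected using the boolean mask
```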

In [ ]:
# how would you find the parallax of the
# most distant star (smallest parallax) with Vmag < 3? hint: half of the work is already done in the previous cell!

In [ ]:
# to get the name of this star we can do the following:
stars_table[np.where((stars_table['Vmag'] < 3) & (stars_table['Parallax'] == 1.01))]['starname']

In [ ]:
# if we want to do something like in our first assignment and make a list of the five brightest stars' magnitudes
# it is much easier now that we've read in the data! 
sort_index = np.argsort(stars_table['Vmag']) # this is a built in function of numpy that returns the INDICES that sort an array
five_brightest_mag = stars_table['Vmag'][sort_index[:5]]
five_brightest_mag
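The reason argsort is so handy is that the same indices can reorder any other array of the same length. A minimal sketch with made-up values (the names and magnitudes below are hypothetical):

```python
import numpy as np

vmag = np.array([0.5, -1.46, 3.2, -0.04])              # made-up magnitudes
names = np.array(['StarA', 'StarB', 'StarC', 'StarD'])  # made-up names

order = np.argsort(vmag)         # indices that sort vmag from smallest to largest
two_brightest = names[order[:2]] # reuse the indices on a different array
print(order)
print(two_brightest)
```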

File I/O: Writing Files

Now that we have seen how to read in files, let's look at writing output to a different file; this will likely be how you are recording output from programs or simulations that you run. If you write data output to a file you can do data analysis on it later!


In [ ]:
# the function that we will be using to write output to a file is numpy's "savetxt"
# savetxt takes as arguments the filename (in quotes) and then the data (in array form) that you write to that filename
# the "fmt" argument is optional but specifying it allows us to choose the specific data type
# in this case we are choosing float values with two decimal places. try it without this to see what you get!
np.savetxt('five_brightest_table.dat', five_brightest_mag, fmt='%.2f')

In [ ]:
%%bash 
cat five_brightest_table.dat

In [ ]:
# here we have only written one column, but what if we want to write a multi-column file? 
# let's get the parallax values of the five brightest stars. remember that argsort gave us INDICES. let's reuse them. 
five_brightest_dist = stars_table['Parallax'][sort_index[:5]]
five_brightest_dist

In [ ]:
# if we want to save both these arrays, we need to feed both of them to "savetxt" in the following way. 
# when we look at the output of this we'll notice that savetxt defaults to saving arrays as ROWS.
np.savetxt('five_brightest_table.dat', (five_brightest_mag,five_brightest_dist), fmt='%.2f')

In [ ]:
%%bash 
cat five_brightest_table.dat

In [ ]:
# if we want each array to be a column instead we should use numpy's built-in transpose function
# let's look at what transpose does 
print(five_brightest_mag, five_brightest_dist)
print(np.transpose([five_brightest_mag,five_brightest_dist]))

In [ ]:
# to get a two-column data table we do the following: 
np.savetxt('five_brightest_table.dat', np.transpose([five_brightest_mag, five_brightest_dist]), fmt=['%.2f', '%.2f'])

In [ ]:
%%bash 
cat five_brightest_table.dat
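One way to check a file you have just written is to read it straight back in. The sketch below uses made-up values and writes to a temporary directory (both are assumptions for illustration); it also uses np.column_stack, which is another way to place 1D arrays side by side as columns.

```python
import os
import tempfile
import numpy as np

mags = np.array([-1.46, -0.72, -0.27])  # made-up magnitudes
plx = np.array([379.21, 10.43, 742.12]) # made-up parallaxes

# write to a temporary location so we do not overwrite any real files
out_path = os.path.join(tempfile.mkdtemp(), 'two_columns.dat')
np.savetxt(out_path, np.column_stack([mags, plx]), fmt='%.2f')

# read the file back in to confirm it has three lines and two columns
read_back = np.loadtxt(out_path)
print(read_back.shape)
```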

There is one wrinkle to using things like np.genfromtxt, np.savetxt, and np.loadtxt: these functions have a hard time dealing with mixed variable types. As you saw, when we read in the files we had to tell it the kind of data types we wanted for each column. It is likewise not simple to write columns of different data types out to a file using these functions. For this reason we'll import a new module, astropy, along with the submodules and functions associated with it.


In [ ]:
from astropy.io import ascii 
# we use the read function with the path to the file name. notice how we don't define data types
stars_table_astropy = ascii.read('data/BrightStars.dat') 
print(stars_table_astropy)

In [ ]:
print(stars_table_astropy['col1']) # we don't have names for our columns so astropy's read just names them in this way
print(type(stars_table_astropy['col1'][0])) # it knows that elements of the first column are strings!

In [ ]:
# now let's see how easy it is to write a table using this module 
ascii.write([stars_table_astropy['col1'], stars_table_astropy['col9']],'five_brightest_table_astropy.dat')

In [ ]:
%%bash 
cat five_brightest_table_astropy.dat | head -5

In [ ]:
# that is a rather simple example of how powerful ascii.write() can be. let's see what else it can do 
# here i can pass it a LIST of column names, and a DICTIONARY of data types for those columns 
ascii.write([stars_table_astropy['col1'], stars_table_astropy['col9']], 'five_brightest_table_astropy.dat', 
            names=['StarName', 'Vmag'], formats={'StarName': '%s', 'Vmag': '%.2f'})

In [ ]:
%%bash 
cat five_brightest_table_astropy.dat | head -5

Writing Functions and Modules in Python

You have already seen how to use existing modules (e.g. numpy) in python and the functions that accompany them. Now we will learn how to write our own functions, and in addition our own modules. The modules that we write will be in the form of ".py" files.

Functions

Functions in python are of the following form:

def function_name(arg1, arg2,..., kw1=val1, kw2=val2, ...):

Where arg1 and arg2 are "arguments" and are required, and kw1 and kw2 are called "keyword arguments" and are optional. We will ignore keyword arguments for now. The names of python functions can be any combination of letters, numbers and underscores as long as they don't start with a number and as long as they are not a reserved keyword (e.g. for, if, def). You should also avoid reusing the names of built-in functions such as print. Let's look at a very simple example of a function:


In [ ]:
def add_four(x): # on this first line we name our function "add_four" and designate that it has one required argument, x 
    return x + 4 # for a function to output values we use this "return" statement

In [ ]:
# now if i call my function with a value, i should get the output x + 4 
add_four(4)

In [ ]:
# let's try an example with two arguments 
def multiply(x,y):
    return x*y

In [ ]:
# in this case we need to feed multiply two arguments 
multiply(2,3)
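For completeness, here is a minimal sketch of the keyword arguments mentioned in the general form above. The function name scale_and_shift and its keyword names are hypothetical, chosen only for this illustration.

```python
# a sketch of a function with one required argument and two keyword arguments
# "scale_and_shift", "scale" and "shift" are hypothetical names
def scale_and_shift(x, scale=1., shift=0.):
    return x * scale + shift

default_result = scale_and_shift(4)                    # keywords fall back to their defaults
custom_result = scale_and_shift(4, scale=2., shift=1.) # keywords given explicitly
print(default_result, custom_result)
```

Because keyword arguments have default values, the caller only needs to supply them when the defaults are not what is wanted.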

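An important note: above we have passed integer value arguments to our functions. Immutable types like int, float, and string behave as if they are passed by VALUE: a function cannot change the caller's variable. Mutable types like lists and dicts behave as if they are passed by REFERENCE: changes made in place inside the function are visible outside of it. (Strictly speaking, python always passes a reference to the object; what differs is whether the object can be modified in place.) Let's see a concrete example: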


In [ ]:
z = 5
print(add_four(z)) # we pass the value of z to our add_four function 
print(z) # but its value won't be changed globally

In [ ]:
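def append_element(x):
    x.append(4) # append changes the list in place (and itself returns None)
    return x # so we return the modified list explicitly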

In [ ]:
my_list = [1,2,3] # avoid naming a variable "list", since that hides python's built-in list type
append_element(my_list) # pass the list by reference to our append_element function
print(my_list) # the values in the list are changed globally
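If you do not want a function to change the caller's list, one option is to work on a copy inside the function. This is only a sketch; the function name append_without_side_effect is hypothetical.

```python
# a sketch of avoiding the in-place change by copying the list first
# "append_without_side_effect" is a hypothetical name for this illustration
def append_without_side_effect(values):
    new_values = values.copy()  # make a separate copy of the list
    new_values.append(4)        # only the copy is modified
    return new_values

original = [1, 2, 3]
result = append_without_side_effect(original)
print(original)  # unchanged
print(result)    # the copy with the extra element
```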

Doc Strings

When writing functions it is important to use documentation strings so that you and others know how the function works. Let's quickly look at a simple example of the convention for doing this:


In [ ]:
def multiply(x,y):
    """
    Returns the product of x and y 
    
    Parameters
    ----------
    x, y: int
        the values for which the product is calculated 
        
    Returns
    -------
    result: int
        the product of x and y 
    """
    return x*y

In [ ]:
# now if we do multiply? or help(multiply) we get info from the doc string!
multiply?
help(multiply)
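Outside of IPython the "?" shortcut is not available, but the doc string is still stored on the function itself. A minimal sketch, using a shortened one-line doc string for brevity:

```python
def multiply(x, y):
    """Returns the product of x and y."""
    return x * y

# the doc string is stored in the function's __doc__ attribute
print(multiply.__doc__)
```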

Modules

Modules, as we discussed in the first lecture, are organized units of code. They can contain functions, classes (something which we will not cover in this course) and other defined variables. Python treats ".py" files as modules. We can define multiple functions within particular modules and organize our code in this way. Let's write a very simple module. I will show how to do this in the notebook, but when writing your own modules you should use a text editor like emacs, vim, gedit, etc.


In [ ]:
%%file add_subtract.py 
# simple addition and subtraction module 

def add_nums(x,y):
    return x + y

def subtract_nums(x,y):
    return x - y

In [ ]:
# If you recall from the first lecture, we can now import this module and use the 
# functions that we have defined within it. 
import add_subtract 
print(add_subtract.add_nums(2,3))
print(add_subtract.subtract_nums(5,3))

In [ ]:
# another way to import this module is as follows:
from add_subtract import add_nums
add_nums(2,3) # and now we can call it directly

This kind of syntax should look familiar to you! Whenever you do something like

    import numpy as np
    np.savetxt(blahblahblah)

you are calling a module (albeit one that is already written) and using the functions that are within that module. You have been doing this all along so far with existing modules! The structure is exactly the same now that you are writing your own. Modules have associated functions, and these functions take specific arguments.

    import module
    module.function(argument)

Let's look at one last more complicated example. In this case I have written a module below that I will then run (as opposed to importing).


In [ ]:
%%file calculate_distance.py 
import numpy as np
# module to read in data file, find distance in lyr from parallax, and write output to a file

stars_table = np.genfromtxt('data/BrightStars.dat', dtype=None, names=('starname', 'ra_hr', 'ra_min', 'ra_s', 'dec_deg', 
                                                                  'dec_arcmin', 'deg_arcsec', 'Vmag', 'B-V', 
                                                                  'Parallax', 'uRA', 'uDec'))

parallax_mas = stars_table['Parallax']
parallax_as = stars_table['Parallax'] / 1000.

def distance_in_lyr(x):
    """
    Finds distance in lyr given parallax in arcseconds
    
    Parameters
    ----------
    x: float
        parallax in arcseconds
    
    Returns
    -------
    result: float
        distance in lyr
    
    """
    return (1. / x)*3.26

if __name__ == "__main__":
    
    
    print("Running calculate_distance as main")
    all_distances_lyr = distance_in_lyr(parallax_as)
    np.savetxt('distance_parallax.dat', np.transpose([parallax_as, all_distances_lyr]), fmt=['%.2e', '%.2f'])

Now that I've written this module, we want to run it. In IPython you can simply do "run module.py" and the module will run. In this case, the calculate_distance module should write a file called "distance_parallax.dat," so with the bash command below I can check to see if a) this file exists and b) it contains the correct contents given my module. The last few lines of the file test a special variable called "__name__" (with two underscores on each side). Python sets this variable to "__main__" when a file is run directly. This block of code basically means that whatever comes after the if statement will ONLY run if this module is called directly from the command line (i.e. it is RUN and not simply imported). If you were to just do

import calculate_distance

the block of code would not execute. Instead you could do something like

    import calculate_distance
    calculate_distance.distance_in_lyr(argument)
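The difference between running and importing can be sketched in a self-contained way. The example below writes a tiny throwaway module called name_demo.py (a hypothetical name) to a temporary directory, then both runs it as a script, using the standard library's runpy, and imports it. It is meant only as an illustration, not as the way you would normally run your own modules.

```python
import contextlib
import importlib
import io
import os
import runpy
import sys
import tempfile

# write a tiny throwaway module (hypothetical name) to a temporary directory
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, 'name_demo.py')
with open(module_path, 'w') as f:
    f.write('def greet():\n'
            '    return "hello"\n'
            '\n'
            'if __name__ == "__main__":\n'
            '    print("running as main")\n')

# RUN the file as a script: __name__ is "__main__", so the guarded block executes
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    runpy.run_path(module_path, run_name='__main__')
run_output = buffer.getvalue()

# IMPORT the file: __name__ is "name_demo", so the guarded block is skipped
sys.path.insert(0, module_dir)
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    name_demo = importlib.import_module('name_demo')
import_output = buffer.getvalue()

print(repr(run_output))     # text printed when the file was run
print(repr(import_output))  # nothing was printed on import
print(name_demo.greet())    # the function is still available after importing
```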

In [ ]:
%run calculate_distance.py

In [ ]:
%%bash 
cat distance_parallax.dat
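As a quick sanity check on the output, recall that the distance in parsecs is one over the parallax in arcseconds, and that one parsec is about 3.26 lyr. The sketch below repeats the function from the module and applies it to two made-up parallax values (these are not values from BrightStars.dat).

```python
import numpy as np

def distance_in_lyr(x):
    """Finds distance in lyr given parallax in arcseconds."""
    return (1. / x) * 3.26

# made-up parallaxes in arcseconds, for illustration only
example_parallax_as = np.array([0.1, 0.5])
example_distances = distance_in_lyr(example_parallax_as)

# 0.1 arcsec -> 10 parsecs -> roughly 32.6 lyr; 0.5 arcsec -> 2 parsecs -> roughly 6.52 lyr
print(example_distances)
```

Because the function only uses division and multiplication, it works on a whole numpy array at once, which is how the module applies it to the full parallax column.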

As a first step for your homework, try writing a module called "ra_conversion.py" which reads in BrightStars.dat, contains a function that turns RA in hours, minutes and seconds to decimal degrees, and then writes that RA in decimal degrees (with four decimal places) to a file. Here's a basic outline for the function to get you started.

def convert_ra_to_degrees(hrs, mins, sec):
    """
    a simple function that takes RA in hours, minutes and seconds and returns RA in decimal degrees

    Parameters
    ----------
    hrs: 
    mins:
    sec:

    Returns
    -------
    result: 
        RA in decimal degrees

    """
    ra_degrees = equation 
    return ra_degrees

In [ ]: