Week 4 - Python

Today we will cover some basic Python techniques and structures that are really useful for analyzing data.

Today's Agenda

  • Basics of Python
  • List Comprehension
  • Dictionaries
  • Functions
  • Classes

Basics of Python

The minimal Python script

Unlike many other languages, a simple Python script does not require any sort of header information in the code. So, we can look at the standard programming example, Hello World, in Python (below). Here we're simply printing to the screen. If we put that single line into a blank file (called, say, HelloWorld.py) and then run it from the command line by typing 'python HelloWorld.py', the script should run with no problems. This also shows off our first Python command, print, which can be used to print strings or numbers. (In the Python 2 syntax used throughout these notes, print is a statement, so it takes no parentheses.)


In [ ]:
print "Hello World!"

There are, however, a few lines that you will usually see in a Python script. The first line often starts with #! and is called the shebang. For a Python script, an example of the shebang line would be "#!/usr/bin/env python"

Within Python, any line that starts with # is a comment, and won't be executed when running the script. The shebang, though, is there for the shell. If you run the script by calling python explicitly, then the script will be executed in Python. If, however, you want to make the script an executable (which can be run just by typing "./HelloWorld.py") then the shell won't know what language the script should be run in. This is the information included in the shebang line. You don't need it, in general, but it's a good habit to have in case you ever decide to run a script as an executable.

Another common sight at the start of a script is several lines that begin with 'import'. These lines let you import individual functions or entire modules (files that contain multiple functions). These can be ones you write yourself, or packages like numpy, matplotlib, etc.
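
As a rough sketch of how these pieces fit together, the top of a script might look like the following. The math module and the variable names are only for illustration, and print is written with parentheses and a single argument so that the lines should run under both Python 2 and Python 3.

```python
#!/usr/bin/env python
# The shebang above is read by the shell; Python itself treats it as a comment.

import math              # import an entire module
from math import sqrt    # import a single function from a module

root = sqrt(16)          # call the imported function directly
area = math.pi * 2**2    # reach into the module with dot notation

print(root)
print(area)
```

Saved as a file and marked executable, a script like this could be started with ./ followed by the file name, exactly as described for HelloWorld.py.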

Python variables

Some languages require every variable to be declared with a type. For example, in C++ a line like "int x" defines the variable x and specifies that it is an integer. Python, however, uses dynamic typing. That means a variable's type is determined entirely by what is stored in it.

In the example below, we can see a few things happening. First of all, x initially behaves as a number (specifically, an integer, which is why 42/4 gives 10: in Python 2, dividing two integers drops the remainder). We can then put a string in there instead with no problems. Once we do, though, we can't treat it as a number anymore and add 10 to it.

As written, the cell stops with a TypeError at the 5th line (print x+10). Try commenting out that line by adding a # to the front of it, and we'll see that Python will still add strings to it: x+"10" joins the two strings into "4210".


In [ ]:
x=42
print x+10
print x/4
x="42"
print x+10
print x+"10"
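
If the string really holds a number, one way around the failed addition above is to convert it explicitly. The short sketch below uses the built-in type, int, and str functions; the variable names as_number and as_text are illustrative, and print is written with parentheses and a single argument so the lines should run under both Python 2 and Python 3.

```python
x = 42
print(type(x))               # an integer type

x = "42"
print(type(x))               # now a string type

as_number = int(x) + 10      # convert the string to an integer, then add
as_text = x + str(10)        # convert the number to a string, then join

print(as_number)
print(as_text)
```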

Lists

The basic way of storing larger amounts of data in Python (without using other modules like numpy) is the built-in list. A list is, by definition, one dimensional. If we'd like to store more dimensions, we use what are referred to as lists of lists. This is not the same thing as an array, which is what numpy uses. Let's take a look at what a list does.

We'll start off with a nice simple list below. Here the list stores integers. Printing it back, we get exactly what we expect. However, because it's being treated as a list, not an array, addition and multiplication don't do element-by-element arithmetic: multiplying by 2 repeats the list, and adding two lists joins them end to end. Feel free to try changing the operations that we're using and see what causes errors, and what causes unexpected results.


In [ ]:
x=[1, 2, 3]
y=[4,5, 6]
print x
print x*2
print x+y
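
If what we actually wanted was arithmetic on each element, plain lists need a bit more work. The sketch below is one possible way to do it, using zip and the list comprehension syntax covered later in these notes; the names doubled and summed are made up for the example.

```python
x = [1, 2, 3]
y = [4, 5, 6]

repeated = x * 2      # repeats the list: [1, 2, 3, 1, 2, 3]
joined = x + y        # joins the lists: [1, 2, 3, 4, 5, 6]

doubled = [2 * a for a in x]              # multiply each element by 2
summed = [a + b for a, b in zip(x, y)]    # add the two lists pair by pair

print(doubled)
print(summed)
```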

We can also set up a quick list using the range function. If we give it just a single number, we'll get a list of integers from 0 up to 1 less than that number.

If we want a fancier list, we can also include the number to start at (first parameter) and the step size (last parameter); the list stops before it reaches the middle parameter. All three have to be integers.

If we need it, we can also set up blank lists.


In [ ]:
print range(10)
print range(20, 50, 3)
print []
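
One caveat if you ever run these notes under Python 3: there, range returns a lazy range object rather than a list, so printing it does not show the values. Wrapping it in list(), as in this small sketch, gives the same list on either version. The variable names are only for illustration.

```python
first_ten = list(range(10))           # 0 through 9
stepped = list(range(20, 50, 3))      # start at 20, stop before 50, step by 3
empty = []                            # a blank list to fill in later

print(first_ten)
print(stepped)
print(empty)
```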

If we want to, we can refer to subsets of the list. For just a single term, we can use the number corresponding to that position. An important thing with Python is that list indices start at 0, not 1, so the first term is at index 0. If we're more concerned about the last number in the list, then we can use negative numbers as the index. The last item in the list is -1, the item before that is -2, and so on.

We can also select a set of numbers by using a : to separate list indices. The selection includes the first index but stops just before the last one, so [1:3] picks out the items at indices 1 and 2. If you use this and don't specify the first or last index, it will presume you meant the start or end of the list, respectively.

After you try running the sample examples below, try to get the following results:

  • [6] (using two methods)
  • [3,4,5,6]
  • [0,1,2,3,4,5,6]
  • [7,8,9]

In [ ]:
x=range(10)
print x
print "First value", x[0]
print "Last value", x[-1]
print "Fourth to sixth values", x[3:6]
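
To see the same indexing rules on something other than numbers, here is a small sketch with a list of letters. The list and the variable names are invented for the example.

```python
letters = ["a", "b", "c", "d", "e", "f"]

first = letters[0]            # index 0 is the first item
last = letters[-1]            # index -1 is the last item
middle = letters[1:3]         # indices 1 and 2; the end index is not included
from_start = letters[:2]      # no first index: start from the beginning
to_end = letters[4:]          # no last index: run to the end

print(middle)
print(from_start)
print(to_end)
```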

Modifying lists

The simplest change we can make to a list is to change it at a specific index just by redefining it, like in the second line in the code below.

There are three other handy ways to modify a list. append will add whatever we want as the next item in the list, but this means if we're adding more than a single value, it will add a list into our list, which may not always be what we want.

extend will expand the list to include the additional values, but only if we give it a list (or another sequence); it won't work on a single integer (go ahead and try that).

Finally, insert will let us insert a value anywhere within the list. To do this, it requires a number for what spot in the list it should go, and also what we want to add into the list.


In [ ]:
x=[1,2,3,4,5]
x[2]=8
print x

print "Testing append"
x.append(6)
print x
x.append([7,8])
print x

print "testing extend"
x=[1,2,3,4,5]
#x.extend(6)
#print x
x.extend([7,8])
print x

print "testing insert"
x=[1,2,3,4,5]
x.insert(3, "in")
print x
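
The commented-out x.extend(6) line fails because extend needs something it can step through. A hedged sketch of what happens, and of the usual workaround of wrapping the single value in a list, is below. The try/except and the failed flag are only there so the example keeps running after the error.

```python
x = [1, 2, 3, 4, 5]

try:
    x.extend(6)               # a bare integer cannot be looped over
    failed = False
except TypeError:
    failed = True             # the list is left unchanged

x.extend([6])                 # wrap the value in a list instead

print(failed)
print(x)
```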

Loops and List Comprehension

Like most languages, we can write loops in Python. One of the most standard loops is the for loop, so we'll focus on that one. Below is a 'standard' way of writing a 'for' loop. We'll do something simple, where all we want is to get the square of each number in the list.


In [ ]:
x=range(1,11,1)
print x
x_2=[]
for i in x:
    i_2=i*i
    x_2.append(i_2)
print x_2

While that loop works, even this pretty simple example can be condensed into something a bit shorter. We had to set up a blank list, and after that the loop itself was 3 lines, so just getting the squares of all these values took 4 lines. We can do it in one line with a list comprehension.

This is basically a different way of writing a for loop, and will return a list, so we don't have to set up an empty list for the results.


In [ ]:
x=range(1,11,1)
print x
x_2=[i*i for i in x]
print x_2
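
A list comprehension can also take an optional if clause at the end to keep only some of the items. As a minimal sketch (the name even_squares and the even-number condition are just an illustration), this squares only the even numbers:

```python
x = list(range(1, 11, 1))                          # 1 through 10

even_squares = [i * i for i in x if i % 2 == 0]    # square only the even values

print(even_squares)
```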

Dictionaries

Dictionaries are another way of storing a large amount of data in Python, except that instead of being indexed by position like a list, each value is looked up by a key, which is usually a string or a number.


In [ ]:
x={}
x['answer']=42
print x['answer']

These are particularly useful if you have a handful of values you'd like to call back to often. For an astronomy example, we can set up a dictionary that contains the absolute magnitude of the Sun in a bunch of bands (from Binney & Merrifield). Our code can now easily look up an absolute magnitude whenever needed using that dictionary.

We can also list out the dictionary, if needed, with AbMag.items(). There are some other tools for more advanced tricks with dictionaries, but this covers the basics.


In [ ]:
AbMag={'U':5.61, 'B':5.48, 'V':4.83, 'R':4.42, 'I':4.08}
print AbMag['U']
print AbMag.items()
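
A couple of other everyday dictionary operations are sketched below using the same AbMag values: checking whether a key exists with in, and looping over the keys. The sorted call is only there to make the printing order predictable, and the band 'K' is deliberately one that is not in the dictionary.

```python
AbMag = {'U': 5.61, 'B': 5.48, 'V': 4.83, 'R': 4.42, 'I': 4.08}

has_v = 'V' in AbMag      # True: 'V' is a key
has_k = 'K' in AbMag      # False: there is no entry for 'K'

bands = sorted(AbMag.keys())
for band in bands:
    # look up each value by its key
    print("{} band: {}".format(band, AbMag[band]))

b_minus_v = AbMag['B'] - AbMag['V']   # difference between two stored values
print(b_minus_v)
```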

Functions

At a certain point you'll be writing the same bits of code over and over again. That means that if you want to update it, you'll have to update it in every single spot you did the same thing. This is... a less than optimal use of time, and it also means it's really easy to screw up by forgetting to keep one spot the same as the rest. Functions solve this: you write the code once, give it a name, and call it wherever you need it.

We can try out a function by writing a crude function for the sum of a geometric series. $$\frac{1}{r} + \frac{1}{r^2} + \frac{1}{r^3} + \frac{1}{r^4} + \ldots $$

Conveniently, so long as r is larger than 1, the infinite series has a known sum. We can use that to check that the function works. $$ \frac{1}{r-1} $$

This means we can call the function repeatedly and not need to change anything. In this case, you can try using this GeoSum function for several different numbers (remember, r>1) just by changing TermValue, and see how closely the two results agree. GeoSum only adds up the first 10 terms, so the match is approximate.


In [ ]:
def GeoSum(r):
    powers=range(1,11,1) #set up a list for the exponents 1 to 10
    terms=[(1./(r**x)) for x in powers] #calculate each term in the series
    return sum(terms) #return the sum of the list

TermValue=2
print GeoSum(TermValue), (1.)/(TermValue-1)
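
Because GeoSum stops after 10 terms, it slightly undershoots the exact answer. One way to see this, sketched here with a hypothetical variant called GeoSumN that takes the number of terms as a second argument (defaulting to 10), is to compare a couple of different cut-offs:

```python
def GeoSumN(r, n_terms=10):
    powers = range(1, n_terms + 1)              # exponents 1 to n_terms
    terms = [1. / (r**x) for x in powers]       # each term of the series
    return sum(terms)                           # sum of the terms kept

exact = 1. / (2 - 1)            # closed-form value for r = 2
short_sum = GeoSumN(2)          # 10 terms, same as GeoSum(2)
long_sum = GeoSumN(2, 50)       # 50 terms, much closer to the exact value

print(short_sum)
print(long_sum)
print(exact)
```

Giving n_terms a default value means existing calls with a single argument keep working unchanged.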

Classes

To steal a good line for this, "Classes can be thought of as blueprints for creating objects."

With a class, we can create an object with a whole set of properties that we can access. This can be very useful when you want to deal with many objects with the same set of parameters, rather than trying to keep track of related variables over multiple lists, or even just having a single object's properties all stored in some hard to manage list or dictionary.

Here we'll just use a class that's set up to do some basic math. Note that the class consists of several smaller functions (called methods) inside of it. The first one, called __init__, is run as soon as we create an object belonging to this class, and it creates two attributes on that object, value and square. The other method, powerraise, only runs if we call it. Try adding some other methods in there to try this out. They don't need to have anything new passed to them to be run.


In [ ]:
class SampleClass:
    def __init__(self, value): #run on initial setup of the class, provide a value
        self.value = value
        self.square = value**2
    
    def powerraise(self, powerval): #only run when we call it, provide powerval
        self.powerval=powerval
        self.raisedpower=self.value**powerval

MyNum=SampleClass(3)
print MyNum.value
print MyNum.square
MyNum.powerraise(4)
print MyNum.powerval
print MyNum.raisedpower
print MyNum.value,'^',MyNum.powerval,'=',MyNum.raisedpower
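
As one possible answer to the suggestion above, here is a sketch of the same kind of class with an extra method that takes no new arguments. The names SampleClass2 and cube are made up for this example, and the method returns its result instead of storing it as an attribute, which is a design choice rather than a requirement.

```python
class SampleClass2:
    def __init__(self, value):      # runs when the object is created
        self.value = value
        self.square = value**2

    def cube(self):                 # needs nothing beyond the object itself
        return self.value**3

MyNum = SampleClass2(3)
MyCube = MyNum.cube()               # call the method and keep what it returns

print(MyNum.square)
print(MyCube)
```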

Next week, the first modules!