Often when we think of scientists conducting an experiment, we think of laboratories filled with beakers and whirring machines. However, especially in physics, these laboratories are often replaced by computers, whether they are simple desktop machines or some of the world's largest supercomputing clusters.
Scientists' reliance on computing and software skills increases daily, but a student in the physical sciences often takes only one (or sometimes zero!) programming courses. As a result, these programming skills are often self-taught, which can result in poorly built and/or incorrect software. One way to remedy this would be to insist that scientists earn an extra degree in computer science or software engineering. This, however, is infeasible for a variety of reasons.
One alternative is to give students exposure to software development at an earlier point in their education. This gives up-and-coming scientists a chance to hone their coding skills over a longer period of time and helps them avoid many of the pitfalls that scientific codes have recently fallen into.
This textbook seeks to give an introduction to software development through a series of motivating physical examples using the Python language. Specifically, we will be using the IPython notebook, which allows us to mix code with equations and explanations. Our first chapter will provide a (very!) brief introduction to the Python language and to the other tools we need to work through the following physical examples.
There are hundreds of different programming languages out there, but many are not suited for scientific computing. Often we must also mix and match several different languages depending on the task at hand. In the last 10 years or so, Python has emerged as a very versatile, easy-to-use language. Its applications range from managing supercomputers to critical financial calculations to detailed physics simulations.
Because of its widespread use, being a skilled Python programmer is useful not only in an academic setting but in a commercial one as well. Looking at Fig. 1, we can see that being able to program in Python is a very versatile skill. Learning Python won't automatically make you rich, of course; we include this figure only to show that the skills learned here are not limited to scientific computing.
Historically, physicists have preferred less-than-readable programming languages like C, C++, and Fortran. This is mostly because, while verbose and difficult to use, these languages are extremely fast. Speed is critical whenever we need to perform many hundreds of millions of operations in a reasonable amount of time.
These languages use what is known as static type-checking. That is, when we declare a variable or a function, we must tell the computer exactly what type of data it holds, and therefore how much memory it needs. Because every type is known ahead of time, the compiler can generate fast, specialized machine code that uses no more memory than necessary. This lets our simulation run much more quickly, but forces us to write many more lines of code. Fig. 2 shows the speed of Python relative to Fortran and C, as well as a few other popular languages; Python falls in between much faster languages like Fortran and much slower languages like Matlab and R.
Python uses what is known as dynamic type-checking. This means that we don't need to specify the type when declaring a variable. For example, we can write `a=2`, where in this case `a` has type `int` (for integer). Later in our program, we can write `a='Hello World!'`, which is a string (a collection of characters), without having to say "I'm switching `a` from an `int` to a string." This type of operation would be forbidden in languages with static type-checking. In C, for example, we would need to write `int a=2;`, and if we later assigned a string to `a`, the compiler would report an error, because `a` was declared to hold an integer and doesn't have room to store a string.
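A minimal sketch of dynamic typing in action, using the built-in `type()` function to inspect the type Python has inferred for `a` at each step. It uses single-argument `print(...)` calls, which behave the same in Python 2 and Python 3.

```python
# Dynamic typing: the same name can refer to values of different types.
a = 2
first_type = type(a)      # Python infers int from the literal 2
print(first_type)

a = 'Hello World!'        # rebinding a to a string is allowed
second_type = type(a)     # now Python reports str
print(second_type)
```

The exact text printed for each type differs slightly between Python versions, but in both cases the type of `a` changes without any declaration on our part.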
Hopefully by now you've seen why we've chosen to use Python for this tutorial and why it is widely used by scientists and software engineers alike: Python is easy to use, provides reasonable compute times, and is used in a wide variety of applications.
Additionally, several open source packages have been developed for Python that provide a large number of functions useful to scientists, mathematicians, and engineers. Two of the most prominent examples are NumPy and SciPy. The Matplotlib package has also been developed to create publication-quality figures in Python. We will use all three of these packages throughout this course, but will leave discussion of them to subsequent sections.
In the following few sections, we'll provide a brief introduction to the Python language, mostly by example. Nearly any question you could have about programming has probably been answered on the popular question-and-answer website Stack Overflow, or can be found by simply typing your question into Google. The Python documentation is also a good source of help, but its explanations can be a bit technical for beginners.
In the context of scientific computing, it's often easiest to think of any programming language as a really fancy calculator. One command that we will use right away (and throughout the rest of the tutorial) is `print`; unsurprisingly, this tells the computer to print whatever follows `print` to the screen. The cell below shows a bunch of simple (and probably already obvious) operations.
In [1]:
#addition
print 4+3
#subtraction
print 4-3
#multiplication
print 4*3
#exponentiation
print 4**3
#division
print 4/3
The lines beginning with `#` are called comments. Python ignores them, and they are often used to explain your code. Note that we can combine strings and numbers in our print statements to give more meaningful output. We denote strings, or collections of characters, using double quotes `""` or single quotes `''`.
In [2]:
#addition
print "4+3 = ",4+3
#subtraction
print "4-3 = ",4-3
#multiplication
print '4*3 = ',4*3
#exponentiation
print "4^3 = ",4**3
#division
print "4/3 = ",4/3
This is a nice example, but if we look at the last line, $4/3=1$, we notice that it is in fact incorrect: $4/3=1\frac{1}{3}=1.\overline{3}$. So what's going on with division in Python? It turns out that this is a very common mistake that programmers make, and it can lead to some very serious and hard-to-find problems.
Recall that we said Python is dynamically typed, meaning it automatically infers the data type when you declare a variable. Note that both 4 and 3 have type `int`. Thus, in Python 2 (the version used in this notebook), Python expects the result of an operation between these two numbers to also be of type `int`, and it simply discards any fractional part. However, $4/3$ has a decimal component and thus must be represented as type `float`, meaning that Python sets aside space to store the decimal part of our answer as well. We can therefore solve our problem by writing 4 and 3 with type `float`, i.e. as `4.0` and `3.0`. In general, it is good practice to represent your numbers with type `float` if you think there is a chance you will suffer from truncation (or roundoff) error, the mistakes that result from representing numbers with the wrong type.
In [3]:
#division with floats
print "4/3 = ",4.0/3.0
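To see the two behaviors side by side, here is a small sketch comparing truncating and floating-point division. It uses the `//` operator, which always performs floor division, and converts one operand with the built-in `float()`; both behave identically in Python 2 and Python 3. (In Python 3, a plain `4/3` already returns a float, which is one more reason to prefer these explicit forms.)

```python
# Compare truncating integer division with floating-point division.
numerator = 4
denominator = 3

int_result = numerator // denominator            # floor division keeps only the whole part
float_result = float(numerator) / denominator    # true division keeps the fraction

print("4 // 3 = {0}".format(int_result))
print("float(4) / 3 = {0}".format(float_result))

# The difference is exactly the fractional part that integer division throws away.
lost = float_result - int_result
print("Truncation discards {0:.4f}".format(lost))
```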
Performing simple calculations is nice, but what if we want to keep track of a particular value after we've performed several different operations on it? We do this by defining a variable, a concept we already referred to while discussing dynamic versus static type-checking. Rather than providing any more wordy explanations, we'll show a few easy examples of how variables are used.
In [4]:
#Give a value of 4.0 to a and 3.0 to b
a = 4.0
b = 3.0
#output the values of these variables to the screen
print "The initial value of a is ",a," and the initial value of b is ",b
Now let's perform some operations on `a` and `b` and see what happens.
In [5]:
#Update a and b
a = a - 1.0
b = b + 5.0
print "The new value of a is ",a," and the new value of b is ",b
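Updating a variable from its own previous value is so common that Python also offers shorthand "augmented assignment" operators such as `-=` and `+=`. A minimal sketch checking that they give the same result as the longer form used in the cell above:

```python
# Long form, as in the cell above
a = 4.0
b = 3.0
a = a - 1.0
b = b + 5.0

# Equivalent shorthand using augmented assignment
c = 4.0
d = 3.0
c -= 1.0   # same as c = c - 1.0
d += 5.0   # same as d = d + 5.0

print("a = {0}, c = {1}".format(a, c))
print("b = {0}, d = {1}".format(b, d))
```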
Variables can also contain strings and we can even perform operations on these strings, within reason of course.
In [6]:
#define the two words
word1 = "Hello "
word2 = "World!"
#print the two words to the screen
print "The first word is ",word1
print "The second word is ",word2
#add or concatenate the two strings
expression = word1+word2
#print the new string to the screen
print "The whole sentence is ",expression
We shouldn't get too carried away though. For example, it's nonsensical to multiply, divide, or subtract two words. If we try to do this, Python will give us an error.
In [7]:
word1-word2
In [8]:
word1*word2
In [9]:
word1/word2
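Each of the three cells above stops with a `TypeError`. As a sketch, we can wrap the same operations in `try`/`except` so Python records which ones it rejects instead of halting; the list `rejected` is just an illustrative way to collect the results.

```python
word1 = "Hello "
word2 = "World!"

rejected = []
for symbol in ["-", "*", "/"]:
    try:
        if symbol == "-":
            word1 - word2
        elif symbol == "*":
            word1 * word2
        else:
            word1 / word2
    except TypeError:
        rejected.append(symbol)   # Python refused this operation on two strings

print("Operations rejected for two strings: {0}".format(rejected))
print("Concatenation still works: {0}".format(word1 + word2))
```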
Python provides several different tools for organizing data. Creating collections of numbers or words is helpful when organizing our data and may often be necessary for handling large amounts of data.
In [10]:
student1 = "Jake"
student2 = "Jenny"
student3 = "Lucas"
print "The names of three of the students are ",student1,', ',student2,', and ',student3
However, what if we have 30 students? Defining a variable for each one seems a little unwieldy and offers no information about how these variables are connected (i.e. they are in the same class). It's easier and better practice to define the relationship between these students using a list.
In [11]:
#Use a list to define a classroom rather than individual variables
class1 = ["Marissa","Ben","Seth","Rachel","Ryan"]
#Print the list
print "The students in the class are ",class1
But what if we wanted to access the individual parts of the list? We use what is called the index. One important thing to note about lists (and counting in general) in Python is that indexing starts at 0. Thus, for the above list, we can use the indices 0 through 4 to access its entries.
In [12]:
print "The first student in our class is ",class1[0]
print "The second student in our class is ",class1[1]
print "The fifth student in our class is ",class1[4]
But what if we try to access an element beyond the last element in our list?
In [13]:
print class1[5]
Python responds with an `IndexError`, because a five-element list has no entry at index 5. We can also use negative numbers to access the elements of our list, unintuitive as this may seem. In this case, -1 corresponds to the last element of our list, -2 to the second-to-last element, and so on. This is especially useful when we have very long lists, or lists whose length is unknown, and we want to access elements starting from the back.
In [14]:
print "The last student on our class list is ",class1[-1]
print "The second-to-last student on our class list is ",class1[-2]
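Negative indices are shorthand for counting back from the length of the list: index `-k` refers to the same element as `len(class1) - k`. The sketch below checks this for every valid negative index and confirms that stepping past either end of the list raises an `IndexError`.

```python
class1 = ["Marissa", "Ben", "Seth", "Rachel", "Ryan"]
n = len(class1)

# class1[-k] and class1[n - k] name the same element for k = 1, ..., n
for k in range(1, n + 1):
    assert class1[-k] == class1[n - k]

# Valid indices run from -n to n-1; anything outside that range is an error
out_of_range = []
for index in [n, -n - 1]:
    try:
        class1[index]
    except IndexError:
        out_of_range.append(index)

print("Out-of-range indices: {0}".format(out_of_range))
```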
What if another student joins the class? We would like to be able to add elements to our list as well. This can be done with the `append` method, shown below.
In [15]:
#add a student to the class
class1.append('Mischa')
#print the class with the new student
print "The students in the class are ",class1
Alternatively, we can remove elements of a list using the `pop` method in a similar way.
In [16]:
#remove Ben from the class list; note Ben corresponds to entry 1
class1.pop(1)
#print the class minus Ben
print "The new class roster is ",class1
There are many more ways of manipulating lists and we won't cover all of them here. Consult the Python documentation for (many more) additional details.
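A few of those other operations, sketched briefly: `pop` also returns the element it removes, `remove` deletes the first element equal to a given value, and `insert` places an element at a chosen index. The roster mirrors the one built above; "Ada" is a hypothetical new student added for illustration.

```python
class1 = ["Marissa", "Ben", "Seth", "Rachel", "Ryan", "Mischa"]

dropped = class1.pop(1)        # removes and returns the element at index 1
class1.remove("Rachel")        # removes the first element equal to "Rachel"
class1.insert(0, "Ada")        # inserts a hypothetical student at the front

print("Dropped: {0}".format(dropped))
print("Roster: {0}".format(class1))
```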
Another commonly used container is the dictionary, which is denoted using curly braces `{}`. The main difference between dictionaries and lists is that dictionaries use a key-value pair rather than a numerical index to locate specific entries. But why would we want to use a dictionary instead of a list? Say we have a car and we want to specify several of its properties: its make, model, color, and year. We could of course put this information in a list.
In [17]:
#make a list for my_car
my_car_list = ['Mercury','Sable','Dark Green',1998]
#print the information
print "The details on my car are ",my_car_list
However, this doesn't tell us what each of the individual entries means. To preserve the context of the information in our list, we have to remember which index (0-3) corresponds to which property. By using a dictionary instead, the key we use to access a value tells us what that value means.
In [18]:
#make a dictionary for my car
my_car_dict = {'make':'Mercury','model':'Sable','color':'Dark Green','year':1998}
#print the details of my car
print "The make of my car is ",my_car_dict['make']
print "The model of my car is ",my_car_dict['model']
print "The color of my car is ",my_car_dict['color']
print "My car was made in ",my_car_dict['year']
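Dictionaries can also be updated and inspected after they are created. A minimal sketch: assigning to a new key adds an entry, the `in` operator checks whether a key exists, and the `get` method returns a fallback value instead of raising a `KeyError` when the key is missing. The `'mileage'` value and the `'vin'` key are made up for illustration.

```python
my_car_dict = {'make': 'Mercury', 'model': 'Sable', 'color': 'Dark Green', 'year': 1998}

# Add a new key-value pair (hypothetical value, for illustration only)
my_car_dict['mileage'] = 150000

# Check whether keys are present
has_color = 'color' in my_car_dict
has_vin = 'vin' in my_car_dict

# get() returns its second argument when the key is absent
vin = my_car_dict.get('vin', 'unknown')

# Loop over the keys in sorted order so the output order is predictable
for key in sorted(my_car_dict):
    print("{0}: {1}".format(key, my_car_dict[key]))
```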
The kind of container we choose will depend on the problem at hand. Throughout our tutorial, we will show the advantages of both types. Unsurprisingly, there are several more types of containers available in Python; we have covered only the two most commonly used types here.
When writing a piece of code, we often want to tell our program to make a certain decision based on some input. For example, consider the conversion of numerical grade percentages to their corresponding letter grades. Suppose we want to assign grades based on the table below.
Letter Grade | Numerical Grade |
---|---|
A | $\ge90$ |
B | $\ge80,\lt90$ |
C | $\ge70,\lt80$ |
D | $\ge60,\lt70$ |
F | everything else |
How do we do this? Quite intuitively, most programming languages use what are called "if-else" statements. The general idea is that if some condition is met, we execute a certain piece of code. This is the "if" part. We can also provide an "else" block that will be executed if the condition is not met, though the "else" block is optional. Finally, "else-if" statements let us test multiple conditions in turn (such as the different grade brackets).
This is all best explained through an example. Say my class average is a 91.
In [19]:
my_average=91
And now I want to assign a letter grade to this average.
In [20]:
if my_average >= 90:
    print "Your letter grade is an A!"
else:
    print "You did not get an A."
This is nice, but if we got anything below a 90, this snippet of code doesn't give us much information. Say my class average is an 89.
In [21]:
my_average = 89
if my_average >= 90:
    print "Your letter grade is an A!"
else:
    print "You did not get an A."
Well, now I know I didn't get an A, but for all I know I got an F when in reality I got a B. To solve this problem, let's test the B condition. Looking at the table, we can see that to get a B, our average must satisfy two conditions: it must be greater than or equal to 80 and less than 90. To do this, we use what is called (unsurprisingly) the `and` operator, shown in the example below.
In [22]:
if my_average >= 80 and my_average < 90:
    print "Your letter grade is a B!"
Now, let's combine our A and B conditions (along with the C, D, and F conditions) using the "if", "else-if" (denoted in Python using `elif`), and "else" statements. This time, we'll change our average to a 72.
In [23]:
my_average=72
if my_average >= 90:
    print "Your letter grade is an A!"
elif my_average >= 80 and my_average < 90:
    print "Your letter grade is a B!"
elif my_average >= 70 and my_average < 80:
    print "Your letter grade is a C"
elif my_average >= 60 and my_average < 70:
    print "Your letter grade is a D"
else:
    print "Your letter grade is an F"
We can also evaluate stricter conditions, such as whether two things are exactly equal. Say we reverse the above situation: we are given a letter grade and we want to determine which numerical bracket it falls into. Equality is tested with the `==` operator.
In [24]:
my_letter_grade="D"
if my_letter_grade == "A":
    print "Your grade is 90 or above"
elif my_letter_grade == "B":
    print "Your grade is between 80 and 90"
elif my_letter_grade == "C":
    print "Your grade is between a 70 and an 80"
elif my_letter_grade == "D":
    print "Your grade is between a 60 and a 70"
else:
    print "Your grade is below a 60"
These symbols that we've been using to determine relationships between objects are called relational operators: they tell us something about an object relative to another object. Pay careful attention not to mix up `==` and `=`. A single equal sign, the assignment operator, assigns a value to a variable. Using it in the condition of an "if-else" statement will lead to a syntax error.
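We can confirm that last point without crashing our session by asking Python to compile a small snippet held in a string, using the built-in `compile()` function. The helper name `is_valid_python` and the snippets are illustrative only.

```python
def is_valid_python(source):
    """Return True if source compiles, False if Python reports a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Comparison with == is fine inside an if statement
good = is_valid_python("my_average = 91\nif my_average == 91:\n    pass\n")
# Assignment with = inside an if statement is rejected
bad = is_valid_python("my_average = 91\nif my_average = 91:\n    pass\n")

print("== inside if compiles: {0}".format(good))
print("=  inside if compiles: {0}".format(bad))
```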
Finally, say we want to test whether one or the other condition is true. If we are getting an A or a B in a class, we are doing pretty well, but if we're getting a C or below, we need to improve our grade. To do this, we use the `or` keyword.
In [25]:
if my_letter_grade == "A" or my_letter_grade == "B":
    print "You're doing great!"
elif my_letter_grade == "C" or my_letter_grade == "D":
    print "You need to do better..."
else:
    print "You are failing."
In [26]:
my_letter_grade="B"
if my_letter_grade == "A" or my_letter_grade == "B":
    print "You're doing great!"
elif my_letter_grade == "C" or my_letter_grade == "D":
    print "You need to do better..."
else:
    print "You are failing."
Collectively, what we've been using are known as conditional statements: based on whether a condition (the expression that follows `if` or `elif`) is true or not, we evaluate a block of code. Note that the block of code to be evaluated is indented. In Python, this indentation is required; it tells Python that this piece of code is to be evaluated only if the condition is true.
The values `True` and `False` are what these conditional statements are built on. Variables that hold `True` or `False` are known as boolean variables. Every relational operator produces one of these boolean values, so we can also evaluate a comparison on its own and inspect the result.
In [27]:
my_letter_grade == "B"
Python also has an `is` operator, but it tests whether two names refer to the very same object in memory, not whether two values are equal. Comparing strings or numbers with `is` can give surprising results, so use `==` for those comparisons and reserve `is` for the special values `True`, `False`, and `None`.
In [28]:
if my_letter_grade == "A":
    print "You got an A!"
else:
    print "You did not get an A."
Here it should be obvious that `==` is not the assignment operator. Rather, we are testing whether something is true. Similarly, we can assign the result of a comparison to a variable.
In [29]:
i_got_an_a = (my_letter_grade == "A")
print "The statement 'I got an A' is ",i_got_an_a
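The difference between equality (`==`) and identity (`is`) is easiest to see with two lists that hold the same values. A minimal sketch: `==` compares contents, while `is` asks whether both names point to the very same object.

```python
list_a = [90, 85]
list_b = [90, 85]      # a separate list with the same contents
list_c = list_a        # a second name for the same list object

same_values = (list_a == list_b)    # contents match
same_object = (list_a is list_b)    # two different list objects
alias = (list_a is list_c)          # both names refer to one object

print("list_a == list_b: {0}".format(same_values))
print("list_a is list_b: {0}".format(same_object))
print("list_a is list_c: {0}".format(alias))
```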
Note that conditional statements can also directly evaluate whether a boolean variable is true or false. Really, this is what has been going on all along; we've just been hiding it. By default, a conditional statement tests whether its condition is `True`. Note the equivalence of the following two statements.
In [30]:
if i_got_an_a:
    print "Your letter grade is an A!"
else:
    print "You did not get an A."
In [31]:
if i_got_an_a is True:
    print "Your letter grade is an A!"
else:
    print "You did not get an A."
This may just seem like we are saying the same thing over and over again and the use of relational operators and boolean logic may not seem immediately obvious. However, these decision making tools are some of the most useful when writing code, scientific or otherwise, and their usefulness will become more apparent the more examples we work through.
`for` loops: Often (nearly always) when we write a program, we want to perform a task (or a similar set of tasks) over and over again. Let's return to our example of class averages and what we learned about lists earlier. Say we have a list of numerical grades and we want to know their corresponding letter grades.
In [32]:
class_grades = [99,78,44,82,56,61,94,78,76,100,85]
We could of course look at each list entry individually, writing a block of code to evaluate the 0th entry, then the 1st entry, then the 2nd entry and so on. However, this would mean writing as many if statements as there are entries in our class grades list. Instead, we will use what is called a loop, in this case a for loop, to iterate over the list, applying the same block of code to each successive entry.
In [33]:
for i in range(len(class_grades)):
    print "The numerical grade is ",class_grades[i]
Let's unpack this code snippet. The `len()` function gets the length of the list, in this case 11. The `range(n)` function creates a list with entries 0 through `n-1`, in steps of 1; thus, this is a list of all the indices of our `class_grades` list. The `for i in ...` line tells Python to execute the indented block of code 11 times, with `i` taking each index in turn.
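A quick sketch of what `range(len(...))` produces for our grade list. Wrapping `range` in `list()` makes the example behave the same in Python 2, where `range` already returns a list, and in Python 3, where it returns a lazy range object.

```python
class_grades = [99, 78, 44, 82, 56, 61, 94, 78, 76, 100, 85]

n = len(class_grades)        # number of entries
indices = list(range(n))     # 0, 1, ..., n-1 in steps of 1

# Looking up each index in turn recovers every grade, in order
grades_by_index = []
for i in indices:
    grades_by_index.append(class_grades[i])

print("len(class_grades) = {0}".format(n))
print("indices = {0}".format(indices))
```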
Similarly, and perhaps more succinctly, we can skip the `range()` and `len()` functions and just iterate over the `class_grades` list itself.
In [34]:
for i in class_grades:
    print "The numerical grade is ",i
Here, `i` doesn't represent the index of the list, but rather the list entry itself. When iterating over a single list, this is often the best and most concise way to construct your `for` loop. However, when iterating over two lists whose entries correspond to each other, it is often useful to iterate over the list of indices. The `for` loop is possibly the most useful tool in any programming language, especially in scientific computing. We will make frequent use of both `for` and `while` loops in this tutorial, showing their usefulness in a variety of contexts.
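For two corresponding lists, Python's built-in `zip` offers an alternative to looping over indices: it pairs up the entries for us. The sketch below pairs the first five grades with illustrative student names (the pairing is made up) and checks that both approaches give the same result.

```python
class_grades = [99, 78, 44, 82, 56]
students = ["Marissa", "Ben", "Seth", "Rachel", "Ryan"]   # illustrative pairing

# Index-based loop over two corresponding lists
pairs_by_index = []
for i in range(len(students)):
    pairs_by_index.append((students[i], class_grades[i]))

# zip pairs the i-th entries together without explicit indices
pairs_by_zip = []
for name, grade in zip(students, class_grades):
    pairs_by_zip.append((name, grade))
    print("{0} earned {1}".format(name, grade))
```

The index-based form is still handy when the index itself is needed, for example to modify one of the lists in place.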
`while` loops: Whether you want to use a `while` loop or a `for` loop depends on the task at hand. If I know that I need to perform a task `n` times, I would use a `for` loop. But what if I don't know how many times I need to perform a task? What if instead I want to perform some set of tasks until a condition is met? Like the `if` and `elif` statements that we discussed previously, we give the `while` loop a condition (or a combination of conditions). As long as the condition is met, the block inside the `while` loop will continue to be executed. Another way of saying this is that as long as the condition given to the `while` loop evaluates to `True`, the block below the `while` statement will continue to be evaluated. As soon as the condition evaluates to `False`, evaluation of the block stops. For example, say we want to find the first two Cs in the list of class grades, and only the first two.
In [39]:
c_grades = [] #declare empty list to save C grades
found_c_grades = 0 #set a counter for the number of C grades found
counter = 0 #set a counter to step through the class grades list
while found_c_grades < 2:
    if class_grades[counter] >= 70 and class_grades[counter] < 80:
        c_grades.append(class_grades[counter])
        found_c_grades = found_c_grades + 1
    counter = counter + 1
In [40]:
print "The first two C grades are ",c_grades
As long as the number of C grades we've found is less than 2, we continue searching the list. Once we've found 2, we stop searching. Note that if we forgot to increment `counter` or `found_c_grades`, the loop condition might never become `False`, and our `while` loop could run forever. When using a `while` loop, pay special attention to avoiding this problem.
Notice that we could have used a `for` loop to accomplish this task. However, what if we had a list of 1,000 grades or 100,000 grades? We can save quite a bit of time by stopping the evaluation of this block of code as soon as we've finished the task of finding the first two C grades. This would also be useful if we were reading from a file of unknown length.
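The same search can be packaged for reuse, with one additional safeguard: the loop also stops when it reaches the end of the list, so it cannot run past the last element if fewer matching grades exist than requested. This is a sketch; the function name `first_n_in_range` and its parameters are hypothetical.

```python
def first_n_in_range(grades, n, low, high):
    """Return up to n grades g with low <= g < high, scanning from the front."""
    found = []
    counter = 0
    # Stop once n matches are found OR the whole list has been checked
    while len(found) < n and counter < len(grades):
        if grades[counter] >= low and grades[counter] < high:
            found.append(grades[counter])
        counter = counter + 1
    return found

class_grades = [99, 78, 44, 82, 56, 61, 94, 78, 76, 100, 85]
first_two_c = first_n_in_range(class_grades, 2, 70, 80)
all_a = first_n_in_range(class_grades, 10, 90, 101)   # asks for more than exist
print("First two C grades: {0}".format(first_two_c))
print("A grades found: {0}".format(all_a))
```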
When writing a program, one of the main things we want to avoid is rewriting code. Repeated code wastes space, takes longer to write, decreases the readability of our program, and can potentially slow its execution. One easy way to avoid all of these pitfalls is to use functions.
The concept of functions is simple, but powerful. Consider the mathematical expression $f(x)=x^2+1$. We put in a value $x$ or a range of values, say $-10<x<10$, and get out that value squared plus one. How would we write this in terms of a Python function?
In [41]:
def my_first_function(x):
    return x**2 + 1
Every function is defined using the `def` keyword (for definition). Then, we give the function a name (`my_first_function` in this example) followed by its inputs in parentheses (`x` in this case). The `return` statement then tells the function what result should be output. Below is an example of how we would use the function.
In [44]:
#single value of x
x = 1.0
print "The result of f(x) = x^2 + 1 for x = ",x," is ",my_first_function(x)
#list of values
f_result = []
x = [-2.0,-1.0,0.0,1.0,2.0]
for i in x:
    f_result.append(my_first_function(i))
print "The result of f(x) = x^2 + 1 for x = ",x," is ",f_result
Notice that we've saved ourselves quite a few lines of code by not having to write $f(x)=x^2+1$ repeatedly. Instead, we can just reference the function definition above. Additionally, if we wanted to change something about our expression $f(x)$, we would only need to make the change in one place, the function definition, rather than having to make the same change in multiple places in our code. Writing our code using functions helps us to avoid simple mistakes that so often occur when writing a program.
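As a sketch of evaluating $f(x)=x^2+1$ over a range of values like the $-10<x<10$ mentioned earlier, we can restrict ourselves to the integers in that interval and generate them with `range(-9, 10)`, which stops just before its end point. The choice of integer sample points is our own assumption for illustration.

```python
def my_first_function(x):
    return x**2 + 1

# Integer values satisfying -10 < x < 10; range stops before its end point
x_values = list(range(-9, 10))
f_values = []
for x in x_values:
    f_values.append(my_first_function(x))

print("x:    {0}".format(x_values))
print("f(x): {0}".format(f_values))
```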
We can do much more than just evaluate simple mathematical expressions with functions. Let's look back at our example of mapping numerical grades to letter grades. This time, we'll iterate through our list of numerical grades, pass each one to a function that finds the corresponding letter grade, and then add that letter grade to a new list.
In [52]:
def numerical_grade_to_letter_grade(num_grade):
    if num_grade >= 90:
        let_grade = 'A'
    elif num_grade >= 80 and num_grade < 90:
        let_grade = 'B'
    elif num_grade >= 70 and num_grade < 80:
        let_grade = 'C'
    elif num_grade >= 60 and num_grade < 70:
        let_grade = 'D'
    else:
        let_grade = 'F'
    return let_grade
In [53]:
#map numerical grades to letter grades
class_grades_letters = []
for grade in class_grades:
    class_grades_letters.append(numerical_grade_to_letter_grade(grade))
#print correspondence between numerical and letter grades
for i in range(len(class_grades)):
    print "The numerical grade is ",class_grades[i]," and the letter grade is ",class_grades_letters[i]
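Functions, loops, and dictionaries combine naturally. As a sketch, the snippet below redefines the grade-conversion function so the block runs on its own and tallies how many students earned each letter grade. It drops the upper-bound checks because each `elif` is reached only when the earlier conditions were `False`. This simplification is our own variant, not the function above.

```python
def numerical_grade_to_letter_grade(num_grade):
    # Each elif is only reached if the previous conditions were False,
    # so checking the lower bound alone is enough.
    if num_grade >= 90:
        return 'A'
    elif num_grade >= 80:
        return 'B'
    elif num_grade >= 70:
        return 'C'
    elif num_grade >= 60:
        return 'D'
    else:
        return 'F'

class_grades = [99, 78, 44, 82, 56, 61, 94, 78, 76, 100, 85]

# Start every letter at zero, then add one for each matching grade
grade_counts = {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'F': 0}
for grade in class_grades:
    letter = numerical_grade_to_letter_grade(grade)
    grade_counts[letter] = grade_counts[letter] + 1

for letter in sorted(grade_counts):
    print("{0}: {1}".format(letter, grade_counts[letter]))
```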
Below are some excellent resources for learning various techniques in Python. Above, we have provided only a very cursory introduction to many important concepts. Supplementing our introduction with the tutorials and documentation of others will be very helpful later on in the course.