This iPython notebook covers some of the most important aspects of the Python language that is used day to day by real Astronomers and Pysicists. Topics will include:
A review of numpy arrays and a discussion of their usefulness in solving real problems Reading in data from text and numpy file formats, along with creating your own outputs to be used later The logic of Python, including while loops and if/else statements
In [ ]:
import numpy as np
Here we are calling in the contents of numpy and giving it the shorthand name 'np' for convenience.
To create an array variable (let's call it 'x'), we simply assign 'x' to be equal to the output of the np.array() function, using a list as an input. You can then verify its contents by using the print() function.
In [ ]:
x = np.array([1,2,3,4,5])
print(x)
As we learned in Lecture 1, numpy arrays are convenient because they allow us to do math across the whole array and not just individual numbers.
For example, let's say we want to make a new variable 'y' such that y = $x^{2}$, then this is done simply as
In [ ]:
y = x**2
print(y)
The documentation of possible functions that can be applied to integers and floats (i.e. single numbers), as well as numpy arrays, can be found here: https://docs.scipy.org/doc/numpy/reference/routines.math.html
As discussed previously, there are numerous ways to create arrays beyond np.numpy(). These include: np.arange() np.linspace()
These create arrays of numbers within a range with a specific step-size between each consecutive number in the array.
It is sometimes convenient to have Python create other arrays for you, depending on the problem that you are going to solve. For example, sometimes it is handy to create an array of all zeros, which can then be replaced later with data. This can be done by using np.zeros().
Say that we have some sort of experiment such that we want to collect 10 different measurments, the result being some number. To ready such an array, you simply do the following.
In [ ]:
data = np.zeros(10)
print(data)
After we do our experiment or calculation, we can then assign a new value to an element of the array. In order to do that, we use the syntax array_name[index_number] = value In this, the array (with the name "array_name" or whatever it is you have named it) will have "value" replace whatever is in the position corresponding to "index_number." Arrays are numbered starting from 0, such that
First position = 0 Second position = 1 Third position = 2 etc.
It is a bit confusing, but after a bit of time, this becomes quite natural. Here is an example.
Let's say we have taken our first measurement of this 10 measurement experiment that was discussed above, and the value we got was 5. We then can store that value in the first position (0 index number) in the data array we made above.
In [ ]:
data[0] = 5
print(data[0])
Now you try it. Let's say that the 2nd measurement comes back with a value of 7. Now store that value in the second position in the array and print it to verify.
In [ ]:
#Your code goes here
Python array indexing is fairly straight forward, but let's say that you had an array and you wanted the last element of the array. One of the easiest ways to access that, is to use negative indexing.
Negative indexing is the same as normal indexing, but backward, in the sense that you start with the last element of the array and count forward. More explicitly, for any array:
array[-1] = last element of array array[-2] = second to last element of the array array[-3] = third to last element of the array etc
Now then, let's create an array using np.arange() with 10 elements, and see if you can access the last element and the second to last element using negative indexing. Print out these values.
In [ ]:
#Your code goes here
Now, sometimes its useful to access more than one element of an array. Let's say that we have an array with 100 elements starting with a range of 0-10. If you recall, this is done via the np.linspace() function.
In [ ]:
x = np.linspace(0,10,100)
Now then, in order to get a range of elements rather than simply a single one, we use the notation:
x[i_start,i_end+1]
For example, let's say you want the 1st, 2nd, and 3rd element, then you'd have to do
x[0:3]
In this notation, ":" represents you want everything between 0 and 3, and including 0. Let's test this.
In [ ]:
x[0:3]
If you want everything passed a certain point of the array (including that point), then you would just eliminate the right number, for example
x[90:]
would give you everything passed, and including, the 90 index element. Similarly, if you want everything before a certain index.
x[:90]
This would give you everything before the 90 index element.
So, let's say that you would want everything up to the 10-th element of the array x we've defined above (remember, the 10-th element has an index of 9). How would you do that?
In [ ]:
#Your code goes here
Finally, simply using the ":" gives you all the elements in the array.
So, we have learned all about data arrays and how we can manipulate them, either through mathematics or indexing. However, up until this point, all we've done is use arrays that we ourselves created. But what happens if we have data from elsewhere? Can Python use that?
The answer is of course yes, and while there are ways to import data that are a bit complicated at times, we're going to teach you some of the most basic, and most useful, ways.
For this section, we will be using plotting to visualize the data as you're working with it, and as such, we will be loading in the package "matplotlib.pyplot" which you used in Lecture 1
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
Now then, let's say we are doing a timing experiment, where we look at the brightness of an object as a function of time. This is actually a very common type of measurement that you may do in research, such as looking for dips in the brightness of stars as a way to detect planets.
Now, then, this data is stored in a text file named "timeseries_data.txt" in the directory "lecture2_data". Let's load it in.
In [ ]:
timeseriesData = np.loadtxt("./lecture2_data/timeseries_data.txt")
Now we have the data loaded into Python as a numpy array, and one handy thing you can do is to use Python to find the dimensions of the array. This is done by using ".shape" as so.
In [ ]:
timeseriesData.shape
In this format, we know that this is a 2x1000 array (two rows, 1000 columns). Another way you can think about this is that you have two 1000-element arrays contained within another array, where each of those arrays are elements (think of it as an array of arrays).
The first row is the time stamps when each measurement was taken, while the second row is that of the value of the measurement itself.
For ease of handling this data, one can in principle take each of these rows and create new arrays out of them. Let's do just that.
In [ ]:
t = timeseriesData[0,:]
signal = timeseriesData[1,:]
Here, you have 2 dimensions with the array 'timeseriesData', and as such much specify the row first and then the column. So,
array_name[n,:] is the n-th row, and all columns within that row. array_name[:,n] is the n-th column, and all rows within that particular column.
Now then, let's see what the data looks like using the plot() function that you learned last time. Do you remember how to do it? Why don't you try! Plot t as your x-axis and signal as your y-axis.
In [ ]:
#Your code here
plt.plot(t,signal)
plt.show()
Looking at our data, you see clear spikes that jump well above most of the signal. I've added this to the data to represent outliers that may sometimes appear when your messing with raw data, and those must be dealt with. In astronomy, you sometimes have relativistic charged particles, not from your source, that hit the detector known as cosmic rays, and we often times have to remove these.
There are some very complex codes that handle cosmic rays, but for our purposes (keeping it easy), we're going to just set a hard cut off of, let's say 15.
In order to do this, we can use the np.where() function in place of an normal indices.
In [ ]:
cutOff = 15.
tFix = t[np.where(signal<cutOff)]
signalFix = signal[np.where(signal<cutOff)]
np.where() is an incredibly useful function that takes a logical statment (in this case "signal < cutOff") and searches the array for indices that correspond to places where this statement is true (we will be doing more with logic soon). So, in this case, it would keep the data and the corresponding time stamps that we have deemed "good" by this criteria.
Now let's plot it. You try.
In [ ]:
#Your code goes here
plt.plot(tFix,signalFix)
plt.show()
Now that you have your data all cleaned up, it would be nice if we could save it for later and not have to go through the process of cleaning it up every time. Fear not! Python has you covered.
There are two formats that we are going to cover, one that is Python-specific, and the other a simple text format.
First, we must package our two cleaned up arrays into one again. This can be done simply with the np.array() function.
In [ ]:
dataFix = np.array([tFix,signalFix])
Then, we can use either the np.save() function or the np.savetxt function, the first saving the array into a '.npy' file and the other, into a '.txt' file. The syntax is pretty much the same for each.
In [ ]:
np.save('./lecture2_data/dataFix.npy',dataFix)
np.savetxt('./lecture2_data/dataFix.txt',dataFix)
Now that your data files are saved, you can load them up again, using np.loadtxt() and np.load() for .txt and .npy files respectively. We used np.loadtxt() above, and np.load works the same way. So, let's load in the .npy file and see if our data was saved correctly.
In [ ]:
data = np.load('./lecture2_data/dataFix.npy')
t = data[0,:]
signal = data[1,:]
plt.plot(t,signal)
plt.show()
Now, let's see if you can do the same thing, but with the .txt file that we saved.
In [ ]:
#Your code goes here
So, to summarize, not only can you manipulate arrays, but now you can save them and load them. In a way, those are some of the most important skills in scientific computing. Almost everything you'll be doing requires you know this, and now that you've mastered it, you're well on your way to being an expert in computational physics and astronomy!
In [ ]:
#Example conditional statements
x = 1
y = 2
x<y #x is less than y
In [ ]:
#x is greater than y
x>y
In [ ]:
#x is less-than or equal to y
x<=y
In [ ]:
#x is greater-than or equal to y
x>=y
If you let a and b be conditional statements (like the above statements, e.g. a = x < y), then you can combine the two together using logical operators, which can be thought of as functions for conditional statements.
There are three logical operators that are handy to know:
And operator: a and b Or operator: a or Not operator: not(a)
In [ ]:
#Example of and operator
(1<2)and(2<3)
In [ ]:
#Example of or operator
(1<2)or(2>3)
In [ ]:
#Example of not operator
not(1>2)
Now, these might not seem especially useful at first, but we've already seen their importance in the np.where() function. Even more importantly, they are used when we are doing if/else statements or loops, which we will now cover.
An if/else statement (or simply an if statement) are segments of code that have a conditional statement built into it, such that the code within that segment doesn't activate unless the conditional statement is true.
Here's an example. Play around with the variables x and y to see what happens.
In [ ]:
x = 1
y = 2
if (x < y):
print("Yup, totally true!")
else:
print("Nope, completely wrong!")
The idea here is that Python checks to see if the statement (in this case "x < y") is True. If it is, then it will do what is below the if statement. The else statement tells Python what to do if the condition is False.
Note that Python requires you to indent these segments of code, and WILL NOT like it if you don't. Some languages don't require it, but Python is very particular when it comes to this point.
You also do not need an "else" segment, which effectively means that if the condition isn't True, then that segment of code doesn't do anything, and Python will just continue on passed the if statement.
Here is an example of such a case. Play around with it to see what happens when you change the values of x and y.
In [ ]:
x = 1
y = 2
if (x>y):
print("The condition is True!")
x+y
While-loops are similar to if statements, in the sense that they also have a conditional statement that is built into it and it executes when the conditional is True. However, the only difference is, it will KEEP executing that segment of code until the conditional statement becomes False.
This might seem a bit strange, but you can get the hang of it!
For example, let's say we want Python to count from 1 to 10.
In [ ]:
x = 1
while (x <= 10):
print(x)
x = x+1
Note here that we tell Python to print the number x (x starts at 1) and then redefining x as itself +1 (so, x=1 gets redefined to x = x+1 = 1+1 = 2). Python then executes the loop again, but now x has been incremented by 1. We continue this process from x = 1 to x = 10, printing out x every time. Thus, with a fairly compact bit of code, you get 10 lines of output.
It is sometimes handy to define what is known as a DUMMY VARIABLE, whose only job is to count the number of times the loop has been executed. Let's call this dummy variable i.
In [ ]:
x = 2
i = 0
while (i<10):
x = 2*x
print(x)
i = i+1 #another way to write this is i+=1, but it's understandably odd and thus isn't used here
So far, we have really focused on using built in functions (such as from numpy), but what about defining our own? This is easy to do, and can be a way to not only clean up your code, but also allows you to apply the same set of operations to multiple variables without having to explicitely write it out every time.
For example, let's say we want to define a line function y = mx + b that allows us to calculate the value of y, given x, for any m and b.
In [ ]:
#Defining a linear function y = mx+b
def line(x, m, b):
return (m*x)+b
In this, you are defining a function (called "line") that has three variables (x,m,b). You then do something with (x,m,b) (in this case m*x + b), and then returning that value to the user.
Why don't you play around with our new function, giving it some numbers and seeing how it works.
In [ ]:
#Your code here
You can also use arrays as variables. For example, let's use our line, but instead let's put an numpy array in for x.
In [ ]:
x = np.array([1,2,3])
line(x,2,3)
Now that we know how to define a function, let's try something a bit harder:
We're going to make a set of lines, and plot them. First, let's create a set of arrays with the values of the y-intercepts, and to make it easy, let's assume that our slope for all of these lines is 2 (i.e. m = 2). Then we're going to make an array representing the sample of x-values for the family of lines.
In [ ]:
m = 2. #defining a constant slope (same for all lines to be plotted)
bArray = np.array([0.,2.,4.,6.]) #Defining the y-intercept array
x = np.linspace(0.,10.,100) #Defining x sample array
Finally, we use our line function, and then use a while-loop to plot a line with a different intercept for each iteration.
In [ ]:
i = 0
endWhile = len(bArray) #Finding how long bArray is
while(i < endWhile):
b = bArray[i]
plt.plot(x,line(x,m,b))
i = i+1
plt.show()
You can play around with the y-intercept array (bArray) to get a different family of lines.
Now then, can you do it yourself? Let's define a function $y = x^{a} + b$, and using the line example that we just did, can you plot a family of parabolas (a=2, same y-intercepts as before) within the range of -10 to 10 (say, 100 sample points). Good luck!
In [ ]:
#Your code goes here
#Answer:
def funct(x,a,b):
return (x**2.)+b
a=2.
bArray = np.array([0.,2.,4.,6.])
x = np.linspace(-10.,10.,100)
i=0
endWhile = len(bArray)
while(i < endWhile):
b = bArray[i]
plt.plot(x,funct(x,a,b))
i = i+1
plt.show()
Here are two challenge problems to help you learn the concepts in this lecture. One is a mathematical problem where you will calculate what is known as the Fibonacci Sequence, and the second one, you will be building what is known as a NUMERICAL INTEGRATOR in order to predict the projectiles trajectory through a gravitational field (i.e. what happens when you throw a ball through the air)
Let's say that you have a projectile (let's say a ball) in a world with 2 spatial dimensions (dimensions x and y). This world has a constant acceleration due to gravity (call it simply g) that points in the -y direction and has a surface at y = 0.
Can we calculate the motion of the projectile in the x-y plane after the projectile is given some initial velocity vector v? In particular, can we predict where the ball will land? With loops, yes we can!
Let's first define all of the relevant variables so far. Let g = -9.8 (units of m/s, so an Earth-like world), the initial velocity vector being an array v = [3.,3.], and an initial position vector (call it r) in the x-y plane of r = [0.,1.]. For ease, let's use numpy arrays for the vectors.
In [ ]:
#Your code here
#Answers
g = -9.8
v = np.array([3.,3.])
r = np.array([0.,0.])
Now then, remember that,
$a = \frac{dv}{dt}$ and thus, $dv = a\ dt$
So, the change of an objects velocity ($dv$) is equal to the acceleration ($a = g$ in this case) multiplied by the change in time ($dt$)
Likewise:
$v_{x} = \frac{dx}{dt}$ and $v_{y} = \frac{dy}{dt}$, or
$v_{x}\ dt = dx$ and $v_{y}\ dt = dy$
Now, in this case, since there is only downward acceleration, the change of $v_{x}$ is 0 until the projective hits the ground.
Now, we're going to define two functions, one that will calculate the velocity vector components and the other, the position vector components, and returning a new vector with the new components. I'll give you the first one.
In [ ]:
def intV(v,g,dt):
deltaVy = g*dt
vXnew = v[0]
vYnew = v[1]+deltaVy
return np.array([vXnew,vYnew])
Now that we've defined intV (short for "integrate v"), let's use it real quick, just to test it out. Let dt = 0.1 (meaning, your taking a step forward in time by 0.1 seconds).
In [ ]:
dt = 0.1
intV(v,g,dt)
As you can see, $V_{x}$ hasn't changed, but $V_{y}$ has decreased, representing the projectile slowing down as it's going upward.
I'll let you define the function now for the position vector. Call it intR, and it should be a function of (r,v,dt), and remember that now both $r_{x}$ and $r_{y}$ are changing. Remember to return an array.
In [ ]:
#Your code here.
#Answer
def intR(r,v,dt):
rXnew = r[0]+(v[0]*dt)
rYnew = r[1]+(v[1]*dt)
return np.array([rXnew,rYnew])
Now we have the functions that calculate the changes in the position and velocity vectors. We're almost there!
Now, we will need a while-loop in order to step the projectile through its trajectory. What would the condition be?
Well, we know that the projectile stops when it hits the ground. So, one way we can do this is to have the condition being (r[1] >= 0.) since the ground is defined at y = 0.
So, having your intV and intR functions, along with a while-loop and a dt = 0.1 (known as the "step-size"), can you use Python to predict where the projectile will land?
In [ ]:
#Your code here.
#Answer
dt = 0.01
while (r[1] >= 0.):
v = intV(v,g,dt)
r = intR(r,v,dt)
print(r)
Now, note that we've defined the while-loop such that it doesn't stop exactly at 0. Firstly, this was strategic, since the initial y = 0, and thus the while-loop wouldn't initialize to begin with (you can try to change it). One way you can overcome this issue is to decrease dt, meaning that you're letting less time pass between each step. Ideally, you'd want dt to be infinitely small, but we don't have that convenience in reality. Re-run the cells, but with dt = 0.01 and we will get much closer to the correct answer.
So, we know where it's going to land...can we plot the full trajectory? Yes, but this is a bit complicated, and requires one last function: np.append(). https://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html
The idea is to use np.append() to make an array that keeps track of where the ball has been.
Let's define two empty arrays that will store our information (x and y). This is an odd idea, defining an array variable without any elements, so instead think of it as a basket without anything inside of it yet, and we will np.append() to fill it.
In [ ]:
x = np.array([]) #defining an empty array that will store x position
y = np.array([]) #defining an empty array that will store y position
Now, all you have to do, is each time the while-loop executes, you use np.append() for the x and y arrays, adding the new values to the end of them.
How do you do that? Well, looking at the np.append() documentation, for x, you do x = np.append(x,[r[0]])
The same syntax is used for the y array.
After that, you simply use plt.plot(x,y,'o') to plot the trajectory of the ball (the last 'o' is used to change the plotting from a line to points).
Good luck! Also, don't forget to reset your v and r arrays (otherwise, this will not work)
In [ ]:
#Your code goes here
#Answer
v = np.array([3.,3.])
r = np.array([0.,0.])
dt = 0.01
while (r[1] >= 0.):
v = intV(v,g,dt)
r = intR(r,v,dt)
x = np.append(x,r[0])
y = np.append(y,r[1])
print(r)
plt.plot(x,y,'o')
plt.show()
If everything turns out alright, you should get the characteristic parabola.
Also, if you're going to experiment with changing the intial position and velocity, remember to re-run the cell where we define the x and y arrays in order to clear the plot.
Now you've learned how to do numerical integration. This technique is used all throughout Physics and Astronomy, and while there are more advanced ways to do it in order to increase accuracy, the heart of the idea is the same.
Here is a figure made by Corbin Taylor (head of the Python team) that used numerical integration to track the position of a ray of light as it falls into a spinning black hole.