Review from the previous lecture

We covered a bunch of basic math commands (+,-,/,**) and variables:

int, float, list, "string", array (today: booleans, functions)
type() and print()

We also introduced you to Numerical Python (numpy) arrays and basic plotting using matplotlib.



In [ ]:

    
import numpy as np

Here we are calling in the contents of numpy and giving it the shorthand name 'np' for convenience.

To create an array variable (let's call it 'x'), we simply assign 'x' to be equal to the output of the np.array() function, using a list as an input. You can then verify its contents by using the print() function.



In [ ]:

    
x = np.array([1,2,3,4,5])
print(x)

As we learned in Lecture 1, numpy arrays are convenient because they allow us to do math across the whole array and not just individual numbers.

For example, let's say we want to make a new variable 'y' such that y = $x^{2}$, then this is done simply as



In [ ]:

    
y = x**2
print(y)

Now let's say we wanted to plot y as a function of x.



In [ ]:

    
%matplotlib inline
import matplotlib.pyplot as plt

x = np.arange(1,10,.1)
y = x**2

p = plt.plot(x,y)

Lecture 2 - Logic, Loops, and Arrays

Today, we will extend on the arrays

The logic of Python, including while loops and if/else statements
Functions and how to define new ones.
A review of numpy arrays and a discussion of their usefulness in solving real problems
Reading in data from text and numpy file formats, along with creating your own outputs to be used later

A. Logic, If/Else, and Loops

You can make conditional (logical) statements in Python, which return either "True" or "False", also known as "Booleans."



In [ ]:

    
#Example conditional statements
x = 1
y = 2
x<y #x is less than y



In [ ]:

    
#x is greater than y
x>y



In [ ]:

    
#x is less-than or equal to y
x<=y



In [ ]:

    
#x is greater-than or equal to y
x>=y

If you let a and b be conditional statements (like the above statements, e.g. a = x < y), then you can combine the two together using logical operators, which can be thought of as functions for conditional statements.

There are three logical operators that are handy to know:

And operator: a and b
- outputs True only if both a and b are True
Or operator: a or b
- outputs True if at least one of a and b are True
Not operator: not(a)
- outputs the negation of a



In [ ]:

    
#Example of the `and` operator
(1<2) and (2<3)



In [ ]:

    
#Example of the `or` operator
(1<2) or (2>3)



In [ ]:

    
#Example of not operator
not(1<2)

Now, these might not seem especially useful at first, but they're the bread and butter of programming. Even more importantly, they are used when we are doing if/else statements or loops, which we will now cover.

An if/else statement (or simply an if statement) are segments of code that have a conditional statement built into it, such that the code within that segment doesn't activate unless the conditional statement is true.

Here's an example. Play around with the variables x and y to see what happens.



In [ ]:

    
x = 1
y = 2
if (x < y):
    print("Yup, totally true!")
else:
    print("Nope, completely wrong!")

The idea here is that Python checks to see if the statement (in this case "x < y") is True. If it is, then it will do what is below the if statement. The else statement tells Python what to do if the condition is False.

Note that Python requires you to indent these segments of code, and WILL NOT like it if you don't. Some languages don't require it, but Python is very particular when it comes to this point. (The parentheses around the conditional statement, however, are optional.)

You also do not always need an "else" segment, which effectively means that if the condition isn't True, then that segment of code doesn't do anything, and Python will just continue on past the if statement.

Here is an example of such a case. Play around with it to see what happens when you change the values of x and y.



In [ ]:

    
x = 2
y = 1
if (x > y):
    print("x is greater than y")

Here's a more complicated case. Here, we introduce some logic that helps you figure out if two objects are equal or not.

There's the == operator and the != operator. Can you figure out what they mean?



In [ ]:

    
x = 2
y = 3
if (x == y):
    print("first statement is true")
if (x != y):
    print("second statement is true")
if (x > y or x < y):
    print("third statement is true")

While-loops are similar to if statements, in the sense that they also have a conditional statement built into them. The code inside the loop will execute when the conditional is True. And then it will check the conditional and, if it evaluates to True, the code will execute again. And so on and so forth...

The funny thing about while-loops is that they will KEEP executing that segment of code until the conditional statement evaluates to False...which hopefully will happen...right?

Although this seems a bit strange, you can get the hang of it!

For example, let's say we want Python to count from 1 to 10.



In [ ]:

    
x = 1
while (x <= 10):
    print(x)
    x = x+1

Note here that we tell Python to print the number x (x starts at 1) and then redefining x as itself +1 (so, x=1 gets redefined to x = x+1 = 1+1 = 2). Python then executes the loop again, but now x has been incremented by 1. We continue this process from x = 1 to x = 10, printing out x every time. Thus, with a fairly compact bit of code, you get 10 lines of output.

It is sometimes handy to define what is known as a DUMMY VARIABLE, whose only job is to count the number of times the loop has been executed. Let's call this dummy variable i.



In [ ]:

    
x = 2
i = 0 #dummy variable
while (i<10):
    x = 2*x
    print(x)
    i = i+1 
    #another way to write this is i+=1, but it's idiosyncratic and we won't use it here

Food for though: what if the conditional statement is always true?

B. Defining Your Own Functions

So far, we have focused on learning built-in functions (such as from numpy and matplotlib), but what about defining our own? This allows you to clean up your code, and apply the same set of operations to multiple variables without having to explicitly write them out every time.

For example, let's say we want to define a function that takes the square root of a number. Let's check to make sure the number is positive first, so that we don't end up with an imaginary answer.



In [ ]:

    
#Defining a square root function
def sqrt(x):
    if (x < 0):
        print("Your input is not positive!")
    else: 
        return x**(1/2)



In [ ]:

    
sqrt(4)



In [ ]:

    
sqrt(-4)

So the outline for a function is

def <function name> (<input variable>):
    <some code here>
    return <output variable>

In general, many common mathematical functions like sqrt, log, exp, sin, cos can be found in the math module. So we don't have to write our own - phew!



In [ ]:

    
import math
print(math.sqrt(25))
print(math.sin(math.pi/2))
print(math.exp(math.pi)-math.pi)

When defining your own functions, you can also use multiple input variables. For example, if we want to calculate the length of a vector $(x,y)$, we can create a function that takes in the components $x$ and $y$ individually.



In [ ]:

    
def length(x, y):
    """Calculates the length of a vector (x,y) using the Pythagorean theorem."""
    return math.sqrt(x**2+y**2)

If we call this function on the vector (3,4), we should get 5.



In [ ]:

    
length(3,4)

In this lecture, we've learned about numpy arrays, loops, and defining functions. You'll have a chance to test these skills in the following exercise!

Challenge 1 - Approximate $\pi$

For a given set of points, the pathlength $L$ from $(x_0,y_0)$ to $(x_n,y_n)$ is given by the following expression, \begin{gather} L = \sum_{i = 1}^n \sqrt{ \left(x_i - x_{i-1}\right)^2 + \left(y_i - y_{i-1} \right)^2} \end{gather} Write a function pathLength which computes $L$ given two numpy arrays x_array and y_array as input variables



In [ ]:

    
def pathLength(x_array,y_array):
    #Your code goes here

Test your function on the example below. Your answer should come out to $4\sqrt{2} \approx 5.657$



In [ ]:

    
import numpy as np 
x = np.array([1,2,3,4,5])
y = np.array([1,2,3,4,5])
pathLength(x,y)

We can define $\pi$ to be the circumference of a circle whose radius is $1/2$. Using $n + 1$ points to describe a given circle, the $x$ and $y$ coordinates are given by the following expressions, \begin{gather} x_i = \frac{1}{2} \cos \left( \frac{2\pi i}{n} \right), \\ y_i = \frac{1}{2} \sin \left( \frac{2\pi i}{n} \right), \end{gather} where $i = 0,\dots, n$. You can check that each of these points satisfies the equation for a circle with radius $\frac{1}{2}$. $$x_i^2+y_i^2 = \frac{1}{4}.$$

Your task:

Write a function approxPi which takes one input argument n and populates two numpy arrays, x and y, of length n+1.

Plot x and y using plt.plot(). You should get an image of a polygon with $n$ sides.

To compute the circumference of the circle, pass x and y to pathLength.

Print the approximation to $\pi$ you have computed, $\pi_{\text{approx}}$, and return the error for your estimate, $e = \pi_{\text{approx}} - \pi$.



In [ ]:

    
def approxPi(n):
       
    # Initialize two numpy arrays x and y of size n+1 with the points defined above
    # Hint: use np.arange()
    
    # Plot the points in x and y
    
    # Call the function pathLength() with the arguments x and y and set it equal to pi_approx
    
    # Print the value for pi_approx 
    
    # Calculate the error e = pi_approx - pi
    
    return e

As an extra challenge, write a function piTolerance which takes a given tolerance, tol, as its input. This is the error to which you want to compute $\pi$ using the approxPi function. Find the smallest value of $n$ such that you can achieve $e < \text{tol}$.



In [ ]:

    
def piTolerance(tol):
    #Your code goes here

C. Numpy Arrays - Review of Basics and Some More Advanced Topics

The documentation of possible functions that can be applied to integers and floats (i.e. single numbers), as well as numpy arrays, can be found here: https://docs.scipy.org/doc/numpy/reference/routines.math.html

As discussed previously, there are numerous ways to create arrays beyond np.numpy(). These include:

np.arange(start_,stop_,step_)
np.linspace(first_,last_,num_)

These create arrays of numbers within a range with a fixed step-size between each consecutive number in the array.

What if we want to fill an array with a different list of numbers? Sometimes it is handy to create an array of all zeros, which can then be replaced later with data. This can be done by using the command np.zeros().

Let's say we want to store 10 numbers in an numpy array for easy access in the future. To ready such an array, we call the function np.zeros() with the size of the array as an input variable.



In [ ]:

    
data = np.zeros(10)

print(data)

Now how do we assign a new value to an element of the array? We use the following "square bracket" notation:

array_name[index_number] = value

This command will replace whatever value is currently in the position corresponding to "index_number" in the array called "array_name" with the value stored in "value."

Arrays are numbered starting from 0, such that

First position = 0
Second position = 1
Third position = 2
etc.

It is a bit confusing, but after a bit of time, this becomes quite natural. Let's practice by adding our favorite numbers to the array we initialized above.

First, we use the brackets to store that value in the first position (0 index number) in the data array we made above.



In [ ]:

    
data[0] = 137 
print(data[0])

Now you try it. Store your second favorite number in the second position of your array and use a print statement to verify that you have done so.



In [ ]:

    
#Your code goes here

Python array indexing is fairly straightforward once you get the hang of it.

Let's say you wanted the last element of the array, but you don't quite recall the size of the array. One of the easiest ways to access that element is to use negative indexing.

Negative indexing is the same as normal indexing, but backward, in the sense that you start with the last element of the array and count forward. More explicitly, for any array:

array[-1] = last element of array
array[-2] = second to last element of the array
array[-3] = third to last element of the array
etc

Now then, let's create an array using np.arange() with 10 elements, and see if you can access the last element and the second to last element using negative indexing. Print out these values.



In [ ]:

    
#Your code goes here

Now, sometimes its useful to access more than one element of an array. Let's say that we have an array with 100 elements in the range [0,10] (including endpoints). If you recall, this can be done via the np.linspace() function.



In [ ]:

    
x = np.linspace(0,10,100)

Now then, in order to get a range of elements rather than simply a single one, we use the notation:

x[i_start:i_end+1]

For example, let's say you want the 1st, 2nd, and 3rd element, then you'd have to do

x[0:3]

In this notation, ":" represents you want everything between 0 and 3 (including x[0], but excluding x[3]). Let's test this.



In [ ]:

    
x[0:3]



In [6]:

    
import numpy as np 

t = (









    Out[6]:





1

If you want everything passed a certain point of the array (including that point), then you would just eliminate the right number, for example

x[90:]

would give you everything after (and including) the 90 index element. Similarly, if you want everything before a certain index

x[:90]

would give you everything before the 90 index element.

So, let's say that you would want everything up to and including the tenth element of the array $x$. How would you do that?

(Remember, the tenth element has an index of 9)



In [ ]:

    
#Your code goes here

Finally, simply using the ":" gives you all the elements in the array.

Challenge 2 - Theme Park Data Analysis

For this part, you’ll play theme park engineer. You’re assigned to analyze the performance of a drop tower, which is like a roller coaster that only moves vertically. Riders are strapped into seats in the cab, and then motors within the obelisk-shaped structure accelerate the cab up and down. Your task is to plot the position, velocity, and acceleration of the drop tower over time.

Take a look at the data file named droptower_vdata.txt. It is located in the folder lecture2_data. The first few lines are comments that describe how the data was obtained. The data itself can be read in using the following code, which we will explain later in the lecture.



In [ ]:

    
import numpy as np
time, velocity = np.loadtxt("./lecture2_data/droptower_vdata.txt",unpack = True)

Now we've initialized two numpy arrays: time and velocity.

The i-th element of the velocity array contains the measured velocity data $v[i]$ of the drop tower at time $t[i]$.

Part I - Acceleration

The acceleration is what causes the thrill of the roller coaster. Let's figure out how the acceleration changes over time, which will tell us when the drop tower riders feel the most vertigo!

We know the velocity $v$ as a function of time. Remember we can the acceleration is defined as the derivative of the velocity:

$$a = \frac{dv}{dt}\approx \frac{\Delta v}{\Delta t}.$$

Numerically, the derivative can be approximated as as a ratio of finite differences

$$ a[i] = \frac{v[i+1]-v[i]}{t[i+1]-t[i]}$$

Make sure you understand why this approximates the derivative. Since our times are consecutive intergers, $t[i+1]-t[i] = 1$, the formula reduces to:

$$ a[i] = v[i+1]-v[i]$$

To implement this, use a while-loop to update the entries of a numpy array of accelerations with the correct value.



In [ ]:

    
n = len(velocity)

# Initialize accel as an array of zeros with size n-1 

accel = # Your code goes here

# Use a while loop to replace the value in element [i] with the acceleration at time i. 

# Your code goes here

Note that the size of the array of accelerations is one less than the size of the array of velocities. Why should this be so?

After this, use plt.plot(time,velocity,'o') to plot the velocity of the drop tower as a function of time (the last 'o' is used to change the plotting from a line to points).



In [ ]:

    
#Your code goes here

To plot the acceleration, we need to do a little more work. Note that the array of times is one element longer than the array of accelerations, so calling the function plot(time,accel) will give an error.

Using what you know about indexing into multiple entries of a Python array, plot the acceleration as a function of time.



In [ ]:

    
#Your code goes here

At what rate are the riders accelerating downwards during times t=1 to t=4? Does this match your physical intuition?

Part II - Position

Now we will find the position of the drop tower as a function of time.

Remember that the velocity is defined as the derivative of the vertical position: $v = \frac{dy}{dt}$.

We can approximate this using the formula: $\Delta y = v \Delta t$. The change of an object's vertical position $\Delta y$ is equal to the velocity $v$ multiplied by the change in time $\Delta t$.

The position at time $i$ can be approximated by adding up the velocities from time $j = 0,\dots,i-1$:

$$ y[i] = \sum_{j=0}^{i-1} v[j] \Delta t = \sum_{j=0}^{i-1} v[j], $$

where the index $i$ goes from 1 to $n$. This method of approximating the position is often referred to as a left Riemann sum.

Using a while-loop, create an array of positions by summing elements from the velocity array.



In [ ]:

    
# Your code goes here

# Hint: think about the number of `while`-loops you might need to use

After doing this, plot the positions of the drop tower as a function of time.



In [ ]:

    
#Your code goes here

As an extra challenge, consider the following problem:

Your supervisor wants to know how powerful the drop tower’s motors are. They’re exerting the most force when they’re causing the cab to accelerate upwards. Your task is to calculate the average acceleration of the cab when it’s accelerating upwards.

Use the array of accelerations you calculated in Part 1. We don’t care about the negative values of acceleration (because that’s when gravity is doing the work), so create a new array containing only the values of acceleration that are positive. (Hint: use conditional indexing.) Then calculate the average of the positive accelerations and print it.



In [ ]:

    
# Your code goes here

D. Loading And Saving Data Arrays

So, we have learned a lot about data arrays and how we can manipulate them, either through mathematics or indexing. However, up until this point, all we've done is use arrays that we ourselves created. But what happens if we have data from elsewhere? Can Python use that?

The answer is of course yes, and while there are ways to import data that are a bit complicated at times, we're going to teach you some of the most basic, and most useful, ways.

For this section, we will be using plotting to visualize the data as you're working with it, and as such, we will be loading in the package "matplotlib.pyplot" which you used in Lecture 1



In [ ]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

Now then, let's say we are doing a timing experiment, where we look at the brightness of an object as a function of time. This is actually a very common type of measurement that you may do in research, such as looking for dips in the brightness of stars as a way to detect planets.

This data is stored in a text file named timeseries_data.txt in the directory lecture2_data. Let's load it in.



In [ ]:

    
timeseriesData = np.loadtxt("./lecture2_data/timeseries_data.txt")

Now we have the data loaded into Python as a numpy array, and one handy thing you can do is to use Python to find the dimensions of the array. This is done by using ".shape" as so.



In [ ]:

    
timeseriesData.shape

In this format, we know that this is a 2x1000 array (two rows, 1000 columns). Another way you can think about this is that you have two 1000-element arrays contained within another array, where each of those arrays are elements (think of it as an array of arrays).

The first row is the time stamps when each measurement was taken, while the second row is that of the value of the measurement itself.

For ease of handling this data, one can in principle take each of these rows and create new arrays out of them. Let's do just that.



In [ ]:

    
t = timeseriesData[0,:]
signal = timeseriesData[1,:]

Here, you have 2 dimensions with the array timeseriesData, and as such much specify the row first and then the column. So,

array_name[n,:] is the n-th row, and all columns within that row.
array_name[:,n] is the n-th column, and all rows within that particular column.

Now then, let's see what the data looks like using the plot() function with t as your x-axis and signal as your y-axis.



In [ ]:

    
#Your code goes here
plt.plot(t,signal)

Looking at our data, you see clear spikes that jump well above most of the signal. (I've added this to the data to represent outliers that may sometimes appear when you're messing with raw data, and those must be dealt with). In astronomy, you sometimes have relativistic charged particles, not from your source, that hit the detector known as cosmic rays, and we often have to remove these.

There are some very complex codes that handle cosmic rays, but for our purposes (keeping it easy), we're going to just set a hard cut off of, let's say 15.

In order to do this, we can use conditional indexing in place of normal indices. This involves taking a conditional statement and testing whether it evaluates to True on each element in the array.

This gives an array of Booleans, which we can use as logical indices to select only the entries for which the logical statement is True.



In [ ]:

    
cutOff = 15.
signalFix = signal[signal < cutOff]

In this case, the conditional statement that we have used is signal < cutOff.

Here, conditional indexing keeps the data that we have deemed "good" by this criteria. We can also do the same for the corresponding time stamps, since t and signal have the same length.



In [ ]:

    
tFix = t[signal < cutOff]

Now let's plot it. You try.



In [ ]:

    
#Your code goes here
plt.plot(tFix,signalFix)

Now that you have your data all cleaned up, it would be nice if we could save it for later and not have to go through the process of cleaning it up every time. Fear not! Python has you covered.

There are two formats that we are going to cover, one that is Python-specific, and the other a simple text format.

First, we must package our two cleaned up arrays into one again. This can be done simply with the np.array() function.



In [ ]:

    
dataFix = np.array([tFix,signalFix])

Then, we can use either the np.save() function or the np.savetxt function, the first saving the array into a '.npy' file and the other, into a '.txt' file. The syntax is pretty much the same for each.



In [ ]:

    
np.save('./lecture2_data/dataFix.npy',dataFix)
np.savetxt('./lecture2_data/dataFix.txt',dataFix)

Now that your data files are saved, you can load them up again, using np.loadtxt() and np.load() for .txt and .npy files respectively. We used np.loadtxt() above, and np.load works the same way. So, let's load in the .npy file and see if our data was saved correctly.



In [ ]:

    
data = np.load('./lecture2_data/dataFix.npy')
t = data[0,:]
signal = data[1,:]
plt.plot(t,signal)
plt.show()

Now, let's see if you can do the same thing, but with the .txt file that we saved.



In [ ]:

    
#Your code goes here

Loading data files automatically

Let's combine what we learned about loops to make our data workflow more efficient. Suppose we have a set of data saved in separate text files that we would like to load automatically. In ./lecture2_data/ you will find files c1.dat, c2.dat, c3.dat, c4.dat, c5.dat, c6.dat. Rather than loading each of these files individually, you can use a for (or while) loop, constructing a string at each iteration corresponding to each of these files. In python you can use + to concatenate strings together. Here's an example.



In [ ]:

    
first_string = 'a'
second_string = 'b'
print(first_string + second_string)

You can also cast an integer to a string using the str command.



In [ ]:

    
first_string = 'a'
second_string = str(1)
print(first_string + second_string)

Now you try

Your goal in this task is to write some code to load in this set of 6 .dat files as numpy arrays.

First, we will define an empty list (call it datalist) that will store the data.

This is an odd idea, defining an list variable without any elements, so instead think of it as a basket without anything inside of it yet. We will use the append() function to fill it.



In [ ]:

    
datalist = []

# Your code here

Next, call np.loadtxt on a single .dat file and add it to datalist using the command

datalist.append(loadedfile_)

where loadedfile_ is the name you gave your loaded file. Use the shape command to find the dimensions of the array.

Now, figure out how to automate this task for all six files using a while-loop!



In [ ]:

So, to summarize, not only can you manipulate arrays, but now you can save them and load them. In a way, those are some of the most important skills in scientific computing. Almost everything you'll be doing requires you know this, and now that you've mastered it, you're well on your way to being an expert in computational physics and astronomy!