Class 3: NumPy (and a quick string example)

Brief introduction to the NumPy module.

Preliminary example

I recently found myself needing to copy and paste names and email addresses from an email header. I required the names and emails to be formatted like this:

Name 1    Email 1
Name 2    Email 2
Name 3    Email 3

But what I had was this:

"Carl Friedrich Gauss" <approximatelynormal@email.com>, "Leonhard Euler" <e@email.com>, "Bernhard Riemann" <zeta@email.com>

Sure, I could go through manually and delete the characters that aren't required. The manual approach would be fine for a small list, but the exercise would quickly become obnoxious as the list of names grows.

Python is great for modifying strings. The string method that we want to use is replace(). replace() has two required arguments: old and new. old is the substring to be replaced and new is what replaces it. The replace() method does not change the value of the original string; it returns a new string.

For example, suppose that we want to remove every 'p' from the string 'apple'.


In [1]:
# Create a variable that stores the string 'apple'
a = 'apple'

# Create a copy of a with the ps removed and reassign the value of a
a = a.replace('p','')
print(a)


ale
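
Because replace() returns a new string, the original is only lost if you reassign the same name, as the cell above does. The following is a minimal sketch of that behavior; the variable names original, modified, and one_p are illustrative and not part of the example above. It also uses the optional third argument of replace(), a count that limits how many occurrences are replaced.

```python
# The original string is left untouched by replace()
original = 'apple'
modified = original.replace('p', '')

print(original)  # apple
print(modified)  # ale

# Optional third argument: replace only the first occurrence of 'p'
one_p = original.replace('p', '', 1)
print(one_p)  # aple
```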

You can apply the replace() method multiple times:


In [2]:
# Create a variable that stores the string 'apple'
a = 'apple'

# Create a copy of a with the ps, l, and e removed and reassign the value of a
a = a.replace('p','').replace('l','').replace('e','')
print(a)


a

Now we have the tools to solve the email problem.


In [3]:
# Original character string
string = '"Carl Friedrich Gauss" <approximatelynormal@email.com>, "Leonhard Euler" <e@email.com>, "Bernhard Riemann" <zeta@email.com>'

# Remove <, >, and " from string and overwrite the result
string = string.replace('<','').replace('>','').replace('"','')

# Create a new variable called string_formatted with each ', ' replaced by the new line character '\n'
string_formatted = string.replace(', ','\n')

# Print string_formatted
print(string_formatted)


Carl Friedrich Gauss approximatelynormal@email.com
Leonhard Euler e@email.com
Bernhard Riemann zeta@email.com

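The target layout at the start of this example separates each name from its email address with a wider gap. One possible way to get there, sketched below, is to split each entry at its last space with rsplit() and rejoin the two pieces. The variable names header, cleaned, rows, and formatted, and the choice of four spaces as the gap, are assumptions made for illustration.

```python
header = '"Carl Friedrich Gauss" <approximatelynormal@email.com>, "Leonhard Euler" <e@email.com>, "Bernhard Riemann" <zeta@email.com>'

# Remove the characters that are not needed
cleaned = header.replace('<', '').replace('>', '').replace('"', '')

rows = []
for entry in cleaned.split(', '):
    # The email is the text after the last space; everything before it is the name
    name, email = entry.rsplit(' ', 1)
    rows.append(name + '    ' + email)

# Join the rows with the new line character and print the result
formatted = '\n'.join(rows)
print(formatted)
```

This relies on email addresses containing no spaces, which holds for the header used here.
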
A related problem might be to extract only the email addresses from the original string. To do this, we can use the replace() method to remove the '<', '>', and ',' characters. Then we use the split() method to break the string apart at the spaces. Then we loop over the resulting list of strings and take only the strings with '@' characters in them.


In [4]:
string = '"Carl Friedrich Gauss" <approximatelynormal@email.com>, "Leonhard Euler" <e@email.com>, "Bernhard Riemann" <zeta@email.com>'

string = string.replace('<','').replace('>','').replace(',','')
for s in string.split():
    if '@' in s:
        print(s)


approximatelynormal@email.com
e@email.com
zeta@email.com
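
If the addresses are needed later rather than just printed, the same filter can collect them in a list. This is a sketch under the same assumptions as the loop above (every address contains '@' and no spaces); the names header, cleaned, and emails are illustrative.

```python
header = '"Carl Friedrich Gauss" <approximatelynormal@email.com>, "Leonhard Euler" <e@email.com>, "Bernhard Riemann" <zeta@email.com>'

# Remove the angle brackets and commas, then split at whitespace
cleaned = header.replace('<', '').replace('>', '').replace(',', '')

# Keep only the pieces that contain an '@' character
emails = [s for s in cleaned.split() if '@' in s]
print(emails)
```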

NumPy

NumPy is a powerful Python module for scientific computing. Among other things, NumPy defines an N-dimensional array object that is especially convenient for plotting functions and for simulating and storing time series data. NumPy also defines many useful mathematical functions, such as the sine, cosine, and exponential functions, and has functions for probability and statistics, including random number generators that draw samples from many common probability distributions.

Importing NumPy

The standard way to import NumPy is under the alias np, so that its functions are called with the prefix np. This is for the sake of brevity.


In [5]:
import numpy as np

NumPy arrays

A NumPy ndarray is a homogeneous multidimensional array. Here, homogeneous means that all of the elements of the array have the same type. An ndarray is a table of numbers (like a matrix but with possibly more dimensions) indexed by a tuple of non-negative integers. The dimensions of NumPy arrays are called axes and the number of axes is called the rank. For this course, we will work almost exclusively with 1-dimensional arrays that are effectively vectors. Occasionally, we might run into a 2-dimensional array.

Basics

The most straightforward way to create a NumPy array is to call the array() function which takes as an argument a list. For example:


In [6]:
# Create a variable called a1 equal to a numpy array containing the numbers 1 through 5
a1 = np.array([1,2,3,4,5])
print(a1)

# Find the type of a1
print(type(a1))

# find the shape of a1
print(np.shape(a1))

# Use ndim to find the rank or number of dimensions of a1
print(np.ndim(a1))


[1 2 3 4 5]
<class 'numpy.ndarray'>
(5,)
1

In [7]:
# Create a variable called a2 equal to a 2-dimensional numpy array containing the numbers 1 through 4
a2 = np.array([[1,2],[3,4]])
print(a2)

# find the shape of a2
print(np.shape(a2))

# Use ndim to find the rank or number of dimensions of a2
print(np.ndim(a2))


[[1 2]
 [3 4]]
(2, 2)
2
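
The description of an ndarray as a table indexed by a tuple of integers can be made concrete with a 2-dimensional array. The sketch below is illustrative only; the names first_row_second_col and second_row are not used elsewhere in these notes. Indices start at 0, so [0, 1] refers to the first row and second column.

```python
import numpy as np

a2 = np.array([[1, 2], [3, 4]])

# Index with a (row, column) pair; counting starts at 0
first_row_second_col = a2[0, 1]
print(first_row_second_col)  # 2

# A single index selects an entire row, which is itself a 1-dimensional array
second_row = a2[1]
print(second_row)  # [3 4]
```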

In [8]:
# Create a variable called a3 equal to an empty numpy array
a3 = np.array([])
print(a3)

# find the shape of a3
print(np.shape(a3))

# Use ndim to find the rank or number of dimensions of a3
print(np.ndim(a3))


[]
(0,)
1
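
The statement that arrays are homogeneous can be checked with the dtype attribute, which reports the common element type. A small sketch follows, with illustrative names ints and mixed. The exact integer type printed for ints depends on the platform, so no expected output is shown for it.

```python
import numpy as np

# All integers: the array gets an integer dtype
ints = np.array([1, 2, 3])
print(ints.dtype)

# One float among integers: every element is stored as a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
print(mixed)
```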

Special functions for creating arrays

NumPy has several built-in functions that can assist you in creating certain types of arrays: arange(), zeros(), and ones(). Of these, arange() is probably the most useful because it allows you to create an array of numbers by specifying the initial value in the array, the value at which the array stops, and a step size between elements. arange() has three arguments: start, stop, and step:

arange([start,] stop[, step,])

The stop argument is required. The default for start is 0 and the default for step is 1. Note that stop itself is excluded: the values in the created array end one increment below stop. That is, if arange() is called with stop equal to 9 and step equal to 0.5, then the last value in the returned array will be 8.5.


In [9]:
# Create a variable called b that is equal to a numpy array containing the numbers 1 through 5
b = np.arange(1,6,1)
print(b)


[1 2 3 4 5]

In [10]:
# Create a variable called c that is equal to a numpy array containing the numbers 0 through 10
c = np.arange(11)
print(c)


[ 0  1  2  3  4  5  6  7  8  9 10]
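
The case mentioned above, with stop equal to 9 and step equal to 0.5, can be checked directly. This is a short sketch; the variable name half_steps is illustrative.

```python
import numpy as np

# Start at 0, stop before 9, step by 0.5
half_steps = np.arange(0, 9, 0.5)

# The last element is one increment below stop
print(half_steps[-1])   # 8.5

# There are 18 elements: 0, 0.5, ..., 8.5
print(len(half_steps))  # 18
```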

The zeros() and ones() functions take as arguments the desired shape of the array to be returned and fill that array with either zeros or ones.


In [11]:
# Construct a 1-dimensional array of 5 zeros
print(np.zeros(5))


[ 0.  0.  0.  0.  0.]

In [12]:
# Construct a 2x2 array of ones
print(np.ones([2,2]))


[[ 1.  1.]
 [ 1.  1.]]
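
The shape passed to zeros() or ones() need not be square, and the trailing decimal points in the output above reflect that both functions return floating-point arrays by default. The sketch below illustrates both points; the names ones_2x3 and int_zeros are illustrative, and the dtype argument is optional.

```python
import numpy as np

# A 2x3 array of ones; the shape is given as a tuple (rows, columns)
ones_2x3 = np.ones((2, 3))
print(ones_2x3.shape)  # (2, 3)
print(ones_2x3.dtype)  # float64

# Request integers instead of the default floats
int_zeros = np.zeros(4, dtype=int)
print(int_zeros)       # [0 0 0 0]
```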

Math with NumPy arrays

A nice aspect of NumPy arrays is that they are optimized for mathematical operations. The standard Python arithmetic operators +, -, *, /, and ** operate element-wise on NumPy arrays, as the following examples indicate.


In [13]:
# Define two 1-dimensional arrays
A = np.array([2,4,6])
B = np.array([3,2,1])
C = np.array([-1,3,2,-4])

In [14]:
# Multiply A by a constant
print(3*A)


[ 6 12 18]

In [15]:
# Exponentiate A
print(A**2)


[ 4 16 36]

In [16]:
# Add A and B together
print(A+B)


[5 6 7]

In [17]:
# Exponentiate A with B
print(A**B)


[ 8 16  6]

In [18]:
# Add A and C together
print(A+C)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-49f4be3229fb> in <module>()
      1 # Add A and C together
----> 2 print(A+C)

ValueError: operands could not be broadcast together with shapes (3,) (4,) 

The error in the preceding example arises because addition is element-wise and A and C don't have the same shape: A has 3 elements and C has 4, so the elements cannot be paired up.
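
One way to avoid this error is to compare shapes before combining arrays. The sketch below does that and then, purely as an illustration, adds A to the first three elements of C. Whether trimming C is the right remedy depends on the problem at hand; the slice C[:3] and the name trimmed_sum are assumptions made for this example.

```python
import numpy as np

A = np.array([2, 4, 6])
C = np.array([-1, 3, 2, -4])

# Compare shapes before attempting element-wise arithmetic
print(A.shape == C.shape)  # False

# Add A to the first three elements of C only
trimmed_sum = A + C[:3]
print(trimmed_sum)  # [1 7 8]
```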


In [19]:
# Compute the sine of the values in A
print(np.sin(A))


[ 0.90929743 -0.7568025  -0.2794155 ]
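
The cosine and exponential functions mentioned in the introduction work the same way as np.sin(): each is applied to every element of the array. A minimal sketch, using a small grid built with arange(); the names x, exp_values, and cos_values are illustrative.

```python
import numpy as np

# A small grid of values: 0, 0.5, and 1
x = np.arange(0, 1.5, 0.5)

# Apply the exponential and cosine functions element-wise
exp_values = np.exp(x)
cos_values = np.cos(x)

print(exp_values)
print(cos_values)
```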

Iterating through NumPy arrays

NumPy arrays are iterable objects, just like lists, strings, tuples, and dictionaries, which means that you can use for loops to iterate through their elements.


In [20]:
# Use a for loop with a NumPy array to print the numbers 0 through 4
for x in np.arange(5):
    print(x)


0
1
2
3
4
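
For a 2-dimensional array, a for loop steps through the rows, and each row is itself a 1-dimensional array. The sketch below illustrates this by summing each row; the name row_sums is illustrative.

```python
import numpy as np

a2 = np.array([[1, 2], [3, 4]])

# Each pass through the loop yields one row of a2
row_sums = []
for row in a2:
    # Convert the NumPy integer to a plain Python int before storing it
    row_sums.append(int(row.sum()))

print(row_sums)  # [3, 7]
```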

Example: Basel problem

One of my favorite math equations is:

\begin{align} \sum_{n=1}^{\infty} \frac{1}{n^2} & = \frac{\pi^2}{6} \end{align}

We can iterate through a NumPy array to approximate the left-hand side and check the expression numerically.


In [21]:
# Set N equal to the number of terms to sum
N = 1000

# Initialize a variable called summation equal to 0
summation = 0

# loop over the numbers 1 through N
for n in np.arange(1,N+1):
    summation = summation + 1/n**2

# Print the approximation and the exact solution
print('approx:',summation)
print('exact: ',np.pi**2/6)


approx: 1.64393456668
exact:  1.6449340668482264
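
Because arithmetic on NumPy arrays is element-wise, the same partial sum can be computed without an explicit loop. The sketch below is one possible way to do it, using np.sum(); the names approx, exact, and error are illustrative. The array is created with dtype=float so that n**2 is computed in floating point. The terms left out of the sum add up to slightly less than 1/N, which accounts for the gap of about 0.001 between the two numbers printed above.

```python
import numpy as np

# Set N equal to the number of terms to sum
N = 1000

# Array containing 1.0, 2.0, ..., N
n = np.arange(1, N + 1, dtype=float)

# Compute every term 1/n**2 at once and add them up
approx = np.sum(1 / n**2)
exact = np.pi**2 / 6

# The error is positive and smaller than 1/N
error = exact - approx
print('approx:', approx)
print('error: ', error)
```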