NumPy

NumPy is a library of mathematical and numerical functions. A library is a collection of functions that someone else wrote and that you would like to use. To use these functions, you must first import them. It is good practice to place all imports in the first cell.


In [1]:
import numpy as np

The line above tells the computer to make NumPy's functions available and to nickname the module np. To call a function from the module, we use the syntax np.function(). To find out what functions NumPy has for you to use, go to its documentation at https://docs.scipy.org/doc/numpy-dev/user/quickstart.html. Learning to read documentation and to search the web effectively are very important skills for any programmer. We will cover some effective googling techniques and how to find and use documentation later.
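As a small illustration of the np.function() syntax, here is a minimal sketch that calls two functions and reads one constant through the np nickname. The particular functions (np.sqrt and np.add) are only examples chosen for this illustration.

```python
import numpy as np

# Call functions through the np nickname
root = np.sqrt(16.0)   # square root of a single number
total = np.add(2, 3)   # same result as 2 + 3

# Constants such as pi live on the module as well
circumference = 2 * np.pi * 1.0   # circumference of a circle of radius 1

print(root, total, circumference)
```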

The main thing that NumPy gives us is arrays. They are better suited to numerical work than lists and have better functionality, as we will see. To create a new array, we have a couple of methods:

#First make a list and then copy it into a new array
values = [1,2,3,4] # Avoid naming this "list", which would hide Python's built-in list type
array = np.array(values) # This tells the computer to make a copy of values and turn it into an array

#Use one of NumPy's special array types

arrayofZeros = np.zeros((x_dim, y_dim, ...)) #creates an x by y by etc. array of all zeros

arrayofOnes = np.ones((x_dim, y_dim, ...)) #creates an x by y by etc. array of all ones

emptyArray = np.empty((x_dim, y_dim, ...)) #creates an x by y by etc. array of whatever happened to be in memory at the time of instantiation.

rangeArray = np.arange(start, stop, step) # Works just like range(): starts at start, moves in steps of size step, and ends before reaching stop

linearspaceArray = np.linspace(start, stop, num_vals) # Creates an array of num_vals evenly spaced values from start to stop, both included.

diagonalArray = np.diagflat(values) # values can be a list, array, etc.; creates a 2-d array with values as the main diagonal.

NumPy arrays act a little differently than lists and other containers. The first major difference is that an array may contain only one type of thing. That is to say, we may no longer be sloppy about placing an int and a string in the same array. Further, NumPy arrays treat operators differently than other containers. Operations are computed elementwise, between arrays of the same shape or between an array and a single number. For example, the sum of two 3 by 1 arrays (which could be vectors in $\mathbb{R}^3$) is the array whose 1st, 2nd, and 3rd components are the sums of the corresponding components. Let's solidify these ideas with some examples.


In [2]:
print("Just an Array: \n",np.array([0,1,2,34,5]))

print("An Array of Zeros: \n",np.zeros((2,3)))

print("An Array of Ones: \n",np.ones((2,)))

print("A Clever Way to Build an Array: \n",np.pi*np.ones((4,3)))

print("A Bunch of Random Junk: \n",np.empty((2,2)))

print("A Range of Values: \n",np.arange(0,100, 3))

print("A Linearly-Spaced Range of Values: \n",np.linspace(0,100, 33))

print("A Diagonal Array: \n",np.diagflat(np.linspace(1,10,10)))


Just an Array: 
 [ 0  1  2 34  5]
An Array of Zeros: 
 [[ 0.  0.  0.]
 [ 0.  0.  0.]]
An Array of Ones: 
 [ 1.  1.]
A Clever Way to Build an Array: 
 [[ 3.14159265  3.14159265  3.14159265]
 [ 3.14159265  3.14159265  3.14159265]
 [ 3.14159265  3.14159265  3.14159265]
 [ 3.14159265  3.14159265  3.14159265]]
A Bunch of Random Junk: 
 [[  4.94065646e-324   9.88131292e-324]
 [  1.67982320e-322   2.47032823e-323]]
A Range of Values: 
 [ 0  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72
 75 78 81 84 87 90 93 96 99]
A Linearly-Spaced Range of Values: 
 [   0.       3.125    6.25     9.375   12.5     15.625   18.75    21.875
   25.      28.125   31.25    34.375   37.5     40.625   43.75    46.875
   50.      53.125   56.25    59.375   62.5     65.625   68.75    71.875
   75.      78.125   81.25    84.375   87.5     90.625   93.75    96.875
  100.   ]
A Diagonal Array: 
 [[  1.   0.   0.   0.   0.   0.   0.   0.   0.   0.]
 [  0.   2.   0.   0.   0.   0.   0.   0.   0.   0.]
 [  0.   0.   3.   0.   0.   0.   0.   0.   0.   0.]
 [  0.   0.   0.   4.   0.   0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   5.   0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.   6.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.   0.   7.   0.   0.   0.]
 [  0.   0.   0.   0.   0.   0.   0.   8.   0.   0.]
 [  0.   0.   0.   0.   0.   0.   0.   0.   9.   0.]
 [  0.   0.   0.   0.   0.   0.   0.   0.   0.  10.]]

In [3]:
#Let's check out some NumPy operations
a = np.array([[1,2],[3,4]])
print("First Array: \n",a)
b = np.array([[4,3],[2,1]])
print("Second Array: \n",b)
print("Sum:\n",a+b)
print("Product:\n",a*b)
print("Power:\n",a**b)


First Array: 
 [[1 2]
 [3 4]]
Second Array: 
 [[4 3]
 [2 1]]
Sum:
 [[5 5]
 [5 5]]
Product:
 [[4 6]
 [6 4]]
Power:
 [[1 8]
 [9 4]]
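The one-type rule and the elementwise behavior described earlier can also be checked directly. The sketch below is only an illustration with made-up values; it shows what NumPy does when the inputs have mixed types, and how a single number combines with an array.

```python
import numpy as np

# Mixing an int and a string: NumPy converts everything to one type
mixed = np.array([1, "two"])
print(mixed.dtype.kind)   # 'U' means a unicode string type

# Mixing ints and floats: every element becomes a float
numbers = np.array([1, 2.5, 3])
print(numbers.dtype)      # float64

# Elementwise addition of two vectors with three components
u = np.array([1, 2, 3])
v = np.array([10, 20, 30])
print(u + v)              # [11 22 33]

# A single number is combined with every element
print(2 * u)              # [2 4 6]
```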

Now that we have seen how NumPy sees operations, let's practice a bit.

Exercises

  1. Create at least two arrays with each of the different methods.

  2. The next cell contains two arrays of measurements. You happen to know that their sum over their product squared is a quantity of interest. Calculate this quantity for every pair of elements in the arrays.

  3. np.random.normal(mean, std, (x-dim, y-dim, etc...)) creates an x by y by etc... array of normally distributed random numbers with mean mean and standard deviation std. Using this function, create an appropriately sized array of random "noise" to multiply with the data from 2. Compute the interesting quantity with the "noisy" data.

  4. np.mean(#some array here) computes the mean value of all the elements of the array provided. Compute the average difference between your "noisy" interesting result from 3. and your original interesting result from 2. You have just modeled a system with simulated noise!


In [4]:
p = np.array([1,2,3,5,1,2,3,1,2,2,6,3,1,1,5,1,1,3,2,1])
l = 100*np.array([-0.06878584, -0.13865453, -1.61421586,  1.02892411,  0.31529163, -0.06186747, -0.15273951,  1.67466332, -1.88215846,  0.67427142,  1.2747444,  -0.0391945, -0.81211282, -0.38412292, -1.01116052,  0.25611357,  0.3126883,   0.8011353,  0.64691918,  0.34564225])

Some Other Interesting NumPy Functions

  1. np.dot(array1, array2) #Computes the dot product of array1 and array2.
  2. np.cross(array1, array2) #Computes the cross product of array1 and array2.
  3. np.eye(size) #Creates a size by size identity matrix/array
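A quick sketch of these three functions, using the unit vectors along x and y as made-up inputs for the illustration:

```python
import numpy as np

x_hat = np.array([1, 0, 0])
y_hat = np.array([0, 1, 0])

dot_xy = np.dot(x_hat, y_hat)       # perpendicular vectors give 0
cross_xy = np.cross(x_hat, y_hat)   # x cross y points along z: [0 0 1]
identity = np.eye(3)                # 3 by 3 identity matrix

# Multiplying by the identity matrix leaves a vector unchanged
unchanged = np.dot(identity, y_hat)

print(dot_xy, cross_xy, unchanged)
```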

NumPy Array Slicing

NumPy slicing is only slightly different than list slicing. With NumPy arrays, to access a single element you use the typical index syntax:

array = np.array([1,2,3,4])

array[index]
array[0] #Returns 1

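When you want to take a range of elements, you use the following syntax. A slice ends just before its second number, so to include the element at index stop you write stop+1: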

array[start:stop+1]
array[1:3] #Returns [2,3]

Or, if you would like to move in steps:

array[start:stop+1:step]
array[0:4:2] #Returns [1,3]

In [5]:
array = np.array([1,2,3,4])
print(array[0])
print(array[1:4])
print(array[0:4:2])


1
[2 3 4]
[1 3]
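Two of the ways array slicing differs from list slicing are worth a short sketch: an array with more than one dimension takes one index or slice per dimension, separated by commas, and a slice of an array is a view of the original data rather than a copy. The array grid below is a made-up example.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

element = grid[1, 2]        # row 1, column 2 -> 6
first_row = grid[0, :]      # every column of row 0
last_column = grid[:, -1]   # every row of the last column -> [3 6 9]
corner = grid[0:2, 0:2]     # top-left 2 by 2 block

# Unlike a list slice, an array slice shares memory with the original
corner[0, 0] = 100
print(grid[0, 0])           # 100, the original array changed too
```

If you need an independent copy of a slice, call .copy() on it, for example grid[0:2, 0:2].copy().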

Masking

Masking is a special type of slicing which uses boolean values to decide whether to show or hide the values in another array. A mask must be a boolean array of the same size as the original array. To apply a mask to an array, you use the following syntax:

mask = np.array([True, False])
array = np.array([25, 30])
array[mask] #Returns [25]

Let's check this masking action out.


In [6]:
mask = np.array([[1,1,1,1,0,0,1],[1,0,0,0,0,1,0]], dtype=bool)
#^^^This converts the ones and zeros into trues and falses because I'm lazy^^^
array = np.array([[5,7,3,4,5,7,1],np.random.randn(7)])
print(array)
print(mask)


[[ 5.          7.          3.          4.          5.          7.          1.        ]
 [-0.35375145 -0.89398603 -1.880961    1.63080288 -0.41551318  0.58896879
  -1.2460269 ]]
[[ True  True  True  True False False  True]
 [ True False False False False  True False]]

In [7]:
print(array[mask])


[ 5.          7.          3.          4.          1.         -0.35375145
  0.58896879]

Let's say that we have measured some quantity with a computer and generated a really long NumPy array, like, really long. It just so happens that we are interested in how many of these numbers are greater than zero. We could try to build a mask by hand with the methods used above, but NumPy lets us write the condition directly inside the brackets. (Behind the scenes the comparison still creates a mask, but we never have to build it ourselves.) I have the data in the next cell. Let's check it out.


In [8]:
data = np.random.normal(0,3,10000) #Wow, I made 10,000 measurements, wouldn't Mastoridis be proud.

data[data>0].size #data[data>0] keeps only the elements of data that are greater than 0, and .size counts them


Out[8]:
4976

This is a powerful tool that can often greatly simplify problems, so keep it in the back of your head.
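The same idea extends a little further. The sketch below uses a short made-up array called readings, so that the results are easy to check by eye, to count matches, combine two conditions, and assign through a mask. np.count_nonzero is not used elsewhere in this notebook; it is just one convenient way to count the True values.

```python
import numpy as np

readings = np.array([-2.0, 0.5, 3.0, -0.1, 1.5, 4.0])

positive = readings > 0                   # boolean mask built by a comparison
n_positive = np.count_nonzero(positive)   # number of True entries

# Combine conditions with & (and) or | (or); the parentheses are required
in_range = (readings > 0) & (readings < 2)
print(n_positive, readings[in_range])

# A mask can also select where to assign: set negative readings to zero
cleaned = readings.copy()
cleaned[cleaned < 0] = 0.0
print(cleaned)
```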

Universal Functions

Universal functions are NumPy functions that apply an operation to every element in an array. sin(), cos(), and exp() are all universal functions, and when applied to an array they take the sin(), cos(), or exp() of each element within the array.


In [10]:
import matplotlib.pyplot as plt
%matplotlib inline
x = np.linspace(0,2*np.pi,1000)
y = np.sin(x)
plt.subplot(211)
plt.plot(x)
plt.subplot(212)
plt.plot(x,y)


Out[10]:
[<matplotlib.lines.Line2D at 0x7ff1f1dd25f8>]

A list of all the universal functions is included at the end of this notebook.
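For a version that does not need a plot, here is a small sketch with a handful of made-up angles. It applies a few universal functions and combines them with ordinary operators.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])

sines = np.sin(angles)                  # sine of every element
growth = np.exp(np.array([0.0, 1.0]))   # e**0 and e**1

# Universal functions can be chained and mixed with ordinary operators
identity_check = np.sin(angles)**2 + np.cos(angles)**2   # sin^2 + cos^2

print(np.round(sines, 3))
print(identity_check)
```

Because of floating point rounding, np.sin(np.pi) comes out as a very small number rather than exactly zero, which is why the sketch rounds before printing.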

Exercises

  1. Create a couple of arrays of various type and size and play with them until you feel comfortable moving on.

  2. You know that a certain quantity can be calculated using the following formula:

    $f(x) = x^{e^{\sin(x^2)}} - \sin(x \ln(x))$

    Given that you measured x in the cell below, calculate f(x).

  3. Using the same x as above, write a function that transforms x. Then create a mask that will keep any value of the resulting array that is greater than $2\pi^2$.


In [16]:
x = np.random.rand(1000)*np.linspace(0,10,1000)

Universal Functions

Function          Description
add(a,b), +       Addition
subtract(a,b), -  Subtraction
multiply(a,b), *  Multiplication
divide(a,b), /    Division
power(a,b), **    Power
mod(a,b), %       Modulo/Remainder
abs(a)            Absolute Value
sqrt(a)           Square Root
conj(a)           Complex Conjugate
exp(a)            Exponential
log(a)            Natural Log
log2(a)           Log base 2
log10(a)          Log base 10
sin(a)            Sine
cos(a)            Cosine
tan(a)            Tangent
minimum(a,b)      Minimum
maximum(a,b)      Maximum
isreal(a)         Tests for zero imaginary component
iscomplex(a)      Tests for nonzero imaginary component
isfinite(a)       Tests for finiteness
isinf(a)          Tests for infiniteness
isnan(a)          Tests for Not a Number
floor(a)          Rounds down to next integer value
ceil(a)           Rounds up to next integer value
trunc(a)          Discards the fractional part (rounds toward zero)

Other Valuable NumPy Functions

Function               Description
sum(a)                 Sums all elements of a
prod(a)                Takes the product of all elements of a
min(a)                 Finds the minimum value in a
max(a)                 Finds the maximum value in a
argmin(a)              Returns the index or location of the minimum value in a
argmax(a)              Returns the index or location of the maximum value in a
dot(a,b)               Takes the dot product of a and b
cross(a,b)             Takes the cross product of a and b
einsum(subs, arrs)     Takes the Einstein sum over subscripts and a list of arrays
mean(a)                Computes the average value of all the elements in a
median(a)              Finds the median value in a
average(a, weights=w)  Computes the weighted average of a with weights w
std(a)                 Computes the standard deviation of a
var(a)                 Computes the variance of a
unique(a)              Returns the unique elements of a in a sorted manner
asarray(a, dtype)      Converts a to an array of type dtype, copying only when needed
atleast_1d(a)          Converts a to an array with at least one dimension
atleast_2d(a)          Converts a to an array with at least two dimensions
atleast_3d(a)          Converts a to an array with at least three dimensions
append(a,b)            Returns a new array with b appended to the end of a
save(file, a)          Saves an array to a file
load(file)             Loads an array saved as a file
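A few of these functions in action, as a rough sketch on a small made-up array called scores. The axis argument is not covered elsewhere in this notebook; it tells functions such as np.sum and np.mean to work down the columns (axis=0) or along the rows (axis=1) instead of over every element.

```python
import numpy as np

scores = np.array([[3, 1, 4],
                   [1, 5, 9]])

total = np.sum(scores)                 # all elements -> 23
column_sums = np.sum(scores, axis=0)   # one sum per column -> [4 6 13]
row_means = np.mean(scores, axis=1)    # one mean per row
where_max = np.argmax(scores)          # index into the flattened array -> 5
distinct = np.unique(scores)           # sorted unique values -> [1 3 4 5 9]

# Weighted average: pass the weights by keyword
weighted = np.average(np.array([1.0, 2.0, 3.0]), weights=[3, 1, 1])

print(total, column_sums, row_means, where_max, distinct, weighted)
```

The weights are passed by keyword because the second positional argument of np.average is axis, not weights.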

Challenge Exercise

A prime number sieve is an algorithm that finds prime numbers. Your challenge is to recreate the Sieve of Eratosthenes. Use all you have learned about NumPy and loops to create a function that takes a max value, as in the sieve, and returns a list of primes. Good luck!


In [ ]: