```
In [6]:
```import numpy as np

`numpy`

package is the `array`

class. Let's examine that first. We can make an array out of a sequence, like a list.

```
In [7]:
```d = [1, 2, 3, 4, 5]
np.array(d)

```
Out[7]:
```

```
In [8]:
```d = [1, 2, 3.1415, 4, 5]
np.array(d)

```
Out[8]:
```

You can query the datatype by examining the `dtype`

attribute of the array.

```
In [9]:
```d = [1, 2, 3.1415, 4, 5]
arr = np.array(d)
arr.dtype

```
Out[9]:
```

Array types may be defined explicity in the call

```
In [10]:
```arr = np.array([1, 2, 3, 4, 5], dtype='float32')
arr

```
Out[10]:
```

Complex numbers are noted with a lowercase `j`

or uppercase `J`

, like this

```
In [11]:
```cmplx = np.array([1.0+2.0j, 3.0+4.0J])
print(cmplx)
cmplx.dtype

```
Out[11]:
```

```
In [12]:
```arr = np.array([[1., 2., 3.,], [4., 5., 6.]])
arr

```
Out[12]:
```

```
In [13]:
```arr.size # The number of elements in the array

```
Out[13]:
```

```
In [14]:
```arr.shape # The shape of the array (i.e., the size of each dimension)

```
Out[14]:
```

```
In [15]:
```arr.ndim # The number of dimensions of the array

```
Out[15]:
```

```
In [16]:
```arr.shape = (3, 2)
arr

```
Out[16]:
```

```
In [17]:
```arr.shape = (6,)
arr

```
Out[17]:
```

```
In [18]:
```arr.shape = (1, 6)
arr # Note that there are *two* square brackets in the output sequence. This is a row vector.

```
Out[18]:
```

```
In [19]:
```arr.shape = (6, 1)
arr # this is also a 2D array, like a column vector

```
Out[19]:
```

Arrays are indexed in a similar way to sequences, with `start:stop:stride`

notation, except that this is used for each dimension in the array. Colons denote all the values in a particular dimension, slices indicate some particular subset of the data in that particular dimension.

A common use case is to get a single row or column from a 2D array (a table of data).

```
In [21]:
```arr = np.arange(60).reshape(6, 10)
arr

```
Out[21]:
```

```
In [22]:
```arr[:, 4] # the 5th column

```
Out[22]:
```

```
In [23]:
```arr[2, :] # the 3rd row

```
Out[23]:
```

```
In [24]:
```arr[2] # Trailing colons do not need to be explicitly typed. This is equivalent to the last example.

```
Out[24]:
```

```
In [25]:
```arr[4, 7] # an individual element in the table

```
Out[25]:
```

```
In [ ]:
```

Generally, you will want to think of arrays as representing dimensions in space and time. The conventional way to think of this is that the dimensions are $(t, z, y, x)$; missing dimensions are omitted. This will help make plotting and analysis easier. Some examples might be:

```
temp[:, :, :, :] # A 4D array (time, height, latitude, longitude)
press[:, :] # A 2D array (time, height)
humid[:, :] # A 2D array (latitude, longitude)
```

```
In [27]:
```arr = np.array([[1., 2., 3.,], [4., 5., 6.]]) # reset the array to our 2x3 array.
arr.mean() # The mean of all of the elements in the array

```
Out[27]:
```

`Mean`

takes the optional argument `axis`

that can be used to take the mean along a single axis of the array. Just like with indexing, the axes are reference in a zero-based system; `axis=0`

means the first dimension.

```
In [32]:
```arr.mean(axis=0) # The mean

```
Out[32]:
```

`arr.mean(axis=0)`

takes the average in the 'row' direction, resulting in a 1D array that is the average across the rows.

```
In [ ]:
```

You can also use the `reshape`

method to change the shape of an array.

```
In [33]:
``````
arr
```

```
Out[33]:
```

```
In [34]:
```arr.reshape(3, 2)

```
Out[34]:
```

`min`

and `max`

methods. Sometimes it is useful to find the indices of these minima and maxima. For this use `argmin`

and `argmax`

, like

```
In [35]:
```x = np.random.rand(10)
imax = x.argmax()
print(imax, x[imax], x.max())

```
```

```
In [36]:
```o = np.ones((3, 4, 5)) # The argument is a shape, so is a tuple with the length of each dimension as an argument
b = np.ones((2, 3), dtype=np.bool)
z = np.zeros((2, 3), dtype=np.float32)
b

```
Out[36]:
```

`np.ones_like`

and `np.zeros_like`

.

```
In [37]:
```zb = np.zeros_like(b)
zb

```
Out[37]:
```

`k`

(default=0). This example creates a tri-diagonal array similar to that used for finite difference calculations

```
In [38]:
```np.diag(-2*np.ones(6)) + np.diag(np.ones(5), k=-1) + np.diag(np.ones(5), k=1)

```
Out[38]:
```

There are also a number of ways to generate sequences of numbers.

`np.arange([start,] stop [[, stride]])`

Create a sequence of numbers, similar to`range`

`np.linspace(min, max, length)`

Create a uniform series of specified`length`

between`min`

and`max`

, inclusive.`np.logspace(minpow, maxpow, length)`

Create a uniform series in logspace of specified`length`

between`10**minpow`

and`10**maxpow`

, inclusive.

```
In [53]:
```np.arange(10.)

```
Out[53]:
```

```
In [54]:
```np.arange(2, 10, 2)

```
Out[54]:
```

```
In [55]:
```np.linspace(2, 4, 17)

```
Out[55]:
```

You can create arrays of random numbers easily with methods in `np.random`

.

`np.random.rand(d0, d1, ..., d2)`

: Create an array of the given shape`d0, ..., dn`

and populate it with random samples from a uniform distribution over [0,1).`np.random.randint(low, high=None, size=None)`

: Return random integers from`low`

(inclusive) to`high`

(exclusive). If`high`

is None then return integers from [0,`low`

).`size`

is an int or tuple of ints to give the output shape.`np.randon.randn(d0, d1, ..., d2)`

: Create an array of the given shape`d0, ..., dn`

and populate it with random samples from the "standard normal" distribution.`np.random.random(size=None)`

: Return random floats of`size`

(int or tuple of ints) in the interval [0, 1).

```
In [47]:
```np.random.rand(2, 4)

```
Out[47]:
```

```
In [49]:
```np.random.randint(1, 50, (2, 4))

```
Out[49]:
```

```
In [ ]:
```

```
In [54]:
```x = np.random.rand(4, 5, 6)
y = np.random.rand(4, 5, 6)
print(np.concatenate((x, y)).shape)
print()
print(np.concatenate((x, y), axis=0).shape)
print(np.concatenate((x, y), axis=1).shape)
print(np.concatenate((x, y), axis=2).shape)

```
```

There are a number of convenience functions that act like concatenate for specific axes:

`np.vstack`

– vertical stack (stack along axis=0)`np.hstack`

– horizontal stack (stack along axis=1)`np.dstack`

– depth stack (stack along axis=2)

```
In [55]:
```print(np.vstack((x, y)).shape)
print(np.hstack((x, y)).shape)
print(np.dstack((x, y)).shape)

```
```

`np.split`

or `np.array_split`

. There are also convenience functions to split horizontally, vertically, and with depth.

```
In [42]:
```x = np.random.rand(12, 2, 5)
[a.shape for a in np.split(x, 4, axis=0)]

```
Out[42]:
```

Create an array, A, of shape (40, 50, 60). The array slices for first ten entries in the axis=1 direction of A should be filled with 1's, for the next ten filled with 2's, and on up to 5's.

Split it along axis=1 into five sections.

Concatenate two of these back together along axis 1.

What is the resulting shape of each array?

[Advanced: can you calculate this on one line?]

```
In [ ]:
```

```
In [57]:
```x = np.random.rand(5, 5)
print(x)
ind = x > 0.5
print(ind)

```
```

`True`

values have.

```
In [60]:
```x = np.random.rand(5, 5)
y = np.sin(x)
y[x > 0.5]
# or, equivalently, as two lines
idx = x > 0.5
y[idx]

```
Out[60]:
```

`True`

values in the boolean array), use the `np.where`

command.

```
In [61]:
```x = np.random.rand(5, 5)
idx = np.where(x > 0.5)
idx

```
Out[61]:
```

`np.where`

always returns a tuple of indices for each dimension. This is a little strange for 1D arrays, but is done for consistency across all input values. Often, you will want to explicitly pull out the (single) array of indices from the tuple, like

```
In [ ]:
```x = np.random.rand(10)
idx = np.where(x>0.5)[0]
print(idx)

*What happens with the [0] is missing behind the call to where?*

```
In [ ]:
```

The data for an array may be stored in memory using `C`

or `FORTRAN`

ordered memory. Typically, there is no need to think about this, some details can be found here.

However, it is important to remember that subsets of an array can produce a different 'view' of the array that addresses the same memory as the original array. This can lead to some unexpected behaviors. One way to think of this is that assignment in Python is more like a C-pointer (i.e., a reference to a memory location) than an actual value.

```
In [29]:
```a = np.arange(10.0)
b = a[::2]
print(a)
print(b)

```
```

```
In [30]:
```a[4] = -999 # this will modify b as well...
print(a)
print(b)

```
```

```
In [31]:
```b[-1] = -888 # this will modify a as well...
print(a)
print(b)

```
```

`copy`

of the array, like

```
In [32]:
```a = np.arange(10.0)
b = a.copy()[::2] # or np.copy(a)
a[4] = -999 # this will NOT modify b now
print(a)
print(b)

```
```

(Largely taken from SciPy docs)

Generally arrays should be the same shape for them to be multiplied together.

```
In [61]:
```a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b

```
Out[61]:
```

The term broadcasting describes how `numpy`

treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.

For example, the simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

```
In [62]:
```a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b

```
Out[62]:
```

The result is equivalent to the previous example where b was an array. We can think of the scalar b being stretched during the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original scalar. The stretching analogy is only conceptual. NumPy is smart enough to use the original scalar value without actually making copies, so that broadcasting operations are as memory and computationally efficient as possible.

The code in the second example is more efficient than that in the first because broadcasting moves less memory around during the multiplication (b is a scalar rather than an array).

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

- they are equal, or
- one of them is 1

If these conditions are not met, a ValueError: frames are not aligned exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.

Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are compatible:

```
Image (3d array): 256 x 256 x 3
Scale (1d array): 3
Result (3d array): 256 x 256 x 3
```

When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.

In the following example, both the A and B arrays have axes with length one that are expanded to a larger size during the broadcast operation:

```
A (4d array): 8 x 1 x 6 x 1
B (3d array): 7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5
```

Here are some more examples:

```
A (2d array): 5 x 4
B (1d array): 1
Result (2d array): 5 x 4
A (2d array): 5 x 4
B (1d array): 4
Result (2d array): 5 x 4
A (3d array): 15 x 3 x 5
B (3d array): 15 x 1 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 1
Result (3d array): 15 x 3 x 5
```

Let's create an example with arrays of random numbers.

```
In [65]:
```A = np.random.rand(15, 3, 5)
B = np.random.rand(3, 1)
print(A.shape, B.shape)
Result = A * B
print(Result.shape)

```
```

Here are examples of shapes that do not broadcast:

```
A (1d array): 3
B (1d array): 4 # trailing dimensions do not match
A (2d array): 2 x 1
B (3d array): 8 x 4 x 3 # second from last dimensions mismatched
```

```
In [ ]:
```

Notice that the rules for broadcasting are based on the location of singleton dimensions. Singleton dimensions are implied forward (to the left), but not backward (to the right). So, the first example here works but not the second:

```
A (2d array): 5 x 4
B (1d array): 4
Result (2d array): 5 x 4
A (2d array): 5 x 4
B (1d array): 5
```

*Compare with large set of examples above. How can the bottom example here be fixed?*

`np.newaxis`

in the appropriate space when indexing the array. For example:

```
In [77]:
```A = np.random.rand(5, 4)
B = np.random.rand(5)
A*B

```
```

```
In [78]:
```print(B.shape)
print(B[:,np.newaxis].shape)

```
```

```
In [82]:
```(A*B[:,np.newaxis]).shape

```
Out[82]:
```

```
In [ ]:
```

```
In [ ]:
```

```
In [99]:
```a = np.arange(12).reshape(3, 4)
print(a)
b = a.flatten()
print(b)

```
```

`flat`

attribute on the other hand gives a view of the array in 1D. It looks like an iterator object (like `range`

and `zip`

). This allows

```
In [100]:
```a.flat[6] = -999
print(a)

```
```

In contrast, this does not work as expected. *WHY?*

```
In [101]:
```a.flatten()[5] = -888
print(a)

```
```

```
In [102]:
```a.T.flat[6] = -998
print(a)

```
```

`T`

attribute (equivalent to the `a.transpose()`

method) gives a view of the array transposed (similar to MATLAB's tick notation).

```
In [103]:
```print(a.T)

```
```

Masked arrays are ways to create arrays with missing values. MATLAB™ uses NaNs (NaN stands for 'Not a Number'), and the NaNs are the values of the arrays at those points. This approach also works in Python. Masked arrays are preferred since they retain the masked array values, and also some plotting routines require masked arrays when plotting arrays with missing values. Masked arrays are usually created through some condition, like

```
In [105]:
```arr = np.random.randn(7, 8)
cond = arr > 0.1 # `cond` is True for the random values greater than 0.5
marr = np.ma.masked_where(cond, arr)
print(marr)

```
```

```
In [108]:
```marr.mean(axis=0)

```
Out[108]:
```

The mask can also be supplied explicity when creating the masked array,

```
In [ ]:
```marr = np.ma.masked_array([1, 2, 3, 4, 5], mask=[True, True, False, False, True])
marr

One of the basic commands in `numpy`

for loading in data is the `loadtxt`

command. There are other ways to do this, such as the `genfromtxt`

command, but `loadtxt`

is sufficient for most purposes, and is easy to use.

```
In [18]:
```data = np.loadtxt('../data/CTD.txt', comments='*')
data[:,1] # a column of data representing temperature

```
Out[18]:
```

Read in the oceanographic data file '../data/CTD.txt' into an array. You can look at the data file itself to see what variables are stored in each column.

Using this data, write a function to calculate the linear equation of state. This is an approximation of the density of water, as it depends on salinity, temperature, and some empirical constants. We will use the following form for the linear equation of state:

$\rho = 1027[1+7.6\times 10^{-4}(S-35) -1.7\times 10^{-4}(T-25)]$

where $\rho$ is the density, $S$ is the salinity, and $T$ is the temperature.

This is more free form than the homework, so you should set up all of the associated code to call the function, and write out the function yourself. Don't forget docstrings! For a check, the first value of your density array in order should equal 1021.7519981630001 and the last should equal 1028.0471353619998.

```
In [ ]:
```

```
In [9]:
```import numpy as np
x = np.random.rand(100)
y = 5 + 3*x + 0.1*np.random.randn(100) # A straight line with some noise
p = np.polyfit(x, y, 3) # fit a straight line (order is 1)
print(p) # The coefficients of the polynomial, with highest order first. (i.e,. [slope, intercept])

```
```

Let's plot it to make sure this makes sense:

```
In [8]:
```import matplotlib.pyplot as plt
%matplotlib inline
# plot data
plt.plot(x, y, '.')
# plot fitted line
plt.plot(x, p[0]*x + p[1])
# plt.legend(('Data', 'Fitted line'))

```
Out[8]:
```

```
In [138]:
```np.polyval(p, 1)

```
Out[138]:
```

`np.polynomial.Polynomial`

class to work with polynomials. Note, these define polynomials the opposite way, with the *lowest* order first. The Polynomial class gives an excellent example of operator overloading, and the flexibility of classes.

```
In [139]:
```p1 = np.polynomial.Polynomial([5, 3]) # y = 5 + 3 x
p2 = np.polynomial.Polynomial([3, 6, 8, 2]) # y = 3 + 6 x + 8 x**2 + 2 x**3

You can use the Polynomial object to evaluate the value of the polynomial at various input values:

```
In [148]:
```print('Evaluation')
print('p1(0.0) = ', p1(0))
print('p2(5.0) = ', p2(5))

```
```

We can use this to make a plot to see the function:

```
In [151]:
```x = np.linspace(0,10)
plt.plot(x, p1(x), x, p2(x))
plt.legend(['p1', 'p2'])

```
Out[151]:
```

Other things we can do:

```
In [152]:
```print('Roots')
print('Roots of p2 = ', p2.roots())
print()
print('Operations')
print('p1 + p2 = ', p1 + p2)
print('p1 * p2 = ', p1 * p2)
print()
print('Calculus')
print('Derivative of p1', p1.deriv(1))
print('Integral of p2', p2.integ(4, k=[4, 3, 2, 1]))

```
```

Vectorization and array broadcasting are two big reasons that `numpy`

can be efficient and fast. With these tools, you can avoid writing for loops (which are slow).

The best way to do mathematical operations using `numpy`

arrays is to do `vector`

operations. That is, mathematical operations are defined to be element by element, and this is done much faster than looping. As a rule of thumb, you should be very concerned if your code has more than one significant `for`

loop in the numerical analysis section.

Here is a way to do multiply 2 big arrays using for loops, which is not how you should do it. The sum at the end is included for comparison with the subsequent approach.

```
In [21]:
```a = np.arange(102400.0).reshape(4, 8, 1600, 2) # a 4D array using sequential numbers
b = np.random.rand(4, 8, 1600, 2) # a 4D array using random numbers
li, lj, lk, lm = b.shape # size of b in each dimension
sol = np.zeros(b.shape)
for i in range(li):
for j in range(lj):
for k in range(lk):
for m in range(lm):
sol[i,j,k,m] = a[i,j,k,m]*b[i,j,k,m]
print(sol.sum())

```
```

```
In [22]:
```sol = a * b # element-by-element multiplication. This operation is about as fast as it can be on your computer.
print(sol.sum())

```
```

```
In [19]:
```b = np.random.randn(5000, 2000)
%time u, s, v = np.linalg.svd(b)

```
```

`%time`

runs the line once and gives the time required. However, calculation times vary depending on many things including the numbers involved and the state of your computer at the moment. In this case, the `%timeit`

function can be used to perform the test a number of times to get an average calculation time.

```
In [20]:
```%timeit b = np.random.randn(50, 20); u, s, v = np.linalg.svd(b)

```
```

For statements that are longer than a single line, the `time.time`

function can be used.

```
In [100]:
```import time
t_start = time.time()
time.sleep(0.25) # Do nothing for 0.25 seconds
t_stop = time.time()
print('{:6.4f} seconds have passed.'.format(t_stop-t_start))

```
```

Earlier, we discussed using array operations instead of looping because it is faster. Let's compare.

Calculate the time it takes to calculate the $a$ and $b$ arrays with dimensions [4, 8, 1600, 2] by both methods demonstrated: using a series of 4

`for`

loops, one for each dimension of the arrays and using array operations. Compare the times by calculating a ratio.

```
In [ ]:
```

```
In [78]:
```a = np.random.randn(100, 100)
e, v = np.linalg.eig(a)
b = np.random.randn(500, 200)
u, s, v = np.linalg.svd(b)

`np.dot`

function. In this case, matrices do *not* need to be the same shape, but must follow the rules of matrix multiplication. E.g., the operation dot(<4x5 array>, <5x12 array>) results in a 4x12 array; i.e., the inner dimensions must match (technically last and second-to-last, for arrays with more than two dimensions).

```
In [81]:
```x = np.random.rand(4, 5)
y = np.random.rand(5, 12)
res = np.dot(x, y)
print(res.shape)
# np.dot(y, x) # This gives an error -- order is important.

```
```

Output from a numerical model of the northwestern Gulf of Mexico are saved in a file

`../data/model.npz`

. Read in this file using`np.load`

. Among other things, it contains`h`

, the depths within the numerical domain, and`ssh`

, the sea surface heights at two time steps. The sea surface height gives the deviation above and below sea level from a reference water level (which changes in time as the water moves), and the depths of the seabed are also given with respect to that reference water level.Find the full water column depth, between the seabed and the sea surface, for the two given times.

You can use as a comparison that at the first time step the [0,0] value of this array should be 3007.6088347392124, and at the second time step the [0,-1] value should be 605.25282427018749. Note that there is a differences between the two time steps though it is generally quite small since it is the difference between time steps in the numerical circulation model.

Don't forget array conventions, repeated here for convenience:

```
Generally, you will want to think of arrays as representing dimensions in space and time. The conventional way to think of this is that the dimensions are $(t, z, y, x)$; missing dimensions are omitted. This will help make plotting and analysis easier. Some examples might be:
temp[:, :, :, :] # A 4D array (time, height, latitude, longitude)
press[:, :] # A 2D array (time, height)
humid[:, :] # A 2D array (latitude, longitude)
```

```
In [ ]:
```