The Python language is an excellent tool for general-purpose programming, with a highly readable syntax, rich and powerful data types (strings, lists, sets, dictionaries, arbitrary-length integers, etc.) and a very comprehensive standard library. It was not, however, designed specifically for mathematical and scientific computing. Neither the language nor its standard library has facilities for the efficient representation of multidimensional datasets, or tools for linear algebra and general matrix manipulations (essential building blocks of virtually all scientific computing).

For example, Python lists are very flexible containers that can be nested arbitrarily deep and can hold any Python object in them, but they are poorly suited to represent common mathematical constructs like vectors and matrices. It is for this reason that NumPy exists.

**Learning outcome:** by the end of this section, you will be able to create NumPy arrays with the aid of the NumPy documentation.

NumPy is the fundamental package for scientific computing with Python. Its primary purpose is to provide a powerful N-dimensional array object, which is the focus of this workshop.

To begin with, let's import NumPy, check where it is being imported from, and check the version.

```
In [ ]:
# NumPy is generally imported as 'np'.
import numpy as np
print(np)
print(np.__version__)
```

Here is a link to the NumPy documentation for v1.11: https://docs.scipy.org/doc/numpy-1.11.0/reference/

There are many online forums with tips and suggestions for solving problems with NumPy, such as http://stackoverflow.com/

NumPy provides many different ways to create arrays. These are listed in the documentation at: https://docs.scipy.org/doc/numpy-1.11.0/user/basics.creation.html#arrays-creation.

Try out some of these ways of creating NumPy arrays. See if you can produce:

- a NumPy array from a list of numbers,
- a 3-dimensional NumPy array filled with a constant value -- either `0` or `1`,
- a NumPy array filled with a constant value -- **not** `0` or `1`. (**Hint:** this can be achieved using the last array you created, or you could use `np.empty` and find a way of filling the array with a constant value),
- a NumPy array of 8 elements with a range of values starting from `0` and a spacing of `3` between each element, and
- a NumPy array of 10 elements that are logarithmically spaced.
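One possible set of answers is sketched below. The constant `7`, the `(2, 3, 4)` shape, and the logarithmic range are arbitrary choices, not required by the exercise:

```python
import numpy as np

# From a list of numbers.
from_list = np.array([1, 2, 3, 4])

# A 3-dimensional array filled with a constant value of 0 or 1.
zeros_3d = np.zeros((2, 3, 4))
ones_3d = np.ones((2, 3, 4))

# A constant value other than 0 or 1 (here 7, an arbitrary choice).
sevens = np.full((2, 3, 4), 7)

# 8 elements starting from 0 with a spacing of 3.
stepped = np.arange(0, 24, 3)

# 10 logarithmically spaced elements (here from 10**0 to 10**3).
log_spaced = np.logspace(0, 3, 10)

print(stepped)
print(log_spaced.size)
```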

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

In NumPy,

```
c = a * b
```

calculates the element-wise product of `a` and `b`, at near-C speeds, but with the code simplicity we expect from the Python language.

This demonstrates a core feature of NumPy, called **vectorization**. Vectorization removes the need for explicit loops over array elements, which makes code easier to read as well as fast to execute.
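As a small illustration (the array values here are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Element-wise product: no explicit Python loop is needed.
c = a * b
print(c)
```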

Many scientific Python libraries use NumPy as their core array representation. From plotting libraries such as matplotlib, to parallel processing libraries such as Dask, to data interoperability libraries such as Iris, NumPy arrays are at the core of how these libraries operate and communicate.

**Learning outcome:** by the end of this section, you will be able to manipulate NumPy array objects through indexing and explain key features and properties of NumPy arrays.

The multidimensional array object is at the core of all of NumPy's functionality. Let's explore this object some more.

```
In [ ]:
arr = np.ones((3, 2, 4))
print("Array shape:", arr.shape)
print("Array element dtype:", arr.dtype)
```

Use the code cells below to explore features of this array.

- What is the `type` of the array created above? (Hint: you can find the type of an object in Python using `type(<object>)`, where `<object>` is replaced with the name of the variable containing the object.)
- Look this type up in the NumPy documentation (see https://docs.scipy.org/doc/numpy-1.11.0/reference/generated/numpy.ndarray.html#numpy.ndarray).

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
arr = np.array([1, 2, 3, 4, 5, 6])
print(arr)
```

```
In [ ]:
print("arr[2] = {}".format(arr[2]))
print("arr[2:5] = {}".format(arr[2:5]))
print("arr[::2] = {}".format(arr[::2]))
```

```
In [ ]:
lst_2d = [[1, 2, 3], [4, 5, 6]]
arr_2d = arr.reshape(2, 3)
print("2D list:")
print(lst_2d)
```

```
In [ ]:
print("2D array:")
print(arr_2d)
```

```
In [ ]:
print("Single array element:")
print(arr_2d[1, 2])
```

```
In [ ]:
print("Single row:")
print(arr_2d[1])
print("First two columns:")
print(arr_2d[:, :2])
```

When you index a multidimensional array with fewer indices than it has dimensions, NumPy implicitly supplies a "`:`" for all of the remaining dimensions:

```
In [ ]:
print('Second row: {} is equivalent to {}'.format(arr_2d[1], arr_2d[1, :]))
```

A related concept is the **ellipsis**. Ellipsis can be specified explicitly using "`...`", which automatically expands to "`:`" for each dimension unspecified in the slice:

```
In [ ]:
arr1 = np.empty((4, 6, 3))
print('Original shape: ', arr1.shape)
print(arr1[...].shape)
print(arr1[..., 0:2].shape)
print(arr1[2:4, ..., ::2].shape)
print(arr1[2:4, :, ..., ::-1].shape)
```

```
In [ ]:
print(arr_2d)
```

```
In [ ]:
bools = arr_2d % 2 == 0
print(bools)
```

```
In [ ]:
print(arr_2d[bools])
```

```
In [ ]:
```

```
In [ ]:
```

Question: why do the following examples produce different results?

```
In [ ]:
print(lst_2d[0:2][1])
print(arr_2d[0:2, 1])
```
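In short, `lst_2d[0:2][1]` is two chained operations on a list, while `arr_2d[0:2, 1]` is a single multidimensional index. One way to see this (the sketch redefines `lst_2d` and `arr_2d` so it runs standalone):

```python
import numpy as np

lst_2d = [[1, 2, 3], [4, 5, 6]]
arr_2d = np.array(lst_2d)

# For the list, [0:2] slices the OUTER list (returning both rows),
# then [1] picks the second row from that slice.
print(lst_2d[0:2][1])

# For the array, [0:2, 1] slices rows 0-1 AND selects column 1
# in a single indexing operation.
print(arr_2d[0:2, 1])
```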

A NumPy array has a fixed data type, called `dtype`. This is the type of all the elements of the array.

This is in contrast to Python lists, which can hold elements of different types.

- What happens in Python when you add an integer to a float?
- What happens when you put an integer into a NumPy float array?
- What happens when you do numerical calculations between arrays of different types?
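A quick sketch of these behaviours (the values are arbitrary):

```python
import numpy as np

# Adding an integer to a float in plain Python promotes to float.
result = 1 + 2.5
print(type(result))

# Putting an integer into a NumPy float array: the value is cast
# to the array's dtype.
farr = np.zeros(3, dtype=float)
farr[0] = 7
print(farr.dtype, farr[0])

# Calculations between arrays of different dtypes promote to the
# "wider" type (here int + float -> float).
iarr = np.arange(3)            # integer dtype
mixed = iarr + np.ones(3)      # float result
print(mixed.dtype)
```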

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
x = np.linspace(0, 9, 3)
y = np.linspace(4, 8, 3)
x2d, y2d = np.meshgrid(x, y)
print(x2d)
print(y2d)
```

- properties: `shape`, `dtype`.
- arrays are homogeneous; all elements have the same type: `dtype`.
- indexing arrays to produce further arrays: views on the original arrays.
- multi-dimensional indexing (slicing) and boolean indexing.
- combinations to form 2D arrays: `meshgrid`.

```
In [ ]:
arr1 = np.arange(4)
arr2 = np.arange(4)
print('{} + {} = {}'.format(arr1, arr2, arr1 + arr2))
```

Explore arithmetic operations between arrays:

- Create a number of arrays of different shapes and dtypes. Make sure some of them include values of `0`.
- Make use of all the basic Python mathematical operators (i.e. `+`, `-`, `*`, `/`, `//`, `%`).

What operations work? What operations do not work? Can you explain why operations do or do not work?
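As a partial sketch (the shapes and values here are arbitrary):

```python
import numpy as np

a = np.array([6, 8, 10])
b = np.array([3, 4, 5])

print(a + b)    # element-wise addition
print(a - b)
print(a * b)
print(a / b)    # true division always gives floats
print(a // b)   # floor division keeps the integer dtype
print(a % b)

# Division involving 0 does not raise an exception for float
# arrays: NumPy emits a RuntimeWarning and produces inf/nan.
with np.errstate(divide='ignore', invalid='ignore'):
    print(np.array([1.0, 0.0]) / np.array([0.0, 0.0]))
```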

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

It makes intrinsic sense that you should be able to add a constant to all values in an array:

```
In [ ]:
arr = np.arange(4)
const = 5
print("Original array: {}".format(arr))
print("")
print("Array + const: {}".format(arr + const))
```

There are times when you need to perform calculations between NumPy arrays of different sizes. NumPy allows you to do this easily using a powerful piece of functionality called **broadcasting**.

Broadcasting is a way of treating the arrays as if they had the same shape, so that their elements correspond one-to-one and the calculation can be performed element-wise. It does this by matching dimensions in one array to the other where possible, and using repeated values where there is no corresponding dimension in the other array.

Broadcasting applies these three rules:

1. If the two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side.
2. If the shape of the two arrays does not match in a given dimension, an array with shape equal to 1 in that dimension is stretched to match the other shape.
3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

Note that all of this happens without ever actually creating the expanded arrays in memory! When NumPy broadcasts to create new dimensions or to 'stretch' existing ones, it doesn't duplicate the data. In the example above, the operation is carried out as if `const` were a 1D array with `5` in all of its entries, but no actual array is ever created. This can save a lot of memory when the arrays in question are large, and so can have significant performance implications.
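Working through the rules for one concrete case (the shapes are chosen only for illustration):

```python
import numpy as np

arr1 = np.ones((4, 3))
arr2 = np.ones(3)

# Rule 1: arr2 has fewer dimensions, so its shape is padded on the
# left with ones: (3,) -> (1, 3).
# Rule 2: the size-1 dimension is then stretched to match arr1,
# giving (4, 3) on both sides.
result = arr1 + arr2
print(result.shape)

# Rule 3: shapes (4, 3) and (4,) cannot broadcast, because (4,)
# pads to (1, 4), and 3 != 4 with neither equal to 1.
try:
    arr1 + np.ones(4)
except ValueError as err:
    print('Broadcasting error:', err)
```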

```
In [ ]:
arr1 = np.ones((2, 3))
arr2 = np.ones((2, 1))
# (arr1 + arr2).shape
```

```
In [ ]:
arr1 = np.ones((2, 3))
arr2 = np.ones(3)
# (arr1 + arr2).shape
```

```
In [ ]:
arr1 = np.ones((1, 3))
arr2 = np.ones((2, 1))
# (arr1 + arr2).shape
```

```
In [ ]:
arr1 = np.ones((1, 3))
arr2 = np.ones((1, 2))
# (arr1 + arr2).shape
```

NumPy allows you to change the shape of an array, so long as the total number of elements in the array does not change. For example, we could reshape a flat array with 12 elements to a 2D array with shape `(2, 6)`, or `(3, 2, 2)`, or even `(3, 4, 1)`. We could not, however, reshape it to have shape `(2, 5)`, because the total number of elements would not be kept constant.
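For instance:

```python
import numpy as np

flat = np.arange(12)
print(flat.reshape(2, 6).shape)
print(flat.reshape(3, 2, 2).shape)
print(flat.reshape(3, 4, 1).shape)

# A shape that changes the total number of elements raises an error.
try:
    flat.reshape(2, 5)
except ValueError as err:
    print('Cannot reshape:', err)
```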

```
In [ ]:
```

- arithmetic operations are performed in an element-by-element fashion,
- operations can be performed between arrays of different shapes,
- the arrays' dimensions are aligned according to fixed rules; where one input lacks a given dimension, values are repeated,
- reshaping can be used to get arrays to combine as required.

NumPy arrays support many common statistical calculations. For a list of common operations, see: https://docs.scipy.org/doc/numpy/reference/routines.statistics.html.

The simplest operations consist of calculating a single statistical value from an array of numbers -- such as a mean value, a variance or a minimum.

For example:

```
In [ ]:
a = np.arange(12).reshape((3, 4))
mean = np.mean(a)
print(a)
print(mean)
```

Used without any further arguments, statistical functions simply reduce the whole array to a single value. In practice, however, we very often want to calculate statistics over only *some* of the dimensions. The most common requirement is to calculate a statistic along a single array dimension, while leaving all the other dimensions intact. This is referred to as "collapsing" or "reducing" the chosen dimension.

This is done by adding an "axis" keyword specifying the dimension to collapse:

```
In [ ]:
print(np.mean(a, axis=1))
```

- What other similar statistical operations exist (see above link)?
- A mean value can also be calculated with `<array>.mean()`. Is that the same thing?
- Create a 3D array (that could be considered to describe `[time, x, y]`) and find the mean over all `x` and `y` at each timestep.
- What shape does the result have?
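One possible approach to the 3D example (the `(4, 3, 5)` shape is an arbitrary choice):

```python
import numpy as np

# A hypothetical [time, x, y] array: 4 timesteps on a 3x5 grid.
data = np.arange(60).reshape(4, 3, 5)

# Collapse the x and y dimensions (axes 1 and 2) together,
# leaving one mean value per timestep.
timestep_means = data.mean(axis=(1, 2))
print(timestep_means.shape)
```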

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

Real-world measurement processes often result in certain datapoint values being uncertain or simply "missing". This is usually indicated by additional data-quality information, stored alongside the data values.

In these cases we often need to make calculations that count only the valid datapoints. NumPy provides a special "masked array" type for this type of calculation. Here's a link to the documentation for NumPy masked arrays: https://docs.scipy.org/doc/numpy-1.11.0/reference/maskedarray.generic.html#maskedarray-generic-constructing.

To construct a masked array we need some data and a mask. The data can be any kind of NumPy array, but the mask array must contain boolean-type values only (either `True` and `False` or `1` and `0`). Let's make each of these and convert them together into a NumPy masked array:

```
In [ ]:
data = np.arange(4)
mask = np.array([0, 0, 1, 0])
print('Data: {}'.format(data))
print('Mask: {}'.format(mask))
masked_data = np.ma.masked_array(data, mask)
print('Masked data: {}'.format(masked_data))
```

The mask is applied where the values in the mask array are **True**. Masked arrays are printed with a double-dash `--` denoting the locations in the array where the mask has been applied.

The statistics of masked data are different:

```
In [ ]:
print('Unmasked average: {}'.format(np.mean(data)))
print('Masked average: {}'.format(np.ma.mean(masked_data)))
```

Missing data is sometimes represented in a *different* way, using a distinct "missing data" value appearing in the data. There is special support for converting between this type of representation and NumPy masked arrays. Every masked array has a `fill_value` property and a `filled()` method to fill the masked points with the fill value.
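For example (the `-999` fill value is an arbitrary choice):

```python
import numpy as np

data = np.arange(4)
masked_data = np.ma.masked_array(data, [0, 0, 1, 0])

# The fill_value is used when masked points are converted back
# to plain values.
masked_data.fill_value = -999
filled = masked_data.filled()
print(filled)
```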

- Create a masked array from the numbers 0-11, where all the values less than 5 are masked.
- Create a masked array of positive values, with a value of `-1.0` to represent missing points.
- Look up the masked array creation documentation. What routines exist to produce masked arrays like the ones you've just created more efficiently?
- Use `np.ma.filled()` to create a 'plain' (i.e. unmasked) array from a masked array.
- How can you create a plain array from a masked array, but using a *different* fill-value for masked points?
- Try performing a mathematical operation between two masked arrays. What happens to the `fill_value` properties when you do so?
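As a sketch for the first two items (these are just two of the available `np.ma` routines; the positive values are arbitrary):

```python
import numpy as np

# Mask all values less than 5 in the numbers 0-11.
low_masked = np.ma.masked_less(np.arange(12), 5)
print(low_masked)

# Treat -1.0 as a "missing data" value in otherwise positive data.
values = np.array([3.0, -1.0, 2.5, -1.0, 7.0])
missing_masked = np.ma.masked_values(values, -1.0)
print(missing_masked.count())   # number of unmasked points
```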

```
In [ ]:
```

```
In [ ]:
```

- most statistical functions are available in two different forms, as in `array.mean()` and also `np.mean(array)`, the choice being mostly a question of style.
- statistical operations operate over, and remove (or "collapse"), the array dimensions that they are applied to.
- an "axis" keyword specifies operation over dimensions: this can be one, multiple, or all.
  - NOTE: not all operations permit operation over specifically selected dimensions.
- basic statistical operations are provided by NumPy itself; more advanced statistical routines are found in the higher-level SciPy project.
- missing datapoints can be represented using "masked arrays":
  - these are useful for calculation, but usually require converting to another form for data storage.

```
In [ ]:
arr = np.arange(8)
arr_view = arr.reshape(2, 4)
# Print the "view" array from reshape.
print('Before\n{}'.format(arr_view))
# Update the first element of the original array.
arr[0] = 1000
# Print the "view" array from reshape again,
# noticing the first value has changed.
print('After\n{}'.format(arr_view))
```

What this means is that if one array (`arr`) is modified, the other (`arr_view`) will also be updated: the same memory is being shared. This is a valuable tool for managing system memory overhead, particularly useful when handling lots of large arrays. The lack of copying also allows for very efficient vectorized operations.

Remember, this behaviour is automatic in most of NumPy, so it requires some consideration in your code: it can lead to bugs that are hard to track down. For example, if you are changing some elements of an array that you are using elsewhere, you may want to explicitly copy that array before making changes. If in doubt, you can always copy the data to a different block of memory with the `copy()` method.

For example:

```
In [ ]:
arr = np.arange(8)
arr_view = arr.reshape(2, 4).copy()
# Print the copied array from reshape.
print('Before\n{}'.format(arr_view))
# Update the first element of the original array.
arr[0] = 1000
# Print the copied array again, noticing the
# first value has NOT changed.
print('After\n{}'.format(arr_view))
```

```
In [ ]:
%%timeit
x = range(500)
```

Repeat that, specifying only 100 loops and the fastest time of 5 runs:

```
In [ ]:
%%timeit -n 100 -r 5
x = range(500)
```

This gives us an easy way to evaluate the performance of different implementations:

```
In [ ]:
rands = np.random.random(1000000).reshape(100, 100, 100)
```

```
In [ ]:
%%timeit -n 10 -r 5
overPointEightLoop = 0
for i in range(100):
    for j in range(100):
        for k in range(100):
            if rands[i, j, k] > 0.8:
                overPointEightLoop += 1
```

```
In [ ]:
%%timeit -n 10 -r 5
overPointEightWhere = rands[rands > 0.8].size
```

- NumPy enables vectorised calculations that are fast:
  - Python loops are much slower.
- NumPy can only use the system memory available to hold arrays:
  - very large arrays can result in a `MemoryError`.
- Many NumPy operations return views on existing arrays:
  - a view shares memory with the original array,
  - changing array contents in place will affect all related views,
  - sometimes it is useful to explicitly `copy` arrays.

In this exercise, you are tasked with implementing the simple trapezoid rule formula for numerical integration. If we want to compute the definite integral

$$ \int_{a}^{b}f(x)dx $$

we can partition the integration interval $[a,b]$ into smaller subintervals. We then approximate the area under the curve for each subinterval by calculating the area of the trapezoid created by linearly interpolating between the two function values at each end of the subinterval:

For a pre-computed $y$ array (where $y = f(x)$ at discrete samples) the trapezoidal rule equation is:

$$ \int_{a}^{b}f(x)dx\approx\frac{1}{2}\sum_{i=1}^{n}\left(x_{i}-x_{i-1}\right)\left(y_{i}+y_{i-1}\right). $$

In pure Python, this can be written as:

```
def trapz_slow(x, y):
    area = 0.
    for i in range(1, len(x)):
        area += (x[i] - x[i-1]) * (y[i] + y[i-1])
    return area / 2
```
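A vectorized version using array slicing (one possible approach, not the only one) might look like this:

```python
import numpy as np

def trapz_fast(x, y):
    # Pair each sample with its neighbour via shifted slices,
    # so no explicit Python loop is needed.
    return 0.5 * np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]))

x = np.linspace(0, 3, 1000)
y = x ** 2
print(trapz_fast(x, y))   # close to the exact value of 9
```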

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

```
In [ ]:
```

Write a function `trapzf(f, a, b, npts=100)` that accepts a function `f`, the endpoints `a` and `b`, and the number of samples to take `npts`. Sample the function uniformly at these points and return the value of the integral.

Use the `trapzf` function to identify the minimum number of sampling points needed to approximate the integral $\int_0^3 x^2 dx$ with an absolute error of $\leq 0.0001$. (A loop is necessary here.)
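One possible sketch, assuming a slicing-based `trapzf` and a simple incrementing search loop:

```python
import numpy as np

def trapzf(f, a, b, npts=100):
    # Sample f uniformly and apply the trapezoidal rule.
    x = np.linspace(a, b, npts)
    y = f(x)
    return 0.5 * np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]))

exact = 9.0   # the exact value of the integral of x**2 from 0 to 3
npts = 2
while abs(trapzf(lambda t: t ** 2, 0, 3, npts) - exact) > 0.0001:
    npts += 1
print(npts)
```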

```
In [ ]:
```

Here are a selection of topics we chose not to include in this workshop, or only touched upon briefly. You may come across some of them or find these links and terms useful in the future.

- NaNs and NaN functions
- Random number arrays
- matrix multiplication (for example using `np.dot`)
- C and Fortran storage order -- see "C order" and "Fortran order" in the glossary
- file I/O
- Universal Functions (ufunc)
- indexing with arrays of indices