**Instructions:** Create a new directory called `NumPy`

with a notebook called `NumPyTour`

. Give it a heading 1 cell title **Introduction to Scientific and Numerical Python**. Read this page, typing in the code in the code cells and executing them as you go.

**Do not copy/paste**.

Type the commands yourself to get the practice doing it. This will also slow you down so you can think about the commands and what they are doing as you type them.</font>

Save your notebook when you are done, then try the accompanying exercises.

Some of the submodules available in SciPy that we could use in this course include:

**File input/output:**`scipy.io`

**Special functions:**`scipy.special`

**Linear algebra operations:**`scipy.linalg`

**Fast Fourier transforms:**`scipy.fftpack`

**Optimization and fit:**`scipy.optimize`

**Statistics and random numbers:**`scipy.stats`

**Interpolation:**`scipy.interpolate`

**Numerical integration:**`scipy.integrate`

**Signal processing:**`scipy.signal`

**Image processing:**`scipy.ndimage`

**Sparse Matrices:**`scipy.sparse`

**Physical Constants:**`scipy.constants`

**Spatial metrics:**`scipy.spatial`

We'll learn more about some of those throughout this course, picking them up as they are needed to help us solve specific physics problems. But before we get into the SciPy libraries, we need to learn how to efficiently compute with the NumPy library. Both are installed by default with the Anaconda distribution of Python.

```
In [ ]:
```import numpy as np

```
In [ ]:
```lst = [10, 20, 30, 40]
arr = np.array([10, 20, 30, 40])

Elements of a one-dimensional array are accessed with the same syntax as a list:

```
In [ ]:
```arr[0]

```
In [ ]:
```lst[0]

```
In [ ]:
```arr[-1]

```
In [ ]:
```arr[2:]

```
In [ ]:
```lst[-1] = 'a string inside a list'
lst

```
In [ ]:
```arr[-1] = 'a string inside an array'

The information about the type of an array is contained in its `dtype`

attribute:

```
In [ ]:
```arr.dtype

`dtype`

is fixed and it can only store elements of the same type. For this example where the `dtype`

is integer, if we store a floating point number it will be automatically converted into an integer:

```
In [ ]:
```arr[-1] = 1.234
arr

`zeros`

or `ones`

and specify the desired type:

```
In [ ]:
```np.zeros(5, dtype=float)

```
In [ ]:
```np.zeros(3, dtype=int)

```
In [ ]:
```np.zeros(3, dtype=complex)

```
In [ ]:
```print '5 ones:', np.ones(5)

```
In [ ]:
```a = np.empty(4)
a.fill(5.5)
a

```
In [ ]:
```a = np.empty(3,dtype=int)
a.fill(7)
a

`arange`

function, which works like the builtin `range`

but returns an array instead of a list:

```
In [ ]:
```np.arange(5)

But notice something important about how it is used:

```
In [ ]:
```c = np.arange(2,16,2)
c

The max number in the range is exclusive because python numbering starts at zero.

`linspace`

and `logspace`

functions create linearly and logarithmically-spaced grids respectively, with a fixed number of points and including both ends of the specified interval:

```
In [ ]:
```print "A linear grid between 0 and 1 with 4 elements:"
print np.linspace(0, 1, 4)

```
In [ ]:
```print "A logarithmic grid between 10**1 and 10**3 with 4 elements:"
print np.logspace(1, 3, 4)

`np.random`

module contains a number of functions that can be used to this effect, for example this will produce an array of 5 random samples taken from a standard normal distribution (0 mean and variance 1):

```
In [ ]:
```np.random.randn(5)

```
In [ ]:
```norm10 = np.random.normal(10, 3, 5)
norm10

```
In [ ]:
```lst2 = [[1, 2], [3, 4]]
arr2 = np.array([[1, 2], [3, 4]])
arr2

`[ ]`

operator, multidimensional arrays support a much more natural indexing syntax with a single `[ ]`

and a set of indices separated by commas:

```
In [ ]:
```print lst2[0][1]
print arr2[0,1]

```
In [ ]:
```np.zeros((2,3))

```
In [ ]:
```arr = np.arange(8).reshape(2, 4)
print arr

**But note that reshaping (like most NumPy operations) provides a** ** view of the same memory**:

```
In [ ]:
```arr = np.arange(8)
arr2 = arr.reshape(2, 4)
arr[0] = 1000
print arr
print arr2

This lack of copying allows for very efficient vectorized operations.

```
In [ ]:
```print 'Slicing in the second row:', arr2[1, 2:4]
print 'All rows, third column :', arr2[:, 2]

```
In [ ]:
```print 'First row: ', arr2[0]
print 'Second row: ', arr2[1]

```
In [ ]:
```#Print some properties of the array arr2
print 'Data type :', arr2.dtype
print 'Total number of elements :', arr2.size
print 'Number of dimensions :', arr2.ndim
print 'Shape (dimensionality) :', arr2.shape
print 'Memory used (in bytes) :', arr2.nbytes

```
In [ ]:
```#Print some useful information that the arrays can calculate for us
print 'Minimum and maximum :', arr2.min(), arr2.max()
print 'Sum and product of all elements :', arr2.sum(), arr2.prod()
print 'Mean and standard deviation :', arr2.mean(), arr2.std()

```
In [ ]:
```print 'For the following array:\n', arr2
print 'The sum of elements along the rows is :', arr2.sum(axis=1)
print 'The sum of elements along the columns is :', arr2.sum(axis=0)

`axis`

parameter is the dimension which will be consumed once the operation has been carried out. This is why to sum along the rows we use `axis=1`

.

`.T`

attribute, which allows you to access the transpose of the array:

```
In [ ]:
```print 'Array:\n', arr2
print 'Transpose:\n', arr2.T

NumPy can also create some useful matrices:

```
In [ ]:
```#identity matrix
c = np.eye(3)
c

```
In [ ]:
```#diagonal matrix wth elements of an array
d = np.diag(np.array([1,2,3,4]))
d

```
In [ ]:
```e = np.diag(np.linspace(0,1,6))
e

To access the elements of a multidimensional (in this case 2D) array:

```
In [ ]:
```e[1,1]

```
In [ ]:
```e[2,2]

To select a range of elements in the array, you can use slices, just like for Python lists:

```
In [ ]:
```a = np.arange(10)
a

```
In [ ]:
```a[:4]

```
In [ ]:
```a[2:4]

```
In [ ]:
```a[5:]

```
In [ ]:
```a[:]

You can also give the step size:

```
In [ ]:
```a[2:9:3]

`norm10`

we want to replace all values above 9 with the value 0. We can do so by first finding the mask that indicates where this condition is true or false:

```
In [ ]:
```norm10 = np.random.normal(10, 3, 5)
norm10

```
In [ ]:
```mask = norm10 > 9
mask

Now that we have this mask, we can use it to either read those values or to reset them to 0:

```
In [ ]:
```print 'Values above 9:', norm10[mask]

```
In [ ]:
```print 'Resetting all values above 9 to 0...'
norm10[mask] = 0
print norm10

```
In [ ]:
```arr1 = np.arange(4)
arr2 = np.arange(10, 14)
print arr1, '+', arr2, '=', arr1+arr2

```
In [ ]:
```print arr1, '*', arr2, '=', arr1*arr2

We may also multiply an array by a scalar:

```
In [ ]:
```1.5 * arr1

```
In [ ]:
```print np.arange(3)
print np.arange(3) + 5

```
In [ ]:
```print np.ones((3,3))
print " + "
print np.arange(3)
print " = "
print np.ones((3, 3)) + np.arange(3)

We can also broadcast in two directions at a time:

```
In [ ]:
```np.arange(3).reshape((3, 1)) + np.arange(3)

Broadcasting rules can do the following:

If the two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with additional elements until it matches the other on its leading (left) side.

If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.

If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

For example, when we do

`np.arange(3) + 5`

The scalar 5 is

- first 'promoted' to a 1-dimensional array of length 1
- then, this array is 'stretched' to length 3 to match the first array.

After these two operations are complete, the addition can proceed as now both operands are one-dimensional arrays of length 3.

The general rule is: when operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward, creating dimensions of length 1 as needed. Two dimensions are considered compatible when

- they are equal to begin with, or
- one of them is 1; in this case numpy will do the 'stretching' to make them equal.

If these conditions are not met, a `ValueError: frames are not aligned`

exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.

`scipy.special`

module that includes Bessel, Airy, Fresnel, Laguerre and other classical special functions. For example, sampling the sine function at 100 points between 0 and $2\pi$ is as simple as:

```
In [ ]:
```x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)

*Universal Functions* or `ufuncs`

in NumPy. They operate element-wise on an array. We've already seen examples of these in the various arithmetic operations:

```
In [ ]:
```x = np.random.random(4)
print x
print x + 1 # add 1 to each element of x

```
In [ ]:
```print x * 2 # multiply each element of x by 2

```
In [ ]:
```print x * x # multiply each element of x by itself

```
In [ ]:
```print x[1:] - x[:-1]

These are *binary* `ufuncs`

: they take two arguments. There are also *unary* `ufuncs`

:

```
In [ ]:
```-x

```
In [ ]:
```np.sin(x)

Ufuncs are very fast:

```
In [ ]:
```x = np.random.random(10000)

```
In [ ]:
```%%timeit
# compute element-wise x + 1 via a ufunc
y = np.zeros_like(x)
y = x + 1

```
In [ ]:
```%%timeit
# compute element-wise x + 1 via a loop
y = np.zeros_like(x)
for i in range(len(x)):
y[i] = x[i] + 1

`ufuncs`

are faster than Python functions involving loops, because *the looping happens in compiled code*. This is only possible when types are known beforehand, which is why NumPy arrays must be typed.

- Trigonometric functions (
`np.sin`

,`np.cos`

, etc.) - Scipy special functions (
`scipy.special.j0`

,`scipy.special.gammaln`

, etc.) - Element-wise minimum/maximum (
`np.minimum`

,`np.maximum`

) - User-defined
`ufuncs`

```
In [ ]:
```x = np.random.random(5)
print x
print np.minimum(x, 0.5)
print np.maximum(x, 0.5)

How is this different from `min()`

and `max()`

?

```
In [ ]:
```print np.min(x)
print np.max(x)

`dot`

method whose behavior is that of the scalar dot product when its arguments are vectors (one-dimensional arrays) and the traditional matrix multiplication when one or both of its arguments are two-dimensional arrays:

```
In [ ]:
```v1 = np.array([2, 3, 4])
v2 = np.array([1, 0, 1])
print v1, '.', v2, '=', np.dot(v1, v2)

`v1`

should be viewed as a column vector in traditional linear algebra notation; NumPy makes no distinction between row and column vectors and simply verifies that the dimensions match the required rules of matrix multiplication, in this case we have a 2x3 matrix multiplied by a 3-vector, which produces a 2-vector:

```
In [ ]:
```A = np.arange(6).reshape(2, 3)
print A, 'x', v1, '=', np.dot(A, v1)

```
In [ ]:
```print np.dot(A, A.T)

and $A^T\times A$:

```
In [ ]:
```print np.dot(A.T, A)

`numpy.linalg`

module includes additional functionality such as determinants, matrix norms, Cholesky, eigenvalue and singular value decompositions, etc. For even more linear algebra tools, `scipy.linalg`

contains the majority of the tools that one finds in the fast numerical packages written in Fortran.

Aggregates are functions over arrays which return smaller arrays. NumPy has several built-in:

```
In [ ]:
```# 10 x 10 array drawn from a standard normal
x = np.random.randn(10, 10)

```
In [ ]:
```#Mean of the values
print x.mean()

```
In [ ]:
```#Standard deviation
print x.std()

```
In [ ]:
```#Variance
print x.var()

```
In [ ]:
```#Sum all elements
print x.sum()

```
In [ ]:
```#Multiply all elements
print x.prod()

```
In [ ]:
```#Median
print np.median(x)

`min`

, `max`

, and `sum`

aggregate built-in. These are much more general than the versions in NumPy, so they are much slower:

```
In [ ]:
```x = np.random.random(10000)
%timeit np.sum(x)
%timeit sum(x)

**Moral of the story:** Dynamic type-checking is slow. Make sure to use NumPy's `sum`

, `min`

, and `max`

.

You can also take aggregates along certain axes:

```
In [ ]:
```x = np.random.rand(3, 5)
print x

```
In [ ]:
```print x.sum(0) # sum along columns

```
In [ ]:
```print x.sum(1) # sum along rows

```
In [ ]:
```print np.mean(x, 1) # mean along rows

Imagine you have an array of data where negative values indicate some kind of error:

```
In [ ]:
```x = np.array([1, 2, 3, -999, 2, 4, -999])

How might you clean this array, setting all negative values to, say, zero?

```
In [ ]:
```for i in range(len(x)):
if x[i] < 0:
x[i] = 0
print x

A faster way is to construct a *boolean mask*:

```
In [ ]:
```x = np.array([1, 2, 3, -999, 2, 4, -999])
mask = (x < 0)
print mask

And the mask can be used directly to set the value you desire:

```
In [ ]:
```x[mask] = 0
print x

Or do it directly:

```
In [ ]:
```x = np.array([1, 2, 3, -999, 2, 4, -999])
x[x < 0] = 0
print x

```
In [ ]:
```x = np.random.random(5)
print x

```
In [ ]:
```x[x > 0.5] = np.nan
print x

```
In [ ]:
```x[np.isnan(x)] = np.inf
print x

```
In [ ]:
```np.nan == np.nan

```
In [ ]:
```x[np.isinf(x)] = 0
print x

```
In [ ]:
```x = np.array([1, 0, -np.inf, np.inf, np.nan])
print "input ", x
print "x < 0 ", (x < 0)
print "x > 0 ", (x > 0)
print "isinf ", np.isinf(x)
print "isnan ", np.isnan(x)
print "isposinf", np.isposinf(x)
print "isneginf", np.isneginf(x)

You can also use bitwise boolean operations on masks with parentheses:

```
In [ ]:
```x = np.arange(16).reshape((4, 4))
print x

```
In [ ]:
```print (x < 5)

```
In [ ]:
```#Or the complement of that operation:
print ~(x < 5)

```
In [ ]:
```#Elements evaluate to true if they match both conditions
print (x < 10) & (x % 2 == 0)

```
In [ ]:
```#select elements within a range
print (x > 3) & (x < 8)

Sum over a mask to find the number of True elements:

```
In [ ]:
```x = np.random.random(100)
print "array is length", len(x), "and has"
print (x > 0.5).sum(), "elements that are greater than 0.5"

Clip is a useful function. See if you can figure out what it does.

```
In [ ]:
```x = np.random.random(10)
x = np.clip(x, 0.3, 0.6)
print np.sum(x < 0.3)
print np.sum(x > 0.6)

```
In [ ]:
``````
x
```

You can also turn a mask into information about the indices using the `where`

function.

```
In [ ]:
```x = np.random.random((3, 3))
print x

```
In [ ]:
```print np.where(x < 0.3)

```
In [ ]:
```print x[x < 0.3]

```
In [ ]:
```print x[np.where(x < 0.3)]

Another useful trick (poker, anyone?):

```
In [ ]:
```i = np.arange(6)
print "Original order:",i
np.random.shuffle(i)
print "Randomly shuffled:",i

The advantage of using NumPy is all about moving loops into compiled code:

Use NumPy

*ufuncs*to your advantage (eliminate loops!)Use NumPy

*aggregates*to your advantage (eliminate loops!)Use NumPy

*broadcasting*to your advantage (eliminate loops!)Use NumPy

*slicing and masking*to your advantage (eliminate loops!)