Advanced NumPy tutorial

PROTO204, July 3rd, 2017

Bartosz Teleńczuk and OSS community

e-mail: mail@telenczuk.pl
website: http://neuroscience.telenczuk.pl

Requirements

Python 3.x
Jupyter Notebook
NumPy >= 1.10
matplotlib

If you use anaconda you can install them with:

conda create -n advanced_numpy python=3 notebook numpy matplotlib
source activate advanced_numpy

Setup

Download the archive with materials python-workshop-master.zip and save it to your Desktop.

Hint: Alternatively, if you know git, you can also clone the repository.
Unzip the file.

Open a terminal and change to the created folder:

$ cd
$ cd Desktop/python-workshop-master/Day_1_Scientific_Python

Run Jupyter notebook.
```
 $ jupyter notebook
```

Other materials

Gaël Varoquaux, Emmanuelle Gouillart and Olav Vahtras (editors), SciPy Lectures
Software Carpentry community, Programming with Python
NumPy community, NumPy Docs
Nicolas Rougier, 100 NumPy exercises
Nicolas Rougier, From Python to Numpy
Bartosz Teleńczuk, Advanced NumPy lesson

What is NumPy?

memory-efficient container for multi-dimensional homogeneous (mainly numerical) data (NumPy array)
fast vectorised operations on arrays
library of general-purpose functions: data reading/writing, linear algebra, FFT etc. (for more, wait for the SciPy lecture)
main applications: signal processing, image processing, analysis of raw data from measurement instruments

Importing NumPy



In [ ]:

    
import numpy as np



In [ ]:

    
new_array = np.array([1, 2, 3, 4])
print(new_array)

Exercise

Create the following array and store it in a new variable called a:

[0, 5, 8, 10]

Loading data

We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in comma-separated values (CSV) format: each row holds information for a single patient, and the columns represent successive days. The first few rows of our first file look like this:



In [ ]:

    
%load data/inflammation-01.csv



In [ ]:

    
data = np.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
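
To try np.loadtxt without the workshop files, the following minimal sketch parses a small made-up CSV from an in-memory string. The name csv_text and its values are illustrative assumptions, not part of the inflammation data set.

```python
import io
import numpy as np

# Hypothetical CSV content: 2 patients (rows) x 4 days (columns).
csv_text = "0,1,2,3\n0,2,4,1\n"

# np.loadtxt accepts a file name or any file-like object.
small = np.loadtxt(io.StringIO(csv_text), delimiter=',')

print(small.shape)   # (2, 4)
print(small.dtype)   # float64, the default dtype used by loadtxt
```

Each line of text becomes one row of the array, and the values are stored as floating-point numbers even though the text contains only integers.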

Explore array



In [ ]:

    
print(data)



In [ ]:

    
print(data.dtype)
print(data.shape)

We can plot the data using the matplotlib library:



In [ ]:

    
import matplotlib.pyplot as plt
plt.matshow(data)
plt.show()

Note that the figure appears only after you call the plt.show() function. In a Jupyter notebook you can display figures directly in the notebook using this command:



In [ ]:

    
%matplotlib inline
plt.matshow(data)

Indexing

Note that the NumPy arrays are zero-indexed:



In [ ]:

    
data[0, 0]

This means that the third element in the first row has an index of [0, 2]:



In [ ]:

    
data[0, 2]

We can also assign a new value to the element:



In [ ]:

    
data[0, 2] = 100.
print(data[0, 2])

NumPy (and Python in general) checks the bounds of the array:



In [ ]:

    
print(data.shape)
data[60, 0]
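
The bounds check can be reproduced on a small array. This is a sketch on made-up data (the array small is an assumption, not the inflammation data); it catches the error instead of stopping the notebook.

```python
import numpy as np

# Illustrative array with 3 rows and 2 columns.
small = np.zeros((3, 2))

message = None
try:
    small[3, 0]              # valid row indices are 0, 1 and 2
except IndexError as err:
    message = str(err)       # keep the error text for inspection

print("IndexError:", message)
```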

Finally, we can ask for several elements at once:



In [ ]:

    
data[0, [0, 10]]

Slices

You can select ranges of elements using slices. To select the first two columns from the first row, you can use:



In [ ]:

    
data[0, 0:2]

Note that the returned array does not include the third column (with index 2).

You can skip the first or last index (which means, take the values from the beginning or to the end):



In [ ]:

    
data[0, :2]

If you omit both indices in the slice, leaving only the colon (:), you will get all columns of this row:



In [ ]:

    
data[0, :]

We can now plot the values in this row as a line plot:



In [ ]:

    
plt.plot(data[0, :])
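
The slicing rules from this section can be checked side by side on a short one-dimensional array. The array row below is illustrative and not taken from the data set.

```python
import numpy as np

row = np.array([10, 20, 30, 40, 50])   # illustrative values

print(row[0:2])   # [10 20]  the stop index 2 is excluded
print(row[:2])    # [10 20]  start omitted: begin at index 0
print(row[3:])    # [40 50]  stop omitted: go to the end
print(row[:])     # [10 20 30 40 50]  all elements
```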

Filtering data

It's also possible to select elements (filter) based on a condition. For example, to select all measurements above 10 for the first patient we can use:



In [ ]:

    
patient_data = data[0, :]
patient_data[patient_data>10]
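
It may help to look at the condition on its own. In this sketch (the array values is made up for illustration), the comparison produces a boolean array, and indexing with it keeps only the elements where the condition is True.

```python
import numpy as np

values = np.array([3, 12, 7, 15])   # illustrative measurements

mask = values > 10     # elementwise comparison gives a boolean array
print(mask)            # [False  True False  True]
print(values[mask])    # [12 15]
```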

We can also replace the selected measurements with a new value:



In [ ]:

    
patient_data[patient_data>10] = 10
print(patient_data)

**Warning** Please note that changing `patient_data` in the previous example will also modify the original `data` array from which the row was extracted. The reason is that taking a slice does not copy the data; it only gives a new view of it.
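
The difference between a view and a copy can be shown on a small array. This is a sketch on made-up data; the names table, row_view and row_copy are assumptions introduced for illustration.

```python
import numpy as np

table = np.array([[1.0, 20.0], [3.0, 4.0]])   # illustrative data

row_view = table[0, :]            # slice: shares memory with `table`
row_copy = table[0, :].copy()     # explicit copy: independent of `table`

row_view[row_view > 10] = 10      # also changes table[0, 1]
row_copy[0] = -1                  # leaves `table` untouched

print(table)
print(np.shares_memory(table, row_view))   # True
print(np.shares_memory(table, row_copy))   # False
```

If you want to modify a row without touching the original array, call .copy() on the slice first.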

Quiz

Imagine the following array a:

>> print(a)
[0, 5, 8, 10]

Which of the following commands will give this output:

[5, 8]

a) print(a[1, 2])

b) print(a[2:3])

c) print(a[1:2])

d) print(a[[1, 2]])

e) print(a[a<10])

You can test your guess by creating the a array:

a = np.array([0, 5, 8, 10])

Operations

By default additions/subtractions/etc. are elementwise:



In [ ]:

    
doubledata = data + data
print(doubledata)

Operations by scalar:



In [ ]:

    
tripledata = data * 3
print(tripledata)

Some functions can be applied elementwise:



In [ ]:

    
expdata = np.exp(data)
print(expdata)

**Warning** The standard Python installation also includes the `math` library, but its functions operate on single numbers and do not work elementwise on NumPy arrays, so avoid using it with them.
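
A short sketch of what goes wrong, using a made-up array values: np.exp handles the whole array, while math.exp raises a TypeError because it expects a single number.

```python
import math
import numpy as np

values = np.array([0.0, 1.0, 2.0])   # illustrative values

print(np.exp(values))     # works elementwise on the whole array

failed = False
try:
    math.exp(values)      # math.exp expects a single number
except TypeError as err:
    failed = True
    print("TypeError:", err)
```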

Some functions (such as mean, max, etc.) aggregate the data and return scalars or arrays with fewer dimensions:



In [ ]:

    
meandata = np.mean(data)
print(meandata)

By default, the NumPy mean function averages over all elements of the array. It's also possible to average over a single axis:



In [ ]:

    
np.mean(data, 0)
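
The effect of the axis argument is easier to follow on an array small enough to check by hand. The array small below is illustrative, not the inflammation data.

```python
import numpy as np

# Illustrative array: 2 patients (rows) x 3 days (columns).
small = np.array([[1.0, 2.0, 3.0],
                  [3.0, 4.0, 5.0]])

print(np.mean(small))           # 3.0, mean of all six values
print(np.mean(small, axis=0))   # [2. 3. 4.], one value per column (day)
print(np.mean(small, axis=1))   # [2. 4.], one value per row (patient)
```

The axis you name is the one that disappears from the result: averaging over axis 0 collapses the rows and leaves one value per column.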

Exercise

Average the inflammation data over the first ten patients (rows) and plot them across time (columns). Then repeat it for the next ten patients and so on. Try putting all averages on a single plot.

Broadcasting

It’s possible to do operations on arrays of different sizes. In some cases NumPy can transform these arrays automatically so that they behave like same-sized arrays. This conversion is called broadcasting. For example, we can subtract the mean of each column (day) from every row of the data:



In [ ]:

    
data - np.mean(data, 0)
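
Broadcasting works only when the shapes are compatible. The sketch below uses made-up arrays (small and offsets are illustrative names) to show one case that works and one that fails.

```python
import numpy as np

small = np.array([[1.0, 2.0, 3.0],
                  [3.0, 4.0, 5.0]])      # shape (2, 3), illustrative
offsets = np.array([10.0, 20.0, 30.0])   # shape (3,)

# `offsets` is treated as if it were repeated for each of the 2 rows.
shifted = small + offsets
print(shifted)

# Shapes (2, 3) and (2,) are not compatible, so this raises ValueError.
incompatible = False
try:
    small + np.array([1.0, 2.0])
except ValueError as err:
    incompatible = True
    print("ValueError:", err)
```

NumPy compares shapes starting from the last axis, so a one-dimensional array must match the number of columns, not the number of rows.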

Exercise

Given the following array:

a = np.array([[2, 3, 1], [4, 1, 1]])

From each column of a, subtract its mean across rows. Next, from each row subtract its mean across columns.

Tip: You can use a.T to transpose the array.