PROTO204, July 3rd, 2017
Bartosz Teleńczuk and OSS community
e-mail: mail@telenczuk.pl
website: http://neuroscience.telenczuk.pl
If you use Anaconda, you can install the required packages in a new environment with:
$ conda create -n advanced_numpy python=3 notebook numpy matplotlib
$ source activate advanced_numpy
(With conda 4.4 or newer, you can use conda activate advanced_numpy instead.)
Download the archive with the workshop materials (python-workshop-master.zip) and save it to your Desktop.
Hint: Alternatively, if you know git, you can also clone the repository.
Unzip the file.
Open a terminal and change to the created folder:
$ cd
$ cd Desktop/python-workshop-master/Day_1_Scientific_Python
Run Jupyter notebook.
$ jupyter notebook
memory-efficient container for multi-dimensional homogeneous (mainly numerical) data (NumPy array)
fast vectorised operations on arrays
a library of general-purpose functions: data reading/writing, linear algebra, FFT, etc. (more on these in the SciPy lecture)
main applications: signal processing, image processing, analysis of raw data from measurement instruments
In [ ]:
import numpy as np
In [ ]:
new_array = np.array([1, 2, 3, 4])
print(new_array)
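A NumPy array holds elements of a single data type and has a fixed shape. Here is a small sketch of how to inspect these attributes. The array values and the names `matrix` and `mixed` are chosen only for illustration. The dtype of the integer array is not printed because its default size depends on the platform.

```python
import numpy as np

# A 1-D array built from a list of integers
new_array = np.array([1, 2, 3, 4])
print(new_array.shape)  # (4,)
print(new_array.ndim)   # 1

# A nested list gives a 2-D array (2 rows, 3 columns)
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
print(matrix.shape)  # (2, 3)
print(matrix.dtype)  # float64

# Mixing ints and floats converts every element to float (homogeneous data)
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```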
We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in comma-separated values (CSV) format: each row holds information for a single patient, and the columns represent successive days. The first few rows of our first file look like this:
In [ ]:
%load data/inflammation-01.csv
In [ ]:
data = np.loadtxt(fname='data/inflammation-01.csv', delimiter=',')
In [ ]:
print(data)
In [ ]:
print(data.dtype)
print(data.shape)
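You can also see what `np.loadtxt` returns without the workshop files. The sketch below reads a made-up miniature CSV from memory. The values in `csv_text` are hypothetical and are not taken from the inflammation data set. `np.loadtxt` accepts file-like objects as well as file names, and by default it returns floats.

```python
import io
import numpy as np

# Hypothetical miniature CSV: 3 patients (rows) x 4 days (columns)
csv_text = """0,0,1,3
0,1,2,1
0,1,1,3
"""

# np.loadtxt accepts a file-like object, not only a file name
small = np.loadtxt(io.StringIO(csv_text), delimiter=',')
print(small.dtype)  # float64 (the default dtype of loadtxt)
print(small.shape)  # (3, 4): one row per patient, one column per day
```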
We can plot the data using the matplotlib library:
In [ ]:
import matplotlib.pyplot as plt
plt.matshow(data)
plt.show()
Note that the figure appears only after you call the plt.show() function. In a Jupyter notebook you can display figures directly in the notebook with the %matplotlib inline magic command:
In [ ]:
%matplotlib inline
plt.matshow(data)
Note that NumPy arrays are zero-indexed, i.e. the first element has index 0:
In [ ]:
data[0, 0]
This means that the third element in the first row has the index [0, 2]:
In [ ]:
data[0, 2]
We can also assign a new value to an element:
In [ ]:
data[0, 2] = 100.
print(data[0, 2])
NumPy (like Python in general) checks the bounds of the array, so an index past the end raises an IndexError:
In [ ]:
print(data.shape)
data[60, 0]
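The check can be tried without the data file. In this sketch, a hypothetical 3 x 2 array of zeros (`arr`) stands in for `data`. The valid row indices run from 0 to `shape[0] - 1`, and the exception is caught so that its message can be inspected.

```python
import numpy as np

# Hypothetical small array with 3 rows and 2 columns
arr = np.zeros((3, 2))

# Valid row indices are 0, 1 and 2 (that is, 0 .. shape[0] - 1)
last_row = arr[arr.shape[0] - 1, 0]

try:
    arr[3, 0]  # one past the last row
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)
```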
Finally, we can ask for several elements at once by passing a list of indices (so-called fancy indexing):
In [ ]:
data[0, [0, 10]]
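Indexing with a list follows the list exactly, including its order and any repeats. The following sketch uses a hypothetical 2 x 12 array, `grid`, filled with the numbers 0 to 23.

```python
import numpy as np

# Hypothetical 2 x 12 array holding the numbers 0..23
grid = np.arange(24).reshape(2, 12)

# A list of column indices picks those columns from row 0
picked = grid[0, [0, 10]]
print(picked)  # [ 0 10]

# The indices may repeat and come in any order
reordered = grid[1, [11, 0, 0]]
print(reordered)  # [23 12 12]
```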
You can select ranges of elements using slices. To select the first two columns of the first row, you can use:
In [ ]:
data[0, 0:2]
Note that the returned array does not include the third column (index 2): the stop index of a slice is excluded.
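Here is a small sketch of the half-open start:stop convention on a hypothetical 1-D array, `row`, which is an illustrative name. The start index is included and the stop index is excluded, so the slice length is `stop - start`.

```python
import numpy as np

row = np.array([10, 20, 30, 40, 50])

# start is included, stop is excluded: indices 1, 2 and 3
middle = row[1:4]
print(middle)  # [20 30 40]

# The slice length is stop - start
assert len(middle) == 4 - 1

# Consecutive slices sharing a boundary do not overlap
assert np.array_equal(np.concatenate([row[0:2], row[2:5]]), row)
```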
You can omit the start or the stop index of a slice, which means "from the beginning" or "to the end", respectively:
In [ ]:
data[0, :2]
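The two shorthand forms are illustrated below on the same kind of hypothetical 1-D array. The names `first_two` and `from_third` are only for illustration.

```python
import numpy as np

row = np.array([10, 20, 30, 40, 50])

# An omitted start means "from the beginning" ...
first_two = row[:2]    # same as row[0:2]
# ... and an omitted stop means "to the end"
from_third = row[2:]   # same as row[2:5] for this 5-element array
print(first_two)   # [10 20]
print(from_third)  # [30 40 50]
```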
If you omit both indices, leaving only the colon (:), you get all columns of the row:
In [ ]:
data[0, :]
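A bare colon works on any axis. Placed in the first position, it selects a whole column, which here means one day for every patient. The sketch uses a hypothetical 3 x 4 table in place of `data`.

```python
import numpy as np

# Hypothetical table: 3 patients (rows) x 4 days (columns)
table = np.arange(12).reshape(3, 4)

whole_row = table[0, :]     # all days for patient 0
whole_column = table[:, 1]  # day 1 for every patient
print(whole_row)     # [0 1 2 3]
print(whole_column)  # [1 5 9]
```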
We can now plot the values in this row as a line plot:
In [ ]:
plt.plot(data[0, :])
It's also possible to select (filter) elements based on a condition. For example, to select all measurements above 10 for the first patient, we can use:
In [ ]:
patient_data = data[0, :]
patient_data[patient_data>10]
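Filtering works in two steps. The comparison first produces a Boolean array (a mask) of the same shape, and indexing with that mask then keeps the elements where it is True. The measurements in `patient` below are hypothetical.

```python
import numpy as np

# Hypothetical measurements for one patient
patient = np.array([3.0, 12.0, 7.0, 15.0, 10.0])

# The comparison produces a Boolean array of the same shape ...
mask = patient > 10
print(mask)  # [False  True False  True False]

# ... and indexing with it keeps only the elements where mask is True
print(patient[mask])  # [12. 15.]
```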
We can also replace the selected measurements with a new value:
In [ ]:
patient_data[patient_data>10] = 10
print(patient_data)
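Because `data[0, :]` is a basic slice, NumPy returns a view on the same memory rather than a copy. The assignment to `patient_data` therefore also changes the first row of `data`. The sketch below shows this on a hypothetical 2 x 3 table and calls `.copy()` when an independent array is wanted.

```python
import numpy as np

# Hypothetical 2 x 3 data table
table = np.array([[5.0, 12.0, 20.0],
                  [1.0, 2.0, 3.0]])

row = table[0, :]      # basic slice: a view, not a copy
row[row > 10] = 10     # writes into memory shared with table
print(table[0])        # the original table changed too

safe_row = table[1, :].copy()  # an explicit copy is independent
safe_row[:] = 0
print(table[1])        # unchanged

print(np.shares_memory(row, table))       # True
print(np.shares_memory(safe_row, table))  # False
```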
Arithmetic operations (addition, subtraction, etc.) are elementwise by default:
In [ ]:
doubledata = data + data
print(doubledata)
Operations with a scalar are applied to every element:
In [ ]:
tripledata = data * 3
print(tripledata)
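To make the elementwise rule concrete, the sketch below compares `a * 3` with an equivalent explicit Python loop on a hypothetical 2 x 2 array. It also shows that `*` is not the matrix product; for 2-D arrays the matrix product is written with the `@` operator.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Elementwise: each output element combines elements at the same position
doubled = a + a

# A scalar is combined with every element
tripled = a * 3

# Equivalent (but much slower for large arrays) explicit Python loop
tripled_loop = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        tripled_loop[i, j] = a[i, j] * 3

print(np.array_equal(tripled, tripled_loop))  # True

# Note: a * a is elementwise; the matrix product uses the @ operator
product = a @ a
```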
Many NumPy functions (the so-called universal functions, such as np.exp) are applied elementwise:
In [ ]:
expdata = np.exp(data)
print(expdata)
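An elementwise function is equivalent to applying the scalar function to each element in turn, and the output keeps the shape of the input. The sketch checks `np.exp` against `math.exp` on a few hypothetical values.

```python
import math
import numpy as np

values = np.array([0.0, 1.0, 2.0])

# np.exp is applied to each element independently
exp_values = np.exp(values)

# Same result as calling math.exp on each element in a loop
expected = np.array([math.exp(v) for v in values])
print(np.allclose(exp_values, expected))  # True
print(exp_values.shape)  # (3,): the shape is preserved
```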
Some functions (such as mean, max, etc.) aggregate the data and return arrays with fewer dimensions, or scalars:
In [ ]:
meandata = np.mean(data)
print(meandata)
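Called without an axis, an aggregating function reduces the whole array to a single scalar with zero dimensions. Here is a sketch on a hypothetical 2 x 3 table.

```python
import numpy as np

# Hypothetical table: 2 patients x 3 days
table = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

overall_mean = np.mean(table)  # mean of all 6 elements: 21 / 6
overall_max = np.max(table)    # largest element
print(overall_mean)  # 3.5
print(overall_max)   # 6.0
print(np.ndim(overall_mean))  # 0: a scalar, no dimensions left
```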
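By default the NumPy mean function averages over all elements of the array and returns a single number. It's also possible to average over a single axis. Averaging over axis 0 (the rows, i.e. the patients) gives the mean inflammation for each day: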
In [ ]:
np.mean(data, axis=0)
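The `axis` argument names the dimension that is collapsed. In the sketch below, which uses a hypothetical 2 x 3 table, `axis=0` leaves one value per column (day) and `axis=1` leaves one value per row (patient).

```python
import numpy as np

# Hypothetical table: 2 patients x 3 days
table = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

per_day = np.mean(table, axis=0)      # collapses rows: one value per column
per_patient = np.mean(table, axis=1)  # collapses columns: one value per row
print(per_day)      # [2.5 3.5 4.5]
print(per_patient)  # [2. 5.]
```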
It's possible to do operations on arrays of different shapes. In some cases NumPy can transform these arrays automatically so that they behave like arrays of the same shape. This conversion is called broadcasting. For example, we can subtract the daily means (a 1-D array with one value per column) from every row of the data:
In [ ]:
data - np.mean(data, axis=0)
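The broadcasting rule works as follows. Shapes are compared starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. A missing leading dimension counts as 1. The sketch applies the rule to a hypothetical 2 x 3 table. Subtracting per-patient means needs an explicit (2, 1) shape, which `keepdims=True` provides.

```python
import numpy as np

table = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])    # shape (2, 3)
day_means = np.mean(table, axis=0)     # shape (3,)

# (2, 3) and (3,) are compatible: day_means is (virtually) repeated
# for every row, so each entry becomes -1.5 in row 0 and 1.5 in row 1
centered = table - day_means

# Every column of the result now has zero mean
print(np.allclose(np.mean(centered, axis=0), 0.0))  # True

# Per-patient means with keepdims=True have shape (2, 1),
# which broadcasts across the columns instead of the rows
patient_means = np.mean(table, axis=1, keepdims=True)
centered_rows = table - patient_means
```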