Welcome to the Jupyter notebook! To run any cell, press Shift+Enter or Ctrl+Enter.
IMPORTANT: please have a look at Help->User Interface Tour and Help->Keyboard Shortcuts in the toolbar above; they will help you get started.
In [1]:
# Useful starting lines
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2
A cell can contain any type of Python input (expressions, function definitions, etc.). Running a cell is equivalent to typing its content into the Python interpreter. The notebook prints the output of the last executed line.
In [2]:
1
Out[2]:
In [3]:
x = [2,3,4]
def my_function(l):
    l.append(12)
In [4]:
my_function(x)
x
Out[4]:
In [5]:
# Matplotlib is used for plotting; plots are directly embedded in the
# notebook thanks to the '%matplotlib inline' command at the beginning
plt.hist(np.random.randn(10000), bins=40)
plt.xlabel('X label')
plt.ylabel('Y label')
Out[5]:
IMPORTANT: the numpy documentation is quite good, and the notebook helps you use it. Use auto-completion with Tab, and use Shift+Tab to get the complete documentation of the current function (for instance when the cursor is between the parentheses of the function call).
For example, say you want to multiply two arrays. np.mul + Tab completes to the only matching function, np.multiply. Then, using Shift+Tab, you learn that np.multiply is the element-wise multiplication and is equivalent to the * operator.
In [6]:
np.multiply
Out[6]:
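For instance, a quick sanity check with two small arrays (chosen arbitrarily for this example) confirms that np.multiply and the * operator agree:
In [ ]:
u = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
np.allclose(np.multiply(u, w), u * w)  # True: both compute the element-wise product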
Creating ndarrays (np.zeros, np.ones) is done by giving the shape as an iterable (list or tuple). An integer is also accepted for a one-dimensional array.
np.eye creates an identity matrix.
You can also create an array by passing an iterable of values to np.array.
(NB: the random functions np.random.rand and np.random.randn are exceptions: they take the dimensions as separate arguments rather than a shape tuple.)
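To make this exception concrete, a small sketch contrasting the two calling conventions:
In [ ]:
np.zeros((2, 3))       # shape passed as a tuple
np.random.randn(2, 3)  # dimensions passed as separate arguments
# np.zeros(2, 3) would raise a TypeError (the second argument is the dtype)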
In [7]:
np.zeros(4)
Out[7]:
In [8]:
np.eye(3)
Out[8]:
In [9]:
np.array([[1,3,4],[2,5,6]])
Out[9]:
In [79]:
np.arange(10) # NB : np.array(range(10)) is a slightly more complicated equivalent
Out[79]:
In [80]:
np.random.randn(3, 4) # normal distributed values
Out[80]:
In [81]:
# 3-D tensor
tensor_3 = np.ones((2, 4, 2))
tensor_3
Out[81]:
An ndarray Python object is just a reference to the data location and its characteristics.
Most NumPy operations on an array can be called either as np.function(a) or as a method a.function() (e.g. np.sum(a) or a.sum()).
It has a shape attribute, a tuple giving the size of each dimension of the ndarray, and a dtype attribute describing the type of its elements (the default type is float64).
WARNING: because of this object structure, assigning an array to a new name only copies the reference, not the data; call copy() if you need an independent copy.
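As a quick illustration of the two equivalent calling styles, applied to the tensor defined above:
In [ ]:
np.sum(tensor_3) == tensor_3.sum()  # same operation, two calling styles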
In [13]:
tensor_3.shape, tensor_3.dtype
Out[13]:
In [14]:
a = np.array([[1.0, 2.0], [5.0, 4.0]])
b = np.array([[4, 3], [2, 1]])
(b.dtype, a.dtype) # each array has a data type (casting rules apply for int -> float)
Out[14]:
In [15]:
np.array(["Mickey", "Mouse"]) # can hold more than just numbers
Out[15]:
In [16]:
a = np.array([[1.0, 2.0], [5.0, 4.0]])
b = a # Copying the reference only
b[0,0] = 3
a
Out[16]:
In [17]:
a = np.array([[1.0, 2.0], [5.0, 4.0]])
b = a.copy() # Deep-copy of the data
b[0,0] = 3
a
Out[17]:
When applying operators to arrays of different shapes, there are very specific broadcasting rules that you will want to understand at some point: http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
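As a tiny illustration of these rules (the arrays below are chosen arbitrarily for the example), a (3, 1) array and a (4,) array broadcast together into a (3, 4) result:
In [ ]:
col = np.arange(3).reshape((3, 1))  # shape (3, 1)
row = np.arange(4)                  # shape (4,)
(col + row).shape                   # broadcasts to (3, 4)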
In [18]:
np.ones((2, 4)) * np.random.randn(2, 4)
Out[18]:
In [19]:
np.eye(3) - np.ones((3,3))
Out[19]:
In [20]:
print(a)
print(a.shape) # Get shape
print(a.shape[0]) # Get size of first dimension
If you are not comfortable with array slicing, please have a look at the 'Indexing and Slicing' section of http://www.python-course.eu/numpy.php
In [21]:
print(a[0]) # Get the first row (slice along the first dimension)
print(a[:, 1]) # Get the second column (slice along the second dimension)
print(a[0, 1]) # Get the element in the first row, second column
In [22]:
a = np.array([[1.0, 2.0], [5.0, 4.0]])
b = np.array([[4, 3], [2, 1]])
v = np.array([0.5, 2.0])
In [23]:
print(a)
print(a.T) # Equivalent: a.transpose(), np.transpose(a)
print(a.ravel())
In [24]:
c = np.random.randn(4,5)
print(c.shape)
print(c[np.newaxis].shape) # Adding a dimension
print(c.T.shape)
print(c.reshape([10,2]).shape)
print(c)
print(c.reshape([10,2]))
In [25]:
a.reshape((-1, 1)) # -1 in a shape means 'whatever size is needed to fit the data'
Out[25]:
In [26]:
np.sum(a), np.sum(a, axis=0), np.sum(a, axis=1) # reduce-operations reduce the whole array if no axis is specified
Out[26]:
In [27]:
np.dot(a, b) # matrix multiplication
Out[27]:
In [28]:
# Other ways of writing matrix multiplication, the '@' operator for matrix multiplication
# was introduced in Python 3.5
np.allclose(a.dot(b), a @ b)
Out[28]:
In [29]:
# For other linear algebra operations, use the np.linalg module
np.linalg.eig(a) # Eigen-decomposition
Out[29]:
In [30]:
print(np.linalg.inv(a)) # Inverse
np.allclose(np.linalg.inv(a) @ a, np.identity(a.shape[1])) # a^-1 * a = Id
Out[30]:
In [31]:
np.linalg.solve(a, v) # solves ax = v
Out[31]:
In [32]:
np.hstack([a, b])
Out[32]:
In [33]:
np.vstack([a, b])
Out[33]:
In [34]:
np.vstack([a, b]) + v # broadcasting
Out[34]:
In [35]:
np.hstack([a, b]) + v # does not work: shapes (2, 4) and (2,) are incompatible
In [36]:
np.hstack([a, b]) + v.T # transposing a 1-D array does nothing, so this still fails
In [37]:
np.hstack([a, b]) + v.reshape((-1, 1)) # reshaping to convert v from a (2,) vector to a (2,1) matrix
Out[37]:
In [38]:
np.hstack([a, b]) + v[:, np.newaxis] # equivalently, we can add an axis
Out[38]:
In [39]:
r = np.random.random_integers(0, 9, size=(3, 4)) # NB: deprecated; np.random.randint(0, 10, size=(3, 4)) is the modern equivalent
In [40]:
r
Out[40]:
In [41]:
r[0], r[1]
Out[41]:
In [42]:
r[0:2]
Out[42]:
In [43]:
r[1][2] # regular python
Out[43]:
In [44]:
r[1, 2] # numpy
Out[44]:
In [45]:
r[:, 1:3]
Out[45]:
In [46]:
r > 5 # Binary element-wise result
Out[46]:
In [47]:
r[r > 5] # Use the binary mask as filter
Out[47]:
In [48]:
r[r > 5] = 999 # Modify the corresponding values with a constant
In [49]:
r
Out[49]:
In [52]:
# Get the indices where the condition is true, gives a tuple whose length
# is the number of dimensions of the input array
np.where(r == 999)
Out[52]:
In [57]:
print(np.where(np.arange(10) < 5)) # Is a 1-tuple
np.where(np.arange(10) < 5)[0] # Accessing the first element gives the indices array
Out[57]:
In [59]:
np.where(r == 999, -10, r+1000) # Ternary condition, if True take element from first array, otherwise from second
Out[59]:
In [62]:
r[(np.array([1,2]), np.array([2,2]))] # Fancy indexing: returns a copy of the elements at positions (1, 2) and (2, 2). NB: tuple of index arrays
Out[62]:
Thanks to all these tools, you should be able to avoid writing almost any for-loop, since loops are extremely costly in Python (even more so than in Matlab, because good JIT engines are yet to come). If you really need a for-loop for array computation (usually not needed, but it happens), have a look at http://numba.pydata.org/ (for advanced users); a minimal sketch follows the timing comparison below.
In [69]:
numbers = np.random.randn(1000, 1000)
In [70]:
%%timeit # Naive version
my_sum = 0
for n in numbers.ravel():
    if n > 0:
        my_sum += n
In [71]:
%timeit np.sum(numbers[numbers > 0]) # Vectorized equivalent of the loop above
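If you do end up needing an explicit loop, numba (mentioned above) can JIT-compile it. A minimal sketch, assuming the numba package is installed (it is not imported elsewhere in this notebook):
In [ ]:
from numba import njit  # assumption: numba is installed

@njit
def sum_positive(values):
    # same naive loop as above, but compiled to machine code on first call
    total = 0.0
    for v in values:
        if v > 0:
            total += v
    return total

%timeit sum_positive(numbers.ravel())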
In [76]:
X = np.random.randn(10000)
In [77]:
%%timeit # Naive version
my_result = np.zeros(len(X))
for i, x in enumerate(X.ravel()):
    my_result[i] = 1 + x + x**2 + x**3 + x**4
In [78]:
%timeit 1 + X + X**2 + X**3 + X**4
SciPy is a collection of libraries more specialized than NumPy; it is the equivalent of toolboxes in Matlab.
Have a look at the collection: http://docs.scipy.org/doc/scipy-0.18.0/reference/
Many traditional functions are implemented there.
In [82]:
X = np.random.randn(1000)
In [85]:
from scipy.fftpack import fft  # NB: in recent SciPy versions, the scipy.fft module is preferred
plt.plot(fft(X).real)
Out[85]:
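Another toolbox example (a minimal sketch, assuming only a standard SciPy installation): scipy.stats provides ready-made probability distributions:
In [ ]:
from scipy import stats
stats.norm.cdf(0.0)  # P(X <= 0) for a standard normal distribution, i.e. 0.5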
In [ ]: