NumPy

Numerical Python

Provides an efficient way to store and manipulate arrays. Numpy is all about VECTORIZATION. Mental model is different than regular python and works with:

• "vectors"
• "arrays"
• "views"
• ndarray provides efficient storage and manipulation of 1d arrays (vectors), (NxM) matrices, higher dimensional datasets

Example Using Object Oriented Techniques

In [ ]:

In [5]:
import random
class RandomWalker(object):
def __init__(self):
self.position = 0

def walk(self, n):
self.position = 0
for i in range(n):
yield self.position
self.position += 2*random.randint(0,1) - 1

Example of Cell Magic Using 3 '%%%'

In [10]:
%%%timeit
walker = RandomWalker()
walk = [position for position in walker.walk(10000)]

10 loops, best of 3: 13.4 ms per loop

Example Using Functional Programming

Remove class definition

In [12]:
def random_walk_f(n):
position = 0
walk = [position]
for i in range(n):
position = 2 * random.randint(0,1) - 1
walk.append(position)
return walk

In [13]:
%%%timeit
walk = random_walk_f(10000)

100 loops, best of 3: 11.4 ms per loop

small improvement in time

Vectorized Approach Like When You Did Things in MATLAB :(

Get rid of the loop

In [14]:
from itertools import accumulate
def random_walker_v(n):
steps = random.sample([1, -1] * n, n)
return list(accumulate(steps))

In [16]:
%%%timeit
walk = random_walker_v(10000)

100 loops, best of 3: 6.18 ms per loop

WOW 2x as fast

Numpy ifying

In [17]:
import numpy as np
def random_walker_np(n):
steps = 2 * np.random.randint(0, 2, size=n) - 1
return np.cumsum(steps)

In [18]:
%%%timeit
walk = random_walker_np(10000)

The slowest run took 2164.96 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 74.1 µs per loop

Getting Started with Basic Numpy Array

Create an array

Clobber the namespace so we dont have to np.

In [1]:
import numpy as np

Create an np array. You can pass any type of python seq: list, tuples, etc

In [25]:
a = np.array([0,1,2,3,4,5])

In [26]:
a

Out[26]:
array([0, 1, 2, 3, 4, 5])

Multidimensional array using list of lists

In [3]:
m = np.array([[1,2,3], [4,5,6]])

In [5]:
m.shape

Out[5]:
(2, 3)

In [ ]:

In [ ]:

In [27]:

Out[27]:
[0, 1, 2, 3, 4, 5]

In [28]:
# what type is a
type(a)

Out[28]:
numpy.ndarray

In [29]:
# what is the numerica type of the elements in the array
a.dtype

Out[29]:
dtype('int64')

In [30]:
# What shape (dimensions) is the array
a.shape

Out[30]:
(6,)

In [33]:
# Bytes per element. 32bit integers should be 4 bytes
a.itemsize

Out[33]:
8

In [34]:
# Total size in bytes of the array
a.nbytes

Out[34]:
48

In [35]:
# Beware of type coercion
# a holds dtypes int32
print(a)
a[0] = 10.38383
print(a)

[0 1 2 3 4 5]
[10  1  2  3  4  5]

In [37]:
x = np.array([0,1,1.5,3])
y = np.array([1,2,3,1])

Operations

In [7]:

In [8]:
# Element wise subtraction

In [ ]:

Do Some Vector Math Not for Loop Math

In [43]:
%%%timeit
dy = y[1:] - y[:-1]

The slowest run took 29.78 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 813 ns per loop

%%capture captures the result of the operation into a var

In [40]:
%%capture timeit_result
%timeit python_list1 = range(1,1000)
%timeit python_list2 = np.arange(1,1000)

In [41]:
print(timeit_result)

The slowest run took 9.31 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 196 ns per loop
The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.4 us per loop

Statistical Analysis

In [19]:
data_set = random.random((2,3))
print(data_set)

[[ 0.93613618  0.33032079  0.19598773]
[ 0.36707494  0.7528012   0.96362384]]

In [ ]:

In [18]:
# example of namespace....cant access np.max and builtin max is being used

In [ ]:

In [17]:
max(data_set[0])

Out[17]:
0.95525652078806722