NumPy

Numerical Python

NumPy provides an efficient way to store and manipulate arrays. It is all about VECTORIZATION. The mental model differs from regular Python and works with:

  • "vectors"
  • "arrays"
  • "views"
  • "ufuncs" (advanced)
  • "ndarray": efficient storage and manipulation of 1-D arrays (vectors), N×M matrices, and higher-dimensional datasets
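
A minimal sketch of these terms, with illustrative names that are not from the original notebook: slicing an ndarray gives a view onto the same memory, and a ufunc such as np.sqrt works element-wise without a Python loop.

```python
import numpy as np

v = np.arange(6)          # 1-D array (a "vector") holding 0..5
view = v[::2]             # slicing returns a view, not a copy
view[0] = 100             # writing through the view also changes v[0]

# ufunc: applied to every element, no explicit loop
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

print(v[0])               # 100
print(roots.tolist())     # [1.0, 2.0, 3.0]
```

Because view shares memory with v, take an explicit v.copy() when the original must stay untouched.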

Example Using Object-Oriented Techniques



In [5]:
import random
class RandomWalker(object):
    def __init__(self):
        self.position = 0

    def walk(self, n):
        self.position = 0
        for i in range(n):
            yield self.position
            self.position += 2*random.randint(0,1) - 1

Example of Cell Magic Using 2 '%%'


In [10]:
%%timeit
walker = RandomWalker()
walk = [position for position in walker.walk(10000)]


10 loops, best of 3: 13.4 ms per loop

A line magic uses a single '%' and applies only to the rest of that line.

Example Using Functional Programming

Remove the class definition and use a plain function.


In [12]:
def random_walk_f(n):
    position = 0
    walk = [position]
    for i in range(n):
        position += 2 * random.randint(0,1) - 1  # accumulate the step; plain '=' would discard the walk so far
        walk.append(position)
    return walk

In [13]:
%%timeit
walk = random_walk_f(10000)


100 loops, best of 3: 11.4 ms per loop

A small improvement in time (13.4 ms down to 11.4 ms).

Vectorized Approach Like When You Did Things in MATLAB :(

Get rid of the loop


In [14]:
from itertools import accumulate
def random_walker_v(n):
    steps = random.sample([1, -1] * n, n)
    return list(accumulate(steps))

In [16]:
%%timeit
walk = random_walker_v(10000)


100 loops, best of 3: 6.18 ms per loop

WOW, roughly 2x as fast
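
One caveat: random.sample draws without replacement from n copies each of 1 and -1, so the steps are not fully independent. A hedged alternative sketch, assuming Python 3.6+ for random.choices; the function name random_walk_choices is made up for illustration.

```python
import random
from itertools import accumulate

def random_walk_choices(n):
    # Hypothetical variant: each step is drawn independently from {-1, +1}
    steps = random.choices([-1, 1], k=n)
    # accumulate yields the running sum, i.e. the position after each step
    return list(accumulate(steps))

walk = random_walk_choices(10)
print(walk)
```

Timings for this variant were not measured in the original notebook.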

NumPy-ifying


In [17]:
import numpy as np
def random_walker_np(n):
    steps = 2 * np.random.randint(0, 2, size=n) - 1
    return np.cumsum(steps)

In [18]:
%%timeit
walk = random_walker_np(10000)


The slowest run took 2164.96 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 74.1 µs per loop
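
Outside IPython the %%timeit magic is not available. A rough sketch of the same measurement with the standard-library timeit module; the repeat count of 100 is an arbitrary choice, and timings will differ by machine.

```python
import timeit
import numpy as np

def random_walker_np(n):
    # n steps of +1 or -1, then the running sum gives the positions
    steps = 2 * np.random.randint(0, 2, size=n) - 1
    return np.cumsum(steps)

# Time 100 calls and report the average per call in microseconds
elapsed = timeit.timeit(lambda: random_walker_np(10000), number=100)
print(f"{elapsed / 100 * 1e6:.1f} us per call")
```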

Getting Started with Basic NumPy Arrays

Create an array

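Import NumPy under the conventional alias np. A star import (from numpy import *) would avoid typing the np. prefix, but it clobbers the namespace.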


In [1]:
import numpy as np

Create a NumPy array. You can pass any Python sequence: lists, tuples, etc.


In [25]:
a = np.array([0,1,2,3,4,5])

In [26]:
a


Out[26]:
array([0, 1, 2, 3, 4, 5])
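
A small sketch of passing other sequence types, with illustrative variable names. Note that mixing ints and floats upcasts the whole array to a float dtype.

```python
import numpy as np

from_tuple = np.array((0, 1, 2))   # tuple -> integer array
from_range = np.array(range(3))    # range -> integer array
mixed = np.array([1, 2.5, 3])      # int/float mix is upcast to float64

print(from_tuple.dtype, mixed.dtype)
```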

Multidimensional array using list of lists


In [3]:
m = np.array([[1,2,3], [4,5,6]])

In [5]:
m.shape


Out[5]:
(2, 3)
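
Alongside shape, a few other attributes and indexing forms are useful on a 2-D array. A minimal sketch using the same m:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.ndim)            # number of dimensions: 2
print(m.size)            # total number of elements: 6
print(m[1, 2])           # row 1, column 2 (zero-based): 6
print(m[:, 0].tolist())  # first column: [1, 4]
```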

In [27]:
# a.data is a memoryview over the array's underlying buffer
ad = a.data
list(ad)


Out[27]:
[0, 1, 2, 3, 4, 5]

In [28]:
# what type is a
type(a)


Out[28]:
numpy.ndarray

In [29]:
# what is the numerical type of the elements in the array
a.dtype


Out[29]:
dtype('int64')

In [30]:
# What shape (dimensions) is the array
a.shape


Out[30]:
(6,)

In [33]:
# Bytes per element. int64 (64-bit) integers take 8 bytes
a.itemsize


Out[33]:
8

In [34]:
# Total size in bytes of the array
a.nbytes


Out[34]:
48
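
The two numbers are related: nbytes is size times itemsize. A sketch that pins the dtype explicitly (an assumption made here so the result does not depend on the platform default) and shows a narrower dtype halving the storage:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5], dtype=np.int64)  # dtype fixed for the example
assert a.nbytes == a.size * a.itemsize             # 6 elements * 8 bytes = 48

a32 = a.astype(np.int32)                           # same values, 4 bytes each
print(a32.itemsize, a32.nbytes)                    # 4 24
```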

In [35]:
# Beware of type coercion
# a has dtype int64, so an assigned float is truncated to an integer
print(a)
a[0] = 10.38383
print(a)


[0 1 2 3 4 5]
[10  1  2  3  4  5]
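
To keep the fractional part, the array needs a float dtype. A minimal sketch of two ways to get one; the names b and c are illustrative.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
a[0] = 10.38383                # fractional part is dropped: stored as 10

b = a.astype(float)            # astype returns a new float64 array
b[0] = 10.38383                # now the full value is kept

c = np.array([0, 1, 2], dtype=float)  # or request a float dtype up front
print(a[0], b[0], c.dtype)
```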

In [37]:
x = np.array([0,1,1.5,3])
y = np.array([1,2,3,1])

Reshape and Resize
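
The original notebook leaves this heading empty. As a hedged sketch of the difference: reshape keeps the same elements under a new shape (typically as a view), while np.resize returns a new array and repeats the data when the target is larger.

```python
import numpy as np

a = np.arange(6)

r = a.reshape(2, 3)             # same data, new shape; a view of a here
r[0, 0] = 99                    # so this also changes a[0]

bigger = np.resize(a, (2, 4))   # new array; a is repeated to fill 8 slots

print(a[0])                     # 99
print(bigger.tolist())          # [[99, 1, 2, 3], [4, 5, 99, 1]]
```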

Operations


In [7]:
# Element wise addition

In [8]:
# Element wise subtraction
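
The two cells above were left empty in the original. A sketch of what they might contain, using the x and y arrays defined earlier; the names total and diff are illustrative.

```python
import numpy as np

x = np.array([0, 1, 1.5, 3])
y = np.array([1, 2, 3, 1])

total = x + y    # element-wise addition
diff = x - y     # element-wise subtraction

print(total.tolist())   # [1.0, 3.0, 4.5, 4.0]
print(diff.tolist())    # [-1.0, -1.0, -1.5, 2.0]
```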

Do Some Vector Math, Not for-Loop Math


In [43]:
%%timeit
dy = y[1:] - y[:-1]


The slowest run took 29.78 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 813 ns per loop
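
For comparison, a sketch of the same difference computed three ways: with a Python loop, with the slicing expression timed above, and with np.diff, which does the same thing in one call.

```python
import numpy as np

y = np.array([1, 2, 3, 1])

dy_loop = [y[i + 1] - y[i] for i in range(len(y) - 1)]  # for-loop math
dy_vec = y[1:] - y[:-1]                                   # vector math
dy_diff = np.diff(y)                                      # built-in equivalent

print(dy_vec.tolist())   # [1, 1, -2]
```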

%%capture captures the cell's output (stdout/stderr) into a variable


In [40]:
%%capture timeit_result
%timeit python_list1 = range(1,1000)
%timeit python_list2 = np.arange(1,1000)

In [41]:
print(timeit_result)


The slowest run took 9.31 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 196 ns per loop
The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.4 us per loop
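
In Python 3, range(1, 1000) only builds a lazy range object, so the first timing does not measure creating 999 values. A fairer comparison, sketched with the standard-library timeit module; the repeat count is arbitrary and timings depend on the machine.

```python
import timeit
import numpy as np

n_runs = 10000
# Materialize the range into a list so both sides actually build 999 values
t_list = timeit.timeit(lambda: list(range(1, 1000)), number=n_runs)
t_arange = timeit.timeit(lambda: np.arange(1, 1000), number=n_runs)

print(f"list(range): {t_list / n_runs * 1e6:.2f} us per call")
print(f"np.arange:   {t_arange / n_runs * 1e6:.2f} us per call")
```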

Statistical Analysis


In [19]:
data_set = np.random.random((2,3))  # NumPy's random module; the stdlib random.random() takes no shape argument
print(data_set)


[[ 0.93613618  0.33032079  0.19598773]
 [ 0.36707494  0.7528012   0.96362384]]

In [18]:
# Namespace example: a bare max here is Python's builtin max, not np.max

In [17]:
max(data_set[0])


Out[17]:
0.95525652078806722
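
The builtin max works on a single row, but the array's own methods handle the whole data set and take an axis argument. A sketch using the values printed for data_set above, hard-coded here so the example is reproducible:

```python
import numpy as np

data_set = np.array([[0.93613618, 0.33032079, 0.19598773],
                     [0.36707494, 0.7528012, 0.96362384]])

print(data_set.max())           # largest value overall
print(data_set.max(axis=0))     # maximum of each column
print(data_set.max(axis=1))     # maximum of each row
print(data_set.mean(), data_set.std())  # mean and standard deviation of all values
```

For a single row, max(data_set[0]) and data_set[0].max() give the same value.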