# NumPy

## Numerical Python

Provides an efficient way to store and manipulate arrays. Numpy is all about VECTORIZATION. Mental model is different than regular python and works with:

• "vectors"
• "arrays"
• "views"
• ndarray provides efficient storage and manipulation of 1d arrays (vectors), (NxM) matrices, higher dimensional datasets

### Example Using Object Oriented Techniques

``````

In [ ]:

``````
``````

In :

import random
class RandomWalker(object):
def __init__(self):
self.position = 0

def walk(self, n):
self.position = 0
for i in range(n):
yield self.position
self.position += 2*random.randint(0,1) - 1

``````

### Example of Cell Magic Using 3 '%%%'

``````

In :

%%%timeit
walker = RandomWalker()
walk = [position for position in walker.walk(10000)]

``````
``````

10 loops, best of 3: 13.4 ms per loop

``````

### Example Using Functional Programming

Remove class definition

``````

In :

def random_walk_f(n):
position = 0
walk = [position]
for i in range(n):
position = 2 * random.randint(0,1) - 1
walk.append(position)
return walk

``````
``````

In :

%%%timeit
walk = random_walk_f(10000)

``````
``````

100 loops, best of 3: 11.4 ms per loop

``````

small improvement in time

### Vectorized Approach Like When You Did Things in MATLAB :(

Get rid of the loop

``````

In :

from itertools import accumulate
def random_walker_v(n):
steps = random.sample([1, -1] * n, n)
return list(accumulate(steps))

``````
``````

In :

%%%timeit
walk = random_walker_v(10000)

``````
``````

100 loops, best of 3: 6.18 ms per loop

``````

WOW 2x as fast

### Numpy ifying

``````

In :

import numpy as np
def random_walker_np(n):
steps = 2 * np.random.randint(0, 2, size=n) - 1
return np.cumsum(steps)

``````
``````

In :

%%%timeit
walk = random_walker_np(10000)

``````
``````

The slowest run took 2164.96 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 74.1 µs per loop

``````

# Getting Started with Basic Numpy Array

## Create an array

Clobber the namespace so we dont have to np.

``````

In :

import numpy as np

``````

Create an np array. You can pass any type of python seq: list, tuples, etc

``````

In :

a = np.array([0,1,2,3,4,5])

``````
``````

In :

a

``````
``````

Out:

array([0, 1, 2, 3, 4, 5])

``````

### Multidimensional array using list of lists

``````

In :

m = np.array([[1,2,3], [4,5,6]])

``````
``````

In :

m.shape

``````
``````

Out:

(2, 3)

``````
``````

In [ ]:

``````
``````

In [ ]:

``````
``````

In :

``````
``````

Out:

[0, 1, 2, 3, 4, 5]

``````
``````

In :

# what type is a
type(a)

``````
``````

Out:

numpy.ndarray

``````
``````

In :

# what is the numerica type of the elements in the array
a.dtype

``````
``````

Out:

dtype('int64')

``````
``````

In :

# What shape (dimensions) is the array
a.shape

``````
``````

Out:

(6,)

``````
``````

In :

# Bytes per element. 32bit integers should be 4 bytes
a.itemsize

``````
``````

Out:

8

``````
``````

In :

# Total size in bytes of the array
a.nbytes

``````
``````

Out:

48

``````
``````

In :

# Beware of type coercion
# a holds dtypes int32
print(a)
a = 10.38383
print(a)

``````
``````

[0 1 2 3 4 5]
[10  1  2  3  4  5]

``````
``````

In :

x = np.array([0,1,1.5,3])
y = np.array([1,2,3,1])

``````

## Operations

``````

In :

``````
``````

In :

# Element wise subtraction

``````
``````

In [ ]:

``````

### Do Some Vector Math Not for Loop Math

``````

In :

%%%timeit
dy = y[1:] - y[:-1]

``````
``````

The slowest run took 29.78 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 813 ns per loop

``````

%%capture captures the result of the operation into a var

``````

In :

%%capture timeit_result
%timeit python_list1 = range(1,1000)
%timeit python_list2 = np.arange(1,1000)

``````
``````

In :

print(timeit_result)

``````
``````

The slowest run took 9.31 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 196 ns per loop
The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.4 us per loop

``````

## Statistical Analysis

``````

In :

data_set = random.random((2,3))
print(data_set)

``````
``````

[[ 0.93613618  0.33032079  0.19598773]
[ 0.36707494  0.7528012   0.96362384]]

``````
``````

In [ ]:

``````
``````

In :

# example of namespace....cant access np.max and builtin max is being used

``````
``````

In [ ]:

``````
``````

In :

max(data_set)

``````
``````

Out:

0.95525652078806722

``````