Why NumPy?

Python can be slow:


In [ ]:
from __future__ import division, print_function
import itertools

In [ ]:
lst1 = range(1000000)
lst2 = lst1[::-1]  #Reverse the list

In [ ]:
%timeit [l1 + l2 for l1, l2 in itertools.izip(lst1, lst2)]

In [ ]:
#Cycles per addition (approx)
((109e-3)*(2.4e9))/1000000

Why is Python so slow?


In [ ]:
from IPython.core.display import Image
Image(filename="python_memory_model.png")

Can we do any better?


In [ ]:
import numpy as np

In [ ]:
arr1 = np.array(lst1)
arr2 = np.array(lst2)

In [ ]:
%timeit arr1+arr2

In [ ]:
((1.97e-3)*(2.4e9))/1000000

In [ ]:
Image(filename="numpy_memory_model.png")

But, there are some tradeoffs to this.


In [ ]:
a = np.array([(2<<30)-1],dtype=np.int32)

In [ ]:
a

In [ ]:
a+1

Uh-oh.


In [ ]:
a = np.array([-.5])

In [ ]:
a

In [ ]:
a/0

In [ ]:
a**.5

Performance is not a free lunch either


In [ ]:
temperatures_f = np.array([i for i in xrange(32,212)])

In [ ]:
temperatures_c = (temperatures_f -32)*5/9.0

Each of the above three arithmetic operations creates a temporary value

Numpy offers a number of convenient ways to create arrays


In [ ]:
a = np.arange(0, 20, 2, dtype=None)

In [ ]:
np.empty((4,5), dtype=float, order=None)

In [ ]:
np.zeros((4,5), dtype=float, order=None)

In [ ]:
np.ones((4,5), dtype=float, order=None)

In [ ]:
np.asarray([[i for i in xrange(20)], [j for j in xrange(10)]], dtype=None)
#These will fail as "collection" and iterable is not defined
#np.array(collection, dtype=None, copy=True, order=None)
#np.fromiter(iterable, dtype, count=-1)

In [ ]:
a = np.arange(12)
a

In [ ]:
a = a.reshape(3,4)
a

In [ ]:
(a*10).reshape(2,6)

In [ ]:
a

In [ ]:
a * [2,4,6,8] #The 4-vector will be broadcast

In [ ]:
a * [2,3,4] #This will cause an error, as a 3-vector cannot be broadcast (the dimensions do not match)

We can get views of the data by indexing


In [ ]:
a = np.arange(12)
a

In [ ]:
b = a[::2]
b

In [ ]:
b[2] = -1
b

In [ ]:
a

In [ ]:
b.flags['OWNDATA']

In [ ]:
b.base is a

We can index by a list of ints, and get an array of those items


In [ ]:
a = np.arange(10)*10
a

In [ ]:
b = a[[4,3,-2]]
b

In [ ]:
b.flags["OWNDATA"] #Note that this gives us a copy of the data

We can index by a list of boolean values as well


In [ ]:
a = a.reshape((5,2))

In [ ]:
a[(a%3)==0].shape

In [ ]:
b = ((a%3)==0)

In [ ]:
a[b]

Let's dive a bit deeper into the memory layout


In [ ]:
a = np.arange(3000000)
a.shape = (5,3,200000)
a

In [ ]:
b = a.swapaxes(1,2) #Swap the last two axes
print(b.shape)
print(a.shape)
print(b.flags["OWNDATA"])

When changing the shape, it helps to remember how arrays are laid out in memory


In [ ]:
a.shape = (5,600000)

In [ ]:
a.shape = (5,3,200000) #Reset the shape of a, before we reshape a different way

In [ ]:
a.shape = (1000000,3)

In [ ]:
print(b.shape) 
b.shape = (1000000,3)

If we take a look at the flags of the arrays, we can see why this error message happened


In [ ]:
a.flags

In [ ]:
b.flags

We can also do some fun graphing with matplotlib


In [ ]:
import random
%pylab inline --no-import-all

In [ ]:
pylab.ion()
pylab.figure()
pylab.plot([random.gauss(10, 3) for i in xrange(30)], 'g')
pylab.ioff()

SymPy lets us do symbolic manipulation


In [ ]:
from sympy import symbols, limit, log, integrate, Integral, sqrt

In [ ]:
x, y = symbols('x y')

In [ ]:
limit (x*log (x),x,0)

In [ ]:
limit (x*log (x),x,20)

In [ ]:
integrate(x/(x**2+2*x+1), x)

In [ ]:
from sympy import latex, init_printing

In [ ]:
integrate(x/(x**2+2*x+1), x)

That works, but it's a bit ugly. Can we do any better?


In [ ]:
init_printing()

In [ ]:
integrate(x/(x**2+2*x+1), x)

In [ ]:
Integral(sqrt(1/x), x)

We can solve equations symbolically


In [ ]:
from sympy import solve, Eq
solve(Eq(x**2, 1), x)

DiffEqs? No sweat!


In [ ]:
from sympy import dsolve, Function, sin
f, g = symbols('f g', cls=Function)
diffeq = Eq(f(x).diff(x, x) - 2*f(x).diff(x) + f(x), sin(x))
diffeq

In [ ]:
dsolve(diffeq, f(x))

In [ ]:
from sympy import Matrix
M = Matrix(( [1, 2, 1], [6, -1, 0], [-1, -2, -1] ))
M.eigenvals()

In [ ]:
M.eigenvects()

In [ ]:
M = Matrix(( [1, 2, 3], [3, 6, 2], [2, 0, 1] ))
M.eigenvals()