Why NumPy?

Python can be slow:



In [ ]:

    
from __future__ import division, print_function
import itertools



In [ ]:

    
lst1 = range(1000000)
lst2 = lst1[::-1]  #Reverse the list



In [ ]:

    
%timeit [l1 + l2 for l1, l2 in itertools.izip(lst1, lst2)]



In [ ]:

    
#Cycles per addition (approx)
((109e-3)*(2.4e9))/1000000

Why is Python so slow?



In [ ]:

    
from IPython.core.display import Image
Image(filename="python_memory_model.png")

Can we do any better?



In [ ]:

    
import numpy as np



In [ ]:

    
arr1 = np.array(lst1)
arr2 = np.array(lst2)



In [ ]:

    
%timeit arr1+arr2



In [ ]:

    
((1.97e-3)*(2.4e9))/1000000



In [ ]:

    
Image(filename="numpy_memory_model.png")

But, there are some tradeoffs to this.



In [ ]:

    
a = np.array([(2<<30)-1],dtype=np.int32)



In [ ]:

    
a



In [ ]:

    
a+1

Uh-oh.



In [ ]:

    
a = np.array([-.5])



In [ ]:

    
a



In [ ]:

    
a/0



In [ ]:

    
a**.5

Performance is not a free lunch either



In [ ]:

    
temperatures_f = np.array([i for i in xrange(32,212)])



In [ ]:

    
temperatures_c = (temperatures_f -32)*5/9.0

Each of the above three arithmetic operations creates a temporary value

Numpy offers a number of convenient ways to create arrays



In [ ]:

    
a = np.arange(0, 20, 2, dtype=None)



In [ ]:

    
np.empty((4,5), dtype=float, order=None)



In [ ]:

    
np.zeros((4,5), dtype=float, order=None)



In [ ]:

    
np.ones((4,5), dtype=float, order=None)



In [ ]:

    
np.asarray([[i for i in xrange(20)], [j for j in xrange(10)]], dtype=None)
#These will fail as "collection" and iterable is not defined
#np.array(collection, dtype=None, copy=True, order=None)
#np.fromiter(iterable, dtype, count=-1)



In [ ]:

    
a = np.arange(12)
a



In [ ]:

    
a = a.reshape(3,4)
a



In [ ]:

    
(a*10).reshape(2,6)



In [ ]:

    
a



In [ ]:

    
a * [2,4,6,8] #The 4-vector will be broadcast



In [ ]:

    
a * [2,3,4] #This will cause an error, as a 3-vector cannot be broadcast (the dimensions do not match)

We can get views of the data by indexing



In [ ]:

    
a = np.arange(12)
a



In [ ]:

    
b = a[::2]
b



In [ ]:

    
b[2] = -1
b



In [ ]:

    
a



In [ ]:

    
b.flags['OWNDATA']



In [ ]:

    
b.base is a

We can index by a list of ints, and get an array of those items



In [ ]:

    
a = np.arange(10)*10
a



In [ ]:

    
b = a[[4,3,-2]]
b



In [ ]:

    
b.flags["OWNDATA"] #Note that this gives us a copy of the data

We can index by a list of boolean values as well



In [ ]:

    
a = a.reshape((5,2))



In [ ]:

    
a[(a%3)==0].shape



In [ ]:

    
b = ((a%3)==0)



In [ ]:

    
a[b]

Let's dive a bit deeper into the memory layout



In [ ]:

    
a = np.arange(3000000)
a.shape = (5,3,200000)
a



In [ ]:

    
b = a.swapaxes(1,2) #Swap the last two axes
print(b.shape)
print(a.shape)
print(b.flags["OWNDATA"])

When changing the shape, it helps to remember how arrays are laid out in memory



In [ ]:

    
a.shape = (5,600000)



In [ ]:

    
a.shape = (5,3,200000) #Reset the shape of a, before we reshape a different way



In [ ]:

    
a.shape = (1000000,3)



In [ ]:

    
print(b.shape) 
b.shape = (1000000,3)

If we take a look at the flags of the arrays, we can see why this error message happened



In [ ]:

    
a.flags



In [ ]:

    
b.flags

We can also do some fun graphing with matplotlib



In [ ]:

    
import random
%pylab inline --no-import-all



In [ ]:

    
pylab.ion()
pylab.figure()
pylab.plot([random.gauss(10, 3) for i in xrange(30)], 'g')
pylab.ioff()

SymPy lets us do symbolic manipulation



In [ ]:

    
from sympy import symbols, limit, log, integrate, Integral, sqrt



In [ ]:

    
x, y = symbols('x y')



In [ ]:

    
limit (x*log (x),x,0)



In [ ]:

    
limit (x*log (x),x,20)



In [ ]:

    
integrate(x/(x**2+2*x+1), x)



In [ ]:

    
from sympy import latex, init_printing



In [ ]:

    
integrate(x/(x**2+2*x+1), x)

That works, but it's a bit ugly. Can we do any better?



In [ ]:

    
init_printing()



In [ ]:

    
integrate(x/(x**2+2*x+1), x)



In [ ]:

    
Integral(sqrt(1/x), x)

We can solve equations symbolically



In [ ]:

    
from sympy import solve, Eq
solve(Eq(x**2, 1), x)

DiffEqs? No sweat!



In [ ]:

    
from sympy import dsolve, Function, sin
f, g = symbols('f g', cls=Function)
diffeq = Eq(f(x).diff(x, x) - 2*f(x).diff(x) + f(x), sin(x))
diffeq



In [ ]:

    
dsolve(diffeq, f(x))



In [ ]:

    
from sympy import Matrix
M = Matrix(( [1, 2, 1], [6, -1, 0], [-1, -2, -1] ))
M.eigenvals()



In [ ]:

    
M.eigenvects()



In [ ]:

    
M = Matrix(( [1, 2, 3], [3, 6, 2], [2, 0, 1] ))
M.eigenvals()