NumPy and SciPy examples

Basic NumPy

NumPy defines a basic data type called an array (actually a numpy.ndarray)



In [3]:

    
import numpy as np
a = np.zeros(3)  # Create an array of zeros
print a









    



[ 0.  0.  0.]



In [4]:

    
type(a)









    Out[4]:





numpy.ndarray

Note that array data must be homogeneous

The most important data types are:

float64: 64 bit floating point number
float32: 32 bit floating point number
int64: 64 bit integer
int32: 32 bit integer
bool: 8 bit True or False

There are also dtypes to represent complex numbers, unsigned integers, etc

On most machines, the default dtype for arrays is float64



In [5]:

    
a = np.zeros(3)
type(a[0])









    Out[5]:





numpy.float64

When we create an array such as



In [6]:

    
z = np.zeros(10)

z is a "flat" array with no dimension--- neither row nor column vecto



In [7]:

    
z.shape









    Out[7]:





(10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma)

To give it dimension, we can change the shape attribute

For example, let's make it a column vector



In [8]:

    
z.shape = (10, 1)



In [9]:

    
z









    Out[9]:





array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])



In [10]:

    
z = np.zeros(4)
z.shape = (2, 2)
z









    Out[10]:





array([[ 0.,  0.],
       [ 0.,  0.]])

Creating arrays

Creating empty arrays --- initializing memory:



In [11]:

    
z = np.empty(3)
z









    Out[11]:





array([  6.92366233e-310,   6.92366233e-310,   0.00000000e+000])

These are just garbage numbers --- whatever was in those memory slots

Here's how to make a regular gird sequence



In [12]:

    
z = np.linspace(2, 4, 5)  # From 2 to 4, with 5 elements
z









    Out[12]:





array([ 2. ,  2.5,  3. ,  3.5,  4. ])



In [13]:

    
z = np.identity(2)
z









    Out[13]:





array([[ 1.,  0.],
       [ 0.,  1.]])

Arrays can be made from lists or tuples



In [14]:

    
z = np.array([10, 20]) 
z









    Out[14]:





array([10, 20])



In [15]:

    
z = np.array((10, 20), dtype=float) 
z









    Out[15]:





array([ 10.,  20.])



In [16]:

    
z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
z









    Out[16]:





array([[1, 2],
       [3, 4]])

Array indexing



In [17]:

    
z = np.linspace(1, 2, 5)
z









    Out[17]:





array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])



In [18]:

    
z[0]  # First element









    Out[18]:





1.0



In [19]:

    
z[-1]  # Special syntax for last element









    Out[19]:





2.0



In [20]:

    
z[0:2]  # Meaning: Two elements, starting from element 0









    Out[20]:





array([ 1.  ,  1.25])



In [21]:

    
z = np.array([[1, 2], [3, 4]])
z









    Out[21]:





array([[1, 2],
       [3, 4]])



In [22]:

    
z[0, 0]









    Out[22]:





1



In [25]:

    
z[0,:]  # First row









    Out[25]:





array([1, 2])



In [23]:

    
z[:,0]  # First column









    Out[23]:





array([1, 3])



In [26]:

    
z = np.linspace(2, 4, 5)
z









    Out[26]:





array([ 2. ,  2.5,  3. ,  3.5,  4. ])



In [27]:

    
d = np.array([0, 1, 1, 0, 0], dtype=bool)
d









    Out[27]:





array([False,  True,  True, False, False], dtype=bool)



In [28]:

    
z[d]









    Out[28]:





array([ 2.5,  3. ])

Array methods



In [30]:

    
A = np.array((4, 3, 2, 1))
A









    Out[30]:





array([4, 3, 2, 1])



In [31]:

    
A.sort()



In [32]:

    
A









    Out[32]:





array([1, 2, 3, 4])



In [33]:

    
A.mean(), A.sum(), A.max(), A.cumsum(), A.var()









    Out[33]:





(2.5, 10, 4, array([ 1,  3,  6, 10]), 1.25)



In [36]:

    
A.shape = (2, 2)
A









    Out[36]:





array([[1, 2],
       [3, 4]])



In [37]:

    
A.T  # Transpose, equivalent to A.transpose()









    Out[37]:





array([[1, 3],
       [2, 4]])

Operations on arrays



In [38]:

    
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])



In [39]:

    
a + b









    Out[39]:





array([ 6,  8, 10, 12])



In [40]:

    
a - b









    Out[40]:





array([-4, -4, -4, -4])



In [41]:

    
a + 10









    Out[41]:





array([11, 12, 13, 14])



In [42]:

    
a.shape = 2, 2
b.shape = 2, 2



In [43]:

    
a









    Out[43]:





array([[1, 2],
       [3, 4]])



In [44]:

    
b









    Out[44]:





array([[5, 6],
       [7, 8]])



In [45]:

    
a * b # Pointwise multiplication!!









    Out[45]:





array([[ 5, 12],
       [21, 32]])



In [46]:

    
np.dot(a, b) # Matrix multiplication









    Out[46]:





array([[19, 22],
       [43, 50]])

Note that in this is slated to change to a @ b in future versions of NumPy

Comparisons



In [48]:

    
z = np.array([2, 3])
y = np.array([2, 3])
z == y









    Out[48]:





array([ True,  True], dtype=bool)



In [49]:

    
y[0] = 3
z == y









    Out[49]:





array([False,  True], dtype=bool)



In [50]:

    
z = np.linspace(0, 10, 5)
z









    Out[50]:





array([  0. ,   2.5,   5. ,   7.5,  10. ])



In [51]:

    
z > 3









    Out[51]:





array([False, False,  True,  True,  True], dtype=bool)



In [52]:

    
z[z > 3]  # Conditional extraction









    Out[52]:





array([  5. ,   7.5,  10. ])

SciPy

Let's just cover some simple examples --- references for further reading are below

Statistics and distributions

Display figures in browser:



In [31]:

    
%matplotlib inline

Let's use scipy.stats to generate some data from the Beta distribution



In [32]:

    
from scipy.stats import beta
q = beta(5, 5)      # Beta(a, b), with a = b = 5
obs = q.rvs(2000)   # 2000 observations

Now let's histogram it and compare it to the original density



In [33]:

    
from pylab import hist, plot, show
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()

Other methods



In [34]:

    
type(q)









    Out[34]:





scipy.stats._distn_infrastructure.rv_frozen



In [36]:

    
dir(q)  # Let's see all its methods









    Out[36]:





['__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'args',
 'cdf',
 'dist',
 'entropy',
 'interval',
 'isf',
 'kwds',
 'logcdf',
 'logpdf',
 'logpmf',
 'logsf',
 'mean',
 'median',
 'moment',
 'pdf',
 'pmf',
 'ppf',
 'rvs',
 'sf',
 'stats',
 'std',
 'var']



In [54]:

    
q.cdf(0.5)









    Out[54]:





0.50000000000000011



In [55]:

    
q.pdf(0.5)









    Out[55]:





2.4609375000000009



In [56]:

    
q.mean()









    Out[56]:





0.5

Basic linear regression:



In [43]:

    
from scipy.stats import linregress
n = 200
alpha, beta, sigma = 1, 2, 0.1
x = np.random.randn(n)  # n standard normals
y = alpha + beta * x + sigma * np.random.randn(n)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
print "gradient =", gradient
print "intercept =", intercept









    



gradient = 2.01866616159
intercept = 1.00087870427

Roots and fixed points

Let's choose an arbitrary function to work with



In [44]:

    
def f(x):
     return np.sin(4 * (x - 0.25)) + x + x**20 - 1
x = np.linspace(0, 1, 100)
plot(x, f(x))
plot(x, 0 * x)









    Out[44]:





[<matplotlib.lines.Line2D at 0x21e9750>]



In [45]:

    
from scipy.optimize import bisect  # Bisection algorithm --- slow but robust
bisect(f, 0, 1)









    Out[45]:





0.4082935042797544



In [46]:

    
from scipy.optimize import newton  #  Newton's method --- fast but less robust
newton(f, 0.2)   # Start the search at initial condition x = 0.2









    Out[46]:





0.40829350427935679



In [47]:

    
newton(f, 0.7)   # Start the search at x = 0.7 instead









    Out[47]:





0.70017000000002816

Here we see that the algorithm gets it wrong --- newton is fast but not robust

Let's try a hybrid method



In [65]:

    
from scipy.optimize import brentq
brentq(f, 0, 1) # Hybrid method









    Out[65]:





0.40829350427936706



In [66]:

    
timeit bisect(f, 0, 1)









    



10000 loops, best of 3: 160 µs per loop



In [67]:

    
timeit newton(f, 0.2)









    



10000 loops, best of 3: 38.9 µs per loop



In [68]:

    
timeit brentq(f, 0, 1)









    



10000 loops, best of 3: 42.5 µs per loop

Note that the hybrid method is robust but still quite fast...

Numerical optimization and integration



In [69]:

    
from scipy.optimize import fminbound
fminbound(lambda x: x**2, -1, 2)  # Search in [-1, 2]









    Out[69]:





0.0



In [70]:

    
from scipy.integrate import quad
integral, error = quad(lambda x: x**2, 0, 1)
integral









    Out[70]:





0.33333333333333337

More information

linear algebra: http://docs.scipy.org/doc/scipy/reference/linalg.html
numerical integration: http://docs.scipy.org/doc/scipy/reference/integrate.html
interpolation: http://docs.scipy.org/doc/scipy/reference/interpolate.html
optimization: http://docs.scipy.org/doc/scipy/reference/optimize.html
distributions and random number generation: http://docs.scipy.org/doc/scipy/reference/stats.html
signal processing: http://docs.scipy.org/doc/scipy/reference/signal.html



In [ ]: