NumPy and SciPy examples

Basic NumPy

NumPy defines a basic data type called an array (actually a numpy.ndarray)


In [3]:
import numpy as np
a = np.zeros(3)  # Create an array of zeros
print a


[ 0.  0.  0.]

In [4]:
type(a)


Out[4]:
numpy.ndarray

Note that array data must be homogeneous

The most important data types are:

  • float64: 64 bit floating point number
  • float32: 32 bit floating point number
  • int64: 64 bit integer
  • int32: 32 bit integer
  • bool: 8 bit True or False

There are also dtypes to represent complex numbers, unsigned integers, etc

On most machines, the default dtype for arrays is float64


In [5]:
a = np.zeros(3)
type(a[0])


Out[5]:
numpy.float64

When we create an array such as


In [6]:
z = np.zeros(10)

z is a "flat" array with no dimension--- neither row nor column vecto


In [7]:
z.shape


Out[7]:
(10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma)

To give it dimension, we can change the shape attribute

For example, let's make it a column vector


In [8]:
z.shape = (10, 1)

In [9]:
z


Out[9]:
array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])

In [10]:
z = np.zeros(4)
z.shape = (2, 2)
z


Out[10]:
array([[ 0.,  0.],
       [ 0.,  0.]])

Creating arrays

Creating empty arrays --- initializing memory:


In [11]:
z = np.empty(3)
z


Out[11]:
array([  6.92366233e-310,   6.92366233e-310,   0.00000000e+000])

These are just garbage numbers --- whatever was in those memory slots

Here's how to make a regular gird sequence


In [12]:
z = np.linspace(2, 4, 5)  # From 2 to 4, with 5 elements
z


Out[12]:
array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [13]:
z = np.identity(2)
z


Out[13]:
array([[ 1.,  0.],
       [ 0.,  1.]])

Arrays can be made from lists or tuples


In [14]:
z = np.array([10, 20]) 
z


Out[14]:
array([10, 20])

In [15]:
z = np.array((10, 20), dtype=float) 
z


Out[15]:
array([ 10.,  20.])

In [16]:
z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
z


Out[16]:
array([[1, 2],
       [3, 4]])

Array indexing


In [17]:
z = np.linspace(1, 2, 5)
z


Out[17]:
array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])

In [18]:
z[0]  # First element


Out[18]:
1.0

In [19]:
z[-1]  # Special syntax for last element


Out[19]:
2.0

In [20]:
z[0:2]  # Meaning: Two elements, starting from element 0


Out[20]:
array([ 1.  ,  1.25])

In [21]:
z = np.array([[1, 2], [3, 4]])
z


Out[21]:
array([[1, 2],
       [3, 4]])

In [22]:
z[0, 0]


Out[22]:
1

In [25]:
z[0,:]  # First row


Out[25]:
array([1, 2])

In [23]:
z[:,0]  # First column


Out[23]:
array([1, 3])

In [26]:
z = np.linspace(2, 4, 5)
z


Out[26]:
array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [27]:
d = np.array([0, 1, 1, 0, 0], dtype=bool)
d


Out[27]:
array([False,  True,  True, False, False], dtype=bool)

In [28]:
z[d]


Out[28]:
array([ 2.5,  3. ])

Array methods


In [30]:
A = np.array((4, 3, 2, 1))
A


Out[30]:
array([4, 3, 2, 1])

In [31]:
A.sort()

In [32]:
A


Out[32]:
array([1, 2, 3, 4])

In [33]:
A.mean(), A.sum(), A.max(), A.cumsum(), A.var()


Out[33]:
(2.5, 10, 4, array([ 1,  3,  6, 10]), 1.25)

In [36]:
A.shape = (2, 2)
A


Out[36]:
array([[1, 2],
       [3, 4]])

In [37]:
A.T  # Transpose, equivalent to A.transpose()


Out[37]:
array([[1, 3],
       [2, 4]])

Operations on arrays


In [38]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

In [39]:
a + b


Out[39]:
array([ 6,  8, 10, 12])

In [40]:
a - b


Out[40]:
array([-4, -4, -4, -4])

In [41]:
a + 10


Out[41]:
array([11, 12, 13, 14])

In [42]:
a.shape = 2, 2
b.shape = 2, 2

In [43]:
a


Out[43]:
array([[1, 2],
       [3, 4]])

In [44]:
b


Out[44]:
array([[5, 6],
       [7, 8]])

In [45]:
a * b # Pointwise multiplication!!


Out[45]:
array([[ 5, 12],
       [21, 32]])

In [46]:
np.dot(a, b) # Matrix multiplication


Out[46]:
array([[19, 22],
       [43, 50]])

Note that in this is slated to change to a @ b in future versions of NumPy

Comparisons


In [48]:
z = np.array([2, 3])
y = np.array([2, 3])
z == y


Out[48]:
array([ True,  True], dtype=bool)

In [49]:
y[0] = 3
z == y


Out[49]:
array([False,  True], dtype=bool)

In [50]:
z = np.linspace(0, 10, 5)
z


Out[50]:
array([  0. ,   2.5,   5. ,   7.5,  10. ])

In [51]:
z > 3


Out[51]:
array([False, False,  True,  True,  True], dtype=bool)

In [52]:
z[z > 3]  # Conditional extraction


Out[52]:
array([  5. ,   7.5,  10. ])

SciPy

Let's just cover some simple examples --- references for further reading are below

Statistics and distributions

Display figures in browser:


In [31]:
%matplotlib inline

Let's use scipy.stats to generate some data from the Beta distribution


In [32]:
from scipy.stats import beta
q = beta(5, 5)      # Beta(a, b), with a = b = 5
obs = q.rvs(2000)   # 2000 observations

Now let's histogram it and compare it to the original density


In [33]:
from pylab import hist, plot, show
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()


Other methods


In [34]:
type(q)


Out[34]:
scipy.stats._distn_infrastructure.rv_frozen

In [36]:
dir(q)  # Let's see all its methods


Out[36]:
['__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'args',
 'cdf',
 'dist',
 'entropy',
 'interval',
 'isf',
 'kwds',
 'logcdf',
 'logpdf',
 'logpmf',
 'logsf',
 'mean',
 'median',
 'moment',
 'pdf',
 'pmf',
 'ppf',
 'rvs',
 'sf',
 'stats',
 'std',
 'var']

In [54]:
q.cdf(0.5)


Out[54]:
0.50000000000000011

In [55]:
q.pdf(0.5)


Out[55]:
2.4609375000000009

In [56]:
q.mean()


Out[56]:
0.5

Basic linear regression:


In [43]:
from scipy.stats import linregress
n = 200
alpha, beta, sigma = 1, 2, 0.1
x = np.random.randn(n)  # n standard normals
y = alpha + beta * x + sigma * np.random.randn(n)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
print "gradient =", gradient
print "intercept =", intercept


gradient = 2.01866616159
intercept = 1.00087870427

Roots and fixed points

Let's choose an arbitrary function to work with


In [44]:
def f(x):
     return np.sin(4 * (x - 0.25)) + x + x**20 - 1
x = np.linspace(0, 1, 100)
plot(x, f(x))
plot(x, 0 * x)


Out[44]:
[<matplotlib.lines.Line2D at 0x21e9750>]

In [45]:
from scipy.optimize import bisect  # Bisection algorithm --- slow but robust
bisect(f, 0, 1)


Out[45]:
0.4082935042797544

In [46]:
from scipy.optimize import newton  #  Newton's method --- fast but less robust
newton(f, 0.2)   # Start the search at initial condition x = 0.2


Out[46]:
0.40829350427935679

In [47]:
newton(f, 0.7)   # Start the search at x = 0.7 instead


Out[47]:
0.70017000000002816

Here we see that the algorithm gets it wrong --- newton is fast but not robust

Let's try a hybrid method


In [65]:
from scipy.optimize import brentq
brentq(f, 0, 1) # Hybrid method


Out[65]:
0.40829350427936706

In [66]:
timeit bisect(f, 0, 1)


10000 loops, best of 3: 160 µs per loop

In [67]:
timeit newton(f, 0.2)


10000 loops, best of 3: 38.9 µs per loop

In [68]:
timeit brentq(f, 0, 1)


10000 loops, best of 3: 42.5 µs per loop

Note that the hybrid method is robust but still quite fast...

Numerical optimization and integration


In [69]:
from scipy.optimize import fminbound
fminbound(lambda x: x**2, -1, 2)  # Search in [-1, 2]


Out[69]:
0.0

In [70]:
from scipy.integrate import quad
integral, error = quad(lambda x: x**2, 0, 1)
integral


Out[70]:
0.33333333333333337

More information


In [ ]: