# NumPy and SciPy examples

## Basic NumPy

NumPy defines a basic data type called an array (actually a numpy.ndarray)

``````

In [3]:

import numpy as np
a = np.zeros(3)  # Create an array of zeros
print a

``````
``````

[ 0.  0.  0.]

``````
``````

In [4]:

type(a)

``````
``````

Out[4]:

numpy.ndarray

``````

Note that array data must be homogeneous

The most important data types are:

• float64: 64 bit floating point number
• float32: 32 bit floating point number
• int64: 64 bit integer
• int32: 32 bit integer
• bool: 8 bit True or False

There are also dtypes to represent complex numbers, unsigned integers, etc

On most machines, the default dtype for arrays is `float64`

``````

In [5]:

a = np.zeros(3)
type(a[0])

``````
``````

Out[5]:

numpy.float64

``````

When we create an array such as

``````

In [6]:

z = np.zeros(10)

``````

`z` is a "flat" array with no dimension--- neither row nor column vecto

``````

In [7]:

z.shape

``````
``````

Out[7]:

(10,)

``````

Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma)

To give it dimension, we can change the `shape` attribute

For example, let's make it a column vector

``````

In [8]:

z.shape = (10, 1)

``````
``````

In [9]:

z

``````
``````

Out[9]:

array([[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.]])

``````
``````

In [10]:

z = np.zeros(4)
z.shape = (2, 2)
z

``````
``````

Out[10]:

array([[ 0.,  0.],
[ 0.,  0.]])

``````

### Creating arrays

Creating empty arrays --- initializing memory:

``````

In [11]:

z = np.empty(3)
z

``````
``````

Out[11]:

array([  6.92366233e-310,   6.92366233e-310,   0.00000000e+000])

``````

These are just garbage numbers --- whatever was in those memory slots

Here's how to make a regular gird sequence

``````

In [12]:

z = np.linspace(2, 4, 5)  # From 2 to 4, with 5 elements
z

``````
``````

Out[12]:

array([ 2. ,  2.5,  3. ,  3.5,  4. ])

``````
``````

In [13]:

z = np.identity(2)
z

``````
``````

Out[13]:

array([[ 1.,  0.],
[ 0.,  1.]])

``````

Arrays can be made from lists or tuples

``````

In [14]:

z = np.array([10, 20])
z

``````
``````

Out[14]:

array([10, 20])

``````
``````

In [15]:

z = np.array((10, 20), dtype=float)
z

``````
``````

Out[15]:

array([ 10.,  20.])

``````
``````

In [16]:

z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
z

``````
``````

Out[16]:

array([[1, 2],
[3, 4]])

``````

### Array indexing

``````

In [17]:

z = np.linspace(1, 2, 5)
z

``````
``````

Out[17]:

array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])

``````
``````

In [18]:

z[0]  # First element

``````
``````

Out[18]:

1.0

``````
``````

In [19]:

z[-1]  # Special syntax for last element

``````
``````

Out[19]:

2.0

``````
``````

In [20]:

z[0:2]  # Meaning: Two elements, starting from element 0

``````
``````

Out[20]:

array([ 1.  ,  1.25])

``````
``````

In [21]:

z = np.array([[1, 2], [3, 4]])
z

``````
``````

Out[21]:

array([[1, 2],
[3, 4]])

``````
``````

In [22]:

z[0, 0]

``````
``````

Out[22]:

1

``````
``````

In [25]:

z[0,:]  # First row

``````
``````

Out[25]:

array([1, 2])

``````
``````

In [23]:

z[:,0]  # First column

``````
``````

Out[23]:

array([1, 3])

``````
``````

In [26]:

z = np.linspace(2, 4, 5)
z

``````
``````

Out[26]:

array([ 2. ,  2.5,  3. ,  3.5,  4. ])

``````
``````

In [27]:

d = np.array([0, 1, 1, 0, 0], dtype=bool)
d

``````
``````

Out[27]:

array([False,  True,  True, False, False], dtype=bool)

``````
``````

In [28]:

z[d]

``````
``````

Out[28]:

array([ 2.5,  3. ])

``````

### Array methods

``````

In [30]:

A = np.array((4, 3, 2, 1))
A

``````
``````

Out[30]:

array([4, 3, 2, 1])

``````
``````

In [31]:

A.sort()

``````
``````

In [32]:

A

``````
``````

Out[32]:

array([1, 2, 3, 4])

``````
``````

In [33]:

A.mean(), A.sum(), A.max(), A.cumsum(), A.var()

``````
``````

Out[33]:

(2.5, 10, 4, array([ 1,  3,  6, 10]), 1.25)

``````
``````

In [36]:

A.shape = (2, 2)
A

``````
``````

Out[36]:

array([[1, 2],
[3, 4]])

``````
``````

In [37]:

A.T  # Transpose, equivalent to A.transpose()

``````
``````

Out[37]:

array([[1, 3],
[2, 4]])

``````

### Operations on arrays

``````

In [38]:

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

``````
``````

In [39]:

a + b

``````
``````

Out[39]:

array([ 6,  8, 10, 12])

``````
``````

In [40]:

a - b

``````
``````

Out[40]:

array([-4, -4, -4, -4])

``````
``````

In [41]:

a + 10

``````
``````

Out[41]:

array([11, 12, 13, 14])

``````
``````

In [42]:

a.shape = 2, 2
b.shape = 2, 2

``````
``````

In [43]:

a

``````
``````

Out[43]:

array([[1, 2],
[3, 4]])

``````
``````

In [44]:

b

``````
``````

Out[44]:

array([[5, 6],
[7, 8]])

``````
``````

In [45]:

a * b # Pointwise multiplication!!

``````
``````

Out[45]:

array([[ 5, 12],
[21, 32]])

``````
``````

In [46]:

np.dot(a, b) # Matrix multiplication

``````
``````

Out[46]:

array([[19, 22],
[43, 50]])

``````

Note that in this is slated to change to `a @ b` in future versions of NumPy

### Comparisons

``````

In [48]:

z = np.array([2, 3])
y = np.array([2, 3])
z == y

``````
``````

Out[48]:

array([ True,  True], dtype=bool)

``````
``````

In [49]:

y[0] = 3
z == y

``````
``````

Out[49]:

array([False,  True], dtype=bool)

``````
``````

In [50]:

z = np.linspace(0, 10, 5)
z

``````
``````

Out[50]:

array([  0. ,   2.5,   5. ,   7.5,  10. ])

``````
``````

In [51]:

z > 3

``````
``````

Out[51]:

array([False, False,  True,  True,  True], dtype=bool)

``````
``````

In [52]:

z[z > 3]  # Conditional extraction

``````
``````

Out[52]:

array([  5. ,   7.5,  10. ])

``````

## SciPy

Let's just cover some simple examples --- references for further reading are below

### Statistics and distributions

Display figures in browser:

``````

In [31]:

%matplotlib inline

``````

Let's use `scipy.stats` to generate some data from the Beta distribution

``````

In [32]:

from scipy.stats import beta
q = beta(5, 5)      # Beta(a, b), with a = b = 5
obs = q.rvs(2000)   # 2000 observations

``````

Now let's histogram it and compare it to the original density

``````

In [33]:

from pylab import hist, plot, show
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()

``````
``````

``````

Other methods

``````

In [34]:

type(q)

``````
``````

Out[34]:

scipy.stats._distn_infrastructure.rv_frozen

``````
``````

In [36]:

dir(q)  # Let's see all its methods

``````
``````

Out[36]:

['__class__',
'__delattr__',
'__dict__',
'__doc__',
'__format__',
'__getattribute__',
'__hash__',
'__init__',
'__module__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'args',
'cdf',
'dist',
'entropy',
'interval',
'isf',
'kwds',
'logcdf',
'logpdf',
'logpmf',
'logsf',
'mean',
'median',
'moment',
'pdf',
'pmf',
'ppf',
'rvs',
'sf',
'stats',
'std',
'var']

``````
``````

In [54]:

q.cdf(0.5)

``````
``````

Out[54]:

0.50000000000000011

``````
``````

In [55]:

q.pdf(0.5)

``````
``````

Out[55]:

2.4609375000000009

``````
``````

In [56]:

q.mean()

``````
``````

Out[56]:

0.5

``````

Basic linear regression:

``````

In [43]:

from scipy.stats import linregress
n = 200
alpha, beta, sigma = 1, 2, 0.1
x = np.random.randn(n)  # n standard normals
y = alpha + beta * x + sigma * np.random.randn(n)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
print "intercept =", intercept

``````
``````

intercept = 1.00087870427

``````

### Roots and fixed points

Let's choose an arbitrary function to work with

``````

In [44]:

def f(x):
return np.sin(4 * (x - 0.25)) + x + x**20 - 1
x = np.linspace(0, 1, 100)
plot(x, f(x))
plot(x, 0 * x)

``````
``````

Out[44]:

[<matplotlib.lines.Line2D at 0x21e9750>]

``````
``````

In [45]:

from scipy.optimize import bisect  # Bisection algorithm --- slow but robust
bisect(f, 0, 1)

``````
``````

Out[45]:

0.4082935042797544

``````
``````

In [46]:

from scipy.optimize import newton  #  Newton's method --- fast but less robust
newton(f, 0.2)   # Start the search at initial condition x = 0.2

``````
``````

Out[46]:

0.40829350427935679

``````
``````

In [47]:

newton(f, 0.7)   # Start the search at x = 0.7 instead

``````
``````

Out[47]:

0.70017000000002816

``````

Here we see that the algorithm gets it wrong --- `newton` is fast but not robust

Let's try a hybrid method

``````

In [65]:

from scipy.optimize import brentq
brentq(f, 0, 1) # Hybrid method

``````
``````

Out[65]:

0.40829350427936706

``````
``````

In [66]:

timeit bisect(f, 0, 1)

``````
``````

10000 loops, best of 3: 160 µs per loop

``````
``````

In [67]:

timeit newton(f, 0.2)

``````
``````

10000 loops, best of 3: 38.9 µs per loop

``````
``````

In [68]:

timeit brentq(f, 0, 1)

``````
``````

10000 loops, best of 3: 42.5 µs per loop

``````

Note that the hybrid method is robust but still quite fast...

### Numerical optimization and integration

``````

In [69]:

from scipy.optimize import fminbound
fminbound(lambda x: x**2, -1, 2)  # Search in [-1, 2]

``````
``````

Out[69]:

0.0

``````
``````

In [70]:

integral, error = quad(lambda x: x**2, 0, 1)
integral

``````
``````

Out[70]:

0.33333333333333337

``````

``````

In [ ]:

``````