NumPy defines a basic data type called an array (actually a numpy.ndarray)
In [3]:
import numpy as np
a = np.zeros(3) # Create an array of zeros
print a
In [4]:
type(a)
Out[4]:
Note that array data must be homogeneous
The most important data types are:
There are also dtypes to represent complex numbers, unsigned integers, etc
On most machines, the default dtype for arrays is float64
In [5]:
a = np.zeros(3)
type(a[0])
Out[5]:
When we create an array such as
In [6]:
z = np.zeros(10)
z
is a "flat" array with no dimension--- neither row nor column vecto
In [7]:
z.shape
Out[7]:
Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma)
To give it dimension, we can change the shape
attribute
For example, let's make it a column vector
In [8]:
z.shape = (10, 1)
In [9]:
z
Out[9]:
In [10]:
z = np.zeros(4)
z.shape = (2, 2)
z
Out[10]:
Creating empty arrays --- initializing memory:
In [11]:
z = np.empty(3)
z
Out[11]:
These are just garbage numbers --- whatever was in those memory slots
Here's how to make a regular gird sequence
In [12]:
z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements
z
Out[12]:
In [13]:
z = np.identity(2)
z
Out[13]:
Arrays can be made from lists or tuples
In [14]:
z = np.array([10, 20])
z
Out[14]:
In [15]:
z = np.array((10, 20), dtype=float)
z
Out[15]:
In [16]:
z = np.array([[1, 2], [3, 4]]) # 2D array from a list of lists
z
Out[16]:
In [17]:
z = np.linspace(1, 2, 5)
z
Out[17]:
In [18]:
z[0] # First element
Out[18]:
In [19]:
z[-1] # Special syntax for last element
Out[19]:
In [20]:
z[0:2] # Meaning: Two elements, starting from element 0
Out[20]:
In [21]:
z = np.array([[1, 2], [3, 4]])
z
Out[21]:
In [22]:
z[0, 0]
Out[22]:
In [25]:
z[0,:] # First row
Out[25]:
In [23]:
z[:,0] # First column
Out[23]:
In [26]:
z = np.linspace(2, 4, 5)
z
Out[26]:
In [27]:
d = np.array([0, 1, 1, 0, 0], dtype=bool)
d
Out[27]:
In [28]:
z[d]
Out[28]:
In [30]:
A = np.array((4, 3, 2, 1))
A
Out[30]:
In [31]:
A.sort()
In [32]:
A
Out[32]:
In [33]:
A.mean(), A.sum(), A.max(), A.cumsum(), A.var()
Out[33]:
In [36]:
A.shape = (2, 2)
A
Out[36]:
In [37]:
A.T # Transpose, equivalent to A.transpose()
Out[37]:
In [38]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
In [39]:
a + b
Out[39]:
In [40]:
a - b
Out[40]:
In [41]:
a + 10
Out[41]:
In [42]:
a.shape = 2, 2
b.shape = 2, 2
In [43]:
a
Out[43]:
In [44]:
b
Out[44]:
In [45]:
a * b # Pointwise multiplication!!
Out[45]:
In [46]:
np.dot(a, b) # Matrix multiplication
Out[46]:
Note that in this is slated to change to a @ b
in future versions of NumPy
In [48]:
z = np.array([2, 3])
y = np.array([2, 3])
z == y
Out[48]:
In [49]:
y[0] = 3
z == y
Out[49]:
In [50]:
z = np.linspace(0, 10, 5)
z
Out[50]:
In [51]:
z > 3
Out[51]:
In [52]:
z[z > 3] # Conditional extraction
Out[52]:
Let's just cover some simple examples --- references for further reading are below
Display figures in browser:
In [31]:
%matplotlib inline
Let's use scipy.stats
to generate some data from the Beta distribution
In [32]:
from scipy.stats import beta
q = beta(5, 5) # Beta(a, b), with a = b = 5
obs = q.rvs(2000) # 2000 observations
Now let's histogram it and compare it to the original density
In [33]:
from pylab import hist, plot, show
hist(obs, bins=40, normed=True)
grid = np.linspace(0.01, 0.99, 100)
plot(grid, q.pdf(grid), 'k-', linewidth=2)
show()
Other methods
In [34]:
type(q)
Out[34]:
In [36]:
dir(q) # Let's see all its methods
Out[36]:
In [54]:
q.cdf(0.5)
Out[54]:
In [55]:
q.pdf(0.5)
Out[55]:
In [56]:
q.mean()
Out[56]:
Basic linear regression:
In [43]:
from scipy.stats import linregress
n = 200
alpha, beta, sigma = 1, 2, 0.1
x = np.random.randn(n) # n standard normals
y = alpha + beta * x + sigma * np.random.randn(n)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
print "gradient =", gradient
print "intercept =", intercept
Let's choose an arbitrary function to work with
In [44]:
def f(x):
return np.sin(4 * (x - 0.25)) + x + x**20 - 1
x = np.linspace(0, 1, 100)
plot(x, f(x))
plot(x, 0 * x)
Out[44]:
In [45]:
from scipy.optimize import bisect # Bisection algorithm --- slow but robust
bisect(f, 0, 1)
Out[45]:
In [46]:
from scipy.optimize import newton # Newton's method --- fast but less robust
newton(f, 0.2) # Start the search at initial condition x = 0.2
Out[46]:
In [47]:
newton(f, 0.7) # Start the search at x = 0.7 instead
Out[47]:
Here we see that the algorithm gets it wrong --- newton
is fast but not robust
Let's try a hybrid method
In [65]:
from scipy.optimize import brentq
brentq(f, 0, 1) # Hybrid method
Out[65]:
In [66]:
timeit bisect(f, 0, 1)
In [67]:
timeit newton(f, 0.2)
In [68]:
timeit brentq(f, 0, 1)
Note that the hybrid method is robust but still quite fast...
In [69]:
from scipy.optimize import fminbound
fminbound(lambda x: x**2, -1, 2) # Search in [-1, 2]
Out[69]:
In [70]:
from scipy.integrate import quad
integral, error = quad(lambda x: x**2, 0, 1)
integral
Out[70]:
In [ ]: