this section uses content created by Rob Hicks http://rlhick.people.wm.edu/stories/linear-algebra-python-basics.html
The python universe has a huge number of libraries that extend the capabilities of python. Nearly all of these are open source, unlike packages like stata or matlab where some key libraries are proprietary (and can cost lots of money). In lots of my code, you will see this at the top:
In [1]:
%matplotlib inline
import math
import numpy as np
import matplotlib.pyplot as plt
##import seaborn as sbn
##from scipy import *
This code sets up Ipython Notebook environments (lines beginning with %
), and loads several libraries and functions. The core scientific stack in python consists of a number of free libraries. The ones I have loaded above include:
In [ ]:
x = .5
print x
Vectors and Lists
The numpy library (we will reference it by np) is the workhorse library for linear algebra in python. To creat a vector simply surround a python list ($[1,2,3]$) with the np.array function:
In [3]:
x_vector = np.array([1,2,3])
print x_vector
print type(x_vector)
We could have done this by defining a python list and converting it to an array:
In [4]:
c_list = [1,2]
print "The list:",c_list
print "Has length:", len(c_list)
c_vector = np.array(c_list)
print "The vector:", c_vector
print "Has shape:",c_vector.shape
In [5]:
z = [5,6]
print "This is a list, not an array:",z
print type(z)
In [11]:
A = np.array([[0, 1, 2], [5, 6, 7]])
print A.shape
print type(A)
In [20]:
v = np.array([1,2,3,4])
print v.shape
print len(v)
In [ ]:
To learn the basics, consider a small matrix of dimension $2 \times 2$, where $2 \times 2$ denotes the number of rows $\times$ the number of columns. Let $A$=$\bigl( \begin{smallmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{smallmatrix} \bigr)$. Consider adding a scalar value (e.g. 3) to the A. $$ \begin{equation} A+3=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}+3 =\begin{bmatrix} a_{11}+3 & a_{12}+3 \\ a_{21}+3 & a_{22}+3 \end{bmatrix} \end{equation} $$ The same basic principle holds true for A-3: $$ \begin{equation} A-3=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}-3 =\begin{bmatrix} a_{11}-3 & a_{12}-3 \\ a_{21}-3 & a_{22}-3 \end{bmatrix} \end{equation} $$ Notice that we add (or subtract) the scalar value to each element in the matrix A. A can be of any dimension.
This is trivial to implement, now that we have defined our matrix A:
In [21]:
A
Out[21]:
In [22]:
result = A + 3
#or
result = 3 + A
print result
Consider two small $2 \times 2$ matrices, where $2 \times 2$ denotes the # of rows $\times$ the # of columns. Let $A$=$\bigl( \begin{smallmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{smallmatrix} \bigr)$ and $B$=$\bigl( \begin{smallmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{smallmatrix} \bigr)$. To find the result of $A-B$, simply subtract each element of A with the corresponding element of B:
$$ \begin{equation} A -B = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} - \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}-b_{11} & a_{12}-b_{12} \\ a_{21}-b_{21} & a_{22}-b_{22} \end{bmatrix} \end{equation} $$Addition works exactly the same way:
$$ \begin{equation} A + B = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix} \end{equation} $$An important point to know about matrix addition and subtraction is that it is only defined when $A$ and $B$ are of the same size. Here, both are $2 \times 2$. Since operations are performed element by element, these two matrices must be conformable- and for addition and subtraction that means they must have the same numbers of rows and columns. I like to be explicit about the dimensions of matrices for checking conformability as I write the equations, so write
$$ A_{2 \times 2} + B_{2 \times 2}= \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix}_{2 \times 2} $$Notice that the result of a matrix addition or subtraction operation is always of the same dimension as the two operands.
Let's define another matrix, B, that is also $2 \times 2$ and add it to A:
In [23]:
B = np.random.randn(2,2)
print B
In [24]:
A = np.array([[1,0], [0,1]])
A
Out[24]:
In [27]:
A*B
Out[27]:
In [ ]:
As before, let $A$=$\bigl( \begin{smallmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{smallmatrix} \bigr)$. Suppose we want to multiply A times a scalar value (e.g. $3 \times A$)
$$ \begin{equation} 3 \times A = 3 \times \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 3a_{11} & 3a_{12} \\ 3a_{21} & 3a_{22} \end{bmatrix} \end{equation} $$is of dimension (2,2). Scalar multiplication is commutative, so that $3 \times A$=$A \times 3$. Notice that the product is defined for a matrix A of any dimension.
Similar to scalar addition and subtration, the code is simple:
In [ ]:
A * 3
Now, consider the $2 \times 1$ vector $C=\bigl( \begin{smallmatrix} c_{11} \\ c_{21} \end{smallmatrix} \bigr)$
Consider multiplying matrix $A_{2 \times 2}$ and the vector $C_{2 \times 1}$. Unlike the addition and subtraction case, this product is defined. Here, conformability depends not on the row and column dimensions, but rather on the column dimensions of the first operand and the row dimensions of the second operand. We can write this operation as follows
$$ \begin{equation} A_{2 \times 2} \times C_{2 \times 1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}_{2 \times 2} \times \begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix}_{2 \times 1} = \begin{bmatrix} a_{11}c_{11} + a_{12}c_{21} \\ a_{21}c_{11} + a_{22}c_{21} \end{bmatrix}_{2 \times 1} \end{equation} $$Alternatively, consider a matrix C of dimension $2 \times 3$ and a matrix A of dimension $3 \times 2$
$$ \begin{equation} A_{3 \times 2}=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}_{3 \times 2} , C_{2 \times 3} = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ \end{bmatrix}_{2 \times 3} \end{equation} $$Here, A $\times$ C is
$$ \begin{align} A_{3 \times 2} \times C_{2 \times 3}=& \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}_{3 \times 2} \times \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{bmatrix}_{2 \times 3} \\ =& \begin{bmatrix} a_{11} c_{11}+a_{12} c_{21} & a_{11} c_{12}+a_{12} c_{22} & a_{11} c_{13}+a_{12} c_{23} \\ a_{21} c_{11}+a_{22} c_{21} & a_{21} c_{12}+a_{22} c_{22} & a_{21} c_{13}+a_{22} c_{23} \\ a_{31} c_{11}+a_{32} c_{21} & a_{31} c_{12}+a_{32} c_{22} & a_{31} c_{13}+a_{32} c_{23} \end{bmatrix}_{3 \times 3} \end{align} $$So in general, $X_{r_x \times c_x} \times Y_{r_y \times c_y}$ we have two important things to remember:
Given these facts, you should convince yourself that matrix multiplication is not generally commutative, that the relationship $X \times Y = Y \times X$ does not hold in all cases. For this reason, we will always be very explicit about whether we are pre multiplying ($X \times Y$) or post multiplying ($Y \times X$) the vectors/matrices $X$ and $Y$.
For more information on this topic, see this http://en.wikipedia.org/wiki/Matrix_multiplication.
In [28]:
# Let's redefine A and C to demonstrate matrix multiplication:
A = np.arange(6).reshape((3,2))
C = np.random.randn(2,2)
print A.shape
print C.shape
We will use the numpy dot operator to perform the these multiplications. You can use it two ways to yield the same result:
In [29]:
print A.dot(C)
print np.dot(A,C)
In [ ]:
# What would happen to
C.dot(A)
The term matrix division is actually a misnomer. To divide in a matrix algebra world we first need to invert the matrix. It is useful to consider the analog case in a scalar work. Suppose we want to divide the $f$ by $g$. We could do this in two different ways: $$ \begin{equation} \frac{f}{g}=f \times g^{-1}. \end{equation} $$ In a scalar seeting, these are equivalent ways of solving the division problem. The second one requires two steps: first we invert g and then we multiply f times g. In a matrix world, we need to think about this second approach. First we have to invert the matrix g and then we will need to pre or post multiply depending on the exact situation we encounter (this is intended to be vague for now).
As before, consider the square $2 \times 2$ matrix $A$=$\bigl( \begin{smallmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{smallmatrix} \bigr)$. Let the inverse of matrix A (denoted as $A^{-1}$) be
$$ \begin{equation} A^{-1}=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1}=\frac{1}{a_{11}a_{22}-a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \end{equation} $$The inverted matrix $A^{-1}$ has a useful property: $$ \begin{equation} A \times A^{-1}=A^{-1} \times A=I \end{equation} $$ where I, the identity matrix (the matrix equivalent of the scalar value 1), is $$ \begin{equation} I_{2 \times 2}=\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{equation} $$ furthermore, $A \times I = A$ and $I \times A = A$.
An important feature about matrix inversion is that it is undefined if (in the $2 \times 2$ case), $a_{11}a_{22}-a_{12}a_{21}=0$. If this relationship is equal to zero the inverse of A does not exist. If this term is very close to zero, an inverse may exist but $A^{-1}$ may be poorly conditioned meaning it is prone to rounding error and is likely not well identified computationally. The term $a_{11}a_{22}-a_{12}a_{21}$ is the determinant of matrix A, and for square matrices of size greater than $2 \times 2$, if equal to zero indicates that you have a problem with your data matrix (columns are linearly dependent on other columns). The inverse of matrix A exists if A is square and is of full rank (ie. the columns of A are not linear combinations of other columns of A).
For more information on this topic, see this http://en.wikipedia.org/wiki/Matrix_inversion, for example, on inverting matrices.
In [ ]:
# note, we need a square matrix (# rows = # cols), use C:
C_inverse = np.linalg.inv(C)
print C_inverse
Check that $C\times C^{-1} = I$:
In [ ]:
print C.dot(C_inverse)
print "Is identical to:"
print C_inverse.dot(C)
At times it is useful to pivot a matrix for conformability- that is in order to matrix divide or multiply, we need to switch the rows and column dimensions of matrices. Consider the matrix $$ \begin{equation} A_{3 \times 2}=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}_{3 \times 2} \end{equation} $$ The transpose of A (denoted as $A^{\prime}$) is $$ \begin{equation} A^{\prime}=\begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ \end{bmatrix}_{2 \times 3} \end{equation} $$
In [34]:
A = np.arange(6).reshape((6,1))
B = np.arange(6).reshape((1,6))
In [35]:
A.dot(B)
Out[35]:
In [36]:
B.dot(A)
Out[36]:
In [ ]:
A = np.arange(6).reshape((3,2))
B = np.arange(8).reshape((2,4))
print "A is"
print A
print "The Transpose of A is"
print A.T
One important property of transposing a matrix is the transpose of a product of two matrices. Let matrix A be of dimension $N \times M$ and let B of of dimension $M \times P$. Then $$ \begin{equation} (AB)^{\prime}=B^{\prime}A^{\prime} \end{equation} $$ For more information, see this http://en.wikipedia.org/wiki/Matrix_transposition on matrix transposition. This is also easy to implement:
In [37]:
print B.T.dot(A.T)
print "Is identical to:"
print (A.dot(B)).T
In [40]:
B.shape
Out[40]:
In [41]:
B[0, 3]
Out[41]:
In [43]:
A = np.arange(12).reshape((3,4))
A
Out[43]:
In [49]:
A[2,:].shape
Out[49]:
In [53]:
A[:,1].reshape(1,3).shape
Out[53]:
In [ ]:
examples from https://www.tutorialspoint.com/numpy/numpy_indexing_and_slicing.htm
In [ ]:
a = np.arange(10)
s = slice(2,7,2)
print a[s]
In [54]:
a = np.arange(10)
b = a[2:7:2]
print b
In [ ]:
a = np.arange(10)
b = a[5]
print b
In [57]:
a = np.arange(10)
print a
print a[2:5]
In [ ]:
import numpy as np
a = np.arange(10)
print a[2:5]
In [58]:
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print a
# slice items starting from index
print 'Now we will slice the array from the index a[1:]'
print a[1:]
In [59]:
# array to begin with
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print 'Our array is:'
print a
print '\n'
# this returns array of items in the second column
print 'The items in the second column are:'
print a[...,1]
print '\n'
# Now we will slice all items from the second row
print 'The items in the second row are:'
print a[1,...]
print '\n'
# Now we will slice all items from column 1 onwards
print 'The items column 1 onwards are:'
print a[...,1:]
In [72]:
A = np.random.rand(5,5)*10
print A
print (A < 5)
In [73]:
print A[A < 5]
In [74]:
A[A<5] = 0
A
Out[74]:
In [ ]:
In [77]:
A[A>=5] = 1
In [78]:
A
Out[78]:
In [86]:
np.ones((10,5), int)
Out[86]:
In [87]:
np.zeros((10,5), int)
Out[87]:
In [91]:
np.eye(5, dtype="int")
Out[91]:
In [ ]:
In [ ]:
In [128]:
v1 = np.random.rand(100)
v2 = np.random.randn(100)
plt.plot(range(v1.shape[0]), v1, '.')
plt.plot(range(v2.shape[0]), v2, '.')
Out[128]:
In [130]:
plt.hist(v1)
;
Out[130]:
In [132]:
v2 = np.random.randn(10000)
plt.hist(v2, bins=100)
;
Out[132]:
In [138]:
v3 = np.random.beta(3,2, 1000)
plt.hist(v3, bins=100)
;
Out[138]:
In [149]:
ls -l HW03/
In [ ]:
In [151]:
%%sh
./HW03/preprocess_data.sh HW03/Camera.csv HW03/Camera_cleaned.csv
head HW03/Camera_cleaned.csv
In [152]:
DATA = np.genfromtxt('HW03/Camera_cleaned.csv', delimiter=';')
In [153]:
DATA.
Out[153]:
In [162]:
np.max(DATA[1:,2])
Out[162]:
In [ ]:
np.nanargmin()
In [ ]:
In [ ]:
In [94]:
### Pure iterative Python ###
points = [[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]]
qPoint = [4,5,3]
minIdx = -1
minDist = -1
for idx, point in enumerate(points): # iterate over all points
print "index is %d, point is %s" % (idx, point)
dist = sum([(dp-dq)**2 for dp,dq in zip(point,qPoint)])**0.5 # compute the euclidean distance for each point to q
if dist < minDist or minDist < 0: # if necessary, update minimum distance and index of the corresponding point
minDist = dist
minIdx = idx
print 'Nearest point to q: ', points[minIdx]
In [96]:
zip(point, qPoint)
Out[96]:
In [97]:
# # # Equivalent NumPy vectorization # # #
import numpy as np
points = np.array([[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]])
qPoint = np.array([4,5,3])
minIdx = np.argmin(np.linalg.norm(points-qPoint,axis=1)) # compute all euclidean distances at once and return the index of the smallest one
print 'Nearest point to q: ', points[minIdx]
In [103]:
print points.shape
print qPoint.shape
In [106]:
print points
print qPoint
print points-qPoint
In [113]:
from numpy.linalg import norm
In [115]:
norm(points-qPoint)
Out[115]:
In [122]:
1.0-points[0,:].dot(qPoint)/(norm(points[0,:])*norm(qPoint))
Out[122]:
In [ ]:
Linear regression is an approach for modeling the relationship between a scalar dependent variable $y$ and one or more explanatory variables (or independent variables) denoted $X$. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression.$^1$ (This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.$^2$
We assume that the equation
$y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ where $\epsilon_i \approx N(0, \sigma^2)$
$^1$ David A. Freedman (2009). Statistical Models: Theory and Practice. Cambridge University Press. p. 26. A simple regression equation has on the right hand side an intercept and an explanatory variable with a slope coefficient. A multiple regression equation has two or more explanatory variables on the right hand side, each with its own slope coefficient
$^2$ Rencher, Alvin C.; Christensen, William F. (2012), "Chapter 10, Multivariate regression – Section 10.1, Introduction", Methods of Multivariate Analysis, Wiley Series in Probability and Statistics, 709 (3rd ed.), John Wiley & Sons, p. 19, ISBN 9781118391679.
In [123]:
n = 100 # numeber of samples
Xr = np.random.rand(n)*99.0
y = -7.3 + 2.5*Xr + np.random.randn(n)*27.0
plt.plot(Xr, y, "o", alpha=0.5)
Out[123]:
Let's add the bias, i.e. a column of $1$s to the explanatory variables
In [ ]:
X = np.vstack((np.ones(n), Xr)).T
print X.shape
X[0:10,:]
And compute the parametes $\beta_0$ and $\beta_1$ according to $$ \beta = (X^\prime X)^{-1} X^\prime y $$
Note:
This not only looks elegant but can also be written in Julia code. However, matrix inversion $M^{-1}$ requires $O(d^3)$ iterations for a $d\times d$ matrix.
https://www.coursera.org/learn/ml-regression/lecture/jOVX8/discussing-the-closed-form-solution
In [ ]:
beta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
In [ ]:
yhat = X.dot(beta)
yhat.shape
In [ ]:
plt.plot(X[:,1], y, "o", alpha=0.5)
plt.plot(X[:,1], yhat, "-", alpha=1, color="red")
In [ ]:
n = 100 # numeber of samples
X1 = np.random.rand(n)*99.0
X2 = np.random.rand(n)*51.0 - 26.8
X3 = np.random.rand(n)*5.0 + 6.1
X4 = np.random.rand(n)*1.0 - 0.5
X5 = np.random.rand(n)*300.0
y_m = -7.3 + 2.5*X1 + -7.9*X2 + 1.5*X3 + 10.0*X4 + 0.13*X5 + np.random.randn(n)*7.0
plt.hist(y_m, bins=20)
;
In [ ]:
X_m = np.vstack((np.ones(n), X1, X2, X3, X4, X5)).T
X_m.shape
In [ ]:
beta_m = np.linalg.inv(X_m.T.dot(X_m)).dot(X_m.T).dot(y_m)
beta_m
In [ ]:
yhat_m = X.dot(beta_m)
yhat_m.shape
The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample and population values) predicted by a model or an estimator and the values actually observed. The RMSD represents the sample standard deviation of the differences between predicted values and observed values. These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation, and are called prediction errors when computed out-of-sample. The RMSD serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSD is a good measure of accuracy, but only to compare forecasting errors of different models for a particular variable and not between variables, as it is scale-dependent.$^1$
$^1$ Hyndman, Rob J. Koehler, Anne B.; Koehler (2006). "Another look at measures of forecast accuracy". International Journal of Forecasting. 22 (4): 679–688. doi:10.1016/j.ijforecast.2006.03.001.
In [ ]:
import math
RSMD = math.sqrt(np.square(yhat_m-y_m).sum()/n)
print RSMD
Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
In general, a regularization term $R(f)$ is introduced to a general loss function:
A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution, as depicted in the figure. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters.
Regularization can be used to learn simpler models, induce models to be sparse, introduce group structure into the learning problem, and more.
We're going to add the L2 term $\lambda||\beta||_2^2$ to the regression equation, which yields to$^2$
$$ \beta = (X^\prime X + \lambda I)^{-1} X^\prime y $$$^1$ Bishop, Christopher M. (2007). Pattern recognition and machine learning (Corr. printing. ed.). New York: Springer. ISBN 978-0387310732.
$^2$ http://stats.stackexchange.com/questions/69205/how-to-derive-the-ridge-regression-solution
In [ ]:
p = X.shape[1] ## get number of parameters
lam = 10.0
p, lam
In [ ]:
beta2 = np.linalg.inv(X.T.dot(X) + lam*np.eye(p)).dot(X.T).dot(y)
In [ ]:
yhat2 = X.dot(beta2)
In [ ]:
RSMD2 = math.sqrt(np.square(yhat2-y).sum()/n)
print RSMD2
In [ ]:
##n = float(X.shape[0])
print " RMSE = ", math.sqrt(np.square(yhat-y).sum()/n)
print "Ridge RMSE = ", math.sqrt(np.square(yhat2-y).sum()/n)
plt.plot(X[:,1], y, "o", alpha=0.5)
plt.plot(X[:,1], yhat, "-", alpha=0.7, color="red")
plt.plot(X[:,1], yhat2, "-", alpha=0.7, color="green")
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]: