Introduction to NumPy

Topics

  • Basic Synatx
    • creating vectors matrices
    • special: ones, zeros, identity eye
    • add, product, inverse
  • Mechanics: indexing, slicing, concatenating, reshape, zip
  • Numpy load, save data files
  • Random numbers $\rightarrow$ distributions
  • Similarity: Euclidian vs Cosine
  • Example Nearest Neighbor search
  • Example Linear Regression

References

Basics

this section uses content created by Rob Hicks http://rlhick.people.wm.edu/stories/linear-algebra-python-basics.html

Loading libraries

The python universe has a huge number of libraries that extend the capabilities of python. Nearly all of these are open source, unlike packages like stata or matlab where some key libraries are proprietary (and can cost lots of money). In lots of my code, you will see this at the top:


In [1]:
%matplotlib inline
import math
import numpy as np
import matplotlib.pyplot as plt
##import seaborn as sbn
##from scipy import *

This code sets up Ipython Notebook environments (lines beginning with %), and loads several libraries and functions. The core scientific stack in python consists of a number of free libraries. The ones I have loaded above include:

  1. sympy: provides for symbolic computation (solving algebra problems)
  2. numpy: provides for linear algebra computations
  3. matplotlib.pyplot: provides for the ability to graph functions and draw figures
  4. scipy: scientific python provides a plethora of capabilities
  5. seaborn: makes matplotlib figures even pretties (another library like this is called bokeh). This is entirely optional and is purely for eye candy.

Creating arrays, scalars, and matrices in Python

Scalars can be created easily like this:


In [ ]:
x = .5
print x

Vectors and Lists

The numpy library (we will reference it by np) is the workhorse library for linear algebra in python. To creat a vector simply surround a python list ($[1,2,3]$) with the np.array function:


In [3]:
x_vector = np.array([1,2,3])
print x_vector
print type(x_vector)


[1 2 3]
<type 'numpy.ndarray'>

We could have done this by defining a python list and converting it to an array:


In [4]:
c_list = [1,2]
print "The list:",c_list
print "Has length:", len(c_list)

c_vector = np.array(c_list)
print "The vector:", c_vector
print "Has shape:",c_vector.shape


The list: [1, 2]
Has length: 2
The vector: [1 2]
Has shape: (2,)

In [5]:
z = [5,6]
print "This is a list, not an array:",z
print type(z)


This is a list, not an array: [5, 6]
<type 'list'>

In [11]:
A = np.array([[0, 1, 2], [5, 6, 7]])
print A.shape
print type(A)


(2, 3)
<type 'numpy.ndarray'>

In [20]:
v = np.array([1,2,3,4])
print v.shape
print len(v)


(4,)
4

In [ ]:

Matrix Addition and Subtraction

Adding or subtracting a scalar value to a matrix

To learn the basics, consider a small matrix of dimension $2 \times 2$, where $2 \times 2$ denotes the number of rows $\times$ the number of columns. Let $A$=$\bigl( \begin{smallmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{smallmatrix} \bigr)$. Consider adding a scalar value (e.g. 3) to the A. $$ \begin{equation} A+3=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}+3 =\begin{bmatrix} a_{11}+3 & a_{12}+3 \\ a_{21}+3 & a_{22}+3 \end{bmatrix} \end{equation} $$ The same basic principle holds true for A-3: $$ \begin{equation} A-3=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}-3 =\begin{bmatrix} a_{11}-3 & a_{12}-3 \\ a_{21}-3 & a_{22}-3 \end{bmatrix} \end{equation} $$ Notice that we add (or subtract) the scalar value to each element in the matrix A. A can be of any dimension.

This is trivial to implement, now that we have defined our matrix A:


In [21]:
A


Out[21]:
array([[0, 1, 2],
       [5, 6, 7]])

In [22]:
result = A + 3
#or
result = 3 + A
print result


[[ 3  4  5]
 [ 8  9 10]]

Adding or subtracting two matrices

Consider two small $2 \times 2$ matrices, where $2 \times 2$ denotes the # of rows $\times$ the # of columns. Let $A$=$\bigl( \begin{smallmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{smallmatrix} \bigr)$ and $B$=$\bigl( \begin{smallmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{smallmatrix} \bigr)$. To find the result of $A-B$, simply subtract each element of A with the corresponding element of B:

$$ \begin{equation} A -B = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} - \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}-b_{11} & a_{12}-b_{12} \\ a_{21}-b_{21} & a_{22}-b_{22} \end{bmatrix} \end{equation} $$

Addition works exactly the same way:

$$ \begin{equation} A + B = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix} \end{equation} $$

An important point to know about matrix addition and subtraction is that it is only defined when $A$ and $B$ are of the same size. Here, both are $2 \times 2$. Since operations are performed element by element, these two matrices must be conformable- and for addition and subtraction that means they must have the same numbers of rows and columns. I like to be explicit about the dimensions of matrices for checking conformability as I write the equations, so write

$$ A_{2 \times 2} + B_{2 \times 2}= \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \end{bmatrix}_{2 \times 2} $$

Notice that the result of a matrix addition or subtraction operation is always of the same dimension as the two operands.

Let's define another matrix, B, that is also $2 \times 2$ and add it to A:


In [23]:
B = np.random.randn(2,2)
print B


[[ 2.26703442 -0.83466857]
 [-0.17599679  0.21491604]]

In [24]:
A = np.array([[1,0], [0,1]])
A


Out[24]:
array([[1, 0],
       [0, 1]])

In [27]:
A*B


Out[27]:
array([[ 2.26703442, -0.        ],
       [-0.        ,  0.21491604]])

In [ ]:

Matrix Multiplication

Multiplying a scalar value times a matrix

As before, let $A$=$\bigl( \begin{smallmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{smallmatrix} \bigr)$. Suppose we want to multiply A times a scalar value (e.g. $3 \times A$)

$$ \begin{equation} 3 \times A = 3 \times \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 3a_{11} & 3a_{12} \\ 3a_{21} & 3a_{22} \end{bmatrix} \end{equation} $$

is of dimension (2,2). Scalar multiplication is commutative, so that $3 \times A$=$A \times 3$. Notice that the product is defined for a matrix A of any dimension.

Similar to scalar addition and subtration, the code is simple:


In [ ]:
A * 3

Multiplying two matricies

Now, consider the $2 \times 1$ vector $C=\bigl( \begin{smallmatrix} c_{11} \\ c_{21} \end{smallmatrix} \bigr)$

Consider multiplying matrix $A_{2 \times 2}$ and the vector $C_{2 \times 1}$. Unlike the addition and subtraction case, this product is defined. Here, conformability depends not on the row and column dimensions, but rather on the column dimensions of the first operand and the row dimensions of the second operand. We can write this operation as follows

$$ \begin{equation} A_{2 \times 2} \times C_{2 \times 1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}_{2 \times 2} \times \begin{bmatrix} c_{11} \\ c_{21} \end{bmatrix}_{2 \times 1} = \begin{bmatrix} a_{11}c_{11} + a_{12}c_{21} \\ a_{21}c_{11} + a_{22}c_{21} \end{bmatrix}_{2 \times 1} \end{equation} $$

Alternatively, consider a matrix C of dimension $2 \times 3$ and a matrix A of dimension $3 \times 2$

$$ \begin{equation} A_{3 \times 2}=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}_{3 \times 2} , C_{2 \times 3} = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ \end{bmatrix}_{2 \times 3} \end{equation} $$

Here, A $\times$ C is

$$ \begin{align} A_{3 \times 2} \times C_{2 \times 3}=& \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}_{3 \times 2} \times \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{bmatrix}_{2 \times 3} \\ =& \begin{bmatrix} a_{11} c_{11}+a_{12} c_{21} & a_{11} c_{12}+a_{12} c_{22} & a_{11} c_{13}+a_{12} c_{23} \\ a_{21} c_{11}+a_{22} c_{21} & a_{21} c_{12}+a_{22} c_{22} & a_{21} c_{13}+a_{22} c_{23} \\ a_{31} c_{11}+a_{32} c_{21} & a_{31} c_{12}+a_{32} c_{22} & a_{31} c_{13}+a_{32} c_{23} \end{bmatrix}_{3 \times 3} \end{align} $$

So in general, $X_{r_x \times c_x} \times Y_{r_y \times c_y}$ we have two important things to remember:

  • For conformability in matrix multiplication, $c_x=r_y$, or the columns in the first operand must be equal to the rows of the second operand.
  • The result will be of dimension $r_x \times c_y$, or of dimensions equal to the rows of the first operand and columns equal to columns of the second operand.

Given these facts, you should convince yourself that matrix multiplication is not generally commutative, that the relationship $X \times Y = Y \times X$ does not hold in all cases. For this reason, we will always be very explicit about whether we are pre multiplying ($X \times Y$) or post multiplying ($Y \times X$) the vectors/matrices $X$ and $Y$.

For more information on this topic, see this http://en.wikipedia.org/wiki/Matrix_multiplication.


In [28]:
# Let's redefine A and C to demonstrate matrix multiplication:
A = np.arange(6).reshape((3,2))
C = np.random.randn(2,2)

print A.shape
print C.shape


(3, 2)
(2, 2)

We will use the numpy dot operator to perform the these multiplications. You can use it two ways to yield the same result:


In [29]:
print A.dot(C)
print np.dot(A,C)


[[-0.75152313  0.74200997]
 [-1.44918176  0.39710511]
 [-2.14684039  0.05220024]]
[[-0.75152313  0.74200997]
 [-1.44918176  0.39710511]
 [-2.14684039  0.05220024]]

In [ ]:
# What would happen to
C.dot(A)

Matrix Division

The term matrix division is actually a misnomer. To divide in a matrix algebra world we first need to invert the matrix. It is useful to consider the analog case in a scalar work. Suppose we want to divide the $f$ by $g$. We could do this in two different ways: $$ \begin{equation} \frac{f}{g}=f \times g^{-1}. \end{equation} $$ In a scalar seeting, these are equivalent ways of solving the division problem. The second one requires two steps: first we invert g and then we multiply f times g. In a matrix world, we need to think about this second approach. First we have to invert the matrix g and then we will need to pre or post multiply depending on the exact situation we encounter (this is intended to be vague for now).

Inverting a Matrix

As before, consider the square $2 \times 2$ matrix $A$=$\bigl( \begin{smallmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{smallmatrix} \bigr)$. Let the inverse of matrix A (denoted as $A^{-1}$) be

$$ \begin{equation} A^{-1}=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1}=\frac{1}{a_{11}a_{22}-a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \end{equation} $$

The inverted matrix $A^{-1}$ has a useful property: $$ \begin{equation} A \times A^{-1}=A^{-1} \times A=I \end{equation} $$ where I, the identity matrix (the matrix equivalent of the scalar value 1), is $$ \begin{equation} I_{2 \times 2}=\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{equation} $$ furthermore, $A \times I = A$ and $I \times A = A$.

An important feature about matrix inversion is that it is undefined if (in the $2 \times 2$ case), $a_{11}a_{22}-a_{12}a_{21}=0$. If this relationship is equal to zero the inverse of A does not exist. If this term is very close to zero, an inverse may exist but $A^{-1}$ may be poorly conditioned meaning it is prone to rounding error and is likely not well identified computationally. The term $a_{11}a_{22}-a_{12}a_{21}$ is the determinant of matrix A, and for square matrices of size greater than $2 \times 2$, if equal to zero indicates that you have a problem with your data matrix (columns are linearly dependent on other columns). The inverse of matrix A exists if A is square and is of full rank (ie. the columns of A are not linear combinations of other columns of A).

For more information on this topic, see this http://en.wikipedia.org/wiki/Matrix_inversion, for example, on inverting matrices.


In [ ]:
# note, we need a square matrix (# rows = # cols), use C:
C_inverse = np.linalg.inv(C)
print C_inverse

Check that $C\times C^{-1} = I$:


In [ ]:
print C.dot(C_inverse)
print "Is identical to:"
print C_inverse.dot(C)

Transposing a Matrix

At times it is useful to pivot a matrix for conformability- that is in order to matrix divide or multiply, we need to switch the rows and column dimensions of matrices. Consider the matrix $$ \begin{equation} A_{3 \times 2}=\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}_{3 \times 2} \end{equation} $$ The transpose of A (denoted as $A^{\prime}$) is $$ \begin{equation} A^{\prime}=\begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ \end{bmatrix}_{2 \times 3} \end{equation} $$


In [34]:
A = np.arange(6).reshape((6,1))
B = np.arange(6).reshape((1,6))

In [35]:
A.dot(B)


Out[35]:
array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4,  5],
       [ 0,  2,  4,  6,  8, 10],
       [ 0,  3,  6,  9, 12, 15],
       [ 0,  4,  8, 12, 16, 20],
       [ 0,  5, 10, 15, 20, 25]])

In [36]:
B.dot(A)


Out[36]:
array([[55]])

In [ ]:
A = np.arange(6).reshape((3,2))
B = np.arange(8).reshape((2,4))
print "A is"
print A

print "The Transpose of A is"
print A.T

One important property of transposing a matrix is the transpose of a product of two matrices. Let matrix A be of dimension $N \times M$ and let B of of dimension $M \times P$. Then $$ \begin{equation} (AB)^{\prime}=B^{\prime}A^{\prime} \end{equation} $$ For more information, see this http://en.wikipedia.org/wiki/Matrix_transposition on matrix transposition. This is also easy to implement:


In [37]:
print B.T.dot(A.T)
print "Is identical to:"
print (A.dot(B)).T


[[ 0  0  0  0  0  0]
 [ 0  1  2  3  4  5]
 [ 0  2  4  6  8 10]
 [ 0  3  6  9 12 15]
 [ 0  4  8 12 16 20]
 [ 0  5 10 15 20 25]]
Is identical to:
[[ 0  0  0  0  0  0]
 [ 0  1  2  3  4  5]
 [ 0  2  4  6  8 10]
 [ 0  3  6  9 12 15]
 [ 0  4  8 12 16 20]
 [ 0  5 10 15 20 25]]

In [40]:
B.shape


Out[40]:
(1, 6)

In [41]:
B[0, 3]


Out[41]:
3

In [43]:
A = np.arange(12).reshape((3,4))
A


Out[43]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [49]:
A[2,:].shape


Out[49]:
(4,)

In [53]:
A[:,1].reshape(1,3).shape


Out[53]:
(1, 3)

In [ ]:

Mechanics


In [ ]:
a = np.arange(10) 
s = slice(2,7,2) 
print a[s]

In [54]:
a = np.arange(10) 
b = a[2:7:2] 
print b


[2 4 6]

In [ ]:
a = np.arange(10) 
b = a[5] 
print b

In [57]:
a = np.arange(10) 
print a
print a[2:5]


[0 1 2 3 4 5 6 7 8 9]
[2 3 4]

In [ ]:
import numpy as np 
a = np.arange(10) 
print a[2:5]

In [58]:
a = np.array([[1,2,3],[3,4,5],[4,5,6]]) 
print a  

# slice items starting from index
print 'Now we will slice the array from the index a[1:]' 
print a[1:]


[[1 2 3]
 [3 4 5]
 [4 5 6]]
Now we will slice the array from the index a[1:]
[[3 4 5]
 [4 5 6]]

In [59]:
# array to begin with 
a = np.array([[1,2,3],[3,4,5],[4,5,6]]) 

print 'Our array is:' 
print a 
print '\n'  

# this returns array of items in the second column 
print 'The items in the second column are:'  
print a[...,1] 
print '\n'  

# Now we will slice all items from the second row 
print 'The items in the second row are:' 
print a[1,...] 
print '\n'  

# Now we will slice all items from column 1 onwards 
print 'The items column 1 onwards are:' 
print a[...,1:]


Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]


The items in the second column are:
[2 4 5]


The items in the second row are:
[3 4 5]


The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]

Logic, Comparison


In [72]:
A = np.random.rand(5,5)*10
print A
print (A < 5)


[[ 2.74825125  3.66327804  7.30298943  3.46705323  2.84502206]
 [ 9.25292119  8.18036788  5.21405825  5.79221822  2.51686822]
 [ 1.18177811  6.20346576  6.66123982  0.93145888  3.28405485]
 [ 6.62341177  4.90183949  0.65675271  4.78174612  2.59768208]
 [ 5.83704272  0.09697114  0.23165873  9.54812567  5.53696957]]
[[ True  True False  True  True]
 [False False False False  True]
 [ True False False  True  True]
 [False  True  True  True  True]
 [False  True  True False False]]

In [73]:
print A[A < 5]


[ 2.74825125  3.66327804  3.46705323  2.84502206  2.51686822  1.18177811
  0.93145888  3.28405485  4.90183949  0.65675271  4.78174612  2.59768208
  0.09697114  0.23165873]

In [74]:
A[A<5] = 0
A


Out[74]:
array([[ 0.        ,  0.        ,  7.30298943,  0.        ,  0.        ],
       [ 9.25292119,  8.18036788,  5.21405825,  5.79221822,  0.        ],
       [ 0.        ,  6.20346576,  6.66123982,  0.        ,  0.        ],
       [ 6.62341177,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 5.83704272,  0.        ,  0.        ,  9.54812567,  5.53696957]])

In [ ]:


In [77]:
A[A>=5] = 1

In [78]:
A


Out[78]:
array([[ 0.,  0.,  1.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  0.],
       [ 0.,  1.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  1.,  1.]])

Concatenate, Reshape


In [86]:
np.ones((10,5), int)


Out[86]:
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [87]:
np.zeros((10,5), int)


Out[87]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [91]:
np.eye(5, dtype="int")


Out[91]:
array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])

In [ ]:


In [ ]:

Random Numbers


In [128]:
v1 = np.random.rand(100)
v2 = np.random.randn(100)
plt.plot(range(v1.shape[0]), v1, '.')
plt.plot(range(v2.shape[0]), v2, '.')


Out[128]:
[<matplotlib.lines.Line2D at 0x42da090>]

In [130]:
plt.hist(v1)
;


Out[130]:
''

In [132]:
v2 = np.random.randn(10000)
plt.hist(v2, bins=100)
;


Out[132]:
''

In [138]:
v3 = np.random.beta(3,2, 1000)
plt.hist(v3, bins=100)
;


Out[138]:
''

Numpy load, save data files


In [149]:
ls -l HW03/


total 100
-rw-rw-r-- 1 pmolnar pmolnar 87053 Jun 23 07:58 Camera.csv
-rw-rw-r-- 1 pmolnar pmolnar  1497 Sep  6 15:23 future_README.md
-rwxrwxr-x 1 pmolnar pmolnar   244 Sep  6 10:09 preprocess_data.sh*
-rw-rw-r-- 1 pmolnar pmolnar   140 Sep  6 15:24 README.md

In [ ]:


In [151]:
%%sh
./HW03/preprocess_data.sh HW03/Camera.csv HW03/Camera_cleaned.csv
head HW03/Camera_cleaned.csv


Model;Release_date;Max_resolution;Low_resolution;Effective_pixels;Zoom_wide_W;Zoom_tele_T;Normal_focus_range;Macro_focus_range;Storage_included;Weight_inc._batteries;Dimensions;Price
1;1997;1024.0;640.0;0.0;38.0;114.0;70.0;40.0;4.0;420.0;95.0;179.0
2;1998;1280.0;640.0;1.0;38.0;114.0;50.0;0.0;4.0;420.0;158.0;179.0
3;2000;640.0;0.0;0.0;45.0;45.0;0.0;0.0;2.0;0.0;0.0;179.0
4;1999;1152.0;640.0;0.0;35.0;35.0;0.0;0.0;4.0;0.0;0.0;269.0
5;1999;1152.0;640.0;0.0;43.0;43.0;50.0;0.0;40.0;300.0;128.0;1299.0
6;2001;1600.0;640.0;1.0;51.0;51.0;50.0;20.0;8.0;270.0;119.0;179.0
7;1999;1280.0;640.0;1.0;34.0;102.0;0.0;0.0;8.0;0.0;0.0;179.0
8;1997;640.0;0.0;0.0;42.0;42.0;70.0;3.0;2.0;320.0;93.0;149.0
9;1996;832.0;640.0;0.0;50.0;50.0;40.0;10.0;1.0;460.0;160.0;139.0

In [152]:
DATA = np.genfromtxt('HW03/Camera_cleaned.csv', delimiter=';')

In [153]:
DATA.


Out[153]:
(1039, 13)

In [162]:
np.max(DATA[1:,2])


Out[162]:
5616.0

In [ ]:
np.nanargmin()

In [ ]:

Similarity: Euclidian vs Cosine


In [ ]:

Example Nearest Neighbor search

Nearest Neighbor search is a common technique in Machine Learning


In [94]:
### Pure iterative Python ###

points = [[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]]
qPoint = [4,5,3]

minIdx = -1
minDist = -1
for idx, point in enumerate(points):  # iterate over all points
        print "index is %d, point is %s" % (idx, point)
        dist = sum([(dp-dq)**2 for dp,dq in zip(point,qPoint)])**0.5  # compute the euclidean distance for each point to q
        if dist < minDist or minDist < 0:  # if necessary, update minimum distance and index of the corresponding point
            minDist = dist
            minIdx = idx

print 'Nearest point to q: ', points[minIdx]


index is 0, point is [9, 2, 8]
index is 1, point is [4, 7, 2]
index is 2, point is [3, 4, 4]
index is 3, point is [5, 6, 9]
index is 4, point is [5, 0, 7]
index is 5, point is [8, 2, 7]
index is 6, point is [0, 3, 2]
index is 7, point is [7, 3, 0]
index is 8, point is [6, 1, 1]
index is 9, point is [2, 9, 6]
Nearest point to q:  [3, 4, 4]

In [96]:
zip(point, qPoint)


Out[96]:
[(2, 4), (9, 5), (6, 3)]

In [97]:
# # # Equivalent NumPy vectorization # # #
import numpy as np
points = np.array([[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]])
qPoint = np.array([4,5,3])
minIdx = np.argmin(np.linalg.norm(points-qPoint,axis=1))  # compute all euclidean distances at once and return the index of the smallest one
print 'Nearest point to q: ', points[minIdx]


Nearest point to q:  [3 4 4]

In [103]:
print points.shape
print qPoint.shape


(10, 3)
(3,)

In [106]:
print points
print qPoint
print points-qPoint


[[9 2 8]
 [4 7 2]
 [3 4 4]
 [5 6 9]
 [5 0 7]
 [8 2 7]
 [0 3 2]
 [7 3 0]
 [6 1 1]
 [2 9 6]]
[4 5 3]
[[ 5 -3  5]
 [ 0  2 -1]
 [-1 -1  1]
 [ 1  1  6]
 [ 1 -5  4]
 [ 4 -3  4]
 [-4 -2 -1]
 [ 3 -2 -3]
 [ 2 -4 -2]
 [-2  4  3]]

In [113]:
from numpy.linalg import norm

In [115]:
norm(points-qPoint)


Out[115]:
16.852299546352718

In [122]:
1.0-points[0,:].dot(qPoint)/(norm(points[0,:])*norm(qPoint))


Out[122]:
0.18900177509111082

In [ ]:

Example: Linear Regression

Linear regression is an approach for modeling the relationship between a scalar dependent variable $y$ and one or more explanatory variables (or independent variables) denoted $X$. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression.$^1$ (This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.$^2$

We assume that the equation

$y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ where $\epsilon_i \approx N(0, \sigma^2)$


$^1$ David A. Freedman (2009). Statistical Models: Theory and Practice. Cambridge University Press. p. 26. A simple regression equation has on the right hand side an intercept and an explanatory variable with a slope coefficient. A multiple regression equation has two or more explanatory variables on the right hand side, each with its own slope coefficient

$^2$ Rencher, Alvin C.; Christensen, William F. (2012), "Chapter 10, Multivariate regression – Section 10.1, Introduction", Methods of Multivariate Analysis, Wiley Series in Probability and Statistics, 709 (3rd ed.), John Wiley & Sons, p. 19, ISBN 9781118391679.


In [123]:
n = 100  # numeber of samples
Xr = np.random.rand(n)*99.0
y = -7.3 + 2.5*Xr + np.random.randn(n)*27.0
plt.plot(Xr, y, "o", alpha=0.5)


Out[123]:
[<matplotlib.lines.Line2D at 0x3cef150>]

Let's add the bias, i.e. a column of $1$s to the explanatory variables


In [ ]:
X = np.vstack((np.ones(n), Xr)).T
print X.shape
X[0:10,:]

Closed-form Linear Regression

And compute the parametes $\beta_0$ and $\beta_1$ according to $$ \beta = (X^\prime X)^{-1} X^\prime y $$


Note: This not only looks elegant but can also be written in Julia code. However, matrix inversion $M^{-1}$ requires $O(d^3)$ iterations for a $d\times d$ matrix.
https://www.coursera.org/learn/ml-regression/lecture/jOVX8/discussing-the-closed-form-solution


In [ ]:
beta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

In [ ]:
yhat = X.dot(beta)
yhat.shape

In [ ]:
plt.plot(X[:,1], y, "o", alpha=0.5)
plt.plot(X[:,1], yhat, "-", alpha=1, color="red")

Multiple Linear Regression


In [ ]:
n = 100  # numeber of samples
X1 = np.random.rand(n)*99.0
X2 = np.random.rand(n)*51.0 - 26.8
X3 = np.random.rand(n)*5.0 + 6.1
X4 = np.random.rand(n)*1.0 - 0.5
X5 = np.random.rand(n)*300.0

y_m = -7.3 + 2.5*X1 + -7.9*X2 + 1.5*X3 + 10.0*X4 + 0.13*X5 + np.random.randn(n)*7.0
plt.hist(y_m, bins=20)
;

In [ ]:
X_m = np.vstack((np.ones(n), X1, X2, X3, X4, X5)).T
X_m.shape

In [ ]:
beta_m = np.linalg.inv(X_m.T.dot(X_m)).dot(X_m.T).dot(y_m)
beta_m

In [ ]:
yhat_m = X.dot(beta_m)
yhat_m.shape

Evaluation: Root-mean-square Deviation

The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample and population values) predicted by a model or an estimator and the values actually observed. The RMSD represents the sample standard deviation of the differences between predicted values and observed values. These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation, and are called prediction errors when computed out-of-sample. The RMSD serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSD is a good measure of accuracy, but only to compare forecasting errors of different models for a particular variable and not between variables, as it is scale-dependent.$^1$


$^1$ Hyndman, Rob J. Koehler, Anne B.; Koehler (2006). "Another look at measures of forecast accuracy". International Journal of Forecasting. 22 (4): 679–688. doi:10.1016/j.ijforecast.2006.03.001.


In [ ]:
import math
RSMD = math.sqrt(np.square(yhat_m-y_m).sum()/n)
print RSMD

Regularization, Ridge-Regression

Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.

In general, a regularization term $R(f)$ is introduced to a general loss function:

for a loss function $V$ that describes the cost of predicting $f(x)$ when the label is $y$, such as the square loss or hinge loss, and for the term $\lambda$ which controls the importance of the regularization term. $R(f)$ is typically a penalty on the complexity of $f$, such as restrictions for smoothness or bounds on the vector space norm.$^1$

A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution, as depicted in the figure. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters.

Regularization can be used to learn simpler models, induce models to be sparse, introduce group structure into the learning problem, and more.

We're going to add the L2 term $\lambda||\beta||_2^2$ to the regression equation, which yields to$^2$

$$ \beta = (X^\prime X + \lambda I)^{-1} X^\prime y $$

$^1$ Bishop, Christopher M. (2007). Pattern recognition and machine learning (Corr. printing. ed.). New York: Springer. ISBN 978-0387310732.

$^2$ http://stats.stackexchange.com/questions/69205/how-to-derive-the-ridge-regression-solution


In [ ]:
p = X.shape[1]  ## get number of parameters
lam = 10.0
p, lam

In [ ]:
beta2 = np.linalg.inv(X.T.dot(X) + lam*np.eye(p)).dot(X.T).dot(y)

In [ ]:
yhat2 = X.dot(beta2)

In [ ]:
RSMD2 = math.sqrt(np.square(yhat2-y).sum()/n)
print RSMD2

In [ ]:
##n = float(X.shape[0])
print "      RMSE = ", math.sqrt(np.square(yhat-y).sum()/n)
print "Ridge RMSE = ", math.sqrt(np.square(yhat2-y).sum()/n)
plt.plot(X[:,1], y, "o", alpha=0.5)
plt.plot(X[:,1], yhat, "-", alpha=0.7, color="red")
plt.plot(X[:,1], yhat2, "-", alpha=0.7, color="green")

In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]: