Linear Regression

This tutorial shows how simple it is to implement a linear regression in Python.

Regression is about fitting a function (in the linear case, a line) to given data points. To do so, we minimize the reconstruction cost: the squared distance between each datapoint's target value y and the function value f(x) of the fitted curve, averaged over all datapoints x.

1) We need some data, so create an array x with shape (100,1) that contains the numbers from -5 to 5.

2) Seed the random number generator of NumPy with 42.

3) Now add Gaussian random noise with a standard deviation of 1 to x and store the result in an array y.

4) Plot the datapoints as yellow dots in the range $x\in[-10,10]$ and $y\in[-10,10]$.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
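One possible way to carry out the four steps above is sketched below. It assumes that "the numbers from -5 to 5" means 100 evenly spaced values (hence `np.linspace`) and that seeding NumPy's global generator with `np.random.seed` is what is intended. Neither choice is prescribed by the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# 1) 100 evenly spaced inputs from -5 to 5 as a column of shape (100, 1)
#    (even spacing is an assumption, the exercise only fixes range and shape)
x = np.linspace(-5, 5, 100).reshape(100, 1)

# 2) Seed NumPy's global random number generator so the noise is reproducible
np.random.seed(42)

# 3) Targets: the inputs plus Gaussian noise with standard deviation 1
y = x + np.random.randn(100, 1)

# 4) Yellow dots in the window [-10, 10] x [-10, 10]
plt.plot(x, y, "yo")
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```

Because y is just x plus noise, the points should scatter around the diagonal, so a fitted line is expected to have a slope close to 1.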
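A linear regression for datapoint matrix $X$ ($D \times N$, D datapoints and N input dimensions), target matrix $Y$ ($D \times M$, D datapoints and M output dimensions) and weight matrix $A$ ($N \times M$) is defined as:

$\min_A \left\langle \frac{1}{2}\left\lVert A^T\vec{x}_d-\vec{y}_d\right\rVert^2 \right\rangle_d = \min_A \frac{1}{2}\frac{1}{D}\sum_{d=1}^{D} \sum_{j=1}^{M} \left(\sum_{i=1}^{N} x_{di}\,a_{ij}-y_{dj}\right)^2$

where $\langle \cdot \rangle_d$ is the average over the training data. Note that the sum over the input dimensions $i$ sits inside the square, because each output $j$ is compared with the full prediction $\sum_i x_{di}\,a_{ij}$.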

We ignore the bias value for now!

1) Set the derivative to zero and solve the equation for $A$ to get the optimum $A^*$. (Hint: If you have problems with the closed-form solution for arbitrary dimensions, first solve the equation for 1D input and output.)

2) Now plot the result.
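
One way to arrive at the closed form is sketched here in matrix notation, under the assumption that $A$ has shape $N \times M$ so that the predictions for all datapoints are $XA$:

$E(A) = \frac{1}{2}\frac{1}{D}\left\lVert XA - Y\right\rVert_F^2, \qquad \frac{\partial E}{\partial A} = \frac{1}{D}X^T\left(XA - Y\right)$

Setting the derivative to zero gives $X^T X A^* = X^T Y$, and therefore $A^* = \left(X^T X\right)^{-1} X^T Y$ whenever $X^T X$ is invertible. For 1D input and output this reduces to $a^* = \sum_d x_d y_d \,/\, \sum_d x_d^2$.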

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
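A minimal sketch of the closed-form fit without bias follows. It assumes the data from the first exercise (regenerated here so the cell runs on its own) and solves the normal equations $X^T X A = X^T Y$ with `np.linalg.solve` rather than forming an explicit inverse. That is a design choice, not something the exercise demands.

```python
import numpy as np
import matplotlib.pyplot as plt

# Data as in the first exercise: X has D=100 rows and N=1 column, Y has M=1 column
np.random.seed(42)
x = np.linspace(-5, 5, 100).reshape(100, 1)
y = x + np.random.randn(100, 1)

# Normal equations (X^T X) A = X^T Y, solved for A of shape (N, M) = (1, 1)
A = np.linalg.solve(x.T @ x, x.T @ y)

# Predictions of the fitted line (no bias, so it passes through the origin)
y_fit = x @ A

plt.plot(x, y, "yo")         # noisy datapoints
plt.plot(x, y_fit, "b-")     # fitted line
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()
```

In the 1D case `A[0, 0]` is simply the slope of the line. For ill-conditioned problems `np.linalg.lstsq` is usually the safer call.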
A common way to integrate a bias value in many machine learning methods is to add an input dimension that is constant one for all datapoints!

1) Modify the code by adding a second, constant dimension to x, and add 10 to y to shift the datapoints vertically.

2) Now plot the result in the range $x\in[-10,10]$ and $y\in[0,20]$. Note that you have to select the first dimension of x so that the constant dimension is not plotted!

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
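The bias trick could look roughly like the sketch below. The name `X` for the augmented input matrix is an assumption made for illustration, and the data is regenerated so that the cell is self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.linspace(-5, 5, 100).reshape(100, 1)
y = x + np.random.randn(100, 1) + 10       # shift the targets up by 10

# Append a column of ones: the weight of that column acts as the bias
X = np.hstack([x, np.ones((100, 1))])      # shape (100, 2)

# Same closed form as before; A[0] is the slope, A[1] the bias
A = np.linalg.solve(X.T @ X, X.T @ y)      # shape (2, 1)
y_fit = X @ A

# Plot against the first input dimension only, not the constant column
plt.plot(X[:, 0], y[:, 0], "yo")
plt.plot(X[:, 0], y_fit[:, 0], "b-")
plt.xlim(-10, 10)
plt.ylim(0, 20)
plt.show()
```

Without the constant column the line would be forced through the origin and could not follow the shifted data.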
By using a polynomial expansion of x we can fit a polynomial to the data.

Fit a polynomial of degree 5 to the data:

y = np.cos(x[:,0])+np.random.randn(100)*0.5

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
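A sketch of the polynomial fit is given below, assuming the same x as in the earlier exercises. It builds the expanded inputs with `np.vander`, whose columns are the powers $x^5, x^4, \dots, x^0$ (the last column doubles as the bias), and uses `np.linalg.lstsq` instead of the explicit normal equations because high powers make $X^T X$ poorly conditioned. The name `X_poly` is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.linspace(-5, 5, 100).reshape(100, 1)
y = np.cos(x[:, 0]) + np.random.randn(100) * 0.5   # 1D targets, as given above

# Polynomial expansion: columns x^5, x^4, ..., x^1, x^0
X_poly = np.vander(x[:, 0], 6)                     # shape (100, 6)

# Least-squares solution of X_poly @ coeffs = y
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
y_fit = X_poly @ coeffs

plt.plot(x[:, 0], y, "yo")       # noisy cosine samples
plt.plot(x[:, 0], y_fit, "b-")   # fitted degree-5 polynomial
plt.show()
```

The model is still linear in its parameters. Only the inputs have been expanded, which is why the same least-squares machinery applies.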
Now do the same using NumPy's least-squares polynomial fitting function np.polyfit(x,y,5). Notice that x and y are 1D arrays here!

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
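For comparison, a short sketch with `np.polyfit` follows. It returns the coefficients with the highest power first, which matches the column order of `np.vander`, so the result can be cross-checked against an explicit least-squares solve. The variable names are assumptions for illustration.

```python
import numpy as np

np.random.seed(42)
x = np.linspace(-5, 5, 100)                  # 1D array this time
y = np.cos(x) + np.random.randn(100) * 0.5   # 1D targets

# Degree-5 least-squares fit; coefficients are ordered from x^5 down to x^0
coeffs = np.polyfit(x, y, 5)

# Evaluate the fitted polynomial at the inputs
y_fit = np.polyval(coeffs, x)

# Cross-check against the explicit polynomial expansion
X_poly = np.vander(x, 6)
coeffs_manual, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.allclose(coeffs, coeffs_manual, atol=1e-6))
```

Both routes solve the same least-squares problem, so the coefficients should agree up to numerical precision.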
