Linear Regression

This tutorial shows how simple it is to implement linear regression in Python.

A regression fits a function (in the linear case, a line) to given data points. To do this, a reconstruction cost is minimized such that for every datapoint x the squared distance between its target value y and the function value f(x) of the fitted curve is minimal.

1) We need some data, so create an array x with shape (100, 1) that contains 100 evenly spaced numbers from -5 to 5.

2) Seed NumPy's random number generator with 42.

3) Now add Gaussian random noise with a standard deviation of 1 to x and store the result in an array y.

4) Plot the datapoints as yellow dots in the range $x\in[-10,10]$ and $y\in[-10,10]$.


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# your code here

In [ ]:
# our solution
from solutions import *
decrypt_solution(solution_regression_1, 'foo')
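
If you get stuck, here is a minimal sketch of one possible approach (our own take, using np.linspace; the encrypted solution above may differ in details):

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)                           # 2) seed NumPy's RNG with 42
x = np.linspace(-5, 5, 100).reshape(100, 1)  # 1) shape (100, 1), values from -5 to 5
y = x + np.random.randn(100, 1)              # 3) Gaussian noise with standard deviation 1
plt.plot(x, y, 'y.')                         # 4) yellow dots
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.show()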

A linear regression for the datapoint matrix $X$ ($D \times N$: $D$ datapoints, $N$ input dimensions) and the target matrix $Y$ ($D \times M$: $D$ datapoints, $M$ output dimensions) is defined as:

$\min_A \; \left\langle \frac{1}{2}\left\Vert A^\top\vec{x}-\vec{y}\right\Vert^2 \right\rangle_d = \min_A \; \frac{1}{2D}\sum_{d=1}^{D} \sum_{j=1}^{M} \left(\sum_{i=1}^{N} a_{ij}\,x_{di}-y_{dj}\right)^2$

where $\langle \cdot \rangle_d$ is the average over the training data and $A$ is the $N \times M$ weight matrix.

We ignore the bias value for now!

1) Set the derivative to zero and solve the equation for $A$ to get the optimum $A^*$. (Hint: if you have trouble with the closed-form solution for arbitrary dimensions, first solve the equation for 1D input and output; see the derivation sketch after this list.)

2) Now plot the result.
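
For reference, here is a sketch of the derivation for the 1D case (one input and one output dimension, so $A$ reduces to a scalar $a$); the matrix case follows the same steps:

$\frac{\partial}{\partial a} \frac{1}{2D}\sum_d \left(a\,x_d - y_d\right)^2 = \frac{1}{D}\sum_d \left(a\,x_d - y_d\right) x_d \overset{!}{=} 0 \quad \Rightarrow \quad a^* = \frac{\sum_d x_d\,y_d}{\sum_d x_d^2}$

In matrix notation the analogous calculation yields $A^* = \left(X^\top X\right)^{-1} X^\top Y$.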


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# your code here

In [ ]:
# our solution
from solutions import *
decrypt_solution(solution_regression_2, 'foo')
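
Again, a minimal sketch of one possible approach, reusing x and y from the first exercise; solving the normal equations with np.linalg.solve instead of inverting explicitly is a choice of ours, not necessarily the official solution:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

# normal equations: A* = (X^T X)^{-1} X^T Y
A = np.linalg.solve(x.T @ x, x.T @ y)  # solve instead of computing the inverse
plt.plot(x, y, 'y.')                   # the noisy datapoints
plt.plot(x, x @ A, 'b-')               # fitted line (no bias, passes through the origin)
plt.show()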

A common way to integrate a bias value in many machine learning methods is to add an extra input dimension that is constant one for all datapoints!

1) Modify the code by adding a second, constant dimension to x, and add 10 to y to shift the datapoints vertically.

2) Now plot the result in the range $x\in[-10,10]$ and $y\in[0,20]$. Notice that you have to select the first dimension of x in order not to plot the constant dimension!


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# your code here

In [ ]:
# our solution
from solutions import *
decrypt_solution(solution_regression_3, 'foo')
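
A minimal sketch of one possible approach, assuming x and y still have the shapes from the first exercise (note that running the cell twice would append the constant dimension again):

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

x = np.hstack([x, np.ones((100, 1))])  # 1) append a constant-one dimension ...
y = y + 10                             # ... and shift the targets up by 10
A = np.linalg.solve(x.T @ x, x.T @ y)  # A now contains slope and bias
plt.plot(x[:, 0], y, 'y.')             # 2) plot only the first (non-constant) dimension
plt.plot(x[:, 0], x @ A, 'b-')
plt.xlim(-10, 10)
plt.ylim(0, 20)
plt.show()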

By using a polynomial expansion of x we can fit a polynomial to the data.

Fit a polynomial of degree 5 to the data, using the new targets y defined in the next cell.


In [ ]:
y = np.cos(x[:, 0]) + np.random.randn(100)*0.5  # new targets: cosine of x plus Gaussian noise (std 0.5)

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# your code here

In [ ]:
# our solution
from solutions import *
decrypt_solution(solution_regression_4, 'foo')
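
A minimal sketch of one possible approach; building the expansion with np.vander is our choice, and the encrypted solution may construct it differently:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

x1 = x[:, 0]                           # the non-constant input dimension
X = np.vander(x1, 6, increasing=True)  # polynomial expansion: columns x^0 ... x^5
A = np.linalg.solve(X.T @ X, X.T @ y)  # same closed-form solution as before
plt.plot(x1, y, 'y.')
plt.plot(x1, X @ A, 'b-')
plt.show()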

Now perform the same fit using NumPy's built-in least-squares polynomial fit, np.polyfit(x, y, 5). Notice that x and y have to be 1D arrays here!


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# your code here

In [ ]:
# our solution
from solutions import *
decrypt_solution(solution_regression_5, 'foo')
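
A minimal sketch using np.polyfit and evaluating the result with np.polyval:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

x1 = x[:, 0]                                # np.polyfit expects 1D arrays
coeffs = np.polyfit(x1, y, 5)               # degree-5 least-squares polynomial fit
plt.plot(x1, y, 'y.')
plt.plot(x1, np.polyval(coeffs, x1), 'b-')  # evaluate the fitted polynomial
plt.show()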