EECS 445: Machine Learning

Hands On 10: Bias-Variance Tradeoff

Consider a sequence of IID random variables: $$ X_i = \begin{cases} 100 & \text{ with prob. } 0.02 \\ 0 & \text{ with prob. } 0.97 \\ -100 & \text{ with prob. } 0.01 \end{cases} $$ The true mean of $X_i$ is $$ 0.02 \times 100 + 0.97 \times 0 + 0.01 \times (-100) = 1. $$
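
As a quick sanity check (an addition, not part of the original handout), we can simulate this distribution and confirm that the empirical mean of many samples lands near 1:


In [ ]:
import numpy as np

rng = np.random.default_rng(445)
samples = rng.choice([100, 0, -100], size=1_000_000, p=[0.02, 0.97, 0.01])
print(samples.mean())  # should be close to the true mean of 1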

We want to estimate the true mean of this distribution. We will consider two different estimators of the true mean. Let's say you take three samples $X_1, X_2, X_3$, and you compute the empirical mean $Z=\frac{X_1 + X_2 + X_3}{3}$ and empirical median $Y$ of these three samples (recall that the median is obtained by sorting $X_1, X_2, X_3$ and then choosing the middle (2nd) entry).

What is the bias-variance tradeoff of $Y$ and $Z$ as estimators of the true mean of the above distribution? (The short simulation after the answer choices can help you check.)

  • They are both unbiased estimators of the true mean, and have the same variance.
  • The median has higher bias and higher variance.
  • The mean has higher bias and higher variance.
  • They both have no bias, but the mean has lower variance.
  • The mean has no bias but some variance, and the median has non-zero bias but less variance.
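
To check your answer empirically, here is a short simulation (again an addition to the handout) that draws many triples $X_1, X_2, X_3$ and estimates the bias and variance of the empirical mean $Z$ and the empirical median $Y$:


In [ ]:
import numpy as np

rng = np.random.default_rng(0)
trials = 200_000
X = rng.choice([100, 0, -100], size=(trials, 3), p=[0.02, 0.97, 0.01])
Z = X.mean(axis=1)        # empirical mean of each triple
Y = np.median(X, axis=1)  # empirical median of each triple
true_mean = 1
print('mean:   bias = %.3f, variance = %.1f' % (Z.mean() - true_mean, Z.var()))
print('median: bias = %.3f, variance = %.1f' % (Y.mean() - true_mean, Y.var()))

You should see that the mean is (essentially) unbiased with variance near $\mathrm{Var}(X_i)/3 \approx 99.7$, while the median is biased toward 0 but has much smaller variance.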

Activity 1: Bias-Variance Tradeoff

We will now try to see the inherent tradeoff between the bias and variance of estimators through linear regression. Consider the following dataset.


In [2]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
degrees = [1,2,3,4,5]


#define data
n = 20        # points per sampled dataset
sub = 1000    # number of resampled datasets per polynomial degree
mean = 0      # noise mean
std = 0.25    # noise standard deviation

#define test set
Xtest = np.random.random((n,1))*2*np.pi
ytest = np.sin(Xtest) + np.random.normal(mean,std,(n,1))

#pre-allocate variables
preds = np.zeros((n,sub))
bias = np.zeros(len(degrees))
variance = np.zeros(len(degrees))
mse = np.zeros(len(degrees))
values = np.expand_dims(np.linspace(0,2*np.pi,100),1)  # grid for plotting fitted curves
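
If you haven't seen PolynomialFeatures before, this small added example (not from the original handout) shows the design matrix it builds: each input value $x$ becomes a row $[1, x, x^2, \dots, x^d]$.


In [ ]:
demo = PolynomialFeatures(degree=3)
print(demo.fit_transform(np.array([[2.0]])))  # [[1. 2. 4. 8.]]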

Let's try several polynomial fits to the data:


In [ ]:
for j,degree in enumerate(degrees):
    
    for i in range(sub):
            
        #create data - sample from sine wave     
        x = np.random.random((n,1))*2*np.pi
        y = np.sin(x) + np.random.normal(mean,std,(n,1))
        
        poly = PolynomialFeatures(degree=degree)

        
        #TODO
        #create features corresponding to degree - ex: 1, x, x^2, x^3...
        A = 
        
        #TODO:
        #fit model using the least-squares solution (linear regression)
        #later: extend to ridge regression/regularization
        coeffs = 
                
        #store predictions for each sampling
        preds[:,i] = poly.fit_transform(Xtest).dot(coeffs)[:,0]
        
        #plot the fits from the first 9 samplings
        if i < 9:
            plt.subplot(3,3,i+1)
            plt.plot(values,poly.fit_transform(values).dot(coeffs),x,y,'.b')

    plt.axis([0,2*np.pi,-2,2])
    plt.suptitle('PolyFit = %i' % (degree))
    plt.show()

    #TODO
    #Calculate mean bias, variance, and MSE (UNCOMMENT CODE BELOW!)
    #One possible completion is sketched after this cell.
    #bias[j] = 
    #variance[j] = 
    #mse[j] =
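
If you want to check your work, here is one possible completion of the TODOs (a sketch under our own choices, not the official solution): it re-runs the experiment without the plotting, uses np.linalg.lstsq for the least-squares fit, and measures bias as the average absolute gap between the mean prediction and the noiseless sine curve; other reasonable definitions of "bias" exist.


In [ ]:
for j,degree in enumerate(degrees):
    poly = PolynomialFeatures(degree=degree)
    for i in range(sub):
        #sample a fresh training set from the noisy sine wave
        x = np.random.random((n,1))*2*np.pi
        y = np.sin(x) + np.random.normal(mean,std,(n,1))
        A = poly.fit_transform(x)                      # design matrix [1, x, ..., x^degree]
        coeffs = np.linalg.lstsq(A, y, rcond=None)[0]  # least-squares coefficients
        preds[:,i] = poly.fit_transform(Xtest).dot(coeffs)[:,0]
    mean_pred = preds.mean(axis=1)                              # average prediction per test point
    bias[j] = np.mean(np.abs(mean_pred - np.sin(Xtest)[:,0]))  # avg |bias| vs. the noiseless sine
    variance[j] = np.mean(preds.var(axis=1))                    # avg variance across the samplings
    mse[j] = np.mean((preds - ytest)**2)                        # avg squared error vs. noisy labels

With this setup you should typically see the bias fall and the variance rise as the degree grows, with the MSE smallest at an intermediate degree.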

Let's plot the bias, variance, and MSE as a function of polynomial degree!


In [ ]:
plt.subplot(3,1,1)
plt.plot(degrees,bias)
plt.title('bias')
plt.subplot(3,1,2)
plt.plot(degrees,variance)
plt.title('variance')
plt.subplot(3,1,3)
plt.plot(degrees,mse)
plt.title('MSE')
plt.xlabel('polynomial degree')
plt.tight_layout()  # keep the subplot titles from overlapping
plt.show()