Introduction to Machine Learning with scikit-learn

Lab 3: Regression

The goal of this lab session is to discover a few regression tools from scikit-learn. As in the classification lab, we start with generated data in order to get a more visual picture of the results.


In [2]:
%matplotlib inline
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

Generating the data

As we did during the classification lab, we generate some data points uniformly using np.random.rand. In this example, we generate a set of points $x \in [0, 1]$ and their images $y$ under the function $f$ defined by $$ y = f(x) = \cos \left( \dfrac{3\pi}{2} x \right)$$ which is implemented below as my_function.

We also add some Gaussian noise (np.random.randn), centered at 0 and scaled by a factor 0.1.


In [3]:
import numpy as np

np.random.seed(0)

n_samples = 30

def my_function(X):
    return np.cos(1.5 * np.pi * X)
    
X = np.random.rand(n_samples)
y = my_function(X) + np.random.randn(n_samples) * 0.1

In [4]:
X.shape


Out[4]:
(30,)

Let us also generate the testing points needed to plot the result of our prediction. In this case, we generate 100 points uniformly spread between 0 and 1 using np.linspace.


In [5]:
X_test = np.linspace(0, 1, 100)

In [6]:
X_test


Out[6]:
array([ 0.        ,  0.01010101,  0.02020202,  0.03030303,  0.04040404,
        0.05050505,  0.06060606,  0.07070707,  0.08080808,  0.09090909,
        0.1010101 ,  0.11111111,  0.12121212,  0.13131313,  0.14141414,
        0.15151515,  0.16161616,  0.17171717,  0.18181818,  0.19191919,
        0.2020202 ,  0.21212121,  0.22222222,  0.23232323,  0.24242424,
        0.25252525,  0.26262626,  0.27272727,  0.28282828,  0.29292929,
        0.3030303 ,  0.31313131,  0.32323232,  0.33333333,  0.34343434,
        0.35353535,  0.36363636,  0.37373737,  0.38383838,  0.39393939,
        0.4040404 ,  0.41414141,  0.42424242,  0.43434343,  0.44444444,
        0.45454545,  0.46464646,  0.47474747,  0.48484848,  0.49494949,
        0.50505051,  0.51515152,  0.52525253,  0.53535354,  0.54545455,
        0.55555556,  0.56565657,  0.57575758,  0.58585859,  0.5959596 ,
        0.60606061,  0.61616162,  0.62626263,  0.63636364,  0.64646465,
        0.65656566,  0.66666667,  0.67676768,  0.68686869,  0.6969697 ,
        0.70707071,  0.71717172,  0.72727273,  0.73737374,  0.74747475,
        0.75757576,  0.76767677,  0.77777778,  0.78787879,  0.7979798 ,
        0.80808081,  0.81818182,  0.82828283,  0.83838384,  0.84848485,
        0.85858586,  0.86868687,  0.87878788,  0.88888889,  0.8989899 ,
        0.90909091,  0.91919192,  0.92929293,  0.93939394,  0.94949495,
        0.95959596,  0.96969697,  0.97979798,  0.98989899,  1.        ])

With this, we can plot $f$ and the generated points.


In [7]:
import matplotlib.pyplot as plt
plt.plot(X_test, my_function(X_test), label="True function")
plt.scatter(X, y, label="Samples")
plt.legend(loc="best")


Out[7]:
<matplotlib.legend.Legend at 0x7ff2a403ebd0>

Note that the samples are not on the function curve because of the additive noise.

Linear regression

Recall that the linear regression with a squared loss aims at finding the solution of the following optimization problem: $$\min_w \sum_i \left(w^T \phi\left(x^{(i)}\right) - y^{(i)}\right)^2$$ Linear regression is implemented in sklearn.linear_model.LinearRegression.
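As a quick illustration of this objective before turning to sklearn: with the trivial feature map $\phi(x) = [1, x]$, the minimiser has a closed-form least-squares solution, which np.linalg.lstsq computes directly (a minimal numpy sketch on the data generated above, not part of the lab's sklearn workflow):

# Design matrix with a constant column for the intercept: phi(x) = [1, x]
Phi = np.column_stack([np.ones_like(X), X])        # shape (30, 2)
w = np.linalg.lstsq(Phi, y)[0]                     # minimises sum_i (w^T phi(x_i) - y_i)^2
print("intercept = {:.3f}, slope = {:.3f}".format(w[0], w[1]))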


In [8]:
from sklearn.linear_model import LinearRegression
linear_regression = LinearRegression()

In [9]:
my_regression = linear_regression.fit(X[:, np.newaxis], y)
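The fitted slope and intercept can be inspected through the standard coef_ and intercept_ attributes of LinearRegression:

print("coef_ = {}, intercept_ = {:.3f}".format(my_regression.coef_, my_regression.intercept_))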

In [10]:
X_test = np.linspace(0, 1, 100)
plt.plot(X_test, my_regression.predict(X_test[:, np.newaxis]), label="Model")
# my_regression.predict(X_test[:, np.newaxis]) is y_test, the prediction for X_test
plt.plot(X_test, my_function(X_test), label="True function")
plt.scatter(X, y, label="Samples")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim((0, 1))
plt.ylim((-2, 2))
plt.legend(loc="best")


Out[10]:
<matplotlib.legend.Legend at 0x7ff295e13e10>

Defining polynomial features

The previous example is a simple linear regression and, as we can see, the trained model fits the samples (and $f$) quite poorly. As we have seen during the lectures, we can use polynomial features to obtain a more accurate model.

Formally, it consists in defining a polynomial kernel $\phi$ of degree $d$ such that, for each $x \in [0, 1]$, we have $$\phi(x) = \left[1, x, x^2, \dots, x^d\right]$$ In sklearn, this can be achieved with the sklearn.preprocessing.PolynomialFeatures class:
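For instance, with degree $d = 3$ and $x = 0.5$, the transform returns $[1, 0.5, 0.25, 0.125]$ (a tiny self-contained check):

from sklearn.preprocessing import PolynomialFeatures

# phi(0.5) with d = 3: [1, 0.5, 0.5**2, 0.5**3]
print(PolynomialFeatures(degree=3).fit_transform([[0.5]]))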


In [11]:
from sklearn.preprocessing import PolynomialFeatures
polynomial_features = PolynomialFeatures(degree=1)
polynomial_X = polynomial_features.fit_transform(X[:, np.newaxis])

Note that here we use np.newaxis to turn X into a column array of single-element rows, the 2-D shape expected by scikit-learn. Applying polynomial_features.fit_transform to the 1-D X directly will not lead to the desired result (try it yourself).
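Equivalently, X.reshape(-1, 1) produces the same (30, 1) column array; both turn the 1-D vector into the 2-D shape scikit-learn expects:

# Both expressions give the same (30, 1) column array
print(np.array_equal(X[:, np.newaxis], X.reshape(-1, 1)))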


In [12]:
X[:, np.newaxis]


Out[12]:
array([[ 0.5488135 ],
       [ 0.71518937],
       [ 0.60276338],
       [ 0.54488318],
       [ 0.4236548 ],
       [ 0.64589411],
       [ 0.43758721],
       [ 0.891773  ],
       [ 0.96366276],
       [ 0.38344152],
       [ 0.79172504],
       [ 0.52889492],
       [ 0.56804456],
       [ 0.92559664],
       [ 0.07103606],
       [ 0.0871293 ],
       [ 0.0202184 ],
       [ 0.83261985],
       [ 0.77815675],
       [ 0.87001215],
       [ 0.97861834],
       [ 0.79915856],
       [ 0.46147936],
       [ 0.78052918],
       [ 0.11827443],
       [ 0.63992102],
       [ 0.14335329],
       [ 0.94466892],
       [ 0.52184832],
       [ 0.41466194]])

So far, we have only created features of degree 1, which does not change the data much. However, the shape of the polynomial features has changed:


In [13]:
polynomial_X.shape


Out[13]:
(30, 2)

In [14]:
X[:, np.newaxis].shape


Out[14]:
(30, 1)

Let's take a closer look at what is inside polynomial_X.


In [15]:
polynomial_X


Out[15]:
array([[ 1.        ,  0.5488135 ],
       [ 1.        ,  0.71518937],
       [ 1.        ,  0.60276338],
       [ 1.        ,  0.54488318],
       [ 1.        ,  0.4236548 ],
       [ 1.        ,  0.64589411],
       [ 1.        ,  0.43758721],
       [ 1.        ,  0.891773  ],
       [ 1.        ,  0.96366276],
       [ 1.        ,  0.38344152],
       [ 1.        ,  0.79172504],
       [ 1.        ,  0.52889492],
       [ 1.        ,  0.56804456],
       [ 1.        ,  0.92559664],
       [ 1.        ,  0.07103606],
       [ 1.        ,  0.0871293 ],
       [ 1.        ,  0.0202184 ],
       [ 1.        ,  0.83261985],
       [ 1.        ,  0.77815675],
       [ 1.        ,  0.87001215],
       [ 1.        ,  0.97861834],
       [ 1.        ,  0.79915856],
       [ 1.        ,  0.46147936],
       [ 1.        ,  0.78052918],
       [ 1.        ,  0.11827443],
       [ 1.        ,  0.63992102],
       [ 1.        ,  0.14335329],
       [ 1.        ,  0.94466892],
       [ 1.        ,  0.52184832],
       [ 1.        ,  0.41466194]])

By default, PolynomialFeatures adds a constant column equal to 1. It corresponds to the intercept (the degree-0 term) of the model we are training.
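If this constant column is not wanted, for example because LinearRegression already fits an intercept by default (its fit_intercept parameter), it can be dropped with the include_bias=False option of PolynomialFeatures:

# Without the bias column, only the x column remains: shape (30, 1)
print(PolynomialFeatures(degree=1, include_bias=False).fit_transform(X[:, np.newaxis]).shape)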

Let's now define another linear regression and train it on the augmented features $\phi(x) = \left[1, x, x^2\right]$.


In [16]:
linear_regression = LinearRegression()
polynomial_features = PolynomialFeatures(degree=2)
polynomial_X = polynomial_features.fit_transform(X[:, np.newaxis])
my_regression = linear_regression.fit(polynomial_X, y)

And display the result:


In [17]:
X_test = np.linspace(0, 1, 100)
polynomial_X_test = polynomial_features.fit_transform(X_test[:, np.newaxis])
plt.plot(X_test, my_regression.predict(polynomial_X_test), label="Model")
plt.plot(X_test, my_function(X_test), label="True function")
plt.scatter(X, y, label="Samples")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim((0, 1))
plt.ylim((-2, 2))
plt.legend(loc="best")


Out[17]:
<matplotlib.legend.Legend at 0x7ff295ce9c10>

Note that we need to apply $\phi$ to X_test as well, because my_regression has been trained on polynomial features. (Since PolynomialFeatures does not learn anything from the data, calling fit_transform on the test points gives the same result as transform would.)

As we can see, a second order kernel gives a much better fit than a simple linear regression.

Putting everything together with pipelines

So far, our regression consists of 2 different steps:

  • A polynomial transformation of X, which returns polynomial_X
  • A linear regression fitted on polynomial_X and y

It is possible to write this in a more compact way with a Pipeline object, as follows:
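For instance, the degree-2 model trained manually above could be written as a single two-step pipeline; fit chains the transformer and the regressor, and predict chains transform and predict (a minimal sketch):

from sklearn.pipeline import Pipeline

degree2_pipeline = Pipeline([("polynomial_features", PolynomialFeatures(degree=2)),
                             ("linear_regression", LinearRegression())])
degree2_pipeline.fit(X[:, np.newaxis], y)
y_test_pred = degree2_pipeline.predict(X_test[:, np.newaxis])   # no manual fit_transform needed

The cell below builds the same kind of pipeline for several degrees and evaluates each of them with cross-validation.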


In [18]:
from sklearn.pipeline import Pipeline
from sklearn.cross_validation import cross_val_score
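# Note: in scikit-learn >= 0.18, cross_val_score lives in sklearn.model_selection
# and the scoring string below becomes "neg_mean_squared_error".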

degrees = [1, 4, 15]
plt.figure(figsize=(14, 5))

for i in range(len(degrees)):
    ax = plt.subplot(1, len(degrees), i + 1)
    plt.setp(ax, xticks=(), yticks=())

    polynomial_features = PolynomialFeatures(degree=degrees[i])
    linear_regression = LinearRegression()
    pipeline = Pipeline([("polynomial_features", polynomial_features),
                         ("linear_regression", linear_regression)])
    pipeline.fit(X[:, np.newaxis], y)

    # Evaluate the model using cross-validation
    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
                             scoring="mean_squared_error", cv=10)

    X_test = np.linspace(0, 1, 100)
    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    plt.plot(X_test, my_function(X_test), label="True function")
    plt.scatter(X, y, label="Samples")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.xlim((0, 1))
    plt.ylim((-2, 2))
    plt.legend(loc="best")
    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
        degrees[i], -scores.mean(), scores.std()))
plt.show()


Regularizing the linear regression

The Linear regression we have been working with solves the following problem: $$\min_w \sum_i \left(w^T \phi\left(x^{(i)}\right) - y^{(i)}\right)^2$$

We can add a regularization term to the optimization problem in order to penalise large coefficients in $w$ and, as a consequence, avoid overfitting: $$\min_w \sum_i \left(w^T \phi\left(x^{(i)}\right) - y^{(i)}\right)^2 + \alpha ||w||_2^2$$

Note that this leads to adding an extra parameter, $\alpha$, to the model. As we will see in the following, this parameter has a big impact on how much we overfit or underfit the training data.
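To get a feel for this effect, here is a minimal sketch (reusing the X and y generated earlier) that fits ridge models on degree-15 polynomial features for a few values of $\alpha$ and looks at the size of the learned coefficients:

from sklearn.linear_model import Ridge

phi_15 = PolynomialFeatures(degree=15).fit_transform(X[:, np.newaxis])
for alpha in [1e-4, 1e-1, 1e2]:
    model = Ridge(alpha=alpha).fit(phi_15, y)
    # larger alpha shrinks the weights more aggressively
    print("alpha = {:g}  max |w| = {:.3g}".format(alpha, np.abs(model.coef_).max()))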


In [19]:
from sklearn.linear_model import Ridge

In [20]:
polynomial_features = PolynomialFeatures(degree=15)
polynomial_X = polynomial_features.fit_transform(X[:, np.newaxis])

ridge_regression = Ridge(alpha = .1)
my_regression = ridge_regression.fit(polynomial_X, y)

In [21]:
X_test = np.linspace(0, 1, 100)
X_test_polynomial = polynomial_features.fit_transform(X_test[:, np.newaxis])
plt.plot(X_test, my_regression.predict(X_test_polynomial), label="Model")
plt.plot(X_test, my_function(X_test), label="True function")
plt.scatter(X, y, label="Samples")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim((0, 1))
plt.ylim((-2, 2))
plt.legend(loc="best")
plt.title("Ridge regression")


Out[21]:
<matplotlib.text.Text at 0x7ff295975810>

Support vector regression

Support vector regression (sklearn.svm.SVR in sklearn) is another way to train a regression model. The difference from the least-squares linear regression we have used previously lies in the loss function: $$\min_w \sum_i \ell \left(w^T \phi\left(x^{(i)}\right), y^{(i)}\right)$$ For ordinary least squares, we had $$\ell \left(w^T \phi\left(x^{(i)}\right), y^{(i)}\right) = \left(w^T \phi\left(x^{(i)}\right) - y^{(i)}\right)^2.$$ For SVR, we have $$\ell \left(w^T \phi\left(x^{(i)}\right), y^{(i)}\right) = \begin{cases} 0, & \text{if } -\epsilon \leq w^T \phi\left(x^{(i)}\right) - y^{(i)} \leq +\epsilon\\ \left| w^T \phi\left(x^{(i)}\right) - y^{(i)} \right| - \epsilon, & \text{otherwise} \end{cases}$$ With this in mind, we can apply SVR by choosing an appropriate value for $\epsilon$.
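For illustration, this $\epsilon$-insensitive loss is a one-liner in numpy (a small sketch, not part of sklearn itself):

def epsilon_insensitive_loss(prediction, y, epsilon):
    # zero inside the epsilon-tube, absolute error minus epsilon outside it
    return np.maximum(0.0, np.abs(prediction - y) - epsilon)

# e.g. with epsilon = 0.1: an error of 0.05 costs nothing, an error of 0.3 costs 0.2
print(epsilon_insensitive_loss(np.array([0.05, 0.3]), np.array([0.0, 0.0]), 0.1))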


In [22]:
from sklearn.svm import SVR

In [23]:
epsilon = .1
sv_regression = SVR(epsilon = epsilon)
my_svr = sv_regression.fit(polynomial_X, y)

Note that choosing $\epsilon$ with an order of magnitude close to that of the noise makes sense: errors smaller than the noise level are then not penalized.


In [24]:
plt.plot(X_test, my_svr.predict(X_test_polynomial), label="Model")
plt.plot(X_test, my_function(X_test), label="True function")
plt.scatter(X, y, label="Samples")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim((0, 1))
plt.ylim((-2, 2))
plt.legend(loc="best")
plt.title("Support vector regression")


Out[24]:
<matplotlib.text.Text at 0x7ff2959e3150>

Applying regression to standard datasets

As for classification, sklearn provides standard datasets to evaluate models. These datasets can be found here: http://scikit-learn.org/stable/datasets/. Two datasets are suited for regression problems:

  • The Boston house-prices dataset
  • The diabetes dataset (see the short sketch after this list)
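Only the Boston dataset is used below, but the diabetes dataset loads in exactly the same way (a quick sketch):

from sklearn import datasets

diabetes = datasets.load_diabetes()
print("{} {}".format(diabetes.data.shape, diabetes.target.shape))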

In [25]:
from sklearn import datasets
boston = datasets.load_boston()

In [26]:
print boston.DESCR


Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
http://archive.ics.uci.edu/ml/datasets/Housing


This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.   
     
**References**

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
   - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)


In [27]:
boston.feature_names


Out[27]:
array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], 
      dtype='|S7')

In [28]:
print boston.data.shape, boston.target.shape


(506, 13) (506,)

In [29]:
polynomial_features = PolynomialFeatures(degree=3)
polynomial_X = polynomial_features.fit_transform(boston.data)

alpha = .1
ridge_regression = Ridge(alpha = alpha)
my_regression = ridge_regression.fit(polynomial_X, boston.target)


/usr/lib/python2.7/dist-packages/sklearn/linear_model/ridge.py:154: UserWarning: Singular matrix in solving dual problem. Using least-squares solution instead.
  warnings.warn("Singular matrix in solving dual problem. Using "

In [30]:
predicted_prices = my_regression.predict(polynomial_X)

In [31]:
predicted_prices


Out[31]:
array([ 26.36704219,  19.55198776,  29.91913025,  27.58576051,
        30.20720795,  22.45225765,  20.51734097,  23.38970339,
        11.03211485,  22.37648014,  22.06518311,  20.62910928,
        18.09680136,  18.66638972,  18.83215006,  19.03623538,
        25.99380016,  16.61780972,  21.42860504,  16.79217152,
        12.53882666,  16.55622058,  13.57093328,  13.85465875,
        16.23855154,  20.1153093 ,  17.68897276,  18.23456513,
        18.91680237,  19.91264716,  13.06662599,  19.28156197,
        10.47224167,  14.46123079,   8.53410475,  21.76252314,
        21.96864363,  23.64630343,  27.36457736,  29.96648856,
        29.82212581,  28.81710074,  24.38096567,  24.84590611,
        21.51211765,  19.30458635,  20.52834177,  19.85501932,
        15.80274397,  19.81995932,  20.45155669,  19.30233042,
        23.62092475,  19.68076685,  19.84001308,  40.19854078,
        28.42698782,  33.49547267,  20.85081092,  19.53438047,
        20.66694248,  17.52711668,  18.0950959 ,  31.56429407,
        32.61111729,  22.42930182,  19.19917424,  22.53158226,
        17.77150707,  19.25710535,  22.22553885,  19.64279456,
        25.85162283,  20.43572433,  23.6994007 ,  20.53025712,
        22.35483333,  19.90255911,  17.27248355,  20.22464258,
        28.71132788,  22.98245278,  24.815175  ,  21.96741591,
        24.06181907,  26.94648309,  22.60981834,  24.67319218,
        24.09112103,  27.77418972,  22.56045025,  21.39078454,
        27.0066481 ,  29.54616545,  20.74156638,  29.53277916,
        21.20939569,  38.2077444 ,  46.62396996,  33.13025645,
        22.91350695,  25.88604717,  19.03540211,  18.91440436,
        19.76906891,  18.10157533,  17.58074362,  19.32854068,
        19.21386449,  17.3904531 ,  24.62349015,  21.8864836 ,
        20.46606068,  20.97897163,  20.79933097,  21.98129364,
        19.82830182,  18.6517876 ,  24.07924303,  19.72612283,
        18.48673605,  23.47035991,  19.59837366,  18.556496  ,
        18.88300629,  20.61314945,  16.03029084,  19.96227383,
        17.86863877,  14.82564091,  21.24434191,  18.49042144,
        21.17763234,  20.80074949,  17.51306806,  15.66433375,
        16.71242835,  18.37449734,  17.10775931,  14.18647509,
         7.73574424,  14.94110025,  14.8542705 ,  12.35472108,
        14.61248514,  16.49249894,  16.29897448,  15.52704315,
        20.12620984,  17.60131469,  16.47548295,  16.32204993,
        16.84586859,  21.32617572,  15.80239511,  15.53477577,
        13.97336421,  37.99221425,  28.36481514,  19.38895885,
        26.43699258,  54.92120606,  46.78435948,  55.42375016,
        20.90814063,  20.45109069,  50.65047743,  18.75013301,
        22.58065256,  24.50266875,  16.76205891,  21.09781134,
        24.1451902 ,  27.03338715,  24.29225659,  33.61516257,
        24.58070941,  23.051935  ,  29.74484335,  42.08147397,
        39.86096826,  29.48496045,  41.87204265,  38.07404471,
        19.72933788,  23.11653717,  52.27192459,  26.38849166,
        31.35524259,  33.09054459,  36.89281967,  26.19244596,
        36.53854235,  33.87684335,  32.68357887,  45.67963857,
        29.37311731,  28.14928792,  34.25377497,  35.33946144,
        32.37601706,  31.58282496,  39.0519933 ,  46.74215499,
        42.6985579 ,  21.62650268,  24.58943626,  21.38606179,
        25.3386259 ,  21.23118577,  23.90793395,  19.72216505,
        24.02675912,  26.20745153,  24.65602372,  24.6906656 ,
        22.20191208,  26.2949815 ,  18.61452598,  17.20740619,
        27.3599444 ,  18.44531784,  29.67503449,  27.53399214,
        45.19565933,  47.79567368,  40.83495968,  31.67007564,
        38.71539683,  30.79726985,  21.14942792,  32.52359445,
        44.0821268 ,  41.9634919 ,  27.15436374,  21.9915955 ,
        26.80435991,  30.31241042,  28.73353849,  26.71888465,
        27.885249  ,  18.70545409,  23.84118874,  30.34908201,
        14.72179165,  17.90676819,  23.07028337,  16.99031433,
        24.90388339,  27.58144926,  25.57689896,  24.31926499,
        25.43655419,  43.30099111,  20.11161019,  13.36291195,
        46.84473721,  48.7045751 ,  33.73001243,  30.35387624,
        39.01406875,  38.60636788,  42.89701667,  33.57870175,
        36.87253502,  22.9800227 ,  33.03786713,  57.17510073,
        47.20297717,  20.49046633,  13.72898013,  25.56077935,
        24.48554966,  38.42701495,  28.95341489,  26.81696409,
        33.25989303,  28.17279214,  25.45200896,  34.67399049,
        43.25977243,  32.17795783,  50.31476165,  52.2280544 ,
        37.83941997,  24.99668409,  15.54947949,  20.33080401,
        20.69664116,  25.58592666,  29.85688681,  40.78448041,
        30.78311919,  20.02381963,  22.82958548,  31.31973913,
        30.39377256,  23.29614987,  22.0533921 ,  33.10135004,
        22.8404539 ,  20.95001802,  24.59419067,  35.20580424,
        34.74936255,  28.36399469,  34.92271743,  30.35838917,
        26.01887832,  20.15536634,  14.22736838,  29.07626965,
        20.44870396,  21.79695147,  22.29256783,  16.94668179,
        20.84135314,  19.03621287,  22.55983703,  20.31121603,
        23.58882233,  23.09194053,  20.69503749,  17.97288499,
        24.14274579,  24.43376603,  22.49084862,  22.91250251,
        18.96962288,  24.32842171,  17.7363512 ,  11.97249271,
        21.44007529,  22.27030626,  23.57908829,  23.08824632,
        21.22664309,  20.99701043,  23.49240114,  22.25657686,
        20.98068461,  36.44318325,  13.01180267,  25.81971298,
        31.95505329,  17.93839128,  22.80642778,  25.54143666,
        28.96367549,  30.84345891,  20.81208571,  25.99607748,
        20.64962589,  28.9766028 ,  14.66814344,  17.27855892,
        12.82433487,  24.91863323,  20.24068725,  20.8845758 ,
        22.7595527 ,  17.85479508,  16.89134136,  27.4929414 ,
        22.3140136 ,  28.2207969 ,  18.82005039,  24.87873414,
        48.70863249,  51.67523845,  50.89163515,  43.16673221,
        39.75807406,  12.79482542,  13.10179554,  21.62630533,
         9.05164495,  13.449664  ,   4.91428818,  10.64105895,
        10.22480676,  11.52624313,  11.77290322,  11.77557467,
         8.92697748,   6.52615696,   8.10966868,   3.89889291,
        11.93771526,  13.8000838 ,  16.02140849,  17.76737724,
        12.32399857,  19.70482642,  14.82504462,  17.7746282 ,
        15.47283356,  14.77764258,   9.65071104,   9.84481191,
         4.62554796,  12.0929016 ,  14.31884042,   5.95967385,
        13.61623298,   4.09582305,  17.11591505,  30.10446318,
        16.5766646 ,  23.3690729 ,  14.11857593,  15.71927753,
        15.8456689 ,  16.40998332,   6.48178448,   5.62355957,
         8.59142834,   5.60577499,   9.32404811,   6.68714032,
        18.28966575,  17.53882532,  18.69735547,  17.17511537,
        13.30049189,  12.18397617,   5.22799535,  11.10730796,
        14.98092407,  11.36868086,  18.1460769 ,  18.39129384,
        15.3818608 ,  13.48904312,  14.58480358,  11.63334546,
        11.83835721,   5.97761695,  12.85804999,  11.70156877,
         5.39699956,  12.20824732,  15.29530412,  12.73923075,
        10.8005029 ,   5.27266507,  15.67547174,  13.08520456,
        12.24035574,  14.81861647,  10.72555385,  16.33759794,
        16.65052869,  19.2465996 ,  14.21769472,  14.29214106,
        13.17065047,  13.64502996,  13.09725231,  17.46587771,
        18.38902089,  20.05650994,  18.70837834,  20.89303485,
        21.37709862,  21.49266527,  17.33579026,  14.55441388,
        14.99214249,  24.17884821,  17.55810494,  19.39530281,
        21.82109969,  24.87350621,  14.02004885,  14.21941555,
        16.18102219,  10.29174728,  15.39040754,  21.36744092,
        22.17668332,  27.74124762,  30.14243635,  21.79826667,
        20.15783298,  19.23012671,  15.86167465,  24.90846543,
        13.57155475,   8.30067153,   6.79851764,  11.34744476,
        23.49154377,  21.50535665,  22.94479062,  26.80833424,
        20.32038881,  17.12252954,  18.42301886,  17.40526668,
        16.80326753,  22.52990716,  16.97306349,  23.26894376,
        21.09055205,  16.01534743])

In [32]:
from sklearn.cross_validation import train_test_split
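# Note: in scikit-learn >= 0.18, train_test_split lives in sklearn.model_selection.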
features_train, features_test, labels_train, labels_test = train_test_split(polynomial_X, boston.target, test_size = 0.5)

In [33]:
print features_train.shape, features_test.shape, labels_train.shape, labels_test.shape


(253, 560) (253, 560) (253,) (253,)

In [34]:
from sklearn import metrics

In [35]:
predicted_labels_train = my_regression.predict(features_train)
print "MSE on the train set: ", metrics.mean_squared_error(labels_train, predicted_labels_train)
predicted_labels_test = my_regression.predict(features_test)
print "MSE on the test set: ", metrics.mean_squared_error(labels_test, predicted_labels_test)


MSE on the train set:  8.46967866345
MSE on the test set:  8.89644223918
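Note that my_regression was fitted on the whole of polynomial_X above, so the points in features_test were already seen during training and the test MSE reported here is optimistic. A minimal sketch of a cleaner protocol, refitting the ridge model (reusing alpha = 0.1 from above) on the training split only:

from sklearn.linear_model import Ridge
from sklearn import metrics

ridge_on_train = Ridge(alpha=0.1).fit(features_train, labels_train)
print("MSE on the train set: {:.3f}".format(
    metrics.mean_squared_error(labels_train, ridge_on_train.predict(features_train))))
print("MSE on the held-out test set: {:.3f}".format(
    metrics.mean_squared_error(labels_test, ridge_on_train.predict(features_test))))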

Feel free to tune the parameters manually and see which ones give the best results.