```
In [2]:
```import pandas as pd
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10, 10)

In the following you'll see the same code (without visualization) we wrote in the previous module for the regression model using both **TV** and **Newspaper** data, so it's nothing new, **except** for the part where we prepare our data. We'll be splitting the dataset into three parts now instead of two:

- **Training Set**: we'll train the model on this
- **Validation Set**: we'll tune hyperparameters on this (more on that later)
- **Test Set**: we'll evaluate our model on this

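As an aside, the same 60/20/20 split can also be sketched with scikit-learn's `train_test_split` applied twice (the toy arrays here are made-up stand-ins, not the Advertising data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the Advertising data: 200 rows, 2 features
rng = np.random.RandomState(42)
X = rng.rand(200, 2)
y = rng.rand(200)

# First carve off 40% as a temporary holdout, then split that in half:
# 60% training, 20% validation, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # → 120 40 40
```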
```
In [3]:
```def scale_features(X, scalar=None):
    if len(X.shape) == 1:
        X = X.reshape(-1, 1)
    if scalar is None:
        scalar = StandardScaler()
        scalar.fit(X)
    return scalar.transform(X), scalar

```
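A quick sanity check of this helper (re-stated here so the snippet runs on its own; the tiny arrays are made up for illustration). Fitting the scaler on training data and reusing it for new data is the important part:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_features(X, scalar=None):
    if len(X.shape) == 1:
        X = X.reshape(-1, 1)
    if scalar is None:
        scalar = StandardScaler()
        scalar.fit(X)
    return scalar.transform(X), scalar

# Fit on "training" data, then reuse the same fitted scaler on "new" data
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 20.0]])

X_train_scaled, scaler = scale_features(X_train)
X_new_scaled, _ = scale_features(X_new, scalar=scaler)

print(X_train_scaled.mean(axis=0))  # ≈ [0, 0] after standardization
print(X_new_scaled)                 # ≈ [[0, 0]] since (2, 20) is the training mean
```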
In [3]:
```# get the advertising data set
dataset = pd.read_csv('../datasets/Advertising.csv')
dataset = dataset[["TV", "Radio", "Newspaper", "Sales"]] # filtering the Unnamed index column out of the dataset

```
In [7]:
```dataset_size = len(dataset)
training_size = np.floor(dataset_size * 0.6).astype(int)
validation_size = np.floor(dataset_size * 0.2).astype(int)
# First we split the dataset into three parts: training, validation and test
X_training = dataset[["TV", "Newspaper"]][:training_size]
y_training = dataset["Sales"][:training_size]
X_validation = dataset[["TV", "Newspaper"]][training_size:training_size + validation_size]
y_validation = dataset["Sales"][training_size:training_size + validation_size]
X_test = dataset[["TV", "Newspaper"]][training_size + validation_size:]
y_test = dataset["Sales"][training_size + validation_size:]
# Second we apply feature scaling, fitting the scaler on X_training only
X_training, training_scalar = scale_features(X_training)
X_validation,_ = scale_features(X_validation, scalar=training_scalar)
X_test,_ = scale_features(X_test, scalar=training_scalar)

```
In [40]:
```model = SGDRegressor(loss='squared_error')  # 'squared_loss' was renamed to 'squared_error' in newer scikit-learn
model.fit(X_training, y_training)
w0 = model.intercept_[0]  # intercept_ is a one-element array
w1 = model.coef_[0]  # Notice that model.coef_ is an array now, not a single number
w2 = model.coef_[1]
print("Trained model: y = %0.2f + %0.2fx₁ + %0.2fx₂" % (w0, w1, w2))
MSE = np.mean((y_test - model.predict(X_test)) ** 2)
print("The Test Data MSE is: %0.3f" % MSE)

```
```

From the videos, we learned that the idea of **regularization** is introduced to prevent the model from overfitting to the data points by adding a penalty for large weight values. Such a penalty is expressed mathematically with the second term of the cost function:

$$J(w) = \frac{1}{n}\sum_{i=1}^{n}(y_i - f_i)^2 + \lambda\sum_{j=1}^{k}w_j^2$$

This is called **L2 Regularization** and $\lambda$ is called the **Regularization Parameter**. How can we implement it with scikit-learn for our models?

Well, no worries: scikit-learn implements that for you, and we have been using it all the time.
The **SGDRegressor** constructor has two arguments that define the behavior of the penalty:

- *penalty*: a string specifying the type of penalty to use (defaults to 'l2')
- *alpha*: the value of $\lambda$ in the equation above

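The penalty term itself is easy to compute by hand. The sketch below (toy numbers, made up for illustration; exact scaling constants vary between implementations) shows how the same prediction errors produce a much larger regularized cost when the weights are large:

```python
import numpy as np

def l2_cost(y, y_pred, w, lam):
    """Mean squared error plus an L2 penalty lam * sum(w_j^2).
    A sketch of the regularized cost, not scikit-learn's exact internals."""
    return np.mean((y - y_pred) ** 2) + lam * np.sum(w ** 2)

y = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])  # hypothetical predictions
small_w = np.array([0.5, 0.1])
large_w = np.array([5.0, 4.0])

# Same prediction errors, but the large weights are penalized much harder
print(l2_cost(y, y_pred, small_w, lam=0.1))  # MSE 0.02 + penalty 0.026
print(l2_cost(y, y_pred, large_w, lam=0.1))  # MSE 0.02 + penalty 4.1
```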
Now let's play with the value of alpha and see how that affects our model's accuracy. Let's set alpha to a large number, say 1. In this case we give the weight values a very harsh penalty, so they'll end up smaller than they should be and the accuracy should get worse!

```
In [41]:
```model = SGDRegressor(loss='squared_error', alpha=1)
model.fit(X_training, y_training)
w0 = model.intercept_[0]  # intercept_ is a one-element array
w1 = model.coef_[0]  # Notice that model.coef_ is an array now, not a single number
w2 = model.coef_[1]
print("Trained model: y = %0.2f + %0.2fx₁ + %0.2fx₂" % (w0, w1, w2))
MSE = np.mean((y_test - model.predict(X_test)) ** 2)
print("The Test Data MSE is: %0.3f" % MSE)

```
```

```
In [61]:
```alphas = [0.00025, 0.00005, 0.0001, 0.0002, 0.0004]
best_alpha = alphas[0]
least_mse = float("inf")  # initialized to infinity
for possible_alpha in alphas:
    model = SGDRegressor(loss='squared_error', alpha=possible_alpha)
    model.fit(X_training, y_training)
    mse = np.mean((y_validation - model.predict(X_validation)) ** 2)
    if mse <= least_mse:
        least_mse = mse
        best_alpha = possible_alpha
print("The Best alpha is: %.4f" % best_alpha)
best_model = SGDRegressor(loss='squared_error', alpha=best_alpha)
best_model.fit(X_training, y_training)
MSE = np.mean((y_test - best_model.predict(X_test)) ** 2)  # evaluating the best model on test data
print("The Test Data MSE is: %0.3f" % MSE)

```
```

The last thing we have here is to see how we can evaluate our model using the $R^2$ metric. We learned in the videos that the $R^2$ metric measures how close the data points are to our regression line (or plane). We also learned that there's an adjusted version of that metric, denoted by $\overline{R^2}$, that penalizes for extra features added to the model that don't make it more accurate. These metrics can be calculated using the following formulas:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - f_i)^2}{\sum_{i=1}^{n}(y_i - \overline{y})^2}$$

where $f_i$ is our model's prediction and $\overline{y}$ is the mean of all $n$ $y_i$'s. And for the adjusted version:

$$\overline{R^2} = R^2 - \frac{k - 1}{n - k}(1 - R^2)$$

where $k$ is the number of features and $n$ is the number of data samples. Both $R^2$ and $\overline{R^2}$ take a value less than or equal to **1**. The closer it is to one, the better our model is.

Fortunately, we don't have to do all these calculations by hand to use this metric with scikit-learn. The model's **score** method takes the test Xs and ys and returns the value of $R^2$. Note that it returns the plain $R^2$, not the adjusted $\overline{R^2}$; the adjusted version can be computed from it with the formula above.

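To make the formulas concrete, here's a small numeric check (the `y` values and predictions are made up for illustration; `r2_score` is scikit-learn's built-in for the plain $R^2$):

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 5.0, 7.0, 9.0])
f = np.array([2.8, 5.3, 6.9, 9.1])  # hypothetical model predictions
n, k = len(y), 2                    # 4 samples, 2 features (illustrative)

# Plain R² from the first formula
ss_res = np.sum((y - f) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
R2 = 1 - ss_res / ss_tot

# Adjusted R² from the second formula; always <= plain R²
R2_adj = R2 - (k - 1) / (n - k) * (1 - R2)

print(R2, R2_adj)
assert np.isclose(R2, r2_score(y, f))  # matches scikit-learn's built-in
```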
```
In [64]:
```model = SGDRegressor(loss='squared_error', eta0=0.02)
model.fit(X_training, y_training)
w0 = model.intercept_[0]  # intercept_ is a one-element array
w1 = model.coef_[0]  # Notice that model.coef_ is an array now, not a single number
w2 = model.coef_[1]
print("Trained model: y = %0.2f + %0.2fx₁ + %0.2fx₂" % (w0, w1, w2))
R2 = model.score(X_test, y_test)
print("The Model's R² on Test Data is %0.2f" % R2)

```
```

Apply the ideas of L2 Regularization and the $R^2$ metric to the exercises you did in the last two modules.

Download Kaggle's 2016 US Election Dataset and explore the data using what you learned in Linear Regression. Make assumptions about correlations and dependence in the data and test your assumptions using what you learned. If you have interesting results, publish your code and your results to the Script's Repo and share them with the community.
