Another essential tool for a Data Scientist is the Support Vector Machine (SVM). The simplest SVMs work by finding a pair of parallel lines that separate the data while maximizing the margin between them. This concept can be extended beyond straight lines to more general curves through the use of kernels. In this exercise, you'll use SVMs for both classification and regression tasks.
As discussed above, the simplest SVMs work by finding a pair of parallel straight lines, or hyperplanes, which separate the data into two categories while maximizing the distance between them. Analytically, the two hyperplanes can be described by the equations $$\vec{w}\cdot\vec{x} - b = 1$$ and $$\vec{w}\cdot\vec{x} - b = -1,$$ where $\vec{w}$ is the normal vector to the hyperplane, $\vec{x}$ is a data point, and $b$ is the intercept term. The optimization problem is then to minimize $\|\vec{w}\|$ subject to $$y_i(\vec{w}\cdot\vec{x}_i - b) \geq 1 \quad \text{for } i = 1, \ldots, n,$$ where $y_i$ is either 1 or -1, representing the category of $\vec{x}_i$.
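A note on the C parameter that appears throughout these exercises: when the data is not perfectly separable, the hard constraints above are relaxed with slack variables $\xi_i$, giving the standard soft-margin problem $$\min_{\vec{w},\,b,\,\xi}\;\frac{1}{2}\|\vec{w}\|^{2} + C\sum_{i=1}^{n}\xi_i \quad\text{subject to}\quad y_i(\vec{w}\cdot\vec{x}_i - b) \geq 1 - \xi_i,\;\; \xi_i \geq 0.$$ Larger values of $C$ penalize margin violations more heavily (narrower, stricter margins), while smaller values allow more violations (wider, softer margins); up to notational differences, this is the quantity controlled by the C parameter of sklearn's SVM estimators.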
In this exercise, you'll explore these concepts with both user-generated and real-world data.
1 - Create a set of 100 linearly separable points and make a scatter plot, color-coding your data points.
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
np.random.seed(56)
In [2]:
# random points between a and b
a = -3
b = 3
X1_lsep = (b - a)*np.random.rand(100) + a
# apply linear equation
X2_lsep = 2*X1_lsep - 1
# add noise +- random integer between a and b
a = 3
b = 10
noise = np.random.choice([-1, 1], 100) * np.random.choice(np.arange(a, b), 100)
# stack coordinates
X_lsep = np.vstack((X1_lsep, X2_lsep + noise)).T
# categorize using linear equation
y_lsep = np.array((X2_lsep + noise >= X2_lsep), dtype=int)
# plot categories
fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(X_lsep[:, 0][y_lsep==0], X_lsep[:, 1][y_lsep==0], 'bo')
ax.plot(X_lsep[:, 0][y_lsep==1], X_lsep[:, 1][y_lsep==1], 'ro');
In [3]:
# solution:
X = np.r_[np.random.randn(50, 2) - [2, 2], np.random.randn(50, 2) + [2, 2]]
y = [0] * 50 + [1] * 50
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='Accent')
plt.xlabel('x1')
plt.ylabel('x2')
Out[3]:
2 - Now fit a linear SVM to the data, using different values for the C parameter of varying orders of magnitude. Plot the decision lines, margins, and boundary for each. Comment on your results.
In [3]:
def plot_svc_decision_function(model, ax=None, plot_support=True):
    """Plot the decision function for a 2D SVC"""
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    # plot decision boundary and margins
    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])
    # plot support vectors
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, facecolors='none',
                   edgecolors='k')
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
In [4]:
from sklearn.svm import SVC
linsvc = SVC(kernel='linear')
In [5]:
# try different orders of magnitude of C
for C in [10**i for i in range(-3, 6)]:
    # set params and fit
    linsvc.set_params(C=C)
    linsvc.fit(X_lsep, y_lsep)
    # score model
    print('Score for C = {}: {:.5f}'.format(C, linsvc.score(X_lsep, y_lsep)))
    # plot points and SVM boundaries
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.set_title('$C = {}$'.format(C))
    ax.plot(X_lsep[:, 0][y_lsep==0], X_lsep[:, 1][y_lsep==0], 'bo')
    ax.plot(X_lsep[:, 0][y_lsep==1], X_lsep[:, 1][y_lsep==1], 'ro')
    plot_svc_decision_function(linsvc, ax=ax, plot_support=True)
The data is fully linearly separable, so even for small values of $C$ the SVM classifies all points correctly. As $C$ increases, the margin violations decrease until the decision line ends up very close to the line used to generate the points.
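To make the shrinking margin concrete, here is a minimal sketch (reusing linsvc, X_lsep, and y_lsep from the cells above) that prints the geometric margin width $2/\|\vec{w}\|$ and the number of support vectors for each value of $C$:
In [ ]:
# margin width 2/||w|| for each C (linear kernel, where coef_ is available)
for C in [10**i for i in range(-3, 6)]:
    linsvc.set_params(C=C)
    linsvc.fit(X_lsep, y_lsep)
    w = linsvc.coef_[0]
    margin_width = 2 / np.linalg.norm(w)
    print('C = {:>8}: margin width = {:.4f}, support vectors = {}'.format(
        C, margin_width, len(linsvc.support_vectors_)))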
In [5]:
# solution
c_vals = [10**n for n in range(-5, 5, 1)]
models = []
for c in c_vals:
    svc = SVC(kernel='linear', C=c, random_state=0)
    svc.fit(X, y)
    models.append(svc)
In [7]:
# solution
fignum = 1
for clf in models:
    # get the separating hyperplane
    w = clf.coef_[0]
    a = - w[0] / w[1]
    xx = np.linspace(-5, 5)
    yy = a * xx - clf.intercept_[0] / w[1]
    # plot the parallels to the separating hyperplane that pass through the
    # support vectors (margin away from hyperplane in direction
    # perpendicular to hyperplane). This is sqrt(1+a^2) away vertically in 2-d.
    margin = 1 / np.sqrt(np.sum(clf.coef_ ** 2))
    yy_down = yy - np.sqrt(1 + a**2) * margin
    yy_up = yy + np.sqrt(1 + a**2) * margin
    # plot the line, the points, and the nearest vectors to the plane
    plt.figure(fignum)
    plt.clf()
    plt.plot(xx, yy, 'k-')
    plt.plot(xx, yy_down, 'k--')
    plt.plot(xx, yy_up, 'k--')
    plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=80,
                facecolors='none', zorder=10, edgecolors='k')
    plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired, edgecolors='k')
    plt.title('C = ' + str(clf.get_params()['C']) + '\n' + 'Accuracy = ' + str(clf.score(X, y)))
    plt.axis('tight')
    x_min = -4.8
    x_max = 4.2
    y_min = -6
    y_max = 6
    XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
    Z = clf.predict(np.c_[XX.ravel(), YY.ravel()])
    # put the result into a color plot
    Z = Z.reshape(XX.shape)
    plt.figure(fignum)
    plt.pcolormesh(XX, YY, Z, cmap=plt.cm.Paired)
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.xticks(())
    plt.yticks(())
    fignum += 1
3 - Create a data set, as in part (1), but this time make it very noisy, i.e. deliberately make it not quite so linearly separable.
In [6]:
# random points between a and b
a = -3
b = 3
X1_nlsep = (b - a)*np.random.rand(100) + a
# apply linear equation
X2_nlsep = 2*X1_nlsep - 1
# add noise +- random float between a and b
a = 0.1
b = 1.5
noise = np.random.choice([-1, 1], 100) * np.random.choice(np.linspace(a, b, 50), 100)
# stack coordinates
X_nlsep = np.vstack((X1_nlsep, X2_nlsep + noise)).T
# categorize using linear equation
y_nlsep = np.array((X2_nlsep + noise >= X2_nlsep), dtype=int)
# more_noise = np.random.choice(np.arange(100), 10)
# for i in more_noise:
# y_nlsep[i] = (y_nlsep[i] + 1) % 2
# plot categories
fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(X_nlsep[:, 0][y_nlsep==0], X_nlsep[:, 1][y_nlsep==0], 'bo')
ax.plot(X_nlsep[:, 0][y_nlsep==1], X_nlsep[:, 1][y_nlsep==1], 'ro');
In [8]:
# solution
X = np.r_[np.random.randn(50, 2) - [1, 1], np.random.randn(50, 2) + [1, 1]]
y = [0] * 50 + [1] * 50
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='Accent')
Out[8]:
4 - Repeat part (2) as above and comment on your results.
In [112]:
for C in [10**i for i in range(-3, 6)]:
    linsvc.set_params(C=C)
    linsvc.fit(X_nlsep, y_nlsep)
    print('Score for C = {}: {:.5f}'.format(C, linsvc.score(X_nlsep, y_nlsep)))
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.set_title('$C = {}$'.format(C))
    ax.plot(X_nlsep[:, 0][y_nlsep==0], X_nlsep[:, 1][y_nlsep==0], 'bo')
    ax.plot(X_nlsep[:, 0][y_nlsep==1], X_nlsep[:, 1][y_nlsep==1], 'ro')
    plot_svc_decision_function(linsvc, ax=ax, plot_support=True)
In [9]:
# solution
c_vals = [10**n for n in range(-5, 5, 1)]
models = []
for c in c_vals:
    svc = SVC(kernel='linear', C=c, random_state=0)
    svc.fit(X, y)
    models.append(svc)
In [10]:
# solution
fignum = 1
for clf in models:
    # get the separating hyperplane
    w = clf.coef_[0]
    a = - w[0] / w[1]
    xx = np.linspace(-5, 5)
    yy = a * xx - clf.intercept_[0] / w[1]
    # plot the parallels to the separating hyperplane that pass through the
    # support vectors (margin away from hyperplane in direction
    # perpendicular to hyperplane). This is sqrt(1+a^2) away vertically in 2-d.
    margin = 1 / np.sqrt(np.sum(clf.coef_ ** 2))
    yy_down = yy - np.sqrt(1 + a**2) * margin
    yy_up = yy + np.sqrt(1 + a**2) * margin
    # plot the line, the points, and the nearest vectors to the plane
    plt.figure(fignum)
    plt.clf()
    plt.plot(xx, yy, 'k-')
    plt.plot(xx, yy_down, 'k--')
    plt.plot(xx, yy_up, 'k--')
    plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=80,
                facecolors='none', zorder=10, edgecolors='k')
    plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired, edgecolors='k')
    plt.title('C = ' + str(clf.get_params()['C']) + '\n' + 'Accuracy = ' + str(clf.score(X, y)))
    plt.axis('tight')
    x_min = -4.8
    x_max = 4.2
    y_min = -6
    y_max = 6
    XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
    Z = clf.predict(np.c_[XX.ravel(), YY.ravel()])
    # put the result into a color plot
    Z = Z.reshape(XX.shape)
    plt.figure(fignum)
    plt.pcolormesh(XX, YY, Z, cmap=plt.cm.Paired)
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.xticks(())
    plt.yticks(())
    fignum += 1
This time, for small values of $C$ the results are poor; as $C$ grows, the decision line again gets very close to the original one, but the margins become very thin.
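As a rough check on how thin the margins get, here is a minimal sketch (again reusing linsvc, X_nlsep, and y_nlsep from above) that counts the points falling strictly inside the margin, i.e. with $|\vec{w}\cdot\vec{x} - b| < 1$:
In [ ]:
# count points that fall strictly inside the margin for each C
for C in [10**i for i in range(-3, 6)]:
    linsvc.set_params(C=C)
    linsvc.fit(X_nlsep, y_nlsep)
    inside_margin = np.sum(np.abs(linsvc.decision_function(X_nlsep)) < 1)
    print('C = {:>8}: {} points inside the margin'.format(C, inside_margin))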
5 - Head over to the Machine Learning Repository, download the Banknote Authentication Data Set, put it into a dataframe, and split into training and test sets. Be sure to familiarize yourself with the data before proceeding.
In [7]:
banknote = pd.read_csv('banknote.txt')
banknote.head()
Out[7]:
In [8]:
banknote.describe()
Out[8]:
In [9]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# splitting
Xbank_train, Xbank_test, ybank_train, ybank_test = train_test_split(banknote.iloc[:, :-1], banknote.iloc[:, -1], test_size=0.3, random_state=265)
# scaling: fit the scaler on the training set only, then apply it to the test set
sc = StandardScaler()
Xbank_train_std = sc.fit_transform(Xbank_train)
Xbank_test_std = sc.transform(Xbank_test)
6 - Fit a linear SVM to the data for various values of the C parameter, reporting your testing and training accuracies. Comment on your results.
In [10]:
from sklearn.svm import LinearSVC
linsvc2 = LinearSVC()
# SVC with a linear kernel scales badly to large values of C, so LinearSVC is used here
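For context (and as a hedge, since the details depend on the sklearn version): LinearSVC uses the liblinear solver with a squared hinge loss by default and also regularizes the intercept, whereas SVC with a linear kernel uses libsvm with the standard hinge loss, so the two can give slightly different boundaries. A sketch of settings that bring LinearSVC closer to the libsvm formulation; the specific values here are illustrative assumptions, not tuned choices:
In [ ]:
# standard hinge loss plus a large intercept_scaling so the intercept is
# effectively unregularized (illustrative values, not tuned)
linsvc2_hinge = LinearSVC(loss='hinge', dual=True, intercept_scaling=10, max_iter=10000)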
In [39]:
for C in [10**i for i in range(-3, 6)]:
    linsvc2.set_params(C=C)
    linsvc2.fit(Xbank_train_std, ybank_train)
    print('Training accuracy for C = {}: {:.5f}'.format(C, linsvc2.score(Xbank_train_std, ybank_train)))
    print('Testing accuracy for C = {}: {:.5f}'.format(C, linsvc2.score(Xbank_test_std, ybank_test)))
The results are quite good (this dataset may be nearly linearly separable). Training accuracy approaches 100% as regularization is decreased, while, as a trade-off, the testing accuracy eventually starts to drop; testing accuracy is highest for intermediate amounts of regularization and peaks at $C = 100$.
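Rather than reading the best C off the test set (which risks overfitting to it), cross-validation on the training set is the usual way to pick it; a minimal sketch reusing Xbank_train_std and ybank_train, with a grid chosen for illustration:
In [ ]:
from sklearn.model_selection import GridSearchCV
# 5-fold cross-validation over a coarse grid of C values
param_grid = {'C': [10**i for i in range(-3, 6)]}
grid = GridSearchCV(LinearSVC(max_iter=10000), param_grid, cv=5)
grid.fit(Xbank_train_std, ybank_train)
print('Best C:', grid.best_params_['C'])
print('Cross-validated accuracy: {:.5f}'.format(grid.best_score_))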
In the exercise above, we worked with data that was linearly separable and could thus be classified reasonably well with only a linear SVM. But what if the decision boundary for our data is non-linear? As it turns out, we can still very effectively use the concepts outlined above, but instead of finding a hyperplane that separates the data directly, we can use a non-linear kernel function, of which sklearn provides several.
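For reference, these are the kernel functions used below, written with sklearn's parameter names (here $d$ is degree, $\gamma$ is gamma, and $r$ is coef0): $$k_{\text{linear}}(\vec{x}, \vec{x}') = \vec{x}\cdot\vec{x}', \qquad k_{\text{poly}}(\vec{x}, \vec{x}') = \left(\gamma\,\vec{x}\cdot\vec{x}' + r\right)^{d}, \qquad k_{\text{rbf}}(\vec{x}, \vec{x}') = \exp\!\left(-\gamma\,\|\vec{x}-\vec{x}'\|^{2}\right).$$ Intuitively, the kernel trick lets the SVM find a linear boundary in an implicit higher-dimensional feature space, which corresponds to a non-linear boundary in the original space.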
1 - Create a set of 100 random data points separated by the boundary given by $f(x) = \sin(x)$. Plot your points, color-coding each.
In [29]:
# random points between a and b
a = -3
b = 3
X1_sin = (b - a)*np.random.rand(100) + a
# apply sine
X2_sin = np.sin(X1_sin)
# add noise +- random float between a and b
a = 0.1
b = 1.5
noise = np.random.choice([-1, 1], 100) * np.random.choice(np.linspace(a, b, 50), 100)
# stack coordinates
X_sin = np.vstack((X1_sin, X2_sin + noise)).T
# categorize using sine
y_sin = np.array((X2_sin + noise >= X2_sin), dtype=int)
# plot categories
fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(X_sin[:, 0][y_sin==0], X_sin[:, 1][y_sin==0], 'bo')
ax.plot(X_sin[:, 0][y_sin==1], X_sin[:, 1][y_sin==1], 'ro');
2 - Fit 4 SVMs to the data using linear, rbf, and poly kernels of degree = 2 and degree = 3, leaving all other parameters at their default. Plot the classification and decision boundaries and comment on your results.
In [11]:
kernsvc = SVC()
In [30]:
deg = 2
for kernel in ['linear', 'rbf', 'poly', 'poly']:
    # set kernel
    kernsvc.set_params(kernel=kernel)
    # prepare string for printing model details
    message = '{} kernel'.format(kernel)
    # if the kernel is polynomial, set the degree and then increment deg
    if kernel == 'poly':
        kernsvc.set_params(degree=deg)
        message = message + ' with degree {}'.format(deg)
        deg += 1
    # fit
    kernsvc.fit(X_sin, y_sin)
    # score
    print(message + ': {:.5f}'.format(kernsvc.score(X_sin, y_sin)))
    # plot points and boundaries
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.set_title(message)
    ax.plot(X_sin[:, 0][y_sin==0], X_sin[:, 1][y_sin==0], 'bo')
    ax.plot(X_sin[:, 0][y_sin==1], X_sin[:, 1][y_sin==1], 'ro')
    plot_svc_decision_function(kernsvc, ax=ax, plot_support=True)
The linear kernel performs quite well, but RBF is the best. The degree 2 polynomial is poor, as expected, while the degree 3 polynomial performs worse than I thought it would.
In [12]:
# solution
# solution
def make_plot(slf, X, y, kern, size):
plt.clf()
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=80,
facecolors='none', zorder=10, edgecolors='k')
plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired, edgecolors='k')
plt.title('kernel = ' + kern + '\n' + 'C = ' + str(clf.get_params()['C']) + '\n' + 'Accuracy = ' + str(clf.score(X, y)))
plt.axis('tight')
x_min = -size
x_max = size
y_min = -size
y_max = size
XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])
# Put the result into a color plot
Z = Z.reshape(XX.shape)
plt.pcolormesh(XX, YY, Z > 0, cmap=plt.cm.Paired)
plt.contour(XX, YY, Z, colors=['k', 'k', 'k'], linestyles=['--', '-', '--'], levels=[-.5, 0, .5])
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
3 - Choose the most appropriate two models you found in part (2), and generate plots with decision boundaries for different values of the C parameter of differing orders of magnitude.
In [32]:
# linear and RBF
for kernel in ['linear', 'rbf']:
    kernsvc.set_params(kernel=kernel)
    message = '{} kernel'.format(kernel)
    # various magnitudes of C
    for C in [10**i for i in range(-3, 6)]:
        kernsvc.set_params(C=C)
        kernsvc.fit(X_sin, y_sin)
        print(message + ' with C = {}: {:.5f}'.format(C, kernsvc.score(X_sin, y_sin)))
        fig, ax = plt.subplots(figsize=(10, 6))
        ax.set_title(message + ', $C = {}$'.format(C))
        ax.plot(X_sin[:, 0][y_sin==0], X_sin[:, 1][y_sin==0], 'bo')
        ax.plot(X_sin[:, 0][y_sin==1], X_sin[:, 1][y_sin==1], 'ro')
        plot_svc_decision_function(kernsvc, ax=ax, plot_support=True)
4 - Head over to the Machine Learning Repository, download the Image Segmentation Data Set, put it into a dataframe, and split into training and test sets. Be sure to familiarize yourself with the data before proceeding.
In [7]:
segmentation = pd.read_csv('image_segmentation.txt', index_col=None)
segmentation.head()
Out[7]:
In [13]:
segmentation.info()
In [8]:
segmentation.describe().T
Out[8]:
In [9]:
print(segmentation.describe(include=['O']))
segmentation.CLASS.value_counts()
Out[9]:
In [23]:
# split
Xsegm_train, Xsegm_test, ysegm_train, ysegm_test = train_test_split(segmentation.iloc[:, 1:], segmentation.iloc[:, 0], test_size=0.3, random_state=25)
# scale: fit the scaler on the training set only, then apply it to the test set
sc = StandardScaler()
Xsegm_train_std = sc.fit_transform(Xsegm_train)
Xsegm_test_std = sc.transform(Xsegm_test)
5 - Fit 4 SVMs to the data using linear, rbf, and poly kernels of degree = 2 and degree = 3, leaving all other parameters at their default, printing the training and testing accuracy for each. Comment on your results.
In [24]:
# in the solutions the results are much worse: this may be because I scaled the features
deg = 2
for kernel in ['linear', 'rbf', 'poly', 'poly']:
    kernsvc.set_params(kernel=kernel)
    message = '{} kernel'.format(kernel)
    if kernel == 'poly':
        kernsvc.set_params(degree=deg)
        message = message + ' with degree {}'.format(deg)
        deg += 1
    kernsvc.fit(Xsegm_train_std, ysegm_train)
    print(message + ' training score: {:.5f}'.format(kernsvc.score(Xsegm_train_std, ysegm_train)))
    print(message + ' testing score: {:.5f}'.format(kernsvc.score(Xsegm_test_std, ysegm_test)))
The linear kernel gives the best performance here; RBF might outperform it with some tuning, but given the added complexity I would choose the linear kernel.
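To check the guess in the comment above about scaling, here is a minimal sketch (reusing kernsvc and the unscaled Xsegm_train/Xsegm_test splits) that refits the RBF kernel on the raw features for comparison:
In [ ]:
# RBF kernel on the unscaled features, for comparison with the scaled results above
kernsvc.set_params(kernel='rbf')
kernsvc.fit(Xsegm_train, ysegm_train)
print('Unscaled training score: {:.5f}'.format(kernsvc.score(Xsegm_train, ysegm_train)))
print('Unscaled testing score: {:.5f}'.format(kernsvc.score(Xsegm_test, ysegm_test)))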
6 - Repeat part (5), but this time trying different values of the C parameter. Comment on your results.
In [25]:
deg = 2
for kernel in ['linear', 'rbf', 'poly', 'poly']:
    kernsvc.set_params(kernel=kernel)
    message = '{} kernel'.format(kernel)
    if kernel == 'poly':
        kernsvc.set_params(degree=deg)
        message = message + ' with degree {}'.format(deg)
        deg += 1
    for C in [10**i for i in range(-3, 6)]:
        kernsvc.set_params(C=C)
        kernsvc.fit(Xsegm_train_std, ysegm_train)
        print(message + ' with C = {} training score: {:.5f}'.format(C, kernsvc.score(Xsegm_train_std, ysegm_train)))
        print(message + ' with C = {} testing score: {:.5f}'.format(C, kernsvc.score(Xsegm_test_std, ysegm_test)))
With some tuning, all the kernels reach about the same accuracy on the test set (around 96%).
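Since this is a multi-class problem, per-class metrics are more informative than a single accuracy figure; a minimal sketch (assuming kernsvc is still fitted with the last settings from the loop above) using sklearn's classification_report and confusion_matrix:
In [ ]:
from sklearn.metrics import classification_report, confusion_matrix
ysegm_pred = kernsvc.predict(Xsegm_test_std)
print(classification_report(ysegm_test, ysegm_pred))
print(confusion_matrix(ysegm_test, ysegm_pred))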
7 - Using rbf and poly (degree = 2) kernels, fit an SVM while trying a few values for the gamma parameter, printing the training and testing errors. Comment on your results.
In [26]:
deg = 2
# solution: gamma in np.logspace(-9, 3, 13)
for kernel in ['rbf', 'poly']:
    kernsvc.set_params(kernel=kernel)
    message = '{} kernel'.format(kernel)
    if kernel == 'poly':
        kernsvc.set_params(degree=deg)
        message = message + ' with degree {}'.format(deg)
        deg += 1
    for gamma in [0.1, 0.3, 1, 3, 10]:
        kernsvc.set_params(gamma=gamma)
        kernsvc.fit(Xsegm_train_std, ysegm_train)
        print(message + ' with gamma = {} training score: {:.5f}'.format(gamma, kernsvc.score(Xsegm_train_std, ysegm_train)))
        print(message + ' with gamma = {} testing score: {:.5f}'.format(gamma, kernsvc.score(Xsegm_test_std, ysegm_test)))
Increasing $\gamma$, especially for the polynomial kernel, seems to produce better training accuracies; for even larger values, the training accuracies start to get worse.
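A cross-validated view of this trade-off can be obtained with sklearn's validation_curve; a minimal sketch for the RBF kernel (the gamma grid is chosen for illustration, and the scaled training data from above is reused):
In [ ]:
from sklearn.model_selection import validation_curve
# mean train/validation accuracy across 5 folds for each gamma
gammas = np.logspace(-3, 2, 6)
train_scores, val_scores = validation_curve(
    SVC(kernel='rbf'), Xsegm_train_std, ysegm_train,
    param_name='gamma', param_range=gammas, cv=5)
for g, tr, va in zip(gammas, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print('gamma = {:10.4f}: train = {:.5f}, validation = {:.5f}'.format(g, tr, va))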
1 - Generate 500 data points for a damped oscillator given by the function $$f(x) = e^{-\beta x}\sin(\nu x)$$ on the range $-2\pi < x < 2\pi$, where $\beta = \frac{1}{2.5}$ and $\nu = 10$. Plot your function and ensure it looks as expected.
In [12]:
# oscillator params
beta = 1/2.5
nu = 10
# evenly spaced points between a and b
a = -2*np.pi
b = 2*np.pi
X1_regr = np.linspace(a, b, 500)
# apply oscillator equation
X2_regr = np.exp(-beta*X1_regr) * np.sin(nu*X1_regr)
#stack coordinates
X_regr = np.vstack((X1_regr, X2_regr)).T
# plot
fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(X_regr[:, 0], X_regr[:, 1], 'b');
2 - Fit a linear regression model to the data in part (1), put the fitted line on the same plot, and report the $R^2$, MSE, and coefficients of the fit.
In [13]:
from sklearn.svm import LinearSVR
In [15]:
linsvr = LinearSVR()
# fit
linsvr.fit(X1_regr.reshape(-1, 1), X2_regr)
# predict
predictions = linsvr.predict(X1_regr.reshape(-1, 1))
# plot
fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(X_regr[:, 0], X_regr[:, 1], 'b')
ax.plot(X_regr[:, 0], predictions, 'r');
As we may have expected, the prediction is almost a horizontal line at 0.
In [18]:
# oops, I fit a linear SVR instead of an ordinary linear regression; the results are almost the same though
# also, something odd was going on with my hand-rolled MSE versus sklearn's mean_squared_error function
# (which I didn't know existed...)
def MSE(predictions, y):
    residuals = predictions - y
    return (residuals**2).mean()
In [19]:
print('R-squared: {:.5f}'.format(linsvr.score(X1_regr.reshape(-1, 1), X2_regr)))
print('Slope and intercept: {:.5f}, {:.5f}'.format(linsvr.coef_[0], linsvr.intercept_[0]))
print('MSE: {:.5f}'.format(MSE(predictions, X2_regr)))
The scores aren't that good...
3 - Repeat part (2) but using SVM with linear, rbf, and poly kernels of degree = 2 and degree = 3. Comment on your results.
In [14]:
from sklearn.svm import SVR
In [21]:
svr = SVR()
deg = 2
# plot oscillator
fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(X_regr[:, 0], X_regr[:, 1], 'k')
# different kernels and corresponding colors
for kernel, color in zip(['linear', 'rbf', 'poly', 'poly'], ['r', 'g', 'y', 'b']):
    # set kernel
    svr.set_params(kernel=kernel)
    # create message for reporting results
    message = '{} kernel'.format(kernel)
    # if poly kernel, set degree and add 1 to deg
    if kernel == 'poly':
        svr.set_params(degree=deg)
        message = message + ' with degree {}'.format(deg)
        deg += 1
    # fit
    svr.fit(X1_regr.reshape(-1, 1), X2_regr)
    # predict
    predictions = svr.predict(X1_regr.reshape(-1, 1))
    # print scores
    print(message + ' R-squared: {:.5f}'.format(svr.score(X1_regr.reshape(-1, 1), X2_regr)))
    print(message + ' MSE: {:.5f}'.format(MSE(predictions, X2_regr)))
    # add predicted curve to the plot
    ax.plot(X_regr[:, 0], predictions, color, label=message)
# add legend
plt.legend()
The results are almost the same for all kernels; only the RBF prediction isn't a straight line, but it is still quite bad at following the curve.
4 - Using an rbf kernel, experiment with the gamma and epsilon parameters to try to fit the function. Comment on your results.
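As a reminder of what these two knobs do: support vector regression uses the standard $\epsilon$-insensitive loss $$L_{\epsilon}\big(y, f(\vec{x})\big) = \max\big(0,\; |y - f(\vec{x})| - \epsilon\big),$$ so errors smaller than $\epsilon$ are ignored and $\epsilon$ sets the width of the no-penalty tube around the prediction, while for the RBF kernel $\gamma$ controls how quickly the kernel decays with distance and therefore how wiggly the fitted curve can be.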
In [23]:
# set kernel
svr.set_params(kernel='rbf')
# different gammas
for gamma in [0.1, 0.3, 1, 3, 10]:
    # set params
    svr.set_params(gamma=gamma)
    # create message for reporting results
    message = 'rbf kernel for gamma = {}'.format(gamma)
    # different epsilons
    for epsilon in [0.1, 0.3, 1, 3, 10]:
        # set params
        svr.set_params(epsilon=epsilon)
        # update message
        message2 = message + ' and epsilon = {}'.format(epsilon)
        # fit
        svr.fit(X1_regr.reshape(-1, 1), X2_regr)
        # predict
        predictions = svr.predict(X1_regr.reshape(-1, 1))
        # score
        print(message2 + ' R-squared: {:.5f}'.format(svr.score(X1_regr.reshape(-1, 1), X2_regr)))
        print(message2 + ' MSE: {:.5f}'.format(MSE(predictions, X2_regr)))
The best scores are for $\gamma = 10$, so I'll try some greater values:
In [24]:
# different gammas
for gamma in [10, 30, 100]:
    # set params
    svr.set_params(gamma=gamma)
    # create message for reporting results
    message = 'rbf kernel for gamma = {}'.format(gamma)
    # different epsilons
    for epsilon in [0.1, 0.3, 1, 3, 10]:
        # set params
        svr.set_params(epsilon=epsilon)
        # update message
        message2 = message + ' and epsilon = {}'.format(epsilon)
        # fit
        svr.fit(X1_regr.reshape(-1, 1), X2_regr)
        # predict
        predictions = svr.predict(X1_regr.reshape(-1, 1))
        # score
        print(message2 + ' R-squared: {:.5f}'.format(svr.score(X1_regr.reshape(-1, 1), X2_regr)))
        print(message2 + ' MSE: {:.5f}'.format(MSE(predictions, X2_regr)))
$\gamma = 30$ looks good, let's try some smaller values for $\epsilon$.
In [27]:
svr.set_params(gamma=30)
# create message for reporting results
message = 'rbf kernel for gamma = 30'
# different epsilons
for epsilon in [0.00001, 0.00003, 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    # set params
    svr.set_params(epsilon=epsilon)
    # update message
    message2 = message + ' and epsilon = {}'.format(epsilon)
    # fit
    svr.fit(X1_regr.reshape(-1, 1), X2_regr)
    # predict
    predictions = svr.predict(X1_regr.reshape(-1, 1))
    # score
    print(message2 + ' R-squared: {:.5f}'.format(svr.score(X1_regr.reshape(-1, 1), X2_regr)))
    print(message2 + ' MSE: {:.5f}'.format(MSE(predictions, X2_regr)))
Let's go with $\epsilon = 0.001$ and see the plot of the predicted curve:
In [29]:
svr.set_params(epsilon=0.001)
# fit
svr.fit(X1_regr.reshape(-1, 1), X2_regr)
# predict
predictions = svr.predict(X1_regr.reshape(-1, 1))
# plot oscillator
fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(X_regr[:, 0], X_regr[:, 1], 'k')
# add predicted curve to the plot
ax.plot(X_regr[:, 0], predictions, 'r');
It's quite good!
5 - Head over to the Machine Learning Repository, download the Concrete Slump Test Data Set, put it into a dataframe, and split into training and test sets. Be sure to familiarize yourself with the data before proceeding.
In [15]:
concrete = pd.read_csv('concrete.txt')
concrete.head()
Out[15]:
In [16]:
concrete.info()
In [17]:
concrete.describe().T
Out[17]:
In [20]:
# split
Xconcrete_train, Xconcrete_test, yconcrete_train, yconcrete_test = train_test_split(concrete.iloc[:, 1:-1], concrete.iloc[:, -1], test_size=0.3, random_state=59)
# scale: fit the scaler on the training set only, then apply it to the test set
sc = StandardScaler()
Xconcrete_train_std = sc.fit_transform(Xconcrete_train)
Xconcrete_test_std = sc.transform(Xconcrete_test)
6 - Fit an SVM model to the data, using kernels of your choice and adjusting the parameters as you see fit. Report your training and testing errors and compare to a standard linear regression model. Comment on your results.
In [25]:
from sklearn.metrics import mean_squared_error
In [27]:
# solution: let's try a simple linear regression first!
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(Xconcrete_train_std, yconcrete_train)
concrete_train_pred = lm.predict(Xconcrete_train_std)
concrete_test_pred = lm.predict(Xconcrete_test_std)
print('Training R-squared: {:.5f}'.format(lm.score(Xconcrete_train_std, yconcrete_train)))
print('Training MSE: {:.5f}'.format(mean_squared_error(concrete_train_pred, yconcrete_train)))
print('Testing R-squared: {:.5f}'.format(lm.score(Xconcrete_test_std, yconcrete_test)))
print('Testing MSE: {:.5f}'.format(mean_squared_error(concrete_test_pred, yconcrete_test)))
In [26]:
svr.set_params(kernel='poly')
svr.fit(Xconcrete_train_std, yconcrete_train)
concrete_train_pred = svr.predict(Xconcrete_train_std)
concrete_test_pred = svr.predict(Xconcrete_test_std)
print('Training R-squared: {:.5f}'.format(svr.score(Xconcrete_train_std, yconcrete_train)))
print('Training MSE: {:.5f}'.format(mean_squared_error(concrete_train_pred, yconcrete_train)))
print('Testing R-squared: {:.5f}'.format(svr.score(Xconcrete_test_std, yconcrete_test)))
print('Testing MSE: {:.5f}'.format(mean_squared_error(concrete_test_pred, yconcrete_test)))
In [29]:
deg = 2
for kernel in ['linear', 'rbf', 'poly', 'poly', 'poly']:
    # set kernel
    svr.set_params(kernel=kernel)
    # create message for reporting results
    message = '{} kernel'.format(kernel)
    # if poly kernel, set degree and add 1 to deg
    if kernel == 'poly':
        svr.set_params(degree=deg)
        message = message + ' with degree {}'.format(deg)
        deg += 1
    # fit
    svr.fit(Xconcrete_train_std, yconcrete_train)
    # predict
    concrete_train_pred = svr.predict(Xconcrete_train_std)
    concrete_test_pred = svr.predict(Xconcrete_test_std)
    # score
    print('Training R-squared for {}: {:.5f}'.format(message, svr.score(Xconcrete_train_std, yconcrete_train)))
    print('Training MSE for {}: {:.5f}'.format(message, mean_squared_error(concrete_train_pred, yconcrete_train)))
    print('Testing R-squared for {}: {:.5f}'.format(message, svr.score(Xconcrete_test_std, yconcrete_test)))
    print('Testing MSE for {}: {:.5f}'.format(message, mean_squared_error(concrete_test_pred, yconcrete_test)))
Generalization is quite poor for most of the kernels; the linear kernel generalizes best, but the overall results still aren't good. I could try regularizing a polynomial kernel of degree 4:
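Before hand-tuning a single kernel, a broader cross-validated search is a reasonable sanity check; a minimal sketch over kernel, C, and gamma (grid values chosen for illustration, reusing the scaled concrete splits and sklearn's default R-squared scoring for SVR):
In [ ]:
from sklearn.model_selection import GridSearchCV
# gamma is ignored by the linear kernel but harmless to include in the grid
param_grid = {'kernel': ['linear', 'rbf', 'poly'],
              'C': [0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1]}
grid = GridSearchCV(SVR(), param_grid, cv=5)
grid.fit(Xconcrete_train_std, yconcrete_train)
print('Best parameters:', grid.best_params_)
print('Cross-validated R-squared: {:.5f}'.format(grid.best_score_))
print('Test R-squared: {:.5f}'.format(grid.score(Xconcrete_test_std, yconcrete_test)))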
In [ ]:
# solution: good results for poly degree 2 with gamma = 0.001
In [40]:
svr.set_params(kernel='poly', degree=4)
for C in [10**i for i in range(-10, -2)]:
    # set params
    svr.set_params(C=C)
    # create message for reporting results
    message = 'poly kernel of degree 4 with C = {}'.format(C)
    # fit
    svr.fit(Xconcrete_train_std, yconcrete_train)
    # predict
    concrete_train_pred = svr.predict(Xconcrete_train_std)
    concrete_test_pred = svr.predict(Xconcrete_test_std)
    # score
    print('Training R-squared for {}: {:.5f}'.format(message, svr.score(Xconcrete_train_std, yconcrete_train)))
    print('Training MSE for {}: {:.5f}'.format(message, MSE(concrete_train_pred, yconcrete_train)))
    print('Testing R-squared for {}: {:.5f}'.format(message, svr.score(Xconcrete_test_std, yconcrete_test)))
    print('Testing MSE for {}: {:.5f}'.format(message, MSE(concrete_test_pred, yconcrete_test)))
It seems there isn't much more to be gained here...