To solve this task, you will write a lot of code to try several machine learning methods for classification and regression.
You are HIGHLY RECOMMENDED to read the relevant documentation, e.g. for python, numpy, matplotlib and sklearn. Also remember that seminars, lecture slides, Google and StackOverflow are your close friends during this course (and, probably, your whole life?).
If you want an easy life, use the BUILT-IN METHODS of the sklearn library instead of writing tons of your own code. There exists a class/method for almost everything you can imagine (related to this homework).
To do this part of the homework, you have to write CODE directly inside the specified places in the notebook CELLS.
In some problems you may be asked to provide a short discussion of the results. In such cases you have to create a MARKDOWN cell with your comments right after your code cell.
For every separate problem you can get only 0 points or the maximal points for that problem. There are NO INTERMEDIATE scores. So make sure that you did everything required in the task.
Your SOLUTION notebook MUST BE REPRODUCIBLE, i.e. if the reviewer decides to execute Kernel -> Restart Kernel and Run All Cells, after all the computation they will obtain exactly the same solution (with all the corresponding plots) as in your uploaded notebook. For this purpose, we suggest fixing the random seed, or (better) passing random_state= to every algorithm that uses some pseudorandomness.
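For example, a minimal sketch (the seed value 42 is arbitrary):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

np.random.seed(42)  # fixes numpy's global random generator
model = RandomForestRegressor(random_state=42)  # fixes this algorithm's own randomness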
Your code must be clear to the reviewer. For this purpose, try to include necessary comments inside the code. But remember: GOOD CODE MUST BE SELF-EXPLANATORY without any additional comments.
Before you start, read several additional recommendations:
- You can launch jupyter notebook or ipython notebook from the linux console. Try jupyter lab instead - it is a more convenient environment to work with notebooks.
- Keep an eye on your resources with htop (for CPU/RAM) or free -s 0.2 (for RAM) in a terminal.
- Many sklearn algorithms support multithreading (Ensemble Methods, Cross-Validation, etc.). Check if the particular algorithm has an n_jobs parameter and set it to -1 to use all the cores.

Please, write your implementation within the designated blocks:
...
### BEGIN Solution
# >>> your solution here <<<
### END Solution
...
Let's load the dataset for this task.
In [1]:
import numpy as np
import sklearn
from sklearn.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
%matplotlib inline
In [136]:
data_fs = pd.read_csv(r'data/data_fs.csv', low_memory=False)
Look at the first 10 rows of this dataset.
In [137]:
data_fs.head(10)
Out[137]:
The dataset has many NaNs and also a lot of categorical features, so first you should preprocess the data. We can deal with categorical features by using one-hot encoding; to do that, we can use pandas.get_dummies.
In [138]:
# fill nan with 0
data_fs = data_fs.fillna(0)
# our goal is to predict the "price_doc" feature.
y = data_fs[["price_doc"]]
X = data_fs.drop("price_doc", axis=1)
X = X.drop("timestamp", axis=1)
# one-hot encoding
X = pd.get_dummies(X, sparse=True)
In [139]:
# Let's split our dataset into train 70% and test 30% using sklearn.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Look at the first 10 rows of what you get.
X_train.head(10)
Out[139]:
Okay, now let's see how much data we have.
In [7]:
print("Train size =", X_train.shape)
print("Test size =", X_test.shape)
There are too many features in this dataset and not all of them are equally important for our problem. Besides, using the whole dataset as-is to train a linear model will, for sure, lead to overfitting. Instead of a painful and time-consuming manual selection of the most relevant data, we will use methods of automatic feature selection.
But first, we almost forgot to take a look at our targets. Let's plot the y_train histogram.
In [8]:
y_train.hist(bins=100)
Out[8]:
There is a big variance in it and it's far from being normally distributed. In real-world problems this happens all the time: the data can be far from perfect. We can use some tricks to make it more like what we want. In this particular case we can predict $\log y$ instead of $y$. This transformation is invertible, so we will be able to get our $y$ back.
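Since $\exp$ undoes $\log$, predictions made in log-space can be mapped back to the original price scale; a minimal sketch (model stands for a hypothetical fitted regressor):

y_pred_log = model.predict(X_test)  # predictions in log-space
y_pred = np.exp(y_pred_log)         # back to the original price scale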
In [9]:
y_train_log = np.log(y_train)
y_test_log = np.log(y_test)
y_train_log.hist(bins=100)
Out[9]:
Now it looks more like the data we want to deal with.
The preprocessing is finally over, so now we are ready for the actual task.
If you have difficulties with solving the problems below, take a look at seminar $7$ on feature and model selection.
Use a random forest to find the importance of the features. Plot the histogram.
In [10]:
from sklearn.ensemble import RandomForestRegressor
### BEGIN Solution
random_forest = RandomForestRegressor(n_estimators=250, random_state=101, n_jobs=4)
random_forest.fit(X_train, y_train_log.values.ravel())
importances = random_forest.feature_importances_
# spread of the importances across the individual trees, used as error bars below
std = np.std([est.feature_importances_ for est in random_forest.estimators_], axis=0)
indices = np.argsort(importances)[::-1]
In [11]:
FEAT_NUM = 20
plt.figure(figsize=(15, 10))
plt.title("Top %d important features" % FEAT_NUM, size=16)
plt.bar(range(FEAT_NUM), importances[indices][:FEAT_NUM],
        color="r", yerr=std[indices[:FEAT_NUM]], align="center")
plt.xticks(range(FEAT_NUM), [X_train.columns[indices[f]] for f in range(FEAT_NUM)],
           rotation='vertical', size=16)
plt.yticks(size=16)
plt.xlim([-1, FEAT_NUM])
plt.show()
### END Solution
Print the 20 most important features and their values.
In [12]:
### BEGIN Solution
# Print the feature ranking
print("Feature ranking:")
for f in range(FEAT_NUM):
    print("%d. %s (%f)" % (f + 1, X_train.columns[indices[f]], importances[indices[f]]))
### END Solution
In [13]:
X_train_cut = X_train.filter([X_train.columns[x] for x in indices[:20]], axis=1)
X_test_cut = X_test.filter([X_test.columns[x] for x in indices[:20]], axis=1)
print("New shape of training samples: ", X_train_cut.shape)
print("New shape of testing samples: ", X_test_cut.shape)
Train each of the following models on these 20 features and test their performance using the Root Mean Squared Logarithmic Error (RMSLE).
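For reference, RMSLE is the square root of what sklearn's mean_squared_log_error computes:
$$\mathrm{RMSLE}(y, \hat{y}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\log(1 + y_i) - \log(1 + \hat{y}_i)\big)^2}$$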
In [14]:
from sklearn.metrics import mean_squared_log_error
You will need to do this for the next tasks too, so we recommend you to implement a dedicated comparison function, which takes a training sample (X_train, y_train) and a test sample (X_test, y_test), fits each model on the training sample, and scores it on both samples.
In [15]:
from sklearn import linear_model, tree, ensemble
from sklearn.metrics import mean_squared_log_error

def comparator(X_train, y_train, X_test, y_test):
    """
    Parameters
    ==========
    X_train: ndarray - training inputs
    y_train: ndarray - training targets
    X_test: ndarray - test inputs
    y_test: ndarray - test targets

    Returns
    =======
    pd.DataFrame - table of RMSLE scores of each model on test and train datasets
    """
    methods = {
        "Linear Regression": linear_model.LinearRegression(n_jobs=4),
        "Lasso": linear_model.Lasso(random_state=101),
        "Ridge": linear_model.Ridge(random_state=101),
        "Dtree": tree.DecisionTreeRegressor(random_state=101),
        "RFR": ensemble.RandomForestRegressor(random_state=101, n_estimators=100, n_jobs=4)
    }
    error_train = []
    error_test = []
    ### BEGIN Solution
    for model in methods.values():
        model.fit(X_train, y_train.values.ravel())
        y_train_pred = model.predict(X_train)
        y_test_pred = model.predict(X_test)
        # mean_squared_log_error expects (y_true, y_pred); take the root for RMSLE
        error_train.append(np.sqrt(mean_squared_log_error(y_train, y_train_pred)))
        error_test.append(np.sqrt(mean_squared_log_error(y_test, y_test_pred)))
    ### END Solution
    return pd.DataFrame({
        "Methods": list(methods.keys()),
        "Train loss": error_train,
        "Test loss": error_test
    })
Now apply this function to the 20 selected features.
In [18]:
### BEGIN Solution
result = comparator(X_train_cut, y_train_log, X_test_cut, y_test_log)
print(result)
### END Solution
Implement the following greedy feature selection algorithm:
# Initialize with an empty list of features.
list_of_best_features = []
while round < n_rounds:
    round = round + 1
    if no_more_features:
        break  # end loop
    # Iterate over currently *unused* features and use $k$-fold
    # `cross_val_score` to measure model "quality".
    compute_quality_with_each_new_unused_feature(...)
    # **Add** the feature that gives the highest "quality" of the model.
    pick_and_add_the_best_feature(...)
    if model_quality_has_increased_since_last_round:
        round = 0
return list_of_best_features
Use $k=3$ for the $k$-fold cv, because higher values could take a lo-o-o-o-o-o-o-o-ong time.
Please bear in mind that the lower the RMSLE (mean_squared_log_error) is, the higher the model "quality" is.
Please look up the peculiarities of cross_val_score(...) in scikit-learn's manual.
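One such peculiarity, in a minimal sketch: cross_val_score always maximizes its scoring function, so the built-in error metrics are exposed under a neg_ prefix and are returned negated:

scores = cross_val_score(model, X, y, cv=3, scoring='neg_mean_squared_log_error')
msle = -scores.mean()  # flip the sign to recover the error

Alternatively, as in the solution below, you can wrap mean_squared_log_error with make_scorer and take the absolute value of the mean.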
In the cell below implement a function that would iterate over a list of features and use $k$-fold cross_val_score
to measure model "quality".
In [39]:
from sklearn.metrics import make_scorer
import warnings
warnings.filterwarnings("ignore")
def selection_step(model, X, y, used_features=(), cv=3):
    """
    Parameters
    ==========
    model: sklearn model
    X: ndarray - training inputs
    y: ndarray - training targets
    used_features: - list of features already selected
    cv: int - number of folds

    Returns
    =======
    scores - dictionary {feature: score}
    """
    scores = {}
    ### BEGIN Solution
    for feature in X.columns:
        if feature not in used_features:
            feat_set = list(used_features).copy()
            feat_set.append(feature)
            rmsle = abs(cross_val_score(model, X[feat_set], y.values.ravel(),
                                        scoring=make_scorer(mean_squared_log_error),
                                        error_score=np.nan, cv=cv, n_jobs=4).mean())
            scores[feature] = rmsle
    ### END Solution
    return scores
In [47]:
def forward_steps(X, y, n_rounds, method):
    """
    Parameters
    ==========
    X: ndarray - training inputs
    y: ndarray - training targets
    n_rounds: int - early stop when the score hasn't improved for n_rounds rounds
    method: sklearn model

    Returns
    =======
    feat_best_list - list of features
    """
    feat_best_list = []
    last_score = np.inf
    ### BEGIN Solution
    round = 0
    while round < n_rounds:
        round = round + 1
        if len(feat_best_list) == X.shape[1]:
            break  # no more features to add
        scores = selection_step(method, X, y, feat_best_list)
        best_feat = min(scores, key=scores.get)  # lowest RMSLE = best "quality"
        feat_best_list.append(best_feat)
        print(round, best_feat)
        if scores[best_feat] < last_score:
            last_score = scores[best_feat]
            round = 0
    ### END Solution
    return feat_best_list
Use the function implemented above with a DecisionTreeRegressor to get the best features according to this algorithm, and print them.
In [48]:
### BEGIN Solution
from sklearn import tree
# DecisionTreeRegressor
print("Decision Tree Regressor feature ranking")
clf = tree.DecisionTreeRegressor(random_state=101)
best_features = forward_steps(X_train, y_train_log, 3, clf)
### END Solution
Use Linear Regression, Ridge regression, Random forest and DecisionTree to get the RMSLE score using these features. Remember the function you wrote earlier.
In [49]:
### BEGIN Solution
result = comparator(X_train[best_features], y_train_log, X_test[best_features], y_test_log)
print(result)
### END Solution
In this task you are asked to implement a boosting algorithm and compare the speed of different popular boosting libraries.
Let's generate a toy dataset for classification.
In [97]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
X, y = make_moons(n_samples=300, shuffle=True, noise=0.05, random_state=1011)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1011)
In [98]:
y_test[y_test == 0] = -1
y_train[y_train == 0] = -1
Your task is to implement a gradient boosting classifier. For the basic implementation, please refer to seminars $8$-$9$.
In [99]:
from sklearn.tree import DecisionTreeRegressor
from scipy.optimize import minimize
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics import accuracy_score, precision_score, f1_score
In [130]:
class FuncSeries:
    def __init__(self):
        self.func_series = []

    def __call__(self, X):
        sum = self.func_series[0](X)
        for f in self.func_series[1:]:
            sum += f(X)
        return sum

    def append(self, func):
        self.func_series.append(func)

class GradientBoostingClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, estimators=5):
        self.estimators = estimators
        self.func_series = FuncSeries()

    def fit(self, X, y):
        self.func_series.append(lambda X: np.zeros(X.shape[0]))
        for i in range(self.estimators):
            # negative gradient of the logistic loss (see the formula below)
            residuals = 2 * y / (1 + np.exp(2 * y * self.func_series(X)))
            clf = DecisionTreeRegressor(max_depth=3)
            clf.fit(X, residuals)
            # bind clf by value: a plain lambda would capture only the *last* tree
            self.func_series.append(lambda X, clf=clf: clf.predict(X))
        return self

    def predict(self, X):
        predicted = np.sign(self.func_series(X)).astype(int)
        predicted[predicted == 0] = -1
        return predicted
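The residuals above are the negative gradient of the logistic loss $L(y, f) = \log\big(1 + e^{-2yf(x)}\big)$ with respect to the current ensemble output $f(x)$:
$$-\frac{\partial L}{\partial f} = \frac{2y\,e^{-2yf}}{1 + e^{-2yf}} = \frac{2y}{1 + e^{2yf}},$$
which is exactly what each new tree is fitted to.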
In [131]:
class GBM:
    def __init__(self, estimator, estimator_params, n_estimators):
        self.base_estimator = estimator
        self.params = estimator_params
        self.n_estimators = n_estimators
        self.cascade = []

    def fit(self, X, y):
        for i in range(self.n_estimators):
            s = y / (1.0 + np.exp(y * self._output(X)))
            new_estimator = self.base_estimator(**self.params)
            new_estimator.fit(X, s)
            self.cascade.append(new_estimator)

    def _output(self, X):
        res = np.zeros(X.shape[0])
        for i in range(len(self.cascade)):
            res += self.cascade[i].predict(X)
        return res

    def predict_proba(self, X):
        return 1.0 / (1.0 + np.exp(-self._output(X)))

    def predict(self, X):
        res = np.sign(self._output(X))
        res[res == 0] = -1
        return res
In [133]:
### BEGIN Solution
model = GradientBoostingClassifier(estimators=6)
params = {
'max_depth' : 2
}
# model = GBM(DecisionTreeRegressor, params, 100)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
results = {'Accuracy' : accuracy_score(y_test, y_pred),
           'Precision': precision_score(y_test, y_pred),
           'F1_score' : f1_score(y_test, y_pred)}
print("Results:")
for key, value in results.items():
    print("%s %.3f" % (key, value))
### END Solution
In [134]:
from mlxtend.plotting import plot_decision_regions
plt.figure(figsize=(10,7))
plt.title("Decision boundary", size=16)
plot_decision_regions(X=X_train, y=y_train, clf=model, legend=2)
plt.tight_layout()
plt.show()
In this task you are asked to compare the training time of GBDT, Gradient Boosted Decision Trees, as implemented by different popular ML libraries. The dataset you shall use is the UCI Breast Cancer dataset. You should study the parameters of each library and establish the correspondence between them.
The plan is as follows:
You need to make sure that you are comparing comparable classifiers, i.e. with the same tree and ensemble hyperparameters.
**NOTE** You need to figure out how to make the parameter settings compatible. One possible way to understand the correspondence is to study the docs. You may choose the default parameters from any library.
Please plot three ROC curves, one per library, on the same plot with a comprehensible legend.
A useful command for timing is IPython's timeit cell magic.
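For instance, a minimal sketch (clf here stands for any of the classifiers constructed below):

%%timeit -n 1 -r 3
clf.fit(X_train, y_train)

The -n and -r flags control the number of loops and repeats, so a slow fit is not re-run hundreds of times.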
In [141]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
import xgboost as xgb
import catboost as ctb
import lightgbm as lgb
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
random_state=0x0BADBEEF)
In [142]:
### BEGIN Solution
default_depth = 6
default_estimators = 100
clfs = {
"CatBoost" : ctb.CatBoostClassifier(logging_level='Silent',
max_depth=default_depth,
n_estimators=default_estimators),
"LGBM" : lgb.LGBMClassifier(max_depth=default_depth,
n_estimators=default_estimators),
"XGBC" : xgb.XGBClassifier(max_depth=default_depth,
n_estimators=default_estimators)
}
plt.figure(figsize=(10, 8))
for key, clf in clfs.items():
    probas = clf.fit(X_train, y_train).predict_proba(X_test)
    fpr, tpr, thresholds = roc_curve(y_test, probas[:, 1])
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, lw=2, alpha=0.8,
             label='ROC %s (AUC = %0.2f)' % (key, roc_auc))
plt.xlabel('False Positive Rate', size=14)
plt.ylabel('True Positive Rate', size=14)
plt.title('Receiver operating characteristic', size=14)
plt.legend(loc="lower right")
plt.show()
### END Solution
In [143]:
tuned_params = [{
    'max_depth' : range(2, 8, 2),
    'n_estimators' : range(40, 160, 20)
}]
cv = StratifiedKFold(n_splits=3)
plt.figure(figsize=(10, 8))
for key, clf in clfs.items():
    gs = GridSearchCV(clf, tuned_params, cv=cv, iid=True,
                      scoring='roc_auc', n_jobs=4, return_train_score=True)
    gs.fit(X_train, y_train)
    clf = gs.best_estimator_
    probas = clf.predict_proba(X_test)
    fpr, tpr, thresholds = roc_curve(y_test, probas[:, 1])
    plt.plot(fpr, tpr, lw=2, alpha=0.6,
             label='%s | %s = %d; %s = %d'
                   % (key, 'max_depth', gs.best_params_['max_depth'],
                      'n_estimators', gs.best_params_['n_estimators']))
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")
plt.show()
In [350]:
estimator_params = [{
    'n_estimators' : range(40, 160, 20)
}]
depth_params = [{
    'max_depth' : range(2, 10, 2),
}]

f, axes = plt.subplots(1, 2, sharex=True, figsize=(15, 7))
axes[0].set_title('Relative time')
axes[1].set_title('Absolute time')

def plot_time_chart(axes, model, params, key):
    grid_cv = GridSearchCV(model, params, cv=3, iid=True,
                           scoring='roc_auc', n_jobs=4, return_train_score=True)
    grid_cv.fit(X_train, y_train)
    # fit time normalized by the smallest setting vs. raw fit time
    axes[0].plot(params[0][key],
                 np.array(grid_cv.cv_results_['mean_fit_time']) / grid_cv.cv_results_['mean_fit_time'][0],
                 label=grid_cv.best_estimator_.__class__.__name__)
    axes[1].plot(params[0][key],
                 grid_cv.cv_results_['mean_fit_time'],
                 label=grid_cv.best_estimator_.__class__.__name__)

for model in clfs.values():
    plot_time_chart(axes, model, estimator_params, 'n_estimators')
axes[0].legend()
axes[1].legend()
plt.tight_layout()
plt.show()
In [351]:
f, axes = plt.subplots(1, 2, sharex=True, figsize=(15, 7))
axes[0].set_title('Relative time')
axes[1].set_title('Absolute time')
for model in clfs.values():
    plot_time_chart(axes, model, depth_params, 'max_depth')
axes[0].legend()
axes[1].legend()
plt.tight_layout()
plt.show()
Plot the following activation functions using their PyTorch realizations and their derivatives using autograd functionality:
In [33]:
import torch.nn.functional as F
import matplotlib.pyplot as plt
import torch
x = torch.arange(-2, 2, .01, requires_grad=True)
x.sum().backward()  # to create x.grad

f, axes = plt.subplots(2, 2, sharex=True, figsize=(15, 7))
axes[0, 0].set_title('Values')
axes[0, 1].set_title('Derivatives')

for i, function_set in (0, (('ReLU', F.relu), ('ELU', F.elu), ('Softplus', F.softplus))), \
                       (1, (('Sign', torch.sign), ('Sigmoid', torch.sigmoid), ('Softsign', F.softsign), ('Tanh', torch.tanh))):
    for function_name, activation in function_set:
        ### BEGIN Solution
        x.grad.data.zero_()
        y = activation(x)
        axes[i, 0].plot(x.data.numpy(), y.data.numpy(), label=function_name)
        y.sum().backward()
        axes[i, 1].plot(x.data.numpy(), x.grad.data.numpy(), label=function_name)
        ### END Solution
    axes[i, 0].legend()
    axes[i, 1].legend()
plt.tight_layout()
plt.show()
Answer the following questions. Which of these functions may be, and which definitely are, a poor choice as an activation function in a neural network? Why?
The main requirement for backprop is that the activation function is differentiable. However, the sign function is non-differentiable at $x = 0$ and has zero derivative everywhere else, so gradient descent will not be able to update the weights.
Another problem may be with the ReLU activation function, which may cause a lot of redundant or dead nodes in a net. Such neurons do not contribute to the final result and do not pass a gradient.
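Both claims are easy to verify with autograd; a minimal sketch:

import torch

x = torch.tensor([-1.0, 0.5], requires_grad=True)
torch.sign(x).sum().backward()
print(x.grad)  # tensor([0., 0.]): sign passes no gradient anywhere

x.grad.zero_()
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 1.]): a negative input gives a "dead" zero gradient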
At seminar 10 on neural networks, we built an MLP with one hidden layer using our numpy implementations of a linear layer and the logistic and softmax activation functions. Your task is to implement the backward passes and train this MLP on the digits dataset from sklearn.datasets.
In [34]:
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_digits
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
Prepare the dataset.
In [233]:
digits, targets = load_digits(return_X_y=True)
digits = digits.astype(np.float32) / 255
digits_train, digits_test, targets_train, targets_test = train_test_split(digits, targets, random_state=0)
train_size = digits_train.shape[0]
test_size = digits_test.shape[0]
input_size = 8*8
classes_n = 10
Implement the MLP with backprop.
In [249]:
class Linear:
    def __init__(self, input_size, output_size):
        self.thetas = np.random.randn(input_size, output_size)
        self.thetas_grads = np.empty_like(self.thetas)
        self.bias = np.random.randn(output_size)
        self.bias_grads = np.empty_like(self.bias)
        self.input = None
        self.out = None

    def forward(self, x):
        self.input = x
        output = np.matmul(x, self.thetas) + self.bias
        self.out = output
        return output

    def backward(self, x, output_grad):
        ### BEGIN Solution
        self.input = self.input.reshape(-1, 1)
        input_grad = np.matmul(self.thetas, output_grad)
        self.thetas_grads += self.input @ output_grad.T
        self.bias_grads += output_grad.sum(axis=1)
        assert self.thetas_grads.shape == self.thetas.shape
        assert self.bias_grads.shape == self.bias.shape
        ### END Solution
        return input_grad

class LogisticActivation:
    def __init__(self):
        self.input = None
        self.out = None

    def forward(self, x):
        self.input = x
        output = 1 / (1 + np.exp(-x))
        self.out = output
        return output

    def backward(self, x, output_grad):
        ### BEGIN Solution
        self.out = self.out.reshape(-1, 1)
        input_grad = output_grad * self.out * (1. - self.out)
        ### END Solution
        return input_grad

class SoftMaxActivation:
    def __init__(self):
        self.input = None
        self.out = None

    def forward(self, x):
        self.input = x
        output = np.exp(x) / np.exp(x).sum(axis=-1, keepdims=True)
        self.out = output
        return output

    def backward(self, x, output_grad):
        # not used: the softmax gradient is folded into grad_cross_entropy_loss below
        ### BEGIN Solution
        self.out = self.out.reshape(-1, 1)
        input_grad = output_grad * self.out * (1. - self.out)
        ### END Solution
        return input_grad

class MLP:
    def __init__(self, input_size, hidden_layer_size, output_size):
        self.linear1 = Linear(input_size, hidden_layer_size)
        self.activation1 = LogisticActivation()
        self.linear2 = Linear(hidden_layer_size, output_size)
        self.softmax = SoftMaxActivation()

    def forward(self, x):
        return self.softmax.forward(self.linear2.forward(self.activation1.forward(self.linear1.forward(x))))

    def backward(self, x, output_grad):
        # output_grad is already the gradient w.r.t. the pre-softmax logits,
        # so the backward pass starts at linear2
        ### BEGIN Solution
        output_grad = self.linear2.backward(x, output_grad)
        output_grad = self.activation1.backward(x, output_grad)
        output_grad = self.linear1.backward(x, output_grad)
        ### END Solution
In [353]:
### BEGIN Solution
def cross_entropy_loss(predicted, target):
    # natural log, to stay consistent with grad_cross_entropy_loss below
    target_vector = np.zeros_like(predicted)
    if predicted.ndim != 1:
        target_vector[np.arange(len(target)), target] = 1
        cost = -np.sum(target_vector * np.log(predicted), axis=1)
    else:
        target_vector[target] = 1
        cost = -np.sum(target_vector * np.log(predicted))
    return cost

def grad_cross_entropy_loss(predicted, target):
    target_vector = np.zeros_like(predicted)
    target_vector[target] = 1
    return (predicted - target_vector).reshape(-1, 1)
### END Solution
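A note on why MLP.backward above starts directly at linear2: for softmax followed by cross-entropy, the gradient of the loss with respect to the pre-softmax logits $z$ collapses to
$$\frac{\partial L}{\partial z} = p - t,$$
where $p = \mathrm{softmax}(z)$ and $t$ is the one-hot target. This is exactly what grad_cross_entropy_loss returns, so the softmax backward never needs to be called.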
In [354]:
np.random.seed(0)
mlp = MLP(input_size=input_size, hidden_layer_size=100, output_size=classes_n)
epochs_n = 250
learning_curve = [0] * epochs_n
test_curve = [0] * epochs_n
x_train = digits_train
x_test = digits_test
y_train = targets_train
y_test = targets_test
learning_rate = 1e-2
for epoch in range(epochs_n):
    if epoch % 10 == 0:
        print('Starting epoch', epoch)
    for sample_i in range(train_size):
        x = x_train[sample_i]
        target = y_train[sample_i]
        ### BEGIN Solution
        # zero the gradients
        mlp.linear1.thetas_grads = np.zeros_like(mlp.linear1.thetas_grads)
        mlp.linear1.bias_grads = np.zeros_like(mlp.linear1.bias_grads)
        mlp.linear2.thetas_grads = np.zeros_like(mlp.linear2.thetas_grads)
        mlp.linear2.bias_grads = np.zeros_like(mlp.linear2.bias_grads)
        # forward pass, cross entropy loss and its gradient
        predicted_value = mlp.forward(x)
        loss = cross_entropy_loss(predicted_value, target)
        loss_grad = grad_cross_entropy_loss(predicted_value, target)
        learning_curve[epoch] += loss
        # backward pass
        mlp.backward(x, loss_grad)
        # update the weights simply with weight -= grad * learning_rate
        mlp.linear1.thetas -= learning_rate * mlp.linear1.thetas_grads
        mlp.linear1.bias -= learning_rate * mlp.linear1.bias_grads
        mlp.linear2.thetas -= learning_rate * mlp.linear2.thetas_grads
        mlp.linear2.bias -= learning_rate * mlp.linear2.bias_grads
    learning_curve[epoch] /= train_size
    prediction = mlp.forward(x_test)
    loss = cross_entropy_loss(prediction, y_test).mean()
    test_curve[epoch] = loss
    ### END Solution
plt.plot(learning_curve)
plt.plot(test_curve)
Out[354]:
In [355]:
predictions = mlp.forward(digits).argmax(axis=1)
pd.DataFrame(confusion_matrix(targets, predictions))
Out[355]:
In this task you will train your own CNN for the dogs vs. cats classification task. The goal of this task is not to get the highest accuracy possible (try getting the highest accuracy possible though) but to model the real-life process of training a deep neural network.
There is a good amount of datasets in torchvision, but in practice, chances are that you won't find a dataset for your particular problem, so you should be capable of writing a DataLoader for your own dataset.
In [19]:
from torch.utils.data import DataLoader, Dataset
import torch.nn.functional as F
import PIL.Image as Image
from torch import nn
import numpy as np
import torch.optim as optim
import matplotlib.pyplot as plt
import pandas as pd
import torch
from torchvision import transforms, utils
from PIL import Image
import os
import os.path
import sys
import progressbar
Make sure you are using the right device.
In [20]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
First take a look at the data.
In [21]:
dt = pd.read_csv(r'data/cats_dogs/train.csv')
dt.head()
Out[21]:
In [22]:
Image.open('data/' + dt['path'].iloc[1])
Out[22]:
Implement your Dataset class.
In [23]:
# a Dataset for the cats vs. dogs images, analogous to torchvision's ImageFolder
class ImageFolder(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file
            root_dir (string): Root directory path.
        """
        self.dt = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __getitem__(self, idx):
        """
        Args:
            idx (int): Index

        Returns:
            tuple: (sample, target) where target is class_index of the target class.
        """
        path = self.root_dir + '/' + self.dt.iloc[idx]['path']
        target = self.dt.iloc[idx]['y']
        with open(path, 'rb') as f:
            sample = Image.open(f).convert('RGB')
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, target

    def __len__(self):
        return self.dt.shape[0]
In [24]:
root_dir = './data'
image_size = 224
batch_size = 8
workers = 2
ngpu = 2
In [25]:
dataset = ImageFolder('data/cats_dogs/train.csv', root_dir)
len(dataset)
Out[25]:
Define the augmentation transform and instantiate training and validation subsets of your Dataset and the corresponding DataLoaders.
In [26]:
data_transform_train = transforms.Compose([
    transforms.RandomResizedCrop(image_size),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],   # ImageNet channel means
                         [0.229, 0.224, 0.225])   # ImageNet channel stds
])
data_transform_test = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
### BEGIN Solution
dataset_train = ImageFolder('data/cats_dogs/train.csv', root_dir, transform=data_transform_train)
dataset_val = ImageFolder('data/cats_dogs/validation.csv', root_dir, transform=data_transform_test)
train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=batch_size,
                                           shuffle=True, num_workers=workers)
val_loader = torch.utils.data.DataLoader(dataset_val, batch_size=batch_size,
                                         shuffle=False, num_workers=workers)
### END Solution
Make sure that dataloader works as expected by observing one sample from it.
In [27]:
for X, y in train_loader:
    print(X[0])
    print(y[0])
    plt.imshow(np.array(X[0, 0, :, :]))
    break
Implement your model below. You can use any layers that you want, but in general the structure of your model should be
In [28]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.convol = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Dropout(),
            nn.ReLU()
        )
        self.linear = nn.Sequential(
            nn.Linear(64 * 28 * 28, 256),  # 224 / 2^3 = 28 after three poolings
            nn.ReLU(),
            nn.Linear(256, 84),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(84, 2),
            nn.LogSoftmax(dim=1)
        )

    def forward(self, input):
        x = self.convol(input)
        x = x.view(x.size(0), -1)
        x = self.linear(x)
        return x
Send your model to GPU, if you have it.
In [29]:
def create_model(net, device):
    model = net.to(device)
    if (device.type == 'cuda') and (ngpu > 1):
        model = nn.DataParallel(model, list(range(ngpu)))
    return model
Implement your loss function below, or use the predefined loss, suitable for this task.
In [30]:
### BEGIN Solution
# the network already ends with LogSoftmax, so use NLLLoss on the log-probabilities
criterion = nn.NLLLoss().to(device)
### END Solution
Try two different optimizers and choose one. For the optimizer of your choice, try two different sets of parameters (e.g. learning rate). Explain both of your choices and back them with the learning performance of the network (see the rest of the task).
In these parts of the task you may try more than two options, but, please, leave in your solution only the results for two different optimizers and two different sets of parameters.
You may finally train your model. Don't forget to track the train and validation learning curves (you may find tensorboardX extremely useful for this task) and to save checkpoints. Your model should be able to show at least 75% validation accuracy.
You may also find useful the following parts of the documentation: Module.train, Module.eval, Module.state_dict, Module.load_state_dict.
In [31]:
def save_checkpoint(model, path):
    torch.save({
        'model_state_dict': model.state_dict(),
    }, path)

def load_checkpoint(model, path):
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['model_state_dict'])

def accuracy_score(model):
    correct = 0
    total = 0
    model.eval()
    with torch.no_grad():
        for data in val_loader:
            samples = data[0].to(device)
            labels = data[1].to(device)
            outputs = model(samples)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy of the network: %d %%' % (100 * correct / total))
    model.train()
In [32]:
### BEGIN Solution
def train(model, optimizer):
    model_losses_train = []
    model_losses_val = []
    num_epochs = 100
    it = 0
    with progressbar.ProgressBar(max_value=num_epochs * len(train_loader)) as bar:
        for epoch in range(num_epochs):
            train_loss = 0
            for i, data in enumerate(train_loader, 0):
                model.zero_grad()
                labels = data[1].to(device)
                samples = data[0].type(torch.FloatTensor).to(device)
                output = model(samples)
                err = criterion(output, labels)
                err.backward()
                optimizer.step()
                train_loss += err.item()
                bar.update(it)
                it += 1
            if epoch % 5 == 0:
                model_losses_train.append(train_loss / len(train_loader))
                model.eval()
                with torch.no_grad():
                    test_loss = 0
                    for data_val in val_loader:
                        labels_val = data_val[1].to(device)
                        samples_val = data_val[0].type(torch.FloatTensor).to(device)
                        output_val = model(samples_val)
                        err_val = criterion(output_val, labels_val)
                        test_loss += err_val.item()
                    model_losses_val.append(test_loss / len(val_loader))
                model.train()

    plt.figure(figsize=(10, 5))
    plt.title("CE Loss During Training")
    plt.plot(model_losses_train, label="Train")
    plt.plot(model_losses_val, label="Validation")
    plt.xlabel("iterations")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()
### END Solution
In [33]:
# Model = Net(), optimizer = Adam, lr = 1e-5
model = create_model(Net(), device)
lr = 1e-5
optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=1e-7)
train(model, optimizer)
save_checkpoint(model, './adam_relu_lr=1e-5')
accuracy_score(model)
In [34]:
# Model = Net(), optimizer = Adam, lr = 1e-7
model = create_model(Net(), device)
lr = 1e-7
optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=1e-7)
train(model, optimizer)
save_checkpoint(model, './adam_relu_lr=1e-7')
accuracy_score(model)
In [35]:
# Model = Net(), optimizer = SGD, lr = 1e-5, momentum = 0.9
model = create_model(Net(), device)
lr = 1e-5
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
train(model, optimizer)
save_checkpoint(model, './sgd_relu_lr=1e-5')
accuracy_score(model)
Load the checkpoints and check the accuracy score.
In [39]:
model = create_model(Net(), device)
load_checkpoint(model, './adam_relu_lr=1e-5')
accuracy_score(model)
In [40]:
class Sign(nn.Module):
    def __init__(self):
        super(Sign, self).__init__()

    def forward(self, input):
        return torch.sign(input)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.convol = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            Sign(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, 3, padding=1),
            Sign(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1),
            Sign(),
            nn.MaxPool2d(2, 2),
            nn.Dropout(),
            Sign()
        )
        self.linear = nn.Sequential(
            nn.Linear(64 * 28 * 28, 256),
            Sign(),
            nn.Linear(256, 84),
            Sign(),
            nn.Dropout(),
            nn.Linear(84, 2),
            nn.LogSoftmax(dim=1)
        )

    def forward(self, input):
        x = self.convol(input)
        x = x.view(x.size(0), -1)
        x = self.linear(x)
        return x
In [42]:
model = create_model(Net(), device)
lr = 1e-5
optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=1e-7)
In [43]:
train(model, optimizer)
accuracy_score(model)