Home Assignment No. 1: Part 1 (Practice)

To solve this task, you will write a lot of code to try several machine learning methods for classification and regression.

  • You are HIGHLY RECOMMENDED to read relevant documentation, e.g. for python, numpy, matlpotlib and sklearn. Also remember that seminars, lecture slides, Google and StackOverflow are your close friends during this course (and, probably, whole life?).

  • If you want an easy life, you have to use BUILT-IN METHODS of sklearn library instead of writing tons of our yown code. There exists a class/method for almost everything you can imagine (related to this homework).

  • To do this part of homework, you have to write CODE directly inside specified places incide notebook CELLS.

  • In some problems you are asked to provide short discussion of the results. In this cases you have to create MARKDOWN cell with your comments right after the your code cell.

  • For every separate problem you can get only 0 points or maximal points for this problem. There are NO INTERMEDIATE scores. So make sure that you did everything required in the task

  • Your SOLUTION notebook MUST BE REPRODUCIBLE, i.e. if the reviewer decides to execute Kernel -> Restart Kernel and Run All Cells, after all the computation he will obtain exactly the same solution (with all the corresponding plots) as in your uploaded notebook. For this purpose, we suggest to fix random seed or (better) define random_state= inside every algorithm that uses some pseudorandomness.

  • Your code must be clear to the reviewer. For this purpose, try to include neccessary comments inside the code. But remember: GOOD CODE MUST BE SELF-EXPLANATORY without any additional comments.

Before the start, read several additional recommendations.

  • Probably you lauch jupyter notebook or ipython notebook from linux console. Try jupyter lab instead - it is a more convenient environment to work with notebooks.
  • Probably the PC on which you are going to evaluate models has limited CPU/RAM Memory. In this case, we recommend to monitor the CPU and Memory Usage. To do this, you can execute htop (for CPU/RAM) or free -s 0.2 (for RAM) in terminal.
  • Probably tou have multiple Cores (CPU) on your PC. Many sklearn algorithms support multithreading (Ensemble Methods, Cross-Validation, etc.). Check if the particular algorithm has n_jobs parameters and set it to -1 to use all the cores.

To begin with, lets import the essential (for this assignment) libraries.


In [1]:
import numpy as np

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

import pandas as pd

from sklearn.datasets import make_moons, make_circles

from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

### BEGIN Your imports
from sklearn.svm import SVC
from mlxtend.plotting import plot_decision_regions
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.gridspec as gridspec
import itertools
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold, GridSearchCV, cross_validate
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report, make_scorer, f1_score
from sklearn.random_projection import GaussianRandomProjection
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from IPython.display import display, HTML
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
### END Your imports

Task 1. Numpy Problem 1 (1 point)

Write a function, which takes a matrix arr and centers each of its rows by the mean within that row. Check this out for documentation.


In [2]:
def center(arr):
    """Center each row of a matrix by the mean value in that row.

    Parameters
    ----------
    arr : array, shape = (n_rows, n_cols)
        The matrix, the rows of which to center.

    Returns
    ----------
    out : array, shape = (n_rows, n_cols)
        The final row-centered matrix.
    """
    assert arr.ndim == 2
    n_rows, n_cols = arr.shape

    ### BEGIN Solution
    out = (arr.T - np.mean(arr, axis=1) * np.ones_like(arr)).T
    ### END Solution
    return out

Task 2. Numpy Problem 2 (1 point)

Implement the following fancy function: \begin{equation} f(x) = \sigma\bigl( \max{x + 5, 0} + \max{5 - x, 0}

            + \max\{\min\{\cos(2 x \pi), \tfrac12\}, -\tfrac14\}
        \bigr)
     \,,

\end{equation} where $\sigma(x) = (1+e^{-x})^{-1}$ is the sigmoid function.


In [3]:
def fancy_function(x):
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    """Compute some fancy function.
    
    Parameters
    ----------
    x : array, 1 dimendional, shape=(n_samples,)
        The array argument values.

    Returns
    -------
    y : array, 1 dimendional, shape=(n_samples,)
        The values of the fancy function.
    """
    ### BEGIN Solution
    print(x)
    max1 = np.amax((x + 5, np.zeros_like(x)), axis=0)
    max2 = np.amax((5 - x, np.zeros_like(x)), axis=0)
    min1 = np.amin((np.cos(2*x*np.pi), np.ones_like(x)*0.5), axis=0)
    max3 = np.amax((min1, np.ones_like(x)*-0.25), axis=0)
    out = sigmoid(max1 + max2 + max3)
    #max2 = np.max([5 - x, 0])
    #min3 = np.min([np.cos(2 * x * np.pi), 0.5])
    #out = sigmoid(max1 +  max2 + np.max([min3, -0.25]))
    ### END Solution
    
    return out

Plot the function


In [4]:
x = np.linspace(-12.5, 12.5, num=1001)
fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(x, fancy_function(x))
plt.show()


[-12.5   -12.475 -12.45  ...  12.45   12.475  12.5  ]

Task 3. Matplotlib (2 points)

Plot the level sets of the $l^p$ norm \begin{equation} \|z\|_p = \biggl(\sum_i \lvert x_i\rvert^p\biggr)^\tfrac1{p} \,. \end{equation} and make the contour of the unit ball in $l^p$ norm stand out. Draw plots for $p \in \{0, \tfrac1{25}, \tfrac12, 1, 1.5, 2, 7, \infty\}$.

Study plotting examples on this and this pages (especially the last one) and have a look at these functions: np.meshgrid, np.linspace in numpy's documentation. We suggest to use np.linalg.norm.

Try to produce a plot that looks like the one below:


In [5]:
p_values = [0., 0.04, 0.5, 1, 1.5, 2, 7, np.inf]
xx, yy = np.meshgrid(np.linspace(-3, 3, num=101),
                     np.linspace(-3, 3, num=101))

fig, axes = plt.subplots(ncols=(len(p_values) + 1)// 2,
                         nrows=2, figsize=(14, 7))

for p, ax in zip(p_values, axes.flat):
    ### BEGIN Solution
    norm = np.linalg.norm([xx, yy], p, axis=0)
    ax.contourf(xx, yy, norm, levels=np.linspace(0, norm.max(), 20), cmap=plt.cm.winter_r)
    ax.contour(xx, yy, norm, levels=[1], colors="hotpink")
    ax.set_title(r"contour = {$x : \| x \|_{%g}$ = 1}" % (p))
    ### END Solution
plt.show()


Task 4. Decision Rules and Feature Engeneering (1+1=2 points)

In this task your goal is to visualize the decision rules of several classifiers applied to artificial $2$-dimensional dataset generated by builtin sklearn.datasets method called make_moons. In the cell below we generate the dataset.


In [6]:
X, y = make_moons(n_samples=500, noise=0.2, random_state=404)

Subproblem 4.1. Decision Rule Plotting (1 of 2 points)

The goal of the subproblem is to fit the following classifiers on features X to target y:

  • Decision Tree (single!) with small depth (e.g. $4$);
  • Random Forest with small number of trees (e.g. $25$) of low depth;
  • Logistic Regreesion;
  • Support Vector Machine with RBF kernel;
  • Gaussian Naive Bayes;
  • k-Nearest Neighbor Classifier with small number of neighbors (e.g. $3$);

For all the fitted classifiers you have to plot the decision regions (the example is shown below the cell). Each plot must have Title which contains name of the classifier and its accuracy on the data.

You can write the plotting code on your own, but we highly recommend just to use mlxtend library (pip install mlxtend in linux terminal), which has awesome one-line decision rule plotting function (you are to google it).


In [7]:
clfs = [
    {'clf': DecisionTreeClassifier(max_depth=4, random_state=101)}, 
    {'clf': RandomForestClassifier(n_estimators=25, max_depth=4, random_state=101, n_jobs=-1)},
    {'clf': LogisticRegression(random_state=101, solver="lbfgs")},
    {'clf': SVC(kernel="rbf", random_state=101, gamma="scale")},
    {'clf': GaussianNB()},
    {'clf': KNeighborsClassifier(n_neighbors=3, n_jobs=-1)}
]

fig, axes = plt.subplots(
    ncols=(len(clfs) + 1)// 2,
    nrows=2, figsize=(6 * ((len(clfs) + 1)) // 2, 12),
    dpi=75
)

for clf, ax in zip(clfs, axes.flat):
    clf['clf'].fit(X, y)
    acc = cross_val_score(clf['clf'], X, y, scoring="accuracy", cv=5).mean()
    fig = plot_decision_regions(X=X, y=y, clf=clf['clf'], legend=2, ax=ax)
    ax.set_title(clf['clf'].__class__.__name__ + ", accuracy = %g" % acc, size=16)
    
plt.tight_layout()


Subproblem 4.2. Pipeline: Fitting to Data by Feature Engeneering (1 of 2 points)

In previous task 1.1 several classifiers obviously failed fitting to data. This happend because the decision rule of the classifier has a restricted form (e.g. linear for linear models), while the data is more complicated.

One may try to change the parameters of the classifier (e.g. increase the number of trees in Forest) in order to improve accuracy, but some models (especially linear) do not have parameters that can change the form of the decision rule.

In this case, the feature engeneering helps: one may try to compute new (e.g. non-linear) features based on the existing pool and fit the classifier in the new features. This may help low-complex claffifiers to fit to hard data dependencies.

Your task it to

  • Choose two classifiers from the following: Decision Tree, Random Forest, Naive Bayes, Logistic Regression;
  • By generating of additional features (e.g. polynomial) make them achieve accuracy $>0.95$.
  • For each classifier, write 2-3 sentences why did you decide to choose these features.
  • Plot their decision rules in the original feature space.

It is your choice how to generate features. You may create hand-crafted featues and add them manualy. Nevertheless, we highly suggest to get used to and apply the following builtin sklearn methods:

  • PolynomialFeatures, GaussianRandomProjection among others - for feature generation
  • StandartScaler, MinMaxScaler among others - for feature scaling
  • Pipeline - for combining several operations in a row (e.g. feature creation & prediction)

In [8]:
### BEGIN Solution

grid_params = [
    {
        'clf': [LogisticRegression(random_state=101, solver="lbfgs")],
        'poly': [PolynomialFeatures()],
        'poly__degree': [2,3],
        'scaler': [StandardScaler()]},
    {
        'clf': [DecisionTreeClassifier(max_depth=4, random_state=101)],
        'poly': [PolynomialFeatures()],
        'poly__degree': [2,3],
        'scaler' : [None]}
]

fig, axes = plt.subplots(ncols=2, nrows=1, figsize=(14, 7), dpi=75)
kfold = StratifiedKFold(n_splits=5, random_state=101, shuffle=True)

for grid_p, ax in zip(grid_params, fig.axes):
    pipe = Pipeline([('scaler', StandardScaler()),
                     ('poly', PolynomialFeatures()),
                     ('clf', RandomForestClassifier(random_state=101, n_jobs=-1))])
    feature_tuned_clf = GridSearchCV(pipe, param_grid = grid_p, cv=kfold, n_jobs=-1, \
                                     verbose=0, scoring=make_scorer(accuracy_score))
    feature_tuned_clf.fit(X, y)
    fig = plot_decision_regions(X=X, y=y, clf=feature_tuned_clf.best_estimator_, legend=2, ax=ax)
    ax.set_title(grid_p['clf'][0].__class__.__name__ + ", accuracy = %g" % \
                 accuracy_score(y, feature_tuned_clf.best_estimator_.predict(X)), size=16)

plt.tight_layout()

### END Solution


The separation line is not linear in the dataset given (and it is even not composed of lines), so that's why some conventional classifiers like LinearRegression and DecisionTree might fail. To address this problem one might transform initial feature space and make them linerarly separable. To do so we use polynomial in such a way that after preprocessing classifiers will be able to separate two classes. In case of LinearRegression the reason is simple, we just transform a curve to a line, but in case of DecisionTree we transform an initial curve to the set of lines (like pixelating), which can be easily processed then.

Task 5. Model Selection (1+1 points)

You are to test Random Forests and Support Vector Machines on a trivial Tic Tac Toe Endgame Dataset. Let's load it.


In [9]:
data = pd.read_csv('data/tic-tac-toe.csv')
X, y = data.drop('class', axis=1), data['class'].astype(int)
data.sample(3, random_state=101)


Out[9]:
TL TM TR ML MM MR BL BM BR class
138 x o x b x o o b x True
673 x o x x o x o o b False
32 x x x o o b b o x True

The dataset consists of several possible endgame positions of the Tic-Tac-Toe game. The target variable is the victory of x player over o player (victory or defeat/draw). Since the features are categorical, we simply transform them to real-valued $-1$ for o, $1$ for x and $0$ for emply cell b.


In [10]:
X = X.applymap(lambda v: 1.0 if v == 'x' else -1.0 if v == 'o' else 0.0)
X.sample(4, random_state=101).sort_index()


Out[10]:
TL TM TR ML MM MR BL BM BR
32 1.0 1.0 1.0 -1.0 -1.0 0.0 0.0 -1.0 1.0
36 1.0 1.0 1.0 -1.0 0.0 1.0 0.0 -1.0 -1.0
138 1.0 -1.0 1.0 0.0 1.0 -1.0 -1.0 0.0 1.0
673 1.0 -1.0 1.0 1.0 -1.0 1.0 -1.0 -1.0 0.0

We are going to test how machine learning algorithms can classify the final game positions into the ones when x player won and all others. Everybody knows that for this problem there is a simple decision rule: x wins if there are three x's in a row/column/diagonal. But can our cool machine learning tools catch this trivial dependence? In this problem the class balance is around $2:1$ so we still use accuracy metric.

Intuitively, this rule is logical, i.e. one may expect decision-tree-based algorithm to be the most appropriate for this case. But is that true? In the code below we compare huge Random forest with simple Logistic Regression and SVM with default parameters.


In [11]:
rf = RandomForestClassifier(n_estimators=400, max_depth=10)
svm = SVC(gamma='auto')
lr = LogisticRegression(solver='lbfgs')
clfs = (rf, svm, lr)

for clf in clfs:
    score = cross_val_score(clf, X, y, cv=5).mean()
    name = clf.__class__.__name__
    print(f'{name} scored {round(score, 3)}')


RandomForestClassifier scored 0.886
SVC scored 0.941
LogisticRegression scored 0.983

We counterintuitively see that RandomForest is outperformed by SVM. Moreover, the simplest Logistic regression significantly outperforms everything. Let's improve Random Forest and SVM in order to be able to compete with Logistic model.

Subproblem 5.1. Model selection for SVM (1 of 2 points)

Perform Grid Search for optimal hyperparameter for SVM model in order to achieve 5-fold validation score on the data not lower than $0.98$. You can code the Grid Search Manually, but we highly encourage yo use builtin GridSearchCV method.


In [12]:
### BEGIN Solution

grid_params = {
        'clf': [SVC(random_state=101, gamma="scale")],
        'clf__kernel': ['linear', 'poly', 'rbf', 'sigmoid'] }

kfold = StratifiedKFold(n_splits=5, random_state=101, shuffle=False)
pipe = Pipeline([('scaler', StandardScaler()),
                 ('clf', SVC(random_state=101, gamma="scale"))])
tuned_clf = GridSearchCV(pipe, param_grid = grid_params, cv=kfold, n_jobs=-1, \
                         verbose=0, scoring=make_scorer(accuracy_score), iid=True)
tuned_clf.fit(X, y)
print("The best kernel: %s" % tuned_clf.best_params_['clf__kernel'])
print("Accuracy: %g" % cross_val_score(tuned_clf.best_estimator_, X, y, cv=5).mean())

### END Solution


The best kernel: linear
Accuracy: 0.983246

Subproblem 5.2. Feature engeneering for Random Forest (1 of 2 points)

Perform feature engeneering for Random Forest in order to achieve 5-fold validation score not lower than $0.94$. Write 2-3 sentences to explicitly explain your motivation for provided feature choice.


In [13]:
rf = RandomForestClassifier(n_estimators=400, max_depth=10, random_state=404)

### BEGIN Solution
X_handcrafted = pd.DataFrame(X.sum(axis=1))
print("Accuracy: %g" % cross_val_score(rf, X_handcrafted, y, cv=5).mean())
### END Solution


Accuracy: 0.983246

Assuming that all possible values are +1, 0 and -1 we can sum all values on the board at final stage and classify win when total sum is more than 0. It is intuitive because win happens once imbalance occurs (except for a few cases when is doesn't hold).

Task 6. Bagging Ensembles of Regressors (2 points)

In this problem you are to deal with Concrete Compressive Strength Dataset. You goal will be to determine the optimal parameters for two Bagging-Based Forest Ensemble Regressors and compare the forests. Lets load the data and split it into test and train parts.


In [14]:
data = pd.read_csv('data/concrete.csv').astype(float)
X = data.drop('concrete_compressive_strength', axis=1)
y = data.concrete_compressive_strength

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

data.sample(3, random_state=101).sort_index()


Out[14]:
cement blast_furnace_slag fly_ash water superplasticizer coarse_aggregate fine_aggregate age concrete_compressive_strength
572 220.8 147.2 0.0 185.7 0.0 1055.0 744.3 7.0 13.09
605 236.0 0.0 0.0 194.0 0.0 968.0 885.0 3.0 6.47
920 136.0 162.0 126.0 172.0 10.0 923.0 764.0 28.0 29.07

Please note that both in Task 4 and Task 5 the whole data was the Train Data. In Task 4 the output score (accuracy) was the train score (i.e. the score on the train data of the model fitted on the same data). In task 5 the output score was the validation score, i.e. the result of validating the model on the train data.

In this problem, we do a step further and split the whole data into train part (on which we train & validate) and test part (where we compute the final test score on the validated model).

In this problem you are to consider the RandomForestRegressor and ExtraTreesRegressor models for the prediction of concrete compressive strength under squared loss function (mean squared error). Recall that Random Forest was discussed in the lectures. Extremely Randomized Forest is an other bootstraped forest with simple tree building algorithm. Basically, each split of each tree node is chosen at random both w.r.t. feature and threshold (while in random forest the split minimizes impurity).

You have to do the following steps and answer the following questions:

  • For both Forests perform the Grid Search (on the train data) over most important algorithm's parameters (what are they?) to determine the optimal hyperparameters.
  • For the optimal parameters output the train, validation score and the score for predicting for the test data.
  • Compare the obtained scores. Explain, why the scores differ a lot for train and validation/test.
  • Which of the algorithms perform better on the training set? Explain why!

In [15]:
### BEGIN Solution

grid_params = [
    {
        'clf': [RandomForestRegressor(random_state=101, n_jobs=-1)],
        'clf__n_estimators': [1, 10, 100],
        'clf__max_features': [1, 2, 4, 8],
        'clf__max_depth': np.linspace(4, 20, 8),
        'clf__bootstrap' : [True, False]},
    {
        'clf': [ExtraTreesRegressor(random_state=101, n_jobs=-1)],
        'clf__n_estimators': [1, 10, 100],
        'clf__max_features': [1, 2, 4, 8],
        'clf__max_depth': np.linspace(4, 20, 8),
        'clf__bootstrap' : [True, False]}
]

output = [
    {
        'clf' : "RndForestRegressor",
        'train' : 0,
        'validation' : 0,
        'test' : 0 },
    {
        'clf' : "ExtraTreesRegressor",
        'train' : 0,
        'validation' : 0,
        'test' : 0 }
]

print('%s %8s %8s %6s' % ("Ensemble Classifier", "Train", "Valid", "Test"))

for i, grid_p in enumerate(grid_params):
    pipe = Pipeline([('clf', RandomForestRegressor(random_state=101, n_jobs=-1))])
    tuned_clf = GridSearchCV(pipe, param_grid=grid_p, cv=5, n_jobs=-1, \
                             verbose=0, scoring='neg_mean_squared_error', iid=True)
    tuned_clf.fit(X_train, y_train)
    output[i]['train'] = np.abs(tuned_clf.score(X_train, y_train))
    output[i]['validation'] = np.abs(tuned_clf.best_score_)
    output[i]['test'] = np.abs(tuned_clf.score(X_test, y_test))
    print('%s %8.4g %8.4g %7.4g' % (output[i]['clf'], output[i]['train'], \
                                    output[i]['validation'], output[i]['test']))
    
### END Solution


Ensemble Classifier    Train    Valid   Test
RndForestRegressor     1.32    25.37   24.75
ExtraTreesRegressor    1.324    24.93   23.96

Randomization itself in the context of ExtraTrees is used to lower the variance. As it is intrinsic for forests structures to have a big bias, resulted loss in test fold might be greater than in train fold, which in fact leads to the case when model is not able to fit correctly.

Task 7. Multi-Label Classification Strategies (2 points)

In this task you deal with multiclass classification problem for Glass Classification Data. Lets load the dataset.


In [16]:
data = pd.read_csv('data/glass.csv')
X, y = data.drop('Type', axis=1), data.Type
data.sample(3, random_state=101)


Out[16]:
RI Na Mg Al Si K Ca Ba Fe Type
74 1.51596 13.02 3.56 1.54 73.11 0.72 7.90 0.0 0.0 2
172 1.51321 13.00 0.00 3.02 70.70 6.21 6.93 0.0 0.0 5
65 1.52099 13.69 3.59 1.12 71.96 0.09 9.40 0.0 0.0 1

The features of each glass oject correspond to the fraction of the particular chemical element in the object. The target variable corresponds to the type of glass (6 classes).

In this problem you have to empirically compare the time complexity and performance of several multiclass labeling strategies for different algorithms. You must consider the following algorithms:

  • Single Decision Tree (depth 7)
  • Medium Random Forest (100 trees of depth 3)
  • KNearestNeighbors (5 neighbors)
  • Logistic Regression

Note that all these algorithms by default support multiclass labeling. Nevertheless, we want you to compare this approach with OneVSRest and OneVSOne approaches applied to this algorithms. More precisely, for every pair (algorithm, approach) you are to perform 5-fold cross validation on the data and output the validation score and the computation time in the table form. Please note that you also have to choose the metric to optimize during CV (e.g. accuracy, balanced accuracy) on your own.

After that, you are to answer the following questions:

  • Which metric did you choose to optimize during cross validation and why? Explain
  • For which algorithms the usage of OneVSRest/OneVSOne approach provides significantly better performance without significant increase in computation time?

In [17]:
### BEGIN Solution

clfs = [
    {'clf': DecisionTreeClassifier(max_depth=7, random_state=101)}, 
    {'clf': RandomForestClassifier(n_estimators=100, max_depth=3, random_state=101, n_jobs=-1)},
    {'clf': LogisticRegression(random_state=101, multi_class='multinomial', solver="newton-cg")},
    {'clf': KNeighborsClassifier(n_neighbors=3, n_jobs=-1)}
]

# Create all neccesary tables
cross_valid = pd.DataFrame(columns=['','LogisticRegression','RandomForestClassifier',\
                                       'KNeighborsClassifier','DecisionTreeClassifier'])
cross_valid[''] = ["Multiclass", "OneVsRest", "OneVsOne"]

time_performance = pd.DataFrame(columns=['','LogisticRegression','RandomForestClassifier',\
                                            'KNeighborsClassifier','DecisionTreeClassifier'])
time_performance[''] = ["Multiclass", "OneVsRest", "OneVsOne"]

# Do all calculations
for clf in clfs:
    one_vs_one = OneVsOneClassifier(clf['clf'], n_jobs=-1)
    one_vs_rest = OneVsRestClassifier(clf['clf'], n_jobs=-1)
    
    valid_score = []
    comp_time = []
    
    for clf_clf in reversed([one_vs_one, one_vs_rest, clf['clf']]):
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=101)
        scores = cross_validate(clf_clf, X, y, scoring="recall_macro",
                                cv=cv, return_train_score=False)
        valid_score.append(scores['test_score'].mean())
        comp_time.append(scores['fit_time'].mean())
    
    cross_valid[clf['clf'].__class__.__name__] = valid_score
    time_performance[clf['clf'].__class__.__name__] = comp_time

# Display results
print("Cross validation")
display(HTML(cross_valid.to_html(index=False)))
print("Computation time")
display(HTML(time_performance.to_html(index=False)))

### END Solution


Cross validation
LogisticRegression RandomForestClassifier KNeighborsClassifier DecisionTreeClassifier
Multiclass 0.499881 0.570377 0.575635 0.630516
OneVsRest 0.483095 0.608611 0.579603 0.550655
OneVsOne 0.491131 0.649881 0.575476 0.723294
Computation time
LogisticRegression RandomForestClassifier KNeighborsClassifier DecisionTreeClassifier
Multiclass 0.093208 0.120159 0.001184 0.002270
OneVsRest 0.115456 0.600123 0.012982 0.018615
OneVsOne 0.209205 1.145541 0.016944 0.016445

There is a zoo of possible metrics to be chosen, but the main feature of this dataset is that it is quite unbalanced and multiclass. For that reason, precision metrics are not feaasible because there might be cases when there are not samples from a particular class (that leads to zero in a denominator). Another possible metric is recall, but there are two kinds of them: macro and micro. The latter is not suitable as well, as it focuses too much on the largest classes imposing unfair conditions to less populated classes. After such contemplations the recall_macro was picked as a final evaluation score.

A few words need to be said about computational time. One might easily observed that the greatest score improvment was achieved with DecisionTreeClassifier, but computational time increased significantly (8 times in OneVsOne scenario). Just a little bit lower imporovement was achieved by RandomForestClassifier, but the price for that turned out to be higher (10 times slower in OneVsOne), which is in fact also a good result.