Introduction to Scikit-learn



In [ ]:

    
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline



In [ ]:

    
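from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Load the 8x8 handwritten digit images (1797 samples, 64 pixel features each)
digits = load_digits()
# Hold out a test set; by default 25% of the samples go to the test split
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)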



In [ ]:

    
# (n_samples, n_features) of the training split
X_train.shape



In [ ]:

    
# Number of training samples for each digit class 0..9
np.bincount(y_train)
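
The two inspection cells above can be combined into one self-contained sketch that repeats the split and prints the array shapes and the per-class counts. It assumes the default test_size of train_test_split (25% of the samples); the name class_counts is only illustrative and does not appear in the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Each row is one flattened 8x8 image, so there are 64 features per sample.
print(X_train.shape, X_test.shape)

# class_counts[k] is the number of training samples whose label is digit k
# (illustrative name, not part of the original notebook).
class_counts = np.bincount(y_train)
print(class_counts)
```

Roughly balanced counts suggest that plain accuracy is a reasonable first metric for this dataset.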

Really Simple API

0) Import your model class



In [ ]:

    
from sklearn.svm import LinearSVC

1) Instantiate an object and set the parameters



In [ ]:

    
svm = LinearSVC()

2) Fit the model



In [ ]:

    
svm.fit(X_train, y_train)

3) Apply / evaluate



In [ ]:

    
# Compare the predictions on the training set with the true labels
print(svm.predict(X_train))
print(y_train)



In [ ]:

    
# Mean accuracy on the training set
svm.score(X_train, y_train)



In [ ]:

    
# Mean accuracy on the held-out test set
svm.score(X_test, y_test)
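
Steps 0 to 3 can be collected into one short script. This is a minimal sketch rather than the notebook's exact cells: it passes random_state=0 to LinearSVC (an assumption added here for repeatable results), and the names train_accuracy and test_accuracy are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC  # 0) import the model class

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# 1) instantiate an object and set the parameters
#    (random_state=0 is an assumption, not in the original cell)
svm = LinearSVC(random_state=0)

# 2) fit the model on the training data only
svm.fit(X_train, y_train)

# 3) apply / evaluate: for classifiers, score() returns the mean accuracy
train_accuracy = svm.score(X_train, y_train)
test_accuracy = svm.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

The test-set score is the more honest estimate of performance on unseen data; a large gap between the two numbers would hint at overfitting. Depending on the scikit-learn version, LinearSVC may emit a convergence warning on these unscaled pixel values.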

And again, with a different model



In [ ]:

    
from sklearn.ensemble import RandomForestClassifier



In [ ]:

    
# A forest of 50 trees; all other parameters keep their defaults
rf = RandomForestClassifier(n_estimators=50)



In [ ]:

    
rf.fit(X_train, y_train)



In [ ]:

    
rf.score(X_test, y_test)
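
Because both estimators expose the same fit and score methods, they can be compared in a single loop. The sketch below assumes fixed random_state values for repeatability; the models and test_scores dictionaries are hypothetical helpers, not part of the original notebook.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Hypothetical registry of estimators to compare; random_state is an assumption.
models = {
    "LinearSVC": LinearSVC(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(
        n_estimators=50, random_state=0),
}

# The same two calls work for every estimator: fit, then score.
test_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_scores[name] = model.score(X_test, y_test)
    print(f"{name}: {test_scores[name]:.3f}")
```

Swapping in another classifier only requires adding one entry to the dictionary, which is the practical payoff of the shared interface.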

Exercises

Load the iris dataset from the sklearn.datasets module using the load_iris function.

Split it into training and test sets using train_test_split.

Then train and evaluate a classifier of your choice. Try sklearn.neighbors.KNeighborsClassifier, for example.



In [ ]:

    
# %load solutions/train_iris.py
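
One possible way to approach the exercise is sketched below. It is not necessarily the content of solutions/train_iris.py; it assumes the default settings of train_test_split and KNeighborsClassifier, and the name iris_test_accuracy is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris data: 150 flowers, 4 measurements each, 3 species.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Same pattern as before: instantiate, fit, evaluate.
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)

# Mean accuracy on the held-out test set.
iris_test_accuracy = knn.score(X_test, y_test)
print(iris_test_accuracy)
```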