We can separate learning problems into a few large categories: supervised learning, where the data comes with the target values we want to predict (classification when the targets are discrete classes, regression when they are continuous), and unsupervised learning, where no targets are given (for example clustering or dimensionality reduction).
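As a rough sketch of how these categories show up in the scikit-learn API (the estimators here are only illustrative choices), a supervised estimator is fit on data together with targets, while an unsupervised one sees the data alone:
In [ ]:
from sklearn import datasets
from sklearn.svm import SVC         # a supervised classifier
from sklearn.cluster import KMeans  # an unsupervised clustering estimator

iris = datasets.load_iris()
X, y = iris.data, iris.target

SVC().fit(X, y)                # supervised: features and targets
KMeans(n_clusters=3).fit(X)    # unsupervised: features only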
A dataset is a dictionary-like object that holds all the data and some metadata about the data. This data is stored in the .data member, which is an (n_samples, n_features) array.
For example, digits.data gives access to the features that can be used to classify the digits samples:
In [14]:
from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
print(digits.data)
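The shape of the array reflects this layout; assuming the standard load_digits dataset, there are 1797 samples with 64 features each:
In [ ]:
digits.data.shape  # (1797, 64): n_samples rows, n_features columns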
digits.target gives the ground truth for the digits dataset, that is, the number corresponding to each digit image that we are trying to learn:
In [15]:
digits.target
Each original sample is an image of shape (8, 8) and can be accessed using:
In [16]:
digits.images[0]
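The flat feature vectors in digits.data are simply these (8, 8) images unrolled; a quick check of this, using plain NumPy (the check itself is our addition, not part of the original example):
In [ ]:
import numpy as np
np.array_equal(digits.images[0].reshape(64), digits.data[0])  # True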
In the case of the digits dataset, the task is to predict, given an image, which digit it represents. Here we use the sklearn.svm.SVC estimator, a support vector classifier, and fit it on all but the last sample:
In [17]:
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])
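Here the values of gamma and C were set manually. They can also be chosen automatically with grid search and cross-validation; a minimal sketch, where the parameter grid below is only an illustrative guess:
In [ ]:
from sklearn.model_selection import GridSearchCV

param_grid = {'gamma': [0.0001, 0.001, 0.01], 'C': [1., 10., 100.]}
search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(digits.data[:-1], digits.target[:-1])
search.best_params_  # the best parameter combination found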
Now we can predict new values; in particular, we can ask the classifier what the digit of our last image is, which we have not used to train the classifier:
In [18]:
clf.predict(digits.data[-1:])  # predict expects a 2D array, hence the slice
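Predicting a single held-out sample is only a spot check. A more informative sketch, not part of the original example, holds out a quarter of the data and measures accuracy with the standard scikit-learn utilities train_test_split and accuracy_score:
In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)
accuracy_score(y_test, clf.predict(X_test))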
It is possible to save a model in scikit-learn by using Python's built-in persistence module, pickle:
In [19]:
from sklearn import svm
from sklearn import datasets
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)
In [20]:
import pickle
s = pickle.dumps(clf)
clf2 = pickle.loads(s)
clf2.predict(X[:1])  # predict expects a 2D array; X[:1] keeps the first sample as one row
In [21]:
y[0]  # the ground-truth label of the first sample, for comparison
In the specific case of scikit-learn, it may be more interesting to use joblib's replacement for pickle (joblib.dump and joblib.load), which is more efficient for models that hold large NumPy arrays internally, but which can only persist to disk, not to a string.
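A minimal sketch of the same round trip with joblib (the filename is an arbitrary example):
In [ ]:
import joblib
joblib.dump(clf, 'iris_svc.joblib')    # write the fitted model to disk
clf2 = joblib.load('iris_svc.joblib')  # load it back later, possibly in another process
clf2.predict(X[:1])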