Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!
Congratulations! You have just made a big step toward becoming a machine learning practitioner. Not only are you familiar with a wide variety of fundamental machine learning algorithms, you also know how to apply them to both supervised and unsupervised learning problems.
Before we part ways, I want to give you some final words of advice, point you toward some additional resources, and give you some suggestions on how you can further improve your machine learning and data science skills.
When you see a new machine learning problem in the wild, you might be tempted to jump ahead and throw your favorite algorithm at it, perhaps the one you understood best or had the most fun implementing. But it is rarely possible to know beforehand which algorithm will perform best on a specific problem.
Instead, you need to take a step back and look at the big picture. Here the book provides an easy-to-follow outline on how to approach machine learning problems in the wild (p.331ff.).
One way to deepen your understanding is to write your own estimator. In OpenCV, you can write your own classifier in C++ by inheriting from cv::ml::StatModel. The first step is to create a file MyClass.cpp:
#include <opencv2/opencv.hpp>
#include <opencv2/ml/ml.hpp>
#include <stdio.h>

class MyClass : public cv::ml::StatModel
{
public:
    MyClass()
    {
        printf("MyClass constructor\n");
    }

    ~MyClass() {}

    int getVarCount() const
    {
        // returns the number of variables in the training samples
        return 0;
    }

    bool empty() const
    {
        // returns true if the model is empty (not yet trained)
        return true;
    }

    bool isTrained() const
    {
        // returns true if the model is trained
        return false;
    }

    bool isClassifier() const
    {
        // returns true if the model is a classifier
        return true;
    }

    bool train(const cv::Ptr<cv::ml::TrainData>& trainData, int flags=0)
    {
        // trains the model
        // trainData: training data that can be loaded from file using
        //            TrainData::loadFromCSV or created with TrainData::create
        // flags:     optional flags, depending on the model. Some of the
        //            models can be updated with new training samples, not
        //            completely overwritten (such as NormalBayesClassifier
        //            or ANN_MLP)
        return false;
    }

    bool train(cv::InputArray samples, int layout, cv::InputArray responses)
    {
        // trains the model
        // samples:   training samples
        // layout:    see cv::ml::SampleTypes
        // responses: vector of responses associated with the training samples
        return false;
    }

    float calcError(const cv::Ptr<cv::ml::TrainData>& data, bool test, cv::OutputArray resp) const
    {
        // calculates the error on the training or test set
        // data: the training data
        // test: if true, the error is computed over the test subset of the
        //       data, otherwise over the training subset
        return 0.0f;
    }

    float predict(cv::InputArray samples, cv::OutputArray results=cv::noArray(), int flags=0) const
    {
        // predicts responses for the provided samples
        // samples: the input samples, floating-point matrix
        // results: the optional matrix of results
        // flags:   the optional flags, model-dependent (see
        //          cv::ml::StatModel::Flags)
        return 0.0f;
    }
};

int main()
{
    MyClass myclass;
    return 0;
}
Then create a file CMakeLists.txt:
cmake_minimum_required(VERSION 2.8)
project(MyClass)
find_package(OpenCV REQUIRED)
add_executable(MyClass MyClass.cpp)
target_link_libraries(MyClass ${OpenCV_LIBS})
Then you can compile the file from the command line via cmake and make:
$ cmake .
$ make
Then run the executable:
$ ./MyClass
This should not generate any errors, and should print the following to the console:
MyClass constructor
Alternatively, you can write your own classifier using the scikit-learn library.
You can do this by importing BaseEstimator and ClassifierMixin. The latter will
provide a corresponding score method, which works for all classifiers. Optionally, you can
override the score method to provide your own (a short sketch of such an override follows the list below).
The following mixins are available:
- ClassifierMixin if you are writing a classifier (will provide a basic score method)
- RegressorMixin if you are writing a regressor (will provide a basic score method)
- ClusterMixin if you are writing a clustering algorithm (will provide a basic fit_predict method)
- TransformerMixin if you are writing a transformer (will provide a basic fit_transform method)
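As a minimal sketch of what overriding score might look like (the class name and the hand-rolled metric here are hypothetical, chosen purely for illustration):

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MyScoredClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical classifier that overrides the default score method"""
    def fit(self, X, y=None):
        return self

    def predict(self, X):
        # dummy prediction: always predict class 0
        return np.zeros(X.shape[0])

    def score(self, X, y):
        # replaces ClassifierMixin's default (mean accuracy) with your own
        # metric; here we simply compute plain accuracy by hand
        return np.mean(self.predict(X) == y)

The following notebook session walks through a full example step by step: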
In [1]:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
In [2]:
class MyClassifier(BaseEstimator, ClassifierMixin):
    """An example classifier"""

    def __init__(self, param1=1, param2=2):
        """Called when initializing the classifier

        The constructor is used to define some optional
        parameters of the classifier. Store them as class
        attributes for future access.

        Parameters
        ----------
        param1 : int, optional, default: 1
            The first parameter
        param2 : int, optional, default: 2
            The second parameter
        """
        self.param1 = param1
        self.param2 = param2

    def fit(self, X, y=None):
        """Fits the classifier to data

        This should fit the classifier to the training data.
        All the "work" should be done here.

        Parameters
        ----------
        X : array-like
            The training data, where the first dimension is
            the number of training samples, and the second
            dimension is the number of features.
        y : array-like, optional, default: None
            Vector of class labels

        Returns
        -------
        The fit method returns the classifier object it
        belongs to.
        """
        return self

    def predict(self, X):
        """Predicts target labels

        This should predict the target labels of some data `X`.

        Parameters
        ----------
        X : array-like
            Data samples for which to predict the target labels.

        Returns
        -------
        y_pred : array-like
            Target labels for every data sample in `X`
        """
        return np.zeros(X.shape[0])
The classifier can be instantiated as follows:
In [3]:
myclass = MyClassifier()
You can then fit the model to some arbitrary data:
In [4]:
X = np.random.rand(10, 3)
myclass.fit(X)
Out[4]:
MyClassifier(param1=1, param2=2)
You can then predict the target responses:
In [5]:
myclass.predict(X)
Out[5]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
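Because MyClassifier inherits from ClassifierMixin, it also comes with a score method for free, which computes the mean accuracy of the predicted labels. A quick sketch (the label vector y below is made up purely for illustration, chosen to match the dummy all-zeros predictions):
In [6]:
y = np.zeros(10)  # hypothetical labels matching the dummy predictions
myclass.score(X, y)
Out[6]:
1.0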
The goal of this book was to introduce you to the world of machine learning and prepare you to become a machine learning practitioner. Now that you are familiar with the fundamental algorithms, you might want to investigate some topics in more depth.
Although it is not necessary to understand all the details of all the algorithms we implemented in this book, knowing some of the theory behind them might just make you a better data scientist.
Turn to the book to find a list of suggested reading materials, books, and machine learning software!
In this book, we covered a lot of theory and practice.
We discussed a wide variety of fundamental machine learning algorithms, both supervised and unsupervised, illustrated best practices as well as ways to avoid common pitfalls, and touched upon a variety of commands and packages for data analysis, machine learning, and visualization.
If you made it this far, you have already made a big step toward machine learning mastery. From here on out, I am confident you will do just fine on your own. All that's left to say is farewell! I hope you enjoyed the ride; I certainly did.