Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!
Congratulations! You have just made a big step toward becoming a machine learning practitioner. Not only are you familiar with a wide variety of fundamental machine learning algorithms, you also know how to apply them to both supervised and unsupervised learning problems.
Before we part ways, I want to give you some final words of advice, point you toward some additional resources, and give you some suggestions on how you can further improve your machine learning and data science skills.
When you see a new machine learning problem in the wild, you might be tempted to jump ahead and throw your favorite algorithm at it, perhaps the one you understood best or had the most fun implementing. But it is rarely possible to know beforehand which algorithm will perform best on a specific problem.
Instead, you need to take a step back and look at the big picture. Here the book provides an easy-to-follow outline on how to approach machine learning problems in the wild (p.331ff.).
One way to deepen your understanding is to write your own estimator. In OpenCV, you can write your own classifier in C++ by inheriting from cv::ml::StatModel. The first step is to create a file MyClass.cpp:
#include <opencv2/opencv.hpp>
#include <opencv2/ml/ml.hpp>
#include <stdio.h>

class MyClass : public cv::ml::StatModel
{
public:
    MyClass()
    {
        printf("MyClass constructor\n");
    }

    ~MyClass() {}

    int getVarCount() const
    {
        // returns the number of variables in the training samples
        return 0;
    }

    bool empty() const
    {
        // returns true if the model is empty (not yet trained)
        return true;
    }

    bool isTrained() const
    {
        // returns true if the model is trained
        return false;
    }

    bool isClassifier() const
    {
        // returns true if the model is a classifier
        return true;
    }

    bool train(const cv::Ptr<cv::ml::TrainData>& trainData, int flags=0)
    {
        // trains the model
        // trainData: training data that can be loaded from file using
        //            TrainData::loadFromCSV or created with TrainData::create
        // flags:     optional flags, depending on the model. Some of the
        //            models can be updated with new training samples, not
        //            completely overwritten (such as NormalBayesClassifier
        //            or ANN_MLP)
        return false;
    }

    bool train(cv::InputArray samples, int layout, cv::InputArray responses)
    {
        // trains the model
        // samples:   training samples
        // layout:    see cv::ml::SampleTypes
        // responses: vector of responses associated with the training samples
        return false;
    }

    float calcError(const cv::Ptr<cv::ml::TrainData>& data, bool test, cv::OutputArray resp) const
    {
        // calculates the error on the training or test set
        // data: the training data
        // test: if true, the error is computed over the test subset of the
        //       data, otherwise over the training subset
        return 0.0f;
    }

    float predict(cv::InputArray samples, cv::OutputArray results=cv::noArray(), int flags=0) const
    {
        // predicts responses for the provided samples
        // samples: the input samples, floating-point matrix
        // results: the optional matrix of results
        // flags:   the optional flags, model-dependent (see
        //          cv::ml::StatModel::Flags)
        return 0.0f;
    }
};

int main()
{
    MyClass myclass;
    return 0;
}
Then create a file CMakeLists.txt:
cmake_minimum_required(VERSION 2.8)
project(MyClass)
find_package(OpenCV REQUIRED)
add_executable(MyClass MyClass.cpp)
target_link_libraries(MyClass ${OpenCV_LIBS})
Then you can compile the file from the command line via cmake and make:
$ cmake .
$ make
Then run the executable:
$ ./MyClass
This should not generate any errors, and should print the following to the console:
MyClass constructor
Alternatively, you can write your own classifier using the scikit-learn library.
You can do this by importing BaseEstimator and ClassifierMixin. The latter will
provide a corresponding score method, which works for all classifiers. Optionally, you can
override the score method to provide your own (a short sketch of such an override follows the list below).
The following mixins are available:
- ClassifierMixin if you are writing a classifier (will provide a basic score method)
- RegressorMixin if you are writing a regressor (will provide a basic score method)
- ClusterMixin if you are writing a clustering algorithm (will provide a basic fit_predict method)
- TransformerMixin if you are writing a transformer (will provide a basic fit_transform method)
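As a minimal sketch of what overriding score might look like (the class name and the hand-rolled metric here are hypothetical, chosen purely for illustration):

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MyScoredClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical classifier that overrides the default score method"""
    def fit(self, X, y=None):
        return self

    def predict(self, X):
        # dummy prediction: always predict class 0
        return np.zeros(X.shape[0])

    def score(self, X, y):
        # replaces ClassifierMixin's default (mean accuracy) with your own
        # metric; here we simply compute plain accuracy by hand
        return np.mean(self.predict(X) == y)

The following notebook session walks through a full example step by step: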
In [1]:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
In [2]:
class MyClassifier(BaseEstimator, ClassifierMixin):
    """An example classifier"""

    def __init__(self, param1=1, param2=2):
        """Called when initializing the classifier

        The constructor is used to define some optional
        parameters of the classifier. Store them as class
        attributes for future access.

        Parameters
        ----------
        param1 : int, optional, default: 1
            The first parameter
        param2 : int, optional, default: 2
            The second parameter
        """
        self.param1 = param1
        self.param2 = param2

    def fit(self, X, y=None):
        """Fits the classifier to data

        This should fit the classifier to the training data.
        All the "work" should be done here.

        Parameters
        ----------
        X : array-like
            The training data, where the first dimension is
            the number of training samples, and the second
            dimension is the number of features.
        y : array-like, optional, default: None
            Vector of class labels

        Returns
        -------
        The fit method returns the classifier object it
        belongs to.
        """
        return self

    def predict(self, X):
        """Predicts target labels

        This should predict the target labels of some data `X`.

        Parameters
        ----------
        X : array-like
            Data samples for which to predict the target labels.

        Returns
        -------
        y_pred : array-like
            Target labels for every data sample in `X`
        """
        return np.zeros(X.shape[0])
The classifier can be instantiated as follows:
In [3]:
myclass = MyClassifier()
You can then fit the model to some arbitrary data:
In [4]:
X = np.random.rand(10, 3)
myclass.fit(X)
Out[4]:
MyClassifier(param1=1, param2=2)
You can then predict the target responses:
In [5]:
myclass.predict(X)
Out[5]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
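Because MyClassifier inherits from ClassifierMixin, it also comes with a score method for free, which computes the mean accuracy of the predicted labels. A quick sketch (the label vector y below is made up purely for illustration, chosen to match the dummy all-zeros predictions):
In [6]:
y = np.zeros(10)  # hypothetical labels matching the dummy predictions
myclass.score(X, y)
Out[6]:
1.0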
The goal of this book was to introduce you to the world of machine learning and prepare you to become a machine learning practitioner. Now that you are familiar with the fundamental algorithms, you might want to investigate some topics in more depth.
Although it is not necessary to understand all the details of all the algorithms we implemented in this book, knowing some of the theory behind them might just make you a better data scientist.
Turn to the book to find a list of suggested reading materials, books, and machine learning software!
In this book, we covered a lot of theory and practice.
We discussed a wide variety of fundamental machine learning algorithms, both supervised and unsupervised, illustrated best practices as well as ways to avoid common pitfalls, and touched upon a variety of commands and packages for data analysis, machine learning, and visualization.
If you made it this far, you have already made a big step toward machine learning mastery. From here on out, I am confident you will do just fine on your own. All that's left to say is farewell! I hope you enjoyed the ride; I certainly did.