Multi-label classification and problem transformation

So far we have discussed binary classification, in which each instance must be assigned to one of two classes, and multi-class classification, in which each instance must be assigned to one class from a set of classes. The final type of classification problem that we will discuss is multi-label classification, in which each instance can be assigned a subset of the set of classes. Examples of multi-label classification include assigning tags to messages posted on a forum and classifying the objects present in an image. There are two groups of approaches for multi-label classification.

Problem transformation methods are techniques that cast the original multi-label problem as a set of single-label classification problems. The first problem transformation method that we will review converts each set of labels encountered in the training data to a single label. For example, consider a multi-label classification problem in which news articles must be assigned to one or more categories from a set of five. The following training data contains seven articles, each of which pertains to one or more of the categories.

Transforming the problem into a single-label classification task using the power set of labels seen in the training data results in the following training data. Previously, the first instance was labeled Local and US; now it has a single label, Local ∧ US.

The multi-label classification problem that had five classes is now a multi-class classification problem with seven classes. While the power set transformation is intuitive, increasing the number of classes is frequently impractical; the transformation can produce many new classes that correspond to only a few training instances. Furthermore, the classifier can only predict combinations of labels that were seen in the training data.
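The following is a minimal sketch of this transformation; the label sets below are hypothetical placeholders rather than the training data from the tables above.

In [ ]:
# A minimal sketch of the power set transformation. The label sets are
# hypothetical placeholders, not the article data from the example.
label_sets = [
    {'Local', 'US'},
    {'Business'},
    {'US'},
    {'Local', 'Sports'},
]

def powerset_label(labels):
    # Map a set of labels to a single combined class,
    # e.g. {'Local', 'US'} -> 'Local ∧ US'.
    return ' ∧ '.join(sorted(labels))

single_labels = [powerset_label(s) for s in label_sets]
print(single_labels)  # ['Local ∧ US', 'Business', 'US', 'Local ∧ Sports']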

A second problem transformation is to train one binary classifier for each label in the training set. Each classifier predicts whether or not an instance belongs to the corresponding label. Our example would require five binary classifiers: the first would predict whether or not an instance should be classified as Local, the second would predict whether or not an instance should be classified as US, and so on. The final prediction is the union of the predictions from all of the binary classifiers.

This problem transformation ensures that each of the single-label problems has the same number of training examples as the multi-label problem, but it ignores relationships between the labels.
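This approach is often called binary relevance. The following is a hedged sketch using scikit-learn's OneVsRestClassifier, which fits one binary classifier per column of a binary label indicator matrix; the feature vectors and label sets here are hypothetical placeholders.

In [ ]:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training data: feature vectors and their label sets.
X = np.array([[1.0, 0.2], [0.3, 0.9], [0.8, 0.8], [0.1, 0.4]])
label_sets = [{'Local', 'US'}, {'Business'}, {'US'}, {'Local', 'Sports'}]

# Convert the label sets to a binary indicator matrix, one column per label.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(label_sets)

# OneVsRestClassifier trains one binary classifier per label column; the
# final prediction is the union of the individual classifiers' predictions.
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)
print(binarizer.inverse_transform(clf.predict(X)))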

Multi-label classification performance metrics

Multi-label classification problems must be assessed using different performance measures than single-label classification problems. Two of the most common performance metrics are Hamming loss and Jaccard similarity. Hamming loss is the average fraction of incorrect labels. Note that Hamming loss is a loss function, and that the perfect score is zero. Jaccard similarity, or the Jaccard index, is the size of the intersection of the predicted labels and the true labels divided by the size of the union of the predicted and true labels. It ranges from zero to one, and one is the perfect score. Jaccard similarity is calculated by the following equation:

$$ J(Predicted, True) = \frac{|Predicted \cap True|}{|Predicted \cup True|} $$
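For example, if the predicted labels for an instance are {Local, US} and the true label is US (a hypothetical prediction, for illustration), the intersection contains one label and the union contains two:

$$ J(\{Local, US\}, \{US\}) = \frac{|\{US\}|}{|\{Local, US\}|} = \frac{1}{2} $$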

In [2]:
import numpy as np
from sklearn.metrics import hamming_loss, jaccard_similarity_score

hamming_loss


In [3]:
# hamming_loss(y_true, y_pred) returns the fraction of labels that differ.
print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[0.0, 1.0], [1.0, 1.0]])))  # no incorrect labels
print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [1.0, 1.0]])))  # one of four labels is incorrect
print(hamming_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [0.0, 1.0]])))  # two of four labels are incorrect


0.0
0.25
0.5

jaccard_similarity_score


In [ ]:
# jaccard_similarity_score averages the per-instance Jaccard similarities.
print(jaccard_similarity_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[0.0, 1.0], [1.0, 1.0]])))  # 1.0
print(jaccard_similarity_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [1.0, 1.0]])))  # 0.75
print(jaccard_similarity_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.array([[1.0, 1.0], [0.0, 1.0]])))  # 0.5
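Note that jaccard_similarity_score was deprecated in scikit-learn 0.21 and removed in 0.23. In current versions of the library, the per-sample average Jaccard similarity shown above can be computed with jaccard_score and average='samples'; a minimal sketch:

In [ ]:
import numpy as np
from sklearn.metrics import jaccard_score

y_true = np.array([[0, 1], [1, 1]])

# average='samples' computes the Jaccard index for each instance and then
# averages, matching the multi-label behaviour of jaccard_similarity_score.
print(jaccard_score(y_true, np.array([[0, 1], [1, 1]]), average='samples'))  # 1.0
print(jaccard_score(y_true, np.array([[1, 1], [1, 1]]), average='samples'))  # 0.75
print(jaccard_score(y_true, np.array([[1, 1], [0, 1]]), average='samples'))  # 0.5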
