Image classification: assigning an input image one label from a fixed set of categories. That is, given a fixed set of category labels, the task is to pick one label from that set for each input image and assign it to the image.
As we present (an inexhaustive) list of challenges below, keep in mind the raw representation of images as a 3-D array of brightness values.
The nearest neighbor classifier will take a test image, compare it to every single one of the training images, and predict the label of the closest training image.
L1 distance: $$d_{1}(I_{1},I_{2})=\sum_{p}{|I_1^p-I_2^p|}$$
L2 distance: $$d_{2}(I_{1},I_{2})=\sqrt{\sum_{p}{(I_1^p-I_2^p)^2}}$$
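For concreteness, here is a tiny worked example of both distances; the two 4-pixel "images" are made-up numbers, not data from the notes:

import numpy as np

# two hypothetical 4-pixel grayscale images
I1 = np.array([56, 32, 10, 18], dtype=np.float64)
I2 = np.array([10, 20, 24, 17], dtype=np.float64)

d1 = np.sum(np.abs(I1 - I2))          # L1: 46 + 12 + 14 + 1 = 73
d2 = np.sqrt(np.sum((I1 - I2) ** 2))  # L2: sqrt(46^2 + 12^2 + 14^2 + 1^2) ≈ 49.57
print(d1, d2)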
Sample Code:

import numpy as np

class NearestNeighbor(object):
    def __init__(self):
        pass

    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-dimensional of size N """
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict the label for """
        num_test = X.shape[0]
        # make sure the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # alternatively, the L2 distance:
            # distances = np.sqrt(np.sum(np.square(self.Xtr - X[i, :]), axis=1))
            min_index = np.argmin(distances)  # index of the smallest distance
            Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example
        return Ypred
Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/') # a magic function we provide
# flatten out all images to be one-dimensional
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3) # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3) # Xte_rows becomes 10000 x 3072
nn = NearestNeighbor() # create a Nearest Neighbor classifier class
nn.train(Xtr_rows, Ytr) # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows) # predict labels on the test images
# and now print the classification accuracy, i.e. the fraction
# of test examples whose predicted label matches the true label
print('accuracy: %f' % ( np.mean(Yte_predict == Yte)))
Comparing L1 and L2: the L2 distance is much less forgiving than L1 when it comes to differences between two vectors. That is, the L2 distance prefers many medium disagreements over one large one.
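A small numeric check of this point, using two hypothetical difference vectors with the same L1 distance:

import numpy as np

# one big disagreement vs. four medium ones; both have L1 distance 4
big    = np.array([4., 0., 0., 0.])
medium = np.array([1., 1., 1., 1.])
print(np.sum(np.abs(big)), np.sum(np.abs(medium)))          # L1: 4.0 vs 4.0
print(np.sqrt(np.sum(big**2)), np.sqrt(np.sum(medium**2)))  # L2: 4.0 vs 2.0

Under L2, the single large difference is scored as twice as far as the four medium ones, even though their L1 distances are identical.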
Instead of finding the single closest image in the training set, we will find the top k closest images and have them vote on the label of the test image. In particular, when k = 1, we recover the Nearest Neighbor classifier. Intuitively, higher values of k have a smoothing effect that makes the classifier more resistant to outliers.
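A minimal sketch of the k-nearest-neighbor prediction step (not from the notes; it assumes the same Xtr/ytr fields as the NearestNeighbor class above and uses collections.Counter for the majority vote):

import numpy as np
from collections import Counter

def predict_knn(self, X, k=5):
    """ For each test row, vote among the labels of the k closest training rows. """
    num_test = X.shape[0]
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
    for i in range(num_test):
        distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)  # L1 distance
        closest_k = self.ytr[np.argsort(distances)[:k]]         # labels of the k nearest
        Ypred[i] = Counter(closest_k).most_common(1)[0][0]      # majority vote
    return Ypred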
Do NOT use the test set for the purpose of tweaking hyperparameters. If you only use the test set once, at the end, it remains a good proxy for measuring the generalization of your classifier.
Split your training set into a (smaller) training set and a validation set. Use the validation set to tune all hyperparameters. At the end, run a single time on the test set and report performance.
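A sketch of how tuning k against a validation set could look, assuming the CIFAR-10 arrays loaded above and the hypothetical predict_knn helper from the earlier sketch (the 1,000-example split is an arbitrary choice for illustration):

# hold out 1,000 training examples for validation
Xval_rows = Xtr_rows[:1000, :]
Yval = Ytr[:1000]
Xtr_small = Xtr_rows[1000:, :]
Ytr_small = Ytr[1000:]

validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:
    nn = NearestNeighbor()
    nn.train(Xtr_small, Ytr_small)
    Yval_predict = predict_knn(nn, Xval_rows, k=k)  # hypothetical kNN predict from above
    acc = np.mean(Yval_predict == Yval)
    print('k = %d, accuracy: %f' % (k, acc))
    validation_accuracies.append((k, acc))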
Cross-validation: split the training set evenly into several folds. Hold out one fold as the validation set and train on the rest, cycling through so that each fold serves as the validation set exactly once. The average of the validation results across all folds is the final validation result for the algorithm.
In practice: cross-validation is computationally expensive, so people usually prefer a single split, keeping 50%-90% of the training data for training and the rest for validation. However, if there are many hyperparameters and the validation set would be too small, cross-validation (typically with 3, 5, or 10 folds) is the safer choice. Once the best hyperparameters are found, run the algorithm with them on the test set once and only once, and report that result as the evaluation of the algorithm.
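A minimal 5-fold cross-validation sketch for the same k search (np.array_split and the reuse of the hypothetical predict_knn helper are illustrative assumptions, not the notes' own code):

num_folds = 5
X_folds = np.array_split(Xtr_rows, num_folds)
y_folds = np.array_split(Ytr, num_folds)

for k in [1, 3, 5, 10]:
    accs = []
    for fold in range(num_folds):
        # fold `fold` is the validation set; the remaining folds form the training set
        Xval, yval = X_folds[fold], y_folds[fold]
        Xtrn = np.concatenate(X_folds[:fold] + X_folds[fold+1:])
        ytrn = np.concatenate(y_folds[:fold] + y_folds[fold+1:])
        nn = NearestNeighbor()
        nn.train(Xtrn, ytrn)
        accs.append(np.mean(predict_knn(nn, Xval, k=k) == yval))
    print('k = %d, mean accuracy over %d folds: %f' % (k, num_folds, np.mean(accs)))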