Max-Margin Learning for Embeddings

  • Goal: embed discrete objects into a metric space

    • Code
    • Words
    • Formulae
    • Symbols
    • DNA sequences
    • Logical expressions
One-hot encoding:
  • Create a dictionary of words
  • Represent each sentence as a vector in discrete space
  • A sentence is a collection of one-hot encoded vectors

    Problem: one-hot encoding doesn't preserve similarity between samples — all distinct one-hot vectors are equidistant
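The problem above can be seen directly: a minimal sketch (with a hypothetical 4-word dictionary) showing that every pair of distinct one-hot vectors is at the same Euclidean distance, so the encoding carries no similarity information.

```python
import numpy as np

# Hypothetical 4-word dictionary; each word becomes a one-hot row of the identity.
vocab = ["cat", "dog", "car", "truck"]
one_hot = np.eye(len(vocab))

# Every pair of distinct words is at distance sqrt(2):
# "cat" is no closer to "dog" than it is to "truck".
d_cat_dog = np.linalg.norm(one_hot[0] - one_hot[1])
d_cat_truck = np.linalg.norm(one_hot[0] - one_hot[3])
print(d_cat_dog, d_cat_truck)  # both ≈ 1.4142
```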

Applications

  • Transfer learning

  • Multi-task learning

  • Auxiliary tasks for feature learning

  • Cross-domain embeddings

  • Retrieval (question answering, ranking)

  • One-shot learning

    • You have a problem with very few data points
    • Training a deep network from scratch is not feasible

Recall: metric-learning

  • Euclidean distance: $\displaystyle d(x,y) = ||x-y||_2$

  • Mahalanobis distance: $\displaystyle d(x,y) = \sqrt{(x-y)^TS^{-1}(x-y)}$

  • Mahalanobis distance metric learning: $\displaystyle d(x,y)= d_A(x,y) = ||x-y||_A = \sqrt{(x-y)^TA(x-y)}$, where $A$ plays the role of $S^{-1}$ and is learned from data

    $$\min_A \sum_{(x_i,x_j)\in \text{similar}} ||x_i - x_j||_A^2$$

    • Constraints: $$\displaystyle \sum_{(x_i,x_j)\in \text{dissimilar}} ||x_i-x_j||_A^2 \ge 1 \\ A \ge 0$$

      Note that $A$ has to be positive semi-definite ($A\ge0$), because otherwise some squared distances could be negative, which is invalid for a metric.
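A minimal sketch of the learned metric above, using the convention $d_A(x,y)^2 = (x-y)^T A (x-y)$; the matrix $A$ here is a hypothetical stand-in built as $L^T L$, which guarantees positive semi-definiteness.

```python
import numpy as np

# A = L^T L is PSD by construction, so all squared distances are non-negative.
rng = np.random.default_rng(0)
L = rng.standard_normal((3, 3))
A = L.T @ L

def d_A(x, y, A):
    """Mahalanobis-style distance d_A(x, y) = sqrt((x - y)^T A (x - y))."""
    diff = x - y
    return np.sqrt(diff @ A @ diff)

x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 1.0, 1.0])
# Equivalently, d_A is the Euclidean norm after the linear map L.
print(d_A(x, y, A))
```

The factorization view ($d_A(x,y) = \|L(x-y)\|_2$) is why metric learning can be read as learning a linear embedding.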


In [2]:
from IPython.display import Image
Image(filename="figs/metricleanring-embedding.png")

Large Margin Nearest Neighbors

$$\min_{A \ge 0} \sum_{(i,j) \in \text{similar}} d_A(x_i,x_j) ~+~ \lambda \sum_{(i,j,k)\in \text{dissimilar}} \left[1 + d_A(x_i,x_j) - d_A(x_i,x_k)\right]_+$$
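The LMNN objective above can be sketched as follows for a fixed PSD matrix $A$; the helper, pair list, and triplet list are hypothetical names for illustration, not from a library.

```python
import numpy as np

def lmnn_loss(X, A, similar_pairs, triplets, lam=1.0):
    """Evaluate the LMNN objective for a fixed PSD matrix A.
    similar_pairs: list of (i, j) with x_j similar to x_i;
    triplets: list of (i, j, k) with x_j similar and x_k dissimilar to x_i."""
    def d(i, j):
        diff = X[i] - X[j]
        return np.sqrt(diff @ A @ diff)

    # Pull term: shrink distances between similar pairs.
    pull = sum(d(i, j) for i, j in similar_pairs)
    # Push term: hinge penalizes impostors x_k that are not at least
    # margin 1 farther from x_i than the target neighbor x_j.
    push = sum(max(0.0, 1.0 + d(i, j) - d(i, k)) for i, j, k in triplets)
    return pull + lam * push

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0]])
A = np.eye(2)
loss = lmnn_loss(X, A, similar_pairs=[(0, 1)], triplets=[(0, 1, 2)])
print(loss)  # pull = 1, hinge inactive (1 + 1 - 5 < 0), so loss = 1.0
```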

Siamese Network

We use two networks with exactly the same weights; in effect, two instances of one network. (The name comes from Siamese twins.)

Denote the shared embedding network by $G_W$.

  • What margin value $m$ to use?
    • $m$ is usually set to $1$, and that is usually a safe choice. $m$ determines the order of magnitude of the output, so sometimes you may need to change it to other values for stability, e.g. a high value like $100$
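A minimal sketch of the contrastive loss with margin $m$ for a Siamese pair; the embeddings below are hypothetical stand-ins for the outputs $G_W(x_1)$, $G_W(x_2)$ of the shared network.

```python
import numpy as np

def contrastive_loss(z1, z2, y, m=1.0):
    """Contrastive loss for a Siamese pair.
    z1, z2: embeddings G_W(x1), G_W(x2) from the shared network;
    y = 1 for a similar pair, 0 for a dissimilar pair.
    Similar pairs are pulled together; dissimilar pairs are pushed
    apart until they are at least margin m away."""
    d = np.linalg.norm(z1 - z2)
    return y * d ** 2 + (1 - y) * max(0.0, m - d) ** 2

# With margin m = 1, a dissimilar pair already at distance >= 1
# contributes zero loss; a similar pair at the same distance is penalized.
z_a, z_b = np.array([0.0, 0.0]), np.array([2.0, 0.0])
print(contrastive_loss(z_a, z_b, y=0))  # 0.0
print(contrastive_loss(z_a, z_b, y=1))  # 4.0
```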

Application: Face Recognition

Two types of tasks:

  • Classification: class label, who this person is?

  • Verification: we have two images, and we want to verify whether they show the same person.

    Verification is a much easier problem.

    Verification can be done using a Siamese network.

Face verification

Face Classification and Verification

Three different losses:

  • Classification loss: cross-entropy loss

  • Verification loss (contrastive)

  • Verification loss (cosine)

Triplet Network

Instead of sending a pair of input samples, we send a triplet $(x^-, x, x^+)$

  • $x^-$ is dissimilar to $x$

  • $x^+$ is similar to $x$

Loss function
  • Hinge loss: $$L(x,x^+, x^-) = \left[\|Net(x) - Net(x^+)\|_2^2 ~-~ \|Net(x) - Net(x^-)\|_2^2 ~+~ \alpha \right]_+$$
  • One issue is that there are many more negative samples than positive ones (dissimilar pairs vastly outnumber similar pairs). To fix this, you can start with a set of dissimilar samples and, repeatedly after a fixed number of iterations, re-sample a new dissimilar set by searching among the closest dissimilar points (hard negative mining).
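The hinge triplet loss and the hard-negative selection step above can be sketched as follows; the embeddings and helper names are hypothetical stand-ins for $Net(\cdot)$ outputs.

```python
import numpy as np

def triplet_loss(z, z_pos, z_neg, alpha=1.0):
    """Hinge triplet loss on embeddings Net(x), Net(x+), Net(x-)."""
    d_pos = np.sum((z - z_pos) ** 2)
    d_neg = np.sum((z - z_neg) ** 2)
    return max(0.0, d_pos - d_neg + alpha)

def hardest_negative(z, Z_neg):
    """Hard-negative mining sketch: among candidate dissimilar embeddings
    Z_neg, pick the one closest to the anchor z, since distant negatives
    already give zero loss and contribute no gradient."""
    dists = np.linalg.norm(Z_neg - z, axis=1)
    return Z_neg[np.argmin(dists)]

z = np.array([0.0, 0.0])
z_pos = np.array([1.0, 0.0])
easy = triplet_loss(z, z_pos, np.array([3.0, 0.0]))  # negative far away -> 0.0
hard = triplet_loss(z, z_pos, np.array([1.0, 0.0]))  # negative as close as positive -> 1.0
print(easy, hard)
```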

Application: Cross-Domain Image Matching

