Max-Margin Learning for Embeddings

  • Goal: embed discrete objects into a metric space

    • Code
    • Words
    • Formulae
    • Symbols
    • DNA sequences
    • Logical expressions
One-hot encoding:
  • Create a dictionary of words
  • Represent each sentence as a vector in discrete space
  • A sentence is a collection of one-hot encoded vectors

    Problem: one-hot encoding doesn't preserve similarity between samples — all distinct one-hot vectors are equidistant
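The problem above can be seen directly: a minimal sketch (with a hypothetical 4-word dictionary) showing that every pair of distinct one-hot vectors is at the same Euclidean distance, so the encoding carries no similarity information.

```python
import numpy as np

# Hypothetical 4-word dictionary; each word becomes a one-hot row of the identity.
vocab = ["cat", "dog", "car", "truck"]
one_hot = np.eye(len(vocab))

# Every pair of distinct words is at distance sqrt(2):
# "cat" is no closer to "dog" than it is to "truck".
d_cat_dog = np.linalg.norm(one_hot[0] - one_hot[1])
d_cat_truck = np.linalg.norm(one_hot[0] - one_hot[3])
print(d_cat_dog, d_cat_truck)  # both ≈ 1.4142
```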

Applications

  • Transfer learning

  • Multi-task learning

  • Auxiliary tasks for feature learning

  • Cross-domain embeddings

  • Retrieval (question answering, ranking)

  • One-shot learning

    • You have a problem with very few data points
    • Training a deep network from scratch is not feasible

Recall: metric-learning

  • Euclidean distance: $\displaystyle d(x,y) = ||x-y||_2$

  • Mahalanobis distance: $\displaystyle d(x,y) = \sqrt{(x-y)^TS^{-1}(x-y)}$

  • Mahalanobis distance metric learning: $\displaystyle d(x,y)= d_A(x,y) = ||x-y||_A = \sqrt{(x-y)^TA(x-y)}$, where $A$ plays the role of $S^{-1}$ and is learned from data

    $$\min_A \sum_{(x_i,x_j)\in \text{similar}} ||x_i - x_j||_A^2$$

    • Constraints: $$\displaystyle \sum_{(x_i,x_j)\in \text{dissimilar}} ||x_i-x_j||_A^2 \ge 1 \\ A \ge 0$$

      Note that $A$ has to be positive semi-definite ($A\ge0$), because otherwise some squared distances could be negative, which is invalid for a metric.
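A minimal sketch of the learned metric above, using the convention $d_A(x,y)^2 = (x-y)^T A (x-y)$; the matrix $A$ here is a hypothetical stand-in built as $L^T L$, which guarantees positive semi-definiteness.

```python
import numpy as np

# A = L^T L is PSD by construction, so all squared distances are non-negative.
rng = np.random.default_rng(0)
L = rng.standard_normal((3, 3))
A = L.T @ L

def d_A(x, y, A):
    """Mahalanobis-style distance d_A(x, y) = sqrt((x - y)^T A (x - y))."""
    diff = x - y
    return np.sqrt(diff @ A @ diff)

x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 1.0, 1.0])
# Equivalently, d_A is the Euclidean norm after the linear map L.
print(d_A(x, y, A))
```

The factorization view ($d_A(x,y) = \|L(x-y)\|_2$) is why metric learning can be read as learning a linear embedding.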


In [2]:
from IPython.display import Image
Image(filename="figs/metricleanring-embedding.png")

Large Margin Nearest Neighbors

$$\min_{A \ge 0} \sum_{(i,j) \in \text{similar}} d_A(x_i,x_j) ~+~ \lambda \sum_{(i,j,k)\in \text{dissimilar}} \left[1 + d_A(x_i,x_j) - d_A(x_i,x_k)\right]_+$$
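The LMNN objective above can be sketched as follows for a fixed PSD matrix $A$; the helper, pair list, and triplet list are hypothetical names for illustration, not from a library.

```python
import numpy as np

def lmnn_loss(X, A, similar_pairs, triplets, lam=1.0):
    """Evaluate the LMNN objective for a fixed PSD matrix A.
    similar_pairs: list of (i, j) with x_j similar to x_i;
    triplets: list of (i, j, k) with x_j similar and x_k dissimilar to x_i."""
    def d(i, j):
        diff = X[i] - X[j]
        return np.sqrt(diff @ A @ diff)

    # Pull term: shrink distances between similar pairs.
    pull = sum(d(i, j) for i, j in similar_pairs)
    # Push term: hinge penalizes impostors x_k that are not at least
    # margin 1 farther from x_i than the target neighbor x_j.
    push = sum(max(0.0, 1.0 + d(i, j) - d(i, k)) for i, j, k in triplets)
    return pull + lam * push

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0]])
A = np.eye(2)
loss = lmnn_loss(X, A, similar_pairs=[(0, 1)], triplets=[(0, 1, 2)])
print(loss)  # pull = 1, hinge inactive (1 + 1 - 5 < 0), so loss = 1.0
```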

Siamese Network

We use two networks with exactly the same weights; in effect, two instances of one network. (The name comes from Siamese twins.)

Denote the shared embedding network by $G_W$.

  • What margin value $m$ to use?
    • $m$ is usually set to $1$, and that is usually a safe choice. $m$ determines the order of magnitude of the output, so sometimes you may need to change it to other values for stability, e.g. a high value like $100$
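A minimal sketch of the contrastive loss with margin $m$ for a Siamese pair; the embeddings below are hypothetical stand-ins for the outputs $G_W(x_1)$, $G_W(x_2)$ of the shared network.

```python
import numpy as np

def contrastive_loss(z1, z2, y, m=1.0):
    """Contrastive loss for a Siamese pair.
    z1, z2: embeddings G_W(x1), G_W(x2) from the shared network;
    y = 1 for a similar pair, 0 for a dissimilar pair.
    Similar pairs are pulled together; dissimilar pairs are pushed
    apart until they are at least margin m away."""
    d = np.linalg.norm(z1 - z2)
    return y * d ** 2 + (1 - y) * max(0.0, m - d) ** 2

# With margin m = 1, a dissimilar pair already at distance >= 1
# contributes zero loss; a similar pair at the same distance is penalized.
z_a, z_b = np.array([0.0, 0.0]), np.array([2.0, 0.0])
print(contrastive_loss(z_a, z_b, y=0))  # 0.0
print(contrastive_loss(z_a, z_b, y=1))  # 4.0
```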

Application: Face Recognition

Two types of tasks:

  • Classification: class label, who this person is?

  • Verification: we have two images, and we want to verify whether they show the same person.

    Verification is a much easier problem.

    Verification can be done using a Siamese network.

Face verification

Face Classification and Verification

Three different losses:

  • Classification loss: cross-entropy loss

  • Verification loss (contrastive)

  • Verification loss (cosine)

Triplet Network

Instead of sending a pair of input samples, we send a triplet $(x^-, x, x^+)$

  • $x^-$ is dissimilar to $x$

  • $x^+$ is similar to $x$

Loss function
  • Hinge loss: $$L(x,x^+, x^-) = \left[\|Net(x) - Net(x^+)\|_2^2 ~-~ \|Net(x) - Net(x^-)\|_2^2 ~+~ \alpha \right]_+$$
  • One issue is that there are many more negative samples than positive ones (dissimilar pairs vastly outnumber similar pairs). To fix this, you can start with a set of dissimilar samples and, repeatedly after a fixed number of iterations, re-sample a new dissimilar set by searching among the closest dissimilar points (hard negative mining).
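The hinge triplet loss and the hard-negative selection step above can be sketched as follows; the embeddings and helper names are hypothetical stand-ins for $Net(\cdot)$ outputs.

```python
import numpy as np

def triplet_loss(z, z_pos, z_neg, alpha=1.0):
    """Hinge triplet loss on embeddings Net(x), Net(x+), Net(x-)."""
    d_pos = np.sum((z - z_pos) ** 2)
    d_neg = np.sum((z - z_neg) ** 2)
    return max(0.0, d_pos - d_neg + alpha)

def hardest_negative(z, Z_neg):
    """Hard-negative mining sketch: among candidate dissimilar embeddings
    Z_neg, pick the one closest to the anchor z, since distant negatives
    already give zero loss and contribute no gradient."""
    dists = np.linalg.norm(Z_neg - z, axis=1)
    return Z_neg[np.argmin(dists)]

z = np.array([0.0, 0.0])
z_pos = np.array([1.0, 0.0])
easy = triplet_loss(z, z_pos, np.array([3.0, 0.0]))  # negative far away -> 0.0
hard = triplet_loss(z, z_pos, np.array([1.0, 0.0]))  # negative as close as positive -> 1.0
print(easy, hard)
```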

Application: Cross-Domain Image Matching

