In this tutorial article, we want to learn how to combine landmark points together in order to create new faces. This process is used in developing Active Shape Model (ASM), which is a technique for detecting landmarks (key facial points) on a face image. This is an example of landmark points on a face image:
In ASM, first you start training a model using a set of pre-annotated faces, where a fixed number of landmarks are given, and the goal is to automatically identify landmarks on a new face image.
The goal of this tutorial is to walk you through the first step of ASM training. We have a dataset of images and their landmark points. We will 1) align all the landmarks, 2) compute the mean face 3) create a model for generating new faces using eigen faces.
For this tutorial, we use MUCT database: www.milbo.org/muct which contains 3755 faces with manually landmarked 76 points for each face.
In [1]:
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
import pandas
After downloading the dataset, unzip the landmarks, and use a file named muct76.csv
.
In [2]:
df = pandas.read_csv('muct76.csv', header=0, usecols=np.arange(2,154), dtype=float)
df.head()
Out[2]:
The df contains both $x$ and $y$ coordinates of landmark points. There are 3755 images in the dataset, but the reverse (mirrored) images are also considered here, so total we have 7510 rows. Next, let us split the $x$, $y$ coordinates into separate numpy arrays for convinience.
In [3]:
X = df.iloc[:, ::2].values
Y = df.iloc[:, 1::2].values
print(X.shape, Y.shape)
Now, let us visualize it to get more familar with the data. We can plot the landmarks for one image using matplotlib. Landmarks identify eyebrows, eyes, face boundary, nose, upper and lower lips. These facial points are exteremly useful for a lot of computer vision applications.
In [4]:
plt.plot(X[0,:], Y[0,:])
plt.show()
In order to compute the mean-face, we need to align all faces together. Typically for alignment we need to do three forms of transformation: linear translation, scaling and rotation. Assuming that all the images have straight pose, we can skip the rotation. The general formula for transformation is the following
$$\mathrm T\displaystyle \left(\begin{array}{c}x\\y\end{array}\right) = s \left[ \begin{array}{cc} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{array}\right] \left( \begin{array}{c}x \\ y \end{array}\right) + \left(\begin{array}{c} x_{translate} \\ y_{translate} \end{array} \right) $$Where, $s$ is the scaling factor, $\theta$ is the rotation angle, and $x_{translate}$ and $y_{translate}$ are the translations in $x, y$ coordinates.
We want to do similarity transfomration which includes linear translation, rotation and scaling to align a shape $f$ to $f'$. A given point $(x,y)$ will be mapped to $T(x,y) = (x', y')$ as follows
$$T\left(\begin{array}{c} x \\ y\end{array}\right) = \left(\begin{array}{cc} a & -b \\ b &a \end{array}\right) \left(\begin{array}{c} x \\ y\end{array}\right) + \left(\begin{array}{c} t_x \\ t_y\end{array}\right)$$. There are 4 parameters: $a, b, t_x, t_y$, which can be found as follows:
(Assuming that $f$ is already centered at origin)
$$t_x = \frac{1}{n} \sum x'_i\ \ \ \ \ \ \ \ \ t_y = \frac{1}{n} \sum y'_i$$$$a = \frac{S_{xx'} + S_{yy'}}{S_{xx} + S_{yy}} = \frac{f.f'}{|f|^2}$$$$b = \frac{S_{xy'} - S_{yx'}}{S_{xx} + S_{yy}} = \frac{S_{xy'} - S_{yx'}}{|f|^2}$$where these summation notations are used for brevity:
$$S_{xx} = \frac{1}{n} \sum x_i^2 \ \ \ \ \ \ \ \ \ \ S_{yy} = \frac{1}{n} \sum y_i^2$$$$S_{xx'} = \frac{1}{n} \sum x_i x'_i \ \ \ \ \ \ \ \ \ \ S_{yy'} = \frac{1}{n} \sum y_i y'_i$$$$S_{xy'} = \frac{1}{n} \sum x_i y'_i \ \ \ \ \ \ \ \ \ \ S_{yx'} = \frac{1}{n} \sum y_i x'_i$$For mean centering, first, we need to find the center of each face, i.e. the mean of all $x$ and $y$ coordinates. Then, we can apply a linear translation so that the center of each face is located at coordinate $(0, 0)$. This way, we can overlap all the images.
In [5]:
xmeans = np.mean(X, axis=1)
ymeans = np.mean(Y, axis=1)
xmeans.shape
Out[5]:
In [6]:
## mean-centering each image
X = (X.T - xmeans).T
Y = (Y.T - ymeans).T
In [11]:
def simTransform(pref, pcmp, showerror = False):
err_before = np.mean(np.sum(np.square(pref - pcmp), axis=1))
ref_mean = np.mean(pref, axis=0)
prefcentered = np.asmatrix(pref) - np.asmatrix(ref_mean)
cmp_mean = np.mean(pcmp, axis=0)
pcmpcentered = np.asmatrix(pcmp) - np.asmatrix(cmp_mean)
Sxx = np.sum(np.square(pcmpcentered[:,0]))
Syy = np.sum(np.square(pcmpcentered[:,1]))
Sxxr = prefcentered[:,0].T * pcmpcentered[:,0] #(ref_x, x)
Syyr = prefcentered[:,1].T * pcmpcentered[:,1] #(ref_y, y)
Sxyr = prefcentered[:,1].T * pcmpcentered[:,0] #(ref_y, x)
Syxr = prefcentered[:,0].T * pcmpcentered[:,1] #(ref_x, y)
a = (Sxxr + Syyr)/(Sxx + Syy) #(Sxxr + Syyr) / (Sxx + Syy)
b = (Sxyr - Syxr) / (Sxx + Syy)
a = np.asscalar(a)
b = np.asscalar(b)
Rot = np.matrix([[a, -b],[b, a]])
translation = -Rot * np.asmatrix(cmp_mean).T + np.asmatrix(ref_mean).T
outx, outy = [], []
res = Rot * np.asmatrix(pcmp).T + translation
if showerror:
err_after = np.mean(np.sum(np.square(pref - res.T), axis=1))
print("Error before: %.4f after: %.4f\n"%(err_before, err_after))
return res.T
## Testing
r = simTransform(np.vstack((X[0,:], Y[0,:])).T, np.vstack((X[1,:], Y[1,:])).T, True)
rx, ry = r[:,0], r[:,1]
plt.figure(figsize=(10,6))
plt.subplot(1, 2, 1)
plt.plot(X[0,:], Y[0,:], 'b')
plt.plot(X[1,:], Y[1,:], 'g')
plt.subplot(1, 2, 2)
plt.plot(X[0,:], Y[0,:], 'b')
plt.plot(rx, ry, 'g')
plt.show()
In [17]:
def calDiff(ref, cmp):
return np.mean(np.sum(np.square(ref - cmp), axis=1))
In [39]:
def scaleShapes(fx, fy, desiredIPD):
"""Interpupils distance: landmarks 31 and 36"""
interpupils_dist = np.abs(fx[31] - fx[36])
fxscaled = fx * desiredIPD / interpupils_dist
fyscaled = fy * desiredIPD / interpupils_dist
return (fxscaled, fyscaled)
ref_ipd = np.abs(fref[31,0] - fref[36,0])
fref = np.vstack((X[0,:], Y[0,:])).T
fcmp = np.vstack((X[1,:], Y[1,:])).T
print("Original: %.4f"%(calDiff(fref, fcmp)))
rcmp = np.asarray(simTransform(fref, fcmp, True))
print("Transform: %.4f"%(calDiff(fref, rcmp)))
fref_scaled = np.vstack(scaleShapes(fref[:,0], fref[:,1], ref_ipd)).T
fcmp_scaled = np.vstack(scaleShapes(fcmp[:,0], fcmp[:,1], ref_ipd)).T
print("Just Scaling: %.4f"%(calDiff(fref_scaled, fcmp_scaled)))
rcmp_scaled = np.vstack(scaleShapes(rcmp[:,0], rcmp[:,1], ref_ipd)).T
print("Both %.4f"%(calDiff(fref, rcmp_scaled)))
In [48]:
fref = np.vstack((X[0,:], Y[0,:])).T
QX = np.empty(shape = X.shape)
QY = np.empty(shape = Y.shape)
QX[0,:] = X[0,:].copy()
QY[0,:] = Y[0,:].copy()
toterr_before, toterr_after = 0.0, 0.0
for i, (fx, fy) in enumerate(zip(X[1:,:], Y[1:,:])):
cmp = np.vstack((fx, fy)).T
toterr_before += calDiff(fref, cmp)
q = np.asarray(simTransform(fref, cmp))
QX[i,:] = np.asarray(q[:,0])
QY[i,:] = np.asarray(q[:,1])
toterr_after += calDiff(fref, q)
print(toterr_before/X.shape[0], toterr_after/X.shape[0])
Given two sets of points $A$ and $B$, align points $A$ onto $B$.
1. Find the covariance matrix:
$$H = \sum_i (P_A^i - centroid_A).(P_B^i - centroid_B)$$2. Apply singular value decomposition (SVD)
$$\left[U, S, V\right] = SVD(H)$$3. Rotation matrix
$$R = V U^T$$4: Translation
$$T = -R\times centroid_A + centroid_B$$Finally:
$$P'_A = R P_A + T$$
In [49]:
plt.figure(figsize=(10,10))
plt.subplot(2, 2, 1)
plt.plot(X[0,:], Y[0,:], 'bo')
plt.plot(X[0,:], Y[0,:], 'b')
plt.plot(X[1,:], Y[1,:], 'g')
plt.plot(X[1,:], Y[1,:], 'g^')
plt.plot(QX[1,:], QY[1,:], 'r')
plt.subplot(2, 2, 2)
plt.plot(X[0,:], Y[0,:], 'bo')
plt.plot(X[0,:], Y[0,:], 'b')
plt.plot(X[100,:], Y[100,:], 'g')
plt.plot(X[100,:], Y[100,:], 'g^')
plt.plot(QX[100,:], QY[100,:], 'r')
plt.subplot(2, 2, 3)
plt.plot(X[0,:], Y[0,:], 'bo')
plt.plot(X[0,:], Y[0,:], 'b')
plt.plot(X[200,:], Y[200,:], 'g')
plt.plot(X[200,:], Y[200,:], 'g^')
plt.plot(QX[200,:], QY[200,:], 'r')
plt.subplot(2, 2, 4)
plt.plot(X[0,:], Y[0,:], 'bo')
plt.plot(X[0,:], Y[0,:], 'b')
plt.plot(X[300,:], Y[300,:], 'g')
plt.plot(X[300,:], Y[300,:], 'g^')
plt.plot(QX[300,:], QY[300,:], 'r')
plt.show()
In [53]:
meanface_x = np.mean(QX, axis=0)
meanface_y = np.mean(QY, axis=0)
plt.figure(figsize=(8,8))
plt.plot(meanface_x, meanface_y)
plt.show()
We want to develop a model that by chaning its parameters, we can generate new faces. But, we can arbitralily apply some distortions to faces.
In order to generate new faces, we can compute some vectors that have the highest variations among our current faces. Then, we can apply change the mean face along those faces which gives us a new face. This can be done by computing the eigen vevtors of the covariance matrix of these faces, also we call it "Eigen Faces".
In [54]:
D = np.concatenate((QX, QY), axis=1)
D.shape
Out[54]:
In [55]:
cov_mat = np.cov(D.T)
cov_mat.shape
Out[55]:
In [56]:
eig_values, eig_vectors = np.linalg.eig(cov_mat)
print(eig_values.shape, eig_vectors.shape)
In [57]:
plt.plot(eig_values)
Out[57]:
In [58]:
num_eigs = 20
Phi_matrix = eig_vectors[:,:num_eigs]
Phi_matrix.shape
Out[58]:
Finally, everything is ready to generate new faces. Applying a distortion along those eigen faces generates a new face according to the following formula:
$$\hat{r} = \bar{r} + \phi b$$where $\hat{r}$ is the new face, $\bar{r}$ is the mean face, $\phi$ is the matrix of eigen values, and $b$ is the distortion parameter along each eigen vector.
In [59]:
def construct_newface(meanface, Phi_matrix, b):
face = meanface + np.dot(Phi_matrix, b )
return (face[:76], face[76:])
For clarity, we generate distorion vectors of all zero values except one non-zero element. That non-zero element is a multiplication of the corresponding eigen-value times a small factor.
In [61]:
meanface = np.concatenate((meanface_x, meanface_y))
plt.figure(figsize=(10,10))
for i in range(4):
plt.subplot(2, 2, i+1)
b = np.zeros(shape=num_eigs)
for j in (-0.025, -0.02, -0.015, -0.01, 0, 0.01, 0.015, 0.02, 0.025):
b[i] = j*eig_values[i]
xnew, ynew = construct_newface(meanface, Phi_matrix, b=b)
plt.plot(xnew, ynew)
plt.show()
In [62]:
meanface.dump('models/meanshape.pkl')
eig_vectors.dump('models/eigenvectors.pkl')
eig_values.dump('models/eigenvalues.pkl')
In this article, we used a dataset of dace landmarks and developed a model for generating new faces. First, we applied scaling and linear translation to align the faces together. Then, we computed the mean face ($\bar{r}$) and the matrix eigen vectors ($\phi$). Then, using ditortions along the first four eigen vector, we generated new faces.
In the new faces, distortions along the first eigen vector resulted in mostly changing the width of faces, while the second eigen vector changes the face height and the shape of nose.
In [ ]: