Find Me

Michael duPont - CodeCamp 2017


Find Faces

The first thing we need to do is pick out faces from a larger image. Because this model isn't user- or case-specific, we can load an existing Haar cascade with OpenCV and tune its hyperparameters rather than build one from scratch, which we will have to do later for the recognition model.


In [ ]:
import cv2
import numpy as np

CASCADE = cv2.CascadeClassifier('findme/haar_cc_front_face.xml')

def find_faces(img: np.ndarray, sf: float=1.16, mn: int=5) -> np.ndarray:
    """Returns a list of bounding boxes for every face found in an image"""
    return CASCADE.detectMultiScale(
        cv2.cvtColor(img, cv2.COLOR_RGB2GRAY),
        scaleFactor=sf,
        minNeighbors=mn,
        minSize=(45, 45),
        flags=cv2.CASCADE_SCALE_IMAGE
    )

That's really all we need. Now let's test it by drawing rectangles around the faces it finds in a few group photos. Here's one example:


In [ ]:
import matplotlib.pyplot as plt
from matplotlib.image import imread, imsave
%matplotlib inline

plt.imshow(imread('test_imgs/initial/group0.jpg'))

In [ ]:
from glob import glob

def draw_boxes(bboxes: np.ndarray, img: np.ndarray, line_width: int=2) -> np.ndarray:
    """Returns an image array with the bounding boxes drawn around potential faces"""
    for x, y, w, h in bboxes:
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), line_width)
    return img

#Find faces for each test image
for fname in glob('test_imgs/initial/group*.jpg'):
    img = imread(fname)
    bboxes = find_faces(img)
    print(bboxes)
    imsave(fname.replace('initial', 'find_faces'), draw_boxes(bboxes, img))

plt.imshow(imread('test_imgs/find_faces/group0.jpg'))

After tuning the hyperparameters, we're getting good face identification over our test images.
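How did we land on those values? Here's a quick sketch of the tuning loop (not part of the original talk; TRUE_COUNTS is a hypothetical mapping of test images to hand-counted face totals):

#Hypothetical ground truth: faces counted by hand in each test image
TRUE_COUNTS = {'test_imgs/initial/group0.jpg': 5}

for sf in (1.05, 1.1, 1.16, 1.2):
    for mn in (3, 5, 7):
        for fname, expected in TRUE_COUNTS.items():
            found = len(find_faces(imread(fname), sf=sf, mn=mn))
            print('sf={} mn={}: found {} of {} in {}'.format(sf, mn, found, expected, fname))

Whichever (sf, mn) pair recovers the hand counts without extra boxes becomes the default.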

Build Dataset

Base Corpus

Now let's use this to build a base corpus of "these faces are not mine" so we can augment it later with the face we want to target.


In [ ]:
#Creates cropped faces for imgs matching 'test_imgs/initial/group*.jpg'

def crop(img: np.ndarray, x: int, y: int, width: int, height: int) -> np.ndarray:
    """Returns an image cropped to a given bounding box of top-left coords, width, and height"""
    return img[y:y+height, x:x+width]

def pull_faces(glob_in: str, path_out: str) -> int:
    """Pulls faces out of images found in glob_in and saves them as path_out
    Returns the total number of faces found
    """
    i = 0
    for fname in glob(glob_in):
        print(fname)
        img = imread(fname)
        bboxes = find_faces(img)
        for bbox in bboxes:
            cropped = crop(img, *bbox)
            imsave(path_out.format(i), cropped)
            i += 1
    return i

found = pull_faces('test_imgs/initial/group*.jpg', 'test_imgs/corpus/face{}.jpg')

print('Total number of base corpus faces found:', found)
plt.imshow(imread('test_imgs/corpus/face0.jpg'))

Now that we have some faces to work with, let's save them to a pickle file for use later on.


In [ ]:
from pickle import dump

#Creates base_corpus.pkl from face imgs in test_imgs/corpus
imgs = [imread(fname) for fname in glob('test_imgs/corpus/face*.jpg')]
dump(imgs, open('findme/base_corpus.pkl', 'wb'))

Target Corpus

Now we need to add our target data. Since this is going to power a personal project, I'm going to train it to recognize my face. Other than adding some new images, we can reuse the code from before, just supplying a different glob string.


In [ ]:
found = pull_faces('test_imgs/initial/me*.jpg', 'test_imgs/corpus/me{}.jpg')

print('Total number of target faces found:', found)
plt.imshow(imread('test_imgs/corpus/me0.jpg'))

That was easy enough. In order to have a large enough corpus of target faces, I included pictures of myself with other people and deleted their faces after the code block ran. The corpus ended up with twelve target faces.

Model Training Data

Now that we have our faces, we need to create the features and labels that will be used to train our facial recognition model. We've already classified our data based on the face's filename; all we need to do is assign a 1 or 0 to each group for our labels. We'll also need to scale each image to a standard size. Thankfully the output for each bounding box is a square, so we don't have to worry about introducing distortions.


In [ ]:
#Load the two sets of images
from pickle import load

notme = load(open('findme/base_corpus.pkl', 'rb'))
me = [imread(fname) for fname in glob('test_imgs/corpus/me*.jpg')]

#Create features and labels
features = notme + me
labels = [0] * len(notme) + [1] * len(me)

#Preprocess images for the model
def preprocess(img: np.ndarray) -> np.ndarray:
    """Resizes a given image and remove alpha channel"""
    img = cv2.resize(img, (45, 45), interpolation=cv2.INTER_AREA)[:,:,:3]
    return img

features = [preprocess(face) for face in features]

Simple enough. Let's do a quick check before shuffling. The first image should be part of the base corpus:


In [ ]:
print('Is the target:', labels[0] == 1)
plt.imshow(features[0], cmap='gray')

And the last image should be of the target:


In [ ]:
print('Is the target:', labels[-1] == 1)
plt.imshow(features[-1], cmap='gray')

Looks good. Let's create a quick data and file checkpoint. This means we'll be able to load the file in from this point on without having to run most of the above code.


In [ ]:
#Convert into numpy arrays
features = np.array(features)
labels = np.array(labels)

dump(features, open('findme/features.pkl', 'wb'))
dump(labels, open('findme/labels.pkl', 'wb'))

DATA/FILE CHECKPOINT

The notebook can be run from scratch from this point onward.


In [ ]:
# DATA/FILE CHECKPOINT
from pickle import load
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import imread, imsave
%matplotlib inline
from findme.imageutil import crop, draw_boxes, preprocess
from findme.models import find_faces

features = load(open('findme/features.pkl', 'rb'))
labels = load(open('findme/labels.pkl', 'rb'))

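#Keep a balanced subset: the last 24 faces are the twelve targets plus an equal number of non-targets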
features = features[-24:]
labels = labels[-24:]

That's it for our data. You'll notice that we only loaded a subset of the dataset. This keeps the number of target and non-target images equal, which leads to a better model even though it trains on less data overall. We'll shuffle the data when we train the model in the next section.
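If those hard-coded indexes ever drift, the same balance can be struck more generally. A sketch, assuming labels is still the integer 0/1 array saved before the checkpoint:

#Downsample the non-target class to match the target count
target_idx = np.where(labels == 1)[0]
other_idx = np.where(labels == 0)[0][-len(target_idx):]
keep = np.concatenate([other_idx, target_idx])
features = features[keep]
labels = labels[keep]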

Am I in This?

We've already created all of our data. Now for the model we're going to train. First, we need to convert our labels to one-hot encoding for use in the model. This means our output layer will have two nodes: True and False.


In [ ]:
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
labels = enc.fit_transform(labels.reshape(-1, 1)).toarray()
print('Not target label:', labels[0])
print('Is target label:', labels[-1])

Now we need to define our model architecture one layer at a time. We'll create three convolutional layers, two fully-connected layers, and the output layer.


In [ ]:
from keras.layers import Activation, Convolution2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.metrics import binary_accuracy
from keras.models import Sequential

SHAPE = features[0].shape
NB_FILTER = 16

def make_model() -> Sequential:
    """Create a Sequential Keras model to boolean classify faces"""
    model = Sequential()
    #First Convolution
    model.add(Convolution2D(NB_FILTER, (3, 3), input_shape=SHAPE))
    model.add(Activation('relu'))
    model.add(MaxPooling2D())
    model.add(Dropout(0.1))
    # Second Convolution
    model.add(Convolution2D(NB_FILTER*2, (2, 2)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D())
    model.add(Dropout(0.2))
    # Third Convolution
    model.add(Convolution2D(NB_FILTER*4, (2, 2)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D())
    model.add(Dropout(0.3))
    # Flatten for Fully Connected
    model.add(Flatten())
    # First Fully Connected
    model.add(Dense(1024))
    model.add(Activation('relu'))
    model.add(Dropout(0.4))
    # Second Fully Connected
    model.add(Dense(1024))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    # Output
    model.add(Dense(2))
    model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=[binary_accuracy])
    return model

print(make_model().summary())

Now we need to train the model. Even though we have a large model in terms of its parameters, we can still let the model train for many epochs because our feature set is so small. On a MacBook Air, it takes around 30 seconds to train the model with 500 epochs. To save space, I've disabled the full training printout that Keras provides, but you can watch the accuracy progress yourself by changing verbose from 0 to 1.

We also need to shuffle our data because feeding all of the non-target faces and then all of the target faces into the model in order would lead to a biased model. Scikit-Learn has a convenient function for this: rather than shuffling each array independently, it preserves the pairing between feature and label indexes.


In [ ]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.utils import shuffle

model = KerasClassifier(build_fn=make_model, epochs=500, batch_size=len(labels), verbose=0)
X, Y = shuffle(features, labels, random_state=42)
model.fit(X, Y)

Let's quickly check how well the model fits the data it was trained on. Because the dataset is so small, we didn't hold out a test or validation set; we'll test on a new image later.


In [ ]:
preds = model.predict(features)
print('Non-target faces predicted correctly:', np.all(preds[:12] == 0))
print('Target faces predicted correctly:', np.all(preds[-12:] == 1))

That's it. While Keras has its own mechanisms for training and validating models, we're using a wrapper around our Keras model so it conforms to the Scikit-Learn model API. We can call fit and predict when working with the model in our code, and it lets us use our model with the other helpers scikit-learn provides. For example, we could have evaluated the model using StratifiedKFold and cross_val_score, which would look like this:

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

model = KerasClassifier(build_fn=make_model, epochs=5, batch_size=len(labels), verbose=0)

# evaluate using 3-fold cross validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
result = cross_val_score(model, features, labels, cv=kfold)
print(result.mean())

This method allows us to determine how effective our model is, but it does not return a trained model for us to use. (Note that StratifiedKFold also expects integer class labels rather than our one-hot array.)
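If we want both an evaluation and a usable model from one run, a holdout split is an alternative, sketched here with scikit-learn's train_test_split (at the cost of shrinking our already-small training set):

from sklearn.model_selection import train_test_split

#Stratify on the collapsed one-hot labels so both splits keep the class balance
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=42,
    stratify=labels.argmax(axis=1))
model.fit(X_train, y_train)
preds = model.predict(X_test)
print('Holdout accuracy:', np.mean(preds == y_test.argmax(axis=1)))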

Putting It Together

Lastly, let's create a single function that takes in an image and returns if the target was found and where.

First we'll load in our test image. Keep in mind that the model we just trained has never seen this image before and it contains multiple people (and a manatee statue).


In [ ]:
test_img = imread('test_imgs/evaluate/me1.jpg')
plt.imshow(test_img)

Now for the function itself. Because we've already built functions around the core parts of our data pipeline, this function is going to be incredibly short yet powerful.


In [ ]:
def target_in_img(img: np.ndarray) -> (bool, np.ndarray):
    """Returns whether the target is in a given image and where"""
    for bbox in find_faces(img):
        face = preprocess(crop(img, *bbox))
        if model.predict(np.array([face])) == 1:
            return True, bbox
    return False, None

Yeah. That's it. Let's break down the steps:

  • find_faces returns a list of bounding boxes containing faces
  • We prepare each face by cropping the image to the bounding box, scaling to 45x45, and removing the alpha channel
  • The model predicts whether the face is or is not the target
  • If the target is found (pred == 1), return True and the current bounding box
  • If there aren't any faces or none of the faces belongs to the target, return False and None

Now let's test it. If it works properly, we should see a bounding box appear around the target's face.


In [ ]:
found, bbox = target_in_img(test_img)

print('Target face found in test image:', found)
if found:
    plt.imshow(draw_boxes([bbox], test_img, line_width=20))

We're finally done.
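One optional extra, sketched as an assumption rather than part of the talk: persist the trained network so the app can reload it without retraining. The KerasClassifier wrapper keeps the fitted Keras model on its .model attribute; 'findme/model.h5' is just a path of our choosing.

from keras.models import load_model

#Save the fitted Keras model held by the scikit-learn wrapper...
model.model.save('findme/model.h5')
#...and reload it later without retraining
loaded = load_model('findme/model.h5')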