Hello! Back at it with another little project. Up till now, I've gotten quite familiar with most of the basic linear and non-linear classification methods...
I've been using a lot of very highly non-linear methods. I've actually thought about finding a use case where I could build more interpretable models (a la linear regression / single decision tree) but, in a turn of events, I'm actually going to go in the opposite direction and build an even more complex non-linear model in this series of posts haha...
I tried man... I really tried... I told myself at one point I would stay away from Neural Networks because I know how big of a rabbit hole they are. You've got NNs, deep NNs, convolutional NNs, recurrent NNs... and that's just from someone who's been watching YouTube videos.
There are a few reasons why I decided to proceed with a deep learning project:
I'm sure I haven't convinced you that I'm ready to take on a Neural Network (I haven't even convinced myself), but let's get this show on the road.
Good ol' face detection...
I'm going to do some very simple face detection in this project. I'm going to try to build a convolutional NN that can tell the difference between me and my girlfriend, Larissa. Only 2 classes to identify, not too many photos each (I'll aim for 100 - 200), hopefully pretty straightforward and simple... at least simple enough for me to kind of learn what's going on.
I want to start super simple because just thinking about face detection hurts... my face.
There are so many factors...
Because of this, I tried to make this as easy for myself as possible. I tried to eliminate any of the factors above that could make the task more complex for the model (perhaps some of these can be tested later on, though!). I'll go with the following controlled variables:
This should provide enough of an introduction before I need to get into convolutional NNs, so let's stop here and continue in the next post!
As always, let's explore and set some rough objectives for what we want to achieve throughout this project across the technology and tools, mathematics and statistics, machine learning, and domain knowledge:
Oh man... this one's gonna be a doozy. First of all, we'll start with TFLearn and TensorFlow. I've watched a few of sentdex's tutorials on TFLearn and TensorFlow, and he largely bases his tutorials on the official TFLearn and TensorFlow documentation. I'll likely use TFLearn right off the bat because an easier abstraction will (I think) help me understand the idea of a CNN rather than getting bogged down debugging and standardizing my code.

The second topic I'll likely explore is leveraging AWS resources for more compute power, whether CPU or GPU. General reading on NNs often leads to the discussion of having enough compute power to train a model in a reasonable amount of time. sentdex also has tutorials on integrating TensorFlow with his system's GPU, but TensorFlow has dropped GPU support on macOS, so if the need for compute gets too crazy I'll definitely have to lean on AWS. Not to fret, that's something I wanted to learn anyways!
You know... This is the first time that I'm not so sure I'll learn anything in this realm, and the first time I'm not learning something completely new in all realms... I'm going to dive into Convolutional NNs, but I don't think there's much math or stats involved here. Parts of the CNN are just a simple DNN, but that's math we've already covered. It could be that I just don't know exactly what I'm getting myself into right now and that I might actually dive into deep math at some point, but for now, I can't quite think of any math / stats objectives that would help me here.
So I've already reviewed a normal NN / DNN when I was predicting all-NBA players, but for image recognition, I've been reading about Convolutional NNs (CNNs)... CNNs apply a kind of local filtering and downsampling to simplify the inputs to our NN and reduce the variance of our model. This mechanism is performed through steps called convolution, max pooling, and normalization, which we'll explore in a bit of detail. The CNN itself is not extremely difficult to understand; it's the tooling around it that will probably take the most time to learn.
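To make the convolution and max pooling steps a bit more concrete, here's a minimal NumPy sketch. The 6x6 "image" and the 3x3 averaging kernel are made-up stand-ins (a real CNN learns its filter weights during training), but the sliding-window mechanics are the same:

```python
import numpy as np

# A hypothetical 6x6 grayscale "image" (pixel intensities 0-1)
image = np.arange(36, dtype=float).reshape(6, 6) / 36.0

# A simple 3x3 averaging filter -- real CNNs *learn* their filter
# weights; a fixed averaging kernel just shows the mechanics
kernel = np.ones((3, 3)) / 9.0

def convolve2d(img, kern):
    """Valid-mode 2D convolution: slide the kernel over the image
    and take a weighted sum of pixels at each position."""
    kh, kw = kern.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling: keep only the largest activation
    in each size x size window, shrinking the feature map."""
    oh = img.shape[0] // size
    ow = img.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = img[i * size:(i + 1) * size,
                            j * size:(j + 1) * size].max()
    return out

feature_map = convolve2d(image, kernel)   # 6x6 -> 4x4
pooled = max_pool(feature_map)            # 4x4 -> 2x2
print(feature_map.shape, pooled.shape)    # (4, 4) (2, 2)
```

Notice how each step shrinks the representation: that's the "simplify the inputs and reduce variance" idea in action before anything reaches the fully connected layers.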
Image recognition... I've watched so many videos on image recognition and NNs that sometimes I start to lose track of what the hell we're actually trying to do here. After going through a few examples of using a CNN to perform image recognition and walking through every step individually, we start to see how easy it is to fire one of these up.

For me, way back when, I think the mental block was that I didn't realize how NNs worked and that it's the pixels of an image that make up the inputs to the NN. Without understanding that, it's difficult to see how a model could recognize all the edges and contours of a picture and decipher what the image is. How does a model know a leopard has spots? Or that a giraffe has a long neck? Or that a panda is black and white?

Once you understand that a NN is doing nothing but trying to map combinations of its inputs (pushed through non-linearities) to a specific class, it becomes much clearer why image recognition methodology is so broad and can be applied to so many different things. The same approach can recognize animals, or govern self-driving cars, or tell the difference between me and my girlfriend! It's pretty amazing and I'm excited to scratch the surface here.
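To drive home the "pixels are the inputs" idea, here's a tiny sketch in NumPy. The 4x4 image and the random weights are purely illustrative stand-ins for a real resized photo and trained weights:

```python
import numpy as np

# A tiny hypothetical 4x4 grayscale image -- in a real project each
# photo might be resized to something like 50x50 pixels
image = np.array([
    [0.0, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
    [0.8, 0.9, 1.0, 0.9],
    [0.8, 0.7, 0.6, 0.5],
])

# Flattening turns the 2D grid of pixel intensities into the 1D
# input vector the network's first layer actually sees
x = image.flatten()
print(x.shape)  # (16,)

# One hidden unit is just a weighted combination of those pixels
# pushed through a non-linearity (random weights here stand in
# for what training would learn)
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
activation = np.maximum(0.0, x @ w + b)  # a single ReLU unit
```

There's no magic about spots or long necks built in: the network only ever sees that vector of numbers, and training tunes the weights until certain pixel patterns light up certain classes.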