Hello! Back at it with another little project. Up till now, I've gotten quite familiar with most of the basic linear and non-linear classification methods...
I've been using a lot of very highly non-linear methods. I've actually thought about finding a use case where I could build more interpretable models (a la linear regression / single decision tree) but, in a turn of events, I'm actually going to go in the opposite direction and build an even more complex non-linear model in this series of posts haha...
I tried man... I really tried... I told myself at one point I would stay away from Neural Networks because I know how big of a rabbit hole they are. You've got NNs, deep NNs, convolutional NNs, recurrent NNs... and that's just from someone who's been watching YouTube videos.
There are a few reasons why I decided to proceed with a deep learning project:
I'm sure I haven't convinced you that I'm ready to take on a Neural Network (I haven't even convinced myself), but let's get this show on the road.
Good ol' face detection...
I'm going to do some very simple face detection in this project. I'm going to try to build a convolutional NN that can tell the difference between me and my girlfriend, Larissa. Only 2 classes to identify, not too many photos each (I'll aim for 100 - 200), hopefully pretty straightforward and simple... at least simple enough for me to kind of learn what's going on.
I want to start super simple because just thinking about face detection hurts... my face.
There are so many factors...
Because of this, I tried to make this as easy for myself as possible. I tried to eliminate any of the factors above that could make the task more complex for the model (perhaps some of these can be tested later on, though!). I'll go with the following controlled variables:
This should provide enough of an introduction before I need to get into convolutional NNs, so let's stop here and continue in the next post!
As always, let's explore and set some rough objectives for what we want to achieve throughout this project across the technology and tools, mathematics and statistics, machine learning, and domain knowledge:
Oh man... this one's gonna be a doozy. First of all, we'll start with TFLearn and TensorFlow. I've watched a few of sentdex's tutorials on TFLearn and TensorFlow, and he largely bases his tutorials on the official TFLearn and TensorFlow documentation. I'll likely use TFLearn right off the bat because an easier abstraction will (I think) help me understand the idea of a CNN rather than getting bogged down debugging and standardizing my code.

The second topic I'll likely explore is leveraging AWS resources for more compute power, whether CPU or GPU. General reading on NNs often leads to the discussion of having enough compute power to train a model in a reasonable amount of time. sentdex also has tutorials on integrating TensorFlow with his system's GPU, but TensorFlow has dropped GPU support on macOS, so if the need for compute gets too crazy I'll definitely have to lean on AWS. Not to fret, that's something I wanted to learn anyways!
You know... This is the first time that I'm not so sure I'll learn anything in this realm, and the first time I'm not learning something completely new in all realms... I'm going to dive into Convolutional NNs, but I don't think there's much math or stats involved here. Parts of the CNN are just a simple DNN, but that's math we've already covered. It could be that I just don't know exactly what I'm getting myself into right now and that I might actually dive into deep math at some point, but for now, I can't quite think of any math / stats objectives that would help me here.
So I've already reviewed a normal NN / DNN when I was predicting all-NBA players, but for image recognition, I've been reading about Convolutional NNs (CNNs)... CNNs apply a kind of local filtering and downsampling to simplify the inputs to our NN and reduce the variance of our model. This mechanism is performed through steps called convolution, max pooling, and normalization, which we'll explore in a bit of detail. The CNN itself is not extremely difficult to understand; it's the tooling around it that will probably take the most time to learn.
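To make the convolution and max pooling steps a bit more concrete, here's a minimal NumPy sketch. The 6x6 "image" and the 3x3 averaging kernel are made-up stand-ins (a real CNN learns its filter weights during training), but the sliding-window mechanics are the same:

```python
import numpy as np

# A hypothetical 6x6 grayscale "image" (pixel intensities 0-1)
image = np.arange(36, dtype=float).reshape(6, 6) / 36.0

# A simple 3x3 averaging filter -- real CNNs *learn* their filter
# weights; a fixed averaging kernel just shows the mechanics
kernel = np.ones((3, 3)) / 9.0

def convolve2d(img, kern):
    """Valid-mode 2D convolution: slide the kernel over the image
    and take a weighted sum of pixels at each position."""
    kh, kw = kern.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling: keep only the largest activation
    in each size x size window, shrinking the feature map."""
    oh = img.shape[0] // size
    ow = img.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = img[i * size:(i + 1) * size,
                            j * size:(j + 1) * size].max()
    return out

feature_map = convolve2d(image, kernel)   # 6x6 -> 4x4
pooled = max_pool(feature_map)            # 4x4 -> 2x2
print(feature_map.shape, pooled.shape)    # (4, 4) (2, 2)
```

Notice how each step shrinks the representation: that's the "simplify the inputs and reduce variance" idea in action before anything reaches the fully connected layers.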
Image recognition... I've watched so many videos on image recognition and NNs that sometimes I start to lose track of what the hell we're actually trying to do here. After going through a few examples of using a CNN to perform image recognition and walking through every step individually, we start to see how easy it is to fire one of these up.

For me, way back when, I think the mental block was that I didn't realize how NNs worked and that it's the pixels of an image that make up the inputs to the NN. Without understanding that, it's difficult to see how a model could recognize all the edges and contours of a picture and decipher what the image is. How does a model know a leopard has spots? Or that a giraffe has a long neck? Or that a panda is black and white?

Once you understand that a NN is doing nothing but trying to map combinations of its inputs (pushed through non-linearities) to a specific class, it becomes much clearer why image recognition methodology is so broad and can be applied to so many different things. The same approach can recognize animals, or govern self-driving cars, or tell the difference between me and my girlfriend! It's pretty amazing and I'm excited to scratch the surface here.
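To drive home the "pixels are the inputs" idea, here's a tiny sketch in NumPy. The 4x4 image and the random weights are purely illustrative stand-ins for a real resized photo and trained weights:

```python
import numpy as np

# A tiny hypothetical 4x4 grayscale image -- in a real project each
# photo might be resized to something like 50x50 pixels
image = np.array([
    [0.0, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
    [0.8, 0.9, 1.0, 0.9],
    [0.8, 0.7, 0.6, 0.5],
])

# Flattening turns the 2D grid of pixel intensities into the 1D
# input vector the network's first layer actually sees
x = image.flatten()
print(x.shape)  # (16,)

# One hidden unit is just a weighted combination of those pixels
# pushed through a non-linearity (random weights here stand in
# for what training would learn)
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
activation = np.maximum(0.0, x @ w + b)  # a single ReLU unit
```

There's no magic about spots or long necks built in: the network only ever sees that vector of numbers, and training tunes the weights until certain pixel patterns light up certain classes.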