Welp, NNs and CNNs have definitely been demystified for me... at least at the surface level.
I had to start somewhere! I wanted to take an easily digestible example and break it down as a CNN problem. I'd like to think I did that, or at least the whole process made sense to me within the scope I had in mind. Again, I honestly wasn't trying to rebuild AlexNet or anything... I was merely trying to get a NN up and running to do some very, very basic task with modest accuracy. Part of this comes from a tools perspective: just designing a CNN (even an extremely simple one like mine) and getting to know and use the tools helps in actually understanding the subject matter. Imagining a jump shot and actually practicing a jump shot are two very different things, no matter what you're told haha. This was a great example of that for me. Having to debug just to get the arrays into the right shape, or to understand why tweaking a certain parameter yielded long train times or inaccurate results, forces you to think past the code, with the code as a great guide.
At the end of the day, we were able to build a model that did a decent job of telling the difference between me and my girlfriend. Is this useful? Probably not. Is it even usable? Probably not - I'd likely have to do way more image pre-processing (removing the background, detecting the face, balancing out the brightness and contrast, among so many other possibilities) to even make this useful. But hey, if you ever need a model that can tell the difference between me without hair and my girlfriend with her hair down, standing in the same place in my living room with a white background, I've got you covered :).
Let's recap what we've learned.
Oh man, what to say here... Let's start with TFlearn:
TFlearn and TensorFlow. What an easy-to-understand abstraction for NNs! TFlearn provided such an abstracted window into NNs and CNNs... there are definitely pros and cons, the biggest pro being how easy it is to load and set up, and the biggest con being exactly the same thing: how easy it is to load and set up. CNNs are not a walk in the park, as post #2 demonstrates. Even post #2, which took me days of watching videos to wrap my head around and write, just scratches the surface of how CNNs should be built. The concept is there, but the practical, real-world experience is lacking. Anytime you make a tool easier, chances are you're making assumptions, and the more assumptions the tool makes, the less control you have when building your model. I'm not an expert in NNs; I never studied them at any point in my academic or professional career, and everything here is based on reading and watching videos, so I couldn't quite dive directly into the deep end. I needed something that could abstract it for me as a learning experience. Because my objective for the project was to learn the very basics, I was okay with some of the assumptions that TFlearn made. Not once did I have to define any array sizes between layers other than the input and output layers. TFlearn essentially abstracts that step away, working out for itself the size of one layer's output and the size of the next layer's input. Could I have learned more by building these steps myself in a lower-level framework? ABSOLUTELY. But I'll have to save that for another project when I've got a bit more time to put in the effort!
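To make that "sizes between layers" bookkeeping a bit more concrete, here's a rough sketch of the arithmetic a framework does for you behind the scenes. This is not TFlearn's actual code. The 64x64 grayscale input, the filter counts, and the assumption of 'same' padding with stride-1 convolutions and stride-2 pooling are all made up for illustration, and `trace_shapes` is a hypothetical helper of mine.

```python
import math

def same_padding_out(size, stride=1):
    """Spatial output size of a 'same'-padded conv or pooling layer."""
    return math.ceil(size / stride)

def trace_shapes(input_hw, channels, layers):
    """Follow (height, width, channels) through a list of (kind, arg) layer specs."""
    h, w = input_hw
    c = channels
    shapes = [(h, w, c)]
    for kind, arg in layers:
        if kind == "conv":       # arg = number of filters; stride 1 keeps h and w
            c = arg
        elif kind == "pool":     # arg = pooling stride; shrinks h and w
            h, w = same_padding_out(h, arg), same_padding_out(w, arg)
        elif kind == "flatten":  # collapse everything into one long vector
            h, w, c = 1, 1, h * w * c
        shapes.append((h, w, c))
    return shapes

# Hypothetical network: 64x64 grayscale image, two conv + pool stages, then flatten.
layers = [("conv", 32), ("pool", 2), ("conv", 64), ("pool", 2), ("flatten", None)]
shapes = trace_shapes((64, 64), 1, layers)
for shape in shapes:
    print(shape)
```

The last shape is the length of the vector that would feed the fully connected layer. Every one of these numbers is something I never had to type in myself.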
Automation, automation, automation. It basically made this entire project possible! Remember those last few models in the last post that I trained for 100 epochs? Those took about 3 minutes to train... At 23 times slower, that would've taken 69 minutes. From 3 minutes to over an hour. Not only did it save me time, it also saved my focus and the continuity in my workflow, so I could actually keep my mindset throughout. I even trained a 500 epoch model for fun that I didn't post, and that one would've taken nearly 6 hours to train. I'm sure I would've pushed through, but can you imagine how demotivating it would've been to wait overnight just to find out that you set one parameter wrong or had a bug in the code? My goodness... Just for my sanity, automation absolutely made this project possible. How much is my final AWS bill after all this?
Not even a fiver... Now that price is priceless. Sorry, I'll let myself out.
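For the record, the training-time numbers above are just back-of-the-envelope arithmetic. Here's a tiny sketch of it, assuming training time scales linearly with the number of epochs (my assumption, I didn't measure this) and using the rough 3 minutes per 100 epochs and 23x slowdown figures from this post. `training_minutes` is just a helper name I made up.

```python
def training_minutes(epochs, minutes_per_100_epochs=3.0, slowdown=1.0):
    """Estimated training time, assuming time grows linearly with epochs."""
    return epochs / 100 * minutes_per_100_epochs * slowdown

fast_100 = training_minutes(100)               # what I actually waited
slow_100 = training_minutes(100, slowdown=23)  # the same run, 23 times slower
slow_500 = training_minutes(500, slowdown=23)  # the 500 epoch model, 23 times slower

print(f"100 epochs: {fast_100:.0f} min vs {slow_100:.0f} min")
print(f"500 epochs, slow: {slow_500:.0f} min = {slow_500 / 60:.2f} hours")
```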
OpenCV also opened my eyes to a world of image processing within Python. I didn't do too much with OpenCV this time, but after watching the entire series of sentdex tutorials, it's really surprising how much you can do with an open source platform these days...
Hmm, math and stats... I'm not so sure I learned new things in this realm. From what I can brainstorm, I reinforced my knowledge of NNs, but it was more so in the convolution layers... the convolution, max pooling, ReLU... not so much in the fully connected layer. I guess I saw how the partial derivatives reach back through the convolution layers to actually generate the filters, but that's not too much more than I already knew.
I suppose I was forced to think more about the central limit theorem and the bias-variance trade-off in the last post. From a statistical perspective, I saw how unstable the results of my CNN could be, and pondered and theorized about ways to make the experiment more statistically robust. Going from a single decision tree to a random forest is one way to use the central limit theorem to your advantage: average many noisy estimators and the variance of the average drops. Ensembling NNs to find the variance in the confidence probabilities would be no different. That is pure statistical theory, nothing to do with machine learning! But since we've mentioned machine learning...
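Here's a toy simulation of that averaging idea, just to convince myself. It is not my actual CNN. Each "model" is faked as a true confidence of 0.7 plus independent Gaussian noise, and all of those numbers are invented for illustration. Real networks trained on the same photos would be correlated, so the real-world reduction would be smaller than what this shows.

```python
import random
import statistics

def simulate_confidences(n_models, rng, true_conf=0.7, noise_sd=0.1):
    """Fake n_models predictions: the true confidence plus independent noise."""
    return [true_conf + rng.gauss(0, noise_sd) for _ in range(n_models)]

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000

# One noisy model per trial vs. the average of a 10-model ensemble per trial.
single = [simulate_confidences(1, rng)[0] for _ in range(trials)]
ensembled = [statistics.mean(simulate_confidences(10, rng)) for _ in range(trials)]

single_var = statistics.variance(single)
ensemble_var = statistics.variance(ensembled)

print(f"variance of a single model:  {single_var:.5f}")
print(f"variance of 10-model average: {ensemble_var:.5f}")
```

With independent noise, the variance of the average should come out at roughly a tenth of the single-model variance, since averaging n independent estimates divides the variance by n.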
Well, we definitely did some tinkering in this domain haha. I learned what a CNN is. Plain and simple. The convolution and max pooling steps were completely foreign to me. Understanding the idea of convolutional filters is key to CNNs, and was the main focus of my parameter tweaking. In fact, looking back now, I realized I completely forgot to play with the max pooling parameter settings because I tunnel-visioned on the convolution steps so badly. Well, another item to add to the next to-do list I suppose. Anyways, understanding the whole filtering process, and how it uses small filters to represent different areas of the image, is what powers the CNN. Whereas a normal deep network takes the pixels of an image directly as inputs, a CNN first takes dot products between small patches of the image and the filters. These convolution layers add a level of complexity on top of the deep network, but the results speak for themselves!
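To spell out the "dot product of the image and the filters" idea, here's a minimal plain-Python sketch of one convolution, ReLU and max pooling pass. The 5x5 image and the 2x2 edge filter are hand-made for illustration. In a real CNN the filter values are learned, not typed in, and this is of course not how TFlearn implements it internally.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and take a dot product at each spot.

    Like most CNN libraries, this does not flip the kernel, so strictly
    speaking it is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Zero out negative responses."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]

# Toy image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-made filter that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

feature_map = relu(convolve2d(image, kernel))
pooled = max_pool(feature_map)
print(feature_map)  # 4x4 map, non-zero only where the edge is
print(pooled)       # 2x2 summary after pooling
```

The filter only lights up in the column where dark meets bright, and pooling shrinks the map while keeping that response.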
Well, there isn't much "domain knowledge" to be had here per se, but again, it was my first time working with an image as an input. Not only did I scratch the surface of image processing (OpenCV section above), but I actually modelled on images! Something about seeing pixels or abstractions of pixels as inputs just blows my mind. Makes you admire the world of digital haha. I think the domain knowledge to be had here is purely in image processing, though. All the issues I had throughout this project are likely image processing issues... background detection and removal, face detection and isolation, brightness and contrast balancing... all of these are factors that affect the absolute values of the inputs. Perhaps the model parameter tweaking would contribute to the outcomes, but not until the image processing has taken place first. It's useless trying to tweak your filter parameters if one picture of Chi has a white background and another has a noisy background... the backgrounds will heavily affect the filters that are created! It's really not a catch-22: the image processing must come FIRST! Unfortunately I didn't dive too deep into the image processing side because that wasn't my focus, but it's absolutely useful to keep this experience in mind next time I try to do something like this.
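As a small taste of what "balancing brightness and contrast" could look like, here's a sketch of per-image standardization (subtract the mean, divide by the standard deviation). To be clear, I did not do this in the project. It's one common normalization step, the tiny 2x2 "images" are made up, and it does nothing about backgrounds or face detection.

```python
import statistics

def standardize(image):
    """Rescale an image (list of pixel rows) to zero mean and unit standard deviation."""
    pixels = [p for row in image for p in row]
    mean = statistics.mean(pixels)
    sd = statistics.pstdev(pixels)
    if sd == 0:  # completely flat image: nothing to rescale
        return [[0.0 for _ in row] for row in image]
    return [[(p - mean) / sd for p in row] for row in image]

# The same made-up scene twice: once dim, once brighter and with more contrast.
dim = [[10, 20], [30, 40]]
bright = [[2 * p + 50 for p in row] for row in dim]

dim_std = standardize(dim)
bright_std = standardize(bright)
print(dim_std)
print(bright_std)
```

After standardizing, the two versions end up with the same values, so a filter would see the same thing regardless of the lighting. Backgrounds are a whole other problem.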
That's all for now! What to do next... I'll have to think about it a bit.