Intro

1) CNN or ConvNets

2) If you feed them face images, they learn edges, dots, bright spots, and dark spots

3) Multi-Layer NN: the second layer learns eyes and noses, and the third layer identifies faces

4) CNNs can learn to play video games by learning patterns

5) CNNs can be used to learn from videos

Toy CNN

Let's take a two-dimensional array of pixels, like a checkerboard where each square is light or dark. Is the picture an X or an O?

Trickier cases: translation, scaling, rotation, weight

CNN: matches parts of the image, rather than the whole thing

Example features: 3x3 images (parts of the image) that are matched against patches of the input, a process called filtering

1. Filtering

The math behind this match is:

  • Line up the feature and the image patch
  • Multiply each image pixel by the corresponding feature pixel
  • Add them up
  • Divide by the total number of pixels in the feature
  • This becomes a map of where the feature occurs
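As a sketch, the steps above are just an average of elementwise products. Here the 3x3 feature (a diagonal stroke) and the image patch are made-up examples, with pixels encoded as +1 (light) and -1 (dark):

```python
import numpy as np

# Hypothetical 3x3 feature (a diagonal stroke) and a matching image patch.
feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])
patch = np.array([[ 1, -1, -1],
                  [-1,  1, -1],
                  [-1, -1,  1]])

# Multiply pixel by pixel, add up, divide by the number of pixels.
match = (feature * patch).sum() / feature.size
print(match)  # 1.0 for a perfect match, lower otherwise
```

A perfect match scores 1.0; a patch that is the exact negative of the feature would score -1.0.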

2. Convolution

  • Convolution means trying every possible match, and we can try different filters
  • This act of creating a stack of filtered images is called a convolution layer
  • In convolution, we get a stack of filtered images
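A minimal sketch of one convolution, sliding the filtering step from above across every position (the `convolve` name and the diagonal test image are my own):

```python
import numpy as np

def convolve(image, feature):
    """Slide the feature over every position in the image and record the match score."""
    fh, fw = feature.shape
    out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r+fh, c:c+fw]
            # Average of elementwise products, as in the filtering step.
            out[r, c] = (patch * feature).sum() / feature.size
    return out

# A 4x4 image with a light diagonal stroke, matched against a 3x3 diagonal feature.
image = np.where(np.eye(4, dtype=bool), 1.0, -1.0)
feature = np.where(np.eye(3, dtype=bool), 1.0, -1.0)
fmap = convolve(image, feature)
print(fmap)  # scores of 1.0 where the diagonal feature lines up exactly
```

Repeating this with several different features produces the stack of filtered images.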

3. Pooling : Shrinking the image stack

  • Pick a window size (usually 2 or 3)
  • Pick a stride (usually 2)
  • Walk your window across your filtered images
  • From each window, take the maximum value (max pooling)
  • Note that pooling is independent of the exact position of the maximum value within the window
  • We do max pooling for each of our stacked images and get a smaller stack of filtered images
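The pooling steps above can be sketched like this (the `max_pool` name and the sample feature map are made up for illustration):

```python
import numpy as np

def max_pool(feature_map, window=2, stride=2):
    """Walk a window across the map, keeping only the maximum in each window."""
    out_h = (feature_map.shape[0] - window) // stride + 1
    out_w = (feature_map.shape[1] - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = feature_map[r*stride:r*stride+window,
                                    c*stride:c*stride+window].max()
    return out

fm = np.array([[0.1, 0.9, 0.3, 0.2],
               [0.4, 0.2, 0.8, 0.1],
               [0.7, 0.1, 0.2, 0.6],
               [0.2, 0.3, 0.5, 0.4]])
print(max_pool(fm))  # 4x4 shrinks to 2x2, keeping the strongest match per window
```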

4. Normalization

  • Keep the math from breaking by tweaking each of the values just a bit, without letting anything blow up
  • Change every negative value to zero using ReLUs (Rectified Linear Units)
  • Now we have a smaller stack of images with no negative values
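With NumPy, the ReLU step is a one-liner:

```python
import numpy as np

def relu(feature_map):
    """Rectified Linear Unit: change every negative value to zero."""
    return np.maximum(feature_map, 0)

print(relu(np.array([[0.5, -0.3],
                     [-1.0, 0.8]])))
# negatives become 0; positives pass through unchanged
```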

5. Can do Deep Stacking

Input Image => Convolution Layer -> ReLU -> Pooling (repeated multiple times)
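A minimal sketch of this stacking, chaining toy versions of the three layer types (the helper names, the random input, and the diagonal feature are my own choices):

```python
import numpy as np

def convolve(image, feature):
    """One convolution: score the feature against every patch of the image."""
    fh, fw = feature.shape
    out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (image[r:r+fh, c:c+fw] * feature).sum() / feature.size
    return out

def relu(x):
    """Change every negative value to zero."""
    return np.maximum(x, 0)

def max_pool(x, window=2, stride=2):
    """Keep only the maximum value in each window."""
    out = np.zeros(((x.shape[0] - window) // stride + 1,
                    (x.shape[1] - window) // stride + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = x[r*stride:r*stride+window,
                          c*stride:c*stride+window].max()
    return out

# Deep stacking: Conv -> ReLU -> Pool, repeated.
image = np.random.choice([-1.0, 1.0], size=(10, 10))
feature = np.eye(3) * 2 - 1                    # a diagonal-stroke feature
x = max_pool(relu(convolve(image, feature)))   # 10x10 -> 8x8 -> 4x4
x = max_pool(relu(convolve(x, feature)))       # 4x4 -> 2x2 -> 1x1
print(x.shape)  # (1, 1)
```

Each pass shrinks the image while distilling it into higher-level feature responses.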

6. Fully connected Layer

  • Every value gets a vote
  • We take all the cells of the shrunken stack of images and put them into a single list
  • Each of them connects to one of our answers
  • When we feed in a picture of X, there will be certain values that tend to be high
  • When we feed in a picture of O, there are certain values that tend to predict O
  • Based on the weights that each value gets to vote with, we get a nice average vote at the end
  • So, in a fully connected layer, a list of feature values becomes a list of votes
  • We can stack as many fully connected (hidden) layers as we like
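The voting above is a weighted sum. Here is a sketch with made-up numbers: four flattened feature values, each with a voting weight for the two answers X and O:

```python
import numpy as np

# Hypothetical: 4 feature values flattened into a single list.
features = np.array([0.9, 0.6, 0.2, 0.1])

# One column of weights per answer (X, O); values are invented for illustration.
weights = np.array([[0.9, 0.1],
                    [0.8, 0.2],
                    [0.1, 0.7],
                    [0.2, 0.9]])

votes = features @ weights            # weighted vote for each answer
labels = ["X", "O"]
print(votes)
print(labels[int(np.argmax(votes))])  # the answer with the most votes wins
```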

7. Learning

Where do all the magic numbers come from?

a) Features in convolutional layers

b) Voting weights in fully connected layers

A: Backpropagation (the error in the final answer is used to determine how much the network adjusts and changes)

Gradient Descent: for each feature pixel and voting weight, adjust it up and down and see how the error changes. The size of the adjustment is scaled by how big the error is, like sliding a ball left and right to find the bottom of a valley.
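A minimal one-dimensional sketch of that idea: nudge a single weight up and down, estimate the slope of the error, and slide downhill. The error function here is made up (a parabola with its minimum at w = 3):

```python
def error(w):
    # Invented error surface for illustration: minimized at w = 3.
    return (w - 3) ** 2

w = 0.0         # starting guess
lr = 0.1        # step size
eps = 1e-4      # nudge used to estimate the slope
for _ in range(200):
    # Nudge up and down, see how the error changes (a finite-difference slope).
    slope = (error(w + eps) - error(w - eps)) / (2 * eps)
    # Big slope -> big adjustment, like a ball rolling toward the valley floor.
    w -= lr * slope
print(round(w, 3))  # approaches 3.0
```

A real network does this simultaneously for every feature pixel and voting weight, with backpropagation computing all the slopes efficiently.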

8. Hyper-Parameters (in our hands)

  • Convolution:

    • Number of Features
    • Size of features
  • Pooling:

    • Window Size
    • Window stride
  • Fully Connected

Number of neurons

9. Architecture

a) How many of each type of layer?

b) In what order?

c) can we design a new type of layer?

10. Extras

Not just 2D or 3D images: we can also apply ConvNets to other structured data

- Images
- Sound (timesteps close to each other are closely related)
- Text (position in the sentence is the column, and the row is the word in the dictionary; take a filter the full height of the column and slide it left to right)

11. Limitations

  • ConvNets only capture local "spatial" patterns in data. If the data can't be made to look like an image, they are less useful. Example: structured customer data
  • Rule of thumb: if your data is just as useful after swapping any of your columns with each other, then you can't use Convolutional Neural Networks

ConvNets are great at finding patterns and using them to classify images