Intro

1) CNN or ConvNets

2) If you feed them face images, they learn edges, dots, bright spots, and dark spots

3) Multi-Layer NN: the second layer learns eyes and noses, and the third layer identifies faces

4) CNNs can learn to play video games by learning patterns

5) CNNs can be used to learn from videos

Toy CNN

Let's take a two-dimensional array of pixels, like a checkerboard where each square is light or dark. Is the picture an X or an O?

Trickier cases: translation, scaling, rotation, weight

CNN: matches parts of the image, rather than the whole thing

Example features: 3x3 images (parts of the image) that are matched against patches of the input, a process called filtering

1. Filtering

The math behind this match is:

  • Line up the feature and the image patch
  • Multiply each image pixel by the corresponding feature pixel
  • Add them up
  • Divide by the total number of pixels in the feature
  • This becomes a map of where the feature occurs
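As a sketch, the steps above are just an average of elementwise products. Here the 3x3 feature (a diagonal stroke) and the image patch are made-up examples, with pixels encoded as +1 (light) and -1 (dark):

```python
import numpy as np

# Hypothetical 3x3 feature (a diagonal stroke) and a matching image patch.
feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])
patch = np.array([[ 1, -1, -1],
                  [-1,  1, -1],
                  [-1, -1,  1]])

# Multiply pixel by pixel, add up, divide by the number of pixels.
match = (feature * patch).sum() / feature.size
print(match)  # 1.0 for a perfect match, lower otherwise
```

A perfect match scores 1.0; a patch that is the exact negative of the feature would score -1.0.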

2. Convolution

  • Convolution means trying every possible match, and we can try different filters
  • This act of creating a stack of filtered images is called a convolution layer
  • In convolution, we get a stack of filtered images
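A minimal sketch of one convolution, sliding the filtering step from above across every position (the `convolve` name and the diagonal test image are my own):

```python
import numpy as np

def convolve(image, feature):
    """Slide the feature over every position in the image and record the match score."""
    fh, fw = feature.shape
    out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r+fh, c:c+fw]
            # Average of elementwise products, as in the filtering step.
            out[r, c] = (patch * feature).sum() / feature.size
    return out

# A 4x4 image with a light diagonal stroke, matched against a 3x3 diagonal feature.
image = np.where(np.eye(4, dtype=bool), 1.0, -1.0)
feature = np.where(np.eye(3, dtype=bool), 1.0, -1.0)
fmap = convolve(image, feature)
print(fmap)  # scores of 1.0 where the diagonal feature lines up exactly
```

Repeating this with several different features produces the stack of filtered images.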

3. Pooling : Shrinking the image stack

  • Pick a window size (usually 2 or 3)
  • Pick a stride (usually 2)
  • Walk your window across your filtered images
  • From each window, take the maximum value (max pooling)
  • Note that pooling is independent of the exact position of the maximum value within the window
  • We do max pooling for each of our stacked images and get a smaller stack of filtered images
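The pooling steps above can be sketched like this (the `max_pool` name and the sample feature map are made up for illustration):

```python
import numpy as np

def max_pool(feature_map, window=2, stride=2):
    """Walk a window across the map, keeping only the maximum in each window."""
    out_h = (feature_map.shape[0] - window) // stride + 1
    out_w = (feature_map.shape[1] - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = feature_map[r*stride:r*stride+window,
                                    c*stride:c*stride+window].max()
    return out

fm = np.array([[0.1, 0.9, 0.3, 0.2],
               [0.4, 0.2, 0.8, 0.1],
               [0.7, 0.1, 0.2, 0.6],
               [0.2, 0.3, 0.5, 0.4]])
print(max_pool(fm))  # 4x4 shrinks to 2x2, keeping the strongest match per window
```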

4. Normalization

  • Keep the math from breaking by tweaking each of the values just a bit, without letting anything blow up
  • Change every negative value to zero using ReLUs (Rectified Linear Units)
  • Now we have a smaller stack of images with no negative values
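With NumPy, the ReLU step is a one-liner:

```python
import numpy as np

def relu(feature_map):
    """Rectified Linear Unit: change every negative value to zero."""
    return np.maximum(feature_map, 0)

print(relu(np.array([[0.5, -0.3],
                     [-1.0, 0.8]])))
# negatives become 0; positives pass through unchanged
```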

5. Can do Deep Stacking

Input Image => Convolution Layer -> ReLU -> Pooling (repeated multiple times)
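A minimal sketch of this stacking, chaining toy versions of the three layer types (the helper names, the random input, and the diagonal feature are my own choices):

```python
import numpy as np

def convolve(image, feature):
    """One convolution: score the feature against every patch of the image."""
    fh, fw = feature.shape
    out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (image[r:r+fh, c:c+fw] * feature).sum() / feature.size
    return out

def relu(x):
    """Change every negative value to zero."""
    return np.maximum(x, 0)

def max_pool(x, window=2, stride=2):
    """Keep only the maximum value in each window."""
    out = np.zeros(((x.shape[0] - window) // stride + 1,
                    (x.shape[1] - window) // stride + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = x[r*stride:r*stride+window,
                          c*stride:c*stride+window].max()
    return out

# Deep stacking: Conv -> ReLU -> Pool, repeated.
image = np.random.choice([-1.0, 1.0], size=(10, 10))
feature = np.eye(3) * 2 - 1                    # a diagonal-stroke feature
x = max_pool(relu(convolve(image, feature)))   # 10x10 -> 8x8 -> 4x4
x = max_pool(relu(convolve(x, feature)))       # 4x4 -> 2x2 -> 1x1
print(x.shape)  # (1, 1)
```

Each pass shrinks the image while distilling it into higher-level feature responses.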

6. Fully connected Layer

  • Every value gets a vote
  • We take all the cells of the shrunken stack of images and put them into a single list
  • Each of them connects to one of our answers
  • When we feed in a picture of X, there will be certain values that tend to be high
  • When we feed in a picture of O, there are certain values that tend to predict O
  • Based on the weights that each value gets to vote with, we get a nice average vote at the end
  • So, in a fully connected layer, a list of feature values becomes a list of votes
  • We can stack as many fully connected (hidden) layers as we like
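The voting above is a weighted sum. Here is a sketch with made-up numbers: four flattened feature values, each with a voting weight for the two answers X and O:

```python
import numpy as np

# Hypothetical: 4 feature values flattened into a single list.
features = np.array([0.9, 0.6, 0.2, 0.1])

# One column of weights per answer (X, O); values are invented for illustration.
weights = np.array([[0.9, 0.1],
                    [0.8, 0.2],
                    [0.1, 0.7],
                    [0.2, 0.9]])

votes = features @ weights            # weighted vote for each answer
labels = ["X", "O"]
print(votes)
print(labels[int(np.argmax(votes))])  # the answer with the most votes wins
```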

7. Learning

Where do all the magic numbers come from?

a) Features in convolutional layers

b) Voting weights in fully connected layers

A: Backpropagation (the error in the final answer is used to determine how much the network adjusts and changes)

Gradient Descent: for each feature pixel and voting weight, adjust it up and down and see how the error changes. The size of the adjustment is scaled by how big the error is, like sliding a ball left and right to find the bottom of a valley.
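A minimal one-dimensional sketch of that idea: nudge a single weight up and down, estimate the slope of the error, and slide downhill. The error function here is made up (a parabola with its minimum at w = 3):

```python
def error(w):
    # Invented error surface for illustration: minimized at w = 3.
    return (w - 3) ** 2

w = 0.0         # starting guess
lr = 0.1        # step size
eps = 1e-4      # nudge used to estimate the slope
for _ in range(200):
    # Nudge up and down, see how the error changes (a finite-difference slope).
    slope = (error(w + eps) - error(w - eps)) / (2 * eps)
    # Big slope -> big adjustment, like a ball rolling toward the valley floor.
    w -= lr * slope
print(round(w, 3))  # approaches 3.0
```

A real network does this simultaneously for every feature pixel and voting weight, with backpropagation computing all the slopes efficiently.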

8. Hyper-Parameters (in our hands)

  • Convolution:

    • Number of Features
    • Size of features
  • Pooling:

    • Window Size
    • Window stride
  • Fully Connected

Number of neurons

9. Architecture

a) How many of each type of layer?

b) In what order?

c) can we design a new type of layer?

10. Extras

Not just 2D or 3D images: we can also apply ConvNets to other structured data

- Images
- Sound (timesteps close to each other are closely related)
- Text (position in the sentence is the column, and the row is the word in the dictionary; take a filter the full height of the column and slide it left to right)

11. Limitations

  • ConvNets only capture local "spatial" patterns in data. If the data can't be made to look like an image, they are less useful. Example: structured customer data
  • Rule of thumb: if your data is just as useful after swapping any of your columns with each other, then you can't use Convolutional Neural Networks

ConvNets are great at finding patterns and using them to classify images