Credit: Deep Learning A-Z™: Hands-On Artificial Neural Networks
The neuron consists of:
- Cell body: the main body of the neuron itself
- Dendrites: the receivers of incoming signals
- Axon: the transmitter of the neuron's outgoing signal
How can we represent a neuron in a machine?
Input layer: receives the input values, i.e. the independent variables for one observation.
Output value: can be continuous (e.g. a price), binary (yes/no), or categorical (several output values).
Weights: how neural networks learn; by adjusting the weights, the network decides which input signals matter and which do not.
What happens inside the neuron? The neuron takes the weighted sum of its input signals, $\sum_i w_i x_i$, and then applies an activation function to that sum.
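A minimal sketch of this in Python (the function and variable names and the example numbers are illustrative, not from the course):

```python
import math

def neuron_output(inputs, weights, activation):
    # Weighted sum of the input signals, then the activation function.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

# Using a sigmoid activation here; four common choices are listed below.
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
print(neuron_output(inputs=[0.5, 1.0, -0.2], weights=[0.8, -0.3, 0.5], activation=sigmoid))
```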
There are more activation functions, but we are going to look at 4 different types of activation function:
Threshold function
Sigmoid function
Rectifier
Hyperbolic Tangent (tanh)
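For reference, a hedged sketch of these four functions in Python, where `x` is the weighted sum computed inside the neuron:

```python
import math

def threshold(x):
    # Threshold (step) function: 1 if x >= 0, otherwise 0.
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    # Sigmoid: smooth curve between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

def rectifier(x):
    # Rectifier (ReLU): 0 for negative x, x itself otherwise.
    return max(0.0, x)

def tanh(x):
    # Hyperbolic tangent: smooth curve between -1 and 1.
    return math.tanh(x)
```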
Example: assuming the dependent variable is binary (y = 0 or 1), which activation function can we use? We have 2 options:
- Threshold function: fits perfectly when we need a hard 0 or 1 output.
- Sigmoid function: outputs a value between 0 and 1, which can be read as the probability that y = 1.
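A small illustration of the two options (the weighted sum value is made up):

```python
import math

z = 1.2  # made-up weighted sum for one observation

# Option 1: threshold function gives a hard 0/1 answer.
y_threshold = 1 if z >= 0 else 0        # -> 1

# Option 2: sigmoid gives the probability that y = 1, which we can still round.
p = 1.0 / (1.0 + math.exp(-z))          # -> ~0.77
y_sigmoid = 1 if p >= 0.5 else 0        # -> 1
```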
Basic form of a neural network
However, a neural network has an extra advantage that increases its accuracy: hidden layers.
For each neuron in the hidden layers:
- The weights of the input variables are not all equal: some are non-zero and some are zero, because not every input is important for every hidden neuron. For example, the first neuron may only care about 2 inputs, area and distance from the city (the further from the city, the larger the property's area tends to be). That is why we do not draw the synapses that are not important, i.e. those with zero weights (see the sketch below).
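A hedged sketch of that idea with made-up property data: a hidden neuron whose weights are zero for the inputs it ignores.

```python
# Inputs for one property (made-up values): [area, bedrooms, distance_to_city, age]
x = [1500.0, 3.0, 12.0, 20.0]

# This hidden neuron only cares about area and distance to the city,
# so the weights for bedrooms and age are zero (the synapses we would not draw).
w = [0.04, 0.0, -0.3, 0.0]

weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
neuron_value = max(0.0, weighted_sum)  # rectifier activation, one common choice
print(neuron_value)
```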
There are many cost functions; for an overview, see the additional reading "A list of cost functions used in neural networks, alongside applications".
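One common choice (used in many introductory examples, though not the only option) is the squared-error cost, which compares the predicted value $\hat y$ with the actual value $y$ for each row and sums over the dataset:

$$C = \sum_i \frac{1}{2} (\hat y_i - y_i)^2$$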
How can we minimize the cost function?
Why should we not use a brute-force approach of simply trying out lots of weight combinations? Because the number of combinations grows exponentially with the number of weights (the curse of dimensionality): trying just 1000 values for each of 25 weights already gives $1000^{25} = 10^{75}$ combinations, far too many to evaluate.
We need a different approach: Gradient Descent
Intuition: start from some initial weights, look at the slope (gradient) of the cost function at that point, and move the weights downhill; repeat until the slope is approximately zero, i.e. we have reached a minimum.
Gradient Descent works best when the cost function is convex. If our cost function is not convex, gradient descent can lead us to a local minimum instead of the global minimum.
However, Stochastic Gradient Descent does not require our cost function to be convex.
Differences between Gradient Descent (also called Batch Gradient Descent) and Stochastic Gradient Descent
| Gradient Descent | Stochastic Gradient Descent |
| --- | --- |
| Calculates the cost function and adjusts the weights after taking in all of the input rows | Calculates the cost function and adjusts the weights one row at a time |
| Runs slower | Runs faster |
| Deterministic algorithm | Stochastic (random) algorithm |
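A hedged sketch of the two update styles on a toy one-weight model $\hat y = w x$ with the squared-error cost; the data, learning rate, and epoch count are made up:

```python
# Toy data (made up): y is roughly 2 * x.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 8.1]
lr = 0.01  # learning rate

def batch_gradient_descent(w, epochs):
    # Batch GD: one weight update per epoch, using the gradient over ALL rows.
    for _ in range(epochs):
        grad = sum((w * x - y) * x for x, y in zip(X, Y)) / len(X)
        w -= lr * grad
    return w

def stochastic_gradient_descent(w, epochs):
    # SGD: one weight update per ROW, so updates are noisier but more frequent.
    for _ in range(epochs):
        for x, y in zip(X, Y):
            grad = (w * x - y) * x
            w -= lr * grad
    return w

print(batch_gradient_descent(0.0, epochs=200))       # converges towards ~2
print(stochastic_gradient_descent(0.0, epochs=200))  # also ~2, via noisier steps
```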
The reason Stochastic Gradient Descent helps avoid getting stuck in a local minimum is that it has much higher fluctuations: updating the weights one row at a time makes each step noisier, which makes it more likely to escape local minima and find the global minimum.
Additional Reading:
Forward propagation: information is entered into the input layer and propagated forward through the network to produce the output value $\hat y$. The output values are then compared to the actual values to compute the errors, and the errors are back-propagated through the network in the opposite direction to train the network by adjusting the weights.
Backpropagation allows us to adjust all the weights at the same time.
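A hedged sketch of one training step for a tiny network (2 inputs, 2 sigmoid hidden neurons, 1 sigmoid output): forward propagation, the squared-error cost, and backpropagation updating every weight in the same step. The sizes, initial weights, and learning rate are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up initial weights: 2 inputs -> 2 hidden neurons -> 1 output.
W1 = [[0.1, -0.2], [0.4, 0.3]]   # W1[j][i]: input i -> hidden neuron j
W2 = [0.5, -0.6]                 # hidden neuron j -> output
lr = 0.1                         # learning rate

def train_step(x, y):
    # Forward propagation: input layer -> hidden layer -> output value y_hat.
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2))) for j in range(2)]
    y_hat = sigmoid(sum(W2[j] * h[j] for j in range(2)))

    # Compare y_hat with the actual value y (squared-error cost).
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)
    delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]

    # Backpropagation: the error flows backwards and ALL weights are adjusted.
    for j in range(2):
        W2[j] -= lr * delta_out * h[j]
        for i in range(2):
            W1[j][i] -= lr * delta_hidden[j] * x[i]
    return 0.5 * (y_hat - y) ** 2

print(train_step([1.0, 0.0], 1.0))  # cost for one row; repeating this lowers it
```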
Additional Reading: Chapter 2 - How the backpropagation algorithm works
Step-by-step walkthrough of training an ANN
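A condensed, hedged outline of how the pieces above typically fit together when training with stochastic gradient descent (not necessarily the exact walkthrough from the course); `train_step` refers to the sketch above.

```python
# 1. Initialise the weights to small numbers close to 0.
# 2. Feed one observation into the input layer, one feature per input node.
# 3. Forward-propagate to get the predicted value y_hat.
# 4. Compare y_hat with the actual value and compute the cost.
# 5. Back-propagate the error and adjust all the weights.
# 6. Repeat steps 2-5 for every row, and run many epochs over the whole dataset.
dataset = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]   # made-up rows: (inputs, actual y)
for epoch in range(1000):
    for x, y in dataset:
        train_step(x, y)
```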