At a very simple level, neurons are computational units that take electrical inputs (called "spikes") through dendrites and channel them to outputs through axons. In our model, the dendrites are like the input features $x_1, \dots, x_n$, and the output is the result of our hypothesis function.
$$z = w^T x + b \Longrightarrow a = \sigma(z) \Longrightarrow \mathcal{L}(a, y)$$
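To make this concrete, here is a minimal NumPy sketch of that single-neuron computation for one example; the feature vector x, weights w, bias b, and label y below are made-up values, used only for illustration:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical values for one example with three input features
x = np.array([0.5, -1.2, 3.0])   # input features x_1, ..., x_n (the "dendrites")
w = np.array([0.1, 0.4, -0.3])   # one weight per input
b = 0.2                          # bias

z = np.dot(w, x) + b             # z = w^T x + b
a = sigmoid(z)                   # a = sigma(z)

y = 1                            # true label for this example
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))   # cross-entropy loss L(a, y)
print(z, a, loss)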
Mathematically, for a network with one hidden layer:
For one example $x^{(i)}$: $$z^{[1] (i)} = W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$ $$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$ $$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$ $$\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{[2] (i)})\tag{4}$$ $$y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[2](i)} > 0.5 \\ 0 & \mbox{otherwise} \end{cases}\tag{5}$$
Given the predictions on all the examples, you can also compute the cost $J$ as follows: $$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right) \tag{6}$$
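The following is a rough vectorized sketch of equations (1)–(6); the layer sizes, the toy inputs X and labels Y, and the randomly initialized parameters W1, b1, W2, b2 are all hypothetical, chosen only to show the shapes involved:

import numpy as np

np.random.seed(0)
n_x, n_h, m = 2, 4, 5                            # input size, hidden units, number of examples

X = np.random.randn(n_x, m)                      # toy inputs, one column per example
Y = (np.random.rand(1, m) > 0.5).astype(float)   # toy binary labels

W1 = np.random.randn(n_h, n_x) * 0.01            # hidden-layer weights
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01              # output-layer weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Z1 = W1 @ X + b1                                 # equation (1)
A1 = np.tanh(Z1)                                 # equation (2)
Z2 = W2 @ A1 + b2                                # equation (3)
A2 = sigmoid(Z2)                                 # equation (4)
predictions = (A2 > 0.5).astype(int)             # equation (5)

# equation (6): cross-entropy cost averaged over all m examples
J = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
print(predictions, J)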
Update the parameters after each mini-batch, typically of size 64, 128, or 256. When the mini-batch size is 1, the procedure is called stochastic gradient descent. With a reasonable mini-batch size you keep the benefit of vectorization while also converging faster than full-batch gradient descent, as sketched below.
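As a sketch of that update schedule (applied here to a simple logistic-regression model on made-up data rather than the two-layer network above, with a hypothetical learning rate and batch size), the parameters are updated once per mini-batch inside each pass over the shuffled training set:

import numpy as np

np.random.seed(1)
m, n = 1000, 3
X = np.random.randn(m, n)                        # toy data, one row per example
true_w = np.array([1.0, -2.0, 0.5])
Y = (X @ true_w + 0.1 > 0).astype(float)         # toy binary labels

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.zeros(n)
b = 0.0
learning_rate = 0.1
batch_size = 64                                  # typical choices: 64, 128, 256

for epoch in range(10):
    perm = np.random.permutation(m)              # shuffle the examples each epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, Yb = X[idx], Y[idx]                  # one mini-batch
        A = sigmoid(Xb @ w + b)                  # vectorized forward pass over the batch
        dw = Xb.T @ (A - Yb) / len(idx)          # cross-entropy gradients for this batch
        db = np.mean(A - Yb)
        w -= learning_rate * dw                  # update after *each* mini-batch
        b -= learning_rate * db

print(w, b)

With batch_size = 1 the inner loop would perform one update per example (stochastic gradient descent); with batch_size = m it would reduce to ordinary full-batch gradient descent.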
In [1]:
import numpy as np
import tensorflow as tf
In [2]:
var1 = tf.constant(20, dtype=tf.float32, name='var1')  # constant node in the computation graph
var2 = tf.constant(30, dtype=tf.float32, name='var2')  # constant node in the computation graph
In [3]:
loss = tf.Variable((var1 - var2) ** 2, dtype=tf.float32, name='loss')  # variable initialized to (var1 - var2)^2
In [4]:
init = tf.global_variables_initializer()  # op that initializes all variables in the graph
with tf.Session() as sess:
    sess.run(init)           # run the initializer
    print(sess.run(loss))    # evaluate the loss node
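Since var1 and var2 are constants, the variable is initialized to $(20 - 30)^2$ and the session should print 100.0.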