At a very simple level, neurons are computational units that take electrical inputs (called "spikes") through dendrites and channel them to outputs through axons. In our model, the dendrites are like the input features $x_1, \dots, x_n$, and the output is the result of our hypothesis function.
$$z = w^T x + b \Longrightarrow a = \sigma(z) \Longrightarrow \mathcal{L}(a, y)$$
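To make this concrete, here is a minimal NumPy sketch of that single-neuron computation for one example; the feature vector x, weights w, bias b, and label y below are made-up values, used only for illustration:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical values for one example with three input features
x = np.array([0.5, -1.2, 3.0])   # input features x_1, ..., x_n (the "dendrites")
w = np.array([0.1, 0.4, -0.3])   # one weight per input
b = 0.2                          # bias

z = np.dot(w, x) + b             # z = w^T x + b
a = sigmoid(z)                   # a = sigma(z)

y = 1                            # true label for this example
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))   # cross-entropy loss L(a, y)
print(z, a, loss)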
Mathematically, for a network with one hidden layer:
For one example $x^{(i)}$: $$z^{[1] (i)} = W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$ $$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$ $$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$ $$\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{[2] (i)})\tag{4}$$ $$y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[2](i)} > 0.5 \\ 0 & \mbox{otherwise} \end{cases}\tag{5}$$
Given the predictions on all the examples, you can also compute the cost $J$ as follows: $$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right) \tag{6}$$
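The following is a rough vectorized sketch of equations (1)–(6); the layer sizes, the toy inputs X and labels Y, and the randomly initialized parameters W1, b1, W2, b2 are all hypothetical, chosen only to show the shapes involved:

import numpy as np

np.random.seed(0)
n_x, n_h, m = 2, 4, 5                            # input size, hidden units, number of examples

X = np.random.randn(n_x, m)                      # toy inputs, one column per example
Y = (np.random.rand(1, m) > 0.5).astype(float)   # toy binary labels

W1 = np.random.randn(n_h, n_x) * 0.01            # hidden-layer weights
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01              # output-layer weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Z1 = W1 @ X + b1                                 # equation (1)
A1 = np.tanh(Z1)                                 # equation (2)
Z2 = W2 @ A1 + b2                                # equation (3)
A2 = sigmoid(Z2)                                 # equation (4)
predictions = (A2 > 0.5).astype(int)             # equation (5)

# equation (6): cross-entropy cost averaged over all m examples
J = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
print(predictions, J)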
Update the parameters after each mini-batch, typically of size 64, 128, or 256. When the mini-batch size is 1, the procedure is called stochastic gradient descent. With a reasonable mini-batch size you keep the benefit of vectorization while also converging faster than full-batch gradient descent, as sketched below.
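As a sketch of that update schedule (applied here to a simple logistic-regression model on made-up data rather than the two-layer network above, with a hypothetical learning rate and batch size), the parameters are updated once per mini-batch inside each pass over the shuffled training set:

import numpy as np

np.random.seed(1)
m, n = 1000, 3
X = np.random.randn(m, n)                        # toy data, one row per example
true_w = np.array([1.0, -2.0, 0.5])
Y = (X @ true_w + 0.1 > 0).astype(float)         # toy binary labels

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.zeros(n)
b = 0.0
learning_rate = 0.1
batch_size = 64                                  # typical choices: 64, 128, 256

for epoch in range(10):
    perm = np.random.permutation(m)              # shuffle the examples each epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, Yb = X[idx], Y[idx]                  # one mini-batch
        A = sigmoid(Xb @ w + b)                  # vectorized forward pass over the batch
        dw = Xb.T @ (A - Yb) / len(idx)          # cross-entropy gradients for this batch
        db = np.mean(A - Yb)
        w -= learning_rate * dw                  # update after *each* mini-batch
        b -= learning_rate * db

print(w, b)

With batch_size = 1 the inner loop would perform one update per example (stochastic gradient descent); with batch_size = m it would reduce to ordinary full-batch gradient descent.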
In [1]:
import numpy as np
import tensorflow as tf
In [2]:
var1 = tf.constant(20, dtype=tf.float32, name='var1')  # constant node in the computation graph
var2 = tf.constant(30, dtype=tf.float32, name='var2')  # constant node in the computation graph
In [3]:
loss = tf.Variable((var1 - var2) ** 2, dtype=tf.float32, name='loss')  # variable initialized to (var1 - var2)^2
In [4]:
init = tf.global_variables_initializer()  # op that initializes all variables in the graph
with tf.Session() as sess:
    sess.run(init)           # run the initializer
    print(sess.run(loss))    # evaluate the loss node
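Since var1 and var2 are constants, the variable is initialized to $(20 - 30)^2$ and the session should print 100.0.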