##### Derivation of small change in cost function due to small change in weight or bias

Consider a three-node network with one input node, one hidden node, and one output node.

Let us first expand $(y-a)$:

$$(y-a)=(\sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)-a)$$

where $\sigma(x)=\frac{1}{1+e^{-x}}$ is the logistic sigmoid, $i_1$ is the input, $a$ is the target output, and $w_1$, $b_1$ and $w_2$, $b_2$ are the weight and bias of the hidden and output nodes respectively.
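As a concrete check, the forward pass inside the parentheses can be sketched in a few lines; the input, weight, and bias values below are arbitrary placeholders chosen for illustration:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Arbitrary example values for the input, weights, and biases.
i1, w1, b1, w2, b2 = 0.5, 0.8, 0.1, -0.4, 0.2

# Hidden-node activation, then the output-node activation y.
h = sigmoid(i1 * w1 + b1)
y = sigmoid(h * w2 + b2)
```

Both activations necessarily land in $(0, 1)$, since that is the range of the sigmoid.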

Altogether, our cost function is:

$$\tan^2(\sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)-a)$$

To see how our cost function responds to a small change in the weights or biases, we take all four partial derivatives and consider:

$$\lim_{\Delta X \to 0} \frac{\partial \tan^2(\sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)-a)}{\partial X} \Bigg\vert_{X=X+\Delta X}$$
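This limit is exactly what a finite-difference approximation computes: for a small $\Delta X$, the quotient $\bigl(C(X+\Delta X)-C(X)\bigr)/\Delta X$ estimates the partial derivative. A minimal sketch, with arbitrary placeholder values for the input, parameters, and target $a$:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def cost(i1, w1, b1, w2, b2, a):
    """Cost tan^2(y - a), where y is the network output."""
    y = sigmoid(sigmoid(i1 * w1 + b1) * w2 + b2)
    return math.tan(y - a) ** 2

# Arbitrary example values.
i1, a = 0.5, 0.3
params = {"w1": 0.8, "b1": 0.1, "w2": -0.4, "b2": 0.2}

# Forward-difference estimate of dC/dX for each of the four parameters.
delta = 1e-6
grads = {}
for name in params:
    shifted = dict(params)
    shifted[name] += delta
    grads[name] = (cost(i1, **shifted, a=a) - cost(i1, **params, a=a)) / delta
```

Each entry of `grads` approximates one of the four partial derivatives the derivation below works out analytically.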

where $X \in \{w_1, b_1, w_2, b_2\}$.

Consider $X=w_1$:

$$\frac{\partial \tan^2(\sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)-a)}{\partial w_1} \Bigg\vert_{w_1=w_1+\Delta w_1}=\frac{2 \sin(\sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)-a)}{\cos^3(\sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)-a)} \frac{\partial \left(\sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)-a\right)}{\partial w_1} \Bigg\vert_{w_1=w_1+\Delta w_1}$$
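As a sanity check on this factorisation, we can multiply the outer factor $\frac{2\sin(y-a)}{\cos^3(y-a)}$ by a numerical estimate of $\partial y/\partial w_1$ and compare the product against a direct numerical estimate of the full derivative. The values below are arbitrary placeholders:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def output(i1, w1, b1, w2, b2):
    """Network output y = sigma(sigma(i1*w1 + b1)*w2 + b2)."""
    return sigmoid(sigmoid(i1 * w1 + b1) * w2 + b2)

# Arbitrary example values for the input, parameters, and target.
i1, w1, b1, w2, b2, a = 0.5, 0.8, 0.1, -0.4, 0.2, 0.3
delta = 1e-7

y = output(i1, w1, b1, w2, b2)
# Outer factor from differentiating tan^2: 2*sin(u)/cos(u)^3 with u = y - a.
outer = 2.0 * math.sin(y - a) / math.cos(y - a) ** 3

# Forward-difference estimates of dy/dw1 and of the full dC/dw1.
dy_dw1 = (output(i1, w1 + delta, b1, w2, b2) - y) / delta
dC_dw1 = (math.tan(output(i1, w1 + delta, b1, w2, b2) - a) ** 2
          - math.tan(y - a) ** 2) / delta
```

The chain rule predicts `dC_dw1` equals `outer * dy_dw1` up to finite-difference error.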

As we can see already, the complexity of these networks can become unmanageable quickly...onward!

$$\frac{\partial \left(\sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)-a\right)}{\partial w_1} \Bigg\vert_{w_1=w_1+\Delta w_1}=\left(\frac{\partial \sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)}{\partial w_1}-\frac{\partial a}{\partial w_1}\right) \Bigg\vert_{w_1=w_1+\Delta w_1}=\frac{\partial \sigma(\sigma(i_1 w_1+b_1) w_2 + b_2)}{\partial w_1} \Bigg\vert_{w_1=w_1+\Delta w_1}$$

where the second term vanishes because the target $a$ does not depend on $w_1$.