Notation:
$(x,y)$ is one training example, and $(x^{(i)}, y^{(i)})$ refers to the $i$-th training example.
Hypothesis function for univariate data: $h_\mathbf{w}(x) = w_0 + w_1 x$, with parameters $\mathbf{w}=(w_0, w_1)$.
Given $n$ data points, the goal is to minimize the cost given below:
$$ \begin{align} J(\mathbf{w}) & = \frac{1}{2 n} \displaystyle \sum_{i=1}^{n}\left(h_\mathbf{w}(x^{(i)}) - y^{(i)}\right)^2 \\ & = \frac{1}{2 n} \displaystyle \sum_{i=1}^{n}\left(w_0 + w_1 x^{(i)} - y^{(i)}\right)^2 \end{align} $$
Gradient descent moves in the direction of the negative gradient (first derivative) of the cost function:
$$\nabla J(\mathbf{w}) = \left(\frac{\partial J(\mathbf{w})}{\partial w_0}, \frac{\partial J(\mathbf{w})}{\partial w_1} \right)$$
where the partial derivatives are given as follows:
$$\begin{align} \frac{\partial J(\mathbf{w})}{\partial w_0} & = \frac{1}{2 n} \displaystyle \sum_{i=1}^{n} 2\left(w_0 + w_1 x^{(i)} - y^{(i)}\right) = \frac{1}{n} \displaystyle \sum_{i=1}^{n} \left(w_0 + w_1 x^{(i)} - y^{(i)}\right) \\ \frac{\partial J(\mathbf{w})}{\partial w_1} & = \frac{1}{2 n} \displaystyle \sum_{i=1}^{n} 2\, x^{(i)}\left(w_0 + w_1 x^{(i)} - y^{(i)}\right) = \frac{1}{n} \displaystyle \sum_{i=1}^{n} x^{(i)} \left(w_0 + w_1 x^{(i)} - y^{(i)}\right) \end{align}$$
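The cell below gives a minimal NumPy sketch of batch gradient descent using these two partial derivatives. The data `x`, `y`, the learning rate `alpha`, and the iteration count `n_iters` are illustrative assumptions, not values fixed by the text above.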
In [ ]:
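```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=1000):
    """Fit h_w(x) = w0 + w1 * x by batch gradient descent on J(w).

    alpha and n_iters are illustrative choices, not prescribed above.
    """
    n = len(x)
    w0, w1 = 0.0, 0.0                       # initialize parameters
    for _ in range(n_iters):
        residual = w0 + w1 * x - y          # h_w(x^(i)) - y^(i), vectorized
        grad_w0 = residual.sum() / n        # dJ/dw0
        grad_w1 = (x * residual).sum() / n  # dJ/dw1
        w0 -= alpha * grad_w0               # step against the gradient
        w1 -= alpha * grad_w1
    return w0, w1

# Example usage on synthetic data (assumed, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, size=50)
w0, w1 = gradient_descent(x, y)
print(w0, w1)  # should be close to (2.0, 3.0)
```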