Overflow and Underflow Computation

Optimization: Gradient Descent

Let $f:\mathbb{R}\to\mathbb{R}$:

$$f(x+\epsilon) \approx f(x) + \epsilon f'(x)$$

$\displaystyle \left\{\begin{array}{lr}\text{If }f'(x)>0:& f(x+\epsilon)>f(x)\\\text{If }f'(x)<0:& f(x+\epsilon)<f(x)\end{array}\right.$

The short form of the above: for small enough $\epsilon$, $f\big(x + \epsilon\,\operatorname{sign}(f'(x))\big) > f(x)$, and equivalently $f\big(x - \epsilon\,\operatorname{sign}(f'(x))\big) < f(x)$. So stepping against the sign of the derivative decreases $f$; this is gradient descent.
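A minimal sketch of this idea in one dimension (the objective $f(x) = (x-3)^2$, the starting point, and the step size below are illustrative choices, not from the notes):

```python
# Minimal 1-D gradient descent sketch (all numbers are illustrative).
def f(x):
    return (x - 3.0) ** 2

def f_prime(x):
    return 2.0 * (x - 3.0)

x = -4.0        # arbitrary starting point
eps = 0.1       # small step size
for _ in range(200):
    x -= eps * f_prime(x)   # step against the derivative, so f decreases

print(x)        # approaches the minimizer x* = 3
```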

First and Second Derivatives

Let $f$ be a multi-input, single-output function, i.e. $f:\mathbb{R}^n\to\mathbb{R}$. The input to $f$ is a vector $\mathbf{x}$, and the partial derivative of $f$ with respect to the $i$th input is $\frac{\partial f(\mathbf{x})}{\partial x_i}$.

Gradient

The vector containing all partial derivatives of $f$ is denoted by $\nabla_\mathbf{x} f(\mathbf{x}) = \left(\frac{\partial f(\mathbf{x})}{\partial x_1}, \frac{\partial f(\mathbf{x})}{\partial x_2}, \dots, \frac{\partial f(\mathbf{x})}{\partial x_n}\right)$
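As a quick numerical check (the function $f(\mathbf{x}) = \sum_i x_i^2$ and the finite-difference helper below are illustrative assumptions), the gradient of $f(\mathbf{x}) = \sum_i x_i^2$ is $2\mathbf{x}$, which a central-difference estimate reproduces:

```python
import numpy as np

def f(x):
    return np.sum(x ** 2)          # f(x) = sum_i x_i^2, so grad f(x) = 2x

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of each partial derivative df/dx_i."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

x = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(f, x))    # ~ [ 2., -4.,  1.]
print(2 * x)                       # analytic gradient for comparison
```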

Jacobian Matrix

If $\mathbf{f}$ is a multi-input, multi-output function, i.e. $\mathbf{f}:\mathbb{R}^n\to\mathbb{R}^m$ (so $\mathbf{f}(\mathbf{x}) = \left(f_1(\mathbf{x}), f_2(\mathbf{x}), \dots, f_m(\mathbf{x})\right)$), then the Jacobian of $\mathbf{f}$, denoted by $J \in \mathbb{R}^{m\times n}$, is the matrix whose $(i,j)$th element is the partial derivative of the $i$th output with respect to the $j$th input, $J_{i,j} = \frac{\partial f_i(\mathbf{x})}{\partial x_j}$:

$$J=\left[\begin{array}{cccc}\frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\dots&\frac{\partial f_1}{\partial x_n}\\\frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\dots&\frac{\partial f_2}{\partial x_n}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial f_m}{\partial x_1}&\frac{\partial f_m}{\partial x_2}&\dots&\frac{\partial f_m}{\partial x_n}\end{array}\right]$$
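As a sketch (the map $\mathbf{f}:\mathbb{R}^2\to\mathbb{R}^3$ below is an arbitrary example), the Jacobian can be estimated column by column with finite differences and compared against the analytic matrix:

```python
import numpy as np

def F(x):
    # Example map f : R^2 -> R^3 (illustrative choice)
    return np.array([x[0] * x[1],
                     np.sin(x[0]),
                     x[1] ** 2])

def numerical_jacobian(F, x, h=1e-6):
    """J[i, j] ~ d f_i / d x_j via central differences."""
    m = F(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (F(x + e) - F(x - e)) / (2 * h)
    return J

x = np.array([0.7, -1.3])
print(numerical_jacobian(F, x))
# Analytic Jacobian for comparison:
# [[ x1,      x0   ],
#  [ cos(x0), 0    ],
#  [ 0,       2*x1 ]]
```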

Hessian Matrix

The Hessian of $f$ is the matrix of second partial derivatives, $\mathbf{H}_{i,j} = \frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j}$.

Important properties of the Hessian

  • The second derivative of $f$ in a unit direction $v$ is $v^T \mathbf{H}\, v$
    • If $v$ is one of the (unit) eigenvectors of $\mathbf{H}$: $v^T \mathbf{H}\, v$ equals the corresponding eigenvalue
    • If $v$ is not an eigenvector: $v^T \mathbf{H}\, v$ is a weighted average of all eigenvalues, with weights in $[0, 1]$, where eigenvectors closer in direction to $v$ receive larger weights

Build intuition: what does it mean if $\mathbf{H}$ is positive definite (i.e. all eigenvalues are positive)?
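To build that intuition numerically, here is a sketch on the quadratic $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{H} \mathbf{x}$ with an arbitrarily chosen matrix (illustrative assumption): along a unit eigenvector, $v^T\mathbf{H}v$ equals the eigenvalue, and if all eigenvalues are positive the curvature is positive in every direction, so a critical point is a local minimum.

```python
import numpy as np

# Hessian of the quadratic f(x) = 0.5 * x^T H x is just H (illustrative example).
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(H)      # symmetric eigendecomposition
print("eigenvalues:", eigvals)            # all positive -> H is positive definite

# Directional second derivative along a unit eigenvector equals its eigenvalue.
v = eigvecs[:, 0]
print(v @ H @ v, "vs", eigvals[0])

# Along an arbitrary unit direction it lies between the smallest and largest eigenvalue.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
print(eigvals.min() <= u @ H @ u <= eigvals.max())   # True
```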

Taylor Series Expansion

$$f(\mathbf{x}) \approx f(\mathbf{x}^{(0)}) + (\mathbf{x}-\mathbf{x}^{(0)})^T\,\nabla_\mathbf{x}f(\mathbf{x}^{(0)}) + \frac{1}{2} (\mathbf{x}-\mathbf{x}^{(0)})^T\mathbf{H}(\mathbf{x}-\mathbf{x}^{(0)})$$

where $\mathbf{H}$ is the Hessian of $f$ evaluated at $\mathbf{x}^{(0)}$.

Optimal step size along the negative gradient: with $g = \nabla_\mathbf{x} f(\mathbf{x}^{(0)})$, the step $\mathbf{x} = \mathbf{x}^{(0)} - \epsilon\, g$ decreases the second-order model the most when $\displaystyle \epsilon^* = \frac{g^T g}{g^T \mathbf{H} g}$ (assuming $g^T \mathbf{H} g > 0$).
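For a quadratic objective this can be checked directly. A small sketch, assuming the quadratic $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{H}\mathbf{x}$ and a starting point chosen for illustration: among step sizes along $-g$, $\epsilon^*$ gives the smallest value of $f$.

```python
import numpy as np

# Quadratic example f(x) = 0.5 * x^T H x, so grad f(x) = H x (illustrative choice).
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
x0 = np.array([2.0, -1.0])
g = H @ x0                                  # gradient at x0

eps_star = (g @ g) / (g @ H @ g)            # optimal step size along -g

def f(x):
    return 0.5 * x @ H @ x

# f along the ray x0 - eps * g: eps_star should give the smallest value of the three.
for eps in [0.5 * eps_star, eps_star, 1.5 * eps_star]:
    print(eps, f(x0 - eps * g))
```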

Gradient Descent Failure Case

  • The second derivative differs strongly across directions (an ill-conditioned Hessian), so the gradient direction alone is a poor guide (see the sketch after this list)
  • The step size must trade off overshooting in the most curved direction against making very slow progress in the flat ones
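A small demonstration of this failure mode, using an ill-conditioned quadratic and step size chosen purely for illustration: any step size small enough not to diverge in the steep direction crawls in the flat one.

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T H x: curvature 100 along x1, 1 along x0.
H = np.diag([1.0, 100.0])
grad = lambda x: H @ x

x = np.array([10.0, 1.0])
eps = 0.005                      # must stay below 2/100 to avoid diverging along x1
for _ in range(100):
    x = x - eps * grad(x)

print(x)   # the steep coordinate has converged; the flat one has barely moved
```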

Newton's Method


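Setting the gradient of the second-order Taylor model to zero gives the Newton update $\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \mathbf{H}^{-1}\nabla_\mathbf{x} f(\mathbf{x}^{(0)})$, which jumps to the minimum of a positive-definite quadratic in a single step. A minimal sketch on the same kind of ill-conditioned quadratic as above (all numbers are illustrative assumptions):

```python
import numpy as np

# Newton's method on the quadratic f(x) = 0.5 * x^T H x (gradient H x, Hessian H).
H = np.diag([1.0, 100.0])
x0 = np.array([10.0, 1.0])

g = H @ x0
x1 = x0 - np.linalg.solve(H, g)    # Newton step: x0 - H^{-1} g
print(x1)                          # reaches the minimizer [0, 0] in one step
```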