Vector: $x = \left[x_1 \ x_2 \ \dots \ x_n\right]^T = \left[\begin{array}{c}x_1\\x_2\\\vdots\\x_n \end{array} \right]$
Squared Euclidean norm: $x^T x = \|x\|_2^2$
Norm of a matrix (Frobenius norm): $\|X\|_F = \sqrt{\mathrm{trace}(X^T X)}$
Trace of a matrix: $\displaystyle \mathrm{trace}(X) = \sum_i X_{i,i}$
Inverse of a matrix: $X^{-1} X = X X^{-1} = I$
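A minimal NumPy sketch checking these identities on an arbitrary example matrix (not from the notes):

```python
import numpy as np

X = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# Frobenius norm equals sqrt(trace(X^T X))
assert np.isclose(np.sqrt(np.trace(X.T @ X)), np.linalg.norm(X, 'fro'))

# Squared Euclidean norm of a vector equals x^T x
x = np.array([1.0, -2.0, 3.0])
assert np.isclose(x @ x, np.linalg.norm(x, 2) ** 2)

# Inverse: X^{-1} X = X X^{-1} = I
X_inv = np.linalg.inv(X)
assert np.allclose(X_inv @ X, np.eye(2))
assert np.allclose(X @ X_inv, np.eye(2))
```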
Determinant
Set of linearly dependent vectors $x_t$: $\exists w, t^*$ such that $x_{t^*} = \sum_{t \ne t^*} w_t x_t$
definition: the rank of a matrix is the number of linearly independent columns (equivalently, rows); it is at most $\min(\#\text{columns}, \#\text{rows})$
definition: range of a matrix
$$\mathcal{R}(X) = \left\{x \in \mathbb{R}^n \mid \exists w \text{ such that } x = \sum_j w_j X_{\cdot,j} \right\}$$
definition: Nullspace of a matrix: $\left\{ x \in \mathbb{R}^n \mid Xx = 0\right\}$
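A small NumPy illustration of rank and null space on a hand-picked rank-deficient matrix (the example is ad hoc, not from the notes):

```python
import numpy as np

# Third column is the sum of the first two, so the columns are linearly dependent
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

print(np.linalg.matrix_rank(X))   # 2, less than min(#columns, #rows) = 3

# A vector in the null space: X x = 0
x_null = np.array([1.0, 1.0, -1.0])
print(X @ x_null)                  # [0. 0. 0.]
```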
For the SVD $X = U \Sigma V^T$: the columns of $U$ span the column space of $X$ and the columns of $V$ span the row space.
$U^T U = I$ $V^TV = I$
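A quick numerical check of these orthogonality conditions, assuming $U$, $\Sigma$, $V^T$ come from np.linalg.svd on a random matrix:

```python
import numpy as np

X = np.random.randn(4, 3)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

assert np.allclose(U.T @ U, np.eye(3))    # columns of U are orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(3))  # columns of V (rows of Vt) are orthonormal
assert np.allclose((U * s) @ Vt, X)       # X = U Σ V^T
```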
The null space of $A$ is the set of all vectors that $A$ maps to zero: $\{x \mid A x = 0\}$.
$$\mathcal{N}(X) = \left\{x\in \mathbb{R}^n \mid Xx = 0\right\}$$
Eigenvalues and eigenvectors (square matrix):
$$\left\{ \lambda_i, u_i \mid Xu_i = \lambda_i u_i \text{ and } u_i^T u_j = 1_{i=j}\right\}$$
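A small NumPy check of the eigenvector definition; the matrix is an arbitrary symmetric example so that the eigenvectors are orthonormal:

```python
import numpy as np

X = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # symmetric, so np.linalg.eigh applies

lams, U = np.linalg.eigh(X)          # eigenvalues and orthonormal eigenvectors

for i in range(2):
    assert np.allclose(X @ U[:, i], lams[i] * U[:, i])   # X u_i = λ_i u_i
assert np.allclose(U.T @ U, np.eye(2))                    # u_i^T u_j = 1_{i=j}
```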
Properties
derivative: $\displaystyle \frac{d}{dx}f(x) = \lim_{\Delta\to 0} \frac{f(x + \Delta) - f(x)}{\Delta}$
partial derivative
$$\frac{\partial}{\partial x_i} f(x_1, \dots, x_n) = \lim_{\Delta\to 0} \frac{f(x_1, \dots, x_i + \Delta, \dots, x_n) - f(x_1, \dots, x_n)}{\Delta}$$
Example: $f(x,y) = \frac{x^2}{y}$
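Carrying the example through, the partial derivatives are
$$\frac{\partial f}{\partial x} = \frac{2x}{y}, \qquad \frac{\partial f}{\partial y} = -\frac{x^2}{y^2}$$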
Probability space: the triplet $\left( \Omega, \mathcal{F}, P\right)$, where $\Omega$ is the sample space, $\mathcal{F}$ the set of events, and $P$ the probability measure
properties of probabilities
conditional probability: $P(x_1|x_2) = \dfrac{P(x_1, x_2)}{P(x_2)}$
$$\sum_{x_1} P(X_1 = x_1|X_2 = c) = 1$$
Given two random variables $x_1$ and $x_2$ and their marginal probabilities $P(x_1), P(x_2)$, can we recover their joint distribution $P(x_1,x_2)$?
Answer: No. The marginals do not determine the joint; we also need the conditional distribution $P(x_1|x_2)$, since $P(x_1, x_2) = P(x_1|x_2) P(x_2)$.
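A small NumPy illustration of this point: two joint tables (chosen ad hoc here) share the same marginals but have different conditionals.

```python
import numpy as np

# Two joint distributions over binary (x1, x2) with identical marginals
joint_indep = np.array([[0.25, 0.25],
                        [0.25, 0.25]])   # x1 and x2 independent
joint_dep   = np.array([[0.40, 0.10],
                        [0.10, 0.40]])   # x1 and x2 correlated

for J in (joint_indep, joint_dep):
    print(J.sum(axis=1), J.sum(axis=0))  # same marginals: [0.5 0.5] [0.5 0.5]

# The conditional P(x1 | x2 = 0) differs, and each conditional sums to 1
for J in (joint_indep, joint_dep):
    cond = J[:, 0] / J[:, 0].sum()
    print(cond, cond.sum())
```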
generative models
(Q) Given a discrete probability distribution, how do we draw samples from it?
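A minimal sketch of one answer, using the inverse-CDF trick on an arbitrary distribution (np.random.choice with the `p` argument does the same thing directly):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])           # discrete distribution over {0, 1, 2}
rng = np.random.default_rng(0)

def sample(p, rng):
    u = rng.uniform()                    # u ~ Uniform(0, 1)
    return int(np.searchsorted(np.cumsum(p), u))  # first index whose CDF >= u

draws = [sample(p, rng) for _ in range(10000)]
print(np.bincount(draws) / len(draws))   # empirical frequencies ≈ p
```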
(Q) What about the continuous case? We have a density $\mathbf{p}(x)$, where $x$ is a continuous variable.
Importance sampling: with samples $x_k \sim q$,
$$\mathbf{E}_{\mathbf{p}}[f(x)] = \int_x f(x) \frac{\mathbf{p}(x)}{q(x)} q(x)\, dx \approx \frac{1}{K} \sum_k f(x_k) \frac{\mathbf{p}(x_k)}{q(x_k)}$$
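A hedged NumPy sketch of this estimate; the target $\mathbf{p}$, proposal $q$, and $f$ below are illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 100_000

# Target p = N(0, 1), proposal q = N(0, 2^2), f(x) = x^2 (true value E_p[x^2] = 1)
def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = rng.normal(0.0, 2.0, size=K)                        # samples x_k ~ q
w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)   # weights p(x_k) / q(x_k)
print(np.mean(x ** 2 * w))                              # ≈ 1.0
```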
In Markov chain Monte Carlo (MCMC), the selected points are not independent: each sample is drawn conditioned on the previous one
$$x^{(1)} \xrightarrow{T(x'\leftarrow x)} x^{(2)} \rightarrow x^{(3)} \rightarrow \dots \rightarrow x^{(K)}$$
$T(x' \leftarrow x)$ is a transition operator that must satisfy certain properties (in particular, it must leave the target distribution invariant)
Gibbs sampling is an MCMC method which uses the following transition operator $T(x' \leftarrow x)$: pick a variable index $i$; sample $x'_i \sim p(x_i \mid x_{-i})$ with all other variables held fixed ($x'_j = x_j$ for $j \ne i$); return $x'$
Often, we simply cycle through the variables in random order
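A minimal Gibbs-sampling sketch for a bivariate Gaussian with correlation $\rho$, where each conditional is itself Gaussian; the target distribution is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                      # target: 2D standard Gaussian with correlation rho
K = 50_000

x = np.zeros(2)
samples = np.empty((K, 2))
for k in range(K):
    # cycle through the variables, resampling each from its conditional
    x[0] = rng.normal(rho * x[1], np.sqrt(1 - rho ** 2))  # x1 | x2 ~ N(rho*x2, 1-rho^2)
    x[1] = rng.normal(rho * x[0], np.sqrt(1 - rho ** 2))  # x2 | x1 ~ N(rho*x1, 1-rho^2)
    samples[k] = x

print(np.corrcoef(samples.T)[0, 1])   # ≈ rho
```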
$$H(X) = \sum_i p_i \log_2 (\frac{1}{p_i})$$
Since $\sum_i p_i = \sum_i q_i = 1$, $\sqrt{p}$ and $\sqrt{q}$ are unit vectors. Hellinger distance:
$$D_H(p,q) = \frac{1}{\sqrt{2}}\|\sqrt{p} - \sqrt{q}\|_2$$
$$D_{KL}(p,q) = \sum_i p_i \log_2 \frac{p_i}{q_i}$$
Jensen–Shannon divergence:
$$D_{JS}(p,q) = \frac{1}{2} D_{KL}(p,r) + \frac{1}{2} D_{KL}(q,r)$$
where $r=\frac{p+q}{2}$ is the average distribution
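A small NumPy sketch computing these quantities for two arbitrary discrete distributions (the vectors `p` and `q` are made up for illustration):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

entropy   = np.sum(p * np.log2(1 / p))                          # H(p)
hellinger = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)  # D_H(p, q)
kl        = np.sum(p * np.log2(p / q))                          # D_KL(p, q)
r = (p + q) / 2
js = 0.5 * np.sum(p * np.log2(p / r)) + 0.5 * np.sum(q * np.log2(q / r))  # D_JS(p, q)

print(entropy, hellinger, kl, js)
```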