We are all familiar with linear equations and their solutions from high school mathematics. The different types of linear equations and their solutions play a crucial role in various engineering applications, and they are often posed in a stochastic framework.
Consider the following simple example, $$ h_{11}x_1 + h_{12}x_2 = y_1 \,\,\,\,\,\,\,\,\,\, ... (1) $$ $$ h_{21}x_1 + h_{22}x_2 = y_2 \,\,\,\,\,\,\,\,\,\, ... (2) $$
where the $h_{ij}$s, $\forall \,\, i, j \in \{1,2\}$, and the $y_i$s, $i \in \{1, 2\}$, are known. The goal is to determine the values of $x_1$ and $x_2$.
Using the method of elimination, we would solve this problem as follows:
Multiply Eq. 1 by $\frac{h_{21}}{h_{11}}$ and subtract it from Eq. 2. This will result in the following equation. $$ \left(h_{22} - h_{12}\frac{h_{21}}{h_{11}}\right)x_2 = y_2 - \frac{h_{21}}{h_{11}}y_1 $$ $$ \implies x_2 = \frac{1}{h_{11}h_{22} - h_{12}h_{21}} \left(-h_{21}y_1 + h_{11}y_2 \right) $$
Substitute $x_2$ into Eq. 1 (or Eq. 2) to obtain $x_1$. $$ \implies x_1 = \frac{1}{h_{11}h_{22} - h_{12}h_{21}} \left(h_{22}y_1 - h_{12}y_2 \right) $$
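To make the closed-form solution concrete, here is a minimal sketch that evaluates the expressions for $x_1$ and $x_2$ and cross-checks them against NumPy's general-purpose solver. The coefficient and right-hand-side values are arbitrary, chosen only for illustration.

import numpy as np

# Arbitrary illustrative coefficients and right-hand side.
h11, h12, h21, h22 = 2.0, 1.0, 1.0, 3.0
y1, y2 = 5.0, 10.0

det = h11 * h22 - h12 * h21  # must be nonzero for the solution to exist
x1 = (h22 * y1 - h12 * y2) / det
x2 = (-h21 * y1 + h11 * y2) / det
print(x1, x2)  # 1.0 3.0

# Cross-check against NumPy's general-purpose solver.
H = np.array([[h11, h12], [h21, h22]])
y = np.array([y1, y2])
print(np.linalg.solve(H, y))  # [1. 3.]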
The above solution exists provided $h_{11}h_{22} - h_{12}h_{21} \neq 0$. As the size of the problem increases, the expressions for the solution become quite complicated. However, matrix notation will not only keep the expressions simple, but will also provide additional insight into the problem.
A linear system of equations with known coefficients $h_{ij}$ and $y_i$s, $\forall \,\, i, j \in \{1, 2, \ldots, N\}$, can be written as follows:
$$ \mathbf{H}\mathbf{x} = \mathbf{y} \,\,\,\,\,\,\,\,\,\, ... (3) $$
where $\mathbf{H}$ is an $N \times N$ matrix with the coefficients $h_{ij}$, $\mathbf{y} = \left[y_1, y_2, \cdots, y_N\right]^{T}$ and $\mathbf{x} = \left[x_1, x_2, \cdots, x_N\right]^{T}$. The solution to Eq. (3) is given by
$$ \mathbf{x} = \mathbf{H}^{-1}\mathbf{y} \,\,\,\,\,\,\,\,\,\, ... (4) $$
provided $\mathbf{H}^{-1}$ exists, i.e., $\det \mathbf{H} \neq 0$. When the inverse of $\mathbf{H}$ exists, Eq. (3) has a unique solution, given by Eq. (4).
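The following is a minimal NumPy sketch of Eq. (4), again with arbitrary illustrative numbers. Note that in numerical practice one rarely forms $\mathbf{H}^{-1}$ explicitly; np.linalg.solve is preferred as it is faster and more numerically stable.

import numpy as np

# Arbitrary illustrative 3x3 system with det(H) != 0.
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])

x_inv = np.linalg.inv(H) @ y     # literal form of Eq. (4)
x_solve = np.linalg.solve(H, y)  # preferred: avoids the explicit inverse
print(np.allclose(H @ x_solve, y))  # True: Eq. (3) is satisfied
print(np.allclose(x_inv, x_solve))  # True: both routes agree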
An interesting variant of the above problem is when there are more equations than unknowns, i.e. $\mathbf{H}$ is a tall rectangular $M \times N$ matrix with $M > N$. This is called an overdetermined problem, and there exists a solution only when the system of equations is consistent.
When an overdetermined system is not consistent, there exists no $\mathbf{x}$ which satisfies Eq. (3). Such inconsistent overdetermined problems have been studied since as early as the nineteenth century. However, it is possible to find a solution $\mathbf{\hat{x}}$ for which $\mathbf{H}\mathbf{\hat{x}}$ is a close approximation of $\mathbf{y}$, i.e., $$ \mathbf{H}\mathbf{\hat{x}} \approx \mathbf{y} \,\,\,\,\,\,\,\,\,\, ... (5) $$
The error in the approximation $\mathbf{v} = \mathbf{y} - \mathbf{H}\mathbf{\hat{x}}$ is called the residual, and thus we have,
$$\mathbf{y} = \mathbf{H}\mathbf{\hat{x}} + \mathbf{v} \,\,\,\,\,\,\,\,\,\, ... (6) $$
How can we find $\mathbf{\hat{x}}$ so that the "size" of the residual is kept as small as possible? The "size" of the residual can be formally defined in infinitely many ways, but a standard choice is the squared error of the residual, which can also be thought of as the square of the length of the residual vector.
Thus, we can now formulate an optimization problem where our goal is to choose $\mathbf{\hat{x}}$ such that the squared error of the residual is minimized, i.e.,
$$ \mathbf{\hat{x}} = \arg \min_{\mathbf{x}} \left\lVert\mathbf{y} - \mathbf{H}\mathbf{x}\right\rVert^2 $$
where $\left\lVert\mathbf{v}\right\rVert^2 = \sum_{i=1}^{M} v_i^2$ and the $v_i$s are the components of the residual vector $\mathbf{v}$.
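As a minimal sketch of this least-squares formulation, the following uses np.linalg.lstsq, which computes the $\mathbf{\hat{x}}$ minimizing $\left\lVert\mathbf{y} - \mathbf{H}\mathbf{x}\right\rVert^2$, on a small, arbitrary overdetermined system.

import numpy as np

# Arbitrary illustrative overdetermined system:
# M = 4 equations, N = 2 unknowns, inconsistent on purpose.
H = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9])

# lstsq returns the x_hat that minimizes ||y - H x||^2.
x_hat, _, _, _ = np.linalg.lstsq(H, y, rcond=None)

v = y - H @ x_hat    # the residual vector of Eq. (6)
print(x_hat)         # least-squares estimate
print(np.sum(v**2))  # minimized squared error ||v||^2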