4. Inner Product Spaces

4.1. Definitions and Examples

4.1.1. Two vectors $v,w \in \mathbb{R}^2$ are orthogonal if the angle between $v$ and $w$

$$\theta = \arccos\left( \frac{v^1w^1 + v^2w^2}{\|v\|_2 \|w\|_2} \right)$$

is $\pi/2$. This is the case if and only if $v^1w^1 + v^2w^2 = 0$.

Similarly, we define the dot product of two vectors $v,w \in \mathbb{R}^n$ as the sum

$$v \cdot w = v^1w^1 + \cdots + v^nw^n,$$

and declare $v$ and $w$ to be orthogonal in case their dot product is 0.

4.1.2. To speak of orthogonality in a broader context, we generalize the dot product. An inner product on a vector space $V$ over $\mathbb{F}$ is a function $\langle \cdot, \cdot \rangle: V \times V \to \mathbb{F}$ that satisfies the following properties:

  • linearity in the first coordinate: $\langle av + bw, x \rangle = a \langle v, x \rangle + b \langle w, x \rangle$ for all $v,w,x \in V$ and $a,b \in \mathbb{F}$;
  • Hermitian property: $\langle w, v \rangle = \overline{\langle v, w \rangle}$, the complex conjugate of $\langle v, w \rangle$, for all $v,w \in V$;
  • positive semidefiniteness: $\langle v, v \rangle \geq 0$ for all $v \in V$ (note that the Hermitian property forces $\langle v,v \rangle$ to be real);
  • definiteness: $\langle v,v \rangle = 0$ if and only if $v = 0_V$.

An inner product space is a vector space equipped with an inner product.
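To make the axioms concrete, here is a minimal NumPy sketch, spot-checking the three properties for the standard inner product on $\mathbb{C}^3$; the helper `ip` and the test vectors are our own choices, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def ip(v, w):
    return np.vdot(w, v)  # np.vdot conjugates its first argument, so this is <v, w>

v, w, x = (rng.standard_normal(3) + 1j * rng.standard_normal(3) for _ in range(3))
a, b = 2.0 - 1.0j, -0.5 + 3.0j

assert np.isclose(ip(a * v + b * w, x), a * ip(v, x) + b * ip(w, x))  # linearity
assert np.isclose(ip(w, v), np.conj(ip(v, w)))                        # Hermitian property
assert ip(v, v).real > 0 and np.isclose(ip(v, v).imag, 0.0)           # (semi)definiteness
```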

4.1.3. An inner product space is a special kind of normed linear space. Indeed, for each inner product $\langle \cdot, \cdot \rangle$ on a vector space $V$, the function $\|\cdot\|:V \to [0,\infty)$ defined by the formula

$$\|v\| = \sqrt{\langle v,v \rangle}$$

is a norm (§3.2.1) satisfying the parallelogram law

$$\|v+w\|^2 + \|v-w\|^2 = 2(\|v\|^2 + \|w\|^2).$$

Conversely, every norm $\|\cdot\|$ on $V$ that satisfies the parallelogram law induces an inner product (§4.1.2), given by the polarization identity

$$\langle v,w \rangle = \frac{1}{4} \left( \|v + w\|^2 - \|v-w\|^2 + i \|v + iw\|^2 - i \|v - iw\|^2 \right)$$

when $\mathbb{F} = \mathbb{C}$; when $\mathbb{F} = \mathbb{R}$, the identity reads $\langle v,w \rangle = \frac{1}{4}\left( \|v+w\|^2 - \|v-w\|^2 \right)$.

This, in particular, shows that every inner product space is a metric space (§3.2.2). If the associated metric is complete, we say that the inner product space is a Hilbert space.
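As a numerical illustration (a sketch; all names are ours), the following NumPy snippet checks the parallelogram law on $\mathbb{C}^4$ and verifies that the polarization identity recovers the inner product.

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = rng.standard_normal(4) + 1j * rng.standard_normal(4)

ip = lambda x, y: np.vdot(y, x)          # <x, y> = sum_i x_i * conj(y_i)
norm = lambda x: np.sqrt(ip(x, x).real)  # ||x|| = sqrt(<x, x>)

lhs = norm(v + w)**2 + norm(v - w)**2
rhs = 2 * (norm(v)**2 + norm(w)**2)
assert np.isclose(lhs, rhs)              # parallelogram law

polar = (norm(v + w)**2 - norm(v - w)**2
         + 1j * norm(v + 1j * w)**2 - 1j * norm(v - 1j * w)**2) / 4
assert np.isclose(polar, ip(v, w))       # polarization identity
```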

4.1.4. The quintessential example of a Hilbert space is the $L_2$-space, whether it be the Euclidean version (§3.3)

$$l_2^n = \left\{ v \in \mathbb{R}^n : \|v\|_2 < \infty \right\}$$

with inner product

$$\langle v ,w \rangle = \sum_{i=1}^n v^i \overline{w^i},$$

the sequence space version (§3.4)

$$l_2 = \left\{ (a_n)_{n=0}^\infty : \|(a_n)_n\|_2 < \infty \right\}$$

with inner product

$$\langle (a_n)_n, (b_n)_n \rangle = \sum_{n=0}^\infty a_n \overline{b_n},$$

or the function space version (§3.5)

$$L_2(\mathbb{R}) = \left\{ f:\mathbb{R} \to \mathbb{R} : \|f\|_2 < \infty \right\}$$

with inner product

$$\langle f, g \rangle = \int_{-\infty}^\infty f(t) \overline{g(t)} \, dt.$$

In all three cases, we observe that

$$|\langle f, g \rangle| \leq \langle |f|, |g| \rangle = \|fg\|_1 \leq \|f\|_2\|g\|_2,$$

where the last inequality is a consequence of Hölder's inequality (§3.3.2, §3.4.2, §3.5.2). It turns out this inequality holds on all inner product spaces, in the following form:

$$|\langle v, w \rangle| \leq \|v\|\|w\|.$$

This is the most important tool in the theory of inner product spaces, known as the Schwarz inequality. The reader might be familiar with the $l_2^n$-space version of it, the Cauchy–Schwarz inequality.
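A quick randomized NumPy check of the Schwarz inequality on $\mathbb{C}^5$; the dimension and sample count are arbitrary choices, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    v = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    w = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    # |<v, w>| <= ||v|| ||w||, with a small tolerance for floating point
    assert abs(np.vdot(w, v)) <= np.linalg.norm(v) * np.linalg.norm(w) + 1e-12
```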

4.2. Orthogonality

4.2.1. On an inner product space $V$, we say two vectors $v$ and $w$ are orthogonal if $\langle v, w \rangle = 0$. We write $v \perp w$ to denote that $v$ and $w$ are orthogonal.

4.2.2. The Pythagorean theorem, perhaps the most famous theorem in mathematics, admits a natural generalization to inner product spaces. Indeed, we have that

$$\|v + w\|^2 = \|v\|^2 + \|w\|^2$$

whenever $v \perp w$.

4.2.3. The set of all vectors orthogonal to a fixed vector $v$ is called the orthogonal complement of $v$ and is denoted by $v^\perp$. Generalizing, we define the orthogonal complement of a set $E$ as

$$E^\perp = \bigcap_{v \in E} v^\perp,$$

the set of all vectors orthogonal to every element of $E$.

4.2.4. In $\mathbb{R}^2$, every vector can be written as a sum of two orthogonal vectors:

$$(a,b) = (a,0) + (0,b).$$

We shall see presently that this is true of all Hilbert spaces (§4.1.3).

A projection of a vector space $V$ is a function that maps all vectors in $V$ into a linear subspace (§3.1.2) of $V$ in a structure-preserving way. Formally, a projection of a vector space $V$ onto a linear subspace $M$ is a surjective linear transformation $P:V \to M$ such that $P|_M = I$, the identity linear transformation

$$Ix = x.$$

Since $P|_M = I$, we see that $P \circ P = P$. In fact, every linear transformation $T$ that satisfies the idempotence property

$$T \circ T = T$$

is a projection onto its image $T(V)$.

Every projection $P$ of a vector space $V$ admits a direct-sum decomposition

$$V = P(V) \oplus (I-P)(V)$$

of $V$. What this means is that each $x \in V$ can be written as the sum

$$x = y+z$$

with $y \in P(V)$ and $z \in (I-P)(V)$ in precisely one way: namely, $y = Px$ and $z = (I-P)x = x - Px$. In light of the decomposition, we often refer to $P$ as the projection of $V$ onto $P(V)$ along $(I-P)(V)$.
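The following NumPy sketch illustrates the direct-sum decomposition with one arbitrary idempotent matrix on $\mathbb{R}^2$; the particular matrix is our choice, not canonical.

```python
import numpy as np

P = np.array([[1.0, 1.0],
              [0.0, 0.0]])   # projects onto span{(1,0)} along span{(-1,1)}
I = np.eye(2)

assert np.allclose(P @ P, P)  # idempotence: P o P = P

x = np.array([3.0, 4.0])
y, z = P @ x, (I - P) @ x     # y in P(V), z in (I-P)(V)
assert np.allclose(y + z, x)  # x = y + z, and this decomposition is unique
```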

To harness the power of orthogonality, we ought to have a direct-sum decomposition of orthogonal subspaces. This, in general, is not possible for inner product spaces. On a Hilbert space (§4.1.3), however, we can always find an orthogonal projection onto a closed linear subspace.

Let $\mathcal{H}$ be a Hilbert space. The norm induced by the inner product turns $\mathcal{H}$ into a metric space, and so it makes sense to talk about closed sets (§2.1.2). Whenever $M$ is a linear subspace of $\mathcal{H}$ that is also a closed subset of $\mathcal{H}$ ("closed linear subspace"), there exists a projection $P:\mathcal{H} \to M$ such that

$$(I-P)(\mathcal{H}) = M^\perp.$$

This, in particular, implies that

$$\mathcal{H} = M \oplus M^\perp.$$

$P$ is called the orthogonal projection of $\mathcal{H}$ onto $M$ along $M^\perp$ (§4.2.3).
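In finite dimensions the orthogonal projection is easy to build explicitly: if the columns of a matrix $Q$ form an orthonormal basis of $M$, then $QQ^*$ is the orthogonal projection onto $M$. A NumPy sketch, with an arbitrary two-dimensional subspace of $\mathbb{C}^4$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
Q, _ = np.linalg.qr(A)        # orthonormal basis of M = column space of A
P = Q @ Q.conj().T            # the orthogonal projection onto M

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y, z = P @ x, x - P @ x       # y in M, z in M-perp
assert np.allclose(P @ P, P)             # P is a projection
assert np.allclose(Q.conj().T @ z, 0)    # z is orthogonal to M
assert np.isclose(np.vdot(z, y), 0)      # <y, z> = 0
```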

4.2.5. An immediate consequence of the existence of orthogonal projections (§4.2.4) is the existence of nontrivial orthogonal complements. Indeed, if $M$ is a closed proper linear subspace of $\mathcal{H}$—a closed linear subspace (§4.2.4) that is also a proper subset of $\mathcal{H}$—then $M^\perp$ is nontrivial, i.e., contains a nonzero vector.

This, in particular, implies that

$$(\ker l)^\perp$$

is nontrivial whenever $l \in \mathcal{H}^*$ is nonzero, for $\ker l$ is then a closed proper linear subspace of $\mathcal{H}$. A quick computation then shows that

$$lx = \langle x, (\overline{lz})z \rangle$$

for any choice of $z \in (\ker l)^\perp$ with $\|z\|_{\mathcal{H}} = 1$. Moreover,

$$\|(\overline{lz})z\|_{\mathcal{H}} = |\overline{lz}|\,\|z\|_{\mathcal{H}} = |lz| \leq \|l\|_{\mathcal{H}^*}\|z\|_{\mathcal{H}} = \|l\|_{\mathcal{H}^*}$$

by the definition of the operator norm (§3.7.3). Conversely, the Schwarz inequality (§4.1.4) gives

$$|lx| = |\langle x, (\overline{lz})z \rangle| \leq \|x\|_{\mathcal{H}} \|(\overline{lz})z\|_{\mathcal{H}}$$

for all $x \in \mathcal{H}$, and so $\|l\|_{\mathcal{H}^*} \leq \|(\overline{lz})z\|_{\mathcal{H}}$ by the definition of the operator norm. It follows that

$$\|l\|_{\mathcal{H}^*} = \|(\overline{lz})z\|_{\mathcal{H}}.$$

To summarize, we have the Riesz representation theorem: every bounded linear functional $l \in \mathcal{H}^*$ on a Hilbert space $\mathcal{H}$ admits $u_l \in \mathcal{H}$ such that $\|l\|_{\mathcal{H}^*} = \|u_l\|_{\mathcal{H}}$ and

$$lx = \langle x, u_l \rangle$$

for all $x \in \mathcal{H}$. It turns out that $u_l$ is unique as well, as we discuss below (§4.3.3).
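In $\mathbb{C}^n$ the theorem can be checked by hand: a functional $lx = \sum_i a_i x^i$ is represented by $u_l = \overline{a}$. A NumPy sketch, where the coefficient vector is an arbitrary choice of ours:

```python
import numpy as np

a = np.array([1.0 + 2.0j, -0.5j, 3.0])
u = np.conj(a)                         # the Riesz representer u_l

rng = np.random.default_rng(4)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
lx = np.sum(a * x)                     # l(x) = sum_i a_i x_i
assert np.isclose(lx, np.vdot(u, x))   # l(x) = <x, u> = sum_i x_i conj(u_i)

# the operator norm of l equals ||u||, attained at x = u / ||u||
assert np.isclose(abs(np.sum(a * u)) / np.linalg.norm(u), np.linalg.norm(u))
```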

4.3. Unitary Classification of Hilbert Spaces

4.3.1. A linear operator $T:(X, \langle\cdot,\cdot\rangle_X) \to (Y,\langle\cdot,\cdot\rangle_Y)$ is unitary if

$$\langle Tx, Tx' \rangle_Y = \langle x,x' \rangle_X$$

for all $x,x' \in X$. If $T$ is, in addition, a bijection, then $T$ is called a unitary isomorphism. We remark that the inverse function of a unitary isomorphism is a unitary isomorphism.

Two inner product spaces are said to be unitarily isomorphic if there is a unitary isomorphism between them. Unitarily isomorphic inner product spaces are understood to be indistinguishable, as far as the structural properties of inner product spaces are concerned.

4.3.2. A linear operator $T:(X, \|\cdot\|_X) \to (Y, \|\cdot\|_Y)$ is isometric if

$$\|Tx\|_Y = \|x\|_X$$

for all $x \in X$.

Let us now assume that $X$ and $Y$ are inner product spaces, whose norms are induced by their inner products (§4.1.3). By the polarization identity (§4.1.3), every isometric linear operator is unitary, and vice versa. It follows, in particular, that an isometric isomorphism is a unitary isomorphism.

We note that every isometric linear operator is injective. Indeed, if $Tx = Tx'$, then

$$0_Y = Tx - Tx' = T(x-x').$$

Since

$$\|x-x'\|_X = \|T(x-x')\|_Y = \|0_Y\|_Y = 0,$$

we conclude that $x - x' = 0_X$, i.e., $x = x'$.

It follows that every surjective isometric linear operator is an isometric isomorphism, hence a unitary isomorphism.

4.3.3. We show that the Riesz representation theorem (§4.2.5) furnishes a unitary isomorphism between a Hilbert space $\mathcal{H}$ and its dual space $\mathcal{H}^*$. We define a linear operator $T:\mathcal{H}^* \to \mathcal{H}$ by the formula

$$Tl = u_l,$$

where $u_l$ is the vector corresponding to $l$ given in the Riesz representation theorem. (When $\mathbb{F} = \mathbb{C}$, $T$ is in fact conjugate-linear—$T(al) = \overline{a}\,Tl$—a distinction we gloss over here.) Since $\|l\|_{\mathcal{H}^*} = \|u_l\|_{\mathcal{H}}$ for all $l \in \mathcal{H}^*$, we see that $T$ is isometric.

It thus suffices to show that $T$ is surjective (§4.3.2). To this end, we fix $u \in \mathcal{H}$ and observe that the function $l_u:\mathcal{H} \to \mathbb{F}$ defined by the formula

$$l_ux = \langle x, u \rangle$$

is a bounded linear functional on $\mathcal{H}$. It now follows that $Tl_u = u$, and so $T$ is surjective.

We conclude that $T$ is an isometric isomorphism, hence a unitary isomorphism.

4.3.4. We now classify Hilbert spaces up to a unitary isomorphism.

We say that a collection of vectors $\mathscr{U} = \{u_\alpha\}_{\alpha}$ in a Hilbert space $\mathcal{H}$, potentially uncountable, is an orthonormal set if $\|u_\alpha\| = 1$ for all $\alpha$ and $u_\alpha \perp u_\beta$ for all distinct $\alpha$ and $\beta$. If, in addition, $\mathscr{U}$ satisfies one of the properties below, we say $\mathscr{U}$ is an orthonormal basis of $\mathcal{H}$:

  • $\mathscr{U}$ is complete, i.e., $\mathscr{U}^\perp = \{0\}$;
  • Parseval's identity holds for $\mathscr{U}$, i.e.,
$$\|x\|^2_{\mathcal{H}} = \sum_{\alpha} |\langle x, u_\alpha \rangle|^2$$

for all $x \in \mathcal{H}$;

  • the orthonormal expansion identity
$$ x = \sum_{\alpha} \langle x, u_\alpha \rangle u_\alpha$$

holds for all $x \in \mathcal{H}$.

It follows from Zorn's lemma that every Hilbert space has an orthonormal basis.

The above uncountable sums are not, in fact, uncountable. For each $x \in \mathcal{H}$, there are at most countably many indices $\alpha$ such that $\langle x, u_\alpha \rangle \neq 0$. The infinite sum in the orthonormal expansion identity is to be understood in terms of unconditional convergence, which stipulates the existence of a vector $L$ such that the sequence of partial sums converges to $L$ for every possible reordering of the indices.

The fact that the infinite sums above must be countable is a consequence of Bessel's inequality

$$\sum_{\alpha} |\langle x, u_\alpha \rangle|^2 \leq \|x\|^2_{\mathcal{H}},$$

which holds for every orthonormal set $\{u_\alpha\}_{\alpha}$.
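A toy NumPy illustration of Bessel's inequality, using an orthonormal set in $\mathbb{R}^3$ that is not a basis; the vectors are our arbitrary choice.

```python
import numpy as np

u1 = np.array([1.0, 0.0, 0.0])   # an orthonormal set of size two
u2 = np.array([0.0, 1.0, 0.0])   # (not a basis of R^3)

x = np.array([1.0, 2.0, 3.0])
bessel = sum(abs(np.dot(x, u))**2 for u in (u1, u2))
assert bessel <= np.dot(x, x)    # 1 + 4 <= 14: the sum undershoots ||x||^2
```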

4.3.5. We finish the unitary classification of Hilbert spaces by constructing the discrete $l_2$ spaces as follows.

Let $\mathcal{I}$ be a nonempty set, and let $\mathbb{F}^\mathcal{I}$ be the set of all functions $f:\mathcal{I} \to \mathbb{F}$. We define the $l_2$-norm of $f \in \mathbb{F}^\mathcal{I}$ to be

$$\|f\|_2 = \left( \sum_{\alpha \in \mathcal{I}} |f(\alpha)|^2 \right)^{1/2},$$

which makes sense if the set of nontrivial values

$$\{\alpha \in \mathcal{I} : f(\alpha) \neq 0\}$$

is at most countable. We define $l_2(\mathcal{I})$ to be the collection of all $f \in\mathbb{F}^\mathcal{I}$ with finite $l_2$-norm, with pointwise addition and pointwise scalar multiplication.

$l_2(\mathcal{I})$ is a Hilbert space with inner product

$$\langle f,g \rangle = \sum_{\alpha \in \mathcal{I}} f(\alpha) \overline{g(\alpha)},$$

where $\overline{g(\alpha)}$ is the complex conjugate of $g(\alpha)$.
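One concrete way to model finitely supported elements of $l_2(\mathcal{I})$ in code—a sketch under our own representation choices, with dictionaries standing in for functions with countable support:

```python
import numpy as np

def inner(f: dict, g: dict) -> complex:
    """<f, g> = sum over alpha of f(alpha) * conj(g(alpha))."""
    return sum(val * np.conj(g[key]) for key, val in f.items() if key in g)

f = {"red": 1.0 + 1.0j, "blue": 2.0}   # supported on two indices of I
g = {"blue": -1.0j, "green": 5.0}      # overlaps f only at "blue"

print(inner(f, g))                     # 2.0 * conj(-1j) = 2j
print(np.sqrt(inner(f, f).real))       # the l_2-norm of f: sqrt(6)
```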

Now, we fix a Hilbert space $\mathcal{H}$ and find its orthonormal basis $\{u_{\alpha}\}_{\alpha \in \mathcal{I}}$ with index set $\mathcal{I}$ (§4.3.4). We define $T:\mathcal{H} \to l_2(\mathcal{I})$ by the formula

$$T \left( \sum_{\alpha \in \mathcal{I}} \langle x, u_\alpha \rangle u_\alpha \right) = \sum_{\alpha \in \mathcal{I}} \langle x, u_\alpha \rangle \mathbb{1}_\alpha,$$

where $\mathbb{1}_\alpha:\mathcal{I} \to \mathbb{R}$ is 1 at $\alpha$ and 0 everywhere else on $\mathcal{I}$. $T$ is a unitary isomorphism.

We conclude that every Hilbert space is unitarily isomorphic to $l_2(\mathcal{I})$ for some set $\mathcal{I}$.

4.4. Hilbert Spaces with a Countable Orthonormal Basis

4.4.1. We now study three examples of Hilbert spaces with countable orthonormal bases.

A topological condition for countability is separability: a Hilbert space $\mathcal{H}$ has a countable orthonormal basis if and only if $\mathcal{H}$ is separable. We shall not discuss separability in depth.

4.4.2. The function space $L_2(\mathbb{R})$ (§3.5) has a countable orthonormal basis.

A dyadic interval in $\mathbb{R}$ is an interval of the form

$$[m2^{-k}, (m+1)2^{-k})$$

for some $m,k \in \mathbb{Z}$. Given a dyadic interval $I = [m2^{-k},(m+1)2^{-k})$, we let

$$\begin{align*} I_L &= [m2^{-k}, (m+1/2)2^{-k}) \\ I_R &= [(m+1/2)2^{-k}, (m+1)2^{-k}) \end{align*}$$

and define the Haar function associated with the interval $I$

$$h_I = \frac{1}{\sqrt{\vert I \vert}}(\chi_{I_L} - \chi_{I_R}).$$

Here $\vert I \vert$ is $2^{-k}$, the length of $I$. For any set $E$, the indicator function $\chi_E$ is defined by the formula

$$\chi_E(x) = \begin{cases} 1 & \mbox{ if } x \in E \\ 0 & \mbox{ if } x \not\in E. \end{cases}$$

The collection of Haar functions

$$\{h_I : \mbox{$I$ is a dyadic interval in $\mathbb{R}$}\}$$

is a countable orthonormal basis of $L_2(\mathbb{R})$, called the Haar wavelet basis of $L_2(\mathbb{R})$.
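The orthonormality of the Haar functions can be spot-checked numerically; the following NumPy sketch uses Riemann sums on a dyadically aligned grid, with grid size and the chosen intervals being arbitrary choices of ours.

```python
import numpy as np

t = np.linspace(0.0, 4.0, 2**16, endpoint=False)
dt = t[1] - t[0]

def haar(m, k):
    """Haar function for the dyadic interval I = [m 2^-k, (m+1) 2^-k)."""
    left, mid, right = m * 2.0**-k, (m + 0.5) * 2.0**-k, (m + 1) * 2.0**-k
    return (2.0**(k / 2)) * (((t >= left) & (t < mid)).astype(float)
                             - ((t >= mid) & (t < right)).astype(float))

h1, h2, h3 = haar(0, 0), haar(1, 0), haar(0, 1)
assert np.isclose(np.sum(h1 * h1) * dt, 1.0)   # unit norm: the 1/sqrt|I| scaling
assert np.isclose(np.sum(h1 * h2) * dt, 0.0)   # disjoint supports
assert np.isclose(np.sum(h1 * h3) * dt, 0.0)   # nested intervals cancel
```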

4.4.3. In general, $\varphi \in L_2(\mathbb{R})$ is called a wavelet if the set

$$\{\varphi_{m,k} : m,k \in \mathbb{Z}\}$$

consisting of the functions

$$\varphi_{m,k}(x) = 2^{m/2} \varphi(2^mx - k)$$

is a countable orthonormal basis of $L_2(\mathbb{R})$.

Wavelets are useful in signal processing and compressed sensing.

4.4.4. We now consider the function space $L_2([-\pi,\pi])$ consisting of functions $f:[-\pi,\pi] \to \mathbb{C}$ whose $L_2$-norm

$$\|f\|_2 = \left( \frac{1}{2\pi} \int_{-\pi}^\pi |f(t)|^2 \, dt \right)^{1/2}$$

is finite. With pointwise addition and scalar multiplication, $L_2([-\pi,\pi])$ is a Banach space. The inner product

$$\langle f,g \rangle = \frac{1}{2\pi} \int_{-\pi}^\pi f(t) \overline{g(t)} \, dt$$

turns $L_2([-\pi,\pi])$ into a Hilbert space.

The Fourier basis

$$\{e^{inx} : n \in \mathbb{Z}\}$$

is a countable orthonormal basis of $L_2([-\pi,\pi])$. The orthonormal expansion

$$f = \sum_{n \in \mathbb{Z}} \langle f,e^{inx} \rangle e^{inx}$$

is called the Fourier series of $f$, and the quantity

$$\langle f,e^{inx} \rangle$$

the $n$th Fourier coefficient of $f$. The double-ended sum is understood to be the limit

$$\lim_{N \to \infty} \sum_{n=-N}^N \langle f, e^{inx} \rangle e^{inx}.$$

Parseval's identity (§4.3.4) takes the form

$$\|f\|_2^2 = \sum_{n \in \mathbb{Z}} |\langle f, e^{inx} \rangle|^2,$$

known as the Plancherel identity. This, in particular, shows that the right-hand side converges.
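As a numerical illustration, the following NumPy sketch computes Fourier coefficients of $f(x) = x$ by quadrature and compares a truncated Plancherel sum against $\|f\|_2^2$; the test function and truncation level are our choices.

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 2**14, endpoint=False)
dt = t[1] - t[0]
f = t                                   # the sawtooth f(x) = x

def fourier_coeff(n):
    # <f, e^{inx}> = (1/2pi) int f(t) e^{-int} dt, by Riemann sum
    return np.sum(f * np.exp(-1j * n * t)) * dt / (2 * np.pi)

norm_sq = np.sum(np.abs(f)**2) * dt / (2 * np.pi)          # ||f||_2^2 = pi^2/3
plancherel = sum(abs(fourier_coeff(n))**2 for n in range(-200, 201))
print(norm_sq, plancherel)              # the truncated sum approaches pi^2/3
```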

4.4.5. The sequence space $l_2$ (§3.4) has a countable orthonormal basis.

To see this, we first consider a related sequence space $l_2(\mathbb{Z})$, which consists of all double-ended sequences $(a_n)_{n \in \mathbb{Z}}$ with $\sum_{n \in \mathbb{Z}} |a_n|^2 < \infty$, equipped with the inner product

$$\langle (a_n)_{n \in \mathbb{Z}}, (b_n)_{n \in \mathbb{Z}} \rangle_{l_2(\mathbb{Z})} = \sum_{n \in \mathbb{Z}} a_n \overline{b_n}.$$

We show that $l_2(\mathbb{Z})$ is isometrically isomorphic to $L_2([-\pi,\pi])$. To this end, we define a linear operator $T:l_2(\mathbb{Z}) \to L_2([-\pi,\pi])$ by setting

$$T(a_n)_{n \in \mathbb{Z}} = \sum_{n \in \mathbb{Z}} a_n e^{inx}.$$

Observe that

$$\|(a_n)_{n \in \mathbb{Z}}\|^2_{l_2(\mathbb{Z})} = \sum_{n \in \mathbb{Z}} |a_n|^2 = \left\|\sum_{n \in \mathbb{Z}} a_n e^{inx}\right\|_2^2,$$

where the second equality is the Plancherel identity (§4.4.4). Therefore, $T$ is an isometric linear operator.

Now, for each $f \in L_2([-\pi,\pi])$, the Plancherel identity (§4.4.4) implies that the Fourier coefficients of $f$ form a sequence in $l_2(\mathbb{Z})$. It follows that $T$ is surjective, whence $T$ is an isometric isomorphism. We conclude that $l_2(\mathbb{Z})$ is isometrically isomorphic to $L_2([-\pi,\pi])$.

Now, we define a bijective function $f:\mathbb{N} \to \mathbb{Z}$ by setting

$$f(n) = \begin{cases} \frac{n}{2} & \mbox{ if } n \mbox{ is even}; \\ \frac{-n-1}{2} & \mbox{ if } n \mbox{ is odd}.\end{cases}$$

The function $S:l_2 \to l_2(\mathbb{Z})$ defined by the formula

$$S(a_n)_n = (a_{f^{-1}(m)})_{m \in \mathbb{Z}}$$

is easily seen to be an isometric isomorphism.
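For concreteness, a short Python sketch of the bijection $f$ and its inverse, taking $0 \in \mathbb{N}$ as above:

```python
# the interleaving bijection f : N -> Z and its inverse
def f(n: int) -> int:
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

def f_inv(m: int) -> int:
    return 2 * m if m >= 0 else -2 * m - 1

print([f(n) for n in range(7)])          # [0, -1, 1, -2, 2, -3, 3]
assert all(f_inv(f(n)) == n for n in range(100))
```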

It now follows that $T \circ S:l_2 \to L_2([-\pi,\pi])$ is an isometric isomorphism, whence $l_2$ is unitarily isomorphic to $L_2([-\pi,\pi])$. Since the Fourier basis

$$\{e^{inx} : n \in \mathbb{Z}\}$$

is a countable orthonormal basis of $L_2([-\pi,\pi])$ (§4.4.4), we conclude that

$$\{(T \circ S)^{-1}(e^{inx}) : n \in \mathbb{Z}\}$$

is a countable orthonormal basis of $l_2$.

4.5. Finite-Dimensional Hilbert Spaces

4.5.1. Every finite-dimensional (§3.10.1) inner product space is a Hilbert space.

Let $\mathcal{H}$ be a finite-dimensional Hilbert space with a basis $\{e^1,\ldots,e^n\}$. By setting

$$u^1 = \frac{e^1}{\|e^1\|_{\mathcal{H}}}$$

and

$$u^{k+1} = \frac{e^{k+1} - \sum_{i=1}^k \langle e^{k+1}, u^i \rangle u^i}{ \left\|e^{k+1} - \sum_{i=1}^k \langle e^{k+1}, u^i \rangle u^i \right\|_{\mathcal{H}}}$$

for each $k \geq 1,$ we obtain an orthonormal basis

$$\{u^1,\ldots,u^n\}$$

of $\mathcal{H}$. This is called the Gram–Schmidt process.
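A direct NumPy implementation of the process (function names are ours), with a Gram-matrix check of orthonormality:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors in C^n."""
    basis = []
    for e in vectors:
        # subtract the components along the already-built orthonormal vectors
        w = e - sum(np.vdot(u, e) * u for u in basis)
        basis.append(w / np.linalg.norm(w))
    return basis

rng = np.random.default_rng(5)
vs = [rng.standard_normal(3) + 1j * rng.standard_normal(3) for _ in range(3)]
us = gram_schmidt(vs)
G = np.array([[np.vdot(u, v) for v in us] for u in us])
assert np.allclose(G, np.eye(3))   # the Gram matrix is the identity
```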

The Gram–Schmidt process produces a finite orthonormal basis for each finite-dimensional Hilbert space. It follows that every finite-dimensional Hilbert space is unitarily isomorphic to $l_2(\mathcal{I})$ for some finite $\mathcal{I}$ (§4.3.5). Since $\mathbb{F}^n$ with the dot product (§4.1.1, §4.1.4) has its $n$ coordinate vectors (§3.8.1) as an orthonormal basis, $\mathbb{F}^n$ with the dot product is unitarily isomorphic to $l_2(\{1,2,\ldots,n\})$. It follows that every finite-dimensional Hilbert space with a basis of size $n$ is unitarily isomorphic to $\mathbb{F}^n$.

An important property of finite-dimensional Hilbert spaces is the ability to extend orthonormal sets to orthonormal bases. Formally, if $\{v^1,\ldots,v^k\}$ is an orthonormal set in a Hilbert space $\mathcal{H}$ with a basis of size $n$, then there exist vectors $v^{k+1},\ldots,v^n$ in $\mathcal{H}$ such that $\{v^1,\ldots,v^n\}$ is an orthonormal basis of $\mathcal{H}$. This is done by finding a basis of the subspace orthogonal to all of $v^1,\ldots,v^k$ and applying the Gram–Schmidt process.

4.5.2. Given a vector space $V$ over $\mathbb{F}$ with a finite basis $\{v^1,\ldots,v^n\}$, the function $\langle \cdot, \cdot \rangle:V \times V \to \mathbb{F}$ defined by the formula

$$\left\langle \sum_{i=1}^n a_iv^i , \sum_{j=1}^n b_jv^j \right\rangle = \sum_{i=1}^n a_i\overline{b_i}$$

is an inner product on $V$. With this inner product, $\{v^1,\ldots,v^n\}$ becomes an orthonormal basis of $V$. It follows that $V$ is unitarily isomorphic to $\mathbb{F}^n$ (§4.5.1).

It thus suffices to examine $\mathbb{R}^n$ and $\mathbb{C}^n$.

4.5.3. The conjugate transpose of an $m$-by-$n$ complex matrix $A$ (§3.8.1) is the $n$-by-$m$ matrix

$$A^* = \begin{bmatrix} \overline{a_{11}} & \overline{a_{21}} & \cdots & \overline{a_{m1}} \\ \overline{a_{12}} & \overline{a_{22}} & \cdots & \overline{a_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \overline{a_{1n}} & \overline{a_{2n}} & \cdots & \overline{a_{mn}} \end{bmatrix}.$$

We say that $A$ is Hermitian if $A^* = A$. Hermitian matrices are necessarily square matrices, and an $n$-by-$n$ matrix $A$ is Hermitian if and only if

$$(Av) \cdot w = v \cdot (Aw)$$

for all $v,w \in \mathbb{C}^n$. Here, the $\cdot$ symbol denotes the standard dot product $v \cdot w = \sum_i v^i \overline{w^i}$ on $\mathbb{C}^n$ (§4.1.4).

We note that

$$(AB)^* = B^*A^*$$

for all matrices $A$ and $B$ for which the product $AB$ is defined.

4.5.4. An $n$-by-$n$ matrix $A$ is unitary if its conjugate transpose $A^*$ is its inverse:

$$A^*A = AA^* = I.$$

$A$ is unitary if and only if

$$(Av) \cdot (Aw) = v \cdot w$$

for all $v,w \in \mathbb{C}^n$. Here, the $\cdot$ symbol denotes the standard dot product on $\mathbb{C}^n$ (§4.1.4). This, in particular, implies that $A$ is unitary if and only if the linear operator associated with $A$ (§3.8.3) is unitary (§4.3.1).

We also note that $A$ is unitary if and only if its column vectors form an orthonormal basis (§4.3.4) of $\mathbb{C}^n$ with respect to the standard dot product.
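A NumPy sketch of these characterizations, with a unitary matrix manufactured from an arbitrary basis via QR factorization:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(A)                          # U has orthonormal columns

assert np.allclose(U.conj().T @ U, np.eye(4))   # U*U = I
assert np.allclose(U @ U.conj().T, np.eye(4))   # UU* = I

v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = rng.standard_normal(4) + 1j * rng.standard_normal(4)
assert np.isclose(np.vdot(U @ w, U @ v), np.vdot(w, v))  # (Uv).(Uw) = v.w
```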

4.5.5. The most important fact about Hermitian matrices is the spectral theorem, which states that every Hermitian matrix is unitarily diagonalizable. In other words, if $A$ is Hermitian, then there exists a unitary matrix $P$ and a real diagonal matrix $D$ such that

$$A = PDP^*.$$

We can assume furthermore that $D$ is of the form

$$D = \operatorname{diag}(d_1,\ldots,d_k,0,\ldots,0),$$

where $d_1,\ldots,d_k$ are nonzero, by permuting the columns of $P$ accordingly.

How do we make sense of the spectral theorem? We know that the column vectors $\{p^1,\ldots,p^n\}$ of $P$, therefore the row vectors of $P^*$, constitute an orthonormal basis of $\mathbb{C}^n$ (§4.5.4). Since

$$P^* v = \begin{bmatrix} v \cdot p^1 \\ v \cdot p^2 \\ \vdots \\ v \cdot p^n \end{bmatrix}$$

for all $v \in \mathbb{C}^n$, we see that

$$P^* p^i = e^i,$$

the $i$th coordinate vector of $\mathbb{C}^n$ (§3.8.1). Multiplying $D = \operatorname{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n)$ with $e^i$ yields

$$De^i = \lambda_i e^i.$$

Since $Pe^i = p^i$, we see that

$$Ap^i = PDP^* p^i = \lambda_i p^i.$$

In summary, the spectral theorem applied to a Hermitian matrix $A$ yields an orthonormal basis $\{p^1,\ldots,p^n\}$ and real numbers $\lambda_1,\ldots,\lambda_n$ such that

$$Ap^i = \lambda_ip^i$$

for all $1 \leq i \leq n$.
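NumPy's `eigh` routine computes exactly such a decomposition for Hermitian matrices; a minimal sketch, with a random Hermitian test matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = B + B.conj().T                         # a Hermitian matrix

eigenvalues, P = np.linalg.eigh(A)         # eigenvalues come out real
D = np.diag(eigenvalues)

assert np.allclose(P @ D @ P.conj().T, A)      # A = P D P*
assert np.allclose(P.conj().T @ P, np.eye(3))  # P is unitary
for lam, p in zip(eigenvalues, P.T):
    assert np.allclose(A @ p, lam * p)         # A p^i = lambda_i p^i
```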

4.5.6. If $\lambda$ is a scalar and $v$ is a nonzero vector such that

$$Av = \lambda v,$$

we say that $\lambda$ is an eigenvalue of $A$ and $v$ an eigenvector of $A$ associated with the eigenvalue $\lambda$. In light of these definitions, we say that the spectral theorem provides an eigendecomposition of a Hermitian matrix.

Conversely, if $\{v^1,\ldots,v^n\}$ is an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$ associated with the eigenvalues $\lambda_1,\ldots,\lambda_n$, respectively, then we have the diagonalization

$$A = PDP^*$$

where $D = \operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ and $P$ consists of $v^1,\ldots,v^n$ as its column vectors. Indeed,

$$AP = [Av^1 \mid \cdots \mid Av^n] = [\lambda_1 v^1 \mid \cdots \mid \lambda_n v^n],$$

whence

$$P^*AP = \operatorname{diag}(\lambda_1,\ldots,\lambda_n).$$

We remark that $P$ is unitary, as its column vectors are orthonormal (§4.5.4).

4.5.7. The diagonalization

$$A = PDP^*$$

given by an eigendecomposition (§4.5.6) does not guarantee that $A$ is Hermitian, as the diagonal entries of $D$ need not be real. On the other hand,

$$A^*A = (PDP^*)^*(PDP^*) = (PD^*P^*)(PDP^*) = PD^*(P^*P)DP^* = PD^*DP^*$$

and

$$AA^* = (PDP^*)(PDP^*)^* = (PDP^*)(PD^*P^*) = PD(P^*P)D^*P^* = PDD^*P^*.$$

Since

$$\overline{a}a = |a|^2 = a\overline{a}$$

for all $a \in \mathbb{C}$, we see that $D^*D = DD^*$, and so

$$A^*A = AA^*.$$

This condition is known as normality. The spectral theorem for normal matrices states that $A$ is unitarily diagonalizable if and only if $A$ is normal. Note that this version of the spectral theorem does not guarantee that the diagonalization consists entirely of real entries.

4.5.8. We observe that a unitary matrix $U$ is normal, as $U^*U = I = UU^*$. By the spectral theorem for normal matrices (§4.5.7), there exists a unitary matrix $V$ and a diagonal matrix $D$ such that

$$U = VDV^*.$$

Since

$$(V^*UV)(V^*UV)^* = V^*UVV^*U^*V = V^*UU^*V = V^*V = I,$$

we see that

$$D = V^*UV$$

is unitary. Therefore, the column vectors of $D = \operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ must form an orthonormal basis of $\mathbb{C}^n$. It follows that

$$|\lambda_i| = 1$$

for all $1 \leq i \leq n$.
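A minimal NumPy check with a rotation matrix, one concrete unitary matrix of our choosing:

```python
import numpy as np

theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(U.conj().T @ U, np.eye(2))   # U is unitary, hence normal
eigenvalues = np.linalg.eigvals(U)              # exp(+i theta), exp(-i theta)
assert np.allclose(np.abs(eigenvalues), 1.0)    # |lambda_i| = 1
```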

4.5.9. We now turn our attention to real matrices. The transpose of an $m$-by-$n$ matrix $A$ is the $n$-by-$m$ matrix

$$A^t = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}.$$

If the entries of $A$ are real, then $A^t = A^*$, the conjugate transpose of $A$ (§4.5.3).

We say that a matrix $A$ is symmetric if $A^t = A$. If $A$ is a real matrix, then $A$ is symmetric if and only if $A$ is Hermitian. Therefore, the spectral theorem (§4.5.5) implies that a symmetric $A$ is unitarily diagonalizable, i.e., there exists a unitary matrix $O$ and a diagonal matrix $D$ with real entries such that

$$A = ODO^*.$$

A stronger result holds, in fact: the spectral theorem for real symmetric matrices states that $O$ is a real matrix. This implies that $O$ is orthogonal, i.e.,

$$O^tO = OO^t = I.$$
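NumPy's `eigh` again illustrates the real case: for a real symmetric input, both the eigenvector matrix and the eigenvalues come out real. A sketch with a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((3, 3))
A = B + B.T                                # a real symmetric matrix

d, O = np.linalg.eigh(A)
assert np.isrealobj(O) and np.isrealobj(d)     # everything stays real
assert np.allclose(O.T @ O, np.eye(3))         # O^t O = I: O is orthogonal
assert np.allclose(O @ np.diag(d) @ O.T, A)    # A = O D O^t
```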