3.1.1. We now discuss one of the most frequently used classes of metric spaces: normed linear spaces. To this end, we recall that a real vector space is a set $V$ equipped with a vector addition operation $+:V \times V \to V$ and a scalar multiplication operation $\cdot:\mathbb{R} \times V \to V$ that satisfy the following conditions: addition is associative and commutative; there is a zero vector $0_V$ such that $v + 0_V = v$ for all $v \in V$; each $v \in V$ has an additive inverse $-v$ with $v + (-v) = 0_V$; scalar multiplication is compatible with multiplication in $\mathbb{R}$, so that $a(bv) = (ab)v$ and $1 \cdot v = v$; and the distributive laws $a(v+w) = av + aw$ and $(a+b)v = av + bv$ hold for all $a, b \in \mathbb{R}$ and $v, w \in V$.
By substituting $\mathbb{R}$ with $\mathbb{C}$, we obtain a complex vector space. To simplify the exposition, we shall speak of a vector space over a field $\mathbb{F}$, where $\mathbb{F}$ can denote either $\mathbb{R}$ or $\mathbb{C}$.
3.1.2. Given a vector space $V$ over $\mathbb{F}$, we say that a subset $M$ of $V$ is a linear subspace of $V$ if $ax + by \in M$ whenever $a,b \in \mathbb{F}$ and $x,y \in M$. In other words, $M$, with the vector-space structure inherited from $V$, is itself a vector space.
For example, the Euclidean $n$-space $\mathbb{R}^n$ with coordinatewise addition
$$v+w = (v^1+w^1,\ldots,v^n+w^n)$$and coordinatewise scalar multiplication
$$av = (av^1,\ldots,av^n)$$is a vector space. Whenever $m \leq n$, the subset
$$\{(v^1,\ldots,v^m,0,\ldots,0): v^1,\ldots,v^m \in \mathbb{R}\}$$is a linear subspace of $\mathbb{R}^n$.
3.2.1. A norm on a vector space $V$ over $\mathbb{F}$ is a nonnegative function $\|\cdot\|:V \to [0,\infty)$ that satisfies the following properties: $\|v\| = 0$ if and only if $v = 0_V$ (positive definiteness); $\|av\| = |a|\,\|v\|$ for all $a \in \mathbb{F}$ and $v \in V$ (absolute homogeneity); and $\|v+w\| \leq \|v\| + \|w\|$ for all $v,w \in V$ (the triangle inequality).
A vector space $V$ equipped with a norm $\|\cdot\|$ is called a normed linear space $(V,\|\cdot\|)$.
Norms generalize the absolute value function on $\mathbb{R}$ and the Euclidean distance function on $\mathbb{R}^2$.
3.2.2. We observe that
$$d(v,w) = \|v-w\|$$is always a metric (§2.1.1) on $V$. If the corresponding metric is complete, the norm is said to be complete. A normed linear space whose norm is complete is called a Banach space.
3.3.1. The Euclidean $n$-space $\mathbb{R}^n$ (§3.1.2) with the usual Euclidean norm
$$\|\vec{v}\| = \sqrt{\sum_{i=1}^n |v^i|^2}$$is a Banach space. More generally, the $p$-norm
$$\|\vec{v}\|_p = \left( \sum_{i=1}^n |v^i|^p \right)^{1/p}$$is a complete norm on $\mathbb{R}^n$ whenever $p \geq 1$. The limit
$$\|\vec{v}\|_\infty = \lim_{p \to \infty} \|\vec{v}\|_p = \max_{1 \leq i \leq n} |v^i|$$is also a complete norm on $\mathbb{R}^n$, called the $\infty$-norm.
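These formulas are easy to check numerically. The following sketch (assuming NumPy is available; it is illustrative only, not part of the formal development) computes $p$-norms directly from the definition and exhibits the convergence of $\|\vec{v}\|_p$ to $\|\vec{v}\|_\infty$ as $p \to \infty$:

```python
import numpy as np

def p_norm(v, p):
    """The p-norm (sum_i |v_i|^p)^(1/p) of a vector, straight from the definition."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = np.array([3.0, -4.0, 1.0])

# The 2-norm is the usual Euclidean length sqrt(9 + 16 + 1) = sqrt(26).
print(p_norm(v, 2))

# As p grows, the p-norm decreases toward the infinity-norm max_i |v_i| = 4.
for p in (1, 2, 4, 16, 64):
    print(p, p_norm(v, p))

print(np.max(np.abs(v)))   # the infinity-norm
```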
3.3.2. The $p$-norms satisfy Hölder's inequality
$$\| \vec{v} \bullet \vec{w} \|_1 \leq \|\vec{v}\|_p \|\vec{w}\|_q$$whenever $\frac{1}{p} + \frac{1}{q} = 1$, where $\vec{v} \bullet \vec{w}$ is the coordinatewise product
$$\vec{v} \bullet \vec{w} = (v^1 w^1, \ldots, v^n w^n).$$The inequality holds even when $p = 1$ and $q =\infty$.
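Hölder's inequality can likewise be spot-checked on random vectors. The following NumPy sketch (illustrative only) verifies it for several conjugate pairs, including the endpoint case $p = 1$, $q = \infty$:

```python
import numpy as np

def p_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
v = rng.normal(size=5)
w = rng.normal(size=5)

# Hölder's inequality for conjugate exponents 1/p + 1/q = 1.
for p, q in [(2.0, 2.0), (3.0, 1.5), (1.5, 3.0)]:
    lhs = np.sum(np.abs(v * w))          # ||v . w||_1, the coordinatewise product
    rhs = p_norm(v, p) * p_norm(w, q)
    assert lhs <= rhs + 1e-12

# The endpoint case p = 1, q = infinity.
assert np.sum(np.abs(v * w)) <= p_norm(v, 1) * np.max(np.abs(w))
print("Hölder's inequality holds on this sample")
```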
3.3.3. There are other norms on $\mathbb{R}^n$, but none of them are too different from the $p$-norms. We take up this matter in §3.10.
3.4.1. We now generalize the $p$-norm to infinitely many coordinates. Given a sequence $(x_n)_{n=0}^\infty$ of numbers in $\mathbb{F}$, we define the $l_p$-norm of $(x_n)_n$ to be the quantity
$$\left( \sum_{n=0}^\infty |x_n|^p \right)^{1/p},$$provided the infinite sum converges. The space $l_p$ of all sequences with finite $l_p$-norm, with coordinatewise addition
$$(x_n)_n + (y_n)_n = (x_n + y_n)_n$$and scalar multiplication
$$a \cdot (x_n)_n = (ax_n)_n$$is a Banach space. Similarly, the vector space $l_\infty$ of all bounded sequences equipped with the $l_\infty$-norm
$$\|(x_n)_n\|_\infty = \sup_{n \geq 0} |x_n|$$is a Banach space.
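As an illustration, consider the harmonic sequence $x_n = 1/(n+1)$: it belongs to $l_p$ for every $p > 1$ and to $l_\infty$, but not to $l_1$. The following plain-Python sketch truncates the infinite sums, so the printed values are only approximations; for $p = 2$ the limit is $\sqrt{\pi^2/6}$, since $\sum_{n \geq 1} 1/n^2 = \pi^2/6$:

```python
import math

# x_n = 1/(n+1): bounded, in l_p for every p > 1, but not in l_1.
N = 200000
x = [1.0 / (n + 1) for n in range(N)]

l2_partial = math.sqrt(sum(t * t for t in x))
print(l2_partial, math.sqrt(math.pi ** 2 / 6))   # partial l_2 norm vs its limit

linf = max(abs(t) for t in x)                    # the l_infinity (sup) norm
print(linf)

l1_partial = sum(abs(t) for t in x)              # grows like log N: no l_1 norm
print(l1_partial)
```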
3.4.2. The sequence $p$-norms also satisfy Hölder's inequality
$$\| (x_n)_n \bullet (y_n)_n \|_1 \leq \|(x_n)_n\|_p \|(y_n)_n\|_q$$whenever $\frac{1}{p} + \frac{1}{q} = 1$, where $(x_n)_n \bullet (y_n)_n$ is the termwise product
$$(x_n)_n \bullet (y_n)_n = (x_ny_n)_n.$$The inequality holds even when $p = 1$ and $q =\infty$.
3.5.1. We now examine the continuous analogues of the sequence spaces. For each $p \geq 1$, we can define the $L_p$-norm of a function $f:\mathbb{R} \to \mathbb{R}$ as follows:
$$\|f\|_p = \left( \int_{-\infty}^\infty |f(t)|^p \, dt \right)^{1/p}.$$The space $L_p(\mathbb{R})$ of all functions $f:\mathbb{R} \to \mathbb{R}$ whose $L_p$-norm is finite, equipped with pointwise addition
$$(f+g)(x) = f(x) + g(x)$$and scalar multiplication
$$(af)(x) = af(x)$$is a normed linear space, provided that we declare two functions that differ only on a null set to be equal. Whether the norm is complete depends on how the integral is defined. Under the framework of Lebesgue integration, the Lebesgue space $L_p(\mathbb{R})$ is a Banach space.
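For a concrete example, the function $f(t) = e^{-|t|}$ lies in $L_p(\mathbb{R})$ for every $p \geq 1$, and its $p$-norm has a closed form: $\int_{-\infty}^\infty e^{-p|t|}\,dt = 2/p$, so $\|f\|_p = (2/p)^{1/p}$. The following NumPy sketch (a crude Riemann-sum approximation, illustrative only) confirms this numerically:

```python
import numpy as np

# f(t) = exp(-|t|) decays fast enough to lie in L_p(R) for every p >= 1.
t = np.linspace(-20.0, 20.0, 400001)   # wide grid; the tails are negligible
dt = t[1] - t[0]
f = np.exp(-np.abs(t))

for p in (1.0, 2.0, 3.0):
    numeric = (np.sum(np.abs(f) ** p) * dt) ** (1.0 / p)   # Riemann sum
    exact = (2.0 / p) ** (1.0 / p)
    print(p, numeric, exact)   # the two values agree closely
```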
3.5.2. The Lebesgue $p$-norms also satisfy Hölder's inequality
$$\| fg \|_1 \leq \|f\|_p \|g\|_q$$whenever $\frac{1}{p} + \frac{1}{q} = 1$, where $fg$ is the pointwise product
$$fg(x) = f(x)g(x).$$3.6.1. Let $(K,d_K)$ be a compact metric space (§2.4). The vector space $\mathscr{C}(K)$ of all continuous (§2.2.1) functions $f:(K, d_K) \to \mathbb{R}$ equipped with the uniform norm
$$\|f\|_u = \max_{x \in K} |f(x)|$$is a Banach space. (The name derives from the norm's relationship to uniform convergence.) The norm always exists and is finite, as every continuous function on a compact metric space is bounded and attains its maximum.
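For a concrete instance, take $K = [0,1]$ and $f(x) = x(1-x)$, whose uniform norm is $\frac{1}{4}$, attained at $x = \frac{1}{2}$. The following NumPy sketch approximates uniform norms on a fine grid (a numerical stand-in for the exact maximum, illustrative only):

```python
import numpy as np

# Approximate the uniform norm of f(x) = x(1 - x) on the compact set K = [0, 1].
# f attains its maximum 1/4 at x = 1/2, so ||f||_u = 0.25.
x = np.linspace(0.0, 1.0, 100001)
f = x * (1.0 - x)
print(np.max(np.abs(f)))   # the uniform norm, 0.25

# The uniform distance ||f - g||_u between two continuous functions on K:
g = np.sin(np.pi * x) / np.pi
print(np.max(np.abs(f - g)))
```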
3.6.2. There is a nice characterization of compact subspaces (§2.1.3) of $\mathscr{C}(K)$, called the Arzelà–Ascoli theorem. A collection $\mathcal{C}$ of functions in $\mathscr{C}(K)$ is a compact subset of $\mathscr{C}(K)$ if and only if the following conditions hold: $\mathcal{C}$ is closed in $\mathscr{C}(K)$; $\mathcal{C}$ is uniformly bounded, i.e., $\sup_{f \in \mathcal{C}} \|f\|_u < \infty$; and $\mathcal{C}$ is equicontinuous, i.e., for each $\varepsilon > 0$ there exists a $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $f \in \mathcal{C}$ and $d_K(x,y) < \delta$.
This characterization of compactness is useful in developing the mathematical foundations of reproducing kernel Hilbert spaces.
3.7.1. Let $(V,\|\cdot\|_V)$ and $(W,\|\cdot\|_W)$ be normed linear spaces over $\mathbb{F}$. A linear transformation, or a linear operator, from $V$ to $W$ is a function $T:V \to W$ such that $T(ax + by) = aT(x) + bT(y)$ for all $a,b \in \mathbb{F}$ and $x,y \in V$.
We often write $Tx$ in place of $T(x)$.
3.7.2. We note that the kernel of $T$
$$\ker T = \{x \in V: Tx = 0_W\}$$is a linear subspace of $V$, and that the image of $T$
$$\operatorname{im} T = \{Tx : x \in V\}$$is a linear subspace of $W$.
3.7.3. The operator norm $\|T\|_{\operatorname{op}}$ of a linear operator $T:(V,\|\cdot\|_V) \to (W,\|\cdot\|_W)$ is the smallest constant $C$ such that
$$\|Tx\|_W \leq C\|x\|_V$$for all $x \in V$. In other words, $\|T\|_{\operatorname{op}}$ is the smallest Lipschitz constant of $T$. The operator norm also equals
$$\sup_{x \in V,\, x \neq 0_V} \frac{\|Tx\|_W}{\|x\|_V}.$$The collection $\mathscr{L}(V,W)$ of all linear operators $T:(V,\|\cdot\|_V) \to (W,\|\cdot\|_W)$ with finite operator norm is called an operator space. The operator space with pointwise addition
$$(T+S)x = Tx + Sx,$$scalar multiplication
$$(aT)x = aTx,$$and the operator norm $\|\cdot\|_{\operatorname{op}}$ is a normed linear space. If, in addition, $(W,\|\cdot\|_W)$ is a Banach space, then so is $(\mathscr{L}(V,W), \|\cdot\|_{\operatorname{op}})$.
3.7.4. Since $\mathbb{F}$ is a Banach space, the operator space $\mathscr{L}(V, \mathbb{F})$ is a Banach space (§3.2.2) for every normed linear space $(V, \|\cdot\|_V)$ over $\mathbb{F}$. The operator space $\mathscr{L}(V, \mathbb{F})$ is called the dual space of $V$ and is denoted by $V^*$. An element of $V^*$ is called a bounded linear functional on $V$.
3.7.5. All operator norms satisfy the norm inequality
$$\|TS\|_{\operatorname{op}} \leq \|T\|_{\operatorname{op}}\|S\|_{\operatorname{op}},$$where $TS$ is the function composition of operators $T$ and $S$. To see this, we observe that
$$\|TSv\| \leq \|T\|_{\operatorname{op}}\|Sv\| \leq \|T\|_{\operatorname{op}}\|S\|_{\operatorname{op}}\|v\|$$for every vector $v$. The norm inequality now follows.
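Anticipating the matrix representations of §3.8, we can estimate an operator norm numerically by sampling the supremum from §3.7.3. The sketch below (NumPy, illustrative only) compares the sampled ratio against the exact operator norm of a matrix acting on $(\mathbb{R}^3, \|\cdot\|_2)$, which equals its largest singular value (§3.8.4):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))    # a random operator T(v) = Av on R^3

# Sample the supremum of ||Av||_2 / ||v||_2 over many random directions.
v = rng.normal(size=(200000, 3))
ratios = np.linalg.norm(v @ A.T, axis=1) / np.linalg.norm(v, axis=1)
estimate = ratios.max()

# The exact operator norm for the 2-norm is the largest singular value.
exact = np.linalg.norm(A, 2)
print(estimate, exact)   # the estimate never exceeds the exact value
```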
3.8.1. Let $T:\mathbb{R}^n \to \mathbb{R}^m$ be a linear operator. The matrix representation of $T$ is an $m$-by-$n$ matrix
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$whose columns are defined by the formula
$$\begin{pmatrix} a_{1i} \\ \vdots \\ a_{mi} \end{pmatrix} = Te^i.$$Here $e^i$ is the $i$th coordinate vector
$$(0,0,\ldots,0,\underbrace{1}_{\mbox{$i$th}},0,\ldots,0)$$of $\mathbb{R}^n$.
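The recipe "column $i$ is $Te^i$" translates directly into code. The following NumPy sketch (with a hypothetical operator $T$ chosen purely for illustration) recovers the matrix representation of a linear operator given only as a function:

```python
import numpy as np

# A linear operator T : R^3 -> R^2, given only as a function (hypothetical example).
def T(v):
    x, y, z = v
    return np.array([x + 2.0 * y, 3.0 * z - y])

def matrix_of(T, n, m):
    """Recover the m-by-n matrix representation: column i is T(e^i)."""
    A = np.zeros((m, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0               # the i-th coordinate vector e^i
        A[:, i] = T(e)
    return A

A = matrix_of(T, 3, 2)
print(A)                         # columns T(e^1), T(e^2), T(e^3)

v = np.array([1.0, 1.0, 1.0])
assert np.allclose(A @ v, T(v))  # the matrix reproduces the operator
```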
3.8.2. Given the matrix representation $A$ of a linear operator $T:\mathbb{R}^n \to \mathbb{R}^m$, we can compute $Tv$ for an arbitrary choice of $v \in \mathbb{R}^n$ by taking the matrix-vector product
$$Av = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix} = \begin{pmatrix} a_{11} v^1 + \cdots + a_{1n} v^n \\ \vdots \\ a_{m1} v^1 + \cdots + a_{mn} v^n \end{pmatrix}.$$Similarly, we can compute the composite operator $S \circ T$ of two linear operators $T: \mathbb{R}^n \to \mathbb{R}^m$ and $S:\mathbb{R}^m \to \mathbb{R}^p$ by taking the matrix representations $A = (a_{ij})$ and $B = (b_{ij})$ of $T$ and $S$, respectively, and computing their matrix product $BA$, whose entries are given by the formula
$$(BA)_{ij} = b_{i1} a_{1j} + \cdots + b_{im} a_{mj}.$$3.8.3. We remark that there is a one-to-one correspondence between matrices and linear operators. We have already seen that every linear operator has a matrix representation (§3.8.1). Conversely, for each $m$-by-$n$ matrix $A$, there is a corresponding linear operator $T_A: \mathbb{R}^n \to \mathbb{R}^m$ defined by the formula
$$T_Av = Av.$$3.8.4. The one-to-one correspondence suggests that we can define a matrix operator norm analogous to the operator norm. Of particular interest are the matrix $p$-norms: for each $p \geq 1$, we define the $p$-norm of an $m$-by-$n$ matrix $A$ to be the smallest constant $C$ such that
$$\|Av\|_p \leq C\|v\|_p$$for all vectors $v \in \mathbb{R}^n$. The vector $p$-norms are taken to be the Euclidean $p$-norms (§3.3).
If $p = 1$, then
$$\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^m |a_{ij}|.$$If $p = \infty$, then
$$\|A\|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^n |a_{ij}|.$$The matrix 2-norm of $A$ equals the largest singular value of $A$. We discuss the singular value decomposition in §6.
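These three characterizations can be checked against NumPy's built-in induced norms (`numpy.linalg.norm` with `ord` equal to `1`, `np.inf`, or `2`). The sketch below verifies the column-sum, row-sum, and singular-value formulas on a small example:

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0,  0.0, -5.0]])

# ||A||_1 is the maximum absolute column sum.
col_sums = np.sum(np.abs(A), axis=0)
assert np.isclose(np.linalg.norm(A, 1), np.max(col_sums))

# ||A||_inf is the maximum absolute row sum.
row_sums = np.sum(np.abs(A), axis=1)
assert np.isclose(np.linalg.norm(A, np.inf), np.max(row_sums))

# ||A||_2 is the largest singular value.
assert np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])
print("all three matrix-norm formulas confirmed")
```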
3.9.1. An important matrix norm that is not an operator norm is the Frobenius norm
$$\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2}.$$What we mean by a matrix norm here is a norm on the vector space of all matrices of a fixed size, equipped with entrywise addition and scalar multiplication. We stipulate, in addition, that
$$\|AB\| \leq \|A\|\|B\|$$for all matrices $A$ and $B$. The $p$-norms inherit this property from the norm inequality of operator norms (§3.7.5).
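The submultiplicative property is easy to test numerically. The following sketch (NumPy, random matrices, illustrative only) checks it for the induced $p$-norms as well as for the Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 6))
B = rng.normal(size=(6, 5))

# Submultiplicativity ||AB|| <= ||A|| ||B|| for the Frobenius norm...
assert np.linalg.norm(A @ B, 'fro') <= np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')

# ...and for the induced matrix p-norms with p = 1, 2, infinity.
for ord in (1, 2, np.inf):
    assert np.linalg.norm(A @ B, ord) <= np.linalg.norm(A, ord) * np.linalg.norm(B, ord) + 1e-12
print("submultiplicativity holds on this sample")
```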
As for the Frobenius norm, we fix an $n$-by-$s$ matrix $A$ and an $s$-by-$m$ matrix $B$. Writing $a_i$ for the $i$th row of $A$ and $b_j$ for the $j$th column of $B$, we observe that
$$\|AB\|_F^2 = \sum_{i=1}^n \sum_{j=1}^m |a_{i1}b_{1j} + \cdots + a_{is}b_{sj}|^2 = \sum_{i=1}^n \sum_{j=1}^m | a_i \cdot b_j|^2 \leq \sum_{i=1}^n \sum_{j=1}^m \|a_i\|_2^2\|b_j\|_2^2$$by Hölder's inequality (§3.3.2) with $p = q = 2$. It now follows that
$$\|AB\|_F^2 \leq \sum_{i=1}^n \sum_{j=1}^m \|a_i\|_2^2\|b_j\|_2^2 = \left( \sum_{i=1}^n \|a_i\|_2^2 \right) \left( \sum_{j=1}^m \|b_j\|_2^2 \right) = \left( \sum_{i=1}^n \sum_{k=1}^s |a_{ik}|^2 \right) \left( \sum_{j=1}^m \sum_{k=1}^s |b_{kj}|^2 \right) = \|A\|_F^2\|B\|_F^2,$$whence
$$\|AB\|_F \leq \|A\|_F\|B\|_F.$$3.9.2. Let us derive an alternate computational formula for the Frobenius norm.
The Hermitian transpose of an $m$-by-$n$ matrix $A = (a_{ij})$ is the $n$-by-$m$ matrix $A^* = (\overline{a_{ji}})$. The trace of an $m$-by-$n$ matrix $A$ is the sum of its diagonal entries:
$$\operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{kk},$$where $k = \min(m,n)$.
We now observe that
$$\begin{align*} AA^* &= \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} \overline{a_{11}} & \cdots & \overline{a_{m1}} \\ \vdots & \ddots & \vdots \\ \overline{a_{1n}} & \cdots & \overline{a_{mn}} \end{pmatrix} \\ &= \begin{pmatrix} a_{11}\overline{a_{11}} + \cdots + a_{1n} \overline{a_{1n}} & \cdots & a_{11}\overline{a_{m1}} + \cdots + a_{1n} \overline{a_{mn}} \\ \vdots & \ddots & \vdots \\ a_{m1}\overline{a_{11}} + \cdots + a_{mn} \overline{a_{1n}} & \cdots & a_{m1}\overline{a_{m1}} + \cdots + a_{mn} \overline{a_{mn}}\end{pmatrix}. \end{align*}$$Therefore,
$$\operatorname{tr}(AA^*) = \sum_{i=1}^m \sum_{j=1}^n a_{ij} \overline{a_{ij}} = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2,$$and so
$$\|A\|_F = \sqrt{\operatorname{tr}(AA^*)}.$$Similarly,
$$\|A\|_F = \sqrt{\operatorname{tr}(A^*A)}.$$3.10.1. We say that a vector space $V$ over $\mathbb{F}$ is finite-dimensional if there exists a finite subset $\{e^1,\ldots,e^n\}$ of $V$ such that, for each vector $v \in V$, there is precisely one tuple $(a_1,\ldots,a_n)$ of scalars in $\mathbb{F}$ such that
$$v = a_1e^1 + \cdots + a_ne^n.$$The subset $\{e^1,\ldots,e^n\}$ is called a basis of $V$.
3.10.2. The standard example of a finite-dimensional vector space is the Euclidean $n$-space $\mathbb{R}^n$ (§3.1.2) and its complex counterpart, the complex $n$-space
$$\mathbb{C}^n = \{(z^1,\ldots,z^n) : z^1,\ldots,z^n \in \mathbb{C}\}$$equipped with coordinatewise addition and scalar multiplication with respect to $\mathbb{C}$. The set of coordinate vectors (§3.8.1) $\{e^1,\ldots,e^n\}$ constitutes a basis in both $\mathbb{R}^n$ and $\mathbb{C}^n$.
It is also possible to view $\mathbb{C}^n$ as a vector space over $\mathbb{R}$ by taking $\mathbb{C}^n$ to be the set of $2n$-tuples
$$(x^1,y^1, x^2,y^2,\ldots,x^n,y^n)$$of real numbers $x^1,y^1,\ldots,x^n,y^n$. We, of course, understand $(x^j,y^j)$ to mean $z^j = x^j + i y^j$.
3.10.3. We have already discussed the $p$-norms on $\mathbb{R}^n$ (§3.3.1). The same norms can be defined on $\mathbb{C}^n$ as well, with the absolute value function replaced by the complex absolute value function. We shall show that all other norms on $\mathbb{R}^n$ and $\mathbb{C}^n$ are equivalent to the $p$-norms in a precise sense.
3.10.4. We say that two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on a vector space $V$ are equivalent if there are constants $c > 0$ and $C > 0$ such that
$$c\|v\|_a \leq \|v\|_b \leq C\|v\|_a$$for all $v \in V$. The equivalence of norms is transitive: if $\|\cdot\|_a$ and $\|\cdot\|_b$ are equivalent, and $\|\cdot\|_b$ and $\|\cdot\|_c$ are equivalent, then $\|\cdot\|_a$ and $\|\cdot\|_c$ are equivalent.
We shall show that all norms on $\mathbb{R}^n$ are equivalent. By transitivity, it suffices to show that an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^n$ is equivalent to $\|\cdot\|_2$, the 2-norm (§3.3.1).
3.10.5. We need to establish a preliminary lemma: the norm function $\|\cdot\|:\mathbb{R}^n \to \mathbb{R}$ is continuous (§2.2.1) with respect to the metric induced by the 2-norm (§3.2.2) on $\mathbb{R}^n$ and the absolute value metric on $\mathbb{R}$. This is equivalent to the statement that, for each $\varepsilon > 0$, there exists a $\delta > 0$ such that
$$\left\vert\|x\| - \|y\|\right\vert < \varepsilon$$whenever $\|x-y\|_2 < \delta$.
To show this, we first bound $\vert\|x\| - \|y\|\vert$ by the reverse triangle inequality:
$$\left\vert\|x\| - \|y\|\right\vert \leq \|x-y\|.$$We now take the basis expansions $x = x_1e^1 + \cdots + x_ne^n$ and $y = y_1e^1 + \cdots + y_ne^n$ with respect to the basis of coordinate vectors $\{e^1,\ldots,e^n\}$ (§3.10.2). It then follows from the triangle inequality (§3.2.1) that
$$\|x - y\| = \left\| \sum_{i=1}^n (x_i - y_i) e^i \right\| \leq \sum_{i=1}^n \|(x_i - y_i) e^i\|.$$Since
$$\|(x_i-y_i)e^i\| = \vert x_i-y_i \vert\,\|e^i\| \leq M \vert x_i-y_i \vert$$for each $1 \leq i \leq n$, where $M = \max_{1 \leq i \leq n} \|e^i\| > 0$, we see that
$$\left\vert\|x\|-\|y\|\right\vert \leq \sum_{i=1}^n \|(x_i-y_i)e^i\| \leq M \sum_{i=1}^n \vert x_i-y_i \vert = M\|x-y\|_1,$$where $\|x-y\|_1$ is the 1-norm of $x-y$ (§3.3.1).
We now set $v = x-y$ and $w = (1,1,\ldots,1)$ and invoke Hölder's inequality (§3.3.2) with $p=q=2$ to conclude that
$$\left\vert\|x\|-\|y\|\right\vert \leq M\|x-y\|_1 = M\|v \bullet w\|_1 \leq M\|v\|_2 \|w\|_2 = M\sqrt{n} \|x-y\|_2.$$Setting $\delta = \frac{\varepsilon}{M\sqrt{n}}$, we now see that $\|x-y\|_2 < \delta$ implies
$$\left\vert \|x\| - \|y\| \right\vert \leq M\sqrt{n}\|x-y\|_2 < M\sqrt{n}\delta = \varepsilon,$$as was to be shown. We conclude that $\|\cdot\|:\mathbb{R}^n \to \mathbb{R}$ is continuous with respect to the 2-norm.
3.10.6. We now show that $\|\cdot\|$ is equivalent to $\|\cdot\|_2$ on $\mathbb{R}^n$.
The unit sphere $K = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$ is closed and bounded, and so the Heine–Borel theorem implies that $K$ is compact (§2.4.3). Since $\|\cdot\|$ is continuous with respect to $\|\cdot\|_2$ (§3.10.5), the extreme value theorem (§2.4.4) implies that $\|\cdot\|$ attains its maximum and minimum on $K$. We set
$$c = \min_{x \in K} \|x\| \hspace{1em}\mbox{and}\hspace{1em} C = \max_{x \in K} \|x\|.$$Note that $c > 0$: if $c = 0$, then $\|x\| = 0$ for some $x \in K$, forcing $x = 0_{\mathbb{R}^n} \notin K$. Since $\|x\|_2 = 1$ on $K$, it follows that
$$c\|x\|_2 \leq \|x\| \leq C\|x\|_2$$for all $x \in K$.
We now extend the norm equivalence inequality to all choices of $x \in \mathbb{R}^n$. If $x = 0_{\mathbb{R}^n}$, the inequality holds trivially. If $x \neq 0_{\mathbb{R}^n}$, then we define $x' = \|x\|_2^{-1} x$, so that $\|x'\|_2 = 1$. By what we have shown above,
$$c\|x'\|_2 \leq \|x'\| \leq C\|x'\|_2.$$Now, $\|x'\|_2 = \|x\|_2^{-1}\|x\|_2$ and $\|x'\| = \|x\|_2^{-1}\|x\|$, and so the above inequality is equivalent to
$$c\|x\|_2 \leq \|x\| \leq C\|x\|_2,$$as desired.
This concludes the proof of the equivalence of norms on $\mathbb{R}^n$.
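The constants $c$ and $C$ can be seen concretely by comparing the 1-norm with the 2-norm on $\mathbb{R}^3$, where the sharp constants are $c = 1$ and $C = \sqrt{n}$ (the latter by Hölder's inequality, §3.3.2). The sketch below (NumPy, illustrative only) samples the unit sphere $K$ and observes that the ratio $\|x\|_1 / \|x\|_2$ stays within these bounds:

```python
import numpy as np

# For the 1-norm vs the 2-norm on R^n, the sharp equivalence constants are
#   ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2.
n = 3
rng = np.random.default_rng(2)
samples = rng.normal(size=(100000, n))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)   # project onto K

ratios = np.sum(np.abs(samples), axis=1)   # ||x||_1 for points with ||x||_2 = 1
print(ratios.min(), ratios.max())          # stays between 1 and sqrt(3)

assert ratios.min() >= 1.0 - 1e-9
assert ratios.max() <= np.sqrt(n) + 1e-9
```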
3.10.7. For $\mathbb{C}^n$, we identify $\mathbb{C}^n$ with $\mathbb{R}^{2n}$ (§3.10.2) and carry out the same argument.
As for an arbitrary vector space $V$ with a basis $\{v^1,\ldots,v^n\}$, we fix two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on $V$ and define a function (§3.7.1) $T:V \to \mathbb{R}^n$ by setting
$$T(a_1v^1 + \cdots +a_nv^n) = a_1e^1 + \cdots + a_ne^n.$$$T$ is a bijective linear operator, whose inverse $T^{-1}:\mathbb{R}^n \to V$ is also a linear operator.
We now define norms $\|\cdot\|_{a'}$ and $\|\cdot\|_{b'}$ on $\mathbb{R}^n$ by setting
$$\|x\|_{a'} = \|T^{-1}x\|_a \hspace{1em}\mbox{and}\hspace{1em} \|y\|_{b'} = \|T^{-1}y\|_b$$for all $x,y \in \mathbb{R}^n$. Since all norms on $\mathbb{R}^n$ are equivalent, we can find constants $c > 0$ and $C > 0$ such that
$$c\|x\|_{a'} \leq \|x\|_{b'} \leq C\|x\|_{a'}$$for all $x \in \mathbb{R}^n$.
It follows that, for each $v \in V$,
$$c\|v\|_a = c\|Tv\|_{a'} \leq \|Tv\|_{b'} = \|v\|_b$$and that
$$\|v\|_b = \|Tv\|_{b'} \leq C\|Tv\|_{a'} = C\|v\|_a.$$We conclude that
$$c\|v\|_a \leq \|v\|_b \leq C\|v\|_a$$for all choices of $v \in V$, whence $\|\cdot\|_a$ and $\|\cdot\|_b$ are equivalent.