In [1]:
# Copyright (c) Thalesians Ltd, 2018-2019. All rights reserved
# Copyright (c) Paul Alexander Bilokon, 2018-2019. All rights reserved
# Author: Paul Alexander Bilokon <paul@thalesians.com>
# Version: 3.0 (2019.05.27)
# Previous versions: 1.0 (2018.08.03), 2.0 (2019.04.19)
# Email: education@thalesians.com
# Platform: Tested on Windows 10 with Python 3.6

Linear algebra — part i: vector spaces

Motivation

In data science, machine learning (ML), and artificial intelligence (AI), we usually deal not with single numbers but with multivariate (i.e. containing multiple elements or entries) lists of numbers (mathematically speaking, vectors) and multivariate tables of numbers (mathematically speaking, matrices). Therefore we solve multivariate equations, apply multivariate calculus to find optima of multivariate functions, etc.

The branch of mathematics that studies vectors, matrices, and related mathematical objects is called linear algebra. It is one of the most practically useful areas of mathematics in applied work and a prerequisite for data science, machine learning (ML), and artificial intelligence (AI).

Objectives

  • To consider numbers as examples of mathematical objects.
  • To introduce a different kind of mathematical object — vector — first in two dimensions.
  • To demonstrate the importance of two-dimensional vectors in data science.
  • To introduce vector arithmetic: vector addition and multiplication by scalars.
  • To show how vectors and vector arithmetic can be implemented in Python.
  • To introduce the vector norm and relate it to the length of a vector.
  • To introduce the inner product and relate it to the angle between two vectors.
  • To consider vectors in three dimensions.
  • To show how vectors can be generalized to four- and higher-dimensional spaces.
  • To demonstrate the importance of higher-dimensional vectors in data science.
  • To consider vector spaces in general (i.e. not just the Euclidean vector spaces).
  • To show that functions also form a vector space.
  • To introduce linear combinations and examine the notions of linear (in)dependence, span, and basis.
  • To introduce subspaces.
  • To explain how one can obtain the equation of a (hyper)plane.

In [2]:
%matplotlib inline

In [3]:
import math
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

Numbers

In everyday life, we are used to doing arithmetic with numbers, such as


In [4]:
5 + 3


Out[4]:
8

and


In [5]:
10 * 5


Out[5]:
50

The numbers 5 and 3 are mathematical objects.

Indeed, when we think about mathematics, we probably think of numbers as the fundamental objects of study. Numbers used for counting, namely $1, 2, 3, 4, 5, \ldots$ (and so on) are called natural numbers. We say that they belong to the set (i.e. a collection of objects) of natural numbers, $\mathbb{N}$, and write $$3 \in \mathbb{N}$$ to indicate that, for example, 3 belongs to this set.

Not all numbers are quite as straightforward (quite as natural). For example, the number zero wasn't invented (discovered?) until much later than the natural numbers. We sometimes write $\{0\} \cup \mathbb{N}$ to denote the set containing precisely the natural numbers along with 0. That is, the set that is the union (denoted by $\cup$) of the set of natural numbers $\mathbb{N}$ and the singleton set (i.e. a set containing exactly one element) $\{0\}$. (In mathematics we often use curly brackets to define sets by enumerating their elements.)

While the notation $\mathbb{N}$ is standard for the set of natural numbers, we could, using the curly bracket notation, write out this set as $$\mathbb{N} = \{1, 2, 3, 4, 5, \ldots\}.$$

Then, there are the negative numbers, $\ldots, -5, -4, -3, -2, -1$. These, together with 0 and the natural numbers are collectively referred to as integers, or are said to belong to the set of integers, denoted $\mathbb{Z}$: $$\mathbb{Z} = \{\ldots, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, \ldots\}.$$

Since every element of $\mathbb{N}$ is in $\mathbb{Z}$, we say that $\mathbb{N}$ is a subset of $\mathbb{Z}$, and write $$\mathbb{N} \subseteq \mathbb{Z}.$$ Two sets $A$ and $B$ are said to be equal if $A \subseteq B$ and also $B \subseteq A$, in other words, if $A$ and $B$ contain exactly the same elements. In this case we write $$A = B.$$

The negative number $-3$ is sometimes referred to as the additive inverse of $3$, because adding it to $3$ yields zero:


In [6]:
3 + (-3)


Out[6]:
0

There are other, somewhat unnatural numbers, such as the multiplicative inverse of 3, $\frac{1}{3}$. When multiplied by its multiplicative inverse, a number yields not zero, but one, the identity or unit:


In [7]:
3 * (1 / 3)


Out[7]:
1.0

Fractions, such as $\frac{1}{3}$, and irrational numbers, such as $\pi$ and $e$, along with the integers, form the set of real numbers, $\mathbb{R}$. Clearly, both $\mathbb{N}$ and $\mathbb{Z}$ are subsets of $\mathbb{R}$: $$\mathbb{N} \subseteq \mathbb{Z} \subseteq \mathbb{R}.$$

Real numbers obey certain rules (in mathematics we say axioms) of arithmetic, e.g. multiplication is distributive over addition:


In [8]:
3 * (0.5 + 100) == 3 * 0.5 + 3 * 100


Out[8]:
True

To find out more about these rules, read Harold Davenport's book The Higher Arithmetic: https://www.amazon.co.uk/Higher-Arithmetic-Introduction-Theory-Numbers/dp/0521722365

Vectors in two dimensions

One can think of other kinds of mathematical objects. They may or may not be composed of numbers.

In order to specify the location of a point on a two-dimensional plane you need a mathematical object composed of two different numbers: the $x$- and $y$-coordinates. Such a point may be given by a single mathematical object, $\mathbf{v} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}$, where we understand that the first number specifies the $x$-coordinate, while the second the $y$-coordinate.

When we were defining sets, the order didn't matter. Moreover, the multiplicity of elements in a set is ignored. Thus $$\{\text{Newton}, \text{Leibniz}\}$$ is exactly the same set as $$\{\text{Leibniz}, \text{Newton}\}$$ and $$\{\text{Newton}, \text{Leibniz}, \text{Newton}\}.$$ There are exactly two elements in this set (we only count the distinct elements), Leibniz and Newton.

Not so with points: $$\begin{pmatrix} 3 \\ 5 \end{pmatrix}$$ is distinct from $$\begin{pmatrix} 5 \\ 3 \end{pmatrix}$$ and both of these are distinct from $$\begin{pmatrix} 3 \\ 5 \\ 3 \end{pmatrix}.$$ The order of the elements and their multiplicity matter for points.
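We can see this distinction in Python, too: Python's built-in sets ignore order and multiplicity, whereas tuples of coordinates (like the points above) do not. A minimal sketch:

# Sets ignore order and multiplicity:
print({'Newton', 'Leibniz'} == {'Leibniz', 'Newton'})            # True
print({'Newton', 'Leibniz'} == {'Newton', 'Leibniz', 'Newton'})  # True
print(len({'Newton', 'Leibniz', 'Newton'}))                      # 2

# Ordered tuples of coordinates do not:
print((3, 5) == (5, 3))     # False
print((3, 5) == (3, 5, 3))  # False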

We can visualize the point $\mathbf{v} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}$ by means of a plot:


In [9]:
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.plot(3, 5, 'o', label='$\mathbf{v}$')
plt.axis([-5.5, 5.5, -5.5, 5.5])
plt.xticks([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
plt.yticks([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');


It may be useful to think of this object, $\begin{pmatrix} 3 \\ 5 \end{pmatrix}$, which we shall call a vector, as a displacement from the origin, $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$. We can then read $\begin{pmatrix} 3 \\ 5 \end{pmatrix}$ as "go to the right (of the origin) by three units, and then go up by five units". Therefore vectors may be visualized not only as points but also as arrows specifying a direction.


In [10]:
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.arrow(0, 0, 3, 5, shape='full', head_width=1, length_includes_head=True)
plt.axis([-5.5, 5.5, -5.5, 5.5])
plt.xticks([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
plt.yticks([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');


Two-dimensional vectors are said to belong to a set called the Euclidean 2-plane, denoted $\mathbb{R}^2$. The set of real numbers, $\mathbb{R}$, is sometimes referred to as the Euclidean real line, and may also be written as $\mathbb{R}^1$ (although this is rarely done in practice).

An application: two-dimensional vectors in data science

In data science, we often think of the $x$-coordinate as the input variable and the $y$-coordinate as the output variable. Consider, for example the "diabetes dataset" from Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression", Annals of Statistics (with discussion), 407-499: https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html


In [11]:
from sklearn import datasets
dataset = datasets.load_diabetes()

In this dataset, "Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of $n = 442$ diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline."

Let us consider points (vectors) where the $x$-coordinate represents the body mass index (input) and the $y$-coordinate the aforementioned "quantitative measure of disease progression" (output) corresponding to the input.


In [12]:
dataset_x = dataset.data[:, 2]
dataset_y = dataset.target

In data science our goal is to find the relationship between the output and the input. If such a relationship exists, we may be able to explain or predict the output by means of the input.

One such input-output point is


In [13]:
(dataset_x[0], dataset_y[0])


Out[13]:
(0.061696206518688498, 151.0)

another


In [14]:
(dataset_x[1], dataset_y[1])


Out[14]:
(-0.051474061238806101, 75.0)

And so on. There are


In [15]:
len(dataset_x)


Out[15]:
442

such points in total, the last one being


In [16]:
(dataset_x[len(dataset_x)-1], dataset_y[len(dataset_x)-1])


Out[16]:
(-0.073030302716424106, 57.0)

We can visualize these points by plotting them on an $xy$-plane, just as we visualized vectors as points before:


In [17]:
plt.plot(dataset_x[0], dataset_y[0], 'o', label='first point')
plt.plot(dataset_x[1], dataset_y[1], 'o', label='second point')
plt.plot(dataset_x[len(dataset_x)-1], dataset_y[len(dataset_x)-1], 'o', label='last point')
plt.legend();


To get a better idea of the relationship between the input and output, let us plot all available points, just as we plotted the three points above. The result is a scatter plot:


In [18]:
plt.plot(dataset_x, dataset_y, 'o')
plt.xlabel('body mass index')
plt.ylabel('disease progression');


The scatter plot shows that the points follow a certain pattern. In particular, the disease progression ($y$-coordinate, output) increases with the patient's body mass index ($x$-coordinate, input). Visualization by means of the scatter plot has helped us spot this relationship.

But the point we are really trying to make here is that vectors are extremely useful in data science. Before we could produce a scatter plot, we had to start thinking of each input-output pair as a vector.

Vector arithmetic

Would it make sense to define addition for vectors? And if it would, how would we define it? Thinking of vectors as displacements gives us a clue: the sum of two vectors, $\mathbf{u}$ and $\mathbf{v}$, could be defined by "go in the direction specified by $\mathbf{u}$, then in the direction specified by $\mathbf{v}$".

If, for example, $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$, then their sum would be obtained as follows:

  • Start at the origin.
  • Move in the direction specified by $\mathbf{u}$: "go to the right by five units, and then go up by three units".
  • Then move in the direction specified by $\mathbf{v}$: "go to the right by four units, and then go up by six units".

The end result?


In [19]:
u = np.array([5, 3])
v = np.array([4, 6])
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(u[0], u[1], v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', u + v + (-.5, -2.))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();


Geometrically, we have appended the arrow representing the vector $\mathbf{v}$ to the end of the arrow representing the vector $\mathbf{u}$ drawn starting at the origin.

What if we started at the origin, went in the direction specified by $\mathbf{v}$ and then went in the direction specified by $\mathbf{u}$? Where would we end up?


In [20]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.arrow(v[0], v[1], u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', v + u + (-2., -.5))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();


We would end up in the same place. More generally, for any vectors $\mathbf{u}$ and $\mathbf{v}$, vector addition is commutative, in other words, $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.


In [21]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(u[0], u[1], v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', u + v + (-.5, -2.))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.arrow(v[0], v[1], u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', v + u + (-2., -.5))
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();


The sum $\mathbf{u} + \mathbf{v}$ (which, of course, is equal to $\mathbf{v} + \mathbf{u}$ since vector addition is commutative) is itself a vector, which is represented by the diagonal of the parallelogram formed by the arrows above.


In [22]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(u[0], u[1], v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', u + v + (-.5, -2.))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.arrow(v[0], v[1], u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', v + u + (-2., -.5))
plt.arrow(0, 0, u[0] + v[0], u[1] + v[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();


We observe that the sum of $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$ is given by adding them elementwise or coordinate-wise: $\mathbf{u} + \mathbf{v} = \begin{pmatrix} 5 + 4 \\ 3 + 6 \end{pmatrix} = \begin{pmatrix} 9 \\ 9 \end{pmatrix}$.

It is indeed unsurprising that vector addition is commutative, since the addition of ordinary numbers is commutative: $$\mathbf{u} + \mathbf{v} = \begin{pmatrix} 5 + 4 \\ 3 + 6 \end{pmatrix} = \begin{pmatrix} 4 + 5 \\ 6 + 3 \end{pmatrix} = \mathbf{v} + \mathbf{u}.$$
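We can quickly confirm this elementwise rule, and the commutativity of vector addition, with NumPy (which we introduce more formally below); a minimal check:

import numpy as np

u = np.array([5, 3])
v = np.array([4, 6])

print(u + v)                   # [9 9]
print(np.all(u + v == v + u))  # True: vector addition is commutative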

Multiplication by scalars

Would it make sense to multiply a vector, such as $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ by a number, say $\alpha = 1.5$ (we'll start referring to ordinary numbers as scalars to distinguish them from vectors)? A natural way to define scalar multiplication of vectors would also be elementwise: $$\alpha \mathbf{u} = 1.5 \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 1.5 \cdot 5 \\ 1.5 \cdot 3 \end{pmatrix} = \begin{pmatrix} 7.5 \\ 4.5 \end{pmatrix}.$$

How can we interpret this geometrically? It turns out that we obtain a vector whose length is $1.5$ times that of $\mathbf{u}$, and whose direction is the same as that of $\mathbf{u}$.


In [23]:
alpha = 1.5
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\mathbf{u}$', u + (.5, -1.))
plt.arrow(0, 0, alpha * u[0], alpha * u[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate(r'$\alpha \mathbf{u}$', alpha * u + (.5, -1.))
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();


What if, instead, we multiplied $\mathbf{u}$ by $\beta = -1.5$? Well, $$\beta \mathbf{u} = -1.5 \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} -7.5 \\ -4.5 \end{pmatrix}.$$


In [24]:
alpha, beta = 1.5, -1.5
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\mathbf{u}$', u + (.5, -1.))
plt.arrow(0, 0, alpha * u[0], alpha * u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\alpha \mathbf{u}$', alpha * u + (.5, -1.))
plt.arrow(0, 0, beta * u[0], beta * u[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate(r'$\beta \mathbf{u}$', beta * u + (-.5, 1.))
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();


Geometrically, we have obtained a vector whose length is $1.5$ times that of $\mathbf{u}$, and whose direction is the opposite (because $\beta$ is negative) to that of $\mathbf{u}$.

Vectors in Python

In Python we use the NumPy library, which we usually import with


In [25]:
import numpy as np

to represent vectors as NumPy arrays:


In [26]:
u = np.array([3., 5.])
v = np.array([4., 6.])

We can then add vectors:


In [27]:
u + v


Out[27]:
array([  7.,  11.])

And multiply them by scalars:


In [28]:
1.5 * u


Out[28]:
array([ 4.5,  7.5])

In [29]:
-1.5 * u


Out[29]:
array([-4.5, -7.5])

An application: finding the mean datapoint

Let us go back to the diabetes dataset, which we started considering earlier. We saved the $x$-coordinates as


In [30]:
dataset_x


Out[30]:
array([ 0.06169621, -0.05147406,  0.04445121, -0.01159501, -0.03638469,
       -0.04069594, -0.04716281, -0.00189471,  0.06169621,  0.03906215,
       -0.08380842,  0.01750591, -0.02884001, -0.00189471, -0.02560657,
       -0.01806189,  0.04229559,  0.01211685, -0.0105172 , -0.01806189,
       -0.05686312, -0.02237314, -0.00405033,  0.06061839,  0.03582872,
       -0.01267283, -0.07734155,  0.05954058, -0.02129532, -0.00620595,
        0.04445121, -0.06548562,  0.12528712, -0.05039625, -0.06332999,
       -0.03099563,  0.02289497,  0.01103904,  0.07139652,  0.01427248,
       -0.00836158, -0.06764124, -0.0105172 , -0.02345095,  0.06816308,
       -0.03530688, -0.01159501, -0.0730303 , -0.04177375,  0.01427248,
       -0.00728377,  0.0164281 , -0.00943939, -0.01590626,  0.0250506 ,
       -0.04931844,  0.04121778, -0.06332999, -0.06440781, -0.02560657,
       -0.00405033,  0.00457217, -0.00728377, -0.0374625 , -0.02560657,
       -0.02452876, -0.01806189, -0.01482845, -0.02991782, -0.046085  ,
       -0.06979687,  0.03367309, -0.00405033, -0.02021751,  0.00241654,
       -0.03099563,  0.02828403, -0.03638469, -0.05794093, -0.0374625 ,
        0.01211685, -0.02237314, -0.03530688,  0.00996123, -0.03961813,
        0.07139652, -0.07518593, -0.00620595, -0.04069594, -0.04824063,
       -0.02560657,  0.0519959 ,  0.00457217, -0.06440781, -0.01698407,
       -0.05794093,  0.00996123,  0.08864151, -0.00512814, -0.06440781,
        0.01750591, -0.04500719,  0.02828403,  0.04121778,  0.06492964,
       -0.03207344, -0.07626374,  0.04984027,  0.04552903, -0.00943939,
       -0.03207344,  0.00457217,  0.02073935,  0.01427248,  0.11019775,
        0.00133873,  0.05846277, -0.02129532, -0.0105172 , -0.04716281,
        0.00457217,  0.01750591,  0.08109682,  0.0347509 ,  0.02397278,
       -0.00836158, -0.06117437, -0.00189471, -0.06225218,  0.0164281 ,
        0.09618619, -0.06979687, -0.02129532, -0.05362969,  0.0433734 ,
        0.05630715, -0.0816528 ,  0.04984027,  0.11127556,  0.06169621,
        0.01427248,  0.04768465,  0.01211685,  0.00564998,  0.04660684,
        0.12852056,  0.05954058,  0.09295276,  0.01535029, -0.00512814,
        0.0703187 , -0.00405033, -0.00081689, -0.04392938,  0.02073935,
        0.06061839, -0.0105172 , -0.03315126, -0.06548562,  0.0433734 ,
       -0.06225218,  0.06385183,  0.03043966,  0.07247433, -0.0191397 ,
       -0.06656343, -0.06009656,  0.06924089,  0.05954058, -0.02668438,
       -0.02021751, -0.046085  ,  0.07139652, -0.07949718,  0.00996123,
       -0.03854032,  0.01966154,  0.02720622, -0.00836158, -0.01590626,
        0.00457217, -0.04285156,  0.00564998, -0.03530688,  0.02397278,
       -0.01806189,  0.04229559, -0.0547075 , -0.00297252, -0.06656343,
       -0.01267283, -0.04177375, -0.03099563, -0.00512814, -0.05901875,
        0.0250506 , -0.046085  ,  0.00349435,  0.05415152, -0.04500719,
       -0.05794093, -0.05578531,  0.00133873,  0.03043966,  0.00672779,
        0.04660684,  0.02612841,  0.04552903,  0.04013997, -0.01806189,
        0.01427248,  0.03690653,  0.00349435, -0.07087468, -0.03315126,
        0.09403057,  0.03582872,  0.03151747, -0.06548562, -0.04177375,
       -0.03961813, -0.03854032, -0.02560657, -0.02345095, -0.06656343,
        0.03259528, -0.046085  , -0.02991782, -0.01267283, -0.01590626,
        0.07139652, -0.03099563,  0.00026092,  0.03690653,  0.03906215,
       -0.01482845,  0.00672779, -0.06871905, -0.00943939,  0.01966154,
        0.07462995, -0.00836158, -0.02345095, -0.046085  ,  0.05415152,
       -0.03530688, -0.03207344, -0.0816528 ,  0.04768465,  0.06061839,
        0.05630715,  0.09834182,  0.05954058,  0.03367309,  0.05630715,
       -0.06548562,  0.16085492, -0.05578531, -0.02452876, -0.03638469,
       -0.00836158, -0.04177375,  0.12744274, -0.07734155,  0.02828403,
       -0.02560657, -0.06225218, -0.00081689,  0.08864151, -0.03207344,
        0.03043966,  0.00888341,  0.00672779, -0.02021751, -0.02452876,
       -0.01159501,  0.02612841, -0.05901875, -0.03638469, -0.02452876,
        0.01858372, -0.0902753 , -0.00512814, -0.05255187, -0.02237314,
       -0.02021751, -0.0547075 , -0.00620595, -0.01698407,  0.05522933,
        0.07678558,  0.01858372, -0.02237314,  0.09295276, -0.03099563,
        0.03906215, -0.06117437, -0.00836158, -0.0374625 , -0.01375064,
        0.07355214, -0.02452876,  0.03367309,  0.0347509 , -0.03854032,
       -0.03961813, -0.00189471, -0.03099563, -0.046085  ,  0.00133873,
        0.06492964,  0.04013997, -0.02345095,  0.05307371,  0.04013997,
       -0.02021751,  0.01427248, -0.03422907,  0.00672779,  0.00457217,
        0.03043966,  0.0519959 ,  0.06169621, -0.00728377,  0.00564998,
        0.05415152, -0.00836158,  0.114509  ,  0.06708527, -0.05578531,
        0.03043966, -0.02560657,  0.10480869, -0.00620595, -0.04716281,
       -0.04824063,  0.08540807, -0.01267283, -0.03315126, -0.00728377,
       -0.01375064,  0.05954058,  0.02181716,  0.01858372, -0.01159501,
       -0.00297252,  0.01750591, -0.02991782, -0.02021751, -0.05794093,
        0.06061839, -0.04069594, -0.07195249, -0.05578531,  0.04552903,
       -0.00943939, -0.03315126,  0.04984027, -0.08488624,  0.00564998,
        0.02073935, -0.00728377,  0.10480869, -0.02452876, -0.00620595,
       -0.03854032,  0.13714305,  0.17055523,  0.00241654,  0.03798434,
       -0.05794093, -0.00943939, -0.02345095, -0.0105172 , -0.03422907,
       -0.00297252,  0.06816308,  0.00996123,  0.00241654, -0.03854032,
        0.02612841, -0.08919748,  0.06061839, -0.02884001, -0.02991782,
       -0.0191397 , -0.04069594,  0.01535029, -0.02452876,  0.00133873,
        0.06924089, -0.06979687, -0.02991782, -0.046085  ,  0.01858372,
        0.00133873, -0.03099563, -0.00405033,  0.01535029,  0.02289497,
        0.04552903, -0.04500719, -0.03315126,  0.097264  ,  0.05415152,
        0.12313149, -0.08057499,  0.09295276, -0.05039625, -0.01159501,
       -0.0277622 ,  0.05846277,  0.08540807, -0.00081689,  0.00672779,
        0.00888341,  0.08001901,  0.07139652, -0.02452876, -0.0547075 ,
       -0.03638469,  0.0164281 ,  0.07786339, -0.03961813,  0.01103904,
       -0.04069594, -0.03422907,  0.00564998,  0.08864151, -0.03315126,
       -0.05686312, -0.03099563,  0.05522933, -0.06009656,  0.00133873,
       -0.02345095, -0.07410811,  0.01966154, -0.01590626, -0.01590626,
        0.03906215, -0.0730303 ])

and the $y$-coordinates as


In [31]:
dataset_y


Out[31]:
array([ 151.,   75.,  141.,  206.,  135.,   97.,  138.,   63.,  110.,
        310.,  101.,   69.,  179.,  185.,  118.,  171.,  166.,  144.,
         97.,  168.,   68.,   49.,   68.,  245.,  184.,  202.,  137.,
         85.,  131.,  283.,  129.,   59.,  341.,   87.,   65.,  102.,
        265.,  276.,  252.,   90.,  100.,   55.,   61.,   92.,  259.,
         53.,  190.,  142.,   75.,  142.,  155.,  225.,   59.,  104.,
        182.,  128.,   52.,   37.,  170.,  170.,   61.,  144.,   52.,
        128.,   71.,  163.,  150.,   97.,  160.,  178.,   48.,  270.,
        202.,  111.,   85.,   42.,  170.,  200.,  252.,  113.,  143.,
         51.,   52.,  210.,   65.,  141.,   55.,  134.,   42.,  111.,
         98.,  164.,   48.,   96.,   90.,  162.,  150.,  279.,   92.,
         83.,  128.,  102.,  302.,  198.,   95.,   53.,  134.,  144.,
        232.,   81.,  104.,   59.,  246.,  297.,  258.,  229.,  275.,
        281.,  179.,  200.,  200.,  173.,  180.,   84.,  121.,  161.,
         99.,  109.,  115.,  268.,  274.,  158.,  107.,   83.,  103.,
        272.,   85.,  280.,  336.,  281.,  118.,  317.,  235.,   60.,
        174.,  259.,  178.,  128.,   96.,  126.,  288.,   88.,  292.,
         71.,  197.,  186.,   25.,   84.,   96.,  195.,   53.,  217.,
        172.,  131.,  214.,   59.,   70.,  220.,  268.,  152.,   47.,
         74.,  295.,  101.,  151.,  127.,  237.,  225.,   81.,  151.,
        107.,   64.,  138.,  185.,  265.,  101.,  137.,  143.,  141.,
         79.,  292.,  178.,   91.,  116.,   86.,  122.,   72.,  129.,
        142.,   90.,  158.,   39.,  196.,  222.,  277.,   99.,  196.,
        202.,  155.,   77.,  191.,   70.,   73.,   49.,   65.,  263.,
        248.,  296.,  214.,  185.,   78.,   93.,  252.,  150.,   77.,
        208.,   77.,  108.,  160.,   53.,  220.,  154.,  259.,   90.,
        246.,  124.,   67.,   72.,  257.,  262.,  275.,  177.,   71.,
         47.,  187.,  125.,   78.,   51.,  258.,  215.,  303.,  243.,
         91.,  150.,  310.,  153.,  346.,   63.,   89.,   50.,   39.,
        103.,  308.,  116.,  145.,   74.,   45.,  115.,  264.,   87.,
        202.,  127.,  182.,  241.,   66.,   94.,  283.,   64.,  102.,
        200.,  265.,   94.,  230.,  181.,  156.,  233.,   60.,  219.,
         80.,   68.,  332.,  248.,   84.,  200.,   55.,   85.,   89.,
         31.,  129.,   83.,  275.,   65.,  198.,  236.,  253.,  124.,
         44.,  172.,  114.,  142.,  109.,  180.,  144.,  163.,  147.,
         97.,  220.,  190.,  109.,  191.,  122.,  230.,  242.,  248.,
        249.,  192.,  131.,  237.,   78.,  135.,  244.,  199.,  270.,
        164.,   72.,   96.,  306.,   91.,  214.,   95.,  216.,  263.,
        178.,  113.,  200.,  139.,  139.,   88.,  148.,   88.,  243.,
         71.,   77.,  109.,  272.,   60.,   54.,  221.,   90.,  311.,
        281.,  182.,  321.,   58.,  262.,  206.,  233.,  242.,  123.,
        167.,   63.,  197.,   71.,  168.,  140.,  217.,  121.,  235.,
        245.,   40.,   52.,  104.,  132.,   88.,   69.,  219.,   72.,
        201.,  110.,   51.,  277.,   63.,  118.,   69.,  273.,  258.,
         43.,  198.,  242.,  232.,  175.,   93.,  168.,  275.,  293.,
        281.,   72.,  140.,  189.,  181.,  209.,  136.,  261.,  113.,
        131.,  174.,  257.,   55.,   84.,   42.,  146.,  212.,  233.,
         91.,  111.,  152.,  120.,   67.,  310.,   94.,  183.,   66.,
        173.,   72.,   49.,   64.,   48.,  178.,  104.,  132.,  220.,   57.])

We then visualized the data points using a scatter plot:


In [32]:
plt.plot(dataset_x, dataset_y, 'o')
plt.xlabel('body mass index')
plt.ylabel('disease progression');


We can use Python's list comprehensions to obtain a list of data points:


In [33]:
data_points = [np.array([[x], [y]]) for x, y in zip(dataset_x, dataset_y)]

In [34]:
data_points[0]


Out[34]:
array([[  6.16962065e-02],
       [  1.51000000e+02]])

Then use vector arithmetic to find the mean of the data points:


In [35]:
data_points_mean = (1. / len(data_points)) * np.sum(data_points, axis=0)

In [36]:
data_points_mean


Out[36]:
array([[ -8.04534920e-16],
       [  1.52133484e+02]])

and add this point to our scatter plot in a different colour:


In [37]:
plt.plot([dp[0] for dp in data_points], [dp[1] for dp in data_points], 'o')
plt.plot(data_points_mean[0], data_points_mean[1], 'ro', label="data points' mean")
plt.xlabel('body mass index')
plt.ylabel('disease progression')
plt.legend();


Direction from one vector to another

We have already seen that a vector, say $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$, can be thought of as a direction from the origin:


In [38]:
u = np.array([5, 3])

In [39]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.arrow(0, 0, u[0], u[1], shape='full', head_width=1, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');


Suppose that we have another vector, say $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$:


In [40]:
v = np.array([4, 6])

In [41]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');


Can we work out the direction from $\mathbf{u}$ to $\mathbf{v}$?

Mathematically, this direction is given by the vector $$\mathbf{d} = \mathbf{v} - \mathbf{u},$$


In [42]:
v = np.array([4, 6])
d = v - u

In [43]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.arrow(0, 0, d[0], d[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate('$\mathbf{d}$', v - u + (-.5, -2.))
plt.arrow(u[0], u[1], d[0], d[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate('$\mathbf{d}$', u + v - u + (.75, -2.))
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');


We could draw $\mathbf{d} = \mathbf{v} - \mathbf{u}$ from the origin, or we could draw it starting at the "arrow tip" of $\mathbf{u}$, in which case it will connect the "arrow tips" of $\mathbf{u}$ and $\mathbf{v}$.
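Numerically, for $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$, the direction is $\mathbf{d} = \begin{pmatrix} -1 \\ 3 \end{pmatrix}$. A quick check that following $\mathbf{d}$ from $\mathbf{u}$ indeed lands us at $\mathbf{v}$:

import numpy as np

u = np.array([5, 3])
v = np.array([4, 6])
d = v - u

print(d)                   # [-1  3]
print(np.all(u + d == v))  # True: starting at u and moving by d lands us at v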

The length of a vector: vector norm

A vector has a length (the size of the arrow) as well as a direction. We have already seen that $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $$\alpha \mathbf{u} = 1.5 \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 1.5 \cdot 5 \\ 1.5 \cdot 3 \end{pmatrix} = \begin{pmatrix} 7.5 \\ 4.5 \end{pmatrix}$$ have the same direction, but $\alpha \mathbf{u}$ is $\alpha$ times longer:


In [44]:
alpha = 1.5
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\mathbf{u}$', u + (.5, -1.))
plt.arrow(0, 0, alpha * u[0], alpha * u[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate(r'$\alpha \mathbf{u}$', alpha * u + (.5, -1.))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();


How do we obtain the length of a vector? By Pythagoras's theorem, we add up the squares of the coordinates and take the square root:


In [45]:
beta = -1.5
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\mathbf{u} = (5, 3)$', u + (-.5, 1.))
plt.arrow(0, 0, u[0], 0, head_width=.75, length_includes_head=True)
plt.annotate(r'$(5, 0)$', np.array([u[0], 0]) + (-.5, -1.))
plt.arrow(u[0], 0, 0, u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$(0, 3)$', u + (.5, -1.))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();


The resulting quantity, which is equal to the length of the vector, is called the norm of the vector and is denoted by $$\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2} = \sqrt{5^2 + 3^2} = \sqrt{34} = 5.8309518... .$$

In NumPy, we can manually compute the length of a vector...


In [46]:
u = np.array([5, 3])
u


Out[46]:
array([5, 3])

In [47]:
np.sqrt(np.sum(u * u))


Out[47]:
5.8309518948453007

...or use the library function np.linalg.norm:


In [48]:
np.linalg.norm(u)


Out[48]:
5.8309518948453007

Distance between vectors

We have already seen that $$\mathbf{d} = \mathbf{v} - \mathbf{u}$$

gives the direction from vector $\mathbf{u}$ to vector $\mathbf{v}$.

The distance between these two vectors is given by $$\|\mathbf{d}\| = \|\mathbf{v} - \mathbf{u}\|.$$

In our example,


In [49]:
np.linalg.norm(v - u)


Out[49]:
3.1622776601683795

An application: a neighbourhood of a point

We have used vector arithmetic to compute the mean of the data points and add it to the scatter plot:


In [50]:
plt.plot([dp[0] for dp in data_points], [dp[1] for dp in data_points], 'o')
plt.plot(data_points_mean[0], data_points_mean[1], 'ro', label='mean data point')
plt.xlabel('body mass index')
plt.ylabel('disease progression')
plt.legend();


Let us now compute the distance from each data point to data_points_mean:


In [51]:
distances_to_mean = [np.linalg.norm(dp - data_points_mean) for dp in data_points]

It may be interesting to examine the histogram of these distances:


In [52]:
plt.hist(distances_to_mean, bins=20);


In many machine learning algorithms, such as clustering, we end up finding datapoints within a certain fixed distance from — within a neighbourhood of — a data point.

Let us now compute the standard deviation of the distances...


In [53]:
sd_distance_to_mean = np.sqrt(np.var(distances_to_mean))

...and highlight those datapoints which fall within one standard deviation distance from the mean:


In [54]:
points_within_1_sd_from_mean = [dp for dp in data_points if np.linalg.norm(dp - data_points_mean) <= sd_distance_to_mean]

In [55]:
plt.plot(dataset_x, dataset_y, 'o')
plt.plot(data_points_mean[0], data_points_mean[1], 'ro', label='mean data point')
plt.plot([dp[0] for dp in points_within_1_sd_from_mean], [dp[1] for dp in points_within_1_sd_from_mean], 'yo')
plt.xlabel('body mass index')
plt.ylabel('disease progression')
plt.legend();


We see that, because the values of the body mass index are on a much smaller scale than those of the quantitative measure of disease progression, the body mass index contributes very little to the distance. This difference in scale can confuse some clustering algorithms.

Let us therefore normalize the variables, so that both fall within $[0, 1]$:


In [56]:
normalized_data_points = (data_points - np.min(data_points, axis=0)) \
        / (np.max(data_points, axis=0) - np.min(data_points, axis=0))

In [57]:
plt.plot([dp[0] for dp in normalized_data_points], [dp[1] for dp in normalized_data_points], 'o');


Let us obtain the set of points whose distance from the mean falls within one standard deviation in these normalized units:


In [58]:
normalized_data_points_mean = np.mean(normalized_data_points, axis=0)
distances_to_mean = [np.linalg.norm(dp - normalized_data_points_mean) for dp in normalized_data_points]
sd_distance_to_mean = np.sqrt(np.var(distances_to_mean))

In [59]:
points_within_1_sd_from_mean = [
    dp for dp in normalized_data_points \
    if np.linalg.norm(dp - normalized_data_points_mean) <= sd_distance_to_mean]

In [60]:
plt.plot([dp[0] for dp in normalized_data_points], [dp[1] for dp in normalized_data_points], 'o')
plt.plot(normalized_data_points_mean[0], normalized_data_points_mean[1], 'ro', label='mean data point')
plt.plot([dp[0] for dp in points_within_1_sd_from_mean], [dp[1] for dp in points_within_1_sd_from_mean], 'yo')
plt.xlabel('body mass index')
plt.ylabel('disease progression')
plt.legend();


The inner product, the angle between two vectors

The inner product or dot product of two vectors is the sum of products of their respective coordinates: $$\langle \mathbf{u}, \mathbf{v} \rangle = u_1 \cdot v_1 + u_2 \cdot v_2.$$

In particular, for $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$, it is given by $$\langle \mathbf{u}, \mathbf{v} \rangle = 5 \cdot 4 + 3 \cdot 6 = 38.$$

We can check our calculations using Python:


In [61]:
np.dot(u, v)


Out[61]:
38

Geometrically speaking, the inner product, when appropriately normalized, gives the cosine of the angle $\theta$ between the two vectors.

To be more precise, $$\cos \theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\| \mathbf{u} \| \| \mathbf{v} \|}.$$

Thus the angle between $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$ is given by


In [62]:
angle = np.arccos(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
angle


Out[62]:
0.44237422297674489

in radians, or


In [63]:
angle / np.pi * 180.


Out[63]:
25.34617594194669

in degrees.

We can visually verify that this is indeed true:


In [64]:
u = np.array([5, 3])
v = np.array([4, 6])
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .75))
plt.xlim(-10, 10)
plt.ylim(-10, 10);


Note also that $$\| \mathbf{u} \| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}.$$
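A quick NumPy check of this identity for $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$:

import numpy as np

u = np.array([5, 3])

print(np.sqrt(np.dot(u, u)))  # sqrt(34), approximately 5.8309519
print(np.linalg.norm(u))      # the same value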

Two vectors $\mathbf{u}$ and $\mathbf{v}$ are said to be orthogonal or perpendicular if the angle between them is 90 degrees ($\frac{\pi}{2}$ radians). Since $\cos \frac{\pi}{2} = 0$, this is equivalent to saying $$\langle \mathbf{u}, \mathbf{v} \rangle = 0.$$

Consider, for example, $\mathbf{u}$ and $\mathbf{w} = \begin{pmatrix} 1 \\ -\frac{5}{3} \end{pmatrix}$,


In [65]:
u = np.array([5, 3])
w = np.array([1, -5./3.])

In [66]:
np.dot(u, w)


Out[66]:
0.0

In [67]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(0, 0, w[0], w[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{w}$', w + (-.5, -.75))
plt.xlim(-10, 10)
plt.ylim(-10, 10);


Notice that the inner product is commutative, $$\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle.$$

Furthermore, if $\alpha$ is a scalar, then $$\langle \alpha \mathbf{u}, \mathbf{v} \rangle = \alpha \langle \mathbf{u}, \mathbf{v} \rangle,$$ and $$\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle;$$ these two properties together are referred to as linearity in the first argument.

The inner product is positive-definite. In other words, for all vectors $\mathbf{u}$, $$\langle \mathbf{u}, \mathbf{u} \rangle \geq 0,$$ and $$\langle \mathbf{u}, \mathbf{u} \rangle = 0$$ if and only if $\mathbf{u}$ is the zero vector, $\mathbf{0}$, i.e. the vector whose elements are all zero.
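These properties are easy to spot-check numerically for concrete vectors and scalars; a minimal sketch (using np.isclose to guard against floating-point rounding):

import numpy as np

u = np.array([5., 3.])
v = np.array([4., 6.])
w = np.array([1., -2.])
alpha = 2.5

print(np.isclose(np.dot(u, v), np.dot(v, u)))                     # True: commutativity
print(np.isclose(np.dot(alpha * u, v), alpha * np.dot(u, v)))     # True: homogeneity in the first argument
print(np.isclose(np.dot(u + v, w), np.dot(u, w) + np.dot(v, w)))  # True: additivity in the first argument
print(np.dot(u, u) >= 0)                                          # True: positive-definiteness
print(np.dot(np.zeros(2), np.zeros(2)) == 0)                      # True: the zero vector has zero inner product with itself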

Vectors in three dimensions

So far, we have considered vectors that have two coordinates each, corresponding to coordinates on the two-dimensional plane, $\mathbb{R}^2$. Instead, we could consider three-dimensional vectors, such as $\mathbf{a} = \begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix}$ and $\mathbf{b} = \begin{pmatrix} 4 \\ 6 \\ 4 \end{pmatrix}$:


In [68]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlim((-10, 10))
ax.set_ylim((-10, 10))
ax.set_zlim((-10, 10))
ax.quiver(0, 0, 0, 3, 5, 7, color='blue', label='$\mathbf{a}$')
ax.quiver(0, 0, 0, 4, 6, 4, color='green', label='$\mathbf{b}$')
ax.plot([0], [0], [0], 'o', markerfacecolor='none', label='origin')
ax.legend();


In the three-dimensional case, vector addition and multiplication by scalars are defined elementwise, as before:


In [69]:
a = np.array((3., 5., 7.))
b = np.array((4., 6., 4.))
a + b


Out[69]:
array([  7.,  11.,  11.])

In [70]:
alpha


Out[70]:
1.5

In [71]:
alpha * a


Out[71]:
array([  4.5,   7.5,  10.5])

In [72]:
beta = -alpha
beta * a


Out[72]:
array([ -4.5,  -7.5, -10.5])

In [73]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlim((-10, 10))
ax.set_ylim((-10, 10))
ax.set_zlim((-10, 10))
ax.quiver(0, 0, 0, a[0], a[1], a[2], color='blue', label='$\mathbf{a}$')
ax.quiver(0, 0, 0, -alpha * a[0], -alpha * a[1], -alpha * a[2], color='green', label='$-1.5 \mathbf{a}$')
ax.plot([0], [0], [0], 'o', markerfacecolor='none', label='origin')
ax.legend();


Vectors in higher dimensions

We needn't restrict ourselves to three-dimensional vectors. We could easily define $\mathbf{c} = \begin{pmatrix} 4 \\ 7 \\ 8 \\ 2 \end{pmatrix}$ and $\mathbf{d} = \begin{pmatrix} -12 \\ 3 \\ 7 \\ 3 \end{pmatrix}$, and do arithmetic elementwise, as before:


In [74]:
c = np.array((4, 7, 8, 2))
d = np.array((-12, 3, 7, 3))
c + d


Out[74]:
array([-8, 10, 15,  5])

In [75]:
alpha * c


Out[75]:
array([  6. ,  10.5,  12. ,   3. ])

We wouldn't be able to visualize four-dimensional vectors. We can nonetheless gain some geometric intuition by "pretending" that we deal with familiar two- and three-dimensional spaces.

Notice that it would only make sense to talk about adding the vectors $\mathbf{u}$ and $\mathbf{v}$ if they have the same number of elements.

In general, we talk about the vector space of two-dimensional vectors, $\mathbb{R}^2$, the vector space of three-dimensional vectors, $\mathbb{R}^3$, the vector space of four-dimensional vectors, $\mathbb{R}^4$, etc. and write $$\begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix} \in \mathbb{R}^3$$ meaning that the vector $\begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix}$ is an element of $\mathbb{R}^3$. It makes sense to talk about the addition of two vectors if they belong to the same vector space.
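NumPy enforces this: attempting to add vectors with different numbers of elements (i.e. from different vector spaces) raises an error. A minimal illustration:

import numpy as np

u = np.array([3., 5.])      # a vector in R^2
a = np.array([3., 5., 7.])  # a vector in R^3

try:
    u + a
except ValueError as e:
    print('Cannot add vectors of different dimensions:', e)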

Application: higher-dimensional vectors in data science

In data science, we usually deal with tables of observations, such as this table from another dataset, the real estate valuation dataset from the paper

Yeh, I. C., & Hsu, T. K. (2018). Building real estate valuation models with comparative approach through case-based reasoning. Applied Soft Computing, 65, 260-271

which can be found at https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set


In [76]:
import pandas as pd
df = pd.DataFrame({
    'transaction date': [2012.917, 2012.917, 2013.583, 2013.500, 2012.833], 'house age': [32.0, 19.5, 13.3, 13.3, 5.0],
    'distance to the nearest MRT station': [84.87882, 306.59470, 561.98450, 561.98450, 390.56840],
    'number of convenience stores': [10, 9, 5, 5, 5],
    'latitude': [24.98298, 24.98034, 24.98746, 24.98746, 24.97937],
    'longitude': [121.54024, 121.53951, 121.54391, 121.54391, 121.54245],
    'house price per unit area': [37.9, 42.2, 47.3, 54.8, 43.1]
}, columns=[
    'transaction date', 'house age', 'distance to the nearest MRT station', 'number of convenience stores',
    'latitude', 'longitude', 'house price per unit area'
])
df


Out[76]:
transaction date house age distance to the nearest MRT station number of convenience stores latitude longitude house price per unit area
0 2012.917 32.0 84.87882 10 24.98298 121.54024 37.9
1 2012.917 19.5 306.59470 9 24.98034 121.53951 42.2
2 2013.583 13.3 561.98450 5 24.98746 121.54391 47.3
3 2013.500 13.3 561.98450 5 24.98746 121.54391 54.8
4 2012.833 5.0 390.56840 5 24.97937 121.54245 43.1

Vectors are a natural way to represent table columns. In particular, if our goal is to predict (well, explain) house price per unit area using the other columns, we define the required outputs as a vector $$\begin{pmatrix} 37.9 \\ 42.2 \\ 47.3 \\ 54.8 \\ 43.1 \end{pmatrix}.$$

Or, in NumPy,


In [77]:
house_price = np.array([37.9, 42.2, 47.3, 54.8, 43.1])
house_price


Out[77]:
array([ 37.9,  42.2,  47.3,  54.8,  43.1])

Thus NumPy is one of the most useful Python libraries, a workhorse underlying many other libraries, such as Pandas.

Machine learning algorithms, such as linear regression, can then operate on this object to give us the desired results.

Application: dot-product and correlation

In data science we often talk about the concept of correlation. (If you are not sure about what this is, please look it up.)

Let us generate some correlated normal (Gaussian) data.


In [78]:
means = [3., 50.]
stds = [3., 7.]
corr = 0.75
covs = [[stds[0]**2          , stds[0]*stds[1]*corr], 
        [stds[0]*stds[1]*corr,           stds[1]**2]] 
data = np.random.multivariate_normal(means, covs, 50000).T
plt.scatter(data[0], data[1]);


Geometrically, correlation corresponds to the cosine of the angle between the centred (i.e. adjusted by the mean) data vectors:


In [79]:
centred_data = np.empty(np.shape(data))
centred_data[0,:] = data[0,:] - np.mean(data[0,:])
centred_data[1,:] = data[1,:] - np.mean(data[1,:])

In [80]:
np.dot(centred_data[0,:], centred_data[1,:]) / (np.linalg.norm(centred_data[0,:]) * np.linalg.norm(centred_data[1,:]))


Out[80]:
0.74805435092098194

Vectors in general: vector spaces

Mathematicians like abstraction. Indeed, much of the power of mathematics is in abstraction. The notions of a vector and vector space can be further generalized as follows.

Formally, a vector space (or linear space) over a field $F$ (such as the real numbers, $\mathbb{R}$) is a set $V$ together with two operations, vector addition and scalar multiplication, that satisfy the following eight axioms. The first four axioms stipulate the properties of vector addition alone, whereas the last four involve scalar multiplication:

  • A1: Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$.
  • A2: Commutativity of addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ for all $\mathbf{u}, \mathbf{v} \in V$.
  • A3: Identity element of addition: there exists an element $\mathbf{0} \in V$, called the zero vector, such that $\mathbf{0} + \mathbf{v} = \mathbf{v}$ for all $\mathbf{v} \in V$.
  • A4: Inverse elements of addition: for each $\mathbf{v} \in V$, there exists its additive inverse $-\mathbf{v} \in V$, such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
  • S1: Distributivity of scalar multiplication over vector addition: $\alpha(\mathbf{u} + \mathbf{v}) = \alpha \mathbf{u} + \alpha \mathbf{v}$ for all $\mathbf{u}, \mathbf{v} \in V$, $\alpha \in F$.
  • S2: Distributivity of scalar multiplication over field addition: $(\alpha + \beta)\mathbf{v} = \alpha \mathbf{v} + \beta \mathbf{v}$ for all $\alpha, \beta \in F$, $\mathbf{v} \in V$.
  • S3: Compatibility of scalar multiplication with field multiplication: $\alpha (\beta \mathbf{v}) = (\alpha \beta) \mathbf{v}$ for all $\alpha, \beta \in F$, $\mathbf{v} \in V$.
  • S4: Identity element of scalar multiplication, preservation of scale: $1 \mathbf{v} = \mathbf{v}$ for the multiplicative identity $1 \in F$, and all $\mathbf{v} \in V$.

The sets $\mathbb{R}^2$, $\mathbb{R}^3$, $\mathbb{R}^4$, and so on are all vector spaces, and the special vectors whose elements are all zeros, $$\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \mathbb{R}^2, \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{R}^3, \ldots$$ are their corresponding zero vectors.

Let us use NumPy to verify that, for example, $\mathbb{R}^3$ adheres to the above axioms:


In [81]:
u = np.array((3., 5., 7.))
v = np.array((4., 6., 4.))
w = np.array((-3., -3., 10.))

Let us check A1 (associativity of addition):


In [82]:
(u + v) + w


Out[82]:
array([  4.,   8.,  21.])

In [83]:
u + (v + w)


Out[83]:
array([  4.,   8.,  21.])

In [84]:
(u + v) + w == u + (v + w)


Out[84]:
array([ True,  True,  True], dtype=bool)

In [85]:
np.all((u + v) + w == u + (v + w))


Out[85]:
True

Let's verify A2 (commutativity of addition):


In [86]:
np.all(u + v == v + u)


Out[86]:
True

Now let's check A3 (identity element of addition):


In [87]:
zero = np.zeros(3)
zero


Out[87]:
array([ 0.,  0.,  0.])

In [88]:
np.all(zero + v == v)


Out[88]:
True

And A4 (inverse elements of addition):


In [89]:
np.all(np.array(v + (-v) == zero))


Out[89]:
True

Let's confirm S1 (distributivity of scalar multiplication over vector addition):


In [90]:
alpha = -5.
beta = 7.

In [91]:
np.all(alpha * (u + v) == alpha * u + alpha * v)


Out[91]:
True

S2 (distributivity of scalar multiplication over field addition):


In [92]:
np.all((alpha + beta) * v == alpha * v + beta * v)


Out[92]:
True

S3 (compatibility of scalar multiplication with field multiplication):


In [93]:
np.all(alpha * (beta * v) == (alpha * beta) * v)


Out[93]:
True

Finally, let's confirm S4 (identity element of scalar multiplication):


In [94]:
np.all(1 * v == v)


Out[94]:
True

The vector space of functions

There are some more unexpected vector spaces, such as the vector space of functions. Consider functions from real numbers to real numbers, $f: \mathbb{R} \rightarrow \mathbb{R}$, $g: \mathbb{R} \rightarrow \mathbb{R}$. We can define the sum of these functions as another function, $$(f + g): \mathbb{R} \rightarrow \mathbb{R},$$ such that it maps its argument $x$ to the sum of $f(x)$ and $g(x)$: $$f + g: x \mapsto f(x) + g(x).$$ We can similarly define the product of a function $f$ with a scalar $\alpha \in \mathbb{R}$: $$\alpha f: \mathbb{R} \rightarrow \mathbb{R}, \quad \alpha f: x \mapsto \alpha f(x).$$ It is then easy to see that functions, with addition and scalar multiplication defined in this manner, satisfy the axioms of a vector space:


In [95]:
u = lambda x: 2. * x

In [96]:
v = lambda x: x * x

In [97]:
w = lambda x: 3. * x + 1.

In [98]:
def plus(f1, f2):
    return lambda x: f1(x) + f2(x)

A1 (associativity of addition):


In [99]:
lhs = plus(plus(u, v), w)

In [100]:
rhs = plus(u, plus(v, w))

In [101]:
lhs(5.) == rhs(5.)


Out[101]:
True

In [102]:
lhs(10.) == rhs(10.)


Out[102]:
True

A2 (commutativity of addition):


In [103]:
plus(u, v)(5.) == plus(v, u)(5.)


Out[103]:
True

S1 (distributivity of scalar multiplication over vector addition):


In [104]:
def scalar_product(s, f):
    return lambda x: s * f(x)

In [105]:
lhs = scalar_product(alpha, plus(u, v))
rhs = plus(scalar_product(alpha, u), scalar_product(alpha, v))
lhs(5.) == rhs(5.)


Out[105]:
True

We can verify the other axioms in a similar manner.
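For instance, here is a sketch of checks for A3 and A4, reusing the plus and scalar_product helpers defined above: the zero vector of this space is the function that maps every $x$ to $0$, and the additive inverse of a function $f$ is $(-1) f$:

zero = lambda x: 0.  # the zero "vector": the function that maps every x to 0

# A3 (identity element of addition): u + 0 = u
print(plus(u, zero)(5.) == u(5.))                       # True

# A4 (inverse elements of addition): u + (-u) = 0
print(plus(u, scalar_product(-1., u))(5.) == zero(5.))  # True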

Linear combinations, linear independence, span, and basis

A weighted (by scalars) sum of vectors is called a linear combination: $$\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \alpha_3 \mathbf{v}_3 + \ldots + \alpha_k \mathbf{v}_k,$$ for example, $$3.5 \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix} + 2.7 \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix} + 2.35 \begin{pmatrix} 1 \\ 1 \\ 1.5 \end{pmatrix}.$$


In [106]:
3.5 * np.array([-3., 3., 5.]) + 2.7 * np.array([25., 7., 13.]) + 2.35 * np.array([1., 1., 1.5])


Out[106]:
array([ 59.35 ,  31.75 ,  56.125])

Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are said to be linearly independent if none of them can be written as a linear combination of the remaining vectors. Thus $$\begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix}, \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1.5 \end{pmatrix}$$ are linearly independent, whereas $$\begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix}, \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}, \begin{pmatrix} 34 \\ -2 \\ -2 \end{pmatrix}$$ aren't, because $$\begin{pmatrix} 34 \\ -2 \\ -2 \end{pmatrix} = -3 \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix} + \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}.$$
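A quick NumPy check of this linear dependence:

import numpy as np

v1 = np.array([-3., 3., 5.])
v2 = np.array([25., 7., 13.])

print(-3. * v1 + v2)  # [34. -2. -2.]: so the third vector is a linear combination of the first two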

Vectors are said to span a particular vector space if any vector in that vector space can be written as a linear combination of those vectors.

Consider the vectors $\mathbf{u} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$.

Can we obtain another vector, say $\mathbf{w} = \begin{pmatrix} -7 \\ 3 \end{pmatrix}$ as a linear combination of $\mathbf{u}$ and $\mathbf{v}$? In other words, can we find the scalars $x_1$ and $x_2$ such that $$x_1 \mathbf{u} + x_2 \mathbf{v} = \mathbf{w}?$$

This seems easy enough: what we really need is $$x_1 \begin{pmatrix} 4 \\ 6 \end{pmatrix} + x_2 \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} -7 \\ 3 \end{pmatrix},$$ i.e. $$\begin{pmatrix} 4 x_1 \\ 6 x_1 \end{pmatrix} + \begin{pmatrix} 5 x_2 \\ 3 x_2 \end{pmatrix} = \begin{pmatrix} -7 \\ 3 \end{pmatrix},$$ or $$\begin{pmatrix} 4 x_1 + 5 x_2 \\ 6 x_1 + 3 x_2 \end{pmatrix} = \begin{pmatrix} -7 \\ 3 \end{pmatrix}.$$

The left-hand side and the right-hand side must be equal coordinatewise. Thus we obtain a system of linear equations $$4 x_1 + 5 x_2 = -7,$$ $$6 x_1 + 3 x_2 = 3.$$

From the second linear equation, we obtain $$x_1 = \frac{3 - 3 x_2}{6} = \frac{1 - x_2}{2}.$$ We substitute this into the first linear equation, obtaining $$4 \cdot \frac{1 - x_2}{2} + 5 x_2 = -7,$$ whence $x_2 = -3$, and so $x_1 = \frac{1 - (-3)}{2} = 2$.

Let's check:


In [107]:
u = np.array([4, 6])
v = np.array([5, 3])
x1 = 2.; x2 = -3.
x1 * u + x2 * v


Out[107]:
array([-7.,  3.])

We notice that there is nothing special about $\mathbf{w} = \begin{pmatrix} -7 \\ 3 \end{pmatrix}$ in the above example. We could take a general $\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$ and find such $x_1, x_2$ that $$x_1 \mathbf{u} + x_2 \mathbf{v} = \mathbf{b}.$$

Our linear system then becomes $$4 x_1 + 5 x_2 = b_1,$$ $$6 x_1 + 3 x_2 = b_2.$$

From the second linear equation, we obtain $$x_1 = \frac{b_2 - 3 x_2}{6}.$$ We substitute this into the first linear equation, obtaining $$x_2 = \frac{1}{3} b_1 - \frac{2}{9} b_2,$$ hence $$x_1 = -\frac{1}{6} b_1 + \frac{5}{18} b_2.$$

We can check that these results are consistent with the above when $b_1 = -7$, $b_2 = 3$:


In [108]:
b = np.array([-7, 3])
x = np.array([-1./6. * b[0] + 5./18. * b[1], 1./3. * b[0] - 2./9. * b[1]])
x


Out[108]:
array([ 2., -3.])

Indeed they are.

A set of vectors that span their vector space and are linearly independent is called a basis for that space.

For example, the vectors $$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$ span the vector space $\mathbb{R}^2$. Any vector $\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$ in $\mathbb{R}^2$ can be written as a linear combination of these vectors, namely as $$\mathbf{b} = b_1 \mathbf{e}_1 + b_2 \mathbf{e}_2.$$

$\{\mathbf{e}_1, \mathbf{e}_2\}$ is what is known as the standard basis for $\mathbb{R}^2$, but there are others. We have already seen that the vectors in $\left\{\mathbf{u} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}, \mathbf{v} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}\right\}$ span $\mathbb{R}^2$. In fact, they are linearly independent and also form a basis of $\mathbb{R}^2$.

We have already seen that the change of basis from $\{\mathbf{e}_1, \mathbf{e}_2\}$ to $\{\mathbf{u}, \mathbf{v}\}$ is given by the above solution to the linear system, namely $$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -\frac{1}{6} b_1 + \frac{5}{18} b_2 \\ \frac{1}{3} b_1 - \frac{2}{9} b_2 \end{pmatrix}.$$ Thus we can rewrite $$\mathbf{b} = b_1 \mathbf{e}_1 + b_2 \mathbf{e}_2 = x_1 \mathbf{u} + x_2 \mathbf{v}.$$
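A minimal numerical check of this change of basis, using the formulas for $x_1$ and $x_2$ derived above and an arbitrarily chosen vector $\mathbf{b} = \begin{pmatrix} 10 \\ -2 \end{pmatrix}$:

import numpy as np

u = np.array([4., 6.])
v = np.array([5., 3.])
b = np.array([10., -2.])  # an arbitrary example vector in R^2

# Coordinates of b in the basis {u, v}, using the formulas derived above:
x1 = -1./6. * b[0] + 5./18. * b[1]
x2 = 1./3. * b[0] - 2./9. * b[1]

print(x1 * u + x2 * v)                  # recovers b (up to floating-point rounding)
print(np.allclose(x1 * u + x2 * v, b))  # True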

Change of basis forms the basis (no pun intended) of many statistical and machine learning techniques, such as principal component analysis (PCA).

It can be shown that all bases (that is the plural of basis) for a particular vector space have the same number of elements; this number is called the dimension of that vector space. Thus $\mathbb{R}^2$ is two-dimensional, $\mathbb{R}^3$ is three-dimensional, etc. By contrast, the vector space of functions, which we introduced above, can be shown to be infinite-dimensional. The study of infinite-dimensional vector spaces gives rise to a separate discipline called infinite-dimensional analysis.

If you are interested in infinite-dimensional analysis, have a look at Charalambos D. Aliprantis and Kim C. Border's book Infinite Dimensional Analysis: A Hitchhiker's Guide: https://www.amazon.co.uk/Infinite-Dimensional-Analysis-Hitchhikers-Guide/dp/3540326960/

Subspaces

A subset of a vector space is itself a vector space if it contains the zero vector and is closed under addition and scalar multiplication. It is then called a subspace of the original space.

For example, all multiples of $\mathbf{u} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$, $$\alpha \mathbf{u}, \quad \alpha \in \mathbb{R},$$ form a one-dimensional subspace of the two-dimensional vector space $\mathbb{R}^2$. (Note that the zero vector, which must be present in any vector space, is present in this subspace, since $0 \cdot \mathbf{u} = \mathbf{0}$ belongs to it.)
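
We can illustrate this closure numerically. In the sketch below the scalars are chosen arbitrarily; adding two multiples of $\mathbf{u}$, or scaling one, always lands back on a multiple of $\mathbf{u}$.

In [ ]:
# Closure of the subspace {alpha * u : alpha in R} under addition and scalar multiplication.
u = np.array([4., 6.])
alpha1, alpha2 = 1.5, -2.          # arbitrarily chosen scalars
print(alpha1 * u + alpha2 * u)     # equals (alpha1 + alpha2) * u = -0.5 * u
print(3. * (alpha1 * u))           # equals (3 * alpha1) * u = 4.5 * u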

Subspaces of Euclidean spaces

Consider a subspace of a three-dimensional Euclidean space. Suppose that it contains a vector, say $$\mathbf{v}_1 = \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix}.$$


In [109]:
v1 = np.array([-3, 3, 5])
# Plot the single vector v1 as a point in three-dimensional space.
ax = plt.figure().gca(projection='3d')
ax.scatter(v1[0], v1[1], v1[2]);


But, by the axioms of a vector space, it must also contain all scalar multiples of $\mathbf{v}_1$, e.g. $-1.5 \cdot \mathbf{v}_1$, $1.5 \cdot \mathbf{v}_1$, $3 \cdot \mathbf{v}_1$, $\ldots$.


In [110]:
v1 = np.array([-3, 3, 5])
# Plot v1 together with a range of its scalar multiples: they all lie on a line through the origin.
ax = plt.figure().gca(projection='3d')
ax.scatter(v1[0], v1[1], v1[2])
scalars = np.linspace(-3, 3, 25)
multiples_of_v1 = np.array([s * v1 for s in scalars])
ax.scatter(multiples_of_v1[:,0], multiples_of_v1[:,1], multiples_of_v1[:,2]);


Mathematically, this means that our subspace contains the line going through $\mathbf{v}_1$ and through the origin.

Our subspace of $\mathbb{R}^3$ could well be that line.

But what if it contains another vector, $\mathbf{v}_2$, which is linearly independent of $\mathbf{v}_1$, say $\mathbf{v}_2 = \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}$?

The same argument as above applies, so our subspace must also contain all scalar multiples of $\mathbf{v}_2$:


In [111]:
v1 = np.array([-3, 3, 5])
v2 = np.array([25, 7, 13])
# Plot the scalar multiples of v1 and of v2: two distinct lines through the origin.
ax = plt.figure().gca(projection='3d')
scalars = np.linspace(-3, 3, 50)
multiples_of_v1 = np.array([s * v1 for s in scalars])
multiples_of_v2 = np.array([s * v2 for s in scalars])
ax.scatter(multiples_of_v1[:,0], multiples_of_v1[:,1], multiples_of_v1[:,2])
ax.scatter(multiples_of_v2[:,0], multiples_of_v2[:,1], multiples_of_v2[:,2])
ax.scatter(v1[0], v1[1], v1[2], s=100)
ax.scatter(v2[0], v2[1], v2[2], s=100);


But the axioms of a vector space require that our subspace also contain all linear combinations of $\mathbf{v}_1$ and $\mathbf{v}_2$, $$\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2.$$ Geometrically, this means that it must contain the plane through the origin containing $\mathbf{v}_1$, $\mathbf{v}_2$, and all their linear combinations.


In [112]:
v1 = np.array([-3, 3, 5]); v2 = np.array([25, 7, 13])
ax = plt.figure().gca(projection='3d')
scalars = np.linspace(-3, 3, 50)
multiples_of_v1 = np.array([s * v1 for s in scalars])
multiples_of_v2 = np.array([s * v2 for s in scalars])
ax.scatter(multiples_of_v1[:,0], multiples_of_v1[:,1], multiples_of_v1[:,2])
ax.scatter(multiples_of_v2[:,0], multiples_of_v2[:,1], multiples_of_v2[:,2])
ax.scatter(v1[0], v1[1], v1[2], s=100)
ax.scatter(v2[0], v2[1], v2[2], s=100)
# The plane spanned by v1 and v2 has normal vector (4, 164, -96) (see the cross product below),
# so its equation is 4*x + 164*y - 96*z = 0, i.e. z = (4*x + 164*y) / 96.
xx, yy = np.meshgrid(range(-80, 80), range(-20, 20))
z = (4. * xx + 164 * yy) * (1. / 96.)
ax.plot_surface(xx, yy, z, alpha=0.2);


What if the subspace contains another vector, which is linearly independent of both $\mathbf{v}_1$ and $\mathbf{v}_2$, say $\mathbf{v}_3$?

If that is the case, then, since any point in $\mathbb{R}^3$ can be obtained as a linear combination of three linearly independent vectors, our subspace is actually equal to $\mathbb{R}^3$.
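
A quick numerical illustration: with a third, linearly independent vector (the $\mathbf{v}_3$ below is a hypothetical choice, as is the target point), any point of $\mathbb{R}^3$ can be recovered as a linear combination of the three vectors, again using `np.linalg.solve` as a preview of matrices.

In [ ]:
# Any point of R^3 is a linear combination of three linearly independent vectors.
v1 = np.array([-3., 3., 5.]); v2 = np.array([25., 7., 13.])
v3 = np.array([0., 0., 1.])               # hypothetical third vector, independent of v1 and v2
target = np.array([1., 2., 3.])           # an arbitrarily chosen point of R^3
coeffs = np.linalg.solve(np.column_stack((v1, v2, v3)), target)
coeffs[0] * v1 + coeffs[1] * v2 + coeffs[2] * v3   # reproduces the target point, up to rounding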

It is worth mentioning that the set containing one and only one vector, $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$, satisfies all the axioms of a vector space and is therefore a (trivial) subspace of $\mathbb{R}^3$.

Thus, geometrically, a subspace of the Euclidean space $\mathbb{R}^3$ is either the zero vector, or a line through the origin, or a plane through the origin, or the entire $\mathbb{R}^3$.

We can extend this reasoning to higher-dimensional Euclidean spaces.

A subspace of the Euclidean space $\mathbb{R}^4$ is either the zero vector, or a line through the origin, or a plane through the origin, or a three-dimensional hyperplane through the origin (which we cannot easily visualize), or the entire $\mathbb{R}^4$.

Not every line in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$: only the lines through the origin are, since a vector space must contain the zero vector.

Similarly, not every plane in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$: only the planes containing the origin are.

The equation of a plane

Consider a plane in 3D, which contains the point $$\mathbf{p} = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$ and such that the vector $$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$$ is orthogonal to that plane.

A vector is orthogonal to a plane if and only if it is orthogonal to every vector joining two points in that plane.

In particular, for any point $$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$ in the plane, the vector from $\mathbf{p}$ to $\mathbf{x}$ is orthogonal to $\mathbf{v}$: $$\langle (\mathbf{x} - \mathbf{p}), \mathbf{v} \rangle = 0.$$

This equation is sufficient for us to work out the equation of the plane:

$$\text{Left-hand side} = \langle (\mathbf{x} - \mathbf{p}), \mathbf{v} \rangle \\ = (x_1 - p_1) \cdot v_1 + (x_2 - p_2) \cdot v_2 + (x_3 - p_3) \cdot v_3 \\ = (x_1 - 1) \cdot 1 + (x_2 - 2) \cdot 1 + (x_3 - 3) \cdot 2 \\ = x_1 + x_2 + 2x_3 - 9.$$

Putting this together with the right-hand side, we get the equation of the plane, $$x_1 + x_2 + 2x_3 = 9.$$

Every point in that plane satisfies this equation. In particular, we can check that $\mathbf{p}$ satisfies it: $$p_1 + p_2 + 2p_3 = 1 + 2 + 2 \cdot 3 = 9.$$

And conversely, if a point satisfies this equation, it belongs to that plane. For example, if we pick $x_1 = 2, x_2 = 1$, then $$2 + 1 + 2x_3 = 9$$ implies that $x_3 = 3$. The point $\begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$ also belongs to this plane.
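
We can check these claims numerically. The sketch below evaluates $\langle \mathbf{x} - \mathbf{p}, \mathbf{v} \rangle$ for the two points above and for the origin, which does not lie in the plane.

In [ ]:
# <x - p, v> is zero exactly for the points x lying in the plane x1 + x2 + 2 * x3 = 9.
p = np.array([1., 2., 3.])
v = np.array([1., 1., 2.])
for x in [np.array([1., 2., 3.]), np.array([2., 1., 3.]), np.array([0., 0., 0.])]:
    print(x, np.dot(x - p, v))   # 0.0 for the first two points, -9.0 for the origin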

You may be wondering how we obtained the equation of the plane containing the vectors $\mathbf{v}_1 = \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix}$ and $\mathbf{v}_2 = \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}$ above.

The so-called cross product of two (three-dimensional) vectors is a vector that is orthogonal to both of them:


In [113]:
v1 = np.array([-3., 3., 5.])
v2 = np.array([25., 7., 13.])
np.cross(v1, v2)


Out[113]:
array([   4.,  164.,  -96.])

This gives us a vector that is orthogonal (normal) to the plane spanned by our two vectors. Since that plane passes through the origin, we can take $\mathbf{p} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$ and proceed to obtain the equation of the plane just as we did above, arriving at $4 x_1 + 164 x_2 - 96 x_3 = 0$.
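
As a quick check, the normal $(4, 164, -96)$ is indeed orthogonal to any linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$; the combination below is chosen arbitrarily.

In [ ]:
# The cross product of v1 and v2 is orthogonal to every linear combination of v1 and v2.
n = np.cross(v1, v2)                  # array([  4., 164., -96.])
combination = 2. * v1 - 0.5 * v2      # an arbitrarily chosen point of the plane
np.dot(n, combination)                # 0.0: the normal is orthogonal to the combination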

Bibliography

  1. An excellent, deeper introduction to linear algebra can be found in Professor Gilbert Strang's video lectures for the 18.06 Linear Algebra course at MIT.
  2. The supporting textbook for that course is Introduction to Linear Algebra, 5th edition, by Gilbert Strang.
  3. A more recent version of this book updated for data science and deep learning is Linear Algebra and Learning from Data, by Gilbert Strang, published in 2019.
  4. Another good text on linear algebra is Linear Algebra, 3rd edition, by John B. Fraleigh and Raymond A. Beauregard.
  5. We can also recommend Schaum's Outline of Linear Algebra, 6th edition, by Seymour Lipschutz and Marc Lipson.
  6. Finally, we recommend getting hold of M2N1 — Numerical Analysis lecture notes by Brad Baxter, which, in addition to theory, contain some useful exercises.