In [1]:
# Copyright (c) Thalesians Ltd, 2018-2019. All rights reserved
# Copyright (c) Paul Alexander Bilokon, 2018-2019. All rights reserved
# Author: Paul Alexander Bilokon <paul@thalesians.com>
# Version: 3.0 (2019.05.27)
# Previous versions: 1.0 (2018.08.03), 2.0 (2019.04.19)
# Email: education@thalesians.com
# Platform: Tested on Windows 10 with Python 3.6
In data science, machine learning (ML), and artificial intelligence (AI), we usually deal not with single numbers but with multivariate (i.e. containing multiple elements or entries) lists of numbers (mathematically speaking, vectors) and multivariate tables of numbers (mathematically speaking, matrices). Therefore we solve multivariate equations, apply multivariate calculus to find the optima of multivariate functions, and so on.
The branch of mathematics that studies vectors, matrices, and related mathematical objects is called linear algebra. It is one of the most practically useful areas of mathematics in applied work and a prerequisite for data science, machine learning (ML), and artificial intelligence (AI).
In [2]:
%matplotlib inline
In [3]:
import math
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
In everyday life, we are used to doing arithmetic with numbers, such as
In [4]:
5 + 3
Out[4]:
and
In [5]:
10 * 5
Out[5]:
The numbers 5 and 3 are mathematical objects.
Indeed, when we think about mathematics, we probably think of numbers as the fundamental objects of study. Numbers used for counting, namely $1, 2, 3, 4, 5, \ldots$ (and so on) are called natural numbers. We say that they belong to the set (i.e. a collection of objects) of natural numbers, $\mathbb{N}$, and write $$3 \in \mathbb{N}$$ to indicate that, for example, 3 belongs to this set.
Not all numbers are quite as straightforward (quite as natural). For example, the number zero wasn't invented (discovered?) until much later than the natural numbers. We sometimes write $\{0\} \cup \mathbb{N}$ to denote the set containing precisely the natural numbers along with 0. That is, the set that is the union (denoted by $\cup$) of the set of natural numbers $\mathbb{N}$ and the singleton set (i.e. a set containing exactly one element) $\{0\}$. (In mathematics we often use curly brackets to define sets by enumerating their elements.)
In mathematics we often use curly brackets to define sets by enumerating their elements, as we did in the case of $\{0\}$. While the notation $\mathbb{N}$ is standard for the set of natural numbers, we could, using the curly bracket notation, write out this set as $$\mathbb{N} = \{1, 2, 3, 4, 5, \ldots\}.$$
Then, there are the negative numbers, $\ldots, -5, -4, -3, -2, -1$. These, together with 0 and the natural numbers are collectively referred to as integers, or are said to belong to the set of integers, denoted $\mathbb{Z}$: $$\mathbb{Z} = \{\ldots, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, \ldots\}.$$
Since every element of $\mathbb{N}$ is in $\mathbb{Z}$, we say that $\mathbb{N}$ is a subset of $\mathbb{Z}$, and write $$\mathbb{N} \subseteq \mathbb{Z}.$$ Two sets $A$ and $B$ are said to be equal if $A \subseteq B$ and also $B \subseteq A$, in other words, if $A$ and $B$ contain exactly the same elements. In this case we write $$A = B.$$
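As an aside, we can mimic these membership and subset relationships with Python's built-in set type. The sketch below uses small finite stand-ins for $\mathbb{N}$ and $\mathbb{Z}$ (which are, of course, infinite), so it is purely illustrative:
naturals = {1, 2, 3, 4, 5}                          # a finite stand-in for part of N
integers = {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5}   # a finite stand-in for part of Z
print(3 in naturals)                # membership, corresponding to 3 ∈ N
print(naturals.issubset(integers))  # every element of the first set is in the second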
The negative number $-3$ is sometimes referred to as the additive inverse of $3$, because adding it to $3$ yields zero:
In [6]:
3 + (-3)
Out[6]:
There are other, somewhat unnatural numbers, such as the multiplicative inverse of 3, $\frac{1}{3}$. When multiplied by its multiplicative inverse, a number yields not zero, but one, the identity or unit:
In [7]:
3 * (1 / 3)
Out[7]:
Fractions such as $\frac{1}{3}$, together with irrational numbers such as $\pi$ and $e$, and the integers, form the set of real numbers, $\mathbb{R}$. Clearly, both $\mathbb{N}$ and $\mathbb{Z}$ are subsets of $\mathbb{R}$: $$\mathbb{N} \subseteq \mathbb{Z} \subseteq \mathbb{R}.$$
Real numbers obey certain rules (in mathematics we say axioms) of arithmetic, e.g. multiplication is distributive over addition:
In [8]:
3 * (0.5 + 100) == 3 * 0.5 + 3 * 100
Out[8]:
To find out more about these rules, read Harold Davenport's book The Higher Arithmetic: https://www.amazon.co.uk/Higher-Arithmetic-Introduction-Theory-Numbers/dp/0521722365
One can think of other kinds of mathematical objects. They may or may not be composed of numbers.
In order to specify the location of a point on a two-dimensional plane you need a mathematical object composed of two different numbers: the $x$- and $y$-coordinates. Such a point may be given by a single mathematical object, $\mathbf{v} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}$, where we understand that the first number specifies the $x$-coordinate, while the second the $y$-coordinate.
When we were defining sets, the order didn't matter. Moreover, the multiplicity of elements in a set is ignored. Thus $$\{\text{Newton}, \text{Leibnitz}\}$$ is exactly the same set as $$\{\text{Leibnitz}, \text{Newton}\}$$ and $$\{\text{Newton}, \text{Leibnitz}, \text{Newton}\}.$$ There are exactly two elements in this set (we only count the distinct elements), Leibnitz and Newton.
Not so with points: $$\begin{pmatrix} 3 \\ 5 \end{pmatrix}$$ is distinct from $$\begin{pmatrix} 5 \\ 3 \end{pmatrix}$$ and both of these are distinct from $$\begin{pmatrix} 3 \\ 5 \\ 3 \end{pmatrix}.$$ The order of the elements and their multiplicity matter for points.
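We can see this difference in Python: the built-in set type ignores order and multiplicity, whereas tuples (used here as a simple stand-in for points) do not. A small sketch:
print({'Newton', 'Leibnitz'} == {'Leibnitz', 'Newton'})            # True: order is ignored
print({'Newton', 'Leibnitz'} == {'Newton', 'Leibnitz', 'Newton'})  # True: multiplicity is ignored
print(len({'Newton', 'Leibnitz', 'Newton'}))                       # 2: only distinct elements are counted
print((3, 5) == (5, 3))                                            # False: for points, order matters
print((3, 5) == (3, 5, 3))                                         # False: so does the number of coordinates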
We can visualize the point $\mathbf{v} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}$ by means of a plot:
In [9]:
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.plot(3, 5, 'o', label='$\mathbf{v}$')
plt.axis([-5.5, 5.5, -5.5, 5.5])
plt.xticks([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
plt.yticks([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');
It may be useful to think of this object, $\begin{pmatrix} 3 \\ 5 \end{pmatrix}$, which we shall call a vector, as a displacement from the origin, $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$. We can then read $\begin{pmatrix} 3 \\ 5 \end{pmatrix}$ as "go to the right (of the origin) by three units, and then go up by five units". Therefore vectors may be visualized as arrows specifying a direction, as well as points.
In [10]:
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.arrow(0, 0, 3, 5, shape='full', head_width=1, length_includes_head=True)
plt.axis([-5.5, 5.5, -5.5, 5.5])
plt.xticks([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
plt.yticks([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5.])
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');
Two-dimensional vectors are said to belong to a set called the Euclidean 2-plane, denoted $\mathbb{R}^2$. The set of real numbers, $\mathbb{R}$, is sometimes referred to as the Euclidean real line, and may also be written as $\mathbb{R}^1$ (although this is rarely done in practice).
In data science, we often think of the $x$-coordinate as the input variable and the $y$-coordinate as the output variable. Consider, for example the "diabetes dataset" from Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression", Annals of Statistics (with discussion), 407-499: https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
In [11]:
from sklearn import datasets
dataset = datasets.load_diabetes()
In this dataset, "Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of $n = 442$ diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline."
Let us consider points (vectors) where the $x$-coordinate represents the body mass index (input) and the $y$-coordinate the aforementioned "quantitative measure of disease progression" (output) corresponding to the input.
In [12]:
dataset_x = dataset.data[:, 2]
dataset_y = dataset.target
In data science our goal is to find the relationship between the output and the input. If such a relationship exists, we may be able to explain or predict the output by means of the input.
One such input-output point is
In [13]:
(dataset_x[0], dataset_y[0])
Out[13]:
another
In [14]:
(dataset_x[1], dataset_y[1])
Out[14]:
And so on. There are
In [15]:
len(dataset_x)
Out[15]:
such points in total, the last one being
In [16]:
(dataset_x[len(dataset_x)-1], dataset_y[len(dataset_x)-1])
Out[16]:
We can visualize these points by plotting them on an $xy$-plane, just as we visualized vectors as points before:
In [17]:
plt.plot(dataset_x[0], dataset_y[0], 'o', label='first point')
plt.plot(dataset_x[1], dataset_y[1], 'o', label='second point')
plt.plot(dataset_x[len(dataset_x)-1], dataset_y[len(dataset_x)-1], 'o', label='last point')
plt.legend();
To get a better idea of the relationship between the input and output, let us plot all available points, just as we plotted the three points above. The result is a scatter plot:
In [18]:
plt.plot(dataset_x, dataset_y, 'o')
plt.xlabel('body mass index')
plt.ylabel('disease progression');
The scatter plot shows that the points follow a certain pattern. In particular, the disease progression ($y$-coordinate, output) increases with the patient's body mass index ($x$-coordinate, input). Visualization by means of the scatter plot has helped us spot this relationship.
But the point we are really trying to make here is that vectors are extremely useful in data science. Before we could produce a scatter plot, we had to start thinking of each input-output pair as a vector.
Would it make sense to define addition for vectors? And if it would, how would we define it? Thinking of vectors as displacements gives us a clue: the sum of two vectors, $\mathbf{u}$ and $\mathbf{v}$, could be defined by "go in the direction specified by $\mathbf{u}$, then in the direction specified by $\mathbf{v}$".
If, for example, $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$, then their sum would be obtained as follows:
The end result?
In [19]:
u = np.array([5, 3])
v = np.array([4, 6])
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(u[0], u[1], v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', u + v + (-.5, -2.))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();
Geometrically, we have appended the arrow representing the vector $\mathbf{v}$ to the end of the arrow representing the vector $\mathbf{u}$ drawn starting at the origin.
What if we started at the origin, went in the direction specified by $\mathbf{v}$ and then went in the direction specified by $\mathbf{u}$? Where would we end up?
In [20]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.arrow(v[0], v[1], u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', v + u + (-2., -.5))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();
We would end up in the same place. More generally, for any vectors $\mathbf{u}$ and $\mathbf{v}$, vector addition is commutative, in other words, $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.
In [21]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(u[0], u[1], v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', u + v + (-.5, -2.))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.arrow(v[0], v[1], u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', v + u + (-2., -.5))
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();
The sum $\mathbf{u} + \mathbf{v}$ (which, of course, is equal to $\mathbf{v} + \mathbf{u}$ since vector addition is commutative) is itself a vector, which is represented by the diagonal of the parallelogram formed by the arrows above.
In [22]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(u[0], u[1], v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', u + v + (-.5, -2.))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.arrow(v[0], v[1], u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', v + u + (-2., -.5))
plt.arrow(0, 0, u[0] + v[0], u[1] + v[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();
We observe that the sum of $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$ is given by adding them elementwise or coordinate-wise: $\mathbf{u} + \mathbf{v} = \begin{pmatrix} 5 + 4 \\ 3 + 6 \end{pmatrix} = \begin{pmatrix} 9 \\ 9 \end{pmatrix}$.
It is indeed unsurprising that vector addition is commutative, since the addition of ordinary numbers is commutative: $$\mathbf{u} + \mathbf{v} = \begin{pmatrix} 5 + 4 \\ 3 + 6 \end{pmatrix} = \begin{pmatrix} 4 + 5 \\ 6 + 3 \end{pmatrix} = \mathbf{v} + \mathbf{u}.$$
Would it make sense to multiply a vector, such as $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ by a number, say $\alpha = 1.5$ (we'll start referring to ordinary numbers as scalars to distinguish them from vectors)? A natural way to define scalar multiplication of vectors would also be elementwise: $$\alpha \mathbf{u} = 1.5 \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 1.5 \cdot 5 \\ 1.5 \cdot 3 \end{pmatrix} = \begin{pmatrix} 7.5 \\ 4.5 \end{pmatrix}.$$
How can we interpret this geometrically? It turns out that we obtain a vector whose length is $1.5$ times that of $\mathbf{u}$, and whose direction is the same as that of $\mathbf{u}$.
In [23]:
alpha = 1.5
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\mathbf{u}$', u + (.5, -1.))
plt.arrow(0, 0, alpha * u[0], alpha * u[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate(r'$\alpha \mathbf{u}$', alpha * u + (.5, -1.))
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();
What if, instead, we multiplied $\mathbf{u}$ by $\beta = -1.5$? Well, $$\beta \mathbf{u} = -1.5 \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} -7.5 \\ -4.5 \end{pmatrix}.$$
In [24]:
alpha, beta = 1.5, -1.5
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\mathbf{u}$', u + (.5, -1.))
plt.arrow(0, 0, alpha * u[0], alpha * u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\alpha \mathbf{u}$', alpha * u + (.5, -1.))
plt.arrow(0, 0, beta * u[0], beta * u[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate(r'$\beta \mathbf{u}$', beta * u + (-.5, 1.))
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();
Geometrically, we have obtained a vector whose length is $1.5$ times that of $\mathbf{u}$, and whose direction is the opposite (because $\beta$ is negative) to that of $\mathbf{u}$.
In Python we use the NumPy library, which we usually import with
In [25]:
import numpy as np
to represent vectors as NumPy arrays:
In [26]:
u = np.array([3., 5.])
v = np.array([4., 6.])
We can then add vectors:
In [27]:
u + v
Out[27]:
And multiply them by scalars:
In [28]:
1.5 * u
Out[28]:
In [29]:
-1.5 * u
Out[29]:
Let us go back to the diabetes dataset, which we started considering. We saved the $x$-coordinates as
In [30]:
dataset_x
Out[30]:
and the $y$-coordinates as
In [31]:
dataset_y
Out[31]:
We then visualized the data points using a scatter plot:
In [32]:
plt.plot(dataset_x, dataset_y, 'o')
plt.xlabel('body mass index')
plt.ylabel('disease progression');
We can use Python's list comprehensions to obtain a list of data points:
In [33]:
data_points = [np.array([[x], [y]]) for x, y in zip(dataset_x, dataset_y)]
In [34]:
data_points[0]
Out[34]:
Then use vector arithmetic to find the mean of the data points:
In [35]:
data_points_mean = (1. / len(data_points)) * np.sum(data_points, axis=0)
In [36]:
data_points_mean
Out[36]:
and add this point to our scatter plot in a different colour:
In [37]:
plt.plot([dp[0] for dp in data_points], [dp[1] for dp in data_points], 'o')
plt.plot(data_points_mean[0], data_points_mean[1], 'ro', label="data points' mean")
plt.xlabel('body mass index')
plt.ylabel('disease progression')
plt.legend();
We have already seen that a vector, say $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$, can be thought of as a direction from the origin:
In [38]:
u = np.array([5, 3])
In [39]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.arrow(0, 0, u[0], u[1], shape='full', head_width=1, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');
Suppose that we have another vector, say $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$:
In [40]:
v = np.array([4, 6])
In [41]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');
Can we work out the direction from $\mathbf{u}$ to $\mathbf{v}$?
Mathematically, this direction is given by the vector $$\mathbf{d} = \mathbf{v} - \mathbf{u},$$
In [42]:
v = np.array([4, 6])
d = v - u
In [43]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none', label='origin')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .5))
plt.arrow(0, 0, d[0], d[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate('$\mathbf{d}$', v - u + (-.5, -2.))
plt.arrow(u[0], u[1], d[0], d[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate('$\mathbf{d}$', u + v - u + (.75, -2.))
plt.xlim(-10, 10); plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21)); plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid()
plt.legend(loc='lower right');
We could draw $\mathbf{d} = \mathbf{v} - \mathbf{u}$ from the origin, or we could draw it starting at the "arrow tip" of $\mathbf{u}$, in which case it will connect the "arrow tips" of $\mathbf{u}$ and $\mathbf{v}$.
A vector has a length (the size of the arrow) as well as a direction. We have already seen that $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $$\alpha \mathbf{u} = 1.5 \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 1.5 \cdot 5 \\ 1.5 \cdot 3 \end{pmatrix} = \begin{pmatrix} 7.5 \\ 4.5 \end{pmatrix}$$ have the same direction, but $\alpha \mathbf{u}$ is $\alpha$ times longer:
In [44]:
alpha = 1.5
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\mathbf{u}$', u + (.5, -1.))
plt.arrow(0, 0, alpha * u[0], alpha * u[1], head_width=.75, length_includes_head=True, edgecolor='red')
plt.annotate(r'$\alpha \mathbf{u}$', alpha * u + (.5, -1.))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();
How do we obtain the length of a vector? By Pythagoras's theorem, we add up the squares of the coordinates and take the square root:
In [45]:
beta = -1.5
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$\mathbf{u} = (5, 3)$', u + (-.5, 1.))
plt.arrow(0, 0, u[0], 0, head_width=.75, length_includes_head=True)
plt.annotate(r'$(5, 0)$', np.array([u[0], 0]) + (-.5, -1.))
plt.arrow(u[0], 0, 0, u[1], head_width=.75, length_includes_head=True)
plt.annotate(r'$(0, 3)$', u + (.5, -1.))
plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.xticks(np.linspace(-10., 10., 21))
plt.yticks(np.linspace(-10., 10., 21))
plt.gca().set_aspect('equal', adjustable='box')
plt.grid();
The resulting quantity, which is equal to the length of the vector, is called the norm of the vector and is denoted by $$\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2} = \sqrt{5^2 + 3^2} = \sqrt{34} = 5.8309518... .$$
In NumPy, we can manually compute the length of a vector...
In [46]:
u = np.array([5, 3])
u
Out[46]:
In [47]:
np.sqrt(np.sum(u * u))
Out[47]:
...or use the library function np.linalg.norm:
In [48]:
np.linalg.norm(u)
Out[48]:
We have already seen that $$\mathbf{d} = \mathbf{v} - \mathbf{u}$$
gives the direction from vector $\mathbf{u}$ to vector $\mathbf{v}$.
The distance between these two vectors is given by $$\|\mathbf{d}\| = \|\mathbf{v} - \mathbf{u}\|.$$
In our example,
In [49]:
np.linalg.norm(v - u)
Out[49]:
We have used vector arithmetic to compute the mean of the data points and add it to the scatter plot:
In [50]:
plt.plot([dp[0] for dp in data_points], [dp[1] for dp in data_points], 'o')
plt.plot(data_points_mean[0], data_points_mean[1], 'ro', label='mean data point')
plt.xlabel('body mass index')
plt.ylabel('disease progression')
plt.legend();
Let us now compute the distance from each data point to data_points_mean:
In [51]:
distances_to_mean = [np.linalg.norm(dp - data_points_mean) for dp in data_points]
It may be interesting to examine the histogram of these distances:
In [52]:
plt.hist(distances_to_mean, bins=20);
In many machine learning algorithms, such as clustering, we end up finding data points within a certain fixed distance from (i.e. within a neighbourhood of) a given data point.
Let us now compute the standard deviation of the distances...
In [53]:
sd_distance_to_mean = np.sqrt(np.var(distances_to_mean))
...and highlight those data points whose distance from the mean falls within one standard deviation:
In [54]:
points_within_1_sd_from_mean = [dp for dp in data_points if np.linalg.norm(dp - data_points_mean) <= sd_distance_to_mean]
In [55]:
plt.plot(dataset_x, dataset_y, 'o')
plt.plot(data_points_mean[0], data_points_mean[1], 'ro', label='mean data point')
plt.plot([dp[0] for dp in points_within_1_sd_from_mean], [dp[1] for dp in points_within_1_sd_from_mean], 'yo')
plt.xlabel('body mass index')
plt.ylabel('disease progression')
plt.legend();
We see that, because the body mass index values are so much smaller in magnitude than the values of the quantitative measure of disease progression, the body mass index contributes little to the distance. This difference in scale can confuse some clustering algorithms.
Let us therefore normalize the units, so that, for both variables, they fall within $[0, 1]$:
In [56]:
normalized_data_points = (data_points - np.min(data_points, axis=0)) \
/ (np.max(data_points, axis=0) - np.min(data_points, axis=0))
In [57]:
plt.plot([dp[0] for dp in normalized_data_points], [dp[1] for dp in normalized_data_points], 'o');
Let us obtain the set of points whose distance from the mean falls within one standard deviation in these normalized units:
In [58]:
normalized_data_points_mean = np.mean(normalized_data_points, axis=0)
distances_to_mean = [np.linalg.norm(dp - normalized_data_points_mean) for dp in normalized_data_points]
sd_distance_to_mean = np.sqrt(np.var(distances_to_mean))
In [59]:
points_within_1_sd_from_mean = [
dp for dp in normalized_data_points \
if np.linalg.norm(dp - normalized_data_points_mean) <= sd_distance_to_mean]
In [60]:
plt.plot([dp[0] for dp in normalized_data_points], [dp[1] for dp in normalized_data_points], 'o')
plt.plot(normalized_data_points_mean[0], normalized_data_points_mean[1], 'ro', label='mean data point')
plt.plot([dp[0] for dp in points_within_1_sd_from_mean], [dp[1] for dp in points_within_1_sd_from_mean], 'yo')
plt.xlabel('body mass index')
plt.ylabel('disease progression')
plt.legend();
The inner product or dot product of two vectors is the sum of products of their respective coordinates: $$\langle \mathbf{u}, \mathbf{v} \rangle = u_1 \cdot v_1 + u_2 \cdot v_2.$$
In particular, for $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$, it is given by $$\langle \mathbf{u}, \mathbf{v} \rangle = 5 \cdot 4 + 3 \cdot 6 = 38.$$
We can check our calculations using Python:
In [61]:
np.dot(u, v)
Out[61]:
Geometrically speaking, the inner product, when appropriately normalized, gives the cosine of the angle $\theta$ between the two vectors.
To be more precise, $$\cos \theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\| \mathbf{u} \| \| \mathbf{v} \|}.$$
Thus the angle between $\mathbf{u} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$ is given by
In [62]:
angle = np.arccos(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
angle
Out[62]:
in radians, or
In [63]:
angle / np.pi * 180.
Out[63]:
in degrees.
We can visually verify that this is indeed true:
In [64]:
u = np.array([5, 3])
v = np.array([4, 6])
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(0, 0, v[0], v[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{v}$', v + (-.5, .75))
plt.xlim(-10, 10)
plt.ylim(-10, 10);
Note also that $$\| \mathbf{u} \| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}.$$
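We can check this identity numerically for the vector $\mathbf{u}$ above; a quick sketch (np.allclose allows for floating-point rounding):
import numpy as np

u = np.array([5., 3.])
# The norm of u equals the square root of the inner product of u with itself
np.allclose(np.linalg.norm(u), np.sqrt(np.dot(u, u)))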
Two vectors $\mathbf{u}$ and $\mathbf{v}$ are said to be orthogonal or perpendicular if the angle between them is 90 degrees ($\frac{\pi}{2}$ radians). Since $\cos \frac{\pi}{2} = 0$, this is equivalent to saying $$\langle \mathbf{u}, \mathbf{v} \rangle = 0.$$
Consider, for example, $\mathbf{u}$ and $\mathbf{w} = \begin{pmatrix} 1 \\ -\frac{5}{3} \end{pmatrix}$, whose inner product is zero:
In [65]:
u = np.array([5, 3])
w = np.array([1, -5./3.])
In [66]:
np.dot(u, w)
Out[66]:
In [67]:
plt.figure(figsize=(5, 5))
plt.plot(0, 0, 'o', markerfacecolor='none')
plt.arrow(0, 0, u[0], u[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{u}$', u + (-.5, -1.5))
plt.arrow(0, 0, w[0], w[1], head_width=.75, length_includes_head=True)
plt.annotate('$\mathbf{w}$', w + (-.5, -.75))
plt.xlim(-10, 10)
plt.ylim(-10, 10);
Notice that the inner product is commutative, $$\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle.$$
Furthermore, if $\alpha$ is a scalar, then $$\langle \alpha \mathbf{u}, \mathbf{v} \rangle = \alpha \langle \mathbf{u}, \mathbf{v} \rangle,$$ and $$\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle;$$ these two properties together are referred to as linearity in the first argument.
The inner product is positive-definite. In other words, for all vectors $\mathbf{u}$, $$\langle \mathbf{u}, \mathbf{u} \rangle \geq 0,$$ and $$\langle \mathbf{u}, \mathbf{u} \rangle = 0$$ if and only if $\mathbf{u}$ is the zero vector, $\mathbf{0}$, i.e. the vector whose elements are all zero.
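These properties are easy to verify numerically for particular vectors. Here is a minimal sketch, reusing the vectors from above and an arbitrary scalar (np.isclose guards against floating-point rounding):
import numpy as np

u = np.array([5., 3.]); v = np.array([4., 6.]); w = np.array([1., -5./3.])
alpha = 1.5
print(np.isclose(np.dot(u, v), np.dot(v, u)))                        # commutativity
print(np.isclose(np.dot(alpha * u, v), alpha * np.dot(u, v)))        # linearity in the first argument (scaling)
print(np.isclose(np.dot(u + v, w), np.dot(u, w) + np.dot(v, w)))     # linearity in the first argument (additivity)
print(np.dot(u, u) >= 0., np.dot(np.zeros(2), np.zeros(2)) == 0.)    # positive-definiteness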
So far, we have considered vectors that have two coordinates each, corresponding to coordinates on the two-dimensional plane, $\mathbb{R}^2$. Instead, we could consider three-dimensional vectors, such as $\mathbf{a} = \begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix}$ and $\mathbf{b} = \begin{pmatrix} 4 \\ 6 \\ 4 \end{pmatrix}$:
In [68]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlim((-10, 10))
ax.set_ylim((-10, 10))
ax.set_zlim((-10, 10))
ax.quiver(0, 0, 0, 3, 5, 7, color='blue', label='$\mathbf{a}$')
ax.quiver(0, 0, 0, 4, 6, 4, color='green', label='$\mathbf{b}$')
ax.plot([0], [0], [0], 'o', markerfacecolor='none', label='origin')
ax.legend();
In the three-dimensional case, vector addition and multiplication by scalars are defined elementwise, as before:
In [69]:
a = np.array((3., 5., 7.))
b = np.array((4., 6., 4.))
a + b
Out[69]:
In [70]:
alpha
Out[70]:
In [71]:
alpha * a
Out[71]:
In [72]:
beta = -alpha
beta * a
Out[72]:
In [73]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlim((-10, 10))
ax.set_ylim((-10, 10))
ax.set_zlim((-10, 10))
ax.quiver(0, 0, 0, a[0], a[1], a[2], color='blue', label='$\mathbf{a}$')
ax.quiver(0, 0, 0, -alpha * a[0], -alpha * a[1], -alpha * a[2], color='green', label='$-1.5 \mathbf{a}$')
ax.plot([0], [0], [0], 'o', markerfacecolor='none', label='origin')
ax.legend();
We needn't restrict ourselves to three-dimensional vectors. We could easily define $\mathbf{c} = \begin{pmatrix} 4 \\ 7 \\ 8 \\ 2 \end{pmatrix}$ and $\mathbf{d} = \begin{pmatrix} -12 \\ 3 \\ 7 \\ 3 \end{pmatrix}$, and do arithmetic elementwise, as before:
In [74]:
c = np.array((4, 7, 8, 2))
d = np.array((-12, 3, 7, 3))
c + d
Out[74]:
In [75]:
alpha * c
Out[75]:
We cannot visualize four-dimensional vectors directly. We can nonetheless gain some geometric intuition by "pretending" that we are dealing with the familiar two- and three-dimensional spaces.
Notice that it would only make sense to talk about adding the vectors $\mathbf{u}$ and $\mathbf{v}$ if they have the same number of elements.
In general, we talk about the vector space of two-dimensional vectors, $\mathbb{R}^2$, the vector space of three-dimensional vectors, $\mathbb{R}^3$, the vector space of four-dimensional vectors, $\mathbb{R}^4$, etc. and write $$\begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix} \in \mathbb{R}^3$$ meaning that the vector $\begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix}$ is an element of $\mathbb{R}^3$. It makes sense to talk about the addition of two vectors if they belong to the same vector space.
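Indeed, NumPy refuses to add arrays representing vectors of different dimensions. A small sketch (the particular vectors are arbitrary):
import numpy as np

a = np.array([3., 5., 7.])      # a vector in R^3
c = np.array([4., 7., 8., 2.])  # a vector in R^4
try:
    a + c
except ValueError as e:
    print('Cannot add vectors of different dimensions:', e)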
In data science, we usually deal with tables of observations, such as this table from another dataset, the real estate valuation dataset from the paper
Yeh, I. C., & Hsu, T. K. (2018). Building real estate valuation models with comparative approach through case-based reasoning. Applied Soft Computing, 65, 260-271
which can be found on https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set
In [76]:
import pandas as pd
df = pd.DataFrame({
'transaction date': [2012.917, 2012.917, 2013.583, 2013.500, 2012.833], 'house age': [32.0, 19.5, 13.3, 13.3, 5.0],
'distance to the nearest MRT station': [84.87882, 306.59470, 561.98450, 561.98450, 390.56840],
'number of convenience stores': [10, 9, 5, 5, 5],
'latitude': [24.98298, 24.98034, 24.98746, 24.98746, 24.97937],
'longitude': [121.54024, 121.53951, 121.54391, 121.54391, 121.54245],
'house price per unit area': [37.9, 42.2, 47.3, 54.8, 43.1]
}, columns=[
'transaction date', 'house age', 'distance to the nearest MRT station', 'number of convenience stores',
'latitude', 'longitude', 'house price per unit area'
])
df
Out[76]:
Vectors are a natural way to represent table columns. In particular, if our goal is to predict (well, explain) house price per unit area using the other columns, we define the required outputs as a vector $$\begin{pmatrix} 37.9 \\ 42.2 \\ 47.3 \\ 54.8 \\ 43.1 \end{pmatrix}.$$
Or, in NumPy,
In [77]:
house_price = np.array([37.9, 42.2, 47.3, 54.8, 43.1])
house_price
Out[77]:
NumPy is thus one of the most useful Python libraries, a workhorse underlying many other libraries, such as Pandas.
Machine learning algorithms, such as linear regression, can then operate on this object to give us the desired results.
In data science we often talk about the concept of correlation. (If you are not sure about what this is, please look it up.)
Let us generate some correlated normal (Gaussian) data.
In [78]:
means = [3., 50.]
stds = [3., 7.]
corr = 0.75
covs = [[stds[0]**2 , stds[0]*stds[1]*corr],
[stds[0]*stds[1]*corr, stds[1]**2]]
data = np.random.multivariate_normal(means, covs, 50000).T
plt.scatter(data[0], data[1]);
Geometrically, correlation corresponds to the cosine of the angle between the centred (i.e. adjusted by the mean) data vectors:
In [79]:
centred_data = np.empty(np.shape(data))
centred_data[0,:] = data[0,:] - np.mean(data[0,:])
centred_data[1,:] = data[1,:] - np.mean(data[1,:])
In [80]:
np.dot(centred_data[0,:], centred_data[1,:]) / (np.linalg.norm(centred_data[0,:]) * np.linalg.norm(centred_data[1,:]))
Out[80]:
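This should agree (up to floating-point rounding) with the sample correlation reported by NumPy's own np.corrcoef; a quick cross-check using the data array generated above:
# The off-diagonal entry of the correlation matrix is the sample correlation
# between the two rows of data; it should match the cosine computed above
np.corrcoef(data[0], data[1])[0, 1]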
Mathematicians like abstraction. Indeed, much of the power of mathematics is in abstraction. The notions of a vector and vector space can be further generalized as follows.
Formally, a vector space (or linear space) over a field $F$ (such as the real numbers, $\mathbb{R}$) is a set $V$ together with two operations, vector addition and scalar multiplication, that satisfy the following eight axioms. The first four axioms stipulate the properties of vector addition alone, whereas the last four involve scalar multiplication:
A1 (associativity of addition): $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$ for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$;
A2 (commutativity of addition): $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$;
A3 (identity element of addition): there exists a zero vector $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$ for all $\mathbf{v} \in V$;
A4 (inverse elements of addition): for every $\mathbf{v} \in V$ there exists an additive inverse $-\mathbf{v} \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$;
S1 (distributivity of scalar multiplication over vector addition): $\alpha (\mathbf{u} + \mathbf{v}) = \alpha \mathbf{u} + \alpha \mathbf{v}$;
S2 (distributivity of scalar multiplication over field addition): $(\alpha + \beta) \mathbf{v} = \alpha \mathbf{v} + \beta \mathbf{v}$;
S3 (compatibility of scalar multiplication with field multiplication): $\alpha (\beta \mathbf{v}) = (\alpha \beta) \mathbf{v}$;
S4 (identity element of scalar multiplication): $1 \mathbf{v} = \mathbf{v}$, where $1$ is the multiplicative identity in $F$.
The sets $\mathbb{R}^2$, $\mathbb{R}^3$, $\mathbb{R}^4$, and so on are all vector spaces, and the special vectors whose elements are all zeros, $$\begin{pmatrix} 0 \\ 0 \end{pmatrix} \in \mathbb{R}^2, \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \in \mathbb{R}^3, \ldots$$ are their corresponding zero vectors.
Let us use NumPy to verify that, for example, $\mathbb{R}^3$ adheres to the above axioms:
In [81]:
u = np.array((3., 5., 7.))
v = np.array((4., 6., 4.))
w = np.array((-3., -3., 10.))
Let us check A1 (associativity of addition):
In [82]:
(u + v) + w
Out[82]:
In [83]:
u + (v + w)
Out[83]:
In [84]:
(u + v) + w == u + (v + w)
Out[84]:
In [85]:
np.all((u + v) + w == u + (v + w))
Out[85]:
Let's verify A2 (commutativity of addition):
In [86]:
np.all(u + v == v + u)
Out[86]:
Now let's check A3 (identity element of addition):
In [87]:
zero = np.zeros(3)
zero
Out[87]:
In [88]:
np.all(zero + v == v)
Out[88]:
And A4 (inverse elements of addition):
In [89]:
np.all(np.array(v + (-v) == zero))
Out[89]:
Let's confirm S1 (distributivity of scalar multiplication over vector addition):
In [90]:
alpha = -5.
beta = 7.
In [91]:
np.all(alpha * (u + v) == alpha * u + alpha * v)
Out[91]:
S2 (distributivity of scalar multiplication over field addition):
In [92]:
np.all((alpha + beta) * v == alpha * v + beta * v)
Out[92]:
S3 (compatibility of scalar multiplication with field multiplication):
In [93]:
np.all(alpha * (beta * v) == (alpha * beta) * v)
Out[93]:
Finally, let's confirm S4 (identity element of scalar multiplication):
In [94]:
np.all(1 * v == v)
Out[94]:
There are some more unexpected vector spaces, such as the vector space of functions. Consider functions from real numbers to real numbers, $f: \mathbb{R} \rightarrow \mathbb{R}$, $g: \mathbb{R} \rightarrow \mathbb{R}$. We can define the sum of these functions as another function, $$(f + g): \mathbb{R} \rightarrow \mathbb{R},$$ such that it maps its argument $x$ to the sum of $f(x)$ and $g(x)$: $$f + g: x \mapsto f(x) + g(x).$$ We can similarly define the product of a function $f$ with a scalar $\alpha \in \mathbb{R}$: $$\alpha f: \mathbb{R} \rightarrow \mathbb{R}, \quad \alpha f: x \mapsto \alpha f(x).$$ It is then easy to see that functions, with addition and scalar multiplication defined in this manner, satisfy the axioms of a vector space:
In [95]:
u = lambda x: 2. * x
In [96]:
v = lambda x: x * x
In [97]:
w = lambda x: 3. * x + 1.
In [98]:
def plus(f1, f2):
return lambda x: f1(x) + f2(x)
A1 (associativity of addition):
In [99]:
lhs = plus(plus(u, v), w)
In [100]:
rhs = plus(u, plus(v, w))
In [101]:
lhs(5.) == rhs(5.)
Out[101]:
In [102]:
lhs(10.) == rhs(10.)
Out[102]:
A2 (commutativity of addition):
In [103]:
plus(u, v)(5.) == plus(v, u)(5.)
Out[103]:
S1 (distributivity of scalar multiplication over vector addition):
In [104]:
def scalar_product(s, f):
return lambda x: s * f(x)
In [105]:
lhs = scalar_product(alpha, plus(u, v))
rhs = plus(scalar_product(alpha, u), scalar_product(alpha, v))
lhs(5.) == rhs(5.)
Out[105]:
We can verify the other axioms in a similar manner.
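For instance, here is a sketch checking S3 (compatibility of scalar multiplication with field multiplication) pointwise at a few sample arguments, reusing the scalar_product helper and the function v defined above:
alpha, beta = -5., 7.
lhs = scalar_product(alpha, scalar_product(beta, v))   # alpha * (beta * v)
rhs = scalar_product(alpha * beta, v)                  # (alpha * beta) * v
[lhs(x) == rhs(x) for x in (0., 5., 10.)]              # [True, True, True]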
A weighted (by scalars) sum of vectors is called a linear combination: $$\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \alpha_3 \mathbf{v}_3 + \ldots + \alpha_k \mathbf{v}_k,$$ for example, $$3.5 \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix} + 2.7 \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix} + 2.35 \begin{pmatrix} 1 \\ 1 \\ 1.5 \end{pmatrix}.$$
In [106]:
3.5 * np.array([-3., 3., 5.]) + 2.7 * np.array([25., 7., 13.]) + 2.35 * np.array([1., 1., 1.5])
Out[106]:
Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are said to be linearly independent if none of them can be written as a linear combination of the remaining vectors. Thus $$\begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix}, \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1.5 \end{pmatrix}$$ are linearly independent, whereas $$\begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix}, \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}, \begin{pmatrix} 34 \\ -2 \\ -2 \end{pmatrix}$$ aren't, because $$\begin{pmatrix} 34 \\ -2 \\ -2 \end{pmatrix} = -3 \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix} + \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}.$$
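We can confirm this particular linear dependence numerically; a quick sketch:
import numpy as np

v1 = np.array([-3., 3., 5.])
v2 = np.array([25., 7., 13.])
# -3 * v1 + v2 reproduces the third, linearly dependent vector (34, -2, -2)
-3. * v1 + v2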
Vectors are said to span a particular vector space if any vector in that vector space can be written as a linear combination of those vectors.
Consider the vectors $\mathbf{u} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$.
Can we obtain another vector, say $\mathbf{w} = \begin{pmatrix} -7 \\ 3 \end{pmatrix}$ as a linear combination of $\mathbf{u}$ and $\mathbf{v}$? In other words, can we find the scalars $x_1$ and $x_2$ such that $$x_1 \mathbf{u} + x_2 \mathbf{v} = \mathbf{w}?$$
This seems easy enough: what we really need is $$x_1 \begin{pmatrix} 4 \\ 6 \end{pmatrix} + x_2 \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} -7 \\ 3 \end{pmatrix},$$ i.e. $$\begin{pmatrix} 4 x_1 \\ 6 x_1 \end{pmatrix} + \begin{pmatrix} 5 x_2 \\ 3 x_2 \end{pmatrix} = \begin{pmatrix} -7 \\ 3 \end{pmatrix},$$ or $$\begin{pmatrix} 4 x_1 + 5 x_2 \\ 6 x_1 + 3 x_2 \end{pmatrix} = \begin{pmatrix} -7 \\ 3 \end{pmatrix}.$$
The left-hand side and the right-hand side must be equal coordinatewise. Thus we obtain a system of linear equations $$4 x_1 + 5 x_2 = -7,$$ $$6 x_1 + 3 x_2 = 3.$$
From the second linear equation, we obtain $$x_1 = \frac{3 - 3 x_2}{6} = \frac{1 - x_2}{2}.$$ We substitute this into the first linear equation, obtaining $$4 \cdot \frac{1 - x_2}{2} + 5 x_2 = -7,$$ whence $x_2 = -3$, and so $x_1 = \frac{1 - (-3)}{2} = 2$.
Let's check:
In [107]:
u = np.array([4, 6])
v = np.array([5, 3])
x1 = 2.; x2 = -3.
x1 * u + x2 * v
Out[107]:
We notice that there is nothing special about $\mathbf{w} = \begin{pmatrix} -7 \\ 3 \end{pmatrix}$ in the above example. We could take a general $\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$ and find such $x_1, x_2$ that $$x_1 \mathbf{u} + x_2 \mathbf{v} = \mathbf{b}.$$
Our linear system then becomes $$4 x_1 + 5 x_2 = b_1,$$ $$6 x_1 + 3 x_2 = b_2.$$
From the second linear equation, we obtain $$x_1 = \frac{b_2 - 3 x_2}{6}.$$ We substitute this into the first linear equation, obtaining $$x_2 = \frac{1}{3} b_1 - \frac{2}{9} b_2,$$ hence $$x_1 = -\frac{1}{6} b_1 + \frac{5}{18} b_2.$$
We can check that these results are consistent with the above when $b_1 = -7$, $b_2 = 3$:
In [108]:
b = np.array([-7, 3])
x = np.array([-1./6. * b[0] + 5./18. * b[1], 1./3. * b[0] - 2./9. * b[1]])
x
Out[108]:
Indeed they are.
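As an aside, rather than solving such a linear system by hand, we could let NumPy solve it for us. A minimal sketch: np.column_stack places $\mathbf{u}$ and $\mathbf{v}$ side by side as the columns of the coefficient matrix of the system, and np.linalg.solve returns $x_1$ and $x_2$:
import numpy as np

u = np.array([4., 6.]); v = np.array([5., 3.]); b = np.array([-7., 3.])
# Solve x1 * u + x2 * v = b; the result should be [2., -3.], as before
np.linalg.solve(np.column_stack([u, v]), b)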
A set of vectors that span their vector space and are linearly independent is called a basis for that space.
For example, the vectors $$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$ span the vector space $\mathbb{R}^2$. Any vector $\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$ in $\mathbb{R}^2$ can be written as a linear combination of these vectors, namely as $$\mathbf{b} = b_1 \mathbf{e}_1 + b_2 \mathbf{e}_2.$$
$\{\mathbf{e}_1, \mathbf{e}_2\}$ is what is known as the standard basis for $\mathbb{R}^2$, but there are others. We have already seen that the vectors in $\left\{\mathbf{u} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}, \mathbf{v} = \begin{pmatrix} 5 \\ 3 \end{pmatrix}\right\}$ span $\mathbb{R}^2$. In fact, they are linearly independent and also form a basis of $\mathbb{R}^2$.
We have already seen that the change of basis from $\{\mathbf{e}_1, \mathbf{e}_2\}$ to $\{\mathbf{u}, \mathbf{v}\}$ is given by the above solution to the linear system, namely $$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -\frac{1}{6} b_1 + \frac{5}{18} b_2 \\ \frac{1}{3} b_1 - \frac{2}{9} b_2 \end{pmatrix}.$$ Thus we can rewrite $$\mathbf{b} = b_1 \mathbf{e}_1 + b_2 \mathbf{e}_2 = x_1 \mathbf{u} + x_2 \mathbf{v}.$$
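We can verify this change of basis numerically for the particular $\mathbf{b}$ used above; a short sketch:
import numpy as np

e1 = np.array([1., 0.]); e2 = np.array([0., 1.])
u = np.array([4., 6.]); v = np.array([5., 3.])
b1, b2 = -7., 3.
x1 = -1./6. * b1 + 5./18. * b2
x2 = 1./3. * b1 - 2./9. * b2
# Both linear combinations reproduce the same vector b = (-7, 3)
print(b1 * e1 + b2 * e2)
print(x1 * u + x2 * v)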
Change of basis forms the basis (no pun intended) of many statistical and machine learning techniques, such as the principal components analysis (PCA).
It can be shown that all bases (that's the plural of the word basis) for a particular vector space have the same number of elements, called the dimension of that vector space. Thus $\mathbb{R}^2$ is two-dimensional, $\mathbb{R}^3$ is three-dimensional, etc. On the other hand, it can be shown that the vector space of functions, which we introduced above, is infinite-dimensional. The study of infinite-dimensional vector spaces gives rise to a separate discipline called infinite-dimensional analysis.
If you are interested in infinite-dimensional analysis, have a look at Charalambos D. Aliprantis's book Infinite Dimensional Analysis: A Hitchhiker's Guide: https://www.amazon.co.uk/Infinite-Dimensional-Analysis-Hitchhikers-Guide/dp/3540326960/
A subset of a vector space is itself a vector space if it contains the zero vector and is closed under addition and scalar multiplication. It is then called a subspace of the original space.
For example, all multiples of $\mathbf{u} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$, $$\alpha \mathbf{u}, \quad \alpha \in \mathbb{R},$$ form a one-dimensional subspace of the two dimensional vector space $\mathbb{R}^2$. (Note that the zero vector, which must be present in any vector space, is present in this subspace, since $0 \cdot \mathbf{u}$ is in it.)
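A quick numerical sketch of this closure (the particular scalars are arbitrary): a sum of two multiples of $\mathbf{u}$ is again a multiple of $\mathbf{u}$, as is a scalar multiple of a multiple of $\mathbf{u}$:
import numpy as np

u = np.array([4., 6.])
s = 2.5 * u + (-4.) * u   # a sum of two multiples of u: (2.5 - 4) * u
t = 3. * (1.5 * u)        # a scalar multiple of a multiple of u: (3 * 1.5) * u
print(np.allclose(s, -1.5 * u))
print(np.allclose(t, 4.5 * u))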
Consider a subspace of a three-dimensional Euclidean space. Suppose that it contains a vector, say $$\mathbf{v}_1 = \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix}.$$
In [109]:
v1 = np.array([-3, 3, 5])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(v1[0], v1[1], v1[2]);
But, by the axioms of a vector space, it must also contain all scalar multiples of $\mathbf{v}_1$, e.g. $-1.5 \cdot \mathbf{v}_1$, $1.5 \cdot \mathbf{v}_1$, $3 \cdot \mathbf{v}_1$, $\ldots$.
In [110]:
v1 = np.array([-3, 3, 5])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(v1[0], v1[1], v1[2])
scalars = np.linspace(-3, 3, 25)
multiples_of_v1 = np.array([s * v1 for s in scalars])
ax.scatter(multiples_of_v1[:,0], multiples_of_v1[:,1], multiples_of_v1[:,2]);
Mathematically, this means that our subspace contains the line going through $\mathbf{v}_1$ and through the origin.
Our subspace of $\mathbb{R}^3$ could well be that line.
But what if it contains another vector, $\mathbf{v}_2$, which is linearly independent of $\mathbf{v}_1$, say $\mathbf{v}_2 = \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}$?
The same argument as above applies, so our vector space must also contain all scalar multiples of $\mathbf{v}_2$:
In [111]:
v1 = np.array([-3, 3, 5])
v2 = np.array([25, 7, 13])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scalars = np.linspace(-3, 3, 50)
multiples_of_v1 = np.array([s * v1 for s in scalars])
multiples_of_v2 = np.array([s * v2 for s in scalars])
ax.scatter(multiples_of_v1[:,0], multiples_of_v1[:,1], multiples_of_v1[:,2])
ax.scatter(multiples_of_v2[:,0], multiples_of_v2[:,1], multiples_of_v2[:,2])
ax.scatter(v1[0], v1[1], v1[2], s=100)
ax.scatter(v2[0], v2[1], v2[2], s=100);
But the axioms of the vector space require that the vector space also contain all linear combinations of $\mathbf{v}_1$ and $\mathbf{v}_2$, $$\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2.$$ Geometrically, this means that it must contain the plane containing $\mathbf{v}_1$, $\mathbf{v}_2$ and all their linear combinations (including the origin).
In [112]:
v1 = np.array([-3, 3, 5]); v2 = np.array([25, 7, 13])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
scalars = np.linspace(-3, 3, 50)
multiples_of_v1 = np.array([s * v1 for s in scalars])
multiples_of_v2 = np.array([s * v2 for s in scalars])
ax.scatter(multiples_of_v1[:,0], multiples_of_v1[:,1], multiples_of_v1[:,2])
ax.scatter(multiples_of_v2[:,0], multiples_of_v2[:,1], multiples_of_v2[:,2])
ax.scatter(v1[0], v1[1], v1[2], s=100)
ax.scatter(v2[0], v2[1], v2[2], s=100)
xx, yy = np.meshgrid(range(-80, 80), range(-20, 20))
z = (4. * xx + 164 * yy) * (1. / 96.)
ax.plot_surface(xx, yy, z, alpha=0.2);
What if the subspace contains another vector, which is linearly independent of both $\mathbf{v}_1$ and $\mathbf{v}_2$, say $\mathbf{v}_3$?
If that is the case, since we can obtain any point in $\mathbb{R}^3$ as a linear combination of three vectors, our subspace is actually equal to $\mathbb{R}^3$.
It is worth mentioning that the subspace containing one and only one vector, $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$, satisfies all the axioms of a vector space and is therefore a (trivial) subspace of $\mathbb{R}^3$.
Thus, geometrically, a subspace of the Euclidean space $\mathbb{R}^3$ is either the zero vector, or a line through the origin, or a plane through the origin, or the entire $\mathbb{R}^3$.
We can extend this reasoning to higher-dimensional Euclidean spaces.
A subspace of the Euclidean space $\mathbb{R}^4$ is either the zero vector, or a line through the origin, or a plane through the origin, or a three-dimensional hyperplane through the origin (which we cannot easily visualize), or the entire $\mathbb{R}^4$.
Not every line in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$. Only the lines going through the origin, as a vector space must contain the zero vector.
Similarly, not every plane in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$. Only the planes containing the origin.
Consider a plane in 3D, which contains the point $$\mathbf{p} = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$ and such that the vector $$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$$ is orthogonal to that plane.
A vector is orthogonal to a plane iff it is orthogonal to vectors joining any two points in that plane.
In particular, for any point $$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},$$ the vector from it to $\mathbf{p}$ is orthogonal to $\mathbf{v}$, $$\langle (\mathbf{x} - \mathbf{p}), \mathbf{v} \rangle = 0.$$
This equation is sufficient for us to work out the equation of the plane. Expanding the inner product, $$\langle (\mathbf{x} - \mathbf{p}), \mathbf{v} \rangle = v_1 (x_1 - p_1) + v_2 (x_2 - p_2) + v_3 (x_3 - p_3) = 1 \cdot (x_1 - 1) + 1 \cdot (x_2 - 2) + 2 \cdot (x_3 - 3) = x_1 + x_2 + 2 x_3 - 9.$$
Setting this to zero, we get the equation of the plane, $$x_1 + x_2 + 2x_3 = 9.$$
Every point in that plane satisfies this equation. In particular, we can check that $\mathbf{p}$ satisfies it: $$p_1 + p_2 + 2p_3 = 1 + 2 + 2 \cdot 3 = 9.$$
And conversely, if a point satisfies this equation, it belongs to that plane. For example, if we pick $x_1 = 2, x_2 = 1$, then $$2 + 1 + 2x_3 = 9$$ implies that $x_3 = 3$. The point $\begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}$ also belongs to this plane.
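A quick numerical check of these membership claims, using the inner-product condition above: for a point on the plane the inner product vanishes, while for a point off the plane it does not (the off-plane point below is an arbitrary example):
import numpy as np

p = np.array([1., 2., 3.])
v = np.array([1., 1., 2.])
x_on = np.array([2., 1., 3.])    # satisfies x1 + x2 + 2 * x3 = 9
x_off = np.array([2., 1., 4.])   # does not satisfy the equation
print(np.dot(x_on - p, v))       # 0.0
print(np.dot(x_off - p, v))      # 2.0, so x_off is not on the plane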
You may be wondering how we obtained the equation of the plane containing the vectors $\mathbf{v}_1 = \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix}$ and $\mathbf{v}_2 = \begin{pmatrix} 25 \\ 7 \\ 13 \end{pmatrix}$ above.
The so-called cross-product operation on two vectors gives a vector that is orthogonal to both of them:
In [113]:
v1 = np.array([-3., 3., 5.])
v2 = np.array([25., 7., 13.])
np.cross(v1, v2)
Out[113]:
This gives us a vector that is orthogonal (normal) to the plane containing our two vectors. We can then proceed to obtain the equation of the plane, just as we did above.
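For completeness, here is a sketch of how the normal vector returned by np.cross leads to the plane equation used in the earlier plot. The plane passes through the origin, so its points satisfy $\langle \mathbf{x}, \mathbf{n} \rangle = 0$, which we can rearrange to express $z$ in terms of $x$ and $y$:
import numpy as np

v1 = np.array([-3., 3., 5.])
v2 = np.array([25., 7., 13.])
n = np.cross(v1, v2)             # the normal vector (4, 164, -96)
# The plane through the origin with normal n satisfies n[0]*x + n[1]*y + n[2]*z = 0,
# i.e. z = -(n[0]*x + n[1]*y) / n[2] = (4*x + 164*y) / 96, as used in the plot above.
# Both v1 and v2 lie in this plane: their inner products with n vanish.
print(np.dot(n, v1), np.dot(n, v2))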