In [1]:
%matplotlib inline
import numpy as np
from numpy.random import rand, randn
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2
# Data generation
n, d, k = 100, 2, 2
np.random.seed(20)
X = rand(n, d)
means = [rand(d) * 0.5 + 0.5 , - rand(d) * 0.5 + 0.5] # for better plotting when k = 2
S = np.diag(rand(d))
sigmas = [S]*k
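(Not part of the original exercise: a quick look at the shapes we will be working with, so the notation used below is easier to follow.)
In [ ]:
print(X.shape)                        # (100, 2): N samples of dimension D
print(len(means), means[0].shape)     # 2 mean vectors of shape (d,)
print(len(sigmas), sigmas[0].shape)   # 2 diagonal covariance matrices of shape (d, d)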
The following was not clear in the instructions: the function you had to implement was compute_log_p, so you should have transformed the expression by applying a log first. It would not have changed the resulting graph, though. If you skipped this step and completed the exercise using for loops, you probably did something like the next cell.
The function to implement: $$p(x | \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu)\right)$$
In [2]:
def compute_p_forloop(X, mean, sigma):
    [n, d] = np.shape(X)
    # The constant (same for all samples) term: (2*pi)^(d/2) * |Sigma|^(1/2)
    c = np.power(2 * np.pi, d / 2) * np.power(np.linalg.det(sigma), 0.5)
    invSigma = np.linalg.inv(sigma)
    result = np.zeros((n,))
    for i in range(n):
        xmu = X[i] - mean
        result[i] = 1 / c * np.exp(-0.5 * xmu.T.dot(invSigma).dot(xmu))
    return result
And it works pretty well: comparing with our solution, the difference is very small, around $10^{-15}$.
In [3]:
### -----
### Applying log to our function + Solution
def compute_log_p_forloop(X, mean, sigma):
    return np.log(compute_p_forloop(X, mean, sigma))

def compute_log_p_solution(X, mean, sigma):
    d = X.shape[1]  # do not rely on the global d
    dxm = X - mean
    return (-0.5 * np.sum(dxm * np.dot(dxm, np.linalg.inv(sigma)), axis=1)
            - 0.5 * d * np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(sigma)))
### -----
### Difference between solution and this implementation
a = compute_log_p_forloop(X, means[0], sigmas[0])
b = compute_log_p_solution(X, means[0], sigmas[0])
print("|a-b|_2 =", np.linalg.norm(a-b))
### -----
### Print the graphs
def makeGraph(function, X, means, sigmas):
    log_ps = [function(X, m, s) for m, s in zip(means, sigmas)]
    assignments = np.argmax(log_ps, axis=0)
    colors = np.array(['red', 'green'])[assignments]
    plt.title(function.__name__)
    plt.scatter(X[:, 0], X[:, 1], c=colors, s=100)
    plt.scatter(np.array(means)[:, 0], np.array(means)[:, 1], marker='*', s=200)
    plt.show()
makeGraph(compute_log_p_forloop, X, means, sigmas)
makeGraph(compute_log_p_solution, X, means, sigmas)
Why log? Our goal is to compare probabilities to see to which of the two stars a point belongs. But the formula for the probability is a bit heavy, with multiplications and exponentials. By applying a log transform, products become sums and the exponential disappears, which is easier to handle, and it does not affect the comparison: log is monotonically increasing, so if a > b, then log(a) > log(b).
(If it does not make sense - don't worry - you'll see this in the coming lectures)
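To make this concrete, here is a small illustrative snippet (not part of the exercise): the argmax over probabilities and over log-probabilities pick the same component, and the log version keeps working even when the raw density underflows to 0.0 in high dimensions.
In [ ]:
p = np.array([1e-300, 1e-290])               # tiny but still representable probabilities
print(np.argmax(p), np.argmax(np.log(p)))    # same winner either way
print(np.exp(-2000.0), -2000.0)              # the density underflows to 0.0, its log stays finite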
Notation: $x$ is a single sample, so $[D \times 1]$; $X$ is the data matrix, so $[N \times D]$.
$$p(x | \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu)\right)$$

$$\log p(x | \mu, \Sigma) = \log \left[ \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu)\right) \right]$$

$$\log p(x | \mu, \Sigma) = \log \left[ \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \right] + \log \left[\exp\left(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu)\right) \right]$$

$$\log p(x | \mu, \Sigma) = -\log \left[ (2\pi)^{d/2} |\Sigma|^{1/2} \right] - \frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu)$$

This gives us the following expression,

$$\log p(x | \mu, \Sigma) = - \frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu) + c(\mu,\Sigma),$$

where $c(\mu, \Sigma) = -\log \left[ (2\pi)^{d/2} |\Sigma|^{1/2} \right] = - \frac{1}{2}\left[d\log(2 \pi) + \log(|\Sigma|)\right]$.
Steps used: $\log(ab) = \log(a) + \log(b)$, $\log(e^x) = x$, $\log(1/a) = -\log(a)$, and $\log(a^b) = b\log(a)$.
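As a quick numerical sanity check of the last step (a sketch; S0 here is just the first covariance generated above), both forms of the constant $c(\mu, \Sigma)$ agree:
In [ ]:
S0 = sigmas[0]
c_direct = -np.log(np.power(2 * np.pi, d / 2) * np.sqrt(np.linalg.det(S0)))
c_split = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(S0)))
print(np.isclose(c_direct, c_split))   # expected: True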
In [4]:
def compute_log_p_forloop(X, mean, sigma):
    [n, d] = np.shape(X)
    result = np.zeros((n,))
    constant = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(sigma)))
    invSigma = np.linalg.inv(sigma)
    for i in range(n):
        xmu = X[i] - mean
        result[i] = -0.5 * xmu.T.dot(invSigma).dot(xmu) + constant
    return result
### -----
### Difference between solution and this implementation
a = compute_log_p_forloop(X, means[0], sigmas[0])
b = compute_log_p_solution(X, means[0], sigmas[0])
print("|a-b|_2 =", np.linalg.norm(a-b))
### -----
### Print the graphs
makeGraph(compute_log_p_forloop, X, means, sigmas)
makeGraph(compute_log_p_solution, X, means, sigmas)
Now, how can we use numpy to avoid using for loops?
This is firstly an algebra problem, not a programming one - the programming is just translation. Our main tools are matrix multiplication, element-wise multiplication, and summation along an axis (np.dot, the * operator, and np.sum(..., axis=1) in numpy).
But first, we need to know what we are after. The expression for the log is $$\log p(x | \mu, \Sigma) = - \frac{1}{2}(x - \mu)^T\Sigma^{-1}(x-\mu) + c(\mu,\Sigma),$$ where $c(\mu, \Sigma) = -\log \left[ (2\pi)^{d/2} |\Sigma|^{1/2} \right] = - \frac{1}{2}\left[d\log(2 \pi) + \log(|\Sigma|)\right]$.
Our function compute_log_p should return a vector of N elements, one value per sample: it maps the [N x D] data matrix X to an N-element vector.
Let us focus on the matrix part of the formula, which we'll call $M$. Writing $A_i = x_i - \mu$ for the i-th row of $A = X - \mu$ (an [N x D] matrix), we have
$$ M= \begin{pmatrix} A_1^T\Sigma^{-1}A_1\\ ...\\ A_N^T\Sigma^{-1}A_N\\ \end{pmatrix} $$

Can we simplify the expression? For a single row, we have

$$M_i = A_i^T \Sigma^{-1} A_i = \begin{pmatrix} A_{i1} & ... & A_{iD} \end{pmatrix} \begin{pmatrix} \Sigma_{11} & ... & \Sigma_{1D} \\ \vdots & \ddots & \vdots \\ \Sigma_{D1} & ... & \Sigma_{DD} \\ \end{pmatrix}^{-1} \begin{pmatrix} A_{i1} \\ ... \\ A_{iD} \end{pmatrix} \text{ - Dimensions: } [1 x D] [D x D] [D x 1] $$

From here, there are two paths: simplification using the properties of the matrices we have, and a more brute-force approach.
The thing to note is that $\Sigma$ is a diagonal matrix (Data loading code: S = np.diag(rand(d))
). Therefore, we have
$$M_i = A_i^T \Sigma^{-1} A_i =
\begin{pmatrix}
A_{i1} & ... & A_{iD}
\end{pmatrix}
\begin{pmatrix}
1/\Sigma_{11} & 0 & ... & 0 \\
0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \vdots \\
0 & ... & ... & 1/\Sigma_{DD} \\
\end{pmatrix}
\begin{pmatrix}
A_{i1} \\ ... \\ A_{iD}
\end{pmatrix}
$$
Multiplying through, the diagonal matrix only rescales each coordinate, so this reduces to

$$M_i = \sum_{j=1}^D \frac{A_{ij}^2}{\Sigma_{jj}} = \begin{pmatrix} A_{i1}^2 & ... & A_{iD}^2 \end{pmatrix} \begin{pmatrix} 1/\Sigma_{11} \\ ... \\ 1/\Sigma_{DD} \end{pmatrix} $$

Notice that in this last formulation, we have a [1 x D] [D x 1] system. The [D x 1] vector is a transformation we apply to the [1 x D] input (the element-wise squared sample), and we can apply it to all samples at once by providing an [N x D] input, as follows.
Or, in (pseudo) code,

A = X - mu
A2 = A * A                          # element-wise multiplication
invSigma = np.linalg.inv(Sigma)
diagInvSigma = np.diag(invSigma)    # the vector (1/Sigma_11, ..., 1/Sigma_DD)
M = A2.dot(diagInvSigma)
compute_log_p(X, mu, sigma) = -0.5 * M + c(mu, sigma)
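A quick check of the key assumption here (a sketch, using the S generated at the top of the notebook): for a diagonal covariance, the matrix inverse is just the element-wise reciprocal of the diagonal.
In [ ]:
print(np.allclose(np.linalg.inv(S), np.diag(1.0 / np.diag(S))))   # expected: True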
In [5]:
def compute_log_p_sol1(X, mean, sigma):
    [n, d] = np.shape(X)
    constant = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(sigma)))
    diagInvSigma = np.diag(np.linalg.inv(sigma))   # 1/Sigma_jj, since Sigma is diagonal
    xmu = X - mean
    xmu2 = xmu * xmu
    return -0.5 * xmu2.dot(diagInvSigma) + constant
### -----
### Difference between solution and this implementation
a = compute_log_p_sol1(X, means[0], sigmas[0])
b = compute_log_p_solution(X, means[0], sigmas[0])
print("|a-b|_2 =", np.linalg.norm(a-b))
### -----
### Print the graphs
makeGraph(compute_log_p_sol1, X, means, sigmas)
makeGraph(compute_log_p_solution, X, means, sigmas)
If your $\Sigma$ matrix is less nice, you might need to take some more steps. Again, looking at a single sample, we have
$$M_i = A_i^T \Sigma^{-1} A_i = \begin{pmatrix} A_{i1} & ... & A_{iD} \end{pmatrix} \begin{pmatrix} \Sigma_{11} & ... & \Sigma_{1D} \\ \vdots & \ddots & \vdots \\ \Sigma_{D1} & ... & \Sigma_{DD} \\ \end{pmatrix}^{-1} \begin{pmatrix} A_{i1} \\ ... \\ A_{iD} \end{pmatrix} $$

Using $\Sigma' = \Sigma^{-1}$ to avoid confusion between the matrix inverse and the element-wise inverse, this gives

$$M_n = \begin{pmatrix} A_{n1} & ... & A_{nD} \end{pmatrix} \begin{pmatrix} \sum_{i=1}^D A_{ni} \Sigma'_{1i} \\ \vdots \\ \sum_{i=1}^D A_{ni} \Sigma'_{Di} \\ \end{pmatrix} = \sum_{i=1}^D \sum_{j=1}^D A_{ni} A_{nj} \Sigma'_{ij} $$

In the last section, we found a nice way to split the single-sample expression into a [1 x D][D x 1] matrix multiplication, with the input being the left [1 x D] matrix and the transformation being the [D x 1] matrix; extending it to an [N x D] input was easy. Here we are not assuming that $\Sigma$ is diagonal, which complicates the system a little bit. We will need to bring two tools out of the box:
1. Column summation, which transforms an [A x B] matrix into an [A x 1] matrix by summing all columns, for each row (np.sum(Y, axis=1) in numpy):

$$ \texttt{column summation of } \begin{pmatrix} y_{11} & y_{12} & ... & y_{1B} \\ y_{21} & y_{22} & ... & y_{2B} \end{pmatrix} = \begin{pmatrix} y_{11} + y_{12} + ... + y_{1B} \\ y_{21} + y_{22} + ... + y_{2B} \end{pmatrix} $$

2. Element-wise multiplication - what the * operator does on numpy matrices - written $\odot$ here,

$$ \begin{pmatrix} y_{1} & y_{2} & ... & y_{B} \end{pmatrix} \odot \begin{pmatrix} z_{1} & z_{2} & ... & z_{B} \end{pmatrix} = \begin{pmatrix} y_{1} z_{1} & y_{2} z_{2} & ... & y_{B} z_{B} \end{pmatrix} $$

Using those tools, we can work out a better representation of our formula. We first expand one of the summations by doing a reverse column summation, going from our scalar result back to a [1 x D] matrix:
$$
= \begin{pmatrix} A_{n1} & A_{n2} & ... & A_{nD} \end{pmatrix} \odot \begin{pmatrix} \sum_{j=1}^D A_{nj} \Sigma'_{1j} & \sum_{j=1}^D A_{nj} \Sigma'_{2j} & ... & \sum_{j=1}^D A_{nj} \Sigma'_{Dj} \end{pmatrix} $$

$$ = A_n \odot \begin{pmatrix} \sum_{j=1}^D A_{nj} \Sigma'_{1j} & \sum_{j=1}^D A_{nj} \Sigma'_{2j} & ... & \sum_{j=1}^D A_{nj} \Sigma'_{Dj} \end{pmatrix} $$

Now, we can see that the [1 x D] matrix on the right is the result of $A_n\Sigma'$ ([1 x D][D x D]).
And this is it. If instead of $A_n$ we plug the full matrix $A$ into the system, we get an [N x D] $\odot$ ([N x D][D x D]) system, which gives the expression for every sample in one line,

$$A \odot (A \Sigma'),$$

followed by a column summation (np.sum(..., axis=1) in numpy) to collapse each row back to the scalar $M_n$.
In [6]:
def compute_log_p_solution(X, mean, sigma):
    n, d = np.shape(X)
    c = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(sigma)))
    A = X - mean
    invSigma = np.linalg.inv(sigma)
    return -0.5 * np.sum(A * (A.dot(invSigma)), axis=1) + c
makeGraph(compute_log_p_solution, X, means, sigmas)
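For completeness, the same quadratic form can also be written with np.einsum; this is just an equivalent sketch of the $A \odot (A\Sigma')$ formulation above, not a different method.
In [ ]:
def compute_log_p_einsum(X, mean, sigma):
    n, d = np.shape(X)
    c = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(sigma)))
    A = X - mean
    # 'ni,ij,nj->n' computes A_n Sigma^{-1} A_n^T for every sample n at once
    return -0.5 * np.einsum('ni,ij,nj->n', A, np.linalg.inv(sigma), A) + c

print("|a-b|_2 =", np.linalg.norm(compute_log_p_einsum(X, means[0], sigmas[0])
                                  - compute_log_p_solution(X, means[0], sigmas[0])))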
Time.
For loops are much more expensive than linear algebra, for which we have specialized libraries: code that uses linear algebra correctly can run 10x to 100x faster than a for-loop program producing the same output. Here is a comparison of the different solutions; most of the benefit comes from using numpy correctly.
In [ ]:
import time
def generateData(n, d, k):
    X = rand(n, d)
    means = [rand(d) * 0.5 + 0.5, -rand(d) * 0.5 + 0.5]  # for better plotting when k = 2
    S = np.diag(rand(d))
    sigmas = [S] * k
    return X, means, sigmas
matrix_time = np.zeros(10,)
forloop_time = np.zeros(10,)
i = 0
Ns = np.logspace(0, 7, num=10)
for N in Ns:
    X_n, means_n, sigmas_n = generateData(int(N), 2, 2)   # rand() needs an integer sample count
    start_time = time.time()
    compute_log_p_solution(X_n, means_n[0], sigmas_n[0])
    matrix_time[i] = time.time() - start_time
    start_time = time.time()
    compute_log_p_forloop(X_n, means_n[0], sigmas_n[0])
    forloop_time[i] = time.time() - start_time
    i += 1
In [8]:
plt.title('Computation time comparison\n For loop vs. matrix operations')
h1 = plt.plot(np.floor(Ns), forloop_time, label='For loop')
h2 = plt.plot(np.floor(Ns), matrix_time, label='Matrix')
plt.xscale("log")
plt.yscale("log")
plt.ylabel('Time (s)')
plt.legend(['For loop', 'Matrix'])
plt.grid()
plt.show()
In [ ]:
import datetime
import timeit
def generateData(n, d, k):
    X = rand(n, d)
    means = [rand(d) * 0.5 + 0.5, -rand(d) * 0.5 + 0.5]  # for better plotting when k = 2
    S = np.diag(rand(d))
    sigmas = [S] * k
    return X, means, sigmas
matrix_time = np.zeros(10,)
diag_time = np.zeros(10,)
i = 0
Ns = np.logspace(0, 7, num=10)
for N in Ns:
    X_n, means_n, sigmas_n = generateData(int(N), 2, 2)
    start_time = time.time()
    compute_log_p_solution(X_n, means_n[0], sigmas_n[0])
    matrix_time[i] = time.time() - start_time
    start_time = datetime.datetime.now()
    compute_log_p_sol1(X_n, means_n[0], sigmas_n[0])
    diag_time[i] = (datetime.datetime.now() - start_time).total_seconds()
    i += 1
In [10]:
plt.title('Computation time comparison\n Assuming diagonal matrix vs. no assumption')
h1 = plt.plot(np.floor(Ns), diag_time, label='Assuming diagonal covariance')
h2 = plt.plot(np.floor(Ns), matrix_time, label='No assumption')
plt.xscale("log")
plt.yscale("log")
plt.ylabel('Time (s)')
plt.legend(['Assuming diagonal covariance', 'No assumption'])
plt.grid()
plt.show()
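A final remark on measurement: for the small-N points, a single time.time() difference is dominated by noise. Here is a sketch of a more stable measurement using the timeit module, averaging over repeated calls on the small data set X from the top of the notebook.
In [ ]:
import timeit
n_rep = 1000
t_matrix = timeit.timeit(lambda: compute_log_p_solution(X, means[0], sigmas[0]), number=n_rep) / n_rep
t_diag = timeit.timeit(lambda: compute_log_p_sol1(X, means[0], sigmas[0]), number=n_rep) / n_rep
print("matrix: %.3e s, diagonal: %.3e s (averaged over %d runs)" % (t_matrix, t_diag, n_rep))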