The expected value of a function $\phi: \mathcal{X} \rightarrow \mathbb{R}$ of a random variable $X$ is defined as
$$ E[\phi(X)] = \int \phi(x) p(x) \, dx $$Intuitively, this is the average value that the function $\phi$ takes when given random inputs $X$ drawn from the distribution $p(x)$.
Some test functions $\phi$ are of special interest: for example, $\phi(x) = x$ gives the mean and $\phi(x) = (x - E[x])^2$ gives the variance.
Suppose we are given a dataset $X = \{x_1, x_2, \dots, x_N\}$. The associated empirical distribution places a point mass of $1/N$ at each observation:
$$ \tilde{p}(x) = \frac{1}{N}\sum_{i=1}^N \delta(x - x_i) $$For a dataset of pairs $X = \{(x_1,y_1), (x_2,y_2), \dots, (x_N, y_N)\}$, the empirical distribution is
$$ \tilde{p}(x, y) = \frac{1}{N}\sum_{i=1}^N \delta(x - x_i)\delta(y - y_i) $$Expectations with respect to the empirical distribution reduce to sample averages:
$$ E[x] = \int x \tilde{p}(x) \, dx = \int x \frac{1}{N}\sum_{i=1}^N \delta(x - x_i) \, dx = \frac{1}{N}\sum_{i=1}^N x_i \equiv s_1/N $$$$ \mathrm{Var}[x] = \int (x-E[x])^2 \tilde{p}(x) \, dx = E[x^2] - E[x]^2 = \frac{1}{N}\sum_{i=1}^N x^2_i - \frac{1}{N^2}s_1^2 \equiv \frac{1}{N}s_2 - \frac{1}{N^2}s_1^2 $$Here, $s_1 = \sum_{i=1}^N x_i$ and $s_2 = \sum_{i=1}^N x_i^2$ are the first and second moment sums; dividing by $N$ gives the first and second sample moments.
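As a quick sanity check, here is a minimal sketch (not from the original notebook) that computes the empirical mean and variance from the moment sums $s_1$ and $s_2$ and compares them against the NumPy built-ins, which use the same $1/N$ normalization by default:
In [ ]:
import numpy as np
x = np.random.randn(1000)
N = len(x)
s1 = np.sum(x)     # first moment sum
s2 = np.sum(x**2)  # second moment sum
print(s1/N, np.mean(x))             # empirical mean
print(s2/N - (s1/N)**2, np.var(x))  # empirical variance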
A generative model is a computational procedure with random inputs that describes how to simulate a dataset $X$. The model defines a joint distribution, denoted $p(X, H)$, over the variables of the dataset and possibly additional hidden (unobserved) variables and parameters $H$ that drive the data generation mechanism.
A more recent term for a generative model is a probabilistic program.
Given a generative model and a dataset, the posterior distribution $p(H|X)$ over the hidden variables can be computed via Bayesian inference. The hidden variables and parameters provide explanations for the observed data.
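Concretely, the posterior follows from Bayes' rule applied to the joint distribution defined by the model:
$$ p(H|X) = \frac{p(X, H)}{p(X)} = \frac{p(X|H)p(H)}{\int p(X|H)p(H) \, dH} $$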
In [2]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
N = 50
# Draw N points u = cos(2*pi*w) with w ~ Uniform(0,1)
u = np.cos(2*np.pi*np.random.rand(N))
plt.figure(figsize=(6,2))
plt.plot(u, np.zeros_like(u), 'o')
plt.show()
In [3]:
N = 500
# Same generator with more samples; the histogram shows how u piles up near the endpoints +/-1
u = np.cos(2*np.pi*np.random.rand(N))
plt.figure(figsize=(6,2))
plt.hist(u, bins=30)
plt.show()
In [4]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
N = 100
sigma = 0.8
theta = np.array([[3.], [-1.]])
u = np.cos(2*np.pi*np.random.rand(1,N))
X = theta*u  # outer product by broadcasting: (2,1)*(1,N) -> (2,N)
X = X + sigma*u*np.random.randn(X.shape[0],X.shape[1])  # noise scaled elementwise by u
plt.figure(figsize=(6,6))
plt.plot(X[0,:],X[1,:],'k.')
plt.show()
In [71]:
import seaborn as sns
import pandas as pd
sns.set(color_codes=True)
df = pd.DataFrame(X.T, columns=['x','y'])
sns.jointplot(x="x", y="y", data=df);  # jointplot creates its own figure
plt.show()
In [23]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
N = 100
sigma_1 = 0.1
sigma_2 = 0.0
mu_1 = 5
mu_2 = 5
s_1 = 1
s_2 = 3
w = 2*np.pi*np.random.rand(1,N)  # random angles in [0, 2*pi)
u1 = mu_1 + s_1*np.cos(w) + sigma_1*np.random.randn(1,N)  # x-coordinates on a noisy ellipse
u2 = mu_2 + s_2*np.sin(w) + sigma_2*np.random.randn(1,N)  # y-coordinates on a noisy ellipse
plt.figure(figsize=(6,6))
plt.plot(u1, u2,'k.')
plt.axis('equal')
plt.show()
for i in range(N):
    print('%3.3f %3.3f' % (u1[0,i], u2[0,i]))
In [104]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
N = 100
r = 0.01
u = 2*np.random.rand(1,N) - 1                # u ~ Uniform(-1, 1)
x = u**2 + np.sqrt(r)*np.random.randn(1,N)   # noisy parabola x = u^2 + noise
plt.figure(figsize=(6,6))
plt.plot(u,x,'k.')
plt.xlabel('u')
plt.ylabel('x')
plt.show()
In [88]:
%matplotlib inline
from IPython.display import display, Math, Latex
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from notes_utilities import pnorm_ball_points
from notes_utilities import mat2latex
import pandas as pd
import seaborn as sns
# Number of points
N = 30
# Parameters
A = np.array([[3.], [-1.]])  # loading matrix, shape (Dx, Dh)
r = 0.1
Dh = 1
Dx = 2
h = np.random.randn(Dh, N)                        # latent variables
y = A.dot(h) + np.sqrt(r)*np.random.randn(Dx, N)  # observations y = A h + noise
#sns.jointplot(x=y[0,:], y=y[1,:]);
plt.figure(figsize=(5,5))
plt.scatter(y[0,:],y[1,:])
plt.xlabel('y_0')
plt.ylabel('y_1')
plt.show()
In [81]:
%matplotlib inline
from IPython.display import display, Math, Latex
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from notes_utilities import pnorm_ball_points
from notes_utilities import mat2latex
import pandas as pd
import seaborn as sns
sns.set(color_codes=True)
# Number of points
N = 10
# Parameters
a = -0.8
R = 0.1
x = np.random.randn(N)
y = a*x + np.sqrt(R)*np.random.randn(N)
sns.jointplot(x=x, y=y);
Since $x \sim \mathcal{N}(0,1)$ and $y = a x + \epsilon$ with $\epsilon \sim \mathcal{N}(0,R)$ independent of $x$, we can work out the joint distribution as:
\begin{eqnarray} \left(\begin{array}{c} x \\ y \end{array}\right) \sim \mathcal{N}\left( \left(\begin{array}{c} 0 \\ 0 \end{array}\right) , \left(\begin{array}{cc} 1 & a\\ a & a^2 + R \end{array}\right) \right) \end{eqnarray}
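As a quick numerical check, here is a minimal sketch (an addition, not original notebook code) that estimates the covariance from a sample; the sample size is chosen large only to reduce Monte Carlo error:
In [ ]:
import numpy as np
a, R = -0.8, 0.1
n = 100000
x = np.random.randn(n)
y = a*x + np.sqrt(R)*np.random.randn(n)
print(np.cov(np.c_[x, y].T))  # should be close to [[1, a], [a, a**2 + R]]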
In [107]:
%matplotlib inline
from IPython.display import display, Math, Latex
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from notes_utilities import pnorm_ball_points
from notes_utilities import mat2latex
import pandas as pd
#import seaborn as sns
#sns.set(color_codes=True)
# Number of points
N = 10
# Parameters
a = -0.8
R = 0.1
# Theoretical Covariance
Cov = np.mat([[1,a],[a, a**2+R]])
x = np.random.randn(N)
y = a*x + np.sqrt(R)*np.random.randn(N)
np.set_printoptions(precision=4)
X = np.c_[x,y].T
N = X.shape[1]
print('True Covariance')
display(Math(r'\mu='+mat2latex(np.mat('[0;0]'))))
display(Math(r'\Sigma='+mat2latex(Cov)))
print('The ML Estimates from Data')
mean_est = np.mean(X,axis=1,keepdims=True)
cov_est = np.cov(X,bias=True)
display(Math(r'\bar{m}='+mat2latex(mean_est)))
display(Math(r'\bar{S}='+mat2latex(cov_est)))
print('The estimate when we assume that we know the true mean')
cov2_est = X.dot(X.T)/N
display(Math(r'\bar{\Sigma}='+mat2latex(cov2_est)))
plt.figure(figsize=(8,8))
plt.plot(x, y, '.')
ax = plt.gca()
ax.axis('equal')
ax.set_xlabel('x')
ax.set_ylabel('y')
# True mean and Covariance
dx,dy = pnorm_ball_points(3*np.linalg.cholesky(Cov))
ln = plt.Line2D(dx,dy, color='r')
ln.set_label('True')
ax.add_line(ln)
ln = plt.Line2D([0],[0], color='r', marker='o')
ax.add_line(ln)
# ML estimate (estimated mean and covariance)
dx,dy = pnorm_ball_points(3*np.linalg.cholesky(cov_est), mu=mean_est)
ln = plt.Line2D(dx,dy, color='b')
ln.set_label('ML Estimate')
ax.add_line(ln)
ln = plt.Line2D(mean_est[0],mean_est[1], color='b', marker='o')
ax.add_line(ln)
# Estimate conditioned on knowing the true mean
dx,dy = pnorm_ball_points(3*np.linalg.cholesky(cov2_est))
ln = plt.Line2D(dx,dy, color='g')
ln.set_label('Conditioned on true mean')
ax.add_line(ln)
ln = plt.Line2D([0],[0], color='g', marker='o')
ax.add_line(ln)
Lim = 6
ax.set_ylim([-Lim,Lim])
ax.set_xlim([-Lim,Lim])
ax.legend()
plt.title('Covariance Matrix Estimates')
plt.show()
In [76]:
EPOCH = 20
fig = plt.figure(figsize=(6,6))
ax = fig.gca()
Lim = 6
ax.set_ylim([-Lim,Lim])
ax.set_xlim([-Lim,Lim])
for i in range(EPOCH):
    # Sample a fresh dataset from the true model
    x = np.random.randn(N)
    y = a*x + np.sqrt(R)*np.random.randn(N)
    X = np.c_[x,y].T
    # Zero-mean ML covariance estimate for this dataset
    cov2_est = X.dot(X.T)/N
    dx,dy = pnorm_ball_points(3*np.linalg.cholesky(cov2_est))
    ln = plt.Line2D(dx,dy, color='g')
    ax.add_line(ln)
dx,dy = pnorm_ball_points(3*np.linalg.cholesky(Cov))
ln = plt.Line2D(dx,dy, color='r', linewidth=3)
ax.add_line(ln)
plt.show()
Every green ellipse corresponds to a covariance estimate $\Sigma^{(i)}$ computed from a new dataset $X^{(i)}$ sampled from the data distribution. The picture suggests that the true covariance could be recovered by averaging over the ellipses.
An estimator is called unbiased if its expected value equals the true parameter exactly, i.e., $E[\hat{\theta}] = \theta$; otherwise, the estimator is called biased.
The variance of the estimator is the amount of fluctuation around its mean. Ideally, we wish it to be small, in fact zero. However, for an unbiased estimator, zero variance turns out to be unattainable: the variance is always greater than or equal to a positive quantity called the Cramér-Rao bound.
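To illustrate bias concretely, here is a minimal Monte Carlo sketch (an addition, not original notebook code). Averaging the ML covariance estimate computed with the estimated mean over many independent datasets does not converge to the true covariance $\Sigma$ but to $\frac{N-1}{N}\Sigma$, so this estimator is biased:
In [ ]:
import numpy as np
a, R, N = -0.8, 0.1, 10
Cov = np.array([[1, a], [a, a**2 + R]])
EPOCH = 20000
S = np.zeros((2, 2))
for i in range(EPOCH):
    x = np.random.randn(N)
    y = a*x + np.sqrt(R)*np.random.randn(N)
    # ML covariance with the estimated mean (bias=True divides by N)
    S += np.cov(np.c_[x, y].T, bias=True)
print(S/EPOCH)        # approaches (N-1)/N * Cov, not Cov
print((N-1)/N * Cov)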
In [77]:
EPOCH = 100
M = N
x = np.random.randn(N+M)
y = a*x + np.sqrt(R)*np.random.randn(N+M)
fig = plt.figure(figsize=(6,6))
ax = fig.gca()
Lim = 6
ax.set_ylim([-Lim,Lim])
ax.set_xlim([-Lim,Lim])
for i in range(EPOCH):
idx = np.random.permutation(N+M)
X = np.c_[x[idx[0:N]],y[idx[0:N]]].T
cov2_est = X.dot(X.T)/N
dx,dy = pnorm_ball_points(3*np.linalg.cholesky(cov2_est))
ln = plt.Line2D(dx,dy, color='g')
ax.add_line(ln)
dx,dy = pnorm_ball_points(3*np.linalg.cholesky(Cov))
ln = plt.Line2D(dx,dy, color='r', linewidth=3)
ax.add_line(ln)
plt.show()
In [67]:
EPOCH = 20
fig = plt.figure(figsize=(6,6))
ax = fig.gca()
Lim = 6
ax.set_ylim([-Lim,Lim])
ax.set_xlim([-Lim,Lim])
x = np.random.randn(N)
y = a*x + np.sqrt(R)*np.random.randn(N)
X = np.c_[x,y].T
cov2_est = X.dot(X.T)/N
W = np.linalg.cholesky(cov2_est)
plt.plot(x,y,'.')
for i in range(EPOCH):
    # Sample a synthetic dataset from the fitted Gaussian (a parametric bootstrap)
    U = W.dot(np.random.randn(2,N))
    S = U.dot(U.T)/N
    dx,dy = pnorm_ball_points(3*np.linalg.cholesky(S))
    ln = plt.Line2D(dx,dy, color='k')
    ax.add_line(ln)
dx,dy = pnorm_ball_points(3*np.linalg.cholesky(Cov))
ln = plt.Line2D(dx,dy, color='r', linewidth=3)
ax.add_line(ln)
plt.show()
In [46]:
from notes_utilities import mat2latex
# Sanity check: render the identity matrix as LaTeX
print(mat2latex(np.mat([[1,0],[0,1]])))