Six independent observations from a Gaussian distribution $N(\mu, \sigma^{2})$ are given by {1.017, 2.221, 1.416, 0.641, 0.124, 1.728}. If $\sigma=0.75$ is known, find the symmetric confidence intervals for $\mu$ with confidence levels $1-\alpha$ = 0.68, 0.90, and 0.95, respectively.
The estimator $\hat{\mu}$ that will be used is the sample mean of the observations. Since the observations are independent and Gaussian, the sample mean also follows a Gaussian distribution, with parameters:
$$ \mu_{\hat{\mu}}=E(\hat{\mu})=E\big(\frac{\sum x_i}{n}\big) = \frac{1}{n} \sum E(x_i) = \frac{1}{n} \sum \mu = \mu \\ \sigma^2_{\hat{\mu}} =Var(\hat{\mu})=Var\big(\frac{\sum x_i}{n}\big) = \frac{1}{n^2} \sum Var(x_i) = \frac{\sigma^2}{n} $$where $x_i$ are the observations which all follow the same distribution.
Thus, the probability that a symmetric interval $[\hat{\mu}-\delta,\hat{\mu}+\delta]$ contains the true value $\mu$ is:
$$ 1-\alpha = P(\hat{\mu}-\delta<\mu<\hat{\mu}+\delta)=P(\mu-\delta<\hat{\mu}<\mu+\delta) =\frac{1}{\sqrt{2\pi}\sigma_{\hat{\mu}}} \int_{\mu-\delta}^{\mu+\delta} e^{-\frac{(x-\mu)^2}{2\sigma_{\hat{\mu}}^2}} dx = \operatorname{erf}\bigg(\frac{\delta}{\sqrt{2}\sigma_{\hat{\mu}}}\bigg)$$which can be rewritten as a root-finding problem $C(\delta)=0$:
\begin{equation} \operatorname{erf}\bigg(\frac{\delta}{\sqrt{2}\sigma_{\hat{\mu}}}\bigg)-1+\alpha = 0 \end{equation}
In [1]:
%matplotlib inline
#%matplotlib # [ORIOL]: For 3D plots I think it is much better this way, without the inline
import numpy as np
import math
import scipy.stats as stat # extra statistical functions (the basic are included in numpy)
import scipy.optimize as opt # optimization and root finding package
import scipy.misc as msc
import scipy.special as spc
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
from matplotlib import cm
import matplotlib.colors as colors
In [2]:
def find_delta(delta, alpha, sigma):
    # Root-finding target: erf(delta/(sqrt(2)*sigma)) - (1 - alpha) = 0
    x = math.erf(delta/(np.sqrt(2)*sigma)) - 1 + alpha
    return x
If sigma is known, then:
In [3]:
# Data:
obs = [1.017, 2.221, 1.416, 0.641, 0.124, 1.728]
alpha = 1-np.array([0.68, 0.90, 0.95])
sigma_real = 0.75 # standard deviation of variable x (the observations)
In [4]:
sigma_estimator = sigma_real/np.sqrt(len(obs)) # standard deviation of variable sample mean (or mu hat)
opt_delta = np.empty(len(alpha))
mu=np.mean(obs)
print "Mu = %.3g\tsigma estimator = %.3g\n" %(mu,sigma_estimator)
for i in range(len(alpha)):
    opt_delta[i] = opt.fsolve(find_delta, sigma_estimator, args=(alpha[i],sigma_estimator))
    print "Delta:", opt_delta[i]
    print 'Interval [mu-delta, mu+delta]', mu-opt_delta[i], mu+opt_delta[i]
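The root finding above can be cross-checked against the closed-form solution of equation (1), $\delta=\sqrt{2}\,\sigma_{\hat{\mu}}\operatorname{erf}^{-1}(1-\alpha)$; a minimal sketch, assuming the imports and the alpha and sigma_estimator values defined in the cells above:
In [ ]:
# Closed-form cross-check of the deltas obtained with fsolve:
# delta = sqrt(2) * sigma_estimator * erfinv(1 - alpha)
delta_closed = np.sqrt(2)*sigma_estimator*spc.erfinv(1-alpha)
for cl, d in zip(1-alpha, delta_closed):
    print "1-alpha = %.2f -> delta = %.4f" % (cl, d)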
If $\sigma$ were not known, we would first estimate it and use this estimate as the actual value. The resulting confidence intervals would then be estimates of the real confidence intervals shown above. The code below takes sigma as the standard deviation of the measurements.
In [5]:
sigma_estimated = np.std(obs) # estimated standard deviation of variable x (np.std defaults to the biased 1/n estimator)
print sigma_estimated
sigma_est_estimator = sigma_estimated/np.sqrt(len(obs)) # estimated value of the standard deviation of variable mu hat
opt_delta_estimated = np.empty(len(alpha))
delta_t_student = np.empty(len(alpha))
mu_hat=np.mean(obs)
print "Mu = %.3g and Sigma = %.3g\n" %(mu, sigma_est_estimator)
for i in range(len(alpha)):
    opt_delta_estimated[i] = opt.fsolve(find_delta, sigma_est_estimator, args=(alpha[i],sigma_est_estimator))
    delta_t_student[i] = stat.t.interval(1-alpha[i],len(obs)-1,0,sigma_est_estimator)[1] # Student-t with n-1 degrees of freedom
    # Check with the known sigma: if the estimated delta were the true one, these values would match the 1-alpha array.
    print "1-alpha = %.3f" %(1-alpha[i])
    print "\tGaussian approximation resulting 1-alpha = %.3f" %math.erf(opt_delta_estimated[i]/(np.sqrt(2)*sigma_estimator))
    print "\t\tDelta gaussian = ", opt_delta_estimated[i]
    print '\t\tInterval [mu-delta, mu+delta]', mu-opt_delta_estimated[i], mu+opt_delta_estimated[i]
    print "\tT-student approximation resulting 1-alpha = %.3f" %math.erf(delta_t_student[i]/(np.sqrt(2)*sigma_estimator))
    print "\t\tDelta t-student= ", delta_t_student[i]
    print '\t\tInterval [mu-delta, mu+delta]', mu-delta_t_student[i], mu+delta_t_student[i]
Let's imagine that an LHC experiment measures the number of events produced in a certain decay channel of the Higgs particle. Let's call $\nu$ the expected number of events, assuming there is no background. Then, the probability to measure exactly $N$ events in an experiment is given by the Poisson probability distribution:
$$ P(N;\nu)= \frac{\nu^{N}}{N!} e^{-\nu} $$Check that $P(N; \nu)$ is properly normalized. Given an observed value of $N (=N_{obs})$, find a frequentist unbiased estimator for $\nu$. Compute its expected value and variance. [Hint: compute $<N>$ and $<N(N-1)>$ first.]
To check whether a discrete pmf is normalized we have to sum its values over all possible outcomes, which for a Poisson means from 0 to $\infty$.
$$\sum_{n=0}^\infty P(n;\nu)= e^{-\nu} \sum_{n=0}^\infty \frac{\nu^{n}}{n!} = e^{-\nu} e^{\nu} = 1 $$where only the Taylor expansion of an exponential needs to be known.
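As a quick numerical sanity check of this normalization (a sketch assuming the scipy.stats import of the first cell; the values of $\nu$ are arbitrary), the pmf can be summed over a range large enough that the truncated tail is negligible:
In [ ]:
# Numerical check of the Poisson normalization for a few values of nu:
# the truncated sum over n = 0..200 is already 1 to machine precision.
n = np.arange(0, 201)
for nu in [0.5, 5., 50.]:
    print "nu = %.1f -> sum of pmf = %.12f" % (nu, np.sum(stat.poisson(nu).pmf(n)))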
Maximum likelihood can be used to find an estimator within the frequentist interpretation of probability, and in many cases it is unbiased. Starting from the likelihood of $m$ observations:
$$ L(\{ N_{i}\} ;\nu)=\prod_{i=1}^{m}\frac{\nu^{N_{i}}}{N_{i}!}e^{-\nu}=e^{-\nu m}\prod_{i=1}^{m}\frac{\nu^{N_{i}}}{N_{i}!} \\ l(\{ N_{i}\} ;\nu)=-\log L(\{ N_{i}\} ;\nu)=-\Big(-\nu m+\sum_{i=1}^{m} N_{i}\log\nu-\sum_{i=1}^{m} \log N_{i}!\Big)=\nu m-\sum_{i=1}^{m} N_{i}\log\nu+\sum_{i=1}^{m} \log N_{i}! $$Once the likelihood and the log-likelihood are known, the likelihood can be maximized (equivalently, $l$ minimized) with respect to the parameter to be estimated in order to find the estimator:
$$ \frac{\partial l}{\partial\nu}\bigg\rvert_{\nu=\hat{\nu}_{ML}}=m-\sum_{i=1}^{m}\frac{N_{i}}{\hat{\nu}_{ML}}=0 \quad \rightarrow \quad \hat{\nu}_{ML}=\frac{1}{m}\sum_{i=1}^{m}N_{i} $$To help afterwards in the calculation of the expected values of the estimator, some expected values of the Poisson distribution are computed first:
$$E[N]=\sum_{n=0}^\infty nP(n;\nu)= e^{-\nu} \sum_{n=1}^\infty n \frac{\nu^{n}}{n!} = \nu e^{-\nu} \sum_{n=1}^\infty \frac{\nu^{n-1}}{(n-1)!} = \nu e^{-\nu} \sum_{k=0}^\infty \frac{\nu^{k}}{k!} = \nu e^{-\nu} e^{\nu} = \nu$$Note that the first term of the sum has $n=0$ and is therefore zero, so it can be dropped. Factoring out $\nu$ and shifting the summation index to $k=n-1$ (so that the sum runs from $k=0$ to $\infty$) gives the Taylor expansion of the exponential again.
$$E[N(N-1)]= \sum_{n=0}^\infty n(n-1) P(n;\nu)= \sum_{n=2}^\infty n(n-1) \frac{\nu^{n}}{n!} e^{-\nu}= e^{-\nu} \nu^{2} \sum_{n=2}^\infty \frac{\nu^{n-2}}{(n-2)!}= e^{-\nu} \nu^{2} e^{\nu}= \nu^{2}$$$$E[N(N-1)]= E[N^{2}]-E[N] \quad \rightarrow \quad E[N^{2}]= E[N(N-1)]+ E[N]= \nu^{2}+\nu$$$$Var[N]= E[N^{2}]-\big(E[N]\big)^{2}= \nu^{2}+\nu-\nu^{2}= \nu$$Therefore, the expected value and variance of the estimator follow easily:
$$ E[\hat{\nu}_{ML}] =E\bigg[\frac{1}{m}\sum_{i=1}^{m} N_{i}\bigg] =\frac{1}{m}\sum_{i=1}^{m} E[N_{i}] =\frac{1}{m}\sum_{i=1}^{m}\nu=\nu. $$$$Var[\hat{\nu}_{ML}]= Var\bigg[\frac{1}{m} \sum_{i=1}^{m} N_{i}\bigg]= \frac{1}{m^{2}} \sum_{i=1}^{m} Var[N_{i}]= \frac{1}{m^{2}} m \nu= \frac{\nu}{m}$$[Oriol]: The statements and theoretical explanations of parts II-V are given below, under the corresponding titles. However, when you run the different parts I strongly recommend using the code titled Part II-VII, modifying the plots if necessary, to avoid something defined earlier (which the notebook does not forget, but we humans do) giving you meaningless results.
An alternative is to rename the variables so that there are no clashes, but I do not think it is worth it; or restart the notebook every time you want to run code from earlier parts (not cells but parts), re-run the first cell and then proceed from there in order.
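As a quick numerical check of $E[\hat{\nu}_{ML}]=\nu$ and $Var[\hat{\nu}_{ML}]=\nu/m$, the estimator can also be simulated directly; a minimal sketch, where nu_true, m and reps are arbitrary illustration values:
In [ ]:
# Monte Carlo check of the mean and variance of the ML estimator of nu
nu_true, m, reps = 4.2, 6, 10**5
nu_hat = stat.poisson(nu_true).rvs((reps, m)).mean(axis=1) # sample mean of m observations, repeated reps times
print "E[nu_hat] ~ %.4f (expected %.4f)" % (nu_hat.mean(), nu_true)
print "Var[nu_hat] ~ %.4f (expected %.4f)" % (nu_hat.var(), nu_true/float(m))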
Assume now that the pdf for the estimator of $\nu$ is a Gaussian. Using the expected value and variance computed in part I, plot the Gaussian pdf corresponding to $N_{obs}=150$.
In [6]:
observed_values = np.array([150])
num_observations = len(observed_values)
nu_estimated = np.mean(observed_values)
# Expected gaussian pdf
mu = nu_estimated
sigma = np.sqrt(nu_estimated)
dist = stat.norm(mu, sigma)
x_pdf = np.linspace(mu-4*sigma, mu+4*sigma, 1000)
#Plot distributions
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x_pdf, dist.pdf(x_pdf), '-k', label = 'Gaussian')
ax.legend()
Out[6]:
Let's now construct the pdf for the estimator of $\nu$ using a MC simulation.
In [7]:
poisson = stat.poisson(nu_estimated)
N = 10**6
mc_sample = poisson.rvs(N)
x_pdf = np.linspace(min(mc_sample)*.8, max(mc_sample)*1.2, 1000)
x_pmf = np.arange(min(mc_sample), max(mc_sample)) # We use the pmf because the
# Poisson takes discrete values
mu=np.mean(mc_sample)
variance = np.var(mc_sample) # variance of the MC sample (the std would be np.sqrt of this)
skewness = stat.skew(mc_sample)
kurtosis=stat.kurtosis(mc_sample)
print 'The Montecarlo generated distribution has:\n\tmean = %.6f,\n\t\
variance = %.6f,\n\tskewness = %.6f\n\tand kurtosis = %.6f' %(mu, variance, skewness, kurtosis)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(mc_sample,color='b',normed=1,bins=15,label='Montecarlo poisson p.d.f.')
ax.plot(x_pdf, dist.pdf(x_pdf), '-k', label = 'Gaussian')
ax.plot(x_pmf, poisson.pmf(x_pmf), 'r+', label = 'Poisson')
ax.set_xlim([min(mc_sample), max(mc_sample)])
ax.set_xlabel(r'$\nu$')
ax.set_ylabel(r'$p(\nu|N_{obs})$')
ax.legend()
Out[7]:
Assuming a prior for $\nu$ uniform between 0 and infinity, compute the Bayesian posterior pdf for $\nu$. Note that it is not a Gaussian. Plot the PDF for $N_{\text{obs}}$ and compare it with the results in II and III.
Before calculating the posterior, the three terms involved in its calculation will be specified. For simplicity, from now on everything will be constrained to $m=1$.
$$ L(N ;\nu)=\frac{\nu^{N}}{N!}e^{-\nu} $$$$ \Pi(\nu)= \begin{cases} 0 & \nu<0 \\ c & \nu>0 \end{cases} $$$$ \int_{-\infty}^{\infty} L(N ;\nu) \Pi(\nu) d\nu = \int_{0}^{\infty} \frac{\nu^{N}}{N!}e^{-\nu} c d\nu = \frac{c}{N!} \int_{0}^{\infty} \nu^{N}e^{-\nu}d\nu = \frac{c}{N!} N! =c $$Therefore, the posterior probability is:
$$ P(\nu;N)=\frac{L(N ;\nu)\Pi(\nu)}{\int_{-\infty}^{\infty} L(N ;\nu) \Pi(\nu) d\nu}= \begin{cases} 0 & \nu<0 \\ \frac{\nu^{N}}{N!}e^{-\nu} & \nu>0 \end{cases} $$However, due to the huge powers and exponentials, the computations will be done with the logarithm of $P(\nu;N)$ for $\nu>0$ and eventually exponentiated to plot the real value.
$$ \log\big(P(\nu;N)\rvert_{\nu>0}\big)=-\nu+N\log(\nu)-\log(N!)$$
In [8]:
nu_pdf = np.linspace(min(mc_sample)*.8, max(mc_sample)*1.2, 1000)
Nobs = observed_values[0]
# nu_posterior = np.exp(-nu_pdf )*(nu_pdf**observed_values/msc.factorial(observed_values)) --> we work with logs to avoid overflow
log_nu_posterior = -nu_pdf + Nobs*np.log(nu_pdf) - np.log(msc.factorial(Nobs))
nu_posterior = np.exp(log_nu_posterior)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x_pdf, dist.pdf(x_pdf), '-r', label = 'Gaussian (Part II)')
ax.hist(mc_sample,color='b',normed=1,bins=15,label='Montecarlo poisson p.d.f. (Part III)')
ax.plot(nu_pdf, nu_posterior, 'g', lw=2, label = 'Bayesian Posterior')
#ax.plot(x_pmf, poisson.pmf(x_pmf), 'r', label = 'Poisson')
ax.set_xlim([min(mc_sample), max(mc_sample)])
ax.set_xlabel(r'$\nu$')
ax.set_ylabel(r'$p(\nu|N_{obs})$')
ax.legend()
Out[8]:
Let's now construct the Bayesian posterior pdf for $\nu$ using an MC simulation:
[Oriol]: Note that $N_{obs}+10\sqrt{N_{obs}}\approx \mu+10\sigma$
In [9]:
N=10**7
nu_rndm = np.random.random(N)*(Nobs+10*np.sqrt(Nobs))
poisson = stat.poisson(nu_rndm)
N_mc = poisson.rvs(N) #We'll draw a sample from N_mc
nu_mc_sample = nu_rndm[N_mc==Nobs] # Keep only the values of nu in nu_rndm for which the generated
# Poisson draw N_mc happened to equal N_obs on the first try
(D)
In [10]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(nu_mc_sample,color='b',normed=1,bins=15,label='MC Bayesian posterior')
ax.plot(nu_pdf, nu_posterior, 'g', lw=2, label = 'Bayesian Posterior')
ax.set_xlim([min(mc_sample), max(mc_sample)])
ax.set_xlabel(r'$\nu$')
ax.set_ylabel(r'$p(\nu|N_{obs})$')
ax.legend()
Out[10]:
In [11]:
print len(nu_mc_sample) #this is the number of values that we select from N_mc sample
(E)
In [12]:
mu=np.mean(nu_mc_sample)
variance = np.var(nu_mc_sample) # variance of the MC sample
skewness = stat.skew(nu_mc_sample)
kurtosis=stat.kurtosis(nu_mc_sample)
print 'The Montecarlo generated distribution has:\n\tmean = %.6f,\n\t\
variance = %.6f,\n\tskewness = %.6f\n\tand kurtosis = %.6f' %(mu, variance, skewness, kurtosis)
Repeat for $N_{obs}=10$
Repeat for $N_{obs}=1$
The two cells that follow carry out parts II-V, VI and VII in a condensed way.
[Oriol]: Below, the plots can be customized as you like by playing with the number and position of the subplots and with which subplot each graph goes in. The three numbers of each subplot define a grid and a position within that grid; the position has to be smaller than or equal to the product of the first two numbers. For examples you can look at the plots as they are now.
In [13]:
Nobs = 10
nu_grid = np.linspace(Nobs-4*np.sqrt(Nobs), Nobs+6*np.sqrt(Nobs), 1000)
# Part II
gaussian_II = stat.norm(Nobs, np.sqrt(Nobs))
# Part III
N_III = 10**6
poisson_III = stat.poisson(Nobs)
mc_sample_III = poisson_III.rvs(N_III)
# Part IV
log_nu_posterior_lin = -nu_grid + Nobs*np.log(nu_grid) - np.log(msc.factorial(Nobs))
nu_posterior_lin = np.exp(log_nu_posterior_lin)
# Part V
N_V_lin=10**7
nu_prior_lin = np.random.random(N_V_lin)*(Nobs+10*np.sqrt(Nobs))
poisson_V_lin = stat.poisson(nu_prior_lin)
N_mc_lin = poisson_V_lin.rvs(N_V_lin) #We'll draw a sample from N_mc
mc_sample_V_lin = nu_prior_lin[N_mc_lin==Nobs]
print len(mc_sample_V_lin)
# Part V.E)
mu_VE_lin = np.mean(mc_sample_V_lin)
sigma_VE_lin = np.std(mc_sample_V_lin)
gaussian_VE_lin = stat.norm(mu_VE_lin, sigma_VE_lin)
In [14]:
rng = [nu_grid[0], nu_grid[-1]]
Nbins = 12
Nbins = np.arange(int(nu_grid[0]),int(2*nu_grid[-1])) # For the case Nobs = 1, to have bins between integer values.
# For other Nobs, it's better to comment this line out.
fig = plt.figure(figsize=(12,12))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(212)
ax1.plot(nu_grid,gaussian_II.pdf(nu_grid), label='Gaussian fit to freq. estimator')
ax1.hist(mc_sample_III, normed=1, range=rng, bins=Nbins, label='MC frequentist estimator')
ax1.plot(nu_grid,nu_posterior_lin, label='Linear prior')
ax2.plot(nu_grid,nu_posterior_lin, label='Linear prior')
ax2.hist(mc_sample_V_lin, normed=1, range=rng, bins=Nbins, label='MC linear prior')
ax2.plot(nu_grid,gaussian_VE_lin.pdf(nu_grid), label='Gaussian fit to MC linear prior')
ax3.plot(nu_grid,gaussian_II.pdf(nu_grid), '--', label='Gaussian fit to freq. estimator')
ax3.hist(mc_sample_III, normed=1, range=rng, bins=Nbins, histtype="step", label='MC frequentist estimator')
ax3.plot(nu_grid,nu_posterior_lin, label='Linear prior')
ax3.hist(mc_sample_V_lin, normed=1, range=rng, bins=Nbins, histtype="step", label='MC linear prior')
ax3.plot(nu_grid,gaussian_VE_lin.pdf(nu_grid), ':', label='Gaussian fit to MC linear prior')
# Trick to set all labels for all axes to the same names and values
axes = [ax1, ax2, ax3]
for ax in axes:
    ax.set_xlabel(r'$\nu$')
    ax.set_ylabel(r'$P(\nu|N)$')
    ax.legend(loc='best')
In [15]:
mu_V =np.mean(mc_sample_V_lin)
variance_V =np.var(mc_sample_V_lin) # variance of the MC sample
skewness_V = stat.skew(mc_sample_V_lin)
kurtosis_V=stat.kurtosis(mc_sample_V_lin)
print ('The moments for the MC simulation with the linear prior are:\n\tmean = %.6f,\n\t\
variance = %.6f,\n\tskewness = %.6f\n\tand kurtosis = %.6f' %(mu_V, variance_V, skewness_V, kurtosis_V))
mu_III=np.mean(mc_sample_III)
variance_III =np.var(mc_sample_III) # variance of the MC sample
skewness_III = stat.skew(mc_sample_III)
kurtosis_III=stat.kurtosis(mc_sample_III)
print ('The moments for the MC simulation with N=nu (part III):\n\tmean = %.6f,\n\t\
variance = %.6f,\n\tskewness = %.6f\n\tand kurtosis = %.6f' %(mu_III, variance_III, skewness_III, kurtosis_III))
A constant prior in $\log\nu$ implies:
$$ \Pi(\log\nu)\,d\log\nu=c\, d\log\nu=\frac{c}{\nu}\, d\nu$$In our case:
$$ \Pi(\nu)= \begin{cases} 0 & \nu<0 \\ \frac{c}{\nu} & \nu>0 \end{cases} $$The likelihood is the same as for the linear case:
$$L(N ;\nu)=\frac{\nu^{N}}{N!}e^{-\nu}$$Therefore, the normalization factor becomes:
$$ \int_{-\infty}^{\infty} L(N ;\nu) \Pi(\nu) d\nu = \int_{0}^{\infty} \frac{\nu^{N}}{N!}e^{-\nu} \frac{c}{\nu} d\nu = \frac{c}{N!} \int_{0}^{\infty} \nu^{N-1}e^{-\nu}d\nu = \frac{c}{N!} (N-1)! = \frac{c}{N} $$Therefore, the posterior probability is:
$$ P(\nu;N)=\frac{L(N ;\nu)\Pi(\nu)}{\int_{-\infty}^{\infty} L(N ;\nu) \Pi(\nu) d\nu}= \begin{cases} 0 & \nu<0 \\ \frac{\nu^{N}}{N!}e^{-\nu}\frac{c}{\nu}\frac{N}{c}=\frac{\nu^{N-1}}{(N-1)!}e^{-\nu} & \nu>0 \end{cases} $$However, like in the linear case, due to the huge powers and exponentials, the computations will be done with the logarithm of $P(\nu;N)$ for $\nu>0$ and eventually exponentiated to plot the real value.
$$ \log\big(P(\nu;N)\rvert_{\nu>0}\big)=-\nu+(N-1)\log(\nu)-\log\big((N-1)!\big)$$For the linear case, the problem of the improper prior was easy to solve because generating a uniform sample is trivial; the only trick used was to restrict it to the range between 0 and $N_{obs}+10\sqrt{N_{obs}}$. Generating a sample that follows this prior is not as simple, and it has to be done either by accept-reject (try-reject) or by inverse sampling.
Because the pdf fills only a tiny fraction of any box that contains it, accept-reject would be extremely inefficient, so it was discarded.
Inversion sampling does not require a finite box, but it does require the cdf, and since this is an improper pdf the range of the prior has been limited to $\nu\in[{\nu_{LIM}}^{-1},\nu_{LIM}]$, with $\nu_{LIM}=N_{obs}+10\sqrt{N_{obs}}$. Therefore:
$$\int_{{\nu_{LIM}}^{-1}}^{\nu_{LIM}} \frac{c}{\nu} d\nu = c \Big[\log \nu \Big]_{{\nu_{LIM}}^{-1}}^{\nu_{LIM}} = 2c\log\nu_{LIM} \quad \Rightarrow \quad c=\frac{1}{2\log\nu_{LIM}}$$And the cdf is:
$$ F_\Pi(\nu)= \int_{{\nu_{LIM}}^{-1}}^{\nu} \frac{c}{x} dx = c \Big[\log x \Big]_{{\nu_{LIM}}^{-1}}^{\nu} = c(\log\nu+\log\nu_{LIM})$$whose inversion, needed for inverse sampling, is:
$$ r = c(\log\nu+\log\nu_{LIM}) \quad \rightarrow \quad \frac{r}{c}-\log\nu_{LIM}= \log\nu\\ F_\Pi^{-1}(r)=\nu = \frac{1}{\nu_{LIM}}e^{\frac{r}{c}} = \frac{1}{\nu_{LIM}}e^{2r\log\nu_{LIM}}$$
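The inverse transform can be checked on its own before using it in the posterior sampling: if the construction is correct, $\log\nu$ for the generated prior sample should be uniform between $-\log\nu_{LIM}$ and $+\log\nu_{LIM}$. A minimal sketch with an illustrative value of the limit (it only assumes the numpy import):
In [ ]:
# Sanity check of the inverse transform for the prior proportional to 1/nu:
# log(nu) should be uniform on [-log(nu_lim_check), +log(nu_lim_check)]
nu_lim_check = 10.
r = np.random.random(10**5)
nu_check = np.exp(2*r*np.log(nu_lim_check))/nu_lim_check
log_nu = np.log(nu_check)
print "min/max of log(nu) = %.3f / %.3f (expected +-%.3f)" % (log_nu.min(), log_nu.max(), np.log(nu_lim_check))
print "mean of log(nu) = %.3f (expected 0), var = %.3f (expected %.3f)" % (log_nu.mean(), log_nu.var(), (2*np.log(nu_lim_check))**2/12.)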
In [16]:
Nobs = 1
nu_grid = np.linspace(Nobs-5*np.sqrt(Nobs), Nobs+6*np.sqrt(Nobs), 1000)
# Part II
gaussian_II = stat.norm(Nobs, np.sqrt(Nobs))
# Part III
N_III = 10**6
poisson_III = stat.poisson(Nobs)
mc_sample_III = poisson_III.rvs(N_III)
# Part IV
log_nu_posterior_log = -nu_grid + (Nobs-1)*np.log(nu_grid) - np.log(msc.factorial(Nobs-1))
nu_posterior_log = np.exp(log_nu_posterior_log)
# Part V
N_V_log=10**6
nu_lim = Nobs+10*np.sqrt(Nobs)
# For Nobs=1 it is better to increase the value
nu_lim = Nobs+1000*np.sqrt(Nobs)
r_prior_log = np.random.random(N_V_log)
nu_prior_log = np.exp(2*r_prior_log*np.log(nu_lim))/nu_lim
poisson_V_log = stat.poisson(nu_prior_log)
N_mc_log = poisson_V_log.rvs(N_V_log) #We'll draw a sample from N_mc
mc_sample_V_log = nu_prior_log[N_mc_log==Nobs]
# Part V.E)
mu_VE_log = np.mean(mc_sample_V_log)
sigma_VE_log = np.std(mc_sample_V_log)
gaussian_VE_log = stat.norm(mu_VE_log, sigma_VE_log)
In [17]:
rng = [nu_grid[0], nu_grid[-1]]
Nbins = 12
Nbins = np.arange(int(nu_grid[0]),int(2*nu_grid[-1])) # For the case Nobs = 1, to have bins between integer values.
# For other Nobs, it's better to comment this line out.
fig = plt.figure(figsize=(12,12))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(212)
ax1.plot(nu_grid,gaussian_II.pdf(nu_grid), label='Gaussian fit to freq. estimator')
ax1.hist(mc_sample_III, normed=1, range=rng, bins=Nbins, label='MC frequentist estimator')
ax1.plot(nu_grid,nu_posterior_log, label='Log prior')
ax2.plot(nu_grid,nu_posterior_log, label='Log prior')
ax2.hist(mc_sample_V_log, normed=1, range=rng, bins=Nbins, label='MC log prior')
ax2.plot(nu_grid,gaussian_VE_log.pdf(nu_grid), label='Gaussian fit to MC log prior')
ax3.plot(nu_grid,gaussian_II.pdf(nu_grid), '--', label='Gaussian fit to freq. estimator')
ax3.hist(mc_sample_III, normed=1, histtype="step", range=rng, bins=Nbins, label='MC frequentist estimator')
ax3.plot(nu_grid,nu_posterior_log, label='Log prior')
ax3.hist(mc_sample_V_log, normed=1, histtype="step", range=rng, bins=Nbins, label='MC log prior')
ax3.plot(nu_grid,gaussian_VE_log.pdf(nu_grid), ':', label='Gaussian fit to MC log prior')
# Trick to set all labels for all axes to the same names and values
axes = [ax1, ax2, ax3]
for ax in axes:
    ax.set_xlabel(r'$\nu$')
    ax.set_ylabel(r'$P(\nu|N)$')
    ax.legend(loc='best')
In [18]:
mu_V =np.mean(mc_sample_V_log)
variance_V =np.var(mc_sample_V_log) # variance of the MC sample
skewness_V = stat.skew(mc_sample_V_log)
kurtosis_V=stat.kurtosis(mc_sample_V_log)
print ('The moments for the MC simulation with the log prior are:\n\tmean = %.6f,\n\t\
variance = %.6f,\n\tskewness = %.6f\n\tand kurtosis = %.6f' %(mu_V, variance_V, skewness_V, kurtosis_V))
mu_III=np.mean(mc_sample_III)
variance_III =np.var(mc_sample_III) # variance of the MC sample
skewness_III = stat.skew(mc_sample_III)
kurtosis_III=stat.kurtosis(mc_sample_III)
print ('The moments for the MC simulation with N=nu (part III):\n\tmean = %.6f,\n\t\
variance = %.6f,\n\tskewness = %.6f\n\tand kurtosis = %.6f' %(mu_III, variance_III, skewness_III, kurtosis_III))
The posterior for the uniform prior in $\log\nu$ is exactly the same as the one for the uniform prior in $\nu$ with $N$ replaced by $N-1$. Therefore, the expected values and moments will be computed in terms of $m$, and substituting $m=N$ or $m=N-1$ yields the result for the desired case:
$$ P(\nu;m)= \begin{cases} 0 & \nu<0 \\ \frac{\nu^{m}}{m!}e^{-\nu} & \nu>0 \end{cases} $$Therefore, the $k$-th order moment is:
$$E[\nu^k]=\int_0^{\infty} \nu^k\frac{\nu^{m}}{m!}e^{-\nu}d\nu = \frac{1}{m!}\int_0^{\infty} \nu^{m+k}e^{-\nu}d\nu = \frac{(m+k)!}{m!}=\prod_{i=1}^k (m+i)$$Once the $k$-th order moments are known, the following formulas give the variance and the skewness:
$$ Var[\nu] = E[\nu^2]-(E[\nu])^2 = (m+1)(m+2) - (m+1)^2 = (m+1)(m+2-m-1)=(m+1)$$$$\gamma_1=\frac{E[\nu^3]-3E[\nu]Var[\nu]-(E[\nu])^3}{(Var[\nu])^{3/2}} =\frac{(m+1)(m+2)(m+3)-3(m+1)^2-(m+1)^3}{(m+1)\sqrt{m+1}} = \frac{(m+2)(m+3)-3(m+1)-(m+1)^2}{\sqrt{m+1}} = \frac{m^2+5m+6-3m-3-m^2-2m-1}{\sqrt{m+1}} = \frac{2}{\sqrt{m+1}}$$The theoretical moments of the posterior for a logarithmic prior ($m=N-1$):
$$ \text{Mean} \rightarrow E[\nu] = N \\ \text{Variance} \rightarrow Var[\nu] = N \\ \text{Skewness} \rightarrow \gamma_1=\frac{2}{\sqrt{N}} $$The theoretical moments of the posterior for a linear prior ($m=N$): $$ \text{Mean} \rightarrow E[\nu] = N+1 \\ \text{Variance} \rightarrow Var[\nu] = N+1 \\ \text{Skewness} \rightarrow \gamma_1=\frac{2}{\sqrt{N+1}} $$
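These posteriors are in fact Gamma distributions with unit scale and shape $m+1$ (shape $N+1$ for the linear prior, $N$ for the logarithmic one), so the formulas above can be cross-checked against scipy; a short sketch, where N_check is an arbitrary illustration value:
In [ ]:
# The posterior nu^m e^(-nu)/m! is a Gamma(m+1, scale=1) distribution,
# so its moments can be compared with the formulas derived above.
N_check = 10
for label, shape in [("linear prior (m=N)", N_check+1), ("log prior (m=N-1)", N_check)]:
    mean, var, skew = stat.gamma.stats(shape, scale=1, moments='mvs')
    print "%s: mean=%.3f, var=%.3f, skew=%.3f (expected %d, %d, %.3f)" % (label, mean, var, skew, shape, shape, 2/np.sqrt(shape))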
In [19]:
Nobs = 150 #Change this parameter to see how it shapes the graph
nu_grid = np.linspace(Nobs-4*np.sqrt(Nobs), Nobs+4*np.sqrt(Nobs), 1000)
# Part II
gaussian_II = stat.norm(Nobs, np.sqrt(Nobs))
# Part III
N_III = 10**6
poisson_III = stat.poisson(Nobs)
mc_sample_III = poisson_III.rvs(N_III)
# Part IV lin
log_nu_posterior_lin = -nu_grid + Nobs*np.log(nu_grid) - np.log(msc.factorial(Nobs))
nu_posterior_lin = np.exp(log_nu_posterior_lin)
# Part V lin
N_V_lin=10**7
nu_prior_lin = np.random.random(N_V_lin)*(Nobs+10*np.sqrt(Nobs))
poisson_V_lin = stat.poisson(nu_prior_lin)
N_mc_lin = poisson_V_lin.rvs(N_V_lin) #We'll draw a sample from N_mc
mc_sample_V_lin = nu_prior_lin[N_mc_lin==Nobs]
print len(mc_sample_V_lin)
# Part V.E) lin
mu_VE_lin = np.mean(mc_sample_V_lin)
sigma_VE_lin = np.std(mc_sample_V_lin)
gaussian_VE_lin = stat.norm(mu_VE_lin, sigma_VE_lin)
# Part IV log
log_nu_posterior_log = -nu_grid + (Nobs-1)*np.log(nu_grid) - np.log(msc.factorial(Nobs-1))
nu_posterior_log = np.exp(log_nu_posterior_log)
# Part V log
N_V_log=5*10**7
nu_lim = Nobs+10*np.sqrt(Nobs)
# For Nobs=1 it is better to increase the value, which may also require to increase N_V_log
nu_lim = Nobs+10000*np.sqrt(Nobs)
r_prior_log = np.random.random(N_V_log)
nu_prior_log = np.exp(2*r_prior_log*np.log(nu_lim))/nu_lim
poisson_V_log = stat.poisson(nu_prior_log)
N_mc_log = poisson_V_log.rvs(N_V_log) #We'll draw a sample from N_mc
mc_sample_V_log = nu_prior_log[N_mc_log==Nobs]
print len(mc_sample_V_log)
# Part V.E) log
mu_VE_log = np.mean(mc_sample_V_log)
sigma_VE_log = np.std(mc_sample_V_log)
gaussian_VE_log = stat.norm(mu_VE_log, sigma_VE_log)
In [20]:
rng = [nu_grid[0], nu_grid[-1]]
Nbins = 12
#Nbins = np.arange(int(nu_grid[0]),int(2*nu_grid[-1])) # For the case Nobs = 1, to have bins between integer values.
# For other Nobs, it's better to comment this line out.
fig = plt.figure(figsize=(12,12))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(212)
ax1.plot(nu_grid,nu_posterior_log, label='Log prior')
ax1.plot(nu_grid,nu_posterior_lin, label='Linear prior')
ax2.hist(mc_sample_V_lin, normed=1, histtype="step", range=rng, bins=Nbins, label='MC linear prior')
ax2.hist(mc_sample_V_log, normed=1, histtype="step", range=rng, bins=Nbins, label='MC log prior')
ax2.hist(mc_sample_III, normed=1, histtype="step", range=rng, bins=Nbins, label='MC frequentist estimator')
ax3.plot(nu_grid,nu_posterior_log, label='Log prior')
ax3.plot(nu_grid,nu_posterior_lin, label='Linear prior')
ax3.plot(nu_grid,gaussian_II.pdf(nu_grid), '--', label='Gaussian fit to freq. estimator')
ax3.plot(nu_grid,gaussian_VE_log.pdf(nu_grid), '-.', label='Gaussian fit to MC log prior')
ax3.plot(nu_grid,gaussian_VE_lin.pdf(nu_grid), ':', label='Gaussian fit to MC linear prior')
# Trick to set all labels for all axes to the same names and values
axes = [ax1, ax2, ax3]
for ax in axes:
    ax.set_xlabel(r'$\nu$')
    ax.set_ylabel(r'$P(\nu|N)$')
    ax.legend(loc='best')
What can you conclude about the similarities and differences between the frequentist and Bayesian results? What can you conclude about the importance of choosing Jeffreys' prior? And about the validity of the Gaussian approximations for the pdf?
The frequentist approximation assigns a large probability to $\nu<0$, whereas in the Bayesian approach the probability of finding $\nu<0$ is zero. For both approaches, the mean is similar.
Jeffreys' prior is the prior obtained with Jeffreys' rule, which is:
$$ \Pi(\theta) \propto \mathbf{F}^{1/2}(\theta) $$where
$$ \mathbf{F}(\theta)=-E\Big[\frac{\partial^2 (\log L(\theta))}{\partial \theta^2}\Big] $$In the case of a Poisson:
$$ L(N;\nu) = \frac{\nu^{N}}{N!}e^{-\nu} \\ -\log\big(L(N;\nu)\big)=\nu-N\log\nu+\log(N!) \\ \frac{\partial (-\log L(N;\nu))}{\partial \nu} = 1-\frac{N}{\nu} \\ \frac{\partial^2 (-\log L(N;\nu))}{\partial \nu^2}= \frac{N}{\nu^2} \\ \mathbf{F}(\nu)=E\bigg[\frac{\partial^2 (-\log L(N;\nu))}{\partial \nu^2}\bigg]=\frac{E[N]}{\nu^2}=\frac{1}{\nu}$$which means that:
$$ \Pi(\nu) \propto \mathbf{F}^{1/2}(\nu) = \nu^{-1/2} $$Jeffreys' prior has therefore been found to be:
$$ \Pi(\nu)= \begin{cases} 0 & \nu<0 \\ c\nu^{-1/2} & \nu>0 \end{cases} $$The likelihood is the same as for the linear and logarithmic case:
$$L(N ;\nu)=\frac{\nu^{N}}{N!}e^{-\nu}$$Therefore, the normalization factor becomes:
$$ \int_{-\infty}^{\infty} L(N ;\nu) \Pi(\nu) d\nu = \int_{0}^{\infty} \frac{\nu^{N}}{N!}e^{-\nu} c\nu^{-1/2} d\nu = \frac{c}{N!} \int_{0}^{\infty} \nu^{N-\frac{1}{2}}e^{-\nu}d\nu = \frac{c}{N!} \Gamma(N+\frac{1}{2}) $$where we have used:
$$ \int_0^\infty x^ne^{-ax}dx = \frac{\Gamma(n+1)}{a^{n+1}} $$Therefore, the posterior probability is:
$$ P(\nu;N)=\frac{L(N ;\nu)\Pi(\nu)}{\int_{-\infty}^{\infty} L(N ;\nu) \Pi(\nu) d\nu}= \begin{cases} 0 & \nu<0 \\ \frac{\nu^{N}}{N!}e^{-\nu}c\nu^{-1/2}\frac{N!}{c\Gamma(N+\frac{1}{2})} =\frac{\nu^{N-\frac{1}{2}}}{\Gamma(N+\frac{1}{2})}e^{-\nu} & \nu>0 \end{cases} $$However, like in the linear case, due to the huge powers and exponentials, the computations will be done with the logarithm of $P(\nu;N)$ for $\nu>0$ and eventually exponentiated to plot the real value.
$$ \log\big(P(\nu;N)\rvert_{\nu>0}\big)=-\nu+(N-\frac{1}{2})\log(\nu)-\log\big(\Gamma(N+\frac{1}{2})\big)$$
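For the MC simulation below, a sample of the $\nu^{-1/2}$ prior is again drawn by inverse sampling. Cutting the prior at an upper limit $\nu_{max}$, its cdf is $F_\Pi(\nu)=\sqrt{\nu/\nu_{max}}$, whose inverse is $\nu=\nu_{max}\,r^2$; the cell below uses $\nu_{max}=(2\nu_{LIM})^2$ with $\nu_{LIM}=N_{obs}+10\sqrt{N_{obs}}$. A minimal sketch of this check with an illustrative cut-off (it only assumes the numpy import):
In [ ]:
# Check of the inverse transform for the Jeffreys prior Pi(nu) ~ nu^(-1/2) cut at nu_max:
# the empirical cdf of the generated sample should follow sqrt(nu/nu_max).
nu_max = 100.
r = np.random.random(10**5)
nu_check = nu_max*r**2
for nu0 in [1., 10., 50.]:
    print "nu0 = %5.1f: empirical cdf = %.3f, analytic = %.3f" % (nu0, np.mean(nu_check < nu0), np.sqrt(nu0/nu_max))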
In [21]:
Nobs = 1
nu_grid = np.linspace(Nobs-5*np.sqrt(Nobs), Nobs+6*np.sqrt(Nobs), 1000)
# Part II
gaussian_II = stat.norm(Nobs, np.sqrt(Nobs))
# Part III
N_III = 10**6
poisson_III = stat.poisson(Nobs)
mc_sample_III = poisson_III.rvs(N_III)
# Part IV - Jeffrey's prior (we're using the log so that P(nu) doesn't blow up)
log_nu_posterior_jef = -nu_grid + (Nobs-1./2.)*np.log(nu_grid) - np.log(spc.gamma(Nobs+1./2.))
nu_posterior_jef = np.exp(log_nu_posterior_jef)
# Part V jef
N_V_jef = 10**6
nu_lim = Nobs+10*np.sqrt(Nobs)
# For Nobs=1 it is better to increase the value
#nu_lim = Nobs+1000*np.sqrt(Nobs)
r_prior_jef = np.random.random(N_V_jef)
nu_prior_jef = 4*nu_lim**2*r_prior_jef**2
poisson_V_jef = stat.poisson(nu_prior_jef)
N_mc_jef = poisson_V_jef.rvs(N_V_jef) #We'll draw a sample from N_mc
mc_sample_V_jef = nu_prior_jef[N_mc_jef==Nobs]
print len(mc_sample_V_jef)
# Part V.E) jef
mu_VE_jef = np.mean(mc_sample_V_jef)
sigma_VE_jef = np.std(mc_sample_V_jef)
gaussian_VE_jef = stat.norm(mu_VE_jef, sigma_VE_jef)
In [22]:
rng = [nu_grid[0], nu_grid[-1]]
Nbins = 12
Nbins = np.arange(int(nu_grid[0]),int(2*nu_grid[-1])) # For the case Nobs = 1, to have bins between integer values.
# For other Nobs, it's better to comment this line out.
fig = plt.figure(figsize=(12,12))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(212)
ax1.plot(nu_grid,gaussian_II.pdf(nu_grid), label='Gaussian fit to freq. estimator')
ax1.hist(mc_sample_III, normed=1, range=rng, bins=Nbins, label='MC frequentist estimator')
ax1.plot(nu_grid,nu_posterior_jef, label="Jeffreys' prior")
ax2.plot(nu_grid,nu_posterior_jef, label="Jeffreys' prior")
ax2.hist(mc_sample_V_jef, normed=1, range=rng, bins=Nbins, label="MC Jeffreys' prior")
ax2.plot(nu_grid,gaussian_VE_jef.pdf(nu_grid), label="Gaussian fit to MC Jeffreys' prior")
ax3.plot(nu_grid,gaussian_II.pdf(nu_grid), '--', label='Gaussian fit to freq. estimator')
ax3.hist(mc_sample_III, normed=1, histtype="step", range=rng, bins=Nbins, label='MC frequentist estimator')
ax3.plot(nu_grid,nu_posterior_jef, label="Jeffreys' prior")
ax3.hist(mc_sample_V_jef, normed=1, histtype="step", range=rng, bins=Nbins, label="MC Jeffreys' prior")
ax3.plot(nu_grid,gaussian_VE_jef.pdf(nu_grid), ':', label="Gaussian fit to MC Jeffreys' prior")
# Trick to set all labels for all axes to the same names and values
axes = [ax1, ax2, ax3]
for ax in axes:
    ax.set_xlabel(r'$\nu$')
    ax.set_ylabel(r'$P(\nu|N)$')
    ax.legend(loc='best')
In [23]:
Nobs = 1 #Change this parameter to see how it shapes the graph
nu_grid = np.linspace(Nobs-4*np.sqrt(Nobs), Nobs+4*np.sqrt(Nobs), 1000)
# Part II
gaussian_II = stat.norm(Nobs, np.sqrt(Nobs))
# Part III
N_III = 10**6
poisson_III = stat.poisson(Nobs)
mc_sample_III = poisson_III.rvs(N_III)
# Part IV lin
log_nu_posterior_lin = -nu_grid + Nobs*np.log(nu_grid) - np.log(msc.factorial(Nobs))
nu_posterior_lin = np.exp(log_nu_posterior_lin)
# Part V lin
N_V_lin=10**7
nu_prior_lin = np.random.random(N_V_lin)*(Nobs+10*np.sqrt(Nobs))
poisson_V_lin = stat.poisson(nu_prior_lin)
N_mc_lin = poisson_V_lin.rvs(N_V_lin) #We'll draw a sample from N_mc
mc_sample_V_lin = nu_prior_lin[N_mc_lin==Nobs]
print len(mc_sample_V_lin)
# Part V.E) lin
mu_VE_lin = np.mean(mc_sample_V_lin)
sigma_VE_lin = np.std(mc_sample_V_lin)
gaussian_VE_lin = stat.norm(mu_VE_lin, sigma_VE_lin)
# Part IV log
log_nu_posterior_log = -nu_grid + (Nobs-1)*np.log(nu_grid) - np.log(msc.factorial(Nobs-1))
nu_posterior_log = np.exp(log_nu_posterior_log)
# Part V log
N_V_log=5*10**7
nu_lim = Nobs+10*np.sqrt(Nobs)
# For Nobs=1 it is better to increase the value, which may also require to increase N_V_log
nu_lim = Nobs+10000*np.sqrt(Nobs)
r_prior_log = np.random.random(N_V_log)
nu_prior_log = np.exp(2*r_prior_log*np.log(nu_lim))/nu_lim
poisson_V_log = stat.poisson(nu_prior_log)
N_mc_log = poisson_V_log.rvs(N_V_log) #We'll draw a sample from N_mc
mc_sample_V_log = nu_prior_log[N_mc_log==Nobs]
print len(mc_sample_V_log)
# Part V.E) log
mu_VE_log = np.mean(mc_sample_V_log)
sigma_VE_log = np.std(mc_sample_V_log)
gaussian_VE_log = stat.norm(mu_VE_log, sigma_VE_log)
# Part IV - Jeffrey's prior (we're using the log so that P(nu) doesn't blow up)
log_nu_posterior_jef = -nu_grid + (Nobs-1./2.)*np.log(nu_grid) - np.log(spc.gamma(Nobs+1./2.))
nu_posterior_jef = np.exp(log_nu_posterior_jef)
# Part V jef
N_V_jef = 10**7
nu_lim = Nobs+10*np.sqrt(Nobs)
r_prior_jef = np.random.random(N_V_jef)
nu_prior_jef = 4*nu_lim**2*r_prior_jef**2
poisson_V_jef = stat.poisson(nu_prior_jef)
N_mc_jef = poisson_V_jef.rvs(N_V_jef) #We'll draw a sample from N_mc
mc_sample_V_jef = nu_prior_jef[N_mc_jef==Nobs]
print len(mc_sample_V_jef)
# Part V.E) jef
mu_VE_jef = np.mean(mc_sample_V_jef)
sigma_VE_jef = np.std(mc_sample_V_jef)
gaussian_VE_jef = stat.norm(mu_VE_jef, sigma_VE_jef)
In [24]:
rng = [nu_grid[0], nu_grid[-1]]
Nbins = 12
Nbins = np.arange(int(nu_grid[0]),int(2*nu_grid[-1])) # For the case Nobs = 1, to have bins between integer values.
# For other Nobs, it's better to comment this line out.
fig = plt.figure(figsize=(12,12))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(212)
ax1.plot(nu_grid,nu_posterior_log, label='Log prior')
ax1.plot(nu_grid,nu_posterior_lin, label='Linear prior')
ax1.plot(nu_grid,nu_posterior_jef, label="Jeffreys' prior")
ax2.hist(mc_sample_V_lin, normed=1, histtype="step", range=rng, bins=Nbins, label='MC linear prior')
ax2.hist(mc_sample_V_log, normed=1, histtype="step", range=rng, bins=Nbins, label='MC log prior')
ax2.hist(mc_sample_V_jef, normed=1, histtype="step", range=rng, bins=Nbins, label="MC Jeffreys' prior")
ax2.hist(mc_sample_III, normed=1, histtype="step", range=rng, bins=Nbins, label='MC frequentist estimator')
ax3.plot(nu_grid,nu_posterior_log, label='Log prior')
ax3.plot(nu_grid,nu_posterior_lin, label='Linear prior')
ax3.plot(nu_grid,nu_posterior_jef, label="Jeffreys' prior")
ax3.plot(nu_grid,gaussian_II.pdf(nu_grid), '--', label='Gaussian fit to freq. estimator')
ax3.plot(nu_grid,gaussian_VE_log.pdf(nu_grid), '-.', label='Gaussian fit to MC log prior')
ax3.plot(nu_grid,gaussian_VE_lin.pdf(nu_grid), ':', label='Gaussian fit to MC linear prior')
ax3.plot(nu_grid,gaussian_VE_jef.pdf(nu_grid), ':', label="Gaussian fit to MC Jeffreys' prior")
# Trick to set all labels for all axes to the same names and values
axes = [ax1, ax2, ax3]
for ax in axes:
    ax.set_xlabel(r'$\nu$')
    ax.set_ylabel(r'$P(\nu|N)$')
    ax.legend(loc='best')
For small $N$ the results differ considerably, and the posterior obtained with Jeffreys' prior lies between the linear-prior and log-prior ones. This is expected, since Jeffreys' prior is the least informative one and should therefore let the sample speak for itself. The Gaussian approximation is clearly off in this regime.
For larger $N$ (e.g. $N=150$) all the priors, and the Bayesian and frequentist results, are nearly equivalent. The Gaussian approximation is then much better, since it closely matches both the analytic Bayesian posteriors and the MC results for the Bayesian and frequentist approaches. The Gaussian approximation only becomes exact as $N\rightarrow\infty$.
To generate a d-dimensional Gaussian we start from its pdf: $$f(\vec{x}) = \frac{1}{(2\pi)^{\frac{d}{2}}(\det V)^{\frac{1}{2}}}e^{-\frac{1}{2}(\vec{x}-\vec{\mu})^{T}V^{-1}(\vec{x}-\vec{\mu})}$$
where $V = LL^{T}$ (the Cholesky decomposition, which exists because $V$ is positive definite: $\vec{y}^{T}V\vec{y} > 0$ for every $\vec{y}\neq\vec{0}$), so:
$$(\vec{x}-\vec{\mu})^{T}V^{-1}(\vec{x}-\vec{\mu}) = (\vec{x}-\vec{\mu})^{T}(LL^{T})^{-1}(\vec{x}-\vec{\mu}) = (L^{-1}(\vec{x}-\vec{\mu}))^{T}(L^{-1}(\vec{x}-\vec{\mu})) = \vec{Y}^{T}\vec{Y}$$where $\vec{Y} = L^{-1}(\vec{x}-\vec{\mu})$, i.e. $\vec{x} = \vec{\mu} + L\vec{Y}$. Hence
$$f\propto e^{-\frac{1}{2}\vec{Y}^{T}\vec{Y}} = \prod^{d}_{i=1} e^{-\frac{1}{2}Y^{2}_{i}}\propto \prod^{d}_{i=1}N_{Y_{i}}(0,1)$$so the $Y_i$ are independent standard normal variables. The Cholesky factor is given by $$L_{jj} = \Big(V_{jj}-\sum^{j-1}_{k=1}L^{2}_{jk}\Big)^{\frac{1}{2}}, \quad j = 1,...,d$$$$L_{ij} = \frac{V_{ij}-\sum^{j-1}_{k=1}L_{ik}L_{jk}}{L_{jj}}, \quad j = 1,...,d;\; i = j+1,...,d$$
For the bivariate case, if $V = \left[ {\begin{array}{cc} \sigma^{2}_{1} & \rho\sigma_{1}\sigma_{2}\\ \rho\sigma_{2}\sigma_{1} & \sigma^{2}_{2} \\ \end{array} } \right]$ and $Y_{1}\sim N(0,1)$, $Y_{2}\sim N(0,1)$, then
$L = \left[ {\begin{array}{cc} \sigma_{1} & 0\\ \rho\sigma_{2}& \sqrt{1-\rho^{2}}\sigma_{2} \\ \end{array} } \right]$, so:
$$x_{1} = \mu_{1} + \sigma_{1}Y_{1}$$
and $$x_{2} = \mu_{2} + \rho\sigma_{2}Y_{1} + \sqrt{1-\rho^{2}}\sigma_{2}Y_{2}$$
generate a sample from $$N(\mu_{1},\mu_{2},\sigma_{1},\sigma_{2},\rho)$$
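The closed-form $L$ above can be cross-checked numerically against numpy's Cholesky routine; a short sketch using the same $\sigma_1$, $\sigma_2$ and $\rho$ values as the generation cell below (it only assumes the numpy import):
In [ ]:
# Cross-check: numpy's Cholesky factor of V reproduces the closed-form L above
sigma1, sigma2, rho = 1., 2., 0.8
V = np.array([[sigma1**2, rho*sigma1*sigma2],
              [rho*sigma1*sigma2, sigma2**2]])
L_closed = np.array([[sigma1, 0.],
                     [rho*sigma2, np.sqrt(1-rho**2)*sigma2]])
print np.linalg.cholesky(V)
print L_closed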
In [25]:
# Function to create a 3D histogram
def hist3d(x,y,bins=5,fig=None,ax=None):
    # Note: fig and ax are inputs so that the caller can create them outside and
    # modify labels and legends once the function has been called.
    # Having them as inputs also allows this function to be used for subplots.
    # x and y are the positions in the plane of the data.
    # bins is passed directly to np.histogram2d and therefore accepts the same options as numpy
    # https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.histogram2d.html
    # For simplicity, the fig and ax arguments are optional: if they keep the default value None,
    # they are created inside the function.
    if fig==None:
        fig = plt.figure()
    if ax==None:
        ax = fig.add_subplot(111, projection='3d')
    # Call np.histogram2d; it returns a 2D array H with the number of data points inside the area
    # delimited by xedges and yedges
    H, xedges, yedges = np.histogram2d(x, y, bins=bins, normed=1)
    # Construct arrays with the lower-left corner of the bars
    xpos, ypos = np.meshgrid(xedges[:-1], yedges[:-1])
    xpos = xpos.flatten('F')
    ypos = ypos.flatten('F')
    zpos = np.zeros_like(xpos)
    # Construct arrays with the dimensions of the bars. It assumes the spacing only depends on the axis
    dx = (xedges[1]-xedges[0]) * np.ones_like(zpos)
    dy = (yedges[1]-yedges[0]) * np.ones_like(zpos)
    dz = H.flatten()
    # Choose colors according to a colormap. As the histogram is normalized, all values may be small,
    # so they are rescaled by their maximum for aesthetic reasons
    plot_colors = cm.jet(dz/dz.max())
    # Plot. The bar3d arguments are the coordinates of the vertex and the dimensions along the 3 axes;
    # the other arguments are optional
    ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color=plot_colors, alpha=None)
    return
In [26]:
N_3I = 10**4
mu1 = 2
mu2 = 1
sigma1 = 1
sigma2 = 2
rho = 0.8
y1 = np.random.normal(size=N_3I)
y2 = np.random.normal(size=N_3I)
x1 = mu1 + sigma1*y1
x2 = mu2 + rho*sigma2*y1 + np.sqrt(1-rho**2)*sigma2*y2
D = np.transpose(np.array([x1,x2]))
In [27]:
fig=plt.figure()
ax1=fig.add_subplot(111)
hist1 = ax1.hist2d(x1,x2)
ax1.set_xlabel('$x_1$')
ax1.set_ylabel('$x_2$')
fig.colorbar(hist1[3], ax=ax1)
fig2 = plt.figure()
ax3d = fig2.gca(projection='3d')
xedges = np.arange(round(x1.min()),round(x1.max())+2)-.5
yedges = np.arange(round(x2.min()),round(x2.max())+2)-.5
hist3d(x1,x2,bins=(xedges,yedges),ax=ax3d,fig=fig2)
ax3d.set_xlabel('$x_1$')
ax3d.set_ylabel('$x_2$')
ax3d.set_zlabel('Number of events')
Out[27]:
With $\rho=0.999$, the plot shows a much stronger correlation between x1 and x2 (it is almost a straight diagonal line). If $\rho=1$, x1 and x2 become perfectly correlated.
The PDF of a two-dimensional Gaussian is:
$$f(x_{1},x_{2})=\frac{1}{2\pi\sqrt{\det V}}e^{-\frac{1}{2}\left(\begin{array}{cc} x_{1}-\mu_{1} & x_{2}-\mu_{2}\end{array}\right)V^{-1}\left(\begin{array}{c} x_{1}-\mu_{1}\\ x_{2}-\mu_{2} \end{array}\right)}=\frac{1}{2\pi\sigma_{1}\sigma_{2}\sqrt{1-\rho^{2}}}e^{-\frac{1}{2}\text{A}}$$where
$$ A\equiv\left(\begin{array}{cc} x_{1}-\mu_{1} & x_{2}-\mu_{2}\end{array}\right)V^{-1}\left(\begin{array}{c} x_{1}-\mu_{1}\\ x_{2}-\mu_{2} \end{array}\right) $$and V is the covariance matrix:
$$ V=\left[\begin{array}{cc} \sigma_{1}^{2} & \rho\sigma_{1}\sigma_{2}\\ \rho\sigma_{1}\sigma_{2} & \sigma_{2}^{2} \end{array}\right] $$The inverse of V is:
$$ V^{-1}=\frac{1}{\det V}\left[\begin{array}{cc} \sigma_{2}^{2} & -\rho\sigma_{1}\sigma_{2}\\ -\rho\sigma_{1}\sigma_{2} & \sigma_{1}^{2} \end{array}\right] $$The determinant of V is:
$$ \det V=\sigma_{1}^{2}\sigma_{2}^{2}-\rho^{2}\sigma_{1}^{2}\sigma_{2}^{2}=\sigma_{1}^{2}\sigma_{2}^{2}\left(1-\rho^{2}\right) $$Calculate A:
$$A=\left(\begin{array}{cc} x_{1}-\mu_{1} & x_{2}-\mu_{2}\end{array}\right)V^{-1}\left(\begin{array}{c} x_{1}-\mu_{1}\\ x_{2}-\mu_{2} \end{array}\right)= \frac{1}{\det V}\left(\begin{array}{cc} x_{1}-\mu_{1} & x_{2}-\mu_{2}\end{array}\right)\left(\begin{array}{c} \sigma_{2}^{2}\left(x_{1}-\mu_{1}\right)-\rho\sigma_{1}\sigma_{2}\left(x_{2}-\mu_{2}\right)\\ \sigma_{1}^{2}\left(x_{2}-\mu_{2}\right)-\rho\sigma_{1}\sigma_{2}\left(x_{1}-\mu_{1}\right) \end{array}\right)\\ =\frac{1}{\det V}\left[\sigma_{2}^{2}\left(x_{1}-\mu_{1}\right)^{2}-\rho\sigma_{1}\sigma_{2}\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)+\sigma_{1}^{2}\left(x_{2}-\mu_{2}\right)^{2}-\rho\sigma_{1}\sigma_{2}\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)\right]\\ =\frac{1}{\det V}\left\{ \left[\sigma_{1}\left(x_{2}-\mu_{2}\right)\right]^{2}+\left[\sigma_{2}\left(x_{1}-\mu_{1}\right)\right]^{2}-2\rho\sigma_{1}\sigma_{2}\left(x_{1}-\mu_{1}\right)\left(x_{2}-\mu_{2}\right)\right\} $$Let $\pi=c$ be the (flat) prior for all $\mu_{1},\mu_{2}$ and let $\Pi$ denote the product.
The likelihood is: $$L=\prod_{i=1}^{N}f(x_{1i},x_{2i})$$
and the posterior is:
$$p\left(\mu_{1},\mu_{2}|D\right)\propto\pi\cdot L=\frac{c}{\left(2\pi\sigma_{1}\sigma_{2}\sqrt{1-\rho^{2}}\right)^{N}}e^{-\frac{1}{2}\sum A}\propto ce^{-\frac{1}{2}\sum A}$$so the non-normalized likelihood is just:
$$L\propto e^{-\frac{1}{2}\sum A}$$Our Metropolis-Hastings algorithm consists of the following steps. Preliminary step: take the non-normalized posterior above, evaluated on the 2D Gaussian sample $D$ generated in Part 1, and choose a symmetric proposal (here a Gaussian step around the current point). Then, loop over (*): propose new means $\vec{\mu}_{try}$, compute the acceptance probability $A=\min\big(1,\,p(\vec{\mu}_{try}|D)/p(\vec{\mu}_{old}|D)\big)$, draw $\eta$ uniformly in $[0,1]$, and accept the proposal if $A>\eta$; otherwise keep the current values.
(*) Note that the Metropolis-Hastings algorithm is iterative. More info on it here:
https://wiseodd.github.io/techblog/2015/10/17/metropolis-hastings/
https://python4mpia.github.io/fitting_data/Metropolis-Hastings.html
http://nbviewer.jupyter.org/github/drphilmarshall/LearningInference/blob/master/straightline.ipynb
In [28]:
def try_mu(muold,step):
    # Symmetric Gaussian proposal centred on the current value
    return np.random.normal(loc=muold,scale=step)
def g_ratio(munew,muold,D,rhoD2,det,s1,s2,rho):
    # Acceptance ratio p(mu_new|D)/p(mu_old|D): the full quadratic form A of Part 3.II is
    # evaluated at the proposed and current means. Note that the cross term depends on mu,
    # so it does not cancel in the difference (rhoD2 is kept in the signature only for
    # compatibility with the calling cell).
    gnew = (s2*(D[:,0]-munew[0]))**2+(s1*(D[:,1]-munew[1]))**2-2*rho*s1*s2*(D[:,0]-munew[0])*(D[:,1]-munew[1])
    gold = (s2*(D[:,0]-muold[0]))**2+(s1*(D[:,1]-muold[1]))**2-2*rho*s1*s2*(D[:,0]-muold[0])*(D[:,1]-muold[1])
    return np.exp(-1./(2*det)*sum(gnew-gold))
In [29]:
N3_III = 10**6
step = 0.5 # this is an initial guess
mus = np.ones((2,N3_III+1)) #the 1, 1 values are an initial guess
s1 = 1
s2 = 2
rho = 0.999
rhoD2 = 2*rho*s1*s2*D[:,0]*D[:,1]
det = (s1*s2*np.sqrt(1-rho**2))**2
for i in xrange(N3_III):
    if i%5000==0:
        print i
    mutry = try_mu(mus[:,i],step)
    #print mutry
    A = min(1, g_ratio(mutry,mus[:,i],D,rhoD2,det,s1,s2,rho))
    #print A
    eta = np.random.random() #Generate a number from the uniform distribution
    if A>eta:
        mus[:,i+1]=mutry
    else:
        mus[:,i+1]=mus[:,i]
print mus
In [30]:
plt.plot(mus[0,5000:],mus[1,5000:],'.')
print np.mean(mus[0,5000:]),np.mean(mus[1,5000:])
print mus[:,-1]
With $\rho=0.999$, the algorithm takes much longer to converge (and therefore performs worse).
The Gibbs sampler is a Markov chain algorithm that generates a sample following a given pdf by drawing in turn from its conditional pdfs (see notes). Starting from an initial point, $\mu_x$ is drawn from $P(\mu_x|\mu_y,D)$, then $\mu_y$ from $P(\mu_y|\mu_x,D)$, and these two steps are iterated.
Once the burn-in time has passed (say after $N_{burn}$ iterations), the obtained values $z_{N_{burn}}, z_{N_{burn}+1},\dots,z_N$ form our sample following the desired pdf.
Each iteration is simpler, and thus faster, than a Metropolis-Hastings step, but the two are not really comparable: Gibbs sampling moves along the x and y directions by sampling the exact conditionals, whereas Metropolis-Hastings is a pure random walk with an acceptance probability.
Therefore, the conditional probabilities P(mux|muy,D) and P(muy|mux,D) have to be calculated:
Let $D=(x_i,y_i)_{i=1,..,N}$ be a sample from a 2D Gaussian distribution, and let $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\rho$ be the corresponding means, standard deviations and correlation coefficient.
From exercise 3.II we know that the posterior is:
$$P(\mu_x,\mu_y|D)=\frac{c}{(2\pi\sigma_x\sigma_y\sqrt{1-\rho^2})^N}e^{-\frac{1}{2}\sum_{i=1}^N{A_i}}$$where $A_i$ is given by:
$$A_i=\frac{(x_i-\mu_x)^2}{\sigma_x^2(1-\rho^2)}+\frac{(y_i-\mu_y)^2}{\sigma_y^2(1-\rho^2)}-\frac{2\rho}{\sigma_x\sigma_y(1-\rho^2)}(y_i-\mu_y)(x_i-\mu_x)$$Expanding $A_i$ one gets:
$$A_i=\frac{x^2_i+\mu_x^2-2\mu_x x_i}{\sigma_x^2(1-\rho^2)}+\frac{(y_i-\mu_y)^2}{\sigma_y^2(1-\rho^2)}-\frac{2\rho}{\sigma_x\sigma_y(1-\rho^2)}x_i y_i+\frac{2\rho}{\sigma_x\sigma_y(1-\rho^2)}y_i\mu_x+\frac{2\rho}{\sigma_x\sigma_y(1-\rho^2)}x_i\mu_y-\frac{2\rho}{\sigma_x\sigma_y(1-\rho^2)}\mu_x\mu_y$$Therefore:
$$P(\mu_x | \mu_y;D)\propto \exp\bigg(-\frac{1}{2}\sum_{i=1}^N\Big(\frac{\mu_x^2-2\mu_x x_i}{\sigma_x^2(1-\rho^2)}+\frac{2\rho}{\sigma_x\sigma_y(1-\rho^2)}y_i\mu_x-\frac{2\rho}{\sigma_x\sigma_y(1-\rho^2)}\mu_x\mu_y\Big)\bigg)$$This is because $P(\mu_x | \mu_y;D)$ is the probability of obtaining $\mu_x$ when $\mu_y$ is FIXED, so any term that does not contain $\mu_x$ is a constant.
Developing this expression further:
$$P(\mu_x | \mu_y;D)\propto \exp\bigg(-\frac{1}{2}\sum_{i=1}^N{\frac{\mu_x^2\sigma_y-2\mu_x x_i\sigma_y+2\rho\sigma_x\mu_x(y_i-\mu_y)}{\sigma_x^2\sigma_y(1-\rho^2)}}\bigg)=\exp\bigg(-\frac{1}{2}\sum_{i=1}^N{\frac{\mu_x^2-\mu_x\big(2x_i-2\rho\frac{\sigma_x}{\sigma_y}(y_i-\mu_y)\big)}{\sigma_x^2(1-\rho^2)}}\bigg)$$Then, since $\mu_j$ (j=x,y) does not depend on the summation index, we have:
$$\sum_{i=1}^N{\Big(\mu_x^2-2\mu_x\big(x_i-\rho\frac{\sigma_x}{\sigma_y}(y_i-\mu_y)\big)\Big)}=\mu_x^2 N-2\mu_x\bigg(\sum_{i=1}^N{x_i}-\rho\frac{\sigma_x}{\sigma_y}\Big(\sum_{i=1}^N{y_i}-N\mu_y\Big)\bigg)=N\Big(\mu_x^2-2\mu_x\big(\frac{1}{N}\sum_{i=1}^N{x_i}-\rho\frac{\sigma_x}{\sigma_y}(\frac{1}{N}\sum_{i=1}^N{y_i}-\mu_y)\big)\Big)$$Defining the sample means as $$\bar{x}=\frac{1}{N}\sum_{i=1}^N{x}_i \quad ; \quad \bar{y}=\frac{1}{N}\sum_{i=1}^N{y}_i$$ we obtain the expression: $$N\Big(\mu_x^2-2\mu_x\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)\Big)$$ and therefore: $$P(\mu_x | \mu_y;D)\propto \exp\bigg(-\frac{N}{2\sigma_x^2(1-\rho^2)}\Big(\mu_x^2-2\mu_x\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)\Big)\bigg)$$
Adding and subtracting the term $\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)^2$:
$$P(\mu_x | \mu_y;D)\propto \exp\bigg(-\frac{N}{2\sigma_x^2(1-\rho^2)}\Big(\mu_x^2-2\mu_x\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)+\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)^2-\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)^2\Big)\bigg)$$and completing the square: $$P(\mu_x | \mu_y;D)\propto \exp\Bigg(-\frac{N}{2\sigma_x^2(1-\rho^2)}\bigg(\Big(\mu_x-\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)\Big)^2-\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)^2\bigg)\Bigg) \propto \exp\bigg(-\frac{\Big(\mu_x-\big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)\big)\Big)^2}{2\frac{\sigma_x^2(1-\rho^2)}{N}}\bigg)$$
This has the form of a normal distribution with mean $\mu=\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y)$ and standard deviation $\sigma=\frac{\sigma_x\sqrt{1-\rho^2}}{\sqrt{N}}$.
We conclude that: $$P(\mu_x | \mu_y;D)\equiv \mathcal{N}\Big(\bar{x}-\rho\frac{\sigma_x}{\sigma_y}(\bar{y}-\mu_y),\frac{\sigma_x\sqrt{1-\rho^2}}{\sqrt{N}}\Big)$$ $$P(\mu_y | \mu_x;D)\equiv \mathcal{N}\Big(\bar{y}-\rho\frac{\sigma_y}{\sigma_x}(\bar{x}-\mu_x),\frac{\sigma_y\sqrt{1-\rho^2}}{\sqrt{N}}\Big)$$ where the second conditional probability follows directly from the symmetry between $\mu_x$ and $\mu_y$ in the original equation.
In [31]:
N3_IV = 10**5
# Target mean
Mean1 = np.mean(D[:,0])
Mean2 = np.mean(D[:,1])
s1 = 1
s2 = 2
rho = 0.8
Sig1 = sigma1*np.sqrt(1-rho**2)/np.sqrt(len(D))
Sig2 = sigma2*np.sqrt(1-rho**2)/np.sqrt(len(D))
sample_IV = np.zeros((2,N3_IV+1))
D1 = Mean1-rho*sigma1/sigma2*Mean2
D2 = Mean2-rho*sigma2/sigma1*Mean1
for i in range(N3_IV):
    sample_IV[0,i+1] = np.random.normal(D1+rho*sigma1/sigma2*sample_IV[1,i], Sig1)
    sample_IV[1,i+1] = np.random.normal(D2+rho*sigma2/sigma1*sample_IV[0,i+1], Sig2)
SampleMean1=np.mean(sample_IV[0,50:])
SampleMean2=np.mean(sample_IV[1,50:])
print 'Sample means are:\n\tmean_x=%.4g\n\tmean_y=%.4g' %(SampleMean1, SampleMean2)
SampleStd1=np.std(sample_IV[0,50:])
SampleStd2=np.std(sample_IV[1,50:])
print 'And standard deviations:\n\tstd_x=%.4f\n\tstd_y=%.4f' %(SampleStd1, SampleStd2)
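For reference, with a flat prior the posterior of $(\mu_x,\mu_y)$ is Gaussian with covariance $V/N$, so the marginal standard deviations should be $\sigma_x/\sqrt{N}$ and $\sigma_y/\sqrt{N}$; a short sketch comparing with the values printed above (it assumes sigma1, sigma2 and D from the previous cells):
In [ ]:
# Analytic marginal posterior standard deviations for a flat prior,
# to compare with the Gibbs sample values printed above
print 'Analytic std_x = %.4f, std_y = %.4f' % (sigma1/np.sqrt(len(D)), sigma2/np.sqrt(len(D)))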
In [ ]:
Repeat exercises 1-4 with $\rho=0.999$
In [ ]: