Parameter Estimation with MLE

Parameter Estimation for the Bernoulli Distribution

  • The probability of each trial $x_i$ follows the Bernoulli distribution
$$ P(x ; \theta ) = \text{Bern}(x ; \theta ) = \theta^x (1 - \theta)^{1-x}$$
  • Given $N$ independent samples $x_{1:N}$, the likelihood is $$ L(\theta ; x_{1:N}) = P(x_{1:N};\theta) = \prod_{i=1}^N \theta^{x_i} (1 - \theta)^{1-x_i} $$

  • Log-Likelihood $$ \begin{eqnarray*} \log L &=& \log P(x_{1:N};\theta) \\ &=& \sum_{i=1}^N \big\{ {x_i} \log\theta + (1-x_i)\log(1 - \theta) \big\} \\ &=& \sum_{i=1}^N {x_i} \log\theta + \left( N-\sum_{i=1}^N x_i \right) \log( 1 - \theta ) \\ \end{eqnarray*} $$

  • Since each outcome is either $x = 1$ (success) or $x = 0$ (failure), define
    • the total number of trials $N$
    • the number of successes among them, $N_1 = \sum_{i=1}^N {x_i}$
  • The Log-Likelihood then becomes $$ \begin{eqnarray*} \log L &=& N_1 \log\theta + (N-N_1) \log(1 - \theta) \\ \end{eqnarray*} $$

  • Log-Likelihood Derivative

$$ \begin{eqnarray*} \dfrac{\partial \log L}{\partial \theta} &=& \dfrac{\partial}{\partial \theta} \big\{ N_1 \log\theta + (N-N_1) \log(1 - \theta) \big\} \\ &=& \dfrac{N_1}{\theta} - \dfrac{N-N_1}{1-\theta} = 0 \\ \end{eqnarray*} $$$$ \dfrac{N_1}{\theta} = \dfrac{N-N_1}{1-\theta} $$$$ \dfrac{1-\theta}{\theta} = \dfrac{N-N_1}{N_1} $$$$ \dfrac{1}{\theta} - 1 = \dfrac{N}{N_1} - 1 $$$$ \theta= \dfrac{N_1}{N} $$
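
The derivation above only locates a stationary point. As a short supplementary check (not part of the original derivation), the second derivative is negative for every $\theta \in (0, 1)$ whenever $N > 0$, so $\theta = N_1/N$ is indeed a maximum:

$$ \dfrac{\partial^2 \log L}{\partial \theta^2} = -\dfrac{N_1}{\theta^2} - \dfrac{N-N_1}{(1-\theta)^2} < 0 $$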

In [5]:
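import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available

np.random.seed(0)
theta0 = 0.6  # true parameter used to generate the data
x = sp.stats.bernoulli(theta0).rvs(1000)
N0, N1 = np.bincount(x, minlength=2)  # counts of failures (0) and successes (1)
N = N0 + N1
theta = N1/N  # MLE: fraction of successes
theta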


Out[5]:
0.60999999999999999
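
The closed-form result can also be cross-checked numerically. The following is a minimal sketch, assuming SciPy's `minimize_scalar` is available; the helper name `neg_log_likelihood`, the use of `random_state`, and the search bounds are illustrative choices, not part of the original notebook. It minimizes the negative log-likelihood directly and compares the result with $N_1/N$.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import bernoulli

# Draw Bernoulli samples (seed and sample size are illustrative choices).
x = bernoulli(0.6).rvs(1000, random_state=0)
N, N1 = len(x), int(x.sum())

def neg_log_likelihood(theta):
    # -log L = -(N1 * log(theta) + (N - N1) * log(1 - theta))
    return -(N1 * np.log(theta) + (N - N1) * np.log(1 - theta))

# Search strictly inside (0, 1) so that both logarithms stay finite.
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
theta_numeric = res.x
theta_closed = N1 / N
print(theta_numeric, theta_closed)
```

The two values should agree up to the optimizer's tolerance, which is what the derivation predicts.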

Parameter Estimation for the Categorical Distribution

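  • The probability of each trial $x_i$ follows the categorical distribution, where $x$ is a one-hot vector and the parameters sum to one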
$$ P(x ; \theta ) = \text{Cat}(x ; \theta) = \prod_{k=1}^K \theta_k^{x_k} $$$$ \sum_{k=1}^K \theta_k = 1 $$
  • Given $N$ independent samples $x_{1:N}$, the likelihood is $$ L(\theta ; x_{1:N}) = P(x_{1:N};\theta) = \prod_{i=1}^N \prod_{k=1}^K \theta_k^{x_{i,k}} $$

  • Log-Likelihood $$ \begin{eqnarray*} \log L &=& \log P(x_{1:N};\theta) \\ &=& \sum_{i=1}^N \sum_{k=1}^K {x_{i,k}} \log\theta_k \\ &=& \sum_{k=1}^K \log\theta_k \sum_{i=1}^N {x_{i,k}} \end{eqnarray*} $$

  • Let $N_k = \sum_{i=1}^N {x_{i,k}}$ denote the number of times category $k$ occurs
  • The Log-Likelihood then becomes $$ \begin{eqnarray*} \log L &=& \sum_{k=1}^K N_k \log\theta_k \end{eqnarray*} $$

  • Constraint $$ \sum_{k=1}^K \theta_k = 1 $$

  • Log-Likelihood Derivative with Lagrange multiplier
$$ \begin{eqnarray*} J &=& \sum_{k=1}^K N_k \log\theta_k + \lambda \left(1- \sum_{k=1}^K \theta_k\right) \\ \dfrac{\partial J}{\partial \theta_k} &=& \dfrac{N_k}{\theta_k} - \lambda = 0 \\ \dfrac{\partial J}{\partial \lambda} &=& 1- \sum_{k=1}^K \theta_k = 0\\ \end{eqnarray*} $$$$ \dfrac{N_1}{\theta_1} = \dfrac{N_2}{\theta_2} = \cdots = \dfrac{N_K}{\theta_K} = \lambda $$$$ \sum_{k=1}^K N_k = N $$$$ \sum_{k=1}^K N_k = \lambda \sum_{k=1}^K \theta_k = \lambda = N $$$$ \theta_k = \dfrac{N_k}{N} $$

In [6]:
np.random.seed(0)
theta0 = np.array([0.1, 0.3, 0.6])
x = np.random.choice(np.arange(3), 1000, p=theta0)
N0, N1, N2 = np.bincount(x, minlength=3)  # N_k: number of samples in each category
N = N0 + N1 + N2
theta = np.array([N0, N1, N2]) / N  # MLE: theta_k = N_k / N
theta


Out[6]:
array([ 0.098,  0.317,  0.585])
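
To connect the one-hot notation $x_{i,k}$ with the counts $N_k$, and to sanity-check the constrained maximum, here is a minimal sketch. It assumes NumPy's `default_rng` generator; the names `labels`, `candidates`, and `log_likelihood` are hypothetical helpers introduced for illustration. It compares the log-likelihood at $\theta_k = N_k/N$ against many random points on the probability simplex.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = np.array([0.1, 0.3, 0.6])
K = len(theta0)

# Draw class labels, then one-hot encode so that x[i, k] = 1 iff sample i is class k.
labels = rng.choice(K, size=1000, p=theta0)
x = np.eye(K, dtype=int)[labels]

# N_k = sum_i x[i, k]; the MLE is theta_k = N_k / N.
N_k = x.sum(axis=0)
theta_mle = N_k / N_k.sum()

def log_likelihood(theta):
    # log L = sum_k N_k * log(theta_k)
    return np.sum(N_k * np.log(theta))

# Random points on the simplex (Dirichlet samples) should never beat the MLE.
candidates = rng.dirichlet(np.ones(K), size=10_000)
best_candidate = max(log_likelihood(t) for t in candidates)
print(theta_mle, log_likelihood(theta_mle) >= best_candidate)
```

A random search like this is not a proof, but it is a cheap way to catch sign or indexing mistakes in the Lagrange-multiplier derivation.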

Parameter Estimation for the Normal Distribution

  • The probability density of each trial $x_i$ follows the Gaussian normal distribution
$$ p(x ; \theta ) = N(x ; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp \left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) $$
  • Given $N$ independent samples $x_{1:N}$, the likelihood is $$ L(\theta;x_{1:N}) = p(x_{1:N};\theta) = \prod_{i=1}^N \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp \left(-\dfrac{(x_i-\mu)^2}{2\sigma^2}\right)$$

  • Log-Likelihood $$ \begin{eqnarray*} \log L &=& \log p(x_{1:N};\theta) \\ &=& \sum_{i=1}^N \left\{ -\dfrac{1}{2}\log(2\pi\sigma^2) - \dfrac{(x_i-\mu)^2}{2\sigma^2} \right\} \\ &=& -\dfrac{N}{2} \log(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^N (x_i-\mu)^2 \end{eqnarray*} $$

  • Log-Likelihood Derivative

$$ \begin{eqnarray*} \dfrac{\partial \log L}{\partial \mu} &=& \dfrac{\partial}{\partial \mu} \left\{ -\dfrac{N}{2} \log(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^N (x_i-\mu)^2 \right\} = 0 \\ \dfrac{\partial \log L}{\partial \sigma^2} &=& \dfrac{\partial}{\partial \sigma^2} \left\{ -\dfrac{N}{2} \log(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^N (x_i-\mu)^2 \right\} = 0\\ \end{eqnarray*} $$$$ \dfrac{2}{2\sigma^2}\sum_{i=1}^N (x_i-\mu) = 0 $$$$ N \mu = \sum_{i=1}^N x_i $$$$ \mu = \dfrac{1}{N}\sum_{i=1}^N x_i = \bar{x} $$$$ -\dfrac{N}{2\sigma^2 } + \dfrac{1}{2(\sigma^2)^2}\sum_{i=1}^N (x_i-\mu)^2 = 0 $$$$ \sigma^2 = \dfrac{1}{N}\sum_{i=1}^N (x_i-\mu)^2 = \dfrac{1}{N}\sum_{i=1}^N (x_i-\bar{x})^2 = s^2 $$

In [7]:
np.random.seed(0)
mu0 = 1
sigma0 = 2
x = sp.stats.norm(mu0, sigma0).rvs(1000)
xbar = x.mean()  # MLE of mu
s = x.std(ddof=1)  # sample standard deviation (not variance); the MLE of sigma uses ddof=0
xbar, s


Out[7]:
(0.90948658501960922, 1.9750540913890255)
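
Note that the derivation yields the variance with a $1/N$ normalization, whereas `ddof=1` divides by $N-1$. The sketch below, which assumes `scipy.stats.norm.fit` (it returns maximum-likelihood estimates of `loc` and `scale` for the normal distribution), makes the distinction explicit; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Same true parameters as in the cell above; the seed handling is an illustrative choice.
x = norm(1, 2).rvs(1000, random_state=0)
N = len(x)

mu_mle = x.mean()
var_mle = np.mean((x - mu_mle) ** 2)  # 1/N normalization, same as x.var(ddof=0)
var_unbiased = x.var(ddof=1)          # 1/(N-1) normalization

# norm.fit returns the ML estimates of the mean (loc) and standard deviation (scale).
loc_fit, scale_fit = norm.fit(x)
print(mu_mle, var_mle, var_unbiased, scale_fit ** 2)
```

For $N = 1000$ the two variance estimates differ only by the factor $N/(N-1)$, so the distinction matters mainly for small samples.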

Parameter Estimation for the Multivariate Gaussian Normal Distribution

  • The probability density of each trial $x_i$ follows the multivariate normal distribution
$$ p(x ; \theta ) = N(x ; \mu, \Sigma) = \dfrac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp \left( -\dfrac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right) $$
  • Given $N$ independent samples $x_{1:N}$, the likelihood is $$ L(\theta;x_{1:N}) = p(x_{1:N};\theta) = \prod_{i=1}^N \dfrac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp \left( -\dfrac{1}{2} (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) \right)$$

  • Log-Likelihood, where $C = -\dfrac{ND}{2}\log(2\pi)$ is a constant that does not depend on the parameters $$ \begin{eqnarray*} \log L &=& \log p(x_{1:N};\theta) \\ &=& \sum_{i=1}^N \left\{ -\log((2\pi)^{D/2} |\Sigma|^{1/2}) - \dfrac{1}{2} (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) \right\} \\ &=& C -\dfrac{N}{2} \log|\Sigma| - \dfrac{1}{2} \sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) \end{eqnarray*} $$

  • In terms of the precision matrix $\Lambda = \Sigma^{-1}$
$$ \begin{eqnarray*} \log L &=& C + \dfrac{N}{2} \log|\Lambda| - \dfrac{1}{2} \sum_{i=1}^N (x_i-\mu)^T \Lambda (x_i-\mu) \end{eqnarray*} $$$$ \dfrac{\partial \log L}{\partial \mu} = - \dfrac{1}{2} \dfrac{\partial}{\partial \mu} \sum_{i=1}^N (x_i-\mu)^T \Lambda (x_i-\mu) = \sum_{i=1}^N \Lambda (x_i - \mu) = 0 $$$$ \mu = \dfrac{1}{N}\sum_{i=1}^N x_i $$$$ \dfrac{\partial \log L}{\partial \Lambda} = \dfrac{\partial}{\partial \Lambda} \dfrac{N}{2} \log|\Lambda| - \dfrac{\partial}{\partial \Lambda} \dfrac{1}{2} \sum_{i=1}^N \text{tr}( (x_i-\mu)(x_i-\mu)^T\Lambda) =0 $$$$ \dfrac{N}{2} \Lambda^{-T} = \dfrac{1}{2}\sum_{i=1}^N (x_i-\mu)(x_i-\mu)^T $$

$$ \Sigma = \dfrac{1}{N}\sum_{i=1}^N (x_i-\mu)(x_i-\mu)^T $$


In [8]:
np.random.seed(0)
mu0 = np.array([0, 1])
sigma0 = np.array([[1, 0.2], [0.2, 4]])
x = sp.stats.multivariate_normal(mu0, sigma0).rvs(1000)
xbar = x.mean(axis=0)  # MLE of mu
S2 = np.cov(x, rowvar=0)  # sample covariance with the default N-1 normalization; pass bias=True for the 1/N MLE
print(xbar)
print(S2)


[-0.0126996   0.95720206]
[[ 0.96100921  0.16283508]
 [ 0.16283508  3.80507694]]
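
Because `np.cov` divides by $N-1$ by default, the matrix printed above is the unbiased sample covariance rather than the exact MLE. The following minimal sketch (variable names such as `centered` and `sigma_mle` are illustrative) computes the $1/N$ estimate directly from the formula and checks it against `np.cov(..., bias=True)`.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu0 = np.array([0, 1])
sigma0 = np.array([[1, 0.2], [0.2, 4]])
# Rows of x are samples, columns are dimensions; the seed handling is an illustrative choice.
x = multivariate_normal(mu0, sigma0).rvs(1000, random_state=0)
N = x.shape[0]

# MLE of the mean: average over samples.
mu_mle = x.mean(axis=0)

# MLE of the covariance: (1/N) * sum_i (x_i - mu)(x_i - mu)^T.
centered = x - mu_mle
sigma_mle = centered.T @ centered / N

# bias=True switches np.cov from the N-1 to the N normalization.
sigma_np = np.cov(x, rowvar=False, bias=True)
print(mu_mle)
print(sigma_mle)
```

As in the univariate case, the two normalizations differ by the factor $N/(N-1)$, which is negligible for 1000 samples.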