University of Zagreb
Faculty of Electrical Engineering and Computing
http://www.fer.unizg.hr/predmet/su
Academic year 2015/2016
(c) 2015 Jan Šnajder
Version: 0.7 (2015-10-31)
In [1]:
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt
import pandas as pd
%pylab inline
MAP hypothesis: \begin{align*} h : \mathcal{X} &\to \{\mathcal{C}_1, \mathcal{C}_2,\dots, \mathcal{C}_K\}\\ h(\mathbf{x})&=\displaystyle\mathrm{argmax}_{\mathcal{C}_k}\ p(\mathbf{x}|\mathcal{C}_k) P(\mathcal{C}_k) \end{align*}
Classification confidence for class $\mathcal{C}_j$: \begin{align*} h_j : \mathcal{X} &\to [0,\infty)\\ h_j(\mathbf{x})&=p(\mathbf{x}|\mathcal{C}_j) P(\mathcal{C}_j) \end{align*}
Classification probability for class $\mathcal{C}_j$: \begin{align*} h_j : \mathcal{X} &\to [0,1]\\ h_j(\mathbf{x})&=P(\mathcal{C}_j|\mathbf{x}) \end{align*}
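The two are related through Bayes' rule: normalizing the classification confidence by the evidence gives the posterior class probability,
$$
P(\mathcal{C}_j|\mathbf{x}) = \frac{p(\mathbf{x}|\mathcal{C}_j)P(\mathcal{C}_j)}{\sum_{k=1}^K p(\mathbf{x}|\mathcal{C}_k)P(\mathcal{C}_k)}.
$$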
Suppose that in reality the examples come from two regions:
Probability of misclassification:
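For two decision regions $\mathcal{R}_1$ and $\mathcal{R}_2$ (one per class), the standard decision-theoretic expression is
$$
P(\mathrm{error}) = \int_{\mathcal{R}_2} p(\mathbf{x}|\mathcal{C}_1)P(\mathcal{C}_1)\,\mathrm{d}\mathbf{x}
+ \int_{\mathcal{R}_1} p(\mathbf{x}|\mathcal{C}_2)P(\mathcal{C}_2)\,\mathrm{d}\mathbf{x},
$$
which the MAP rule minimizes by assigning each $\mathbf{x}$ to the class with the larger joint $p(\mathbf{x}|\mathcal{C}_k)P(\mathcal{C}_k)$.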
\begin{align*}
h(\mathbf{x}=x_1,\dots,x_n) &= \mathrm{argmax}_{j}\ P(\mathbf{x}=x_1,\dots,x_n|y=\mathcal{C}_j)\,P(y = \mathcal{C}_j)
\end{align*}
Assuming the features are conditionally independent given the class, the likelihood factorizes, which gives the naive Bayes classifier: $$ h(x_1,\dots,x_n) = \mathrm{argmax}_j\ P(\mathcal{C}_j)\prod_{k=1}^n P(x_k|\mathcal{C}_j) $$
ML estimate: $$ \hat{P}(x_k|\mathcal{C}_j)=\frac{\sum_{i=1}^N\mathbf{1}\big\{x^{(i)}_k=x_k \land y^{(i)}=\mathcal{C}_j\big\}} {\sum_{i=1}^N \mathbf{1}\{y^{(i)} = \mathcal{C}_j\}} = \frac{N_{kj}}{N_j} $$
Laplace estimator: $$ \hat{P}(x_k|\mathcal{C}_j)=\frac{\sum_{i=1}^N\mathbf{1}\big\{x^{(i)}_k=x_k \land y^{(i)}=\mathcal{C}_j\big\} + \lambda} {\sum_{i=1}^N \mathbf{1}\{y^{(i)} = \mathcal{C}_j\} + \lambda K_k} = \frac{N_{kj}+\lambda}{N_j+\lambda K_k} $$
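For instance, with $\lambda = 1$, a binary feature ($K_k = 2$), $N_j = 10$ and $N_{kj} = 0$, the smoothed estimate is $\hat{P}(x_k|\mathcal{C}_j) = (0+1)/(10+2) = 1/12$ rather than $0$, so a feature value unseen in the training data no longer zeroes out the whole product.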
Does the conditional independence $x_i\bot x_k|\mathcal{C}_j\ (i\neq k)$ hold in general?
Example: Text classification
In [2]:
q101 = pd.read_csv("http://www.fer.unizg.hr/_download/repository/questions101-2014.csv", comment='#')
In [38]:
q101[:20]
Out[38]:
In [41]:
q101[['Q7','Q101','Q97','Q4']][:20]
Out[41]:
In [42]:
X = q101[['Q7','Q101','Q97']][:20].as_matrix()
y = q101['Q4'][:20].as_matrix()
In [43]:
# Class prior probability: P(C_j)
def class_prior(y, label):
    N = len(y)
    return len(y[y==label]) / float(N)

# Class likelihood P(x_k|C_j), Laplace (add-one) smoothed, assuming two possible values per feature (K_k = 2)
def class_likelihood(X, y, feature_ix, value, label):
    y_ix = y==label
    Nj = len(y[y_ix])
    Nkj = len(X[sp.logical_and(y_ix, X[:,feature_ix]==value)])
    return (Nkj + 1) / (float(Nj) + 2)
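For convenience, the per-class joint $P(\mathcal{C}_j)\prod_k P(x_k|\mathcal{C}_j)$ of a single example can be wrapped in a small helper (a minimal sketch; the name `nb_joint` is an assumption, not defined in the notebook):

# Naive Bayes joint h_j(x) = P(C_j) * prod_k P(x_k|C_j), where x is a list of feature values
def nb_joint(X, y, x, label):
    p = class_prior(y, label)
    for k, value in enumerate(x):
        p *= class_likelihood(X, y, k, value, label)
    return p

For example, `nb_joint(X, y, ['Messi', 'Batman', 'Tenisice'], 'Psi')` reproduces the product computed step by step in the cells below.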
In [44]:
p_Psi = class_prior(y, 'Psi')
p_Psi
Out[44]:
In [45]:
p_Macke = class_prior(y, 'Mačke')
p_Macke
Out[45]:
In [46]:
p_Messi_Psi = class_likelihood(X, y, 0, 'Messi', 'Psi')
p_Messi_Psi
Out[46]:
In [47]:
p_Ronaldo_Psi = class_likelihood(X, y, 0, 'Ronaldo', 'Psi')
p_Ronaldo_Psi
Out[47]:
In [48]:
class_prior(y, 'Psi') \
* class_likelihood(X, y, 0, 'Messi', 'Psi') \
* class_likelihood(X, y, 1, 'Batman', 'Psi') \
* class_likelihood(X, y, 2, 'Tenisice', 'Psi')
Out[48]:
In [49]:
class_prior(y, 'Mačke') \
* class_likelihood(X, y, 0, 'Messi', 'Mačke') \
* class_likelihood(X, y, 1, 'Batman', 'Mačke') \
* class_likelihood(X, y, 2, 'Tenisice', 'Mačke')
Out[49]:
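Normalizing the two unnormalized joints by their sum, the evidence $p(\mathbf{x})$, gives the posterior class probabilities. A small sketch, reusing the `nb_joint` helper suggested above:

x_new = ['Messi', 'Batman', 'Tenisice']
joint_psi = nb_joint(X, y, x_new, 'Psi')
joint_macke = nb_joint(X, y, x_new, 'Mačke')
# Posteriors P(Psi|x) and P(Mačke|x) sum to one
(joint_psi / (joint_psi + joint_macke), joint_macke / (joint_psi + joint_macke))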
All partitions of the set $\{a, b, c\}$:
\begin{align*}
&\{ \{a\}, \{b\}, \{c\} \}\\
&\{ \{a\}, \{b, c\} \}\\
&\{ \{b\}, \{a, c\} \}\\
&\{ \{c\}, \{a, b\} \}\\
&\{ \{a, b, c\} \}
\end{align*}
Bell number: $B_3 = 5,\ B_4 = 15,\ B_5 = 52,\ \dots,\ B_{10} = 115975,\ \dots$
Entropy: $$ H(P) = -\sum_x P(x) \ln P(x) $$
Cross-entropy: $$ H(P,Q) = -\sum_x P(x) \ln Q(x) $$
Relative entropy of $P(x)$ with respect to $Q(x)$: $$ \begin{align*} H(P,Q) - H(P) &= -\sum_x P(x)\ln Q(x) - \big(-\sum_x P(x)\ln P(x) \big) \\ &= -\sum_x P(x)\ln Q(x) + \sum_x P(x)\ln P(x) \\ &= \sum_x P(x)\ln \frac{P(x)}{Q(x)} = \color{red}{D_{\mathrm{KL}}(P||Q)} \end{align*} $$ $\Rightarrow$ Kullback-Leibler divergence
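In SciPy, `stats.entropy` computes $H(P)$ from a discrete distribution and, when a second distribution is passed, $D_{\mathrm{KL}}(P||Q)$ (in nats by default). A minimal sketch with two made-up distributions:

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
stats.entropy(p)     # H(P)
stats.entropy(p, q)  # D_KL(P||Q)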
Mutual information: $$ I(x,y) = D_\mathrm{KL}\big(P(x,y) || P(x) P(y)\big) = \sum_{x,y} P(x,y) \ln\frac{P(x,y)}{P(x)P(y)} $$
$I(x, y) = 0$ if and only if $x$ and $y$ are independent variables; otherwise $I(x,y) > 0$.
In [14]:
from sklearn.metrics import mutual_info_score
In [15]:
X = stats.bernoulli.rvs(0.5, size=100)
Y = stats.bernoulli.rvs(0.2, size=100)
In [16]:
mutual_info_score(X, Y)
Out[16]:
In [17]:
mutual_info_score(X, X)
Out[17]:
In [18]:
X = stats.bernoulli.rvs(0.5, size=100)
# Y depends on X: it can be nonzero only where x == 1
Y = [(sp.random.randint(2) if x==1 else 0) for x in X]
In [19]:
mutual_info_score(X, Y)
Out[19]:
In [20]:
likelihood_c1 = stats.norm(110, 5)
likelihood_c2 = stats.norm(150, 20)
likelihood_c3 = stats.norm(180, 10)
xs = linspace(70, 200, 200)
plt.plot(xs, likelihood_c1.pdf(xs), label='p(x|C_1)')
plt.plot(xs, likelihood_c2.pdf(xs), label='p(x|C_2)')
plt.plot(xs, likelihood_c3.pdf(xs), label='p(x|C_3)')
plt.legend()
plt.show()
Model: $$ h_j(x) = p(x,\mathcal{C}_j) = p(x|\mathcal{C}_j)P(\mathcal{C}_j) $$
For mathematical simplicity, we move to the logarithmic domain:
\begin{align*}
h_j(x) &= \ln p(x|\mathcal{C}_j) + \ln P(\mathcal{C}_j)\\
&= \color{gray}{-\frac{1}{2}\ln 2\pi} - \ln\sigma_j - \frac{(x-\mu_j)^2}{2\sigma_j^2} + \ln P(\mathcal{C}_j)
\end{align*}
Removing the constant (it does not affect the maximization): $$ h_j(x|\boldsymbol{\theta}_j) = -\ln\hat{\sigma}_j - \frac{(x-\hat{\mu}_j)^2}{2\hat{\sigma}_j^2} + \ln P(\mathcal{C}_j) $$
ML estimates of the parameters:
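For the univariate Gaussian model these are the usual per-class estimates:
$$
\hat{\mu}_j = \frac{1}{N_j}\sum_{i:\, y^{(i)}=\mathcal{C}_j} x^{(i)}, \qquad
\hat{\sigma}_j^2 = \frac{1}{N_j}\sum_{i:\, y^{(i)}=\mathcal{C}_j} \big(x^{(i)}-\hat{\mu}_j\big)^2, \qquad
\hat{P}(\mathcal{C}_j) = \frac{N_j}{N}
$$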
In [21]:
likelihood_c1 = stats.norm(100, 5)
likelihood_c2 = stats.norm(150, 20)
plt.plot(xs, likelihood_c1.pdf(xs), label='p(x|C_1)')
plt.plot(xs, likelihood_c2.pdf(xs), label='p(x|C_2)')
plt.legend()
plt.show()
In [22]:
p_c1 = 0.3
p_c2 = 0.7
def joint_x_c1(x) : return likelihood_c1.pdf(x) * p_c1
def joint_x_c2(x) : return likelihood_c2.pdf(x) * p_c2
plt.plot(xs, joint_x_c1(xs), label='p(x, C_1)')
plt.plot(xs, joint_x_c2(xs), label='p(x, C_2)')
plt.legend()
plt.show()
In [23]:
def p_x(x) : return joint_x_c1(x) + joint_x_c2(x)
plt.plot(xs, p_x(xs), label='p(x)')
plt.legend()
plt.show()
In [24]:
def posterior_c1(x) : return joint_x_c1(x) / p_x(x)
def posterior_c2(x) : return joint_x_c2(x) / p_x(x)
plt.plot(xs, posterior_c1(xs), label='p(C_1|x)')
plt.plot(xs, posterior_c2(xs), label='p(C_2|x)')
plt.legend()
plt.show()
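The MAP decision at any point picks the class with the larger posterior; a small sketch (the test point $x = 130$ is arbitrary, not from the notebook):

x0 = 130
'C_1' if posterior_c1(x0) > posterior_c2(x0) else 'C_2'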
In [25]:
mu_1 = [-2, 1]
mu_2 = [2, 0]
covm_1 = sp.array([[1, 1], [1, 3]])
covm_2 = sp.array([[2, -0.5], [-0.5, 1]])
p_c1 = 0.4
p_c2 = 0.6
likelihood_c1 = stats.multivariate_normal(mu_1, covm_1)
likelihood_c2 = stats.multivariate_normal(mu_2, covm_2)
In [26]:
x = np.linspace(-5, 5)
y = np.linspace(-5, 5)
X, Y = np.meshgrid(x, y)
XY = np.dstack((X,Y))
In [35]:
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1)
plt.contour(X, Y, likelihood_c2.pdf(XY) * p_c2);
In [28]:
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1, cmap='gray_r')
plt.contour(X, Y, likelihood_c2.pdf(XY) * p_c2, cmap='gray_r')
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1 - likelihood_c2.pdf(XY) * p_c2, levels=[0], colors='r', linewidths=2);
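With class-specific covariance matrices the log-domain discriminants are, up to an additive constant,
$$
h_j(\mathbf{x}) = -\tfrac{1}{2}\ln|\boldsymbol{\Sigma}_j|
-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_j)^{\mathrm{T}}\boldsymbol{\Sigma}_j^{-1}(\mathbf{x}-\boldsymbol{\mu}_j)
+\ln P(\mathcal{C}_j),
$$
so the decision boundary $h_1(\mathbf{x}) = h_2(\mathbf{x})$, drawn in red above, is quadratic in $\mathbf{x}$.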
In [29]:
mu_1 = [-2, 1]
mu_2 = [2, 0]
covm_1 = sp.array([[1, 1], [1, 3]])
covm_2 = sp.array([[2, -0.5], [-0.5, 1]])
p_c1 = 0.4
p_c2 = 0.6
covm_shared = p_c1 * covm_1 + p_c2 * covm_2  # pooled covariance, weighted by the class priors
likelihood_c1 = stats.multivariate_normal(mu_1, covm_shared)
likelihood_c2 = stats.multivariate_normal(mu_2, covm_shared)
In [30]:
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1, cmap='gray_r')
plt.contour(X, Y, likelihood_c2.pdf(XY) * p_c2, cmap='gray_r')
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1 - likelihood_c2.pdf(XY) * p_c2, levels=[0], colors='r', linewidths=2);
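With a shared covariance matrix the quadratic term $\mathbf{x}^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\mathbf{x}$ is the same for both classes and cancels in $h_1(\mathbf{x}) - h_2(\mathbf{x})$, so the boundary becomes linear:
$$
h_1(\mathbf{x}) - h_2(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}\mathbf{x} + w_0, \qquad
\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2), \qquad
w_0 = -\tfrac{1}{2}\boldsymbol{\mu}_1^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1
+\tfrac{1}{2}\boldsymbol{\mu}_2^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2
+\ln\frac{P(\mathcal{C}_1)}{P(\mathcal{C}_2)}
$$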
With a shared diagonal covariance matrix the discriminant reduces (up to a constant) to
$$
h_j(\mathbf{x}) = -\frac{1}{2}\sum_{i=1}^n\Big(\frac{x_i-\mu_{ij}}{\sigma_i}\Big)^2 + \ln P(\mathcal{C}_j)
$$
In [31]:
mu_1 = [-2, 1]
mu_2 = [2, 0]
p_c1 = 0.4
p_c2 = 0.6
covm_shared_diagonal = [[2,0],[0,1]]
likelihood_c1 = stats.multivariate_normal(mu_1, covm_shared_diagonal)
likelihood_c2 = stats.multivariate_normal(mu_2, covm_shared_diagonal)
In [32]:
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1, cmap='gray_r')
plt.contour(X, Y, likelihood_c2.pdf(XY) * p_c2, cmap='gray_r')
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1 - likelihood_c2.pdf(XY) * p_c2, levels=[0], colors='r', linewidths=2);
In [33]:
mu_1 = [-2, 1]
mu_2 = [2, 0]
p_c1 = 0.4
p_c2 = 0.6
covm_shared_diagonal = [[1,0],[0,1]]
likelihood_c1 = stats.multivariate_normal(mu_1, covm_shared_diagonal)
likelihood_c2 = stats.multivariate_normal(mu_2, covm_shared_diagonal)
In [34]:
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1, cmap='gray_r')
plt.contour(X, Y, likelihood_c2.pdf(XY) * p_c2, cmap='gray_r')
plt.contour(X, Y, likelihood_c1.pdf(XY) * p_c1 - likelihood_c2.pdf(XY) * p_c2, levels=[0], colors='r', linewidths=2);
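With $\boldsymbol{\Sigma} = \mathbf{I}$ the discriminant further reduces, up to a constant, to
$$
h_j(\mathbf{x}) = -\tfrac{1}{2}\,\lVert\mathbf{x}-\boldsymbol{\mu}_j\rVert^2 + \ln P(\mathcal{C}_j),
$$
i.e. classification by squared Euclidean distance to the class means, corrected by the log-priors; with equal priors this is a nearest-mean classifier.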
\begin{align*}
\mathcal{H} &= \big\{h(\mathbf{x}|\boldsymbol{\theta})\big\}_{\boldsymbol{\theta}}\\
h(\mathbf{x}|\boldsymbol{\theta})
&=\big(h_1(\mathbf{x}|\boldsymbol{\theta}_1),\dots,h_K(\mathbf{x}|\boldsymbol{\theta}_K)\big)\\
\boldsymbol{\theta} &= (\boldsymbol{\theta}_1,\dots,\boldsymbol{\theta}_K)\\
h_j(\mathbf{x}|\boldsymbol{\theta}_j) &= \ln p(\mathbf{x}|\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j) + \ln P(\mathcal{C}_j)\\
\boldsymbol{\theta}_j & = (\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j,P(\mathcal{C}_j))\\
\end{align*}
generative - discriminative?
parametric - nonparametric?
linear - nonlinear?
language bias - search bias?
(2) Loss function $L$
(3) Optimization procedure