University of Zagreb
Faculty of Electrical Engineering and Computing
http://www.fer.unizg.hr/predmet/su
Academic year 2015/2016
(c) 2015 Jan Šnajder
Version: 0.2 (2015-11-16)
In [2]:
import numpy as np
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
In [4]:
def sigm(x): return 1 / (1 + np.exp(-x))
xs = np.linspace(-10, 10)
plt.plot(xs, sigm(xs));
In [6]:
plt.plot(xs, sigm(0.5*xs), 'r');
plt.plot(xs, sigm(xs), 'g');
plt.plot(xs, sigm(2*xs), 'b');
In [3]:
xs = np.linspace(0, 1)
plt.plot(xs, -np.log(xs));
In [4]:
plt.plot(xs, -np.log(1 - xs));
$\Rightarrow$ the cross-entropy error
$\Rightarrow$ the cross-entropy loss
In [14]:
def cross_entropy_loss(h_x, y):
    return -y * np.log(h_x) - (1 - y) * np.log(1 - h_x)
In [24]:
xs = np.linspace(0, 1)
plt.plot(xs, cross_entropy_loss(xs, 0), label='y=0')
plt.plot(xs, cross_entropy_loss(xs, 1), label='y=1')
plt.ylabel(r'$L(h(\mathbf{x}),y)$')
plt.xlabel(r'$h(\mathbf{x}) = \sigma(\mathbf{w}^\intercal\mathbf{x})$')
plt.legend()
plt.show()
In [26]:
#TODO: a concrete example in the plane
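A hedged sketch of what that example might look like (the weights and the grid below are assumptions, not from the original notebook): the hypothesis $h(\mathbf{x}) = \sigma(\mathbf{w}^\intercal\tilde{\mathbf{x}})$ over the plane, with the decision boundary $\mathbf{w}^\intercal\tilde{\mathbf{x}} = 0$ drawn as a line.
In [ ]:
# A sketch of the TODO example above; the weights and the grid are assumptions
w0, w1, w2 = -1, 2, 1
x1, x2 = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
h = sigm(w0 + w1 * x1 + w2 * x2)

plt.contourf(x1, x2, h, levels=20, cmap='RdBu_r')
plt.colorbar(label=r'$h(\mathbf{x})$')
# the decision boundary h(x) = 0.5, i.e., w^T x = 0, is a line in the plane
plt.contour(x1, x2, h, levels=[0.5], colors='k')
plt.xlabel('$x_1$')
plt.ylabel('$x_2$');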
We minimize this by gradient descent: $$ \nabla E(\mathbf{w}) = \sum_{i=1}^N \nabla L\big(h(\mathbf{x}^{(i)}|\mathbf{w}),y^{(i)}\big) $$
Recall: $$ \frac{\partial\sigma(\alpha)}{\partial\alpha} = \sigma(\alpha)\big(1 - \sigma(\alpha)\big) $$
We obtain: $$ \nabla L\big(h(\mathbf{x}),y\big) = \Big(-\frac{y}{h(\mathbf{x})} + \frac{1-y}{1-h(\mathbf{x})}\Big)h(\mathbf{x})\big(1-h(\mathbf{x})\big) \tilde{\mathbf{x}} = \big(h(\mathbf{x})-y\big)\tilde{\mathbf{x}} $$
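As a quick sanity check of this derivation, one can compare $\big(h(\mathbf{x})-y\big)\tilde{\mathbf{x}}$ against a finite-difference approximation of the gradient; the weights and the example below are arbitrary assumptions:
In [ ]:
# Finite-difference check of the gradient derived above (example values assumed)
w = np.array([0.5, -1.0, 2.0])   # weights; w[0] is the bias w0
x = np.array([1.0, 0.3, -0.7])   # x with the dummy feature x0 = 1 prepended
y = 1

h = sigm(w @ x)
grad_analytic = (h - y) * x      # the closed form derived above

eps = 1e-6
grad_numeric = np.zeros_like(w)
for j in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    grad_numeric[j] = (cross_entropy_loss(sigm(w_plus @ x), y)
                       - cross_entropy_loss(sigm(w_minus @ x), y)) / (2 * eps)

print(grad_analytic)
print(grad_numeric)  # the two should agree to ~1e-9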
$\mathbf{w} \gets (0,0,\dots,0)$
repeat until convergence
$\quad \Delta\mathbf{w} \gets (0,0,\dots,0)$
$\quad$ for $i=1,\dots, N$
$\qquad h \gets \sigma(\mathbf{w}^\intercal\tilde{\mathbf{x}}^{(i)})$
$\qquad \Delta \mathbf{w} \gets \Delta\mathbf{w} + (h-y^{(i)})\, \tilde{\mathbf{x}}^{(i)}$
$\quad \mathbf{w} \gets \mathbf{w} - \eta \Delta\mathbf{w} $
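A direct, vectorized transcription of this pseudocode (a sketch; the learning rate and the stopping criterion, here a fixed number of iterations, are assumptions):
In [ ]:
# Batch gradient descent for logistic regression, as in the pseudocode above
def batch_gd(X, y, eta=0.1, n_iters=1000):
    """X is N x (n+1) with a leading column of ones; y is in {0,1}^N."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):      # "repeat until convergence" (fixed here)
        h = sigm(X @ w)           # h for all N examples at once
        dw = X.T @ (h - y)        # sum_i (h - y^(i)) x^(i)
        w -= eta * dw
    return w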
$\mathbf{w} \gets (0,0,\dots,0)$
repeat until convergence
$\quad$ (randomly permute the examples in $\mathcal{D}$)
$\quad$ for $i=1,\dots, N$
$\qquad$ $h \gets \sigma(\mathbf{w}^\intercal\tilde{\mathbf{x}}^{(i)})$
$\qquad$ $\mathbf{w} \gets \mathbf{w} - \eta (h-y^{(i)})\tilde{\mathbf{x}}^{(i)}$
In [ ]:
#TODO: code + example
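One way the TODO might be filled in (a sketch; the toy data and all hyperparameters below are assumptions): stochastic gradient descent exactly as in the pseudocode, run on two Gaussian classes in the plane.
In [ ]:
# Stochastic gradient descent, as in the pseudocode above
def sgd(X, y, eta=0.1, n_epochs=100, seed=42):
    """X is N x (n+1) with a leading column of ones; y is in {0,1}^N."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):                # "repeat until convergence"
        for i in rng.permutation(len(y)):    # randomly permute the examples
            h = sigm(w @ X[i])
            w -= eta * (h - y[i]) * X[i]     # update after every example
    return w

# Toy example (assumed data): two Gaussian classes in the plane
rng = np.random.default_rng(0)
X0 = rng.normal([-1, -1], 0.5, size=(50, 2))
X1 = rng.normal([1, 1], 0.5, size=(50, 2))
X = np.hstack([np.ones((100, 1)), np.vstack([X0, X1])])  # prepend x0 = 1
y = np.concatenate([np.zeros(50), np.ones(50)])
print(sgd(X, y))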
Weight update: $$ \mathbf{w} \gets \mathbf{w} - \eta\Big( \sum_{i=1}^N\big(h(\mathbf{x}^{(i)}) - y^{(i)}\big)\mathbf{x}^{(i)} + \color{red}{\lambda \mathbf{w}}\Big) $$
Equivalently: $$ \mathbf{w} \gets \mathbf{w}(1\color{red}{-\eta\lambda}) - \eta \sum_{i=1}^N\big(h(\mathbf{x}^{(i)}) - y^{(i)}\big)\mathbf{x}^{(i)} $$ where the factor $\mathbf{w}(1-\eta\lambda)$ causes weight decay
$\mathbf{w} \gets (0,0,\dots,0)$
repeat until convergence
$\quad \color{red}{\Delta w_0 \gets 0}$
$\quad \Delta\mathbf{w} \gets (0,0,\dots,0)$
$\quad$ for $i=1,\dots, N$
$\qquad h \gets \sigma(\mathbf{w}^\intercal\tilde{\mathbf{x}}^{(i)})$
$\qquad \color{red}{\Delta w_0 \gets \Delta w_0 + h-y^{(i)}}$
$\qquad \Delta\mathbf{w} \gets \Delta\mathbf{w} + (h-y^{(i)})\mathbf{x}^{(i)}$
$\quad \color{red}{w_0 \gets w_0 - \eta \Delta w_0}$
$\quad \mathbf{w} \gets \mathbf{w}(1\color{red}{-\eta\lambda}) - \eta \Delta\mathbf{w}$
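A vectorized sketch of this pseudocode (hyperparameters assumed); note that $w_0$ is updated separately so that the bias is not decayed:
In [ ]:
# Regularized batch gradient descent; the bias w0 is excluded from weight decay
def batch_gd_l2(X, y, eta=0.1, lam=0.01, n_iters=1000):
    """X is N x (n+1) with a leading column of ones; w[0] is the bias w0."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigm(X @ w)
        dw0 = np.sum(h - y)                          # Delta w0, no decay
        dw = X[:, 1:].T @ (h - y)                    # Delta w for the rest
        w[0] -= eta * dw0
        w[1:] = w[1:] * (1 - eta * lam) - eta * dw   # weight decay
    return w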
$\mathbf{w} \gets (0,0,\dots,0)$
repeat until convergence
$\quad$ (randomly permute the examples in $\mathcal{D}$)
$\quad$ for $i=1,\dots, N$
$\qquad h \gets \sigma(\mathbf{w}^\intercal\tilde{\mathbf{x}}^{(i)})$
$\qquad \color{red}{w_0 \gets w_0 - \eta (h-y^{(i)})}$
$\qquad \mathbf{w} \gets \mathbf{w}(1\color{red}{-\eta\lambda}) - \eta (h-y^{(i)})\mathbf{x}^{(i)}$
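And the stochastic variant, again as a sketch with assumed hyperparameters; here the decay factor $(1-\eta\lambda)$ is applied at every example, and again only to the non-bias weights:
In [ ]:
# Regularized SGD; only w[1:] is decayed, the bias w[0] is not
def sgd_l2(X, y, eta=0.1, lam=0.01, n_epochs=100, seed=42):
    """X is N x (n+1) with a leading column of ones; w[0] is the bias w0."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):
            h = sigm(w @ X[i])
            w[0] -= eta * (h - y[i])                  # bias update, no decay
            w[1:] = w[1:] * (1 - eta * lam) - eta * (h - y[i]) * X[i, 1:]
    return w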
TODO
TODO