Introduction to Machine Learning I

Ali Taylan Cemgil

Parametric Regression and Parametric Function Fitting

Given input-output pairs $x, y$, the task is to fit a parametric function $f$.

Choose the parameter values $w$ such that $$ y \approx f(x; w) $$

$x$: Input

$y$: Output

$w$: Parameter (weight)

$e$: Error

Example 1 (additive error): $$ e = y - f(x; w) $$

Example 2 (relative error): $$ e = \frac{y}{f(x; w)}-1 $$

$E$, $D$: Error function, divergence

Linear Regression

The case where the function $f$ to be fitted is linear in the model parameters $w$ (it need not be linear in the inputs $x$).

Definition: Linearity

A function $g$ is linear if, for any scalars $a$ and $b$, $$ g(aw_1 + b w_2) = a g(w_1) + b g(w_2) $$ holds.

Example 1: Line Fitting

  • Input-output pairs $$ (x_i, y_i) $$ $i=1\dots N$

  • Model $$ y_i \approx f(x_i; w_1, w_0) = w_0 + w_1 x_i $$

$x$: Input

$w_1$: Slope

$w_0$: Intercept

$f_i \equiv f(x_i; w_1, w_0)$

Example 2: Parabola Fitting

  • Input-output pairs $$ (x_i, y_i) $$ $i=1\dots N$

  • Model $$ y_i \approx f(x_i; w_2, w_1, w_0) = w_0 + w_1 x_i + w_2 x_i^2 $$

$x$: Input

$w_2$: Coefficient of the quadratic term

$w_1$: Coefficient of the linear term

$w_0$: Constant term

$f_i \equiv f(x_i; w_2, w_1, w_0)$

A parabola is not a linear function of $x$, but it is a linear function of the parameters $w_2, w_1, w_0$.
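The distinction between linearity in the parameters and linearity in the input can be checked numerically. The sketch below is only an illustration: the function name `f`, the test values, and the random parameter vectors are assumptions chosen for the example, not part of the original notebook.

```python
import numpy as np

def f(x, w):
    """Parabola model f(x; w) = w[0] + w[1]*x + w[2]*x**2."""
    return w[0] + w[1] * x + w[2] * x**2

rng = np.random.default_rng(0)
a, b = 2.0, -0.5

# Linearity in the parameters: f(x; a*u + b*v) == a*f(x; u) + b*f(x; v)
x = 1.7                                         # arbitrary fixed input
u, v = rng.normal(size=3), rng.normal(size=3)   # two arbitrary parameter vectors
linear_in_w = np.isclose(f(x, a * u + b * v), a * f(x, u) + b * f(x, v))

# The same test with respect to the input fails for a parabola
w = np.array([1.0, 2.0, 3.0])
x1, x2 = 1.0, 2.0
linear_in_x = np.isclose(f(a * x1 + b * x2, w), a * f(x1, w) + b * f(x2, w))

print(linear_in_w, linear_in_x)   # True False
```

For the second test, the left side is $f(1; w) = 6$ while the right side is $2 \cdot 6 - 0.5 \cdot 17 = 3.5$, so the linearity condition does not hold in $x$.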


In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
from IPython.display import clear_output, display, HTML

x = np.array([8.0 , 6.1 , 11.,  7.,   9.,   12. , 4.,   2.,   10,    5,    3])
y = np.array([6.04, 4.95, 5.58, 6.81, 6.33, 7.96, 5.24, 2.26, 8.84, 2.82, 3.68])

def plot_fit(w1, w0):
    # Line model evaluated at the data points
    f = w0 + w1*x

    plt.figure(figsize=(4,3))
    plt.plot(x,y,'sk')      # observations
    plt.plot(x,f,'o-r')     # model predictions
    plt.xlim((0,15))
    plt.ylim((0,10))
    # Vertical segments: residual of each data point
    for i in range(len(x)):
        plt.plot((x[i],x[i]),(f[i],y[i]),'b')
    # Bars: half of the squared error contributed by each data point
    plt.bar(x,(f-y)**2/2)
    plt.title('Total squared error = '+str(np.sum((f-y)**2/2)))
    plt.ylim((0,10))
    plt.xlim((0,15))
    plt.show()
    
plot_fit(0.0,3.79)



In [63]:
interact(plot_fit, w1=(-2, 2, 0.01), w0=(-5, 5, 0.01));


Random Search


In [62]:
x = np.array([8.0 , 6.1 , 11.,  7.,   9.,   12. , 4.,   2.,   10,    5,    3])
y = np.array([6.04, 4.95, 5.58, 6.81, 6.33, 7.96, 5.24, 2.26, 8.84, 2.82, 3.68])


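def hata(y, x, w):
    # Total squared error of the line f = w[0] + w[1]*x
    f = x*w[1]+w[0]
    e = y-f
    return np.sum(e*e)/2


w = np.array([0.0, 0.0])
E = hata(y, x, w)

# Random search: perturb w with Gaussian noise and keep the step only if the error decreases
for epoch in range(1000):
    g = 0.1*np.random.randn(2)
    w_temp = w + g
    E_temp = hata(y, x, w_temp)
    if E_temp<E:
        E = E_temp
        w = w_temp
        #print(epoch, E)
print(epoch, E)
w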


999 6.88573142353
Out[62]:
array([ 2.01760086,  0.49685693])

Real data: Vehicle counts in Turkey


In [6]:
%matplotlib inline

import scipy as sc
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pylab as plt

df_arac = pd.read_csv(u'data/arac.csv',sep=';')
df_arac[['Year','Car']]
#df_arac


Out[6]:
Year Car
0 1966 91469
1 1967 112367
2 1968 125375
3 1969 137345
4 1970 137771
5 1971 153676
6 1972 187272
7 1973 240360
8 1974 313160
9 1975 403546
10 1976 488894
11 1977 560424
12 1978 624438
13 1979 688687
14 1980 742252
15 1981 776432
16 1982 811465
17 1983 856350
18 1984 919577
19 1985 983444
20 1986 1087234
21 1987 1193021
22 1988 1310257
23 1989 1434830
24 1990 1649879
25 1991 1864344
26 1992 2181388
27 1993 2619852
28 1994 2861640
29 1995 3058511
30 1996 3274156
31 1997 3570105
32 1998 3838288
33 1999 4072326
34 2000 4422180
35 2001 4534803
36 2002 4600140
37 2003 4700343
38 2004 5400440
39 2005 5772745
40 2006 6140992
41 2007 6472156
42 2008 6796629
43 2009 7093964
44 2010 7544871
45 2011 8113111
46 2012 8648875
47 2013 9283923
48 2014 9857915
49 2015 10589337
50 2016 11317998
51 2017 12035978

In [7]:
BaseYear = 1995
x = np.matrix(df_arac.Year[0:]).T-BaseYear
y = np.matrix(df_arac.Car[0:]).T/1000000.

plt.plot(x+BaseYear, y, 'o-')
plt.xlabel('Year')
plt.ylabel('Cars (millions)')

plt.show()



In [8]:
%matplotlib inline
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
import matplotlib.pyplot as plt
from IPython.display import clear_output, display, HTML


w_0 = 0.27150786
w_1 = 0.37332256

BaseYear = 1995
x = np.matrix(df_arac.Year[0:]).T-BaseYear
y = np.matrix(df_arac.Car[0:]).T/1000000.

fig, ax = plt.subplots()

f = w_1*x + w_0
plt.plot(x+BaseYear, y, 'o-')
ln, = plt.plot(x+BaseYear, f, 'r')


plt.xlabel('Years')
plt.ylabel('Number of Cars (Millions)')
ax.set_ylim((-2,13))
plt.close(fig)

def set_line(w_1, w_0):

    f = w_1*x + w_0
    e = y - f

    ln.set_ydata(f)
    ax.set_title('Total Error = {} '.format((e.T*e/2).item()))
    display(fig)

set_line(0.32,3)



In [9]:
interact(set_line, w_1=(-2, 2, 0.01), w_0=(-5, 5, 0.01));



In [17]:
w_0 = 0.27150786
w_1 = 0.37332256
w_2 = 0.1

BaseYear = 1995
x = np.array(df_arac.Year[0:]).T-BaseYear
y = np.array(df_arac.Car[0:]).T/1000000.

fig, ax = plt.subplots()

f = w_2*x**2 + w_1*x + w_0
plt.plot(x+BaseYear, y, 'o-')
ln, = plt.plot(x+BaseYear, f, 'r')


plt.xlabel('Year')
plt.ylabel('Number of cars (millions)')
ax.set_ylim((-2,13))
plt.close(fig)

def set_line(w_2, w_1, w_0):
    f = w_2*x**2 + w_1*x + w_0
    e = y - f
    ln.set_ydata(f)
    ax.set_title('Mean Squared Error = {} '.format(np.sum(e*e/len(e))))
    display(fig)

set_line(w_2, w_1, w_0)



In [18]:
interact(set_line, w_2=(-0.1,0.1,0.001), w_1=(-2, 2, 0.01), w_0=(-5, 5, 0.01))


Out[18]:
<function __main__.set_line>

Example 1, continued: Learning the Model

  • Learning: estimating the parameters $w = [w_0, w_1]$

  • In general the model cannot explain the data without error, so we define an error for each data point:

$$e_i = y_i - f(x_i; w)$$
  • Total squared error
$$ E(w) = \frac{1}{2} \sum_i (y_i - f(x_i; w))^2 = \frac{1}{2} \sum_i e_i^2 $$
  • We can try to reduce the total squared error by changing the parameters $w_0$ and $w_1$.

  • Error surface


In [12]:
from itertools import product

BaseYear = 1995
x = np.matrix(df_arac.Year[0:]).T-BaseYear
y = np.matrix(df_arac.Car[0:]).T/1000000.

# Setup the vandermonde matrix
N = len(x)
A = np.hstack((np.ones((N,1)), x))

left = -5
right = 15
bottom = -4
top = 6
step = 0.05
W0 = np.arange(left,right, step)
W1 = np.arange(bottom,top, step)

ErrSurf = np.zeros((len(W1),len(W0)))

for i,j in product(range(len(W1)), range(len(W0))):
    e = y - A*np.matrix([W0[j], W1[i]]).T
    ErrSurf[i,j] = (e.T*e/2).item()

plt.figure(figsize=(7,7))
plt.imshow(ErrSurf, interpolation='nearest', 
           vmin=0, vmax=1000,origin='lower',
           extent=(left,right,bottom,top),cmap='Blues_r')
plt.xlabel('w0')
plt.ylabel('w1')
plt.title('Error Surface')
plt.colorbar(orientation='horizontal')
plt.show()


How Can We Estimate the Model?

Idea: Least squares

(Gauss 1795, Legendre 1805)

  • Compute the derivative of the total error with respect to $w_0$ and $w_1$, set it to zero, and solve the resulting equations
\begin{eqnarray} \left( \begin{array}{c} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{array} \right) \approx \left( \begin{array}{cc} 1 & x_0 \\ 1 & x_1 \\ \vdots \\ 1 & x_{N-1} \end{array} \right) \left( \begin{array}{c} w_0 \\ w_1 \end{array} \right) \end{eqnarray}\begin{eqnarray} y \approx A w \end{eqnarray}

$A = A(x)$: Model matrix (design matrix)

$w$: Model parameters

$y$: Observations

  • Error vector: $$e = y - Aw$$
\begin{eqnarray} E(w) & = & \frac{1}{2}e^\top e = \frac{1}{2}(y - Aw)^\top (y - Aw)\\ & = & \frac{1}{2}y^\top y - \frac{1}{2} y^\top Aw - \frac{1}{2} w^\top A^\top y + \frac{1}{2} w^\top A^\top Aw \\ & = & \frac{1}{2} y^\top y - y^\top Aw + \frac{1}{2} w^\top A^\top Aw \\ \end{eqnarray}

Gradient

https://tr.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/partial-derivative-and-gradient-articles/a/the-gradient

\begin{eqnarray} \frac{d E}{d w } & = & \left(\begin{array}{c} \partial E/\partial w_0 \\ \partial E/\partial w_1 \\ \vdots \\ \partial E/\partial w_{K-1} \end{array}\right) \end{eqnarray}

Gradient of the total error \begin{eqnarray} \frac{d}{d w }E(w) & = & \frac{d}{d w }(\frac{1}{2} y^\top y) + \frac{d}{d w }(- y^\top Aw) + \frac{d}{d w }(\frac{1}{2} w^\top A^\top Aw) \\ & = & 0 - A^\top y + A^\top A w \\ & = & - A^\top (y - Aw) \\ & = & - A^\top e \\ & \equiv & \nabla E(w) \end{eqnarray}
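The closed-form gradient $\nabla E(w) = -A^\top e$ can be sanity-checked against central finite differences. This is a minimal sketch on synthetic data; the helper names `total_error`, `gradient` and `numerical_gradient`, as well as the data itself, are hypothetical and introduced only for this check.

```python
import numpy as np

def total_error(w, A, y):
    """E(w) = 0.5 * e^T e with e = y - A w."""
    e = y - A @ w
    return 0.5 * e @ e

def gradient(w, A, y):
    """Closed-form gradient: -A^T (y - A w)."""
    return -A.T @ (y - A @ w)

def numerical_gradient(fun, w, h=1e-6):
    """Central finite-difference approximation of the gradient of fun at w."""
    g = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = h
        g[k] = (fun(w + step) - fun(w - step)) / (2 * h)
    return g

# Synthetic line-fitting problem (assumed data, not from the notebook)
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=8)
A = np.column_stack((np.ones_like(x), x))      # columns: 1, x
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=8)
w = np.array([0.5, -1.0])                      # arbitrary evaluation point

g_exact = gradient(w, A, y)
g_num = numerical_gradient(lambda v: total_error(v, A, y), w)
print(np.allclose(g_exact, g_num, atol=1e-5))
```

Because $E(w)$ is quadratic in $w$, the central difference agrees with the exact gradient up to floating-point rounding.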

Identities everyone devoted to artificial intelligence should know

  • Gradient of a vector inner product \begin{eqnarray} \frac{d}{d w }(h^\top w) & = & h \end{eqnarray}

  • Gradient of a quadratic form \begin{eqnarray} \frac{d}{d w }(w^\top K w) & = & (K+K^\top) w \end{eqnarray}
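Both identities can be verified numerically. The sketch below uses a deliberately non-symmetric $K$ so that the $(K + K^\top)$ term matters; the helper `numerical_gradient` and the random test values are assumptions made for illustration.

```python
import numpy as np

def numerical_gradient(fun, w, h=1e-6):
    """Central finite-difference approximation of the gradient of fun at w."""
    g = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = h
        g[k] = (fun(w + step) - fun(w - step)) / (2 * h)
    return g

rng = np.random.default_rng(2)
h_vec = rng.normal(size=3)        # the vector h in h^T w
K = rng.normal(size=(3, 3))       # generic square matrix, not symmetric
w = rng.normal(size=3)

grad_inner = numerical_gradient(lambda v: h_vec @ v, w)
grad_quad = numerical_gradient(lambda v: v @ K @ v, w)

ok_inner = np.allclose(grad_inner, h_vec, atol=1e-5)            # d/dw (h^T w) = h
ok_quad = np.allclose(grad_quad, (K + K.T) @ w, atol=1e-5)      # d/dw (w^T K w) = (K + K^T) w
print(ok_inner, ok_quad)
```

When $K$ is symmetric, as with $K = A^\top A$, the second identity reduces to $2Kw$, which is why the factor $\frac{1}{2}$ in $E(w)$ cancels in the gradient.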

For linear models, the least-squares solution can be found by solving a system of linear equations

\begin{eqnarray} w^* & = & \arg\min_{w} E(w) \end{eqnarray}
  • Optimality condition (the gradient must be zero)
\begin{eqnarray} \nabla E(w^*) & = & 0 \end{eqnarray}\begin{eqnarray} 0 & = & - A^\top y + A^\top A w^* \\ A^\top y & = & A^\top A w^* \\ w^* & = & (A^\top A)^{-1} A^\top y \end{eqnarray}
  • Geometric (projection) interpretation:
\begin{eqnarray} f & = & A w^* = A (A^\top A)^{-1} A^\top y \end{eqnarray}
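The normal equations and the projection interpretation can be sketched directly in NumPy. The example reuses the small data set from the first line-fitting cell; the variable names `w_star`, `w_lstsq` and `P` are introduced here only for illustration. Solving $A^\top A w = A^\top y$ with `np.linalg.solve` avoids forming the inverse explicitly.

```python
import numpy as np

x = np.array([8.0, 6.1, 11., 7., 9., 12., 4., 2., 10., 5., 3.])
y = np.array([6.04, 4.95, 5.58, 6.81, 6.33, 7.96, 5.24, 2.26, 8.84, 2.82, 3.68])
A = np.column_stack((np.ones_like(x), x))      # model matrix with columns 1, x

# Normal equations: A^T A w* = A^T y
w_star = np.linalg.solve(A.T @ A, A.T @ y)

# Reference solution from the library least-squares solver
w_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]

# Projection matrix P = A (A^T A)^{-1} A^T maps y onto the column space of A
P = A @ np.linalg.solve(A.T @ A, A.T)
f = P @ y                # fitted values, equal to A @ w_star
e = y - f                # residual

print(np.allclose(w_star, w_lstsq))            # both solvers agree
print(np.allclose(P @ P, P))                   # projecting twice changes nothing
print(np.allclose(A.T @ e, 0, atol=1e-9))      # residual is orthogonal to the columns of A
```

The last check is the optimality condition itself: $A^\top e = 0$ is exactly $\nabla E(w^*) = 0$.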

In [82]:
# Solving the Normal Equations

# Setup the Design matrix
N = len(x)
A = np.hstack((np.ones((N,1)), x))

#plt.imshow(A, interpolation='nearest')
# Solve the least squares problem
w_ls,E,rank,sigma = np.linalg.lstsq(A, y, rcond=None)

print('Parameters: \nw0 = ', w_ls[0],'\nw1 = ', w_ls[1] )
print('Total Squared Error:', E/2)

f = w_ls[1].item()*x + w_ls[0].item()
plt.plot(x+BaseYear, y, 'o-')
plt.plot(x+BaseYear, f, 'r')


plt.xlabel('Year')
plt.ylabel('Number of cars (millions)')
plt.show()


Parameters: 
w0 =  [[ 4.13258253]] 
w1 =  [[ 0.20987778]]
Total Squared Error: [[ 37.19722385]]

Polynomials

Parabola

\begin{eqnarray} \left( \begin{array}{c} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{array} \right) \approx \left( \begin{array}{ccc} 1 & x_0 & x_0^2 \\ 1 & x_1 & x_1^2 \\ \vdots \\ 1 & x_{N-1} & x_{N-1}^2 \end{array} \right) \left( \begin{array}{c} w_0 \\ w_1 \\ w_2 \end{array} \right) \end{eqnarray}

Polynomial of degree $K$

\begin{eqnarray} \left( \begin{array}{c} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{array} \right) \approx \left( \begin{array}{ccccc} 1 & x_0 & x_0^2 & \dots & x_0^K \\ 1 & x_1 & x_1^2 & \dots & x_1^K\\ \vdots \\ 1 & x_{N-1} & x_{N-1}^2 & \dots & x_{N-1}^K \end{array} \right) \left( \begin{array}{c} w_0 \\ w_1 \\ w_2 \\ \vdots \\ w_K \end{array} \right) \end{eqnarray}\begin{eqnarray} y \approx A w \end{eqnarray}

$A = A(x)$: Model matrix (design matrix)

$w$: Model parameters

$y$: Observations

The specially structured matrices that arise in polynomial fitting are also called Vandermonde matrices.
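NumPy can build this matrix directly with `np.vander`. By default the powers decrease from left to right, so `increasing=True` is needed to match the column order $1, x, x^2, \dots, x^K$ used above. The inputs below are arbitrary illustrative values.

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0])   # illustrative inputs
K = 2                           # polynomial degree

# Column-by-column construction: x**0, x**1, ..., x**K
A_manual = np.hstack([x.reshape(-1, 1) ** k for k in range(K + 1)])

# Same matrix from the library routine
A_vander = np.vander(x, N=K + 1, increasing=True)

print(A_vander)
print(np.array_equal(A_manual, A_vander))
```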


In [13]:
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
N = len(x)
x = x.reshape((N,1))
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]).reshape((N,1))
#y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]).reshape((N,1))
#y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]).reshape((N,1))

def fit_and_plot_poly(degree):

    # Model (Vandermonde) matrix: columns are x**0, x**1, ..., x**degree
    A = np.hstack([np.power(x,i) for i in range(degree+1)])
    # Dense grid of inputs for drawing the fitted polynomial
    xx = np.linspace(x.min()-1, x.max()+1, 300).reshape((-1,1))
    A2 = np.hstack([np.power(xx,i) for i in range(degree+1)])

    # Solve the least squares problem
    w_ls,E,rank,sigma = np.linalg.lstsq(A, y, rcond=None)
    f = A2 @ w_ls
    plt.plot(x, y, 'o')
    plt.plot(xx, f, 'r')

    plt.xlabel('x')
    plt.ylabel('y')

    plt.gca().set_ylim((0,20))

    # lstsq returns an empty residual array when the system is not
    # overdetermined or the model matrix is rank deficient
    if E.size > 0:
        plt.title('Degree = '+str(degree)+' Error='+str(E[0]))
    else:
        plt.title('Degree = '+str(degree)+' Error= 0')

    plt.show()

fit_and_plot_poly(0)



In [15]:
interact(fit_and_plot_poly, degree=(0,10))


Out[15]:
<function __main__.fit_and_plot_poly>

Overfitting
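As a rough illustration, the training error of the polynomial fits above can be tabulated against the degree. The sketch below reuses the 11 data points from the polynomial cell. Rescaling the inputs to $[-1, 1]$ is an assumption added here only to keep the high-degree fits numerically well conditioned; it does not change the fitted polynomial. The helper `training_error` is hypothetical.

```python
import numpy as np

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Assumption: map inputs to [-1, 1] for numerical stability of high-degree fits
z = 2 * (x - x.min()) / (x.max() - x.min()) - 1

def training_error(degree):
    """Sum of squared residuals of the least-squares polynomial of the given degree."""
    A = np.vander(z, N=degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ w
    return e @ e

# Degrees 0 .. N-1, where N = 11 is the number of data points
errors = [training_error(d) for d in range(len(x))]
for d, err in enumerate(errors):
    print(d, err)
```

The training error never increases with the degree, and at degree $N-1 = 10$ the polynomial passes through all 11 points, so the error drops to numerical zero. A small training error therefore says nothing by itself about how well the model describes new data.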