Softmax classification: Multinomial classification

  • Prediction when there are multiple classes

Logistic regression review

  • $H_L(x)=WX$ : linear hypothesis
  • $z=H_L(x),\; g(z)$ : let z be the linear output and find a function g(z) that squashes it to 0 or 1
  • $g(z)=\frac{1}{1+e^{-z}}$ : use the sigmoid as g(z)
  • $H_R(x) = g(H_L(x))$ : final hypothesis
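A minimal sketch of the review above, with made-up W and x values: compute the linear hypothesis $H_L(x)=Wx$, then squash it with the sigmoid g(z).

import numpy as np

def g(z):
    # sigmoid: squashes any real score into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[0.5, -1.0, 2.0]])   # hypothetical weights
x = np.array([1.0, 2.0, 0.5])      # hypothetical input
z = W @ x                          # H_L(x) = Wx
print(g(z))                        # H_R(x) = g(H_L(x)) -> approx. [0.378]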

Multinomial classification

  • It can be handled with several binary classifications (one classifier per class)
  • Rather than keeping each classifier separately, stacking the weight vectors into one matrix lets the whole computation be done at once, as in the matrix below and the sketch that follows

$ \begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \times \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3} \end{bmatrix} = \begin{bmatrix} w_{A1}x_{1} + w_{A2}x_{2} + w_{A3}x_{3} \\ w_{B1}x_{1} + w_{B2}x_{2} + w_{B3}x_{3} \\ w_{C1}x_{1} + w_{C2}x_{2} + w_{C3}x_{3} \end{bmatrix} = \begin{bmatrix} \bar{y}_{A}\\ \bar{y}_{B}\\ \bar{y}_{C} \end{bmatrix} $
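A small numpy sketch of the matrix form above (all numbers made up): with the three classifiers' weight rows stacked into one 3x3 matrix, a single matrix multiplication produces the scores for classes A, B and C at once.

import numpy as np

W = np.array([[1.0, 0.5, -1.0],    # weights for class A
              [0.2, 1.5,  0.3],    # weights for class B
              [-0.5, 0.1, 2.0]])   # weights for class C
x = np.array([1.0, 2.0, 3.0])      # one sample [x1, x2, x3]

y_bar = W @ x                      # [y_A, y_B, y_C] in one operation
print(y_bar)                       # [-1.   4.1  5.7]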

Sigmoid

  • The sigmoid compresses each score into a value between 0 and 1
  • One-hot encoding: pick the largest value and turn it into 1.0, as sketched below

$ WX = Y = \begin{bmatrix} 2.0 \\ 1.0 \\ 0.1 \end{bmatrix} \;\rightarrow\; p = \begin{bmatrix} 0.7 \\ 0.2 \\ 0.1 \end{bmatrix} \;\rightarrow\; \text{one-hot} = \begin{bmatrix} 1.0 \\ 0.0 \\ 0.0 \end{bmatrix} $
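A tiny sketch of the one-hot step, reusing the example probabilities above ([0.7, 0.2, 0.1]): the largest entry becomes 1.0 and everything else 0.0.

import numpy as np

p = np.array([0.7, 0.2, 0.1])
one_hot = np.zeros_like(p)
one_hot[np.argmax(p)] = 1.0        # pick the largest probability
print(one_hot)                     # [1. 0. 0.]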

Softmax

  • Makes all the outputs sum to 1
  • Scores -> probabilities

$S(y_i) = \frac{e^{y_i}}{\sum_{j}e^{y_j}}$
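A minimal numpy version of the softmax formula above, applied to the example scores [2.0, 1.0, 0.1]; subtracting the max is only for numerical stability and does not change the result.

import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))      # e^{y_i}, shifted for numerical stability
    return e / e.sum()             # divide by sum_j e^{y_j}

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())                  # approx. [0.66 0.24 0.10], sums to 1.0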

Cost function

  • Cross-entropy
  • Uses element-wise multiplication

$D(S,L) = -\sum_{i}L_{i}\log(S_{i}) = -\sum_{i}L_{i}\log(\bar{y}_{i}) = \sum_{i}L_{i}\cdot\left(-\log(\bar{y}_{i})\right)$
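A small sketch of the cross-entropy cost D(S, L) with made-up numbers: the one-hot label L is multiplied element-wise with -log of the predicted probabilities, then summed.

import numpy as np

L = np.array([1.0, 0.0, 0.0])      # one-hot label: the true class is A
y_bar = np.array([0.7, 0.2, 0.1])  # predicted probabilities from the softmax
D = np.sum(L * -np.log(y_bar))     # element-wise multiply, then sum
print(D)                           # -log(0.7) = approx. 0.357 (small cost, good prediction)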

Homework: the logistic cost function and the cross-entropy cost function are ultimately the same; think about why.

$C(H(x), y) = -y\log(H(x)) - (1 - y)\log(1 - H(x))$
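Not the answer to the homework, just a quick numerical sanity check with made-up numbers: for two classes, the cross-entropy with a one-hot label evaluates to the same number as the logistic cost above.

import numpy as np

H = 0.8                                    # predicted P(y = 1)
y = 1.0                                    # true label
logistic = -y * np.log(H) - (1 - y) * np.log(1 - H)

L = np.array([0.0, 1.0])                   # the same label, one-hot encoded
S = np.array([1 - H, H])                   # the same prediction as a distribution
cross_entropy = -np.sum(L * np.log(S))

print(logistic, cross_entropy)             # both approx. 0.223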

Gradient descent

  • Find the weight vector w that minimizes the cost function C

$-\alpha \nabla C(w_1, w_2)$
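A minimal sketch of one gradient-descent update, w <- w - alpha * gradient, on a made-up quadratic cost C(w1, w2) = w1^2 + w2^2; the TensorFlow code below does the same thing for the softmax cost.

import numpy as np

w = np.array([1.0, -2.0])
alpha = 0.1
grad = 2 * w                 # gradient of w1^2 + w2^2
w = w - alpha * grad         # move by -alpha * grad C(w1, w2)
print(w)                     # [ 0.8 -1.6]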


In [8]:
import tensorflow as tf
import numpy as np
import os

print(os.getcwd())
xy = np.loadtxt('softmax.in', unpack=True, dtype='float')
x_data = np.transpose(xy[0:3])
y_data = np.transpose(xy[3:])

# input
X = tf.placeholder("float", [None, 3])
Y = tf.placeholder("float", [None, 3])
# set model weights
W = tf.Variable(tf.zeros([3, 3]))

# Construct model
# Because the hypothesis multiplies X by W (XW rather than WX), x_data is transposed when it is loaded above
hypothesis = tf.nn.softmax(tf.matmul(X, W))

# learning rate
learning_rate = 0.001

# Cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), reduction_indices=1))

# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Init
init = tf.global_variables_initializer()

# Launch
with tf.Session() as sess:
    sess.run(init)

    for step in range(2001):
        sess.run(optimizer, feed_dict={X: x_data, Y: y_data})
        if step % 200 == 0:
            print(step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run(W))

    # Test
    a = sess.run(hypothesis, feed_dict={X:[[1, 11, 7]]})
    print(a, sess.run(tf.argmax(a, 1)))

    b = sess.run(hypothesis, feed_dict={X:[[1, 3, 4]]})
    print(b, sess.run(tf.argmax(b, 1)))

    c = sess.run(hypothesis, feed_dict={X:[[1, 1, 0]]})
    print(c, sess.run(tf.argmax(c, 1)))

    all_probs = sess.run(hypothesis, feed_dict={X: [[1, 11, 7], [1, 3, 4], [1, 1, 0]]})
    print(all_probs, sess.run(tf.argmax(all_probs, 1)))


/Users/bruc2kim/PycharmProjects/b2yond
0 1.09774 [[ -8.33333252e-05   4.16666626e-05   4.16666480e-05]
 [  1.66666694e-04   2.91666773e-04  -4.58333408e-04]
 [  1.66666636e-04   4.16666706e-04  -5.83333429e-04]]
200 1.05962 [[-0.02051384 -0.00103983  0.02155367]
 [ 0.01406438  0.01097753 -0.02504191]
 [ 0.01431208  0.03574873 -0.05006079]]
400 1.04985 [[-0.04282598 -0.00625899  0.04908497]
 [ 0.01747187  0.00156368 -0.01903554]
 [ 0.01831204  0.04954104 -0.06785304]]
600 1.0407 [[-0.06517859 -0.01176361  0.07694222]
 [ 0.01943521 -0.00848972 -0.01094548]
 [ 0.0211558   0.06118308 -0.0823388 ]]
800 1.03194 [[-0.08734013 -0.01729389  0.10463405]
 [ 0.0211172  -0.01796171 -0.00315548]
 [ 0.02396628  0.07198346 -0.09594974]]
1000 1.02354 [[-0.10928574 -0.02282181  0.13210757]
 [ 0.02266254 -0.02678034  0.00411783]
 [ 0.02685853  0.08210357 -0.10896212]]
1200 1.01547 [[-0.13101467 -0.02834092  0.15935561]
 [ 0.02409703 -0.03497621  0.01087923]
 [ 0.029832    0.091598   -0.12143001]]
1400 1.0077 [[-0.15252906 -0.03384715  0.18637618]
 [ 0.02543302 -0.04258838  0.01715542]
 [ 0.03287466  0.10050805 -0.13338271]]
1600 1.00021 [[-0.17383142 -0.03933692  0.21316831]
 [ 0.02668081 -0.04965464  0.0229739 ]
 [ 0.03597427  0.10887136 -0.14484565]]
1800 0.992979 [[-0.19492456 -0.04480688  0.23973146]
 [ 0.02784993 -0.05621058  0.02836074]
 [ 0.03911954  0.11672314 -0.15584265]]
2000 0.985988 [[-0.21581143 -0.05025396  0.26606542]
 [ 0.02894915 -0.06228962  0.03334056]
 [ 0.04230019  0.12409624 -0.16639642]]
[[ 0.46272627  0.35483006  0.18244371]] [0]
[[ 0.33820099  0.42101386  0.24078514]] [1]
[[ 0.27002314  0.29085544  0.4391214 ]] [2]
[[ 0.46272627  0.35483006  0.18244371]
 [ 0.33820099  0.42101386  0.24078514]
 [ 0.27002314  0.29085544  0.4391214 ]] [0 1 2]