Reference: UFLDL Softmax Regression
The training dataset is: $$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$$
In the binary classification setting, the labels are $$y^{(i)} \in \{0, 1\} $$ and the hypothesis is
$$h_\theta(x) = \frac{1}{1+\exp(-\theta^T x)} $$and the model parameter θ is trained to minimize the cost function $$J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^m y^{(i)} \log h_\theta (x^{(i)}) + (1-y^{(i)}) \log\left( 1- h_\theta(x^{(i)})\right) \right]$$
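As a minimal NumPy sketch of this hypothesis and cost (the names sigmoid, logistic_cost, X, y, and theta are illustrative, not from UFLDL):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """J(theta): X is (m, n+1), y is (m,) with entries in {0, 1}."""
    h = sigmoid(X @ theta)  # h_theta(x^{(i)}) for every example at once
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))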
Moving from Logistic Regression to Softmax Regression:
the training dataset is: $$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$$ and the target categories are $$ y^{(i)} \in \{ 1, 2, \dots, k\}$$
The model estimates the probability $$ P(y=j \mid x; \theta) $$ for each class $$j$$, and the hypothesis is
$$ h_{\theta} (x^{(i)}) = \begin{bmatrix} p(y^{(i)}=1|x^{(i)}; \theta) \\ p(y^{(i)}=2|x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)}=k|x^{(i)}; \theta) \\ \end{bmatrix} = \frac {1} { \sum_{j=1}^k e^{ \theta_j^T x^{(i)} } } \begin{bmatrix} e^{ \theta_1^T x^{(i)} } \\ e^{ \theta_2^T x^{(i)} } \\ \vdots \\ e^{ \theta_k^T x^{(i)} } \\ \end{bmatrix} $$and we can write θ as a k-by-(n + 1) matrix (one row per label, n features plus the intercept term), so $$ \theta = \begin{bmatrix} \mbox{---} \theta_1^T \mbox{---} \\ \mbox{---} \theta_2^T \mbox{---} \\ \vdots \\ \mbox{---} \theta_k^T \mbox{---} \\ \end{bmatrix} $$
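Concretely, with θ stored as that k-by-(n + 1) matrix, the hypothesis for one example is just a normalized vector of exponentials; a minimal sketch (the names theta and x are illustrative):
import numpy as np

def h(theta, x):
    """Class probabilities for one example x of length n+1; theta is (k, n+1)."""
    e = np.exp(theta @ x)   # e^{theta_j^T x} for j = 1..k
    return e / np.sum(e)    # normalize so the k probabilities sum to 1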
The cost function is $$ J(\theta) = - \frac{1}{m} \left[\sum_{i=1}^m \sum_{j=1}^k 1 \{y^{(i)}=j\} \log \frac {e^{ \theta_j^T x^{(i)} }} {\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }} \right] $$ note: $$ 1\{\cdot\} $$ is the indicator function, so that $$ 1\{\text{a true statement}\} = 1 $$ and $$ 1\{\text{a false statement}\} = 0 $$.
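A sketch of this cost in NumPy, assuming X is an (m, n+1) array, y an integer array of labels 1..k, and theta a (k, n+1) array (these shapes and names are my assumption, not from the tutorial):
import numpy as np

def softmax_cost(theta, X, y):
    logits = X @ theta.T                         # (m, k): theta_j^T x^{(i)}
    log_norm = np.log(np.sum(np.exp(logits), axis=1, keepdims=True))
    log_probs = logits - log_norm                # log of each class probability
    return -np.mean(log_probs[np.arange(X.shape[0]), y - 1])  # keep only the true-class terms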
We use gradient descent to solve this minimization problem. Taking the derivative of the cost function gives $$ \nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} \left( 1\{ y^{(i)} = j\} - p(y^{(i)} = j | x^{(i)}; \theta) \right) \right] } $$ and on each iteration we perform the update $$ \theta_j := \theta_j - \alpha \nabla_{\theta_j}J(\theta), \forall j = 1, \dots, k $$
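A sketch of this gradient together with a few plain batch gradient-descent updates; the toy data, alpha, and all names below are illustrative assumptions, not part of the tutorial:
import numpy as np

def softmax_grad(theta, X, y):
    """nabla_{theta_j} J(theta) for every j, stacked into a (k, n+1) matrix."""
    m = X.shape[0]
    logits = X @ theta.T
    probs = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)  # p(y^{(i)}=j | x^{(i)}; theta)
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), y - 1] = 1.0         # 1{y^{(i)} = j}
    return -(indicator - probs).T @ X / m

# toy data: m=4 examples, n+1=3 features (intercept included), k=2 classes
X = np.array([[1.0, 0.5, -1.2], [1.0, 1.5, 0.3], [1.0, -0.7, 2.1], [1.0, 0.0, 0.4]])
y = np.array([1, 2, 2, 1])
theta = np.zeros((2, 3))
alpha = 0.5
for _ in range(100):
    theta = theta - alpha * softmax_grad(theta, X, y)   # theta_j := theta_j - alpha * grad_j, for all j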
In [98]:
"""Softmax."""
import numpy as np
scores = np.array([3.0, 1.0, 0.2])
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

print(scores, softmax(scores))
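One caveat (my addition, not part of the original cell): np.exp overflows for large scores, so a common, numerically safer variant subtracts the maximum score first; the result is unchanged because softmax is invariant to shifting every score by the same constant.
def softmax_stable(x):
    """Same values as softmax(x) above, but safe for large scores."""
    e = np.exp(x - np.max(x, axis=0))   # shift scores so the largest is 0
    return e / np.sum(e, axis=0)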
In [92]:
# Plot softmax curves
import matplotlib.pyplot as plt
x = np.arange(-2.0, 6.0, 0.1)
scores = np.vstack([x, np.ones_like(x), 0.2 * np.ones_like(x)])
plt.plot(x, softmax(scores).T, linewidth=2)
plt.show()
In [94]:
scores = np.array([[1, 2, 3, 6],
                   [2, 4, 5, 6],
                   [3, 8, 7, 4]])
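Since the softmax above normalizes along axis=0, applying it to this 2-D array treats each column as one set of class scores, so every column of the result sums to 1; a quick check using the softmax function defined earlier:
print(softmax(scores))                  # one probability distribution per column
print(np.sum(softmax(scores), axis=0))  # each column sums to 1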
In [99]:
# Floating-point precision demo: adding 1e-6 one million times to 1e9 and then
# subtracting 1e9 should give exactly 1.0, but rounding error creeps in.
x = 1000000000
y = 0.000001
for i in range(1000000):
    x += y
x -= 1000000000
print(x)