Given a training set of size $N$, consisting of samples from $K$ classes with $n_k$ training samples in class $k$, LDA uses Bayes' rule to model
$$P(Y = k \mid x) = \frac{\pi_k f_k(x)}{\sum\limits_{i=1}^{K}{\pi_i f_i(x)}}$$where
$$\pi_k = \frac{n_k}{N}$$is the prior probability of class $k$, and
$$f_k(x) = \frac{1}{(2\pi)^{p/2} \lvert \Sigma_k \rvert^{\frac{1}{2}}} e^{-\frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) } $$is the density of the $p$-dimensional Gaussian (aka Normal) distribution evaluated at $x$.
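As a quick sanity check, this density can be evaluated directly with NumPy. The sketch below is illustrative only; gaussian_density, mu, and Sigma are hypothetical names, and the result should agree with scipy.stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x).

import numpy as np

def gaussian_density(x, mu, Sigma):
    # Multivariate Gaussian density f(x) for a p-dimensional x.
    p = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (p / 2.0) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * diff.dot(np.linalg.inv(Sigma)).dot(diff)) / norm_const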
Given a sample $x$, we predict its class to be
$$ C = \underset{k}{\arg\max}\; P(Y = k \mid x) $$The key assumption in LDA is that all $K$ classes share the same covariance matrix, i.e. $\Sigma_k = \Sigma$ for every $k$. This is a form of parameter tying, which can be viewed as a way of limiting the complexity of the model.
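A minimal sketch of this prediction rule, reusing the hypothetical gaussian_density helper above (priors, means, and Sigma stand in for already-estimated parameters): the denominator of Bayes' rule is shared by all classes, so it can be dropped inside the argmax.

def predict_class(x, priors, means, Sigma):
    # Unnormalized posteriors pi_k * f_k(x); the common normalizing constant
    # does not change which class attains the maximum.
    scores = [priors[k] * gaussian_density(x, means[k], Sigma)
              for k in range(len(priors))]
    return int(np.argmax(scores))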
We estimate the model parameters, $\Sigma$ and $\{ \mu_k \}_{k=1}^K$, by maximum likelihood (using the usual unbiased pooled estimate for $\Sigma$).
$\Sigma$ is computed as
$$\Sigma = \frac{1}{N - K} \sum\limits_{k=1}^{K}{\sum\limits_{i : y_i = k}{(x_i - \mu_k)(x_i - \mu_k)^T} } $$and $\mu_k$ is the mean vector of class $k$,
$$\mu_k = \frac{1}{n_k} \sum\limits_{i : y_i = k}{ x_i } $$If we take the $\log$ of $P(Y = k \mid x)$ and keep only the terms involving the class-specific parameters $\mu_k$ and $\pi_k$, we see that finding $ \underset{k}{\arg\max}\; P(Y = k \mid x) $ reduces to finding
$$ C = \underset{k}{\arg\max} \delta_k(x) $$where $\delta_k(x)$ is the discriminant function,
$$\delta_k(x) = x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log(\pi_k)$$$\delta_k(x)$ is linear in $x$, since $\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k$ and $\log(\pi_k)$ are just scalar constants.
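Spelling the reduction out: the denominator of $P(Y = k \mid x)$ is the same for every class, so maximizing the posterior is equivalent to maximizing
$$\log\big(\pi_k f_k(x)\big) = \log \pi_k - \frac{p}{2}\log(2\pi) - \frac{1}{2}\log\lvert\Sigma\rvert - \frac{1}{2}\left(x^T\Sigma^{-1}x - 2\,x^T\Sigma^{-1}\mu_k + \mu_k^T\Sigma^{-1}\mu_k\right)$$The terms $-\frac{p}{2}\log(2\pi)$, $-\frac{1}{2}\log\lvert\Sigma\rvert$ and $-\frac{1}{2}x^T\Sigma^{-1}x$ do not depend on $k$ (this is where the shared covariance matters), and dropping them leaves exactly $\delta_k(x)$.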
In [2]:
import numpy as np

def compute_means(X, y):
    # Class means: row k is the mean vector mu_k of class k.
    # Assumes integer labels 0, 1, ..., K-1, as in the iris data below.
    _, d = X.shape[:2]
    classes = np.unique(y)
    means = np.empty((len(classes), d))
    for k in classes:
        means[k] = np.mean(X[y == k], axis=0)
    return means

def compute_cov(X, y):
    # Pooled within-class covariance: sum of (x_i - mu_k)(x_i - mu_k)^T over
    # all training samples, divided by (N - K).
    N, d = X.shape[:2]
    classes = np.unique(y)
    C = np.zeros((d, d))
    for k in classes:
        n_k = np.sum(y == k)
        # np.cov divides by (n_k - 1), so rescale to recover the sum of outer products.
        C += (n_k - 1) * np.cov(X[y == k], rowvar=0)
    return C / (N - len(classes))
In [5]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in older releases
# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=13)
N_train, d = X_train.shape[:2] # d-dim vectors
N_test, _ = X_test.shape[:2]
K = len(np.unique(y_train)) # Number of classes
# Compute model parameters
means = compute_means(X_train, y_train)
covariance = compute_cov(X_train, y_train)
pk = np.asarray([np.sum(y_train == k) / float(N_train) for k in range(K)])  # class priors pi_k = n_k / N
invCov = np.linalg.pinv(covariance) # Inverse of covariance
# Compute the parts of the discriminant functions
b = np.log(pk) - 0.5 * np.diag(means.dot(invCov).dot(means.T))  # log(pi_k) - 0.5 * mu_k^T Sigma^{-1} mu_k
b = b.reshape(K, 1)
W = means.dot(invCov) # Row k contains weight vector for class k
y_pred_train = np.argmax(W.dot(X_train.T) + b, axis=0)
y_pred_test = np.argmax(W.dot(X_test.T) + b, axis=0)
print("Training accuracy: {}".format(np.sum(y_pred_train == y_train) / float(N_train)))
print("Testing accuracy: {}".format(np.sum(y_pred_test == y_test) / float(N_test)))
In [7]:
import GDA  # the author's GDA module (source not shown in this notebook)
from importlib import reload  # reload is a built-in in Python 2
reload(GDA)
clf = GDA.GDA()
clf.fit(X_train, y_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
print("Training accuracy: {}".format(np.sum(y_pred_train == y_train) / float(X_train.shape[0])))
print("Testing accuracy: {}".format(np.sum(y_pred_test == y_test) / float(X_test.shape[0])))
In [94]:
# Sanity check on random inputs: accuracy should sit near chance level, 1/K (about 0.33 here).
X_rand = np.random.randn(100, 4)
y_rand = np.random.randint(0, K, size=100)  # 1-D labels so the comparison below stays element-wise
y_pred_rand = clf.predict(X_rand)
print("Random data accuracy: {}".format(np.sum(y_pred_rand == y_rand) / float(X_rand.shape[0])))