Udacity Machine Learning mini-project 1

Prep stuff (imports and path setup):


In [1]:
import sys

# email_preprocess lives in the course's tools/ directory, so add it to the path first.
sys.path.append("../tools/")

from email_preprocess import preprocess
from sklearn.naive_bayes import GaussianNB

Training and testing data:


In [2]:
features_train, features_test, labels_train, labels_test = preprocess()


no. of Chris training emails: 7936
no. of Sara training emails: 7884
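The `preprocess()` helper comes from the course's `tools/` directory and its code isn't shown here. The sketch below shows one plausible shape for such a pipeline, assuming TF-IDF word features followed by univariate feature selection. The function name `toy_preprocess`, the toy emails, and every parameter value are hypothetical illustrations, not the course's implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import train_test_split


def toy_preprocess(texts, labels, percentile=10, test_size=0.25, random_state=42):
    """Hypothetical stand-in for preprocess(): split, TF-IDF, keep top-scoring words."""
    texts_train, texts_test, labels_train, labels_test = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state
    )
    # Learn the vocabulary and IDF weights from the training emails only.
    vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words="english")
    tfidf_train = vectorizer.fit_transform(texts_train)
    tfidf_test = vectorizer.transform(texts_test)

    # Keep the `percentile` % of words with the highest ANOVA F-scores.
    selector = SelectPercentile(f_classif, percentile=percentile)
    selector.fit(tfidf_train, labels_train)
    # GaussianNB does not accept sparse input, so densify the selected features.
    features_train = selector.transform(tfidf_train).toarray()
    features_test = selector.transform(tfidf_test).toarray()
    return features_train, features_test, np.array(labels_train), np.array(labels_test)


# Toy "emails": each hypothetical author favours a different set of words.
author_a_words = ["pipeline", "gas", "contract", "trading", "deal", "energy"]
author_b_words = ["meeting", "schedule", "conference", "calendar", "agenda", "office"]
texts, labels = [], []
for i in range(20):
    # Each toy email uses 4 of its author's 6 words, rotated by i.
    texts.append(" ".join(author_a_words[(i + k) % 6] for k in range(4)))
    labels.append(0)
    texts.append(" ".join(author_b_words[(i + k) % 6] for k in range(4)))
    labels.append(1)

features_train, features_test, labels_train, labels_test = toy_preprocess(
    texts, labels, percentile=50
)
print(features_train.shape, features_test.shape)
```

The split happens before vectorizing so that no vocabulary or IDF information leaks from the test emails into training.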

Fitting the model:


In [3]:
# Gaussian Naive Bayes with scikit-learn's default settings.
clf = GaussianNB()
clf.fit(features_train, labels_train)


Out[3]:
GaussianNB()
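Fitting a Gaussian Naive Bayes model amounts to estimating, for each class, a prior plus a mean and variance for every feature; prediction picks the class with the largest log prior plus summed per-feature Gaussian log densities. Below is a minimal from-scratch sketch on hypothetical toy data (`fit_gaussian_nb` and `predict_gaussian_nb` are illustrative names), assuming scikit-learn's default variance smoothing of `1e-9` times the largest feature variance, checked against `GaussianNB`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors plus per-class feature means and variances."""
    classes = np.unique(y)
    # Like scikit-learn's default, pad every variance by a small fraction of
    # the largest feature variance so no variance is exactly zero.
    epsilon = var_smoothing * np.var(X, axis=0).max()
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) + epsilon for c in classes])
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Return the class maximising log prior + summed Gaussian log densities."""
    classes, priors, means, variances = model
    # Broadcast to shape (n_samples, n_classes, n_features), then sum over features.
    diff = X[:, None, :] - means[None, :, :]
    log_likelihood = -0.5 * np.sum(
        np.log(2 * np.pi * variances)[None, :, :] + diff**2 / variances[None, :, :],
        axis=2,
    )
    joint = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(joint, axis=1)]


# Two well-separated toy classes in 3 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(3.0, 1.0, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

model = fit_gaussian_nb(X, y)
ours = predict_gaussian_nb(model, X)
clf = GaussianNB().fit(X, y)
print((ours == clf.predict(X)).all())
```

The "naive" part is the sum over features: each feature's likelihood is treated as independent given the class.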

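Getting model accuracy (fraction of test emails attributed to the right author):


In [4]:
# score() returns mean accuracy on the held-out test set.
clf.score(features_test, labels_test)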


Out[4]:
0.97383390216154719
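For a classifier, `score()` is mean accuracy: the fraction of test samples whose predicted label matches the true label, so the value above means roughly 97.4% of test emails were attributed correctly. A small sketch on hypothetical toy data shows that `clf.score` agrees with `sklearn.metrics.accuracy_score` and with a plain comparison of predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two overlapping 2-D classes.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y_train = np.array([0] * 40 + [1] * 40)
X_test = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y_test = np.array([0] * 10 + [1] * 10)

clf = GaussianNB().fit(X_train, y_train)
pred = clf.predict(X_test)

score = clf.score(X_test, y_test)        # mean accuracy via the estimator
manual = accuracy_score(y_test, pred)    # same metric from sklearn.metrics
by_hand = np.mean(pred == y_test)        # fraction of matching labels
print(score, manual, by_hand)
```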

Timing model fit and prediction:


In [5]:
%timeit clf.fit(features_train, labels_train)


1 loops, best of 3: 418 ms per loop

In [6]:
%timeit pred = clf.predict(features_test)


10 loops, best of 3: 84.9 ms per loop

Unsurprisingly, prediction is much faster than fitting: about 85 ms versus 418 ms, roughly a fifth of the time.
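`%timeit` is IPython-only. Outside a notebook, the standard-library `timeit` module can produce a comparable "best of 3" figure. This is a sketch on hypothetical random data (`best_of` is an illustrative helper name), so the absolute numbers will not match the ones above.

```python
import timeit

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical random data standing in for the email features.
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 50))
y = rng.integers(0, 2, size=2000)
clf = GaussianNB()


def best_of(stmt, repeat=3, number=1):
    """Fastest of `repeat` runs, each calling `stmt` `number` times; seconds per call."""
    times = timeit.repeat(stmt, repeat=repeat, number=number)
    return min(times) / number


# Fit first so that clf is trained before predict() is timed.
fit_seconds = best_of(lambda: clf.fit(X, y))
predict_seconds = best_of(lambda: clf.predict(X), number=10)
print(f"fit: {fit_seconds * 1000:.1f} ms, predict: {predict_seconds * 1000:.1f} ms")
```

Recent IPython versions report `%timeit` as mean ± standard deviation over several runs rather than "best of 3", so output from a newer kernel will be formatted differently from the cells above.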