Title: Gaussian Naive Bayes Classifier
Slug: gaussian_naive_bayes_classifier
Summary: How to train a Gaussian naive bayes classifer in Scikit-Learn
Date: 2017-09-22 12:00
Category: Machine Learning
Tags: Naive Bayes
Authors: Chris Albon

Because of the assumption of the normal distribution, Gaussian Naive Bayes is best used in cases when all our features are continuous.

Preliminaries


In [1]:
# Load libraries
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

Load Iris Flower Dataset


In [2]:
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

Train Gaussian Naive Bayes Classifier


In [3]:
# Create Gaussian Naive Bayes object with prior probabilities of each class
clf = GaussianNB(priors=[0.25, 0.25, 0.5])

# Train model
model = clf.fit(X, y)

Create Previously Unseen Observation


In [4]:
# Create new observation
new_observation = [[ 4,  4,  4,  0.4]]

Predict Class


In [5]:
# Predict class
model.predict(new_observation)


Out[5]:
array([1])

Note: the raw predicted probabilities from Gaussian naive Bayes (outputted using predict_proba) are not calibrated. That is, they should not be believed. If we want to create useful predicted probabilities we will need to calibrate them using an isotonic regression or a related method.