The Naive Bayes classifier builds directly on conditional probability:

$$ p(y|x) = \frac{p(y \cap x)}{p(x)} $$

From the above formula, $p(y \cap x)$ can be written as

$$ p(y \cap x) = p(x|y) \cdot p(y) $$

thus

$$ p(y|x) = \frac{p(x|y) \cdot p(y)}{p(x)} $$

In machine learning, Naive Bayes is used to compute the conditional probability of a predicted class $y$ occurring given all the predictor variables $x$. In other words, Bayes' theorem relates **P(outcome | evidence)** (what we want to predict) to **P(evidence | outcome)** (what can be estimated from the training set).

The algorithm builds these conditional probabilities from the training data and then applies them to make predictions. It is called **naive** because it assumes independence between the predictor variables, which may not hold in practice.
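In symbols, the independence assumption means that the joint likelihood of the predictors factorizes into a product of per-feature likelihoods:

$$ p(x_1, x_2, \dots, x_n | y) = p(x_1 | y) \cdot p(x_2 | y) \cdots p(x_n | y) $$

Written with explicit classes $A_i$ and evidence $B_j$, and with $p(B_j)$ expanded by the law of total probability, Bayes' theorem becomes: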

$$
P(A_i | B_j) = \frac{P(B_j | A_i)P(A_i)}{P(B_j | A_1)P(A_1) + P(B_j | A_2)P(A_2) + ... + P(B_j | A_k)P(A_k)}
$$

Here $A_1, \dots, A_k$ are the $k$ mutually exclusive classes and $B_j$ is the observed evidence (the values of the predictor variables). The Naive Bayes classifier estimates the likelihoods $P(B_j|A_i)$ from the training set and uses them to compute the posterior $P(A_i | B_j)$ at prediction time.
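To make this concrete, here is a minimal from-scratch sketch of the posterior computation. The two classes, two binary predictors, and all probability values are purely illustrative assumptions, not estimates from any real dataset:

```
# Hypothetical priors p(y) and per-feature likelihoods p(x_j = 1 | y)
priors = {'spam': 0.4, 'ham': 0.6}
likelihoods = {
    'spam': {'x1': 0.8, 'x2': 0.7},
    'ham':  {'x1': 0.1, 'x2': 0.3},
}

def posterior(x, priors, likelihoods):
    """Return p(y | x) for every class, assuming the predictors
    are conditionally independent given y (the 'naive' assumption)."""
    scores = {}
    for label, prior in priors.items():
        likelihood = 1.0
        for feature, value in x.items():
            p = likelihoods[label][feature]
            likelihood *= p if value == 1 else (1 - p)  # p(x_j | y)
        scores[label] = likelihood * prior              # numerator of Bayes
    evidence = sum(scores.values())                     # denominator p(x)
    return {label: s / evidence for label, s in scores.items()}

print(posterior({'x1': 1, 'x2': 0}, priors, likelihoods))
# {'spam': 0.695..., 'ham': 0.304...}
```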

In [1]:
```
import seaborn as sns
iris = sns.load_dataset('iris')
```

In [2]:
```
iris.head()
```

Out[2]:
```
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
```

In [3]:
```
iris.shape
```

Out[3]:
```
(150, 5)
```

In [9]:
```
iris['species'].value_counts().plot(kind='bar')
```

Out[9]:

(bar chart showing the count of records for each of the three species)

In [13]:
```
%matplotlib inline
sns.pairplot(iris, hue='species')
```

Out[13]:

(pair plot of the four numeric features, colored by species)

In [2]:
```
X_iris = iris.drop('species', axis=1)
X_iris.shape
```

Out[2]:
```
(150, 4)
```

In [3]:
```
y_iris = iris['species']
y_iris.shape
```

Out[3]:
```
(150,)
```

`X_iris` is written with a capital `X` because it holds a vector of multiple features for each record, whereas `y_iris` is lower case because it holds a single scalar value for each record.
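With the features and target separated, fitting the classifier itself takes only a few lines. A minimal sketch, assuming scikit-learn is installed; `GaussianNB` is one reasonable choice here since all four predictors are continuous:

```
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hold out part of the data so accuracy is measured on unseen records
Xtrain, Xtest, ytrain, ytest = train_test_split(X_iris, y_iris, random_state=1)

model = GaussianNB()               # models p(x_j | y) as a Gaussian per feature
model.fit(Xtrain, ytrain)          # estimates per-class means, variances, priors
print(model.score(Xtest, ytest))   # mean accuracy on the held-out records
```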
