Classification with Naive Bayes

Probability Theory

Dependence vs Independence

Dependence: Two events E and F are dependent when knowing something about whether E happens gives us information about whether F happens (and vice versa).
Independence: We say that two events E and F are independent if the probability that they both happen is the product of the probabilities that each one happens: $P(E,F) = P(E) \cdot P(F)$

For example, when flipping a coin twice, knowing whether the first flip is Heads or Tails gives us no information about whether the second flip is Heads. However, knowing whether the first flip is Heads gives us information about whether both flips are Tails.
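The coin-flip example can be checked by enumerating the sample space. This is a minimal sketch assuming a fair coin; the helper `prob` and the event names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: four equally likely outcomes.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event = matching outcomes / all outcomes."""
    matching = [o for o in outcomes if event(o)]
    return Fraction(len(matching), len(outcomes))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

def both_tails(o):
    return o == ("T", "T")

# Independent pair: P(E, F) equals P(E) * P(F).
p_joint_indep = prob(lambda o: first_heads(o) and second_heads(o))
indep_holds = p_joint_indep == prob(first_heads) * prob(second_heads)

# Dependent pair: the first flip being Heads rules out "both Tails".
p_joint_dep = prob(lambda o: first_heads(o) and both_tails(o))
dep_holds = p_joint_dep == prob(first_heads) * prob(both_tails)

print(indep_holds, dep_holds)  # True False
```

The product rule holds for the two separate flips but fails for "first flip is Heads" and "both flips are Tails", which matches the informal argument.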

Sample Space: The set of all possible outcomes. This is the "universe" $U$ in the diagram.
Cardinality: The number of elements in a set. Vertical bars ("pipes") denote cardinality: $|A|$ is the number of elements in $A$.

Let's define the event "people with cancer" as $A$ and "people with no cancer" as $\neg A$.

What is the probability of A?

$$P(A) = \frac{|A|}{|U|}$$

Say we are studying cancer, so we observe people and record whether or not they have cancer. If we take as our Universe all people participating in our study, then there are two possible outcomes for any particular individual: either they have cancer or they do not.

Questions:

What is the max probability of A?

Since $|A| \leq |U|$ (the number of elements of A is at most the number of elements of U), $P(A) \leq 1$ (the probability is at most 100%).

Let's define the event "people who tested positive for cancer" as $B$ and "people who tested negative for cancer" as $\neg B$.

What is the probability of B?

$$P(B) = \frac{|B|}{|U|}$$

Let’s say there is a new screening test that is supposed to test for cancer. That test will be “positive” for some people and “negative” for others. We take the event $B$ to mean “people for whom the test is positive”.

Event A: People with cancer
Event B: People who tested positive for cancer

What is the probability of AB?

$$P(AB) = \frac{|AB|}{|U|}$$

Given that the test is positive for a randomly selected individual, what is the probability that said individual has cancer?

$$P(A|B) = \frac{|AB|}{|B|}$$

Questions:

How would you describe the “cancer status” and “test status” of people in each portion of the diagram (by color)?

Pink: cancer, negative test
Purple: cancer, positive test
Blue: no cancer, positive test
White: no cancer, negative test
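The four regions can be sketched with set operations. The person IDs and group sizes below are made up for illustration; only the region definitions come from the list above.

```python
# Hypothetical study of 20 people, identified by the IDs 0..19.
U = set(range(20))
A = {0, 1, 2, 3}       # people with cancer (made-up membership)
B = {2, 3, 4, 5, 6}    # people who tested positive (made-up membership)

regions = {
    "pink (cancer, negative test)": A - B,
    "purple (cancer, positive test)": A & B,
    "blue (no cancer, positive test)": B - A,
    "white (no cancer, negative test)": U - (A | B),
}

for name, members in regions.items():
    print(f"{name}: {len(members)} people, P = {len(members) / len(U):.2f}")

# The four regions do not overlap and together cover U.
total = sum(len(members) for members in regions.values())
```

Because the regions partition the universe, their sizes add up to $|U|$ and their probabilities add up to 1.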

Conditional Probability Notes

The notation for conditional probability is $P(A|B)$, which is read “the probability of A given B”.

Given that we are in region B, what is the probability that we are in region AB?
If we make region B our new Universe, what is the probability of A?

What we’ve effectively done is change the Universe from U (all people), to B (people for whom the test is positive).

This is known as transforming the sample space.
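Changing the universe can be written as a single function that takes the universe as an argument. This is a sketch with made-up person IDs; `prob` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction

# Hypothetical study of 20 people, identified by the IDs 0..19.
U = set(range(20))
A = {0, 1, 2, 3}       # people with cancer (made-up membership)
B = {2, 3, 4, 5, 6}    # people who tested positive (made-up membership)

def prob(event, universe):
    """P(event) inside a universe: |event and universe| / |universe|."""
    return Fraction(len(event & universe), len(universe))

p_a = prob(A, U)           # universe is everyone
p_a_given_b = prob(A, B)   # universe is only the positive tests

# The same value from the ratio of joint to marginal probability.
p_ab = prob(A & B, U)
p_b = prob(B, U)
ratio = p_ab / p_b

print(p_a, p_a_given_b, ratio)  # 1/5 2/5 2/5
```

Passing `B` as the universe gives the same result as dividing $P(AB)$ by $P(B)$ over the original universe.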

Conditional Probability

Probability that one event occurs given that another event has occurred.

Probability of A given B (the probability of cancer given that the test is positive):

$$ P(A|B) = \frac{P(AB)}{P(B)} $$
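As a quick check, and assuming the count-based definitions used earlier ($P(AB) = |AB| / |U|$ and $P(B) = |B| / |U|$), this ratio reduces to the expression obtained by treating $B$ as the new universe, because the $|U|$ terms cancel:

$$ P(A|B) = \frac{P(AB)}{P(B)} = \frac{|AB| / |U|}{|B| / |U|} = \frac{|AB|}{|B|} $$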

Probability of B given A (the probability of testing positive given that you have cancer):

$$ P(B|A) = \frac{P(AB)}{P(A)} $$

Note that when writing a joint probability, the order does not matter: $P(AB) = P(BA)$.

General Multiplicative Rule for Probability

$$ P(AB) = P(A | B) \cdot P(B) $$

Note that this is just the conditional probability rearranged.

Bayes Rule

$$ P(A|B) = \frac{P(AB)}{P(B)} = \frac{P(B|A) \cdot P(A)}{P(B)} $$
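A worked example of Bayes' rule for the screening test is sketched below. All three input probabilities are invented for illustration and do not come from any real test. The denominator $P(B)$ is computed with the law of total probability, $P(B) = P(B|A) P(A) + P(B|\neg A) P(\neg A)$, which these notes do not state explicitly.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Made-up numbers, for illustration only.
p_a = Fraction(1, 100)             # P(A): person has cancer
p_b_given_a = Fraction(9, 10)      # P(B|A): positive test given cancer
p_b_given_not_a = Fraction(5, 100) # P(B|not A): positive test given no cancer

# Law of total probability: split positives into true and false positives.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b, round(float(p_a_given_b), 3))  # 2/13 0.154
```

With these assumed numbers, a positive test raises the probability of cancer from 1% to about 15%, because false positives among the many healthy people outnumber the true positives.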

Probability Example

Researchers randomly assigned 72 chronic users of cocaine to three groups: desipramine (an antidepressant), lithium (the standard treatment for cocaine) and placebo. The results of the study are summarized below.

| treatment | relapse | no relapse | total |
|---|---|---|---|
| desipramine | 10 | 14 | 24 |
| lithium | 18 | 6 | 24 |
| placebo | 20 | 4 | 24 |
| total | 48 | 24 | 72 |

Instructor note: write these questions on the board and have people solve them.

Marginal Probability

P(relapse) = 48 / 72 ~ 0.67

Joint Probability

P(relapse and desipramine) = 10 / 72 ~ 0.14

Conditional Probability

P(relapse | desipramine) = P(relapse and desipramine) / P(desipramine) = (10 / 72) / (24 / 72) = 10 / 24 ~ 0.42
P(relapse | lithium) = 18 / 24 = 0.75
P(relapse | placebo) = 20 / 24 ~ 0.83
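The same marginal, joint, and conditional probabilities can be computed directly from the table of counts. This is a small sketch; the function names are made up for illustration.

```python
from fractions import Fraction

# Counts from the study table.
counts = {
    "desipramine": {"relapse": 10, "no relapse": 14},
    "lithium": {"relapse": 18, "no relapse": 6},
    "placebo": {"relapse": 20, "no relapse": 4},
}
total = sum(sum(row.values()) for row in counts.values())  # 72 participants

def marginal(outcome):
    """P(outcome), summed over all treatment groups."""
    return Fraction(sum(row[outcome] for row in counts.values()), total)

def joint(group, outcome):
    """P(group and outcome)."""
    return Fraction(counts[group][outcome], total)

def conditional(outcome, group):
    """P(outcome | group) = P(group and outcome) / P(group)."""
    p_group = Fraction(sum(counts[group].values()), total)
    return joint(group, outcome) / p_group

print(f"P(relapse) = {float(marginal('relapse')):.2f}")
print(f"P(relapse and desipramine) = {float(joint('desipramine', 'relapse')):.2f}")
for group in counts:
    print(f"P(relapse | {group}) = {float(conditional('relapse', group)):.2f}")
```

Using `Fraction` keeps the intermediate values exact, so rounding happens only when printing.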

Generative vs Discriminative Learning Algorithms

Discriminative learning algorithms learn the difference between classes, i.e. the boundary that separates them. For example, logistic regression tries to find the line that best separates the classes.

Generative learning algorithms look at each class individually and try to model that class in and of itself. Given a new observation, they then check which class's model it most closely resembles.

Naive Bayes

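Bayes' rule applied to a class $y$ and features $x_1, \dots, x_n$:

$$ P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)} {P(x_1, \dots, x_n)} $$

The "naive" assumption is that each feature is conditionally independent of the other features given the class:

$$ P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y) $$

Under this assumption the likelihood factors into a product:

$$ P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)} {P(x_1, \dots, x_n)} $$

The denominator does not depend on $y$, so:

$$ P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) $$

The predicted class is therefore the one that maximizes this product:

$$ \hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y) $$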
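The decision rule $\hat{y} = \arg\max_y P(y) \prod_i P(x_i \mid y)$ can be sketched from scratch for categorical features. This is an illustrative implementation, not a reference one: the toy dataset, the feature meanings, and the function names `fit` and `predict` are all made up, and probabilities are estimated by plain counting.

```python
from collections import Counter, defaultdict

def fit(X, y):
    """Estimate P(y) and P(x_i | y) by counting occurrences."""
    class_counts = Counter(y)
    priors = {c: class_counts[c] / len(y) for c in class_counts}
    # feature_counts[c][i][v]: class-c rows whose feature i has value v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(X, y):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
    likelihoods = {
        c: {i: {v: k / class_counts[c] for v, k in counter.items()}
            for i, counter in feature_counts[c].items()}
        for c in class_counts
    }
    return priors, likelihoods

def predict(x, priors, likelihoods):
    """Return the class maximizing P(y) * prod_i P(x_i | y), plus all scores."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(x):
            # A value never seen with class c gets probability 0 here.
            score *= likelihoods[c][i].get(value, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Made-up data: features are (contains "free", contains "meeting").
X = [(1, 0), (1, 0), (1, 1), (0, 0), (0, 1), (0, 1), (1, 1), (0, 0)]
y = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

priors, likelihoods = fit(X, y)
label, scores = predict((1, 0), priors, likelihoods)
print(label, scores)
```

The scores are unnormalized, since the shared denominator $P(x_1, \dots, x_n)$ is dropped. With many features, a practical version would usually sum log-probabilities instead of multiplying, to avoid numerical underflow.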

Laplace Smoothing

$$\frac{x + k}{N+2k}$$

Source: https://en.wikipedia.org/wiki/Additive_smoothing
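The formula can be sketched as a small function. The notes do not define the symbols, so this assumes the usual reading: $x$ is the number of times a feature value was observed within a class, $N$ is the number of observations in that class, $k$ is the smoothing strength, and the factor 2 is the number of values a binary feature can take. The name `laplace_smooth` and the parameter `num_values` are made up for illustration.

```python
def laplace_smooth(x, n, k=1, num_values=2):
    """Smoothed estimate (x + k) / (n + num_values * k).

    With num_values=2 this is the (x + k) / (N + 2k) form for a binary feature.
    """
    return (x + k) / (n + num_values * k)

# A value never observed in a class of 3 examples (assumed counts).
unsmoothed = 0 / 3                 # 0.0: would zero out the whole product
smoothed = laplace_smooth(0, 3)    # (0 + 1) / (3 + 2) = 0.2

# The complementary value was seen 3 times out of 3.
smoothed_other = laplace_smooth(3, 3)  # (3 + 1) / (3 + 2) = 0.8

print(unsmoothed, smoothed, smoothed_other)
```

The smoothed estimates for the two values still sum to 1, and no value gets a probability of exactly zero, so a single unseen feature value can no longer wipe out a class's score.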

Sources

General Assembly Data Science 8 DC Notebooks by Kevin Markham
Andrew Ng CS229 Video Lectures / Lecture Notes
https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
https://docs.google.com/presentation/d/1psUIyig6OxHQngGEHr3TMkCvhdLInnKnclQoNUr4G4U/edit#slide=id.gfc69f484_023