naive_bayes_theory


Classification with Naive Bayes

Probability Theory

Dependences vs Independence

  • Dependence: Two events E and F are dependent when knowing something about whether E happens gives us information about whether F happens (and vice versa).
  • Independence: We say that two events E and F are independent if the probability that they both happen is the product of the probabilities that each one hapens: $P(E,F) = P(E) \cdot P(F)$

For example, when flipping a coin twice, knowing whether the first flip is Heads or Tails gives us no information about whether the second flip is Heads. However, knowing whether the first flip is Heads gives us information about whether both flips are Tails.

  • Sample Space: The set of all possible outcomes. The "universe" in this picture above.
  • Cardinality: The number of elements in a set. The "pipe" symbol represents cardinality. $|A|$ is the number of elements in A.

Let's define the event "people with cancer" as $A$ and "people with no cancer" as $\neg A$.

What is the probability of A?

$$P(A) = \frac{|A|}{|U|}$$

Say we are studying cancer, so we observe people and see whether they have cancer or not. If we take as our Universe all people participating in our study, then there are two possible outcomes for any particular individual, either he has cancer or not.


Questions:

What is the max probability of A?

Since $|A| <= |U|$ (number of elements of A <= number of elements of U), $P(A) <= 1$ (probability <= 100%).

Let's define the event "people who tested positive for cancer" as $B$ and "people who tested negative for cancer" as $\neg B$.

What is the probability of B?

$$P(B) = \frac{|B|}{|U|}$$

Let’s say there is a new screening test that is supposed to test for cancer. That test will be “positive” for some people, and “negative” for some other people. If we take the event $B$ to mean “people for which the test is positive”.

  • Event A: People with cancer
  • Event B: People who tested positive for cancer

What is the probability of AB?

$$P(AB) = \frac{|AB|}{|U|}$$

Given that the test is positive for a randomly selected individual, what is the probability that said individual has cancer?

$$P(A|B) = \frac{|AB|}{|B|}$$

Questions:

How would you describe the “cancer status” and “test status” of people in each portion of the diagram (by color)?

  • Pink: cancer, negative test
  • Purple: cancer, positive test
  • Blue: no cancer, positive test
  • White: no cancer, negative test

Conditional Probability Notes

The notation for this is P(A|B) and it is read “the probability of A given B”.

  • Given that we are in region B, what is the probability that we are in region AB?
  • If we make region B our new Universe, what is the probability of A?

What we’ve effectively done is change the Universe from U (all people), to B (people for whom the test is positive).

This is known as transforming the sample space.

Conditional Probability

Probability that one event occurs given that another event has occurred.


Probability of A given B (prob of cancer given that the test is positive)

$$ P(A|B) = \frac{P(AB)}{P(B)} $$

Probability of B given A (prob of testing positive given that you have cancer)

$$ P(B|A) = \frac{P(AB)}{P(A)} $$

Note that when writing a joint probability the order does not matter $P(AB) == P(BA)$.

General Multiplicative Rule for Probability

$$ P(AB) = P(A | B) \cdot P(B) $$

Note that this is just the conditional probability rearranged.

Bayes Rule

$$ P(A|B) = \frac{P(AB)}{P(B)} = \frac{P(B|A) \cdot P(A)}{P(B)} $$

Probability Example

Researchers randomly assigned 72 chronic users of cocaine into three groups: desipramine (antidepressant), lithium (standard treatment for cocaine) and placebo. Results of the study are summarized below.

relapse no relapse total
desipramine 10 14 24
lithium 18 6 24
placebo 20 4 24
total 48 24 72

WRITE THESE QUESTIONS ON THE BOARD AND HAVE PEOPLE SOLVE THEM

Marginal Probability

P(relapsed) = 48 / 72 ~ 0.67

Joint Probability

P(relapsed and desipramine) = 10 / 72 ~ 0.14

Conditional Probability

  • P(relapse | desipramine) = P(relapsed and desipramine) / P(desipramine) = (10/72) / (24/72) = .42
  • P(relapse | lithium) = 18 / 24 ~ 0.75
  • P(relapse | placebo) = 20 / 24 ~ 0.83

talk about how discriminative learning algorithms learn the DIFFERENCE between multiple classes. i.e. a logistic regression trying to find the best fit line between the classes

generative learning models looks at each class individually and tries to learn that class in of itself. then it looks at a new observation and sees which model (for each class) it more closely resembles

Naive Bayes

$$ P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots x_n \mid y)} {P(x_1, \dots, x_n)} $$$$P(x_i | y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i | y)$$$$ P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)} {P(x_1, \dots, x_n)} $$$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$$$\Downarrow$$$$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

Laplace Smoothing

$$\frac{x + k}{N+2k}$$

Source: https://en.wikipedia.org/wiki/Additive_smoothing