In [34]:
```
import math
import random
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```

Probabilities quantify the likelihood that a *specific* event or set of events occurs, out of the set of all possible events.

Notationally, $P(E)$ means "the probability of event $E$."

Events $E$ and $F$ are *dependent* if information about $E$ gives us information about the probability of $F$ occurring (or vice versa). If this is not the case, the events are *independent* of each other.

For independent events, the probability of both occurring is the product of the probabilities of each occurring: $$P(E, F) = P(E)P(F)$$
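
For example, if each of two fair coin flips comes up heads with probability $\frac{1}{2}$, then the probability that both come up heads is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$.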

If events are not independent, we can express *conditional probability* (the probability of $E$ *given* $F$, i.e., the probability that $E$ happens given that we know $F$ happens): $$P(E\ |\ F) = P(E, F)\ /\ P(F)$$ which can be rewritten as $$P(E, F) = P(E\ |\ F)P(F)$$
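
For example, suppose each child in a two-child family is equally likely to be a boy or a girl, independently. Then $P(\text{both girls}\ |\ \text{older is a girl}) = 1/2$, while $P(\text{both girls}\ |\ \text{at least one girl}) = 1/3$, which a small simulation confirms:

In [ ]:
```
def random_kid():
    return random.choice(["boy", "girl"])

both_girls, older_girl, either_girl = 0, 0, 0
for _ in range(10000):
    younger, older = random_kid(), random_kid()
    if older == "girl":
        older_girl += 1
        if younger == "girl":  # both girls implies the older is a girl
            both_girls += 1
    if older == "girl" or younger == "girl":
        either_girl += 1

print("P(both | older is a girl):", both_girls / older_girl)     # ~ 0.5
print("P(both | at least one girl):", both_girls / either_girl)  # ~ 0.33
```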

When $E$ and $F$ are independent: $$P(E\ |\ F) = P(E)$$

Conditional probabilities can be "reversed": $$P(E\ |\ F) = P(E, F)\ /\ P(F) = P(F\ |\ E)P(E)\ /\ P(F)$$ Since $E$ either happens or doesn't, we can also split $P(F)$ into the two cases: $$P(F) = P(F, E) + P(F, \neg E)$$

Substituting this in leads to Bayes's theorem: $$P(E\ |\ F) = P(F\ |\ E)P(E)\ /\ [P(F\ |\ E)P(E) + P(F\ |\ \neg E)P(\neg E)]$$
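
For example (with illustrative, made-up numbers): suppose a disease affects 1 in 10,000 people, and the test for it is 99% accurate in both directions. Bayes's theorem gives the probability of having the disease given a positive test:

In [ ]:
```
# Bayes's theorem with hypothetical numbers
p_d = 0.0001            # P(disease)
p_t_given_d = 0.99      # P(positive | disease)
p_t_given_not_d = 0.01  # P(positive | no disease)

p_d_given_t = (p_t_given_d * p_d) / (p_t_given_d * p_d + p_t_given_not_d * (1 - p_d))
print(p_d_given_t)  # ~0.0098: fewer than 1% of positive tests indicate disease
```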

Coin flips follow a *discrete distribution*, i.e., one that takes on mutually exclusive values with no "in-betweens." A *continuous distribution* allows a full range of values along a continuum, such as height or weight.

Continuous distributions are described by a *probability density function (pdf)*: the probability of seeing a value in a given range is the integral of the pdf over that range.
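
For example, for the uniform pdf defined next, the probability of a value between 0.2 and 0.5 is $\int_{0.2}^{0.5} 1 \, dx = 0.3$.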

The pdf for the uniform distribution is:

In [1]:
```
def uniform_pdf(x):
    """pdf of a uniform random variable on [0, 1)"""
    return 1 if x >= 0 and x < 1 else 0
```

In [22]:
```
xs = np.arange(-1, 2, .001)
ys = [uniform_pdf(x) for x in xs]
plt.plot(xs, ys);
```

[Figure: the uniform pdf — 1 on [0, 1), 0 elsewhere]

In [15]:
```
uniform_pdf(-0.01)
```

Out[15]:
```
0
```

The *cumulative distribution function (cdf)* gives the probability that a random variable is less than or equal to a given value.

In [11]:
```
def uniform_cdf(x):
    """Returns the probability that a uniform random variable is <= x"""
    if x < 0:   return 0  # the uniform variable is never below 0
    elif x < 1: return x  # e.g., P(X <= 0.4) = 0.4
    else:       return 1  # the uniform variable is always below 1
```

In [21]:
```
xs = np.arange(-1, 2, .001)
ys = [uniform_cdf(x) for x in xs]
plt.step(xs, ys);
```

[Figure: the uniform cdf — 0 below 0, rising linearly to 1 on [0, 1]]

The *normal distribution* is the classic bell-curve-shaped continuous distribution. It is completely determined by two parameters: the *mean* $\mu$ (where the curve is centered) and the *standard deviation* $\sigma$ (how wide it is).

The pdf for the distribution is: $$f(x\ |\ \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\bigg(-\frac{(x - \mu)^2}{2\sigma^2}\bigg)$$

In [23]:
```
def normal_pdf(x, mu=0, sigma=1):
    sqrt_two_pi = math.sqrt(2 * math.pi)
    return math.exp(-(x - mu)**2 / 2 / sigma**2) / (sqrt_two_pi * sigma)
```

In [26]:
```
xs = [x / 10.0 for x in range(-50, 50)]
plt.plot(xs, [normal_pdf(x, sigma=1) for x in xs], '-', label='mu=0,sigma=1')
plt.plot(xs, [normal_pdf(x, sigma=2) for x in xs], '--', label='mu=0,sigma=2')
plt.plot(xs, [normal_pdf(x, sigma=0.5) for x in xs], ':', label='mu=0,sigma=0.5')
plt.plot(xs, [normal_pdf(x, mu=-1) for x in xs], '-.', label='mu=-1,sigma=1')
plt.legend()
plt.title("Various Normal pdfs")
plt.show()
```

[Figure: Various Normal pdfs — larger sigma gives a flatter, wider curve; changing mu shifts the center]

When $\mu = 0$ and $\sigma = 1$ we call it the *standard normal distribution*. If $Z$ is a standard normal variable, then $X = \sigma Z + \mu$ is normal with mean $\mu$ and standard deviation $\sigma$, so we can always work with the standard normal and rescale afterward.

In [27]:
```
def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

plt.plot(xs, [normal_cdf(x, sigma=1) for x in xs], '-', label='mu=0,sigma=1')
plt.plot(xs, [normal_cdf(x, sigma=2) for x in xs], '--', label='mu=0,sigma=2')
plt.plot(xs, [normal_cdf(x, sigma=0.5) for x in xs], ':', label='mu=0,sigma=0.5')
plt.plot(xs, [normal_cdf(x, mu=-1) for x in xs], '-.', label='mu=-1,sigma=1')
plt.legend(loc=4)  # bottom right
plt.title("Various Normal cdfs")
plt.show()
```

[Figure: Various Normal cdfs — each curve rises from 0 to 1, steepest at its mean]

In [28]:
```
def inverse_normal_cdf(p, mu=0, sigma=1, tolerance=0.00001):
    """find approximate inverse using binary search"""
    # if not standard, compute standard and rescale
    if mu != 0 or sigma != 1:
        return mu + sigma * inverse_normal_cdf(p, tolerance=tolerance)
    low_z, low_p = -10.0, 0  # normal_cdf(-10) is (very close to) 0
    hi_z, hi_p = 10.0, 1     # normal_cdf(10) is (very close to) 1
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2  # consider the midpoint
        mid_p = normal_cdf(mid_z)   # and the cdf's value there
        if mid_p < p:
            low_z, low_p = mid_z, mid_p  # midpoint is too low; search above it
        elif mid_p > p:
            hi_z, hi_p = mid_z, mid_p    # midpoint is too high; search below it
        else:
            break
    return mid_z
```
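
As a quick sanity check, the 97.5th percentile of the standard normal distribution is roughly 1.96, and the binary search should recover it:

In [ ]:
```
inverse_normal_cdf(0.975)  # should be close to 1.96
```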

The *central limit theorem* states that the average of a large number of independent and identically distributed random variables is itself approximately normally distributed.

So, if $x_1, ..., x_n$ are independent and identically distributed with mean $\mu$ and standard deviation $\sigma$, then for large $n$, $\frac{1}{n}(x_1 +\ ...\ + x_n)$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$. Equivalently, $\frac{(x_1 +\ ...\ + x_n)\ -\ \mu n}{\sigma \sqrt{n}}$ is approximately standard normal.
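
A quick illustration: a uniform(0, 1) variable has mean $1/2$ and standard deviation $\sqrt{1/12}$, so the mean of $n$ such draws should look normal with mean $1/2$ and standard deviation $\sqrt{1/12}/\sqrt{n}$:

In [ ]:
```
# means of n uniform(0, 1) draws vs. the normal pdf the CLT predicts
n, trials = 100, 10000
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
plt.hist(means, bins=50, density=True, color='0.75')
zs = np.arange(0.4, 0.6, 0.001)
plt.plot(zs, [normal_pdf(z, mu=0.5, sigma=math.sqrt(1 / 12) / math.sqrt(n)) for z in zs])
plt.title("Means of 100 uniform draws vs. predicted normal pdf");
```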

A *binomial* random variable, Binomial($n$, $p$), is the sum of $n$ independent Bernoulli($p$) random variables, each of which equals 1 with probability $p$ and 0 with probability $1 - p$.

In [29]:
```
def bernoulli_trial(p):
    """returns 1 with probability p and 0 with probability 1 - p"""
    return 1 if random.random() < p else 0

def binomial(n, p):
    """returns the sum of n bernoulli(p) trials"""
    return sum(bernoulli_trial(p) for _ in range(n))
```

The mean of a Bernoulli($p$) variable is $p$ and its standard deviation is $\sqrt{p(1 - p)}$. The central limit theorem then says that as $n$ gets large, a Binomial($n$, $p$) variable is approximately normal with mean $np$ and standard deviation $\sqrt{np(1 - p)}$, which the following plot demonstrates.

In [30]:
```
def plot_binomial(p, n, num_points):
    data = [binomial(n, p) for _ in range(num_points)]

    # use a bar chart to show the actual binomial samples
    histogram = Counter(data)
    plt.bar(list(histogram.keys()),
            [v / num_points for v in histogram.values()],
            0.8,
            color='0.75')

    mu = p * n
    sigma = math.sqrt(n * p * (1 - p))

    # use a line chart to show the normal approximation:
    # P(X = i) is approximated by the normal cdf between i - 0.5 and i + 0.5
    xs = range(min(data), max(data) + 1)
    ys = [normal_cdf(i + 0.5, mu, sigma) - normal_cdf(i - 0.5, mu, sigma)
          for i in xs]
    plt.plot(xs, ys)
    plt.title('Binomial Distribution vs. Normal Approximation')
```

In [35]:
```
plot_binomial(0.75, 100, 10000)
```

[Figure: histogram of Binomial(100, 0.75) samples closely matching the normal curve]