Next we are going to learn about probability distributions. Central to the fields of probability and statistics are *random variables*. Random variables (RVs) are used to model random experiments, such as the roll of a die.

For each random variable there is a *probability density function* (or, for discrete RVs, a *probability mass function*) that describes how probability is assigned to the possible outcomes of the experiment.
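For example, for a fair six-sided die (a toy sketch; the name `die_pmf` is ours), the probability mass function assigns probability 1/6 to each face, and the probabilities across all outcomes sum to 1:

```python
# Probability mass function of a fair six-sided die:
# each of the six faces gets probability 1/6
die_pmf = {face: 1 / 6 for face in range(1, 7)}

print(die_pmf[3])              # probability of rolling a 3
print(sum(die_pmf.values()))   # the probabilities sum to 1
```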

We will learn about three important distributions today:

- the binomial distribution
- the normal (or "Gaussian") distribution
- and the t distribution.

As you may remember, random variables (and their outcomes) can be either *discrete* or *continuous*. The binomial distribution is a discrete distribution, while the normal and t distributions are continuous distributions.

A Binomial experiment:

- The process consists of a sequence of $n$ trials.
- $n=1$ is a common special case, known as the **Bernoulli distribution**.
- Only two mutually exclusive outcomes are possible on each trial (a "success" and a "failure").
- If the probability of success is $p$, then the probability of failure is $q=1-p$.
- The trials are independent.
- The random variable $X$ is the number of successes after the $n$ trials.

The formula for a Binomial Distribution Probability Mass Function turns out to be:

$$Pr(X=k)={n \choose k} p^k (1-p)^{n-k}$$

where $n$ = number of trials, $k$ = number of successes, $p$ = probability of success, and $1-p$ = probability of failure (often written as $q=1-p$).

This means that to get exactly $k$ successes in $n$ trials, we need the $k$ successes: $$p^k$$ and the $n-k$ failures: $$(1-p)^{n-k}$$ Finally, there are ${n \choose k}$ ways of arranging the $k$ successes among the $n$ trials, so we multiply these together to get the probability of exactly that many successes and failures in the $n$ trials. Quick note: ${n \choose k}$ is the number of combinations of $n$ things taken $k$ at a time, which equals: $${n \choose k} = \frac{n!}{k!(n-k)!}$$
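As a quick sanity check, we can evaluate this formula directly with Python's `math.comb` and compare it to scipy's `binom.pmf` (the helper name `binom_pmf` is ours):

```python
from math import comb
from scipy.stats import binom

def binom_pmf(k, n, p):
    """Pr(X = k): choose which k trials succeed, then multiply the probabilities."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 6 heads in 10 fair coin flips: C(10,6)/2^10 = 210/1024
print(binom_pmf(6, 10, 0.5))
print(binom.pmf(6, 10, 0.5))   # scipy gives the same value
```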

Quick example to get you thinking. Let's say I'm a basketball player and I shoot 200 shots a day with 50% accuracy. On any given day, this is a random experiment with a binomial distribution and 200 trials.

In [ ]:

```python
from scipy.stats import binom

n = 200  # number of trials
p = 0.5  # probability of success

# binom.stats can return the mean ('m'), variance ('v'), skew ('s'),
# and/or kurtosis ('k'); by default it returns the mean and variance
mn, vr = binom.stats(n, p)
print(mn)         # mean
print(vr ** 0.5)  # standard deviation
```

Now let's investigate the mean and standard deviation of the binomial distribution further. The mean of a binomial distribution is simply: $$\mu=np$$ This makes intuitive sense: the average number of successes should be the total number of trials multiplied by your average success rate. Similarly, the standard deviation of a binomial is: $$\sigma=\sqrt{nqp}$$
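We can check both formulas against scipy's built-in stats for the basketball example above ($n=200$, $p=0.5$); a small sketch:

```python
import numpy as np
from scipy.stats import binom

n, p = 200, 0.5
q = 1 - p

mean_formula = n * p              # mu = n*p
std_formula = np.sqrt(n * q * p)  # sigma = sqrt(n*q*p)

mean_scipy, var_scipy = binom.stats(n, p)
print(mean_formula, float(mean_scipy))       # both give 100 successes on average
print(std_formula, float(var_scipy) ** 0.5)  # both give about 7.07
```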

Let's try another example to see the full PMF (Probability Mass Function) plot. Imagine you flip a fair coin. Your probability of getting heads is $p=0.5$ (a success in this example). So what does your probability mass function look like for 10 coin flips?

In [ ]:

```python
import numpy as np
from scipy.stats import binom

# New example: n = 10 coin flips and p = 0.5 for a fair coin
n = 10
p = 0.5

# Successes can range from 0 to n, so use range(n+1)
x = range(n + 1)

# Evaluate the probability mass function at each value
Y = binom.pmf(x, n, p)

# Show the probabilities; next we'll visualize the pmf by plotting it
Y
```

Finally we will plot the binomial distribution.

In [ ]:

```python
import matplotlib.pyplot as plt
%matplotlib inline

# For simple plots, matplotlib is fine; seaborn is unnecessary
plt.plot(x, Y, 'o')

# Title (y=1.08 raises the long title a little above the plot)
plt.title('Binomial Distribution PMF: 10 Coin Flips, Odds of Success for Heads is p=0.5', y=1.08)

# Axis labels
plt.xlabel('Number of Heads')
plt.ylabel('Probability')
```

Looks awfully bell shaped...

Suppose you play Blackjack and have a 50% chance of winning each hand. You start with a 1 dollar bet, and each time you lose, you double your bet on the next play. So if you play three rounds and Lose, Lose, Win, then you lost 1 dollar on the first round and 2 dollars on the second round, but gained 4 dollars on the third round, for a net profit of 1 dollar.

What is the expected value of your winnings (assuming you play as many rounds as it takes until you win)?

In fact, are you guaranteed to make money?

Why might this be a bad idea in practice?

Note: this is a famous betting strategy known as the Martingale, and its failure mode is the classic Gambler's Ruin.
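A quick simulation makes the trade-off concrete. This is a sketch assuming a fair 50/50 game and a player who can always cover the next bet; the function name `martingale_round` is ours. Every completed run nets exactly 1 dollar, but the bets along the way can grow enormous, which is why a finite bankroll (or a table limit) eventually ruins you in practice:

```python
import random

def martingale_round(seed=None):
    """Double the bet after every loss; stop at the first win.
    Returns (net_profit, largest_bet_placed)."""
    rng = random.Random(seed)
    bet, net, largest = 1, 0, 1
    while True:
        largest = max(largest, bet)
        if rng.random() < 0.5:   # win: recover all losses plus 1 dollar
            return net + bet, largest
        net -= bet               # lose: deepen the hole...
        bet *= 2                 # ...and double the next bet

runs = [martingale_round(seed=s) for s in range(1000)]
profits = [net for net, _ in runs]
print(set(profits))                         # every completed run nets exactly 1 dollar
print(max(largest for _, largest in runs))  # but some runs required huge bets
```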

In [1]:

```python
from IPython.display import Image
Image(url='https://static.squarespace.com/static/549dcda5e4b0a47d0ae1db1e/54a06d6ee4b0d158ed95f696/54a06d70e4b0d158ed960413/1412514819046/1000w/Gauss_banknote.png')
```


The probability density function of the normal (Gaussian) distribution is:

$$ f(x\;|\;\mu ,\sigma ^{2})={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}\;e^{-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}} $$

Equivalently, in terms of the standardized variable $z$:

$$ f(x\,|\,\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{z^2}{2}}, \qquad z=\frac{(x-\mu)}{\sigma} $$
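As a check, here is the density coded directly from the formula (the helper `normal_pdf` is ours) next to `scipy.stats.norm.pdf`:

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density, evaluated via the standardized variable z."""
    z = (x - mu) / sigma
    return np.exp(-z**2 / 2) / (sigma * np.sqrt(2 * np.pi))

print(normal_pdf(0.0))   # peak of the standard normal, 1/sqrt(2*pi) ~ 0.3989
print(norm.pdf(0.0))     # scipy gives the same value
```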

The plot of the pdf may be familiar to some of you by now.

In [ ]:

```python
# Imports
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Import the stats library
from scipy import stats

# Set the mean
mn = 0
# Set the standard deviation
std_dev = 1

# Create a range of x values
X = np.arange(-4, 4, 0.001)

# Evaluate the normal pdf over the range
Y = stats.norm.pdf(X, mn, std_dev)

# Plot it
plt.plot(X, Y)
```

In [2]:

```python
from IPython.display import Image
Image(url='http://upload.wikimedia.org/wikipedia/commons/thumb/2/25/The_Normal_Distribution.svg/725px-The_Normal_Distribution.svg.png')
```


In [ ]:

```python
# Set the mean and the standard deviation
mu, sigma = 0, 1

# Grab 30 random numbers from the normal distribution
norm_set = np.random.normal(mu, sigma, 30)

from scipy.stats import gaussian_kde

# Histogram of the samples (density=True normalizes the bar heights)
results, edges = np.histogram(norm_set, density=True)
binWidth = edges[1] - edges[0]
plt.bar(edges[:-1], results * binWidth, binWidth)

# Kernel density estimate of the samples
density = gaussian_kde(norm_set)
density.covariance_factor = lambda: .4  # the bandwidth of the kernel density estimator
density._compute_covariance()
plt.plot(X, density(X))

# The true pdf, for comparison
plt.plot(X, Y)
```

With enough samples, this should start to look normal.

In [ ]:

```python
# Repeat with many more samples
norm2 = np.random.normal(mu, sigma, 1000)
results, edges = np.histogram(norm2, density=True)
binWidth = edges[1] - edges[0]
plt.bar(edges[:-1], results * binWidth, binWidth)
density = gaussian_kde(norm2)
density.covariance_factor = lambda: .4
density._compute_covariance()
plt.plot(X, density(X))
plt.plot(X, Y)
```

The Central Limit Theorem is one of the most important theorems in statistical theory. It states that when independent random variables are added, their sum tends toward a normal distribution even if the original variables themselves are not normally distributed.

This may seem abstract, so let's connect it to the previous example. We saw that as we took more samples from the normal distribution, the histogram looked more and more normal. Now we will take larger and larger samples from a normal distribution and see how the mean of the samples behaves.

In [ ]:

```python
# Draw progressively larger samples and compare their means
n10 = np.random.normal(mu, sigma, 10)
n100 = np.random.normal(mu, sigma, 100)
n1000 = np.random.normal(mu, sigma, 1000)
n10000 = np.random.normal(mu, sigma, 10000)
print(n10.mean(), n100.mean(), n1000.mean(), n10000.mean())
```

In [ ]:

```python
Image(url='https://upload.wikimedia.org/wikipedia/commons/8/8c/Dice_sum_central_limit_theorem.svg')
```
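The dice image above is easy to reproduce numerically. As a sketch: a single die roll is uniform on {1,...,6}, nothing like a bell curve, yet the sum of many dice is approximately normal, with mean and standard deviation matching the normal approximation (means and variances add across independent rolls):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 experiments, each summing 50 independent fair dice
sums = rng.integers(1, 7, size=(10000, 50)).sum(axis=1)

# A single die has mean 3.5 and variance 35/12, so the sum of 50 dice
# should have mean 50*3.5 = 175 and std sqrt(50*35/12), about 12.08
print(sums.mean())
print(sums.std())
```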

For the normal distribution, it is often assumed that the sample size is large ($N>30$). The t distribution allows the use of small samples, but pays for it with a larger margin of error (i.e., a larger variance). The t distribution takes the sample size into account through $n-1$ degrees of freedom, which means there is a different t distribution for every sample size. If we plot the t distribution against a normal distribution, you'll notice the tails grow thicker as the peak gets 'squished' down.

To be precise, the t distribution models the **sample mean** of $N$ observations taken from a normal distribution, standardized using the sample standard deviation.

It's important to note that as $N$ gets larger, the t distribution converges to a normal distribution. To further explain degrees of freedom and how they relate to the t distribution, you can think of degrees of freedom as an adjustment to the sample size, such as $n-1$. This is connected to the idea that we are estimating something about a larger population; in practice it gives a slightly larger margin of error in the estimate.

Let's define a new variable called $t$:

$$t=\frac{\overline{X}-\mu}{s/\sqrt{N}}$$

where $s$ is the sample standard deviation (computed with $N-1$ in the denominator). This is analogous to the $z$ statistic:

$$z=\frac{\overline{X}-\mu}{\sigma/\sqrt{N}}$$

The sampling distribution for $t$ is:

$$ f(t) = \frac{\varGamma(\frac{v+1}{2})}{\sqrt{v\pi}\,\varGamma(\frac{v}{2})} \left(1+\frac{t^2}{v}\right)^{-\frac{v+1}{2}}$$

where the gamma function satisfies, for positive integers $n$:

$$\varGamma(n)=(n-1)!$$

and $v$ is the number of degrees of freedom, typically equal to $N-1$.
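If you're curious, the $f(t)$ formula really is what scipy evaluates; a minimal sketch using `math.gamma` (the helper `t_pdf` is ours):

```python
import math
from scipy.stats import t

def t_pdf(x, v):
    """Student's t density with v degrees of freedom, from the formula above."""
    coeff = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coeff * (1 + x**2 / v) ** (-(v + 1) / 2)

print(t_pdf(0.0, 3))   # formula by hand
print(t.pdf(0.0, 3))   # scipy gives the same value
```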

Please don't worry about these formulas. Literally no one memorizes this distribution. Just know the binomial and the normal, and the idea of what a t distribution is for (small sample sizes).

The t distribution is plotted in blue, with the normal distribution in green for comparison.

In [ ]:

```python
# Imports for plotting
import matplotlib.pyplot as plt
%matplotlib inline

# Import the t distribution from the stats library
from scipy.stats import t
import numpy as np

# Create an x range
x = np.linspace(-5, 5, 100)

# Create a t distribution with 3 degrees of freedom
rv = t(3)

# Plot the t pdf, then the normal pdf from before for comparison
plt.plot(x, rv.pdf(x))
plt.plot(X, Y)
```

Below we plot a multivariate normal with unequal variances in the x and y directions and its principal axes rotated away from the coordinate axes. Notice how each shade (a horizontal slice of the density) looks like an ellipse.

What *is* multivariate data? It just means more than one dimension.

- An example of a **one-dimensional** random variable is the height of a person randomly chosen from this class.
- An example of a **multivariate** random variable is the (height, shoe-size) of a person randomly chosen from this class.

For an example like (height, shoe-size), where x = height and y = shoe-size, we expect the two variables to be correlated. Other examples, like (height, first-letter-of-your-name), are not correlated. If the variables are correlated, statistical tests should take that into account!
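We can see correlation directly by sampling. This sketch uses made-up covariance numbers, not real height/shoe-size data; the sample correlation of the correlated pair lands near the value we put in, while the independent pair's correlation hovers near zero:

```python
import numpy as np

rng = np.random.default_rng(0)

cov_corr = [[1.0, 0.8], [0.8, 1.0]]  # correlated pair (think height, shoe size)
cov_ind = [[1.0, 0.0], [0.0, 1.0]]   # uncorrelated pair

corr_samples = rng.multivariate_normal([0, 0], cov_corr, size=5000)
ind_samples = rng.multivariate_normal([0, 0], cov_ind, size=5000)

r_corr = np.corrcoef(corr_samples.T)[0, 1]
r_ind = np.corrcoef(ind_samples.T)[0, 1]
print(r_corr)  # close to 0.8
print(r_ind)   # close to 0.0
```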

In [ ]:

```python
import scipy.stats

# Build a grid of (x, y) points
x, y = np.mgrid[-1:1:.01, -1:1:.01]
pos = np.empty(x.shape + (2,))
pos[:, :, 0] = x
pos[:, :, 1] = y

# A multivariate normal with unequal variances and a nonzero covariance term
rv = scipy.stats.multivariate_normal([0.5, -0.2], [[2.0, 0.3], [0.3, 0.5]])

# Filled contour plot of the density
plt.contourf(x, y, rv.pdf(pos))
```