When the objective is to predict a category (a qualitative outcome, such as political party affiliation), we say we are predicting a **qualitative random variable**. On the other hand, if we are predicting a quantitative value (such as the number of cars sold), we call it a **quantitative random variable**.

When the observations of a quantitative random variable can assume any value in a continuous interval (such as temperature), it is called a **continuous random variable**. When it can assume only a countable number of distinct values (such as the number of cars sold), it is called a **discrete random variable**.

Say we toss a coin twice, let y be the number of heads observed, and let P(y) be the probability of observing y heads. Then

- the probability of each value of y lies between 0 and 1
- the probabilities of all possible values of y sum to 1
- the probabilities of distinct outcomes of a discrete random variable are additive. Thus the probability that y = 1 or 2 is P(1) + P(2)
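These three properties can be checked by enumerating the outcomes directly. The sketch below assumes a fair coin (so HH, HT, TH and TT are equally likely) and uses exact fractions; the names `outcomes` and `prob_of` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumption: a fair coin, so each of the 4 sequences HH, HT, TH, TT is equally likely
outcomes = list(product("HT", repeat=2))

def prob_of(event):
    # Fraction of the equally likely outcomes for which event(outcome) is True
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

# P(y) for y = number of heads in two tosses
p = {y: prob_of(lambda o, y=y: o.count("H") == y) for y in range(3)}
print(p)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Each probability lies between 0 and 1, and together they sum to 1
assert all(0 <= v <= 1 for v in p.values())
assert sum(p.values()) == 1

# Additivity: P(y = 1 or 2) equals P(1) + P(2)
p_1_or_2 = prob_of(lambda o: o.count("H") in (1, 2))
assert p_1_or_2 == p[1] + p[2]
```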

A **binomial** experiment is one in which each trial results in one of two possible outcomes. Coin tosses, accept / reject, pass / fail, infected / uninfected: these are the kinds of studies that involve a binomial experiment. More precisely, an experiment is binomial if

- the experiment consists of `n` identical trials
- each trial results in one of 2 outcomes (success and failure)
- the probability of one of the outcomes, say success, remains the same for all trials
- the trials are independent of each other
- the random variable `y` is the number of successes observed in the `n` trials.

The probability of observing `y` successes in `n` trials of a binomial experiment is
$$
P(y) = \frac{n!}{y!(n-y)!}\pi^y (1-\pi)^{n-y}
$$

where

- $n$ = number of trials
- $\pi$ = probability of success in a single trial
- $1-\pi$ = probability of failure in a single trial
- $y$ = number of successes in $n$ trials
- $n!$ ($n$ factorial) = $n(n-1)(n-2)\cdots 2 \cdot 1$, with $0! = 1$
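To see where the factorial term comes from, the sketch below (an illustration, not part of the original notebook) compares the formula with a brute-force sum over every possible sequence of `n` independent trials. Each sequence with `y` successes has probability $\pi^y(1-\pi)^{n-y}$, and $\frac{n!}{y!(n-y)!}$ counts how many such sequences exist. The helper names `binomial_formula` and `brute_force`, and the values `n = 5`, `pi = 0.3`, are arbitrary choices for the demonstration.

```python
import math
from itertools import product

def binomial_formula(n, y, pi):
    # n! / (y! (n - y)!) * pi^y * (1 - pi)^(n - y)
    coeff = math.factorial(n) // (math.factorial(y) * math.factorial(n - y))
    return coeff * pi**y * (1 - pi) ** (n - y)

def brute_force(n, y, pi):
    # Enumerate all 2^n success(1)/failure(0) sequences and add up the
    # probability of every sequence that contains exactly y successes
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if sum(seq) == y:
            total += pi**y * (1 - pi) ** (n - y)
    return total

n, pi = 5, 0.3
for y in range(n + 1):
    print(y, round(binomial_formula(n, y, pi), 5), round(brute_force(n, y, pi), 5))
```

Brute force is only practical for small `n` (it visits $2^n$ sequences), which is exactly why the closed-form coefficient is useful.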

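We can build a simple Python function to calculate the binomial probability as shown below:

```python
import math

def bin_prob(n, y, pi):
    # binomial coefficient n! / (y! (n - y)!), the number of ways to place
    # y successes among n trials (math.comb requires Python 3.8+)
    a = math.comb(n, y)
    # probability of one particular sequence with y successes and n - y failures
    b = math.pow(pi, y) * math.pow(1 - pi, n - y)
    p_y = a * b
    return p_y
```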

Let us consider a problem where 100 seeds are drawn at random. The germination rate of each seed is `85%`; in other words, the probability that a seed will germinate is `0.85`, derived from an experiment in which `85` out of `100` seeds germinated in a nursery. Now we want to calculate the probability that

- at most 80 seeds will germinate
- at most 50 seeds will germinate
- at most 10 seeds will germinate
- at most 95 seeds will germinate
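One way to sanity-check answers like these is to simulate the experiment directly. The sketch below is an illustration under stated assumptions, not part of the original analysis: each of the 100 seeds germinates independently with probability 0.85 (drawn with `random.random()`), the planting is repeated 5,000 times with a fixed seed, and the fraction of runs with at most 80 germinations is compared with the exact binomial sum. The function names are hypothetical.

```python
import math
import random

def exact_at_most(k, n=100, pi=0.85):
    # P(y <= k): sum of the binomial probabilities for y = 0, 1, ..., k
    return sum(math.comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(k + 1))

def simulated_at_most(k, n=100, pi=0.85, runs=5_000, seed=0):
    # Monte Carlo estimate of P(y <= k): repeat the n-seed experiment `runs` times
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(runs):
        # each seed germinates independently with probability pi
        germinated = sum(1 for _ in range(n) if rng.random() < pi)
        if germinated <= k:
            hits += 1
    return hits / runs

exact_80 = exact_at_most(80)
sim_80 = simulated_at_most(80)
print("exact P(y <= 80):    ", exact_80)
print("simulated P(y <= 80):", sim_80)
```

The simulated value will not match the exact one digit for digit; with 5,000 runs the two typically agree to about two decimal places.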

```python
# "At most k" means P(y <= k) = P(0) + P(1) + ... + P(k)
def at_most(k, n=100, pi=0.85):
    return sum(bin_prob(n, y, pi) for y in range(k + 1))

at_most_80 = at_most(80)
print("at most 80: " + str(at_most_80))
at_most_50 = at_most(50)
print("at most 50: " + str(at_most_50))
at_most_10 = at_most(10)
print("at most 10: " + str(at_most_10))
at_most_95 = at_most(95)
print("at most 95: " + str(at_most_95))
```

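```python
# Probability (y) and cumulative probability (cum_prob) of 1..100 germinating seeds.
# y = 0 is skipped; P(0) = 0.15**100 is negligible here.
x = []
y = []
cum_prob = []
for i in range(1, 101):
    x.append(i)
    p_y = bin_prob(100, i, 0.85)
    y.append(p_y)
    if i == 1:
        cum_prob.append(p_y)
    else:
        # cum_prob[i - 1] holds P(1) + ... + P(i)
        cum_prob.append(cum_prob[i - 2] + p_y)
```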
```python
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1, 2, figsize=(13, 5))
ax[0].plot(x, y)
ax[0].set_title('Probability of y successes')
ax[0].set_xlabel('num of successes in 100 trials')
ax[0].set_ylabel('probability of successes')
ax[1].plot(x, cum_prob)
ax[1].set_title('Cumulative Probability of y successes')
ax[1].set_xlabel('num of successes in 100 trials')
ax[1].set_ylabel('cumulative probability of successes')
```

The probability of `x` seeds germinating peaks around `85`, matching the germination rate of `0.85`: the expected number of germinating seeds is $n\pi = 100 \times 0.85 = 85$.
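The location of the peak follows from standard binomial results: the mean is $n\pi$, the standard deviation is $\sqrt{n\pi(1-\pi)}$, and the most likely value (the mode) is $\lfloor (n+1)\pi \rfloor$ when $(n+1)\pi$ is not an integer. The sketch below computes these directly from the probabilities for 100 seeds and, for comparison, 20 seeds. `summarize` is a hypothetical helper, and it recomputes the probabilities with `math.comb` rather than reusing `bin_prob`.

```python
import math

def summarize(n, pi):
    # Binomial probabilities for y = 0..n
    probs = [math.comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(n + 1)]
    mode = max(range(n + 1), key=lambda y: probs[y])   # most likely y
    mean = sum(y * p for y, p in enumerate(probs))      # expected value of y
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mode, mean, math.sqrt(var)

for n in (100, 20):
    mode, mean, sd = summarize(n, 0.85)
    print(f"n={n}: mode={mode}, mean={mean:.3f}, sd={sd:.3f}")
    print(f"  formulas: mode={math.floor((n + 1) * 0.85)}, "
          f"mean={n * 0.85:.3f}, sd={math.sqrt(n * 0.85 * 0.15):.3f}")
```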

```python
# find x corresponding to the max probability value
# (y[0] corresponds to x = 1, so add 1 to the index)
y.index(max(y)) + 1
```

The probability falls steeply before and after 85. Using the cumulative probability, we can answer "at least" questions. Find the probability that

- at least 20 seeds will germinate = P(20) + P(21) + ... + P(100)
- at least 85 seeds will germinate = P(85) + P(86) + ... + P(100)
- at least 95 seeds will germinate = P(95) + P(96) + ... + P(100)

```python
# cum_prob[k - 1] holds P(1) + ... + P(k), so
# P(y >= k) = cum_prob[99] - cum_prob[k - 2]
atleast_20 = cum_prob[99] - cum_prob[18]
print("at least 20 = " + str(atleast_20))
atleast_85 = cum_prob[99] - cum_prob[83]
print("at least 85 = " + str(atleast_85))
atleast_95 = cum_prob[99] - cum_prob[93]
print("at least 95 = " + str(atleast_95))
```
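An equivalent way to get "at least" probabilities is the complement rule, $P(y \ge k) = 1 - P(y \le k-1)$, which avoids indexing into a running sum. A minimal sketch, assuming the same $n = 100$ and $\pi = 0.85$; `pmf`, `at_least` and `at_least_complement` are hypothetical helpers that compute the probabilities directly with `math.comb`.

```python
import math

def pmf(n, y, pi):
    # binomial probability of exactly y successes in n trials
    return math.comb(n, y) * pi**y * (1 - pi) ** (n - y)

def at_least(k, n=100, pi=0.85):
    # direct sum: P(k) + P(k + 1) + ... + P(n)
    return sum(pmf(n, y, pi) for y in range(k, n + 1))

def at_least_complement(k, n=100, pi=0.85):
    # complement rule: 1 - [P(0) + ... + P(k - 1)]
    return 1 - sum(pmf(n, y, pi) for y in range(k))

for k in (20, 85, 95):
    print(k, at_least(k), at_least_complement(k))
```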

We can repeat the calculation with a sample size of `20` seeds and plot the results.

```python
# Probability and cumulative probability of 1..20 germinating seeds out of 20
x = []
y = []
cum_prob = []
for i in range(1, 21):
    x.append(i)
    p_y = bin_prob(20, i, 0.85)
    y.append(p_y)
    if i == 1:
        cum_prob.append(p_y)
    else:
        # cum_prob[i - 1] holds P(1) + ... + P(i)
        cum_prob.append(cum_prob[i - 2] + p_y)
```
```python
# find x corresponding to the max probability value
y.index(max(y)) + 1
```
```python
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1, 2, figsize=(13, 5))
ax[0].plot(x, y)
ax[0].set_title('Probability of y successes')
ax[0].set_xlabel('num of successes in 20 trials')
ax[0].set_ylabel('probability of successes')
ax[1].plot(x, cum_prob)
ax[1].set_title('Cumulative Probability of y successes')
ax[1].set_xlabel('num of successes in 20 trials')
ax[1].set_ylabel('cumulative probability of successes')
```

The **Poisson** distribution is used to model the number of events of a particular type occurring over a period of time or within a region of space. An example is the number of vehicles passing through a security checkpoint in a 5-minute interval.

**Conditions**

The probability distribution of a discrete random variable *y* is Poisson if:

- Events occur one at a time: two or more events do not occur at precisely the same time or place
- Events are independent: the occurrence of an event in one period of time or region of space is independent of any event in a non-overlapping period or region
- The expected number of events $\mu$ during one period or region is the same as the expected number of events in any other period or region

Thus the probability of observing *y* events in a unit of time or space is given by

$$
P(y) = \frac{\mu^y e^{-\mu}}{y!}
$$

where

- $\mu$ is the average value of *y*
- *e* is a naturally occurring constant (Euler's number), `e = 2.71828...`
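Two properties follow from this formula: the probabilities over all $y = 0, 1, 2, \dots$ sum to 1, and the mean of the distribution is $\mu$. The sketch below checks both numerically for $\mu = 2.3$, truncating the infinite sum at $y = 50$ (an assumption; the remaining terms are vanishingly small). It uses `math.exp` for $e^{-\mu}$, and `poisson_pmf` is a hypothetical name.

```python
import math

def poisson_pmf(y, mu):
    # P(y) = mu^y * e^(-mu) / y!
    return mu**y * math.exp(-mu) / math.factorial(y)

mu = 2.3
ys = range(51)  # truncate the infinite support at y = 50
total = sum(poisson_pmf(y, mu) for y in ys)        # should be (very close to) 1
mean = sum(y * poisson_pmf(y, mu) for y in ys)     # should be (very close to) mu
print(total, mean)
```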

**Example**
Let *y* denote the number of field mice captured in a trap in a 24-hour period. The average value of *y* is `2.3`. What is the probability of capturing exactly `4` mice in a randomly selected trap?

**Ans:**
$$
\mu=2.3
$$
$$
P(y=4)=?
$$
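Substituting into the Poisson formula and working by hand, with intermediate values rounded:

$$
P(y=4) = \frac{2.3^4 \, e^{-2.3}}{4!} = \frac{27.9841 \times 0.10026}{24} \approx 0.117
$$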

```python
import math

def poisson_prob(y, mu):
    # P(y) = mu^y * e^(-mu) / y!
    # math.exp uses the full-precision value of e rather than the truncated 2.71828
    numerator = math.pow(mu, y) * math.exp(-mu)
    denominator = math.factorial(y)
    return numerator / denominator
```
```python
# calculate P(4)
p_4 = poisson_prob(4, 2.3)
p_4
```

Let's plot the distribution of y for values 0 to 10.

```python
y = list(range(0, 11))
p_y = []
cum_y = []
mu = 2.3
for yi in y:
    prob = poisson_prob(yi, mu)
    p_y.append(prob)
    if yi == 0:
        cum_y.append(prob)
    else:
        # cum_y[yi] holds P(0) + ... + P(yi)
        cum_y.append(cum_y[yi - 1] + prob)
```
```python
# plot the probability and cumulative probability of y
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1, 2, figsize=(13, 5))
ax[0].plot(y, p_y)
ax[0].set_title('Probability of capturing y mice in 24 hours')
ax[0].set_xlabel('number of mice captured in 24 hours (y)')
ax[0].set_ylabel('Probability')
ax[1].plot(y, cum_y)
ax[1].set_title('Cumulative probability of capturing at most y mice')
ax[1].set_xlabel('number of mice captured in 24 hours (y)')
ax[1].set_ylabel('Cumulative probability')
```
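Because the cumulative curve gives $P(y \le k)$, "at least" questions again follow from the complement rule. For example, the probability that a trap catches at least one mouse is $1 - P(0) = 1 - e^{-2.3}$. A minimal sketch, where the helper `poisson_at_most` is hypothetical and recomputes the probabilities with `math.exp`:

```python
import math

def poisson_at_most(k, mu):
    # cumulative probability P(0) + P(1) + ... + P(k)
    return sum(mu**y * math.exp(-mu) / math.factorial(y) for y in range(k + 1))

mu = 2.3
at_least_1 = 1 - poisson_at_most(0, mu)   # P(y >= 1) = 1 - P(0)
at_least_4 = 1 - poisson_at_most(3, mu)   # P(y >= 4) = 1 - P(y <= 3)
print(at_least_1, at_least_4)
```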