When the objective is to predict a category (a qualitative outcome, such as political party affiliation), the quantity we are predicting is called a qualitative random variable. If instead we are predicting a numerical value (such as the number of cars sold), it is called a quantitative random variable.
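When the observations of a quantitative random variable can take any value in a continuous interval (such as temperature), it is called a continuous random variable. When they can take only a countable set of separate values (such as the number of cars sold), it is called a discrete random variable.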
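Say we toss a fair coin twice and let y be the number of heads. Then y can only take the values 0, 1, or 2, so it is a discrete random variable, and its probability distribution P(y) gives the probability of each of these values.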
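As a small illustration, assuming a fair coin so that the four outcomes HH, HT, TH and TT are equally likely, the sketch below enumerates the outcomes of two tosses and derives P(y) by counting heads:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# every outcome of two tosses; with a fair coin (assumed) all four are equally likely
outcomes = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')

# y = number of heads in each outcome, tallied across outcomes
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(y) = (outcomes with y heads) / (total outcomes), kept as exact fractions
P = {y: Fraction(count, len(outcomes)) for y, count in sorted(heads_counts.items())}
for y, prob in P.items():
    print("P(" + str(y) + ") =", prob)
```

This gives P(0) = 1/4, P(1) = 1/2 and P(2) = 1/4; using `Fraction` keeps the probabilities exact rather than rounded floats.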
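A binomial experiment consists of trials that each have exactly one of two possible outcomes. Coin tosses, accept/reject, pass/fail, and infected/uninfected are the kinds of studies that involve a binomial experiment. Formally, an experiment is binomial in nature if:

- it consists of n identical trials;
- each trial results in one of two outcomes, success or failure;
- the probability of success on a single trial, π, stays the same from trial to trial (so the probability of failure is 1 − π);
- the trials are independent;
- y is the number of successes observed in the n trials.

The probability of observing y successes in n trials of a binomial experiment is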
$$
P(y) = \frac{n!}{y!(n-y)!}\pi^y (1-\pi)^{n-y}
$$
where

- n = number of trials
- π = probability of success on a single trial
- y = number of successes in n trials

We can build a simple Python function to calculate the binomial probability as shown below:
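```python
import math

def bin_prob(n, y, pi):
    """Probability of exactly y successes in n binomial trials,
    each with success probability pi."""
    # binomial coefficient n! / (y! (n - y)!): number of ways to place y successes
    a = math.factorial(n) // (math.factorial(y) * math.factorial(n - y))
    # probability of any single sequence with y successes and n - y failures
    b = math.pow(pi, y) * math.pow(1 - pi, n - y)
    p_y = a * b
    return p_y
```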
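A quick sanity check on the formula, written as a self-contained sketch that uses `math.comb` (Python 3.8+) in place of the explicit factorials. It assumes nothing beyond the formula above: for two fair coin tosses it should reproduce 0.25, 0.5, 0.25, and for any n and π the probabilities over y = 0, ..., n should sum to 1 with mean nπ:

```python
import math

def bin_prob(n, y, pi):
    # C(n, y) * pi^y * (1 - pi)^(n - y); math.comb needs Python 3.8+
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

# two tosses of a fair coin: n = 2, pi = 0.5
coin = [bin_prob(2, y, 0.5) for y in range(3)]
print("coin tosses:", coin)

# seed example used below: n = 100, pi = 0.85
n, pi = 100, 0.85
probs = [bin_prob(n, y, pi) for y in range(n + 1)]
total = sum(probs)                              # expected to be 1 (up to rounding)
mean = sum(y * p for y, p in enumerate(probs))  # expected to be n * pi = 85
print("total:", total, "mean:", mean)
```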
    
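Let us consider a problem where 100 seeds are drawn at random. The germination rate of each seed is 85%; in other words, the probability that any one seed germinates is 0.85, an estimate derived from nursery experiments in which 85 out of 100 seeds germinated. We want the probability that at most 80, at most 50, at most 10, and at most 95 of the 100 seeds germinate. Because bin_prob gives the probability of exactly y successes, an "at most k" probability is the sum of bin_prob over y = 0, 1, ..., k:

```python
def at_most(n, k, pi):
    """P(y <= k): sum of the exact probabilities for y = 0, 1, ..., k."""
    return sum(bin_prob(n, y, pi) for y in range(k + 1))

at_most_80 = at_most(100, 80, 0.85)
print("at most 80: " + str(at_most_80))
at_most_50 = at_most(100, 50, 0.85)
print("at most 50: " + str(at_most_50))
at_most_10 = at_most(100, 10, 0.85)
print("at most 10: " + str(at_most_10))
at_most_95 = at_most(100, 95, 0.85)
print("at most 95: " + str(at_most_95))
```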
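Since the binomial model assumes independent trials with a fixed success probability, the "at most 80" answer can be cross-checked by simulating the seed experiment directly. A minimal Monte Carlo sketch, assuming 20,000 repetitions and a fixed random seed (both arbitrary choices); `simulate_at_most` is a hypothetical helper, not part of the original notebook:

```python
import math
import random

def bin_prob(n, y, pi):
    # exact binomial probability of y successes in n trials
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def simulate_at_most(n, k, pi, trials=20_000, seed=0):
    """Estimate P(y <= k) by repeating the n-seed experiment many times."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        # one experiment: each of n seeds germinates independently with probability pi
        germinated = sum(rng.random() < pi for _ in range(n))
        if germinated <= k:
            hits += 1
    return hits / trials

exact = sum(bin_prob(100, y, 0.85) for y in range(81))
estimate = simulate_at_most(100, 80, 0.85)
print("exact P(y <= 80):    ", exact)
print("simulated P(y <= 80):", estimate)
```

The simulated value should land close to the exact one; the gap shrinks as `trials` grows.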
    
    
We can calculate the probability for every possible value of the discrete random variable in a loop, keep a running total (the cumulative probability), and plot both as shown below:
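```python
x = []          # number of successes
y = []          # probability of exactly that many successes
cum_prob = []   # running total: cum_prob[j] holds P(1 <= y <= j + 1)
# y = 0 is skipped; P(0) = 0.15**100 is negligibly small here
for i in range(1, 101):
    x.append(i)
    p_y = bin_prob(100, i, 0.85)
    y.append(p_y)
    # add this probability to the previous running total
    if i == 1:
        cum_prob.append(p_y)
    else:
        cum_prob.append(cum_prob[i - 2] + p_y)
```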
```python
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1, 2, figsize=(13, 5))
ax[0].plot(x, y)
ax[0].set_title('Probability of y successes')
ax[0].set_xlabel('num of successes in 100 trials')
ax[0].set_ylabel('probability of successes')
ax[1].plot(x, cum_prob)
ax[1].set_title('Cumulative Probability of y successes')
ax[1].set_xlabel('num of successes in 100 trials')
ax[1].set_ylabel('cumulative probability of successes')
```
As the left-hand plot above shows, the probability that exactly y seeds germinate peaks at y = 85, matching the germination rate of 0.85: the expected number of germinating seeds is nπ = 100 × 0.85 = 85.
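The peak can also be located without a plot. A self-contained sketch comparing a brute-force search over P(y) with the standard closed form for the binomial mode, ⌊(n + 1)π⌋ (valid for 0 < π < 1), for n = 100 and, for comparison, n = 20; the helper names are hypothetical:

```python
import math

def bin_prob(n, y, pi):
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def mode_by_search(n, pi):
    # the y in 0..n with the largest P(y)
    return max(range(n + 1), key=lambda y: bin_prob(n, y, pi))

def mode_by_formula(n, pi):
    # floor((n + 1) * pi) for 0 < pi < 1; if (n + 1) * pi is an integer,
    # that value and the one just below it are tied
    return math.floor((n + 1) * pi)

for n in (100, 20):
    print(n, "mean:", n * 0.85,
          "mode (formula):", mode_by_formula(n, 0.85),
          "mode (search):", mode_by_search(n, 0.85))
```

For π = 0.85 both approaches give 85 when n = 100 and 17 when n = 20.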
```python
# find the y with the largest probability (list index 0 holds y = 1, hence + 1)
y.index(max(y)) + 1   # returns 85
```
The probability falls steeply on either side of 85. The cumulative probability lets us answer "at least" questions, because P(y ≥ k) = 1 − P(y ≤ k − 1). Find the probability that at least 20, at least 85, and at least 95 seeds germinate:
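```python
# cum_prob[j] holds P(1 <= y <= j + 1) because the loop started at y = 1,
# so P(y >= k) = cum_prob[99] - cum_prob[k - 2]
atleast_20 = cum_prob[99] - cum_prob[18]
print("at least 20 = " + str(atleast_20))
atleast_85 = cum_prob[99] - cum_prob[83]
print("at least 85 = " + str(atleast_85))
atleast_95 = cum_prob[99] - cum_prob[93]
print("at least 95 = " + str(atleast_95))
```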
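The same "at least" probabilities can be computed without the running list, either by summing the upper tail directly or through the complement rule. A self-contained sketch (the function names are hypothetical) showing that the two routes agree:

```python
import math

def bin_prob(n, y, pi):
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def at_least_direct(n, k, pi):
    # add P(k) + P(k + 1) + ... + P(n)
    return sum(bin_prob(n, y, pi) for y in range(k, n + 1))

def at_least_complement(n, k, pi):
    # complement rule: P(y >= k) = 1 - P(y <= k - 1)
    return 1 - sum(bin_prob(n, y, pi) for y in range(k))

for k in (20, 85, 95):
    print("at least", k, at_least_direct(100, k, 0.85), at_least_complement(100, k, 0.85))
```

Summing the upper tail directly avoids subtracting two nearly equal numbers, which matters when P(y ≥ k) is very small; the complement rule is convenient when P(y ≤ k − 1) is already at hand.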
    
    
We can repeat the experiment with a sample of 20 seeds (n = 20) and plot the results:
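```python
x = []          # number of successes
y = []          # probability of exactly that many successes
cum_prob = []   # running total: cum_prob[j] holds P(1 <= y <= j + 1)
# y = 0 is skipped; P(0) = 0.15**20 is negligibly small here
for i in range(1, 21):
    x.append(i)
    p_y = bin_prob(20, i, 0.85)
    y.append(p_y)
    # add this probability to the previous running total
    if i == 1:
        cum_prob.append(p_y)
    else:
        cum_prob.append(cum_prob[i - 2] + p_y)
```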
```python
# find the y with the largest probability (list index 0 holds y = 1, hence + 1)
y.index(max(y)) + 1   # returns 17
```
```python
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1, 2, figsize=(13, 5))
ax[0].plot(x, y)
ax[0].set_title('Probability of y successes')
ax[0].set_xlabel('num of successes in 20 trials')
ax[0].set_ylabel('probability of successes')
ax[1].plot(x, cum_prob)
ax[1].set_title('Cumulative Probability of y successes')
ax[1].set_xlabel('num of successes in 20 trials')
ax[1].set_ylabel('cumulative probability of successes')
```
The Poisson distribution is used to model the number of events of a particular type that occur over a period of time or within a region of space. An example is the number of vehicles passing through a security checkpoint in a 5-minute interval.
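Conditions

The probability distribution of a discrete random variable y is Poisson if:

- events occur one at a time; two or more events do not occur at exactly the same time or place;
- the occurrence of an event in one period of time or region of space is independent of whether an event occurs in any other, non-overlapping period or region;
- the expected number of events, μ, is the same for every period of time or region of space of the same size.

Then the probability of observing y events in a unit of time or space is given by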
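$$ P(y) = \frac{\mu^{y}e^{-\mu}}{y!} $$

where

- μ = average (expected) number of events in a unit of time or space
- y = number of events observed
- e ≈ 2.71828 (the base of the natural logarithm)

Example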
Let y denote the number of field mice captured in a trap in a 24-hour period. The average value of y is 2.3. What is the probability of capturing exactly 4 mice in a randomly selected trap?
Ans: here $\mu = 2.3$, and we want $P(y = 4)$.
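Worked by hand, substituting μ = 2.3 and y = 4 into the Poisson formula (intermediate values rounded):

$$ P(4) = \frac{2.3^{4}\, e^{-2.3}}{4!} = \frac{27.9841 \times 0.100259}{24} \approx 0.1169 $$

So a randomly selected trap has a probability of about 0.117 of capturing exactly 4 mice.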
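```python
import math

def poisson_prob(y, mu):
    """Probability of observing exactly y events when the average is mu."""
    # math.exp(-mu) computes e^(-mu) without hard-coding e = 2.71828
    numerator = math.pow(mu, y) * math.exp(-mu)
    denominator = math.factorial(y)
    return numerator / denominator
```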
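A quick numerical sanity check on the Poisson formula, as a self-contained sketch: over y = 0, 1, 2, ... the probabilities should sum to 1 and have mean μ. Truncating the sum at y = 49 is an assumption that the remaining tail is negligible for μ = 2.3; the check also shows how much probability the values 0 to 10 plotted below cover:

```python
import math

def poisson_prob(y, mu):
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu = 2.3
probs = [poisson_prob(y, mu) for y in range(50)]   # y = 0..49 (truncated tail)
total = sum(probs)                                 # expected to be very close to 1
mean = sum(y * p for y, p in enumerate(probs))     # expected to be very close to mu
covered = sum(probs[:11])                          # P(0 <= y <= 10)
print("total:", total, "mean:", mean, "P(y <= 10):", covered)
```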
    
```python
# calculate P(4)
p_4 = poisson_prob(4, 2.3)
p_4   # approximately 0.1169
```
Let's plot the distribution of y for the values 0 through 10:
```python
y = list(range(0, 11))
p_y = []
cum_y = []
mu = 2.3
for yi in y:
    prob = poisson_prob(yi, mu)
    p_y.append(prob)
    if yi == 0:
        cum_y.append(prob)
    else:
        cum_y.append(cum_y[yi - 1] + prob)
```
```python
# plot P(y) and the cumulative probability P(at most y)
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(1, 2, figsize=(13, 5))
ax[0].plot(y, p_y)
ax[0].set_title('Probability of finding exactly y mice in 24 hours')
ax[0].set_xlabel('Number of mice captured in 24 hours (y)')
ax[0].set_ylabel('Probability')
ax[1].plot(y, cum_y)
ax[1].set_title('Cumulative probability of finding at most y mice in 24 hours')
ax[1].set_xlabel('Number of mice captured in 24 hours (y)')
ax[1].set_ylabel('Cumulative probability')
```
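As in the seed example, cumulative probabilities answer "at most" and "at least" questions for the mice. A self-contained sketch for two illustrative questions chosen here (they are not part of the original example): capturing at most 2 mice, and capturing at least 4 mice. `poisson_cdf` is a hypothetical helper name:

```python
import math

def poisson_prob(y, mu):
    return mu ** y * math.exp(-mu) / math.factorial(y)

def poisson_cdf(k, mu):
    # P(y <= k): probability of capturing at most k mice
    return sum(poisson_prob(y, mu) for y in range(k + 1))

mu = 2.3
at_most_2 = poisson_cdf(2, mu)
at_least_4 = 1 - poisson_cdf(3, mu)   # complement rule: P(y >= 4) = 1 - P(y <= 3)
print("P(y <= 2) =", at_most_2)
print("P(y >= 4) =", at_least_4)
```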