Exploratory notebook covering the theory and introductory concepts behind probability. Includes implementations and visualizations of toy examples.
Probability is the science concerned with the understanding and manipulation of uncertainty.
In [ ]:
import numpy as np
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt, animation
%matplotlib notebook
#%matplotlib inline
sns.set_context("paper")
In [ ]:
# interactive imports
import plotly
import cufflinks as cf
cf.go_offline(connected=True)
plotly.offline.init_notebook_mode(connected=True)
In [ ]:
class RandomVar:
    """Discrete random variable taking values 0..len(probs)-1."""
    def __init__(self, probs):
        self.values = np.arange(len(probs))
        self.probs = probs

    def pick(self, n=1):
        # draw n samples according to the specified probabilities
        return np.random.choice(self.values, size=n, p=self.probs)
In [ ]:
coin = RandomVar([0.5, 0.5])
coin.pick()
In [ ]:
biased_coin = RandomVar([0.1, 0.9])
biased_coin.pick()
In [ ]:
die = RandomVar([1/6]*6)
die.pick()
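As a quick sanity check (a sketch relying on the `size=n` behaviour of `pick` above), we can draw many samples and compare the empirical frequencies with the probabilities we specified.
In [ ]:
# sketch: empirical frequencies should approach the specified probabilities
samples = die.pick(10000)
values, counts = np.unique(samples, return_counts=True)
pd.DataFrame({'value': values,
              'empirical': counts / len(samples),
              'theoretical': die.probs})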
We are interested in quantifying the amount of information associated with events. For example, given a random variable $x$, the amount of information carried by a specific value can be seen as the "degree of surprise" of observing $x$ take that value.
$$ h(x) = - \log_2 p(x) $$
For a random variable $x$, the corresponding average measure, called entropy, is defined as:
$$ H[x] = - \sum_x{ p(x) \log_2 p(x) } $$
In [ ]:
# information content (in bits) for a target probability
def info_content(p_x):
    return -np.log2(p_x)

# entropy (in bits) of a discrete probability distribution
def entropy(p_x):
    p_x = np.asarray(p_x)
    return -np.sum(p_x * np.log2(p_x))
In [ ]:
entropy([1/8]*8)
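The `info_content` function makes the "degree of surprise" intuition concrete: rarer outcomes carry more information. A quick illustration:
In [ ]:
# rarer events carry more information (are more "surprising")
info_content(0.5), info_content(0.1)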
For a discrete random variable, entropy is maximized by the uniform distribution. For a continuous random variable, the (differential) entropy similarly increases as the variance increases; for a Gaussian, for example, it grows with $\sigma^2$.
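A small numerical check of both claims, using the `entropy` function defined above (a sketch; the skewed distribution and the Gaussian differential-entropy formula $\frac{1}{2}\log_2(2\pi e\sigma^2)$ are just illustrative choices):
In [ ]:
# sketch: the uniform distribution has the highest entropy over 8 outcomes
uniform = np.ones(8) / 8
skewed = np.array([0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03])
print(entropy(uniform), entropy(skewed))

# sketch: Gaussian differential entropy 0.5*log2(2*pi*e*sigma^2) grows with the variance
for sigma in [0.5, 1.0, 2.0]:
    print(sigma, 0.5 * np.log2(2 * np.pi * np.e * sigma**2))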
In [ ]:
# log function
x = np.linspace(0.00001, 2, 100)
plt.plot(x, np.log(x), label='Log')
plt.legend()
plt.show()
In [ ]:
# log of a product equals the sum of logs
n = 10
a = np.random.random_sample(n)
b = np.random.random_sample(n)
plt.plot(a, label='a')
plt.plot(b, label='b')
plt.plot(np.log(a), label='log(a)')
plt.plot(np.log(b), label='log(b)')
plt.plot(np.log(a) + np.log(b), label='log(a)+log(b)')
# dashed so it is visible on top of the overlapping log(a)+log(b) curve
plt.plot(np.log(a*b), label='log(a*b)', linestyle='--')
plt.legend()
plt.show()
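A direct numerical check of the identity plotted above (a minimal sketch, reusing `a` and `b` from the previous cell):
In [ ]:
# sketch: verify log(a*b) == log(a) + log(b) elementwise
np.allclose(np.log(a * b), np.log(a) + np.log(b))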