Probability - Intro


Intro

Exploratory notebook on the theory and introductory concepts behind probability. Includes implementations and visualizations of toy examples.

Probability

Probability is the science concerned with the understanding and manipulation of uncertainty.


In [ ]:
import numpy as np
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt, animation

%matplotlib notebook
#%matplotlib inline

sns.set_context("paper")

In [ ]:
# interactive imports
import plotly
import cufflinks as cf
cf.go_offline(connected=True)
plotly.offline.init_notebook_mode(connected=True)

In [ ]:
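class RandomVar:
    """Discrete random variable taking values 0..len(probs)-1 with the given probabilities."""
    def __init__(self, probs):
        self.probs = np.asarray(probs, dtype=float)
        self.values = np.arange(len(self.probs))

    def pick(self, n=1):
        # draw n samples according to self.probs; return a single value when n == 1
        samples = np.random.choice(self.values, size=n, p=self.probs)
        return samples[0] if n == 1 else samples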

In [ ]:
coin = RandomVar([0.5, 0.5])
coin.pick()

In [ ]:
biased_coin = RandomVar([0.1, 0.9])
biased_coin.pick()

In [ ]:
die = RandomVar([1/6]*6)
die.pick()
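As a sanity check on the sampling, the relative frequency of each value over many draws should approach its probability (law of large numbers). The sketch below is self-contained and does not reuse the `RandomVar` class; the helper name `empirical_freqs`, the sample count and the fixed seed are illustrative assumptions.

```python
import numpy as np

def empirical_freqs(probs, n_samples=100_000, seed=0):
    """Draw n_samples values in 0..len(probs)-1 and return their relative frequencies."""
    rng = np.random.default_rng(seed)
    samples = rng.choice(len(probs), size=n_samples, p=probs)
    # minlength keeps a slot for values that never appear in the sample
    return np.bincount(samples, minlength=len(probs)) / n_samples

die_probs = np.full(6, 1/6)
freqs = empirical_freqs(die_probs)
print(freqs)                              # each entry should be close to 1/6
print(np.abs(freqs - die_probs).max())    # the deviation shrinks as n_samples grows
```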

Information Theory

We are interested in quantifying the amount of information carried by events. For example, given a random variable $x$, the information content of a specific value can be seen as the "degree of surprise" of observing $x$ take that value: the less likely the value, the larger the surprise.

$$ h(x) = - \log_2 p(x) $$
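For example, a fair coin flip carries $-\log_2 \frac{1}{2} = 1$ bit, a specific die face $\log_2 6 \approx 2.585$ bits, and an event of probability $1/8$ exactly 3 bits, while a certain event ($p = 1$) carries none. A quick numeric check; `info_bits` is a hypothetical standalone helper, not the `info_content` function defined later in the notebook.

```python
import numpy as np

def info_bits(p):
    """Information content h = -log2(p), in bits, of an event with probability p."""
    return np.log2(1.0 / p)   # same as -log2(p), but avoids printing -0.0 for p = 1

for p in [1.0, 0.5, 1/6, 1/8, 0.01]:
    print(f"p = {p:.4f} -> h = {info_bits(p):.3f} bits")
```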

For a random variable $x$, the corresponding measure, called entropy, is the expected information content over all values of $x$:

$$ H[x] = - \sum_x{ p(x) \log_2 p(x) } $$

In [ ]:
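# information content (in bits) of an event with probability p_x
def info_content(p_x):
    return -np.log2(p_x)

# entropy (in bits) of a discrete probability distribution p_x
def entropy(p_x):
    p_x = np.asarray(p_x, dtype=float)
    p_x = p_x[p_x > 0]  # terms with p(x) = 0 contribute 0 (convention 0 * log 0 = 0)
    return -np.sum(p_x * np.log2(p_x))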

In [ ]:
entropy([1/8]*8)
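Since $H[x]$ is the expectation of $h(x)$ under $p(x)$, it can also be estimated by averaging the surprise of many samples. A minimal sketch, assuming a dyadic example distribution, sample count and seed chosen only for illustration (`exact_entropy` and `mc_entropy` are hypothetical helpers). For the uniform distribution above every sample has surprise exactly 3 bits, so a non-uniform distribution makes the averaging more visible.

```python
import numpy as np

def exact_entropy(probs):
    """H[x] = -sum p log2 p, skipping zero-probability states."""
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]
    return -np.sum(nz * np.log2(nz))

def mc_entropy(probs, n_samples=200_000, seed=0):
    """Monte Carlo estimate of H[x] = E[-log2 p(x)] from samples of x."""
    probs = np.asarray(probs, dtype=float)
    rng = np.random.default_rng(seed)
    samples = rng.choice(len(probs), size=n_samples, p=probs)
    return np.mean(-np.log2(probs[samples]))   # average surprise of the drawn values

p = [0.5, 0.25, 0.125, 0.125]
print(exact_entropy(p), mc_entropy(p))
```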

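For a discrete random variable with a fixed number of states, entropy is maximized by the uniform distribution: with 8 equally likely states it equals $\log_2 8 = 3$ bits, as computed above. For a continuous random variable, the (differential) entropy grows as the distribution spreads out; for a Gaussian, for example, it increases with the variance.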
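A quick numeric illustration of both statements. For the discrete case, 1000 random distributions over 8 states (drawn from a Dirichlet, an arbitrary choice) are compared against the uniform one. For the continuous case, the sketch uses the closed-form differential entropy of a Gaussian, $\frac{1}{2}\log_2(2\pi e \sigma^2)$ bits; the Gaussian is picked only as a convenient example, and the helper names are hypothetical.

```python
import numpy as np

def entropy_bits(probs):
    """Entropy in bits of a discrete distribution (0 * log 0 taken as 0)."""
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]
    return -np.sum(nz * np.log2(nz))

k = 8
rng = np.random.default_rng(0)
random_dists = rng.dirichlet(np.ones(k), size=1000)   # 1000 random distributions over k states
max_random = max(entropy_bits(d) for d in random_dists)
print(f"uniform: {entropy_bits(np.full(k, 1/k)):.3f} bits, best random: {max_random:.3f} bits")

def gaussian_entropy_bits(sigma2):
    """Differential entropy of a Gaussian with variance sigma2, in bits (independent of the mean)."""
    return 0.5 * np.log2(2 * np.pi * np.e * sigma2)

for sigma2 in [0.1, 1.0, 10.0]:
    print(f"sigma^2 = {sigma2:5.1f} -> H = {gaussian_entropy_bits(sigma2):.3f} bits")
```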


In [ ]:
# log function
x = np.linspace(0.00001, 2, 100)
plt.plot(x, np.log(x), label='Log')
plt.legend()
plt.show()
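The plot uses the natural log (`np.log`), while the information measures above use base 2. The base only changes a constant factor, since $\log_2 x = \ln x / \ln 2$: base 2 measures information in bits, base $e$ in nats. A short check of the change of base:

```python
import numpy as np

x = np.linspace(0.01, 2, 100)
# change of base: log2(x) = ln(x) / ln(2)
print(np.allclose(np.log2(x), np.log(x) / np.log(2)))
# one bit expressed in nats
print(np.log(2))
```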

In [ ]:
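# log of a product equals the sum of the logs: log(a*b) = log(a) + log(b)
n = 10
a = np.random.random_sample(n)
b = np.random.random_sample(n)

plt.figure()  # start a new figure instead of drawing on the previous one
plt.plot(a, label='a')
plt.plot(b, label='b')
plt.plot(np.log(a), label='log(a)')
plt.plot(np.log(b), label='log(b)')
# the two curves below coincide; dashed/dotted styles keep both visible
plt.plot(np.log(a) + np.log(b), '--', label='log(a)+log(b)')
plt.plot(np.log(a * b), ':', label='log(a*b)')
plt.legend()
plt.show()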
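The plot shows the identity visually; `np.allclose` confirms it numerically. The same identity is why information content adds up for independent events: if $p(x, y) = p(x)\,p(y)$ then $h(x, y) = h(x) + h(y)$. A sketch with a fair coin and a fair die, assumed independent:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10)
b = rng.random(10)
# log of a product equals the sum of the logs
print(np.allclose(np.log(a * b), np.log(a) + np.log(b)))

# independent events: p(x, y) = p(x) p(y), so their surprises add
p_coin, p_die = 1/2, 1/6
h_joint = -np.log2(p_coin * p_die)
h_sum = -np.log2(p_coin) - np.log2(p_die)
print(h_joint, h_sum)   # both equal log2(12)
```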

In [ ]: