Introduction to Lea


In [1]:
from lea import *

In [2]:
# mandatory die example - initialize a die object
die = Lea.fromVals(1, 2, 3, 4, 5, 6)

In [3]:
# throw the die a few times
die.random(20)


Out[3]:
(3, 4, 5, 4, 3, 3, 6, 4, 5, 3, 2, 2, 6, 1, 1, 5, 1, 6, 3, 3)

In [4]:
# mandatory coin toss example - states can be strings!
coin = Lea.fromVals('Head', 'Tail')

In [5]:
# toss the coin a few times
coin.random(10)


Out[5]:
('Tail',
 'Tail',
 'Head',
 'Head',
 'Head',
 'Head',
 'Tail',
 'Head',
 'Head',
 'Head')

In [6]:
# how about a Boolean variable - only True or False ? 
rain = Lea.boolProb(5, 100)

In [7]:
# how often does it rain in Chennai ? 
rain.random(10)


Out[7]:
(True, False, False, False, False, False, False, False, False, False)

In [8]:
# How about standard statistics ? 
die.mean, die.mode, die.var, die.entropy


Out[8]:
(3.5, (1, 2, 3, 4, 5, 6), 2.9166666666666665, 2.584962500721156)

Summary

Random variables are abstract objects. Random samples are drawn transparently with variable.random(times). The standard statistical metrics of the probability distribution are also part of the object.
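To make the abstraction concrete, here is a minimal plain-Python sketch of the same idea (this is not Lea's implementation; all names below are made up for illustration): a distribution as a value-to-weight mapping, with sampling and summary statistics on top.

```python
# Plain-Python sketch of a discrete random variable (illustrative only,
# not Lea internals): a value-to-weight dict plus helper functions.
import math
import random

def sample(dist, times, rng=random):
    # draw `times` values with probability proportional to the weights
    values = list(dist)
    weights = [dist[v] for v in values]
    return tuple(rng.choices(values, weights=weights, k=times))

def mean(dist):
    total = sum(dist.values())
    return sum(v * w for v, w in dist.items()) / total

def entropy(dist):
    # Shannon entropy in bits; for a fair die this is log2(6) ~ 2.585,
    # matching die.entropy above
    total = sum(dist.values())
    return -sum((w / total) * math.log2(w / total) for w in dist.values())

die = {v: 1 for v in range(1, 7)}   # a fair six-sided die
```

Here mean(die) gives 3.5 and sample(die, 20) gives a 20-tuple of faces, mirroring the Lea calls above.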

Coolness - Part 1


In [9]:
# Let's create two dice
die1 = die.clone()
die2 = die.clone()

In [10]:
# Two throws of the die 
dice = die1 + die2

In [11]:
dice


Out[11]:
 2 : 1/36
 3 : 2/36
 4 : 3/36
 5 : 4/36
 6 : 5/36
 7 : 6/36
 8 : 5/36
 9 : 4/36
10 : 3/36
11 : 2/36
12 : 1/36

In [12]:
dice.random(10)


Out[12]:
(4, 5, 9, 7, 9, 8, 7, 7, 9, 6)

In [13]:
dice.mean


Out[13]:
7.0

In [14]:
dice.mode


Out[14]:
(7,)

In [15]:
print(dice.histo())


 2 :  ---
 3 :  ------
 4 :  --------
 5 :  -----------
 6 :  --------------
 7 :  -----------------
 8 :  --------------
 9 :  -----------
10 :  --------
11 :  ------
12 :  ---

Summary

Random variables are abstract objects, and methods are available for operating on them algebraically. The probability distribution, the methods for drawing random samples, and the statistical metrics are all propagated transparently.
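What die1 + die2 computes can be sketched in plain Python as a hand-rolled convolution of the two distributions (again illustrative only, not Lea's internals):

```python
# Sketch of adding two independent discrete random variables: the
# distribution of the sum is the convolution of the two distributions.
from collections import Counter
from fractions import Fraction

def add(dist_a, dist_b):
    out = Counter()
    for a, wa in dist_a.items():
        for b, wb in dist_b.items():
            out[a + b] += wa * wb   # independence: multiply probabilities
    return dict(out)

die = {v: Fraction(1, 6) for v in range(1, 7)}
dice = add(die, die)
# dice[2] is 1/36 and dice[7] is 6/36, matching the Out[11] table
```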

Coolness - Part 2

"You just threw two dice. Can you guess the result ?"

"Here's a tip : the sum is no more than 6"


In [16]:
## We can create a new distribution, conditioned on our state of knowledge : P(sum | sum <= 6)
conditionalDice = dice.given(dice<=6)

In [17]:
## What is our best guess for the result of the throw ? 
conditionalDice.mode


Out[17]:
(6,)

In [18]:
## Conditioning can be done in many ways : suppose we know that the first die came up 3. 
dice.given(die1 == 3)


Out[18]:
4 : 1/6
5 : 1/6
6 : 1/6
7 : 1/6
8 : 1/6
9 : 1/6

In [19]:
## Conditioning can be done in still more ways : suppose we know that **either** of the two dice came up 3
dice.given((die1 == 3) | (die2 == 3))


Out[19]:
4 : 2/11
5 : 2/11
6 : 1/11
7 : 2/11
8 : 2/11
9 : 2/11

Summary

Conditioning, which is the first step towards inference, is done automatically. A wide variety of conditions can be used. P(A | B) translates to a.given(b).
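Conceptually, conditioning on a joint distribution amounts to filtering the outcomes that satisfy the condition and renormalising. A hand-rolled plain-Python sketch (not Lea; the uniform renormalisation below relies on the base distribution over dice pairs being uniform) that reproduces the tables above:

```python
# Sketch of conditioning: keep the (die1, die2) outcomes satisfying the
# condition, then renormalise. Valid here because all 36 pairs are
# equally likely.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (die1, die2) pairs

def sum_given(condition):
    kept = [o for o in outcomes if condition(o)]
    p = Fraction(1, len(kept))        # uniform over the surviving outcomes
    dist = {}
    for d1, d2 in kept:
        dist[d1 + d2] = dist.get(d1 + d2, 0) + p
    return dist

first_three = sum_given(lambda o: o[0] == 3)              # as in Out[18]
either_three = sum_given(lambda o: o[0] == 3 or o[1] == 3)  # as in Out[19]
# either_three[6] is 1/11 and either_three[4] is 2/11
```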

Inference

An entomologist spots what might be a rare subspecies of beetle, due to the pattern on its back. In the rare subspecies, 98% have the pattern, or P(pattern|species = rare) = 0.98. In the common subspecies, 5% have the pattern, or P(pattern | species = common) = 0.05. The rare subspecies accounts for only 0.1% of the population. Given that the beetle has the pattern, how likely is it to be rare; in other words, what is P(species=rare|pattern) ?


In [20]:
# Species is a random variable with states "common" and "rare", with probabilities determined by the population. Since
# there are only two states, the species states are, equivalently, "rare" and "not rare". Species can be a Boolean!
rare = Lea.boolProb(1,1000)

In [21]:
# Similarly, pattern is either "present" or "not present". It too is a Boolean, but its probability distribution
# is conditioned on "rare" or "not rare"
patternIfrare = Lea.boolProb(98, 100)
patternIfNotrare = Lea.boolProb(5, 100)

In [22]:
# Now, let's build the conditional probability table for P(pattern | species)
pattern = Lea.buildCPT((rare , patternIfrare), ( ~rare , patternIfNotrare))

In [23]:
# Sanity check : do we get what we put in ? 
pattern.given(rare)


Out[23]:
False :  1/50
 True : 49/50

In [24]:
# Finally, our moment of truth : Bayesian inference - what is P(rare | pattern )? 
rare.given(pattern)


Out[24]:
False : 4995/5093
 True :   98/5093

In [25]:
# And now, to show off : what is the probability of being rare and having a pattern ? 
rare & pattern


Out[25]:
False : 49951/50000
 True :    49/50000

In [26]:
# All possible outcomes
Lea.cprod(rare,pattern)


Out[26]:
(False, False) : 94905/100000
 (False, True) :  4995/100000
 (True, False) :     2/100000
  (True, True) :    98/100000

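The fractions above can be cross-checked against a direct application of Bayes' theorem, using exact fractions in plain Python (independent of Lea):

```python
# Bayes' theorem by hand: P(rare | pattern) =
#   P(pattern | rare) * P(rare) / P(pattern)
from fractions import Fraction

p_rare = Fraction(1, 1000)
p_pattern_given_rare = Fraction(98, 100)
p_pattern_given_common = Fraction(5, 100)

# total probability of observing the pattern
p_pattern = (p_pattern_given_rare * p_rare
             + p_pattern_given_common * (1 - p_rare))

p_rare_given_pattern = p_pattern_given_rare * p_rare / p_pattern
# p_rare_given_pattern == 98/5093, matching Out[24]
```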
Summary

Lea contains all the basic ingredients of a probabilistic programming language, and it is an excellent way to learn the paradigms of probabilistic programming. Lea is currently limited to discrete random variables. For continuous random variables, and for use in production applications, a more mature and capable tool like Stan or BayesPy should be used.