Author : Ronojoy Adhikari
Email : rjoy@imsc.res.in | Web : www.imsc.res.in/~rjoy
Github : www.github.com/ronojoy | Twitter: @phyrjoy
In [ ]:
from lea import *
In [ ]:
# the canonical random variable : a fair coin
faircoin = Lea.fromVals('Head', 'Tail')
In [ ]:
# toss the coin a few times
faircoin.random(10)
In [ ]:
# Amitabh Bachchan's coin from Sholay
sholaycoin = Lea.fromVals('Head', 'Head')
# Amitabh always wins (and, heroically, sacrifices himself for Dharmendra!)
sholaycoin.random(10)
In [ ]:
# more reasonably, a biased coin
biasedcoin = Lea.fromValFreqs(('Head', 1), ('Tail', 2))
# toss it a few times
biasedcoin.random(10)
In [28]:
# random variables with more states : a fair die
die = Lea.fromVals(1, 2, 3, 4, 5, 6)
# throw the die a few times
die.random(20)
In [29]:
# Lea does standard statistics
print(die.mean)     # expectation value
print(die.mode)     # most probable value(s)
print(die.var)      # variance
print(die.entropy)  # entropy of the distribution
Summary : Random variables are objects. N samples are drawn from the random variable y using y.random(N).
Standard statistical measures of distributions are provided. Nothing extraordinary (yet!).
Exercise : Write Python code that produces the same output as the following Lea code
Lea.fromVals('rain', 'sun').random(20)
How many lines do you need in Python ?
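For comparison, one possible plain-Python sketch, using only the standard library's random.choice:
In [ ]:
# a plain-Python equivalent of the Lea one-liner above
import random
print([random.choice(['rain', 'sun']) for _ in range(20)])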
In [30]:
# Let's create a pair of dice
die1 = die.clone()
die2 = die.clone()
In [ ]:
# The throw of dice
dice = die1 + die2
In [ ]:
dice
In [ ]:
dice.random(10)
In [ ]:
dice.mean
In [ ]:
dice.mode
In [ ]:
print(dice.histo())
Summary
Random variables are abstract objects. Methods are available for operating on them algebraically. Probability distributions, methods for drawing random samples, and statistical measures are transparently propagated through these operations.
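As a quick sanity check of this propagation, the mean of the sum of two independent dice should equal the sum of their means:
In [ ]:
# statistics propagate through the algebra of random variables
print(dice.mean)              # 7.0
print(die1.mean + die2.mean)  # 7.0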
"You just threw two dice. Can you guess the result ?"
"Here's a tip : the sum is less than 6"
In [ ]:
## We can create a new distribution, conditioned on our state of knowledge : P(sum | sum <= 6)
conditionalDice = dice.given(dice<=6)
In [ ]:
## What is our best guess for the result of the throw ?
conditionalDice.mode
In [ ]:
## Conditioning can be done in many ways : suppose we know that the first die came up 3.
dice.given(die1 == 3)
In [ ]:
## Conditioning can be done in still more ways : suppose we know that **either** of the two dice came up 3
dice.given((die1 == 3) | (die2 == 3))
Summary
Conditioning, which is the first step towards inference, is done automatically. A wide variety of conditions can be used. P(A | B) translates to a.given(b).
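To make the a.given(b) translation concrete, here is a small sketch: the probability that the first die shows 3, given that the sum is 6.
In [ ]:
# P(die1 = 3 | die1 + die2 = 6) : a boolean conditioned on another condition
print((die1 == 3).given(dice == 6))  # True with probability 1/5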
In [ ]:
# Species is a random variable with states "common" and "rare", with probabilities determined by the population. Since
# there are only two states, the species states are, equivalently, "rare" and "not rare". Species can be a Boolean!
rare = Lea.boolProb(1,1000)
In [ ]:
# Similarly, the pattern is either "present" or "not present". It too is a Boolean, but its probability distribution
# is conditioned on "rare" or "not rare"
patternIfRare = Lea.boolProb(98, 100)
patternIfNotRare = Lea.boolProb(5, 100)
In [ ]:
# Now, let's build the conditional probability table for P(pattern | species)
pattern = Lea.buildCPT((rare, patternIfRare), (~rare, patternIfNotRare))
In [ ]:
# Sanity check : do we get what we put in ?
pattern.given(rare)
In [ ]:
# Finally, our moment of truth : Bayesian inference - what is P(rare | pattern) ?
rare.given(pattern)
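To see why the posterior is so small, here is a hand computation of Bayes' theorem for this example, in plain Python:
In [ ]:
# Bayes' theorem by hand : P(rare | pattern) = P(pattern | rare) P(rare) / P(pattern)
p_rare = 1 / 1000.0
p_pattern_given_rare = 98 / 100.0
p_pattern_given_not_rare = 5 / 100.0
# total probability of observing the pattern
p_pattern = p_pattern_given_rare * p_rare + p_pattern_given_not_rare * (1 - p_rare)
print(p_pattern_given_rare * p_rare / p_pattern)  # ~0.0192, matching rare.given(pattern)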
In [ ]:
# And now, to show off : what is the probability of being rare and having a pattern ?
rare & pattern
In [ ]:
# All possible joint outcomes of (species, pattern)
Lea.cprod(rare, pattern)
Summary : Lea contains the necessary features of a probabilistic programming language : random variables as objects, an algebra for combining them, conditioning, and Bayesian inference.
Though Lea is currently limited to discrete random variables, it is an excellent tool with which to learn the paradigm of probabilistic programming. We will now move on to a more sophisticated tool for probabilistic graphical models called Pomegranate.