Author : Ronojoy Adhikari
Email : rjoy@imsc.res.in | Web : www.imsc.res.in/~rjoy
Github : www.github.com/ronojoy | Twitter: @phyrjoy
In [ ]:
from lea import *
In [ ]:
# the canonical random variable : a fair coin
faircoin = Lea.fromVals('Head', 'Tail')
In [ ]:
# toss the coin a few times
faircoin.random(10)
In [ ]:
# Amitabh Bachchan's coin from Sholay
sholaycoin = Lea.fromVals('Head', 'Head')
# Amitabh always wins (and, heroically, sacrifices himself for Dharmendra!)
sholaycoin.random(10)
In [ ]:
# more reasonably, a biased coin
biasedcoin = Lea.fromValFreqs(('Head', 1), ('Tail', 2))
# toss it a few times
biasedcoin.random(10)
In [28]:
# random variables with more states : a fair die
die = Lea.fromVals(1, 2, 3, 4, 5, 6)
# throw the die a few times
die.random(20)
In [29]:
# Lea does standard statistics
print(die.mean)     # expectation value
print(die.mode)     # most probable value(s)
print(die.var)      # variance
print(die.entropy)  # entropy of the distribution
Summary : Random variables are objects. N samples are drawn from the random variable y using y.random(N).
Standard statistical measures of distributions are provided. Nothing extraordinary (yet!).
Exercise : Write Python code that produces the same output as the following Lea code
Lea.fromVals('rain', 'sun').random(20)
How many lines do you need in Python ?
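For comparison, one possible plain-Python sketch, using only the standard library's random.choice:
In [ ]:
# a plain-Python equivalent of the Lea one-liner above
import random
print([random.choice(['rain', 'sun']) for _ in range(20)])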
In [30]:
# Let's create a pair of dice
die1 = die.clone()
die2 = die.clone()
In [ ]:
# The throw of dice
dice = die1 + die2
In [ ]:
dice
In [ ]:
dice.random(10)
In [ ]:
dice.mean
In [ ]:
dice.mode
In [ ]:
print(dice.histo())
Summary
Random variables are abstract objects. Methods are available for operating on them algebraically. Probability distributions, methods for drawing random samples, and statistical measures are transparently propagated through these operations.
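As a quick sanity check of this propagation, the mean of the sum of two independent dice should equal the sum of their means:
In [ ]:
# statistics propagate through the algebra of random variables
print(dice.mean)              # 7.0
print(die1.mean + die2.mean)  # 7.0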
"You just threw two dice. Can you guess the result ?"
"Here's a tip : the sum is less than 6"
In [ ]:
## We can create a new distribution, conditioned on our state of knowledge : P(sum | sum <= 6)
conditionalDice = dice.given(dice<=6)
In [ ]:
## What is our best guess for the result of the throw ?
conditionalDice.mode
In [ ]:
## Conditioning can be done in many ways : suppose we know that the first die came up 3.
dice.given(die1 == 3)
In [ ]:
## Conditioning can be done in still more ways : suppose we know that **either** of the two dice came up 3
dice.given((die1 == 3) | (die2 == 3))
Summary
Conditioning, which is the first step towards inference, is done automatically. A wide variety of conditions can be used. P(A | B) translates to a.given(b).
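To make the a.given(b) translation concrete, here is a small sketch: the probability that the first die shows 3, given that the sum is 6.
In [ ]:
# P(die1 = 3 | die1 + die2 = 6) : a boolean conditioned on another condition
print((die1 == 3).given(dice == 6))  # True with probability 1/5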
In [ ]:
# Species is a random variable with states "common" and "rare", with probabilities determined by the population. Since
# there are only two states, the species states are, equivalently, "rare" and "not rare". Species can be a Boolean!
rare = Lea.boolProb(1,1000)
In [ ]:
# Similarly, the pattern is either "present" or "not present". It too is a Boolean, but its probability distribution
# is conditioned on "rare" or "not rare"
patternIfRare = Lea.boolProb(98, 100)
patternIfNotRare = Lea.boolProb(5, 100)
In [ ]:
# Now, let's build the conditional probability table for P(pattern | species)
pattern = Lea.buildCPT((rare, patternIfRare), (~rare, patternIfNotRare))
In [ ]:
# Sanity check : do we get what we put in ?
pattern.given(rare)
In [ ]:
# Finally, our moment of truth : Bayesian inference - what is P(rare | pattern) ?
rare.given(pattern)
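To see why the posterior is so small, here is a hand computation of Bayes' theorem for this example, in plain Python:
In [ ]:
# Bayes' theorem by hand : P(rare | pattern) = P(pattern | rare) P(rare) / P(pattern)
p_rare = 1 / 1000.0
p_pattern_given_rare = 98 / 100.0
p_pattern_given_not_rare = 5 / 100.0
# total probability of observing the pattern
p_pattern = p_pattern_given_rare * p_rare + p_pattern_given_not_rare * (1 - p_rare)
print(p_pattern_given_rare * p_rare / p_pattern)  # ~0.0192, matching rare.given(pattern)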
In [ ]:
# And now, to show off : what is the probability of being rare and having a pattern ?
rare & pattern
In [ ]:
# All possible joint outcomes of (species, pattern)
Lea.cprod(rare, pattern)
Summary : Lea contains the necessary features of a probabilistic programming language : random variables as objects, an algebra for combining them, conditioning, and Bayesian inference.
Though Lea is currently limited to discrete random variables, it is an excellent tool with which to learn the paradigm of probabilistic programming. We will now move on to a more sophisticated tool for probabilistic graphical models called Pomegranate.