Reasoning Under Uncertainty Workshop

PyCon 2015

Part 2 : Probabilistic programming with Lea


Author : Ronojoy Adhikari
Email : rjoy@imsc.res.in | Web : www.imsc.res.in/~rjoy
Github : www.github.com/ronojoy | Twitter: @phyrjoy

2. Introduction to Lea

2.1 : Getting started


In [ ]:
from lea import *

In [ ]:
# the canonical random variable : a fair coin
faircoin = Lea.fromVals('Head', 'Tail')

In [ ]:
# toss the coin a few times
faircoin.random(10)

In [ ]:
# Amitabh Bachchan's coin from Sholay
sholaycoin = Lea.fromVals('Head', 'Head')

# Amitabh always wins (and, heroically, sacrifices himself for Dharmendra!)
sholaycoin.random(10)

In [ ]:
# more reasonably, a biased coin
biasedcoin = Lea.fromValFreqs(('Head', 1), ('Tail', 2))

# toss it a few times
biasedcoin.random(10)

In [ ]:
# random variables with more states : a fair die
die = Lea.fromVals(1, 2, 3, 4, 5, 6)
# throw the die a few times
die.random(20)

In [ ]:
# Lea does standard statistics
# die.mean
# die.mode
# die.var
# die.entropy

Summary : Random variables are objects. N samples are drawn from the random variable y using

y.random(N)

Standard statistical measures of distributions are provided. Nothing extraordinary (yet!).
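
To see what these measures amount to, here is a minimal sketch in plain Python (standard library only, no Lea). The table `die_pmf` and the helper functions are illustrative names invented for this note, not part of Lea's API, and reporting the entropy in bits is an assumption made for this sketch.

```python
from fractions import Fraction
import math

# A fair die as an explicit table {value: probability}.
# Hand-rolled stand-in for illustration; not Lea's internal representation.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def mean(pmf):
    # expectation: sum of value * probability
    return sum(value * p for value, p in pmf.items())

def variance(pmf):
    # expected squared deviation from the mean
    mu = mean(pmf)
    return sum(p * (value - mu) ** 2 for value, p in pmf.items())

def modes(pmf):
    # every value that attains the highest probability
    top = max(pmf.values())
    return [value for value, p in pmf.items() if p == top]

def entropy_bits(pmf):
    # Shannon entropy with base-2 logarithms (assumption: bits)
    return -sum(float(p) * math.log2(float(p)) for p in pmf.values() if p > 0)

die_mean = mean(die_pmf)
die_var = variance(die_pmf)
die_modes = modes(die_pmf)
die_entropy = entropy_bits(die_pmf)
print(die_mean, die_var, die_modes, die_entropy)
```

For a fair die every face is a mode, which is why `modes` returns a list rather than a single value.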

Exercise : Write plain Python code that produces the same kind of output as the following Lea code

Lea.fromVals('rain', 'sun').random(20)

How many lines do you need in Python ?

2.2 Coolness : grokking random variables


In [ ]:
# Let's create a pair of dice
die1 = die.clone()
die2 = die.clone()

In [ ]:
# The throw of dice
dice = die1 + die2

In [ ]:
# This really groks random variables! 
# The probability distribution of the sum is transparently calculated
dice

In [ ]:
dice.random(10)

In [ ]:
dice.mean

In [ ]:
dice.mode

In [ ]:
print dice.histo()

Summary

Random variables are abstract objects. Methods are available for operating on them algebraically. The probability distributions, the methods for drawing random samples, and the statistical metrics are all transparently propagated.
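
What "transparently calculated" involves can be sketched by hand. The snippet below uses only the standard library and assumes the two dice are independent, which is what the two clone() calls are there to express. The names `sum_pmf`, `sum_mean` and `sum_mode` are made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Enumerate all 36 equally likely (die1, die2) outcomes and
# accumulate the probability of each possible sum.
sum_pmf = defaultdict(Fraction)
for a, b in product(faces, repeat=2):
    sum_pmf[a + b] += Fraction(1, 36)
sum_pmf = dict(sum_pmf)

# statistics of the sum follow directly from the table
sum_mean = sum(total * p for total, p in sum_pmf.items())
sum_mode = max(sum_pmf, key=sum_pmf.get)

for total in sorted(sum_pmf):
    print(total, sum_pmf[total])
```

With Lea, all of this bookkeeping is hidden behind the single expression die1 + die2.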

2.3 More coolness : conditioning

"You just threw two dice. Can you guess the result ?"

"Here's a tip : the sum is at most 6"


In [ ]:
## We can create a new distribution, conditioned on our state of knowledge : P(sum | sum <= 6)
conditionalDice = dice.given(dice<=6)

In [ ]:
## What is our best guess for the result of the throw ? 
conditionalDice.mode

In [ ]:
## Conditioning can be done in many ways : suppose we know that the first die came up 3. 
dice.given(die1 == 3)

In [ ]:
## Conditioning can be done in still more ways : suppose we know that **either** of the two dice came up 3
dice.given((die1 == 3) | (die2 == 3))

Summary

Conditioning, which is the first step towards inference, is done automatically. A wide variety of conditions can be used. P(A | B) translates to a.given(b).
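
Under the hood, conditioning on a finite sample space is "filter, then renormalise". The following is a minimal plain-Python sketch of that idea, not Lea's actual implementation. The helper `sum_given` and the variable names are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# joint distribution of two independent fair dice
outcomes = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

def sum_given(condition):
    """Distribution of die1 + die2 given that condition(die1, die2) holds."""
    # keep only the outcomes compatible with what we know
    kept = {o: p for o, p in outcomes.items() if condition(*o)}
    # probability of the evidence, used to renormalise
    evidence = sum(kept.values())
    pmf = {}
    for (a, b), p in kept.items():
        pmf[a + b] = pmf.get(a + b, 0) + p / evidence
    return pmf

at_most_six = sum_given(lambda a, b: a + b <= 6)            # P(sum | sum <= 6)
first_is_three = sum_given(lambda a, b: a == 3)             # P(sum | die1 == 3)
either_is_three = sum_given(lambda a, b: a == 3 or b == 3)  # P(sum | die1 == 3 or die2 == 3)

# the best guess is the most probable sum under the first condition
best_guess = max(at_most_six, key=at_most_six.get)
print(best_guess, at_most_six[best_guess])
```

Each lambda plays the role of the boolean expression passed to given() in the cells above.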

Diagnostic reasoning under uncertainty

Problem 1 : An entomologist spots what might be a rare subspecies of beetle, due to the pattern on its back. In the rare subspecies, 98% have the pattern. In the common subspecies, 5% have the pattern. The rare subspecies accounts for only 0.1% of the population. How likely is it that the beetle belongs to the rare subspecies, given that it has the pattern ?

Problem 2 : 1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?


In [ ]:
# 'the rare subspecies accounts for only 0.1% of the population'
species = Lea.fromValFreqs(('common', 999), ('rare', 1))

# let's check the beetle distribution
print species.asPct()

In [ ]:
# 'in the rare subspecies, 98% have the pattern'
pattern_on_rare = Lea.boolProb(98, 100)
print pattern_on_rare.asPct()

In [ ]:
# 'in the common subspecies, 5% have the pattern'
pattern_on_common = Lea.boolProb(5, 100)
print pattern_on_common.asPct()

In [ ]:
# pattern is a random variable that is conditioned on species
pattern = Lea.buildCPT((species=='common', pattern_on_common), (species=='rare', pattern_on_rare))

In [ ]:
# the power of conditioning
print 'Probability of pattern in the common species is \n', pattern.given(species=='common').asPct()
print 'Probability of pattern in the rare species is \n', pattern.given(species=='rare').asPct()

In [ ]:
# what is the probability of each species, given that the pattern is present ? 
# one beautiful line of code gives you the answer
species.given(pattern)

In [ ]:
# what is the probability of each species, given that the pattern is absent ? 
# another beautiful line of code gives you the answer
species.given(~pattern)

In [ ]:
# and now, to show off : the joint distribution over all (species, pattern) combinations
Lea.cprod(species, pattern)
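
As a cross-check that does not depend on Lea, both problems can be worked directly from Bayes' rule, P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]. The sketch below uses only the standard library. The function `posterior` and its argument names are hypothetical helpers for this note, and the numbers are taken verbatim from the two problem statements.

```python
from fractions import Fraction

def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | evidence) for a two-way hypothesis, via Bayes' rule."""
    joint_true = prior * p_evidence_if_true          # P(E and H)
    joint_false = (1 - prior) * p_evidence_if_false  # P(E and not H)
    return joint_true / (joint_true + joint_false)

# Problem 1 : prior 0.1% rare, 98% of rare and 5% of common have the pattern
rare_given_pattern = posterior(Fraction(1, 1000), Fraction(98, 100), Fraction(5, 100))

# Problem 2 : prior 1% cancer, 80% true positives, 9.6% false positives
cancer_given_positive = posterior(Fraction(1, 100), Fraction(80, 100), Fraction(96, 1000))

print(rare_given_pattern, float(rare_given_pattern))
print(cancer_given_positive, float(cancer_given_positive))
```

The posteriors come out at 98/5093 (about 1.9%) for the beetle and 25/322 (about 7.8%) for the mammography. In both cases the low prior dominates, so the answer is far smaller than the 98% and 80% figures might suggest.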

Summary: Lea contains the necessary features of a probabilistic programming language:

  • random variables are represented as probability distribution objects
  • algebraic operations can be performed on the random variables
  • random variables can be conditioned on other random variables
  • inference is automatic

Though Lea is currently limited to discrete random variables, it is an excellent program for learning the paradigm of probabilistic programming.

We will now move on to more complex models and reasoning with them, using Lea and Pomegranate.