MAP (maximum a posteriori) inference is the task of finding the assignment of the random variables in a Bayesian network that maximizes the joint probability over all possible assignments; equivalently, it minimizes the sum of the negative log-probabilities. Whenever a problem is phrased in terms of "optimizing" something, it lends itself to mathematical programming, and MAP inference is no different: it can be solved as an integer linear program.
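As a quick sanity check of that log transform, here is a minimal sketch (plain NumPy, with made-up probabilities) showing that the assignment with the larger product of probabilities is exactly the one with the smaller sum of negative logs:
In [ ]:
import numpy as np

# two candidate assignments, each a product of (made-up) conditional probabilities
p1 = 0.7 * 0.9   # e.g. A=Yes and B=Yes|A=Yes
p2 = 0.3 * 0.8   # e.g. A=No  and B=Yes|A=No

print(p1 > p2)                                                   # True
print(-np.log(0.7) - np.log(0.9) < -np.log(0.3) - np.log(0.8))   # also True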
Let's see how to do this. We start by loading a very simple Bayesian network with two nodes, "A" and "B", and a single edge "A" -> "B".
In [3]:
from pyBN import *
import numpy as np
In [4]:
bn = read_bn('data/simple.bn')
In [8]:
print(bn.V)
print(bn.E)
print(bn.F)
A quick look at the conditional probability values shows that the joint probability over "A" and "B" is maximized at 0.63, achieved when "A=Yes" and "B=Yes".
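To see where a number like 0.63 comes from, here is a small brute-force check. The CPT values below are assumptions chosen only to reproduce the stated optimum (0.7 * 0.9 = 0.63); they are not read from simple.bn:
In [ ]:
# hypothetical CPT values, for illustration only
p_A = {'No': 0.3, 'Yes': 0.7}
p_B = {('No', 'No'): 0.6, ('No', 'Yes'): 0.4,    # P(B | A=No)
       ('Yes', 'No'): 0.1, ('Yes', 'Yes'): 0.9}  # P(B | A=Yes)

# enumerate all assignments and pick the one with the largest joint probability
joint = {(a, b): p_A[a] * p_B[(a, b)] for a in p_A for b in ('No', 'Yes')}
print(max(joint, key=joint.get), max(joint.values()))   # ('Yes', 'Yes') ~0.63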
Let's see how this problem can be solved, and generalized, using a well-known Python optimization library called "PuLP". PuLP is not a solver itself but a modeling layer that can drive many solvers; unlike commercial options such as CPLEX and Gurobi, it requires no license and can be installed directly from pip.
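If PuLP isn't already installed, a single pip command takes care of it:
In [ ]:
!pip install pulp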
In [9]:
from pulp import *
We start by creating a model:
In [99]:
model = LpProblem("MAP_Inference", LpMinimize)   # avoid spaces in the name; recent PuLP versions warn and convert them
We will now create a binary variable for each entry in each node's conditional probability table (CPT). These are the decision variables: which entry in each CPT to choose.
In [100]:
x1 = LpVariable("A=No",0,1,LpInteger)
x2 = LpVariable("A=Yes",0,1,LpInteger)
x3 = LpVariable("B=No|A=No",0,1,LpInteger)
x4 = LpVariable("B=Yes|A=No",0,1,LpInteger)
x5 = LpVariable("B=No|A=Yes",0,1,LpInteger)
x6 = LpVariable("B=Yes|A=Yes",0,1,LpInteger)
In [101]:
var_list = [x1,x2,x3,x4,x5,x6]
val_list = -np.log(bn.flat_cpt())   # negative log of each CPT entry, in the same order as var_list
As you can see, the number of decision variables equals the number of conditional probability values that must be specified. This number grows quickly as nodes are added, but far more slowly than the full joint distribution table that a non-factorized representation would require.
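For larger networks you would not write the variables out by hand. Here is a sketch of generating them programmatically, assuming the entries come out of bn.flat_cpt() in a fixed order and that some helper can produce a name like "B=Yes|A=Yes" for each entry (the cpt_labels function below is hypothetical, not part of pyBN):
In [ ]:
# cpt_labels() is a hypothetical helper returning one name per flat CPT entry
def make_variables(bn):
    return [LpVariable(label, 0, 1, LpInteger) for label in cpt_labels(bn)]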
First, we add the objective function: minimize the sum of the negative log values of the chosen conditional probability entries.
In [102]:
model += np.dot(var_list, val_list)   # objective: sum of -log probabilities of the selected entries
Now, let's add constraints to make sure the chosen CPT entries are consistent along each edge: if a value is chosen for a parent node, the entry chosen from the child's CPT must condition on that same value. In other words, the decision variable for a parent value must equal the sum of the decision variables for the child CPT entries that condition on it.
In [103]:
model += x1 == x3+x4, 'Edge Constraint 1'
model += x2 == x5+x6, 'Edge Constraint 2'
Moreover, let's add constraints to make sure exactly one value is chosen from each node's CPT:
In [104]:
model += x1 + x2 == 1, 'RV Constraint 1'
model += x3 + x4 + x5 + x6 == 1, 'RV Constraint 2'
Finally, let's solve.
In [105]:
model.solve()
Out[105]:
1
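Before reading off the solution, it's worth confirming that the solver actually reached optimality; PuLP exposes this through model.status and the LpStatus lookup table:
In [ ]:
print(LpStatus[model.status])   # 'Optimal'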
We can now print out the variables and observe the solution:
In [106]:
for v in model.variables():
    print(v.name, v.varValue)
And there it is: the solver shows that the optimal assignment is indeed "A=Yes" and "B=Yes|A=Yes".
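As a final sanity check, the minimized objective is the sum of negative logs of the chosen entries, so exponentiating its negation should recover the MAP probability from earlier:
In [ ]:
print(np.exp(-value(model.objective)))   # ~0.63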