Probabilistic Reasoning

This notebook serves as the supporting material for Chapter 14, Probabilistic Reasoning. In this notebook, we will learn how to use the code repository to build network models that reason under uncertainty according to the laws of probability theory. In the previous notebook, we briefly explained what Bayes' Rule is and how it can be utilised in probabilistic inference. This notebook introduces a systematic way to represent such conditional relationships in the form of Bayesian networks. We will also have a look at a variety of approximate inference algorithms and will explore ways in which probability theory can be applied to worlds with objects and relations (worlds represented in first-order logic).

Representing knowledge in an uncertain domain

We saw in the previous notebooks that full joint probability distribution models can answer any question about the domain, but they are computationally expensive because their space complexity grows exponentially with the number of variables. However, we also saw that independence and conditional independence relationships among variables can be of great help in defining a full joint distribution. Owing to these shortcomings of full joint distributions and to the merits of conditional relations between random variables, AI researchers have come up with a clever data structure called the Bayesian network. Bayesian networks can represent essentially any full joint probability distribution and in many cases can do so very concisely.

A Bayesian network is a directed graph in which each node is annotated with quantitative probability information. The full specification is as follows:

  • Each node corresponds to a random variable, which may be discrete or continuous.
  • A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
  • Each node $X_i$ has a conditional probability distribution $P(X_i \mid Parents(X_i ))$ that quantifies the effect of the parents on the node.

The topology of the network—the set of nodes and links—specifies the conditional independence relationships that hold in the domain. The intuitive meaning of an arrow is typically that X has a direct influence on Y, which suggests that causes should be parents of effects. It is usually easy for a domain expert to decide what direct influences exist in the domain—much easier, in fact, than actually specifying the probabilities themselves. Once the topology of the Bayesian network is laid out, we need only specify a conditional probability distribution for each variable, given its parents.

Bayesian networks possess many interesting properties which are explained quite brilliantly in the text. Readers are advised to go through the text to get a feel for Bayesian networks. In this notebook, our main focus will be to utilise Bayesian networks to model various problems.

To work with Bayesian networks, let us first load the aima-core jar.


In [3]:
%classpath add jar ../out/artifacts/aima_core_jar/aima-core.jar


The Toothache-Cavity-Catch structure

Consider the simple world described in the text, consisting of the variables Toothache, Cavity, Catch and Weather. It is easy to see that Weather is independent of the other variables. Furthermore, it can be argued that Toothache and Catch are conditionally independent given Cavity. These relationships can be represented by the Bayesian network structure shown below. Formally, the conditional independence of Toothache and Catch, given Cavity, is indicated by the absence of a link between Toothache and Catch. Intuitively, the network represents the fact that Cavity is a direct cause of Toothache and Catch, whereas no direct causal relationship exists between Toothache and Catch.

APIs from the code repository

Let's have a look at the structure of APIs from the code repository. We will understand the APIs by considering the three defining points of the Bayesian Networks.

Each node corresponds to a random variable, which may be discrete or continuous.

The Node interface represents the node in a Bayesian Network. Given below is a description of the Node interface.

public interface Node {

    /**
     * 
     * @return the Random Variable this Node is for/on.
     */
    RandomVariable getRandomVariable();

    /**
     * 
     * @return true if this Node has no parents.
     * 
     * @see Node#getParents()
     */
    boolean isRoot();

    /**
     * 
     * @return the parent Nodes for this Node.
     */
    Set<Node> getParents();

    /**
     * 
     * @return the children Nodes for this Node.
     */
    Set<Node> getChildren();

    /**
     * Get this Node's Markov Blanket:<br>
     * 'A node is conditionally independent of all other nodes in the network,
     * given its parents, children, and children's parents - that is, given its
     * <b>MARKOV BLANKET</b> (AIMA3e pg. 517).
     * 
     * @return this Node's Markov Blanket.
     */
    Set<Node> getMarkovBlanket();

    /**
     * 
     * @return the Conditional Probability Distribution associated with this
     *         Node.
     */
    ConditionalProbabilityDistribution getCPD();
}

This interface can be implemented to obtain customised nodes. A default implementation is provided in the repository via the FullCPTNode class.
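To make the interface concrete, here is a minimal sketch that builds the two-node fragment Cavity → Toothache using the FullCPTNode implementation and inspects it through the Node methods listed above. The RandVar, BooleanDomain and FullCPTNode classes are the same ones used in the cells that follow.

package aima.notebooks.probabilisticreasoning;

import aima.core.probability.bayes.impl.*;
import aima.core.probability.domain.*;
import aima.core.probability.util.*;

// A tiny two-node network: Cavity -> Toothache
RandVar cavityRv = new RandVar("Cavity", new BooleanDomain());
RandVar toothacheRv = new RandVar("Toothache", new BooleanDomain());

FullCPTNode cavity = new FullCPTNode(cavityRv, new double[] { 0.2, 0.8 });
FullCPTNode toothache = new FullCPTNode(toothacheRv,
        new double[] { 0.6, 0.4, 0.1, 0.9 }, cavity);

// Inspect the structure through the Node interface
System.out.println("Cavity is a root node: " + cavity.isRoot());
System.out.println("Parents of Toothache: " + toothache.getParents());
System.out.println("Children of Cavity: " + cavity.getChildren());
System.out.println("Markov blanket of Cavity: " + cavity.getMarkovBlanket());

Note that passing cavity as a parent in the FullCPTNode constructor is enough to wire up both directions of the link: the parent's getChildren() set is updated automatically.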

The second specification of a Bayesian network describes the structure of the network.

A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y , X is said to be a parent of Y. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).

These links are stored as a set and can be obtained by the getParents() method of the Node interface.

The third specification states the data contained in each node.

Each node $X_i$ has a conditional probability distribution $P(X_i \mid Parents(X_i ))$ that quantifies the effect of the parents on the node.

This information is stored in the form of a ConditionalProbabilityDistribution inside each node. After we have defined the nodes for our network, we can use the BayesNet class from the repository to construct a Bayesian network. Let's work with the Toothache-Cavity-Catch example.


In [4]:
package aima.notebooks.probabilisticreasoning;

import aima.core.probability.bayes.*;
import aima.core.probability.*;
import aima.core.probability.bayes.impl.*;
import aima.core.probability.util.*;
import aima.core.probability.domain.*;

// First let us define the Random Variables which make up our Bayes Network
RandVar cavityRv = new RandVar("Cavity", new BooleanDomain());
RandVar toothacheRv = new RandVar("Toothache", new BooleanDomain());
RandVar catchRv = new RandVar("Catch", new BooleanDomain());

// Now we will define the nodes that make up the network shown above.
// The order of the doubles in a CPT is as follows:
// if A, B and C are the three random variables, then the possibilities of C are exhausted first, then B, and then A.
// For example, if A, B and C are Boolean random variables, then the doubles are listed in the following order:
//    A    B    C
//    1    1    1
//    1    1    0
//    1    0    1
//    1    0    0
//    0    1    1
//    0    1    0
//    0    0    1
//    0    0    0
FullCPTNode cavity = new FullCPTNode(cavityRv, new double[] {
                                    // True			
                                    0.2,
                                    // False
                                    0.8 });
FullCPTNode toothache = new FullCPTNode(toothacheRv,
				new double[] {
						// C=true, T=true
						0.6,
						// C=true, T=false
						0.4,
						// C=false, T=true
						0.1,
						// C=false, T=false
						0.9

				}, cavity);

FiniteNode catchNode = new FullCPTNode(catchRv, new double[] {
				// C=true, Catch=true
				0.9,
				// C=true, Catch=false
				0.1,
				// C=false, Catch=true
				0.2,
				// C=false, Catch=false
				0.8 }, cavity);

// Now let us consider the Bayesian network
// We need to specify only the root nodes of the network
BayesNet cavityBayesNet = new BayesNet(cavity);

// Now let's extract whatever we can from the BayesNet
System.out.println("Random Variables = "+ cavityBayesNet.getVariablesInTopologicalOrder().toString());
System.out.println("The cavity Node: "+ cavityBayesNet.getNode(cavityRv).toString());
return cavityBayesNet;


Random Variables = [Cavity, Toothache, Catch]
The cavity Node: Cavity
Out[4]:
aima.core.probability.bayes.impl.BayesNet@3e549709

The above block describes how to construct a Bayesian network. However, a Bayesian network by itself is of little use: we need inference algorithms that can extract information from the network. Before introducing various inference algorithms, we must note that a Bayesian network is capable of describing a full joint distribution by itself. This can be shown by considering the fact that a generic entry in the joint distribution is the probability of a conjunction of particular assignments to each variable, such as $P (X_1 = x_1 \land . . . \land X_n = x_n )$. We use the notation $P (x_1 , . . . , x_n )$ as an abbreviation for this. Now, in terms of conditional probability, using the product rule,

$$P (x_1 , . . . , x_n ) = P (x_n | x_{n−1} , . . . , x_1 )P (x_{n−1} , . . . , x_1 )$$

Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product: $$P (x_1 , . . . , x_n ) = P (x_n | x_{n−1} , . . . , x_1 )P (x_{n−1} | x_{n−2} , . . . , x_1 ) · · · P (x_2 | x_1 )P (x_1 )$$ $$ = \prod_{i = 1}^{n}P(x_i|x_{i-1},...,x_1)$$

This identity is called the chain rule. It holds for any set of random variables. We can conclude that the specification of the joint distribution is equivalent to the general assertion that, for every variable $X_i$ in the network

$$\textbf{P}(X_i | X_{i−1} , . . . , X_1 ) = \textbf{P}(X_i | Parents(X_i ))$$

provided that $Parents(X_i) \subseteq \{X_{i-1} , . . . , X_1 \}$. This last condition is satisfied by numbering the nodes in a way that is consistent with the partial order implicit in the graph structure.
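To see this factorisation in action, the sketch below compares the joint probability $P(cavity \land toothache \land catch)$ returned by the model with the product of the corresponding CPT entries from the network above. It relies on the BayesNetExampleFactory and FiniteBayesModel classes that are introduced in the next paragraph.

package aima.notebooks.probabilisticreasoning;

import aima.core.probability.bayes.*;
import aima.core.probability.bayes.model.*;
import aima.core.probability.example.*;
import aima.core.probability.proposition.*;

BayesianNetwork net = BayesNetExampleFactory.constructToothacheCavityCatchNetwork();
FiniteBayesModel jointModel = new FiniteBayesModel(net);

AssignmentProposition cavityTrue = new AssignmentProposition(ExampleRV.CAVITY_RV, true);
AssignmentProposition toothacheTrue = new AssignmentProposition(ExampleRV.TOOTHACHE_RV, true);
AssignmentProposition catchTrue = new AssignmentProposition(ExampleRV.CATCH_RV, true);

// The joint probability P(cavity, toothache, catch) as computed by the model...
System.out.println("From the model: " + jointModel.prior(cavityTrue, toothacheTrue, catchTrue));
// ...equals the chain-rule factorisation over the network's CPTs:
// P(cavity) * P(toothache | cavity) * P(catch | cavity) = 0.2 * 0.6 * 0.9
System.out.println("From the factorisation: " + (0.2 * 0.6 * 0.9));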

Since we can represent the full joint distribution of a particular system using a Bayesian network, we can treat Bayesian networks as probability models and use them for inference in the same way as we used the FullJointProbabilityModel. Such a model can be created using the FiniteBayesModel class from the repository. This class asks for a Bayesian network and an inference procedure. If no inference procedure is given, the EnumerationAsk algorithm is used as the default. From now on, we will use the BayesNetExampleFactory to create the Toothache-Cavity-Catch example. Let's have a look at how we can manipulate the Bayes model to perform inference using uncertain knowledge.


In [5]:
package aima.notebooks.probabilisticreasoning;

import aima.core.probability.example.*;
import aima.core.probability.bayes.*;
import aima.core.probability.bayes.impl.*;
import aima.core.probability.bayes.model.*;
import aima.core.probability.proposition.*;

// Load the network from the network factory.
BayesianNetwork cavityNet = BayesNetExampleFactory.constructToothacheCavityCatchNetwork();
// Construct the BayesModel from the BayesNet
// We have not passed any inference procedure. Hence, the default inference procedure will be used.
FiniteBayesModel model = new FiniteBayesModel(cavityNet);
// Now we are ready to answer all sorts of questions.

// Let's define a few assignments
AssignmentProposition toothache = new AssignmentProposition(ExampleRV.TOOTHACHE_RV,true);
AssignmentProposition cavity = new AssignmentProposition(ExampleRV.CAVITY_RV,true);
// Now let's have a look at what we can do with the model.
// To print the random variables in the model
System.out.println("The random variables in the model = " + model.getRepresentation());
// We can calculate the prior probabilities of a variety of combinations of random variables
System.out.println("The prior probability of having a toothache = "+ model.prior(toothache));
System.out.println("The prior probability of having a cavity = "+ model.prior(cavity));
System.out.println("The probability of having a cavity and toothache simultaneously is = "+ model.prior(toothache, cavity));
// We can also calculate a variety of posterior probabilities from the model as follows
System.out.println("The probability of having a toothache given that the person has a cavity(causal direction) is = "+ 
                  model.posterior(toothache,cavity));
System.out.println("The probability of having a cavity given that the person is experiencing toothache(diagnostic direction) is = "
                  + model.posterior(cavity,toothache));


The random variables in the model = [Cavity, Toothache, Catch]
The prior probability of having a toothache = 0.2
The prior probability of having a cavity = 0.2
The probability of having a cavity and toothache simultaneously is = 0.12
The probability of having a toothache given that the person has a cavity(causal direction) is = 0.6
The probability of having a cavity given that the person is experiencing toothache(diagnostic direction) is = 0.6
Out[5]:
null
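As a quick sanity check, we can verify numerically that the network encodes the conditional independence of Toothache and Catch given Cavity claimed earlier: adding Catch as evidence should not change the posterior of Toothache once Cavity is known. The minimal sketch below reuses the same factory network and model API.

package aima.notebooks.probabilisticreasoning;

import aima.core.probability.bayes.*;
import aima.core.probability.bayes.model.*;
import aima.core.probability.example.*;
import aima.core.probability.proposition.*;

BayesianNetwork net = BayesNetExampleFactory.constructToothacheCavityCatchNetwork();
FiniteBayesModel model = new FiniteBayesModel(net);

AssignmentProposition toothache = new AssignmentProposition(ExampleRV.TOOTHACHE_RV, true);
AssignmentProposition cavity = new AssignmentProposition(ExampleRV.CAVITY_RV, true);
AssignmentProposition catchProp = new AssignmentProposition(ExampleRV.CATCH_RV, true);

// If Toothache and Catch are conditionally independent given Cavity,
// the two posteriors below should be identical (both 0.6 for this CPT).
System.out.println("P(toothache | cavity)        = " + model.posterior(toothache, cavity));
System.out.println("P(toothache | cavity, catch) = " + model.posterior(toothache, cavity, catchProp));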

Exact Inference in Bayesian Networks

The basic task for any probabilistic inference system is to compute the posterior probability distribution for a set of query variables, given some observed event—that is, some assignment of values to a set of evidence variables. We will use the notation from the previous notebook: X denotes the query variable; E denotes the set of evidence variables $E_1 , . . . , E_m$, and e is a particular observed event; Y will denote the nonevidence, nonquery variables $Y_1 , . . . , Y_l$ (called the hidden variables). Thus, the complete set of variables is $\textbf{X} = \{X\} \cup \textbf{E} \cup \textbf{Y}$. A typical query asks for the posterior probability distribution $P(X | \textbf{e})$.

Inference by enumeration

We proved in the previous notebook that any conditional probability can be computed by summing terms from the full joint distribution. Mathematically: $$ \textbf{P}(X|\textbf{e}) = \alpha \textbf{P}(X,\textbf{e}) = \alpha \sum_{\textbf{y}}\textbf{P}(X,\textbf{e},\textbf{y})$$

Also, we know that a Bayesian network gives a complete representation of the full joint distribution. Hence, the above sum is calculable from a Bayesian network. Now, let's have a look at the entire process for the burglary network from the text. Let b, j and m be particular values of the random variables B (Burglary), J (JohnCalls) and M (MaryCalls), and let E (Earthquake) and A (Alarm) be the hidden variables. Then, by using the sum and product rules of probability, we get: $$P(b|j,m) = \alpha P(b) \sum_{e}P(e)\sum_{a}P(a|b,e)P(j|a)P(m|a)$$

Now, the above calculation can be represented in the form of the calculation tree shown below. The order of evaluation suggests a depth-first traversal of this tree; in fact, the enumeration-ask algorithm employs a depth-first approach to solve the inference problem.


In [6]:
%%python
from notebookUtils import *
pseudocode('Enumeration Ask')


Out[6]:

AIMA3e

function ENUMERATION-ASK(X, e, bn) returns a distribution over X
 inputs: X, the query variable
     e, observed values for variables E
     bn, a Bayes net with variables {X} ∪ E ∪ Y /* Y = hidden variables */

 Q(X) ← a distribution over X, initially empty
 for each value xi of X do
    Q(xi) ← ENUMERATE-ALL(bn.VARS, e_xi)
      where e_xi is e extended with X = xi
 return NORMALIZE(Q(X))


function ENUMERATE-ALL(vars, e) returns a real number
 if EMPTY?(vars) then return 1.0
 Y ← FIRST(vars)
 if Y has value y in e
    then return P(y | parents(Y)) × ENUMERATE-ALL(REST(vars), e)
    else return Σ_y P(y | parents(Y)) × ENUMERATE-ALL(REST(vars), e_y)
      where e_y is e extended with Y = y


Figure ?? The enumeration algorithm for answering queries on Bayesian networks.

The above algorithm calculates the desired conditional distribution. It is implemented in the EnumerationAsk class of the repository. The algorithm takes as input a query variable, some evidence assignments and a Bayesian network. However, we will not call the algorithm directly. Instead, we will pass an EnumerationAsk instance as the inference procedure of our Bayes model. The cell below shows the steps.


In [20]:
package aima.notebooks.probabilisticreasoning;

import aima.core.probability.example.*;
import aima.core.probability.bayes.*;
import aima.core.probability.bayes.exact.*;
import aima.core.probability.bayes.impl.*;
import aima.core.probability.bayes.model.*;
import aima.core.probability.proposition.*;

// Load the network from the network factory.
BayesianNetwork cavityNet = BayesNetExampleFactory.constructToothacheCavityCatchNetwork();
// Construct the BayesModel from the BayesNet
// We will pass EnumerationAsk as the new inference procedure
FiniteBayesModel model = new FiniteBayesModel(cavityNet, new EnumerationAsk());

// Now we will fully exhaust this model to extract as much information as we can

// First let us define some assignment propositions
AssignmentProposition atoothache = new AssignmentProposition(ExampleRV.TOOTHACHE_RV, true);
AssignmentProposition anottoothache = new AssignmentProposition(ExampleRV.TOOTHACHE_RV, false);
AssignmentProposition acavity = new AssignmentProposition(ExampleRV.CAVITY_RV, true);
AssignmentProposition anotcavity = new AssignmentProposition(ExampleRV.CAVITY_RV, false);
AssignmentProposition acatch = new AssignmentProposition(ExampleRV.CATCH_RV, true);
AssignmentProposition anotcatch = new AssignmentProposition(ExampleRV.CATCH_RV, false);

// Now let us define some propositions which are conjunctions and/or disjunctions of the above propositions
ConjunctiveProposition toothacheAndNotCavity = new ConjunctiveProposition(atoothache, anotcavity);
DisjunctiveProposition cavityOrToothache = new DisjunctiveProposition(acavity, atoothache);

// First let us calculate the prior probabilities of our random variables
// The probabilities in the distribution are returned in the order <True, False>
System.out.println("The prior distribution for toothache is "+ model.priorDistribution(ExampleRV.TOOTHACHE_RV));
System.out.println("The prior distribution for cavity is "+ model.priorDistribution(ExampleRV.CAVITY_RV));
System.out.println("The prior distribution for catch is "+ model.priorDistribution(ExampleRV.CATCH_RV));
// Now let us calculate some posterior distributions
// In the returned distribution, the possibilities of the evidence variable are exhausted first for each value of the query variable
System.out.println("The posterior distribution for toothache given cavity is \n \t "+ model.posteriorDistribution(ExampleRV.TOOTHACHE_RV,
                                                                                                           ExampleRV.CAVITY_RV).toString());

System.out.println("The posterior distribution for catch given cavity is \n \t "+ model.posteriorDistribution(ExampleRV.CATCH_RV,
                                                                                                           ExampleRV.CAVITY_RV).toString());

// Now let us have a look at some individual probabilities
System.out.println("The prior probability of having a cavity is "+model.prior(acavity));
System.out.println("The posterior probability of having a cavity given a toothache is "+ model.posterior(acavity, atoothache));
System.out.println("The prior probability of having a cavity or a toothache is "+model.prior(cavityOrToothache));
System.out.println("The posterior probability of not having a cavity given a toothache is "+model.posterior(anotcavity, atoothache));
System.out.println("The prior probability of not having a cavity but having a toothache is "+model.prior(toothacheAndNotCavity));
System.out.println("The prior probability of having a cavity or a toothache is "+model.prior(cavityOrToothache));
System.out.println("The posterior probability of having a cavity given that the patient has a cavity or a toothache is  "+
                          model.posterior(acavity,cavityOrToothache));


The prior distribution for toothache is <0.2, 0.8>
The prior distribution for cavity is <0.2, 0.8>
The prior distribution for catch is <0.34, 0.66>
The posterior distribution for toothache given cavity is 
 	 <0.6, 0.10000000000000002, 0.4000000000000001, 0.9>
The posterior distribution for catch given cavity is 
 	 <0.9, 0.19999999999999998, 0.09999999999999999, 0.7999999999999999>
The prior probability of having a cavity is 0.2
The posterior probability of having a cavity given a toothache is 0.6
The prior probability of having a cavity or a toothache is 0.28
The posterior probability of not having a cavity given a toothache is 0.4000000000000001
The prior probability of not having a cavity but having a toothache is 0.08000000000000002
The prior probability of having a cavity or a toothache is 0.28
The posterior probability of having a cavity given that the patient has a cavity or a toothache is  0.7142857142857143
Out[20]:
null
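To connect this back to the query $P(b|j,m)$ derived above, the following sketch runs the same enumeration-based inference on the burglary network. It assumes that the repository's BayesNetExampleFactory provides a constructBurglaryAlarmNetwork() method and that ExampleRV exposes BURGLARY_RV, JOHN_CALLS_RV and MARY_CALLS_RV; if the names differ in your version of the repository, adjust accordingly.

package aima.notebooks.probabilisticreasoning;

import aima.core.probability.bayes.*;
import aima.core.probability.bayes.exact.*;
import aima.core.probability.bayes.model.*;
import aima.core.probability.example.*;
import aima.core.probability.proposition.*;

// Burglary-Earthquake-Alarm-JohnCalls-MaryCalls network (assumed to be available from the factory)
BayesianNetwork burglaryNet = BayesNetExampleFactory.constructBurglaryAlarmNetwork();
FiniteBayesModel burglaryModel = new FiniteBayesModel(burglaryNet, new EnumerationAsk());

// Evidence: JohnCalls = true, MaryCalls = true
AssignmentProposition john = new AssignmentProposition(ExampleRV.JOHN_CALLS_RV, true);
AssignmentProposition mary = new AssignmentProposition(ExampleRV.MARY_CALLS_RV, true);

// P(Burglary | j, m), computed by enumeration exactly as in the derivation above
System.out.println("P(Burglary | john, mary) = "
        + burglaryModel.posteriorDistribution(ExampleRV.BURGLARY_RV, john, mary));

As with the earlier cells, the returned distribution lists the value for Burglary = true first.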

There are a large number of inferences that can be derived from a probability model. For the sake of conciseness, we will focus only on a few prior and posterior distributions in the upcoming examples.


In [ ]: