In [1]:
from discrete_probability import *

Let's start with a simple three-variable Bayesian network. First, we need to define our variables.


In [2]:
E, H, S = variables = tuple(map(Variable, 'EHS'))  # tuple() so variables can be reused later (in Python 3, map returns a one-shot iterator)
# equivalent to E, H, S = variables = Variable('E'), Variable('H'), Variable('S')

By default the values a variable can take are True and False.


In [3]:
print(E)
print(E.values)


E
(True, False)

An Assignment represents a variable taking on one of its values and can be made using the << operator.


In [4]:
e = E << True
print(e)


E=True

We can get all of the assignments for a variable using get_assignments(), which returns them as an unpackable list; unpacking the variable itself, as in the next cell, gives the same assignments.
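
For example, a minimal sketch (assuming get_assignments() takes no arguments, as the description above suggests) that is equivalent to the direct unpacking used in the next cell:

e, e_ = E.get_assignments()  # e is E=True, e_ is E=False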


In [5]:
e, e_ = E
h, h_ = H
s, s_ = S
print(h)
print(s_)


H=True
S=False

Now we'll define a small data set over these variables that we can use to learn a parameterization for our network.


In [6]:
data_header = [H, S, E]
data_samples = [
    [True, False, True],
    [True, False, True],
    [False, True, False],
    [False, False, True],
    [True, False, False],
    [True, False, True],
    [False, False, False],
    [True, False, True],
    [True, False, True],
    [False, False, True],
    [True, False, True],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [True, False, True],
    [True, False, True]]

Currently the learning routines work on samples represented as Assignments (though this will probably change later). We can use the helper function data_to_assignments to convert our data.


In [7]:
data_assignments = data_to_assignments(data_header, data_samples)
print(data_assignments)


[(E=True, S=False, H=True), (E=True, S=False, H=True), (E=False, S=True, H=False), (E=True, S=False, H=False), (S=False, E=False, H=True), (E=True, S=False, H=True), (S=False, E=False, H=False), (E=True, S=False, H=True), (E=True, S=False, H=True), (E=True, S=False, H=False), (E=True, S=False, H=True), (E=True, H=True, S=True), (E=True, S=False, H=True), (E=True, H=True, S=True), (E=True, S=False, H=True), (E=True, S=False, H=True)]

Before creating our Bayesian network let's represent this data using a Table. This class can represent either marginal or conditional tables depending on the construction arguments. A single argument creates a marginal table.


In [8]:
P = Table(variables)
P.learn_from_complete_data(data_assignments)


Out[8]:
E     | S     | H     | P(E,S,H)
---------------------------------
True  | True  | True  | 0.125
True  | True  | False | 0.0
True  | False | True  | 0.5625
True  | False | False | 0.125
False | True  | True  | 0.0
False | True  | False | 0.0625
False | False | True  | 0.0625
False | False | False | 0.0625

Table provides methods for marginalization (marginalize_over keeps the listed variables; marginalize_out removes them) and conditioning (condition_on conditions the full table on the given variables; condition additionally marginalizes down to the chosen variables).


In [9]:
print(P.marginalize_over([E, S]))


E     | S     | P(E,S)
-----------------------
True  | True  | 0.125
True  | False | 0.6875
False | True  | 0.0625
False | False | 0.125

In [10]:
print(P.marginalize_out([E, S]))


H     | P(H)
-------------
True  | 0.75
False | 0.25

In [11]:
print(P.condition_on([H]))


H     || E     | S     | P(E,S|H)
----------------------------------
True  || True  | True  | 0.166666666667
True  || True  | False | 0.75
True  || False | True  | 0.0
True  || False | False | 0.0833333333333
False || True  | True  | 0.0
False || True  | False | 0.5
False || False | True  | 0.25
False || False | False | 0.25

In [12]:
print(P.condition([E], [H]))


H     || E     | P(E|H)
------------------------
True  || True  | 0.916666666667
True  || False | 0.0833333333333
False || True  | 0.5
False || False | 0.5

Using these operations alone we can already answer quite a few queries. However, a much easier interface is provided that matches conventional notation. For example, to represent the marginal over $E$ given evidence $H=True$ and $S=False$ we often write $P(E\mid h,\bar{s})$. We can write this query in exactly the same way (appending an underscore in lieu of a bar) and get the marginal.


In [13]:
P(E|h,s_)


Out[13]:
E     | P(E|S=False, H=True)
-----------------------------
True  | 0.9
False | 0.1

Specific probabilities such as $P(e\mid h,\bar{s})$ can also be written.


In [14]:
P(e|h,s_)


Out[14]:
0.9

Unconditioned marginals, such as $P(E,S)$, can also be found.


In [15]:
P(E,S)


Out[15]:
E     | S     | P(E,S)
-----------------------
True  | True  | 0.125
True  | False | 0.6875
False | True  | 0.0625
False | False | 0.125

In [16]:
P(e,s_)


Out[16]:
0.6875

Note that those lower-case assignments were defined in cell [5]. If they are not defined, we can make assignments explicitly. For example, the following is equivalent to the query above.


In [17]:
P(E<<True, S<<False)


Out[17]:
0.6875

Now we'll create a simple Bayesian network. To define the graph we need the variables that will act as nodes and the edges between them. A directed edge can be specified using < and >, so S < H and H > S both denote the edge from $H$ to $S$.


In [18]:
bn = BayesianNetwork(variables, [S < H, H > E])

If you're running this in an IPython Notebook you can view the graph using some slick d3.js (based on http://bl.ocks.org/mbostock/1153292).


In [19]:
bn.display(180, 180)


Out[19]:
[H > S, H > E]

Since the JavaScript doesn't carry over nicely in the viewer, here's a screenshot of the graph.


In [20]:
from IPython.core.display import Image 
Image(filename='hse_graph.png')


Out[20]:

At first our distribution is unparameterized. This means that the network is "invalid".


In [21]:
print(bn.is_valid)


False

We can parameterize it from the same data set.


In [22]:
bn.learn_from_complete_data(data_assignments)
print(bn.is_valid)


True

Unfortunately I haven't written any of the Bayesian network inference algorithms yet (variable elimination, factor elimination, belief propagation, stochastic sampling, network polynomial, etc.), so for now we can expand the network into a marginal Table and use that for inference. For this structure the joint factorizes as $P(E,S,H)=P(H)\,P(S\mid H)\,P(E\mid H)$. Expanding into a Table defeats the point of a Bayesian network (a compact representation), but it lets us answer queries until proper inference is implemented.


In [23]:
Pb = bn.as_marginal_table()
print(Pb)


E     | S     | H     | P(E,S,H)
---------------------------------
True  | True  | True  | 0.114583333333
True  | True  | False | 0.03125
True  | False | True  | 0.572916666667
True  | False | False | 0.09375
False | True  | True  | 0.0104166666667
False | True  | False | 0.03125
False | False | True  | 0.0520833333333
False | False | False | 0.09375

You might notice that this marginal table (above), although learned from the same data, is not the same as the marginal Table learned directly from the data (below). This is due to the additional conditional independence constraints imposed by the Bayesian network's structure.


In [24]:
print(P)


E     | S     | H     | P(E,S,H)
---------------------------------
True  | True  | True  | 0.125
True  | True  | False | 0.0
True  | False | True  | 0.5625
True  | False | False | 0.125
False | True  | True  | 0.0
False | True  | False | 0.0625
False | False | True  | 0.0625
False | False | False | 0.0625

Also note that this is in spite of the two distributions sharing the same family marginals, $P(H)$, $P(S\mid H)$, and $P(E\mid H)$, shown below.


In [25]:
print(P(H))
print(Pb(H))


H     | P(H)
-------------
True  | 0.75
False | 0.25
H     | P(H)
-------------
True  | 0.75
False | 0.25

In [26]:
print(P(S|H))
print(Pb(S|H))


H     || S     | P(S|H)
------------------------
True  || True  | 0.166666666667
True  || False | 0.833333333333
False || True  | 0.25
False || False | 0.75
H     || S     | P(S|H)
------------------------
True  || True  | 0.166666666667
True  || False | 0.833333333333
False || True  | 0.25
False || False | 0.75

In [27]:
print(P(E|H))
print(Pb(E|H))


H     || E     | P(E|H)
------------------------
True  || True  | 0.916666666667
True  || False | 0.0833333333333
False || True  | 0.5
False || False | 0.5
H     || E     | P(E|H)
------------------------
True  || True  | 0.916666666667
True  || False | 0.0833333333333
False || True  | 0.5
False || False | 0.5
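
To see concretely how the two tables diverge despite agreeing on these family marginals, consider a single entry. The network's table is built from the factorization above, $P(h)\,P(s\mid h)\,P(e\mid h)=0.75\times\tfrac{2}{12}\times\tfrac{11}{12}\approx 0.1146$, whereas the table learned directly from the data estimates $P(e,s,h)=2/16=0.125$.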

In fact, we can see conditional independences in the Bayesian network. Using d-separation (and the Markovian assumption on families) we expect $E$ to be independent of $S$ given $H$. In other words, $P(E\mid s,H)=P(E\mid \bar{s},H)$. In the conditional table below we see that the probabilities for $E$ do not change with respect to $S$ because $H$ is given.


In [28]:
print(Pb(E|S,H))


S     | H     || E     | P(E|S,H)
----------------------------------
True  | True  || True  | 0.916666666667
True  | True  || False | 0.0833333333333
True  | False || True  | 0.5
True  | False || False | 0.5
False | True  || True  | 0.916666666667
False | True  || False | 0.0833333333333
False | False || True  | 0.5
False | False || False | 0.5

In [29]:
print(Pb(E|h,s))
print(Pb(E|h,s_))


E     | P(E|H=True, S=True)
----------------------------
True  | 0.916666666667
False | 0.0833333333333
E     | P(E|S=False, H=True)
-----------------------------
True  | 0.916666666667
False | 0.0833333333333

In [30]:
print(Pb(E|h_,s))
print(Pb(E|h_,s_))


E     | P(E|S=True, H=False)
-----------------------------
True  | 0.5
False | 0.5
E     | P(E|S=False, H=False)
------------------------------
True  | 0.5
False | 0.5

However, when $H$ is not given there is no d-separation (the path through $H$ is open), so we expect $E$ to be dependent on $S$. We see that this is the case in the conditional table below.


In [31]:
print(Pb(E|S))


S     || E     | P(E|S)
------------------------
True  || True  | 0.777777777778
True  || False | 0.222222222222
False || True  | 0.820512820513
False || False | 0.179487179487

Also note the lack of these conditional independences in the original marginal Table.


In [32]:
print(P(E|S,H))


S     | H     || E     | P(E|S,H)
----------------------------------
True  | True  || True  | 1.0
True  | True  || False | 0.0
True  | False || True  | 0.0
True  | False || False | 1.0
False | True  || True  | 0.9
False | True  || False | 0.1
False | False || True  | 0.666666666667
False | False || False | 0.333333333333

Again, the Bayesian network imposes conditional independence constraints on the distribution. Specifically, this conditional independence arises from the lack of an edge between $S$ and $E$. If we were to represent the original distribution exactly as a Bayesian network, the resulting structure would be a complete graph (every node connected to every other node).
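
For example, a complete DAG over these three variables can be written with the same edge syntax used in cell [18]. This is only a sketch (the name bn_full is illustrative, and the edge directions chosen are just one of several valid acyclic orderings), but with no missing edges there are no conditional independence constraints, so such a network could reproduce the original table exactly.

# hypothetical complete-graph network; same API as cells [18] and [22]
bn_full = BayesianNetwork(variables, [H > S, H > E, S > E])
bn_full.learn_from_complete_data(data_assignments)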