In [53]:
import graphmodels as gm
import numpy as np
import networkx as nx
from scipy import stats
%matplotlib inline

Directed Graphical Models

DGM is a class implementing Directed Graphical Models.

Loading DGMs

You can load Bayesian networks in the .bif format. For example, here we load the simple 'Earthquake' network.

Some networks can be downloaded here: http://www.bnlearn.com/bnrepository/


In [2]:
dgm = gm.DGM.read('../networks/earthquake.bif')

In [3]:
dgm.draw() # you can hover the cursor over a node to see its CPD


Out[3]:

DGM is a subclass of networkx.DiGraph, so you can use any networkx function on it.


In [4]:
nx.dag_longest_path(dgm)


Out[4]:
['Earthquake', 'Alarm', 'MaryCalls']

In [5]:
nx.average_neighbor_degree(dgm)


Out[5]:
{'Alarm': 0.0,
 'Burglary': 2.0,
 'Earthquake': 2.0,
 'JohnCalls': 0.0,
 'MaryCalls': 0.0}

Also, some DGM-specific queries about the graph are implemented:


In [6]:
list(dgm.immoralities) # list of all immoralities in the graph


Out[6]:
[('Alarm', 'Burglary', 'Earthquake')]

In [7]:
list(dgm.v_structures) # list of all v-structures in the graph


Out[7]:
[('Alarm', 'Burglary', 'Earthquake')]

In [8]:
changed_dgm1 = gm.DGM()
changed_dgm1.add_edges_from(dgm.edges()) # copy the original structure
changed_dgm1.add_edge('JohnCalls', 'MaryCalls') # add one extra edge
changed_dgm1.draw()


Out[8]:

Checks for I-equivalence and I-maps

A graph G1 is an I-map of G2 if every independence that G1 encodes also holds in G2; two graphs are I-equivalent if they encode exactly the same independencies.


In [9]:
changed_dgm1.is_I_equivalent(dgm)


Out[9]:
False

In [10]:
changed_dgm1.is_I_map(dgm)


Out[10]:
True

In [11]:
dgm.is_I_map(changed_dgm1)


Out[11]:
False
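
For contrast, two chains oriented in opposite directions have the same skeleton and no immoralities, so they should be I-equivalent (a minimal sketch with made-up variables 'A', 'B', 'C'):


In [ ]:
chain1 = gm.DGM()
chain1.add_edges_from([('A', 'B'), ('B', 'C')]) # A -> B -> C
chain2 = gm.DGM()
chain2.add_edges_from([('C', 'B'), ('B', 'A')]) # A <- B <- C
chain1.is_I_equivalent(chain2) # expected: True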

Find which nodes are reachable from a given node in the sense of d-separation (i.e., which nodes are not d-separated from it, given the observed nodes):


In [12]:
dgm.reachable('JohnCalls', observed=[])


Out[12]:
{'Alarm', 'Burglary', 'Earthquake', 'MaryCalls'}

In [13]:
dgm.reachable('JohnCalls', observed=['Alarm'])


Out[13]:
set()
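
Pairwise d-separation follows directly: a node is d-separated from the source exactly when it is not in the reachable set. For example:


In [ ]:
# JohnCalls and MaryCalls are d-separated given Alarm
'MaryCalls' in dgm.reachable('JohnCalls', observed=['Alarm']) # expected: False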

Factors

Currently, only the TableFactor class is implemented. A TableFactor represents an unnormalized discrete CPD.


In [14]:
dgm.factor('JohnCalls') # get a factor assigned to variable JohnCalls


Out[14]:
Alarm JohnCalls Prob.
True True 0.900
True False 0.100
False True 0.050
False False 0.950

In [54]:
dgm.cpd('Alarm') # get the CPD of variable Alarm (i.e., the normalized factor; see below)


Out[54]:
Burglary Earthquake Alarm Prob.
True True True 0.950
True True False 0.050
True False True 0.940
True False False 0.060
False True True 0.290
False True False 0.710
False False True 0.001
False False False 0.999
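
For this network, factor and cpd return the same table because the .bif file already stores normalized CPDs; in general, the cpd is the factor normalized over its own variable (a sketch of that equivalence):


In [ ]:
dgm.factor('Alarm').normalize('Alarm') # same table as dgm.cpd('Alarm')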

Operations

All factor operations are either vectorized or written in C++, so they should be quite fast.


In [16]:
alarm = dgm.factor('Alarm')
john_calls = dgm.factor('JohnCalls')
burglary = dgm.factor('Burglary')

In [57]:
print(alarm.arguments) # all variables in the network
print(alarm.scope) # variables in the factor


['Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls']
['Burglary', 'Earthquake', 'Alarm']
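
As a rough illustration of the speed claim above (timings will vary by machine):


In [ ]:
%timeit alarm * john_calls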

Factor Multiplication


In [17]:
alarm * john_calls # note that the result is NOT a normalized CPD


Out[17]:
Burglary Earthquake Alarm JohnCalls Prob.
True True True True 0.855
True True True False 0.095
True True False True 0.003
True True False False 0.048
True False True True 0.846
True False True False 0.094
True False False True 0.003
True False False False 0.057
False True True True 0.261
False True True False 0.029
False True False True 0.035
False True False False 0.674
False False True True 0.001
False False True False 0.000
False False False True 0.050
False False False False 0.949

Normalization

There is no special CPD class; all CPDs are implemented as Factors. This is intentional: there is often no need to normalize probability distributions at intermediate steps of an algorithm, and doing so would hurt performance.

The .normalize(*variables) method normalizes the factor so that the probabilities over the given variables sum to 1 for each assignment of the remaining variables.


In [18]:
(alarm * john_calls).normalize('JohnCalls') # now the result is a valid CPD: P(JohnCalls | Burglary, Earthquake, Alarm)


Out[18]:
Burglary Earthquake Alarm JohnCalls Prob.
True True True True 0.900
True True True False 0.100
True True False True 0.050
True True False False 0.950
True False True True 0.900
True False True False 0.100
True False False True 0.050
True False False False 0.950
False True True True 0.900
False True True False 0.100
False True False True 0.050
False True False False 0.950
False False True True 0.900
False False True False 0.100
False False False True 0.050
False False False False 0.950
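
Normalizing over the full scope instead makes all entries sum to 1 (a sketch; note the result is not the network's joint distribution, since the priors of Burglary and Earthquake are not part of this product):


In [ ]:
product = alarm * john_calls
product.normalize(*product.scope) # entries now sum to 1 across the whole table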

Marginalization

Sums out one or several variables. Again, note that the result is not necessarily normalized.


In [19]:
factor = burglary * alarm
factor


Out[19]:
Burglary Earthquake Alarm Prob.
True True True 0.009
True True False 0.001
True False True 0.009
True False False 0.001
False True True 0.287
False True False 0.703
False False True 0.001
False False False 0.989

In [20]:
factor.marginalize('Burglary')


Out[20]:
Earthquake Alarm Prob.
True True 0.297
True False 0.703
False True 0.010
False False 0.990
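
Marginalization chains naturally with the other operations; for example, summing out both parents of Alarm from the full product yields its prior marginal (a sketch using the factors defined above):


In [ ]:
earthquake = dgm.factor('Earthquake')
joint = burglary * earthquake * alarm # P(B) * P(E) * P(A | B, E)
joint.marginalize('Burglary').marginalize('Earthquake') # P(Alarm)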

Factor division

Dividing a product of factors by one of them recovers the rest; below, dividing burglary * alarm by burglary gives back the alarm factor.


In [21]:
alarm


Out[21]:
Burglary Earthquake Alarm Prob.
True True True 0.950
True True False 0.050
True False True 0.940
True False False 0.060
False True True 0.290
False True False 0.710
False False True 0.001
False False False 0.999

In [22]:
factor = burglary * alarm
factor


Out[22]:
Burglary Earthquake Alarm Prob.
True True True 0.009
True True False 0.001
True False True 0.009
True False False 0.001
False True True 0.287
False True False 0.703
False False True 0.001
False False False 0.989

In [23]:
factor / burglary


Out[23]:
Burglary Earthquake Alarm Prob.
True True True 0.950
True True False 0.050
True False True 0.940
True False False 0.060
False True True 0.290
False True False 0.710
False False True 0.001
False False False 0.999

Random sampling

Sampling from a Factor


In [24]:
f = gm.TableFactor(arguments=['a', 'b'], scope=['a', 'b'])
f.table = np.array([[0.5, 0.5], [0.8, 0.2]]) # unnormalized values
f.normalize(*f.scope, copy=False) # normalize in place over the whole scope
f


Out[24]:
a b Prob.
0 0 0.250
0 1 0.250
1 0 0.400
1 1 0.100

In [25]:
# draw 1,000,000 samples from the factor
rv = f.rvs(size=1000000)

In [26]:
# sanity check: fitting a new factor to the samples should recover the distribution
f1 = gm.TableFactor(['a', 'b'], ['a', 'b']).fit(rv)
f1


Out[26]:
a b Prob.
0 0 0.250
0 1 0.251
1 0 0.399
1 1 0.100

In [27]:
# conditional sampling
f.rvs(observed={'a': [0, 1, 0]})


Out[27]:
b
0 1
1 0
2 1

In [28]:
dgm.factor('Alarm').rvs(observed={
        'Earthquake': ['True', 'False', 'True'], 
        'Burglary': ['False', 'False', 'True']
    })


Out[28]:
Alarm
0 True
1 False
2 True

Sampling from a DGM

Each variable is sampled given its parents, in topological order (ancestral sampling).


In [29]:
dgm.draw()


Out[29]:

In [30]:
dgm.rvs(size=10) # most rows are all False because the modeled events are rare


Out[30]:
Earthquake Burglary Alarm MaryCalls JohnCalls
0 False False False False False
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False False False
5 False False False False False
6 False False False False False
7 False False False False False
8 True False False False False
9 False False False False False

Fitting

DGMs


In [31]:
# generating data
data = dgm.rvs(size=1000000)

In [32]:
# copying the graph structure
dgm1 = gm.DGM()
dgm1.add_nodes_from(dgm.nodes(), cpd=gm.TableFactor) # the CPD type (TableFactor) must also be specified
dgm1.add_edges_from(dgm.edges())
dgm1.draw()


Out[32]:

In [33]:
# fitting
dgm1.fit(data)
dgm1.draw()


Out[33]:

The fitted distributions in dgm1 should be close to the true ones (those in dgm).


In [34]:
dgm.draw()


Out[34]:
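
To compare the learned parameters numerically rather than visually, you can print the same CPD from both models (a sketch; the fitted values should match the originals up to sampling noise):


In [ ]:
print(dgm.cpd('Alarm'))
print(dgm1.cpd('Alarm'))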

Factors


In [35]:
factor = dgm.factor('Alarm') * dgm.factor('Burglary') * dgm.factor('Earthquake')

In [36]:
# generating data
data = factor.rvs(size=1000000)

In [37]:
new_factor = gm.TableFactor(['Alarm', 'Burglary', 'Earthquake'], ['Alarm', 'Burglary', 'Earthquake'])

In [38]:
# fitting
new_factor.fit(data)


Out[38]:
Alarm Burglary Earthquake Prob.
False False False 0.969
False False True 0.014
False True False 0.001
False True True 0.000
True False False 0.001
True False True 0.006
True True False 0.009
True True True 0.000

In [39]:
factor # the original factor: the variable order differs from the fitted table, but the probabilities agree


Out[39]:
Burglary Earthquake Alarm Prob.
True True True 0.000
True True False 0.000
True False True 0.009
True False False 0.001
False True True 0.006
False True False 0.014
False False True 0.001
False False False 0.969

Inference

Inference queries are answered with a sum-product algorithm via the SumProductInference class.


In [40]:
# creating inference object
inference = gm.SumProductInference(dgm)

In [41]:
# making inference queries
inference(query=['JohnCalls']) # probability of John calling


Out[41]:
JohnCalls Prob.
True 0.064
False 0.936

In [42]:
# probability of John calling in case there was a burglary
inference(query=['JohnCalls'], observed=dict(Burglary='True'))


Out[42]:
JohnCalls Prob.
True 0.849
False 0.151

In [43]:
# probability of John/Mary calling in case there was a burglary
inference(query=['JohnCalls', 'MaryCalls'], observed=dict(Burglary='True'))


Out[43]:
JohnCalls MaryCalls Prob.
True True 0.592
True False 0.257
False True 0.066
False False 0.084

In [44]:
# probability of Earthquake given that both John and Mary called
inference(query=['Earthquake'], observed=dict(JohnCalls='True', MaryCalls='True'))


Out[44]:
Earthquake Prob.
True 0.352
False 0.648
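
The textbook query for this network, the probability of a burglary given that both John and Mary called, is expressed the same way:


In [ ]:
inference(query=['Burglary'], observed=dict(JohnCalls='True', MaryCalls='True'))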

Structure Learning

Currently, only the Chow-Liu algorithm is implemented. It finds the tree structure that maximizes the likelihood of the data.


In [45]:
# generating data
data = dgm.rvs(size=1000000)

In [46]:
learned = gm.chow_liu(data)

In [47]:
learned.draw()


Out[47]:

You can see that the Chow-Liu algorithm didn't recover the true structure; it couldn't, since the true structure is not a tree. However, the result is quite close.
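
Because DGM subclasses networkx.DiGraph, you can compare the learned skeleton with the true one directly; direction is ignored here since Chow-Liu only determines the undirected tree (a sketch):


In [ ]:
true_skeleton = {frozenset(e) for e in dgm.edges()}
learned_skeleton = {frozenset(e) for e in learned.edges()}
true_skeleton - learned_skeleton, learned_skeleton - true_skeleton # edges missed, edges added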

Random DGM Generation

Generator objects are callable; each call returns a new random DGM.


In [48]:
gen = gm.ErdosRenyiDGMGen(factor_gen=gm.DirichletTableFactorGen()) # Erdos-Renyi random structure, Dirichlet-sampled CPD tables

In [50]:
gen().draw()


Out[50]:

In [51]:
gen = gm.TreeDGMGen(factor_gen=gm.DirichletTableFactorGen()) # random tree-structured DGMs

In [52]:
gen().draw()


Out[52]:
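
A generated DGM supports the same API as a hand-built one, so you can, for example, sample from it right away (a sketch):


In [ ]:
random_dgm = gen()
random_dgm.rvs(size=5)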

In [ ]: