In [53]:
import graphmodels as gm
import numpy as np
import networkx as nx
from scipy import stats
%matplotlib inline

Directed Graphical Models

DGM is a class implementing Directed Graphical Models.

Loading DGMs

You can load Bayesian networks in the .bif format. For example, here we load the simple 'Earthquake' network.

Some networks can be downloaded here: http://www.bnlearn.com/bnrepository/


In [2]:
dgm = gm.DGM.read('../networks/earthquake.bif')

In [3]:
dgm.draw() # you can hover the cursor over a node to see its CPD


Out[3]:

DGM is a subclass of networkx.DiGraph, so you can use any networkx function on it.


In [4]:
nx.dag_longest_path(dgm)


Out[4]:
['Earthquake', 'Alarm', 'MaryCalls']

In [5]:
nx.average_neighbor_degree(dgm)


Out[5]:
{'Alarm': 0.0,
 'Burglary': 2.0,
 'Earthquake': 2.0,
 'JohnCalls': 0.0,
 'MaryCalls': 0.0}

Also, some DGM-specific queries about the graph are implemented:


In [6]:
list(dgm.immoralities) # list of all immoralities in the graph


Out[6]:
[('Alarm', 'Burglary', 'Earthquake')]

In [7]:
list(dgm.v_structures) # list of all v-structures in the graph


Out[7]:
[('Alarm', 'Burglary', 'Earthquake')]

In [8]:
changed_dgm1 = gm.DGM()
changed_dgm1.add_edges_from(dgm.edges()) # copy the original structure
changed_dgm1.add_edge('JohnCalls', 'MaryCalls') # add one extra edge
changed_dgm1.draw()


Out[8]:

Checks for I-equivalence and I-maps

A graph G1 is an I-map of G2 if every independence that G1 encodes also holds in G2; two graphs are I-equivalent if they encode exactly the same independencies.


In [9]:
changed_dgm1.is_I_equivalent(dgm)


Out[9]:
False

In [10]:
changed_dgm1.is_I_map(dgm)


Out[10]:
True

In [11]:
dgm.is_I_map(changed_dgm1)


Out[11]:
False
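
For contrast, two chains oriented in opposite directions have the same skeleton and no immoralities, so they should be I-equivalent (a minimal sketch with made-up variables 'A', 'B', 'C'):


In [ ]:
chain1 = gm.DGM()
chain1.add_edges_from([('A', 'B'), ('B', 'C')]) # A -> B -> C
chain2 = gm.DGM()
chain2.add_edges_from([('C', 'B'), ('B', 'A')]) # A <- B <- C
chain1.is_I_equivalent(chain2) # expected: True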

Find which nodes are reachable from a given node in the sense of d-separation (i.e., which nodes are not d-separated from it, given the observed nodes):


In [12]:
dgm.reachable('JohnCalls', observed=[])


Out[12]:
{'Alarm', 'Burglary', 'Earthquake', 'MaryCalls'}

In [13]:
dgm.reachable('JohnCalls', observed=['Alarm'])


Out[13]:
set()
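
Pairwise d-separation follows directly: a node is d-separated from the source exactly when it is not in the reachable set. For example:


In [ ]:
# JohnCalls and MaryCalls are d-separated given Alarm
'MaryCalls' in dgm.reachable('JohnCalls', observed=['Alarm']) # expected: False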

Factors

Currently, only the TableFactor class is implemented. A TableFactor represents an unnormalized discrete CPD.


In [14]:
dgm.factor('JohnCalls') # get a factor assigned to variable JohnCalls


Out[14]:
Alarm JohnCalls Prob.
True True 0.900
True False 0.100
False True 0.050
False False 0.950

In [54]:
dgm.cpd('Alarm') # get the CPD of variable Alarm (i.e., the normalized factor; see below)


Out[54]:
Burglary Earthquake Alarm Prob.
True True True 0.950
True True False 0.050
True False True 0.940
True False False 0.060
False True True 0.290
False True False 0.710
False False True 0.001
False False False 0.999
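
For this network, factor and cpd return the same table because the .bif file already stores normalized CPDs; in general, the cpd is the factor normalized over its own variable (a sketch of that equivalence):


In [ ]:
dgm.factor('Alarm').normalize('Alarm') # same table as dgm.cpd('Alarm')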

Operations

All factor operations are either vectorized or written in C++, so they should be quite fast.


In [16]:
alarm = dgm.factor('Alarm')
john_calls = dgm.factor('JohnCalls')
burglary = dgm.factor('Burglary')

In [57]:
print(alarm.arguments) # all variables in the network
print(alarm.scope) # variables in the factor


['Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls']
['Burglary', 'Earthquake', 'Alarm']
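
As a rough illustration of the speed claim above (timings will vary by machine):


In [ ]:
%timeit alarm * john_calls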

Factor Multiplication


In [17]:
alarm * john_calls # note that the result is NOT a normalized CPD


Out[17]:
Burglary Earthquake Alarm JohnCalls Prob.
True True True True 0.855
True True True False 0.095
True True False True 0.003
True True False False 0.048
True False True True 0.846
True False True False 0.094
True False False True 0.003
True False False False 0.057
False True True True 0.261
False True True False 0.029
False True False True 0.035
False True False False 0.674
False False True True 0.001
False False True False 0.000
False False False True 0.050
False False False False 0.949

Normalization

There is no special CPD class; all CPDs are implemented as Factors. This is intentional: there is often no need to normalize probability distributions at intermediate steps of an algorithm, and doing so would hurt performance.

The .normalize(*variables) method normalizes the factor so that the probabilities over the given variables sum to 1 for each assignment of the remaining variables.


In [18]:
(alarm * john_calls).normalize('JohnCalls') # now the result is a valid CPD: P(JohnCalls | Burglary, Earthquake, Alarm)


Out[18]:
Burglary Earthquake Alarm JohnCalls Prob.
True True True True 0.900
True True True False 0.100
True True False True 0.050
True True False False 0.950
True False True True 0.900
True False True False 0.100
True False False True 0.050
True False False False 0.950
False True True True 0.900
False True True False 0.100
False True False True 0.050
False True False False 0.950
False False True True 0.900
False False True False 0.100
False False False True 0.050
False False False False 0.950
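
Normalizing over the full scope instead makes all entries sum to 1 (a sketch; note the result is not the network's joint distribution, since the priors of Burglary and Earthquake are not part of this product):


In [ ]:
product = alarm * john_calls
product.normalize(*product.scope) # entries now sum to 1 across the whole table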

Marginalization

Sums out one or several variables. Again, note that the result is not necessarily normalized.


In [19]:
factor = burglary * alarm
factor


Out[19]:
Burglary Earthquake Alarm Prob.
True True True 0.009
True True False 0.001
True False True 0.009
True False False 0.001
False True True 0.287
False True False 0.703
False False True 0.001
False False False 0.989

In [20]:
factor.marginalize('Burglary')


Out[20]:
Earthquake Alarm Prob.
True True 0.297
True False 0.703
False True 0.010
False False 0.990
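
Marginalization chains naturally with the other operations; for example, summing out both parents of Alarm from the full product yields its prior marginal (a sketch using the factors defined above):


In [ ]:
earthquake = dgm.factor('Earthquake')
joint = burglary * earthquake * alarm # P(B) * P(E) * P(A | B, E)
joint.marginalize('Burglary').marginalize('Earthquake') # P(Alarm)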

Factor division

Dividing a product of factors by one of them recovers the rest; below, dividing burglary * alarm by burglary gives back the alarm factor.


In [21]:
alarm


Out[21]:
Burglary Earthquake Alarm Prob.
True True True 0.950
True True False 0.050
True False True 0.940
True False False 0.060
False True True 0.290
False True False 0.710
False False True 0.001
False False False 0.999

In [22]:
factor = burglary * alarm
factor


Out[22]:
Burglary Earthquake Alarm Prob.
True True True 0.009
True True False 0.001
True False True 0.009
True False False 0.001
False True True 0.287
False True False 0.703
False False True 0.001
False False False 0.989

In [23]:
factor / burglary


Out[23]:
Burglary Earthquake Alarm Prob.
True True True 0.950
True True False 0.050
True False True 0.940
True False False 0.060
False True True 0.290
False True False 0.710
False False True 0.001
False False False 0.999

Random sampling

Sampling from a Factor


In [24]:
f = gm.TableFactor(arguments=['a', 'b'], scope=['a', 'b'])
f.table = np.array([[0.5, 0.5], [0.8, 0.2]]) # unnormalized values
f.normalize(*f.scope, copy=False) # normalize in place over the whole scope
f


Out[24]:
a b Prob.
0 0 0.250
0 1 0.250
1 0 0.400
1 1 0.100

In [25]:
# draw 1,000,000 samples from the factor
rv = f.rvs(size=1000000)

In [26]:
# sanity check: fitting a new factor to the samples should recover the distribution
f1 = gm.TableFactor(['a', 'b'], ['a', 'b']).fit(rv)
f1


Out[26]:
a b Prob.
0 0 0.250
0 1 0.251
1 0 0.399
1 1 0.100

In [27]:
# conditional sampling
f.rvs(observed={'a': [0, 1, 0]})


Out[27]:
b
0 1
1 0
2 1

In [28]:
dgm.factor('Alarm').rvs(observed={
        'Earthquake': ['True', 'False', 'True'], 
        'Burglary': ['False', 'False', 'True']
    })


Out[28]:
Alarm
0 True
1 False
2 True

Sampling from a DGM

Each variable is sampled given its parents, in topological order (ancestral sampling).


In [29]:
dgm.draw()


Out[29]:

In [30]:
dgm.rvs(size=10) # most rows are all False because the modeled events are rare


Out[30]:
Earthquake Burglary Alarm MaryCalls JohnCalls
0 False False False False False
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False False False
5 False False False False False
6 False False False False False
7 False False False False False
8 True False False False False
9 False False False False False

Fitting

DGMs


In [31]:
# generating data
data = dgm.rvs(size=1000000)

In [32]:
# copying the graph structure
dgm1 = gm.DGM()
dgm1.add_nodes_from(dgm.nodes(), cpd=gm.TableFactor) # the CPD type (TableFactor) must also be specified
dgm1.add_edges_from(dgm.edges())
dgm1.draw()


Out[32]:

In [33]:
# fitting
dgm1.fit(data)
dgm1.draw()


Out[33]:

The fitted distributions in dgm1 should be close to the true ones (those in dgm).


In [34]:
dgm.draw()


Out[34]:
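
To compare the learned parameters numerically rather than visually, you can print the same CPD from both models (a sketch; the fitted values should match the originals up to sampling noise):


In [ ]:
print(dgm.cpd('Alarm'))
print(dgm1.cpd('Alarm'))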

Factors


In [35]:
factor = dgm.factor('Alarm') * dgm.factor('Burglary') * dgm.factor('Earthquake')

In [36]:
# generating data
data = factor.rvs(size=1000000)

In [37]:
new_factor = gm.TableFactor(['Alarm', 'Burglary', 'Earthquake'], ['Alarm', 'Burglary', 'Earthquake'])

In [38]:
# fitting
new_factor.fit(data)


Out[38]:
Alarm Burglary Earthquake Prob.
False False False 0.969
False False True 0.014
False True False 0.001
False True True 0.000
True False False 0.001
True False True 0.006
True True False 0.009
True True True 0.000

In [39]:
factor # the original factor: the variable order differs from the fitted table, but the probabilities agree


Out[39]:
Burglary Earthquake Alarm Prob.
True True True 0.000
True True False 0.000
True False True 0.009
True False False 0.001
False True True 0.006
False True False 0.014
False False True 0.001
False False False 0.969

Inference

Inference queries are answered with a sum-product algorithm via the SumProductInference class.


In [40]:
# creating inference object
inference = gm.SumProductInference(dgm)

In [41]:
# making inference queries
inference(query=['JohnCalls']) # probability of John calling


Out[41]:
JohnCalls Prob.
True 0.064
False 0.936

In [42]:
# probability of John calling in case there was a burglary
inference(query=['JohnCalls'], observed=dict(Burglary='True'))


Out[42]:
JohnCalls Prob.
True 0.849
False 0.151

In [43]:
# probability of John/Mary calling in case there was a burglary
inference(query=['JohnCalls', 'MaryCalls'], observed=dict(Burglary='True'))


Out[43]:
JohnCalls MaryCalls Prob.
True True 0.592
True False 0.257
False True 0.066
False False 0.084

In [44]:
# probability of Earthquake given that both John and Mary called
inference(query=['Earthquake'], observed=dict(JohnCalls='True', MaryCalls='True'))


Out[44]:
Earthquake Prob.
True 0.352
False 0.648
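
The textbook query for this network, the probability of a burglary given that both John and Mary called, is expressed the same way:


In [ ]:
inference(query=['Burglary'], observed=dict(JohnCalls='True', MaryCalls='True'))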

Structure Learning

Currently, only the Chow-Liu algorithm is implemented. It finds the tree structure that maximizes the likelihood of the data.


In [45]:
# generating data
data = dgm.rvs(size=1000000)

In [46]:
learned = gm.chow_liu(data)

In [47]:
learned.draw()


Out[47]:

You can see that the Chow-Liu algorithm didn't recover the true structure; it couldn't, since the true structure is not a tree. However, the result is quite close.
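
Because DGM subclasses networkx.DiGraph, you can compare the learned skeleton with the true one directly; direction is ignored here since Chow-Liu only determines the undirected tree (a sketch):


In [ ]:
true_skeleton = {frozenset(e) for e in dgm.edges()}
learned_skeleton = {frozenset(e) for e in learned.edges()}
true_skeleton - learned_skeleton, learned_skeleton - true_skeleton # edges missed, edges added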

Random DGM Generation

Generator objects are callable; each call returns a new random DGM.


In [48]:
gen = gm.ErdosRenyiDGMGen(factor_gen=gm.DirichletTableFactorGen()) # Erdos-Renyi random structure, Dirichlet-sampled CPD tables

In [50]:
gen().draw()


Out[50]:

In [51]:
gen = gm.TreeDGMGen(factor_gen=gm.DirichletTableFactorGen()) # random tree-structured DGMs

In [52]:
gen().draw()


Out[52]:
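
A generated DGM supports the same API as a hand-built one, so you can, for example, sample from it right away (a sketch):


In [ ]:
random_dgm = gen()
random_dgm.rvs(size=5)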

In [ ]: