Please see the full documentation of pgmpy here: http://pgmpy.org/

Exercise 1 (20 points)

  • Download the 'insurance' network file from the bnlearn repo: http://www.bnlearn.com/bnrepository/insurance/insurance.bif.gz

    You'll need to decompress the file and then use the BIFReader class from pgmpy to read the file in as a BayesianNetwork instance.

  • Print the list of nodes and the list of edges.

  • Print the cpds for 'GoodStudent' and 'CarValue'.
  • Is 'CarValue' conditionally independent of 'Age' given 'MakeModel'? Use pgmpy to find out.
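
The decompression step mentioned above can be sketched with the Python standard library alone. This is a minimal sketch, not a required approach: the helper name decompress_gz is hypothetical, and it assumes the .gz file has already been downloaded to the working directory.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst=None):
    """Decompress a .gz file and return the path of the decompressed file.

    Hypothetical helper for illustration; not part of pgmpy.
    """
    src = Path(src)
    # By default, drop the trailing ".gz" (insurance.bif.gz -> insurance.bif).
    dst = Path(dst) if dst is not None else src.with_suffix("")
    # Stream the bytes across so the whole file is never held in memory.
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Calling decompress_gz("insurance.bif.gz") should write insurance.bif next to the archive; that path can then be handed to the BIF reader.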

In [ ]:

Exercise 2 (40 points)

The first cell below imports the packages that aid in visualization of graphs. The second cell provides sample code to print the entire network. It's not very pretty, so the third cell displays the network using different code (not shown). In this exercise, you will explore trails, and create a subgraph that you can illustrate using the visualization tools.

  • Note that an 'immorality' is a v-structure X -> Z <- Y in which there is no edge between X and Y. Use the get_immoralities method, along with is_active_trail, to find all the v-structures in the insurance model.

  • Create a subgraph that includes the nodes 'Antilock', 'Mileage', 'CarValue', 'RuggedAuto', 'Accident', 'Airbag', 'Cushioning' and 'MakeModel'.

  • Use nx.draw to draw the subgraph.
  • There are two v-structures in the subgraph. Which are they, and how can you make them active?
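
To make the definition of an immorality concrete, here is a small pure-Python sketch that finds v-structures in a directed graph given as an edge list. It illustrates the definition only and is not how pgmpy's get_immoralities is implemented; the function name find_immoralities and the toy edge list are hypothetical and unrelated to the insurance network.

```python
from itertools import combinations


def find_immoralities(edges):
    """Return triples (x, z, y) with x -> z <- y and no edge between x and y."""
    parents = {}      # child -> set of its parents
    adjacent = set()  # unordered node pairs joined by an edge in either direction
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        adjacent.add(frozenset((u, v)))

    found = []
    for child, pars in parents.items():
        # Every pair of parents of the same child forms a candidate v-structure.
        for x, y in combinations(sorted(pars), 2):
            if frozenset((x, y)) not in adjacent:
                found.append((x, child, y))
    return sorted(found)


# Toy graph (hypothetical): A -> C <- B is an immorality;
# B -> D <- C is not, because B and C are joined by an edge.
toy_edges = [("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
print(find_immoralities(toy_edges))  # [('A', 'C', 'B')]
```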

In [4]:
import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout
import pygraphviz as pgv

In [5]:
# Draw the full network; with_labels expects a boolean, not the string 'TRUE'.
nx.draw(insurance_model,node_size=1000,font_size=8,with_labels=True,
        pos=graphviz_layout(insurance_model,prog='dot'))


/opt/conda/lib/python3.5/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')

In [5]:
from IPython.display import Image
Image("insurance.png")


Out[5]:

In [ ]:

Exercise 3 (40 points)

Get the cancer network from bnlearn: http://www.bnlearn.com/bnrepository/cancer/cancer.bif.gz.

  • Draw this network using nx.draw
  • Assuming all variables are Bernoulli, generate a random dataset of 1000 observations (as we did in class with the student model) and learn the parameters from 2/3 of that data.
  • Predict the cancer variable for the remaining 1/3 of observations.
  • What is your out-of-sample error (mean squared)?
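
The 2/3 versus 1/3 split and the mean-squared error can be sketched in plain Python as follows. This is a rough illustration under stated assumptions: the helper names split_two_thirds and mean_squared_error are hypothetical, observations are assumed to be coded as 0/1, and the labels in the example are made up rather than produced by any model.

```python
import random


def split_two_thirds(rows, seed=0):
    """Shuffle rows and return (train, test) holding about 2/3 and 1/3 of them."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    cut = (2 * len(rows)) // 3
    return rows[:cut], rows[cut:]


def mean_squared_error(actual, predicted):
    """Average squared difference between two equal-length numeric sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


train, test = split_two_thirds(range(1000))
print(len(train), len(test))  # 666 334

# With 0/1 labels, each squared error is 0 or 1, so the mean-squared
# error equals the fraction of mismatched predictions.
print(mean_squared_error([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```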

In [ ]: