Learning the Structure of Generative Models

In this notebook, we'll use structure learning to find the dependency structure of a generative model. You can do this for any label matrix!

See the blog post or the paper for more details.

Generating Some Data

We'll generate some data from a known model of noisy labels in which two pairs of labeling functions are correlated.



In [ ]:

    
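from snorkel.learning import GenerativeModelWeights
from snorkel.learning.structure import generate_label_matrix

# Weights of a known generative model over 10 labeling functions
weights = GenerativeModelWeights(10)
# Give every labeling function the same accuracy weight
for i in range(10):
    weights.lf_accuracy[i] = 1.0
# Correlate two pairs of labeling functions: (0, 1) and (2, 3)
weights.dep_similar[0, 1] = 0.5
weights.dep_similar[2, 3] = 0.5

# Sample 10,000 true labels y and the corresponding noisy label matrix L
y, L = generate_label_matrix(weights, 10000)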

Structure Learning

Here, L is the synthetic label matrix generated above; in a real application, it would be the label matrix produced by a LabelManager. A few notes:

The deps object is a collection of tuples specifying which labeling functions are related by which types of dependencies.
The keyword argument threshold is a positive float that indicates how strong the dependency has to be for it to be returned in the collection. Too many dependencies? Turn it up. Too few? Turn it down.
By default, the DependencySelector looks for pairwise correlations between labeling functions. Pass the keyword argument higher_order=True to the select method to also look for reinforcing and fixing dependencies (described in the data programming paper).
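To build some intuition for what a threshold on dependency strength does, here is a minimal, self-contained sketch in plain Python. It is not the method DependencySelector uses: it cheats by looking at the true labels, which structure learning does not have, and it scores each pair by the covariance of the two labeling functions' correctness. The names make_labels, select_pairs, p_correct, and p_copy are hypothetical and exist only for this illustration.

```python
import random
from itertools import combinations


def make_labels(n_points, n_lfs, copies, p_correct=0.7, p_copy=0.5, seed=0):
    """Sample true labels y and a label matrix L of +1/-1 votes.

    `copies` maps a labeling function index to an earlier index whose vote it
    sometimes copies, which correlates that pair beyond their shared accuracy.
    """
    rng = random.Random(seed)
    y, L = [], []
    for _ in range(n_points):
        label = 1 if rng.random() < 0.5 else -1
        row = []
        for j in range(n_lfs):
            if j in copies and rng.random() < p_copy:
                row.append(row[copies[j]])  # copy the earlier vote
            else:
                row.append(label if rng.random() < p_correct else -label)
        y.append(label)
        L.append(row)
    return y, L


def select_pairs(y, L, threshold):
    """Return pairs (i, j) whose correctness covariance exceeds `threshold`."""
    n = len(L)
    n_lfs = len(L[0])
    # correct[j][k] is 1 when labeling function j matches the true label of point k
    correct = [[1 if row[j] == label else 0 for row, label in zip(L, y)]
               for j in range(n_lfs)]
    mean = [sum(col) / n for col in correct]
    selected = set()
    for i, j in combinations(range(n_lfs), 2):
        both = sum(a * b for a, b in zip(correct[i], correct[j])) / n
        if abs(both - mean[i] * mean[j]) > threshold:
            selected.add((i, j))
    return selected


# Labeling function 1 sometimes copies 0, and 3 sometimes copies 2
y_toy, L_toy = make_labels(5000, 10, copies={1: 0, 3: 2})
strict = select_pairs(y_toy, L_toy, threshold=0.05)
loose = select_pairs(y_toy, L_toy, threshold=0.0)
print(sorted(strict))
print(len(loose))
```

With these settings, the stricter threshold should keep only the two planted pairs, while a threshold of zero lets sampling noise through and returns many spurious pairs. That is the same trade-off the threshold argument controls.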



In [ ]:

    
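from snorkel.learning.structure import DependencySelector

ds = DependencySelector()
# Keep only dependencies at least as strong as the threshold
deps = ds.select(L, threshold=0.05)
print(deps)
# The two correlated pairs planted in the synthetic data should be recovered
assert deps == {(0, 1, 0), (2, 3, 0)}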
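The selected dependencies can be inspected like any other Python collection. The sketch below assumes, based only on the values in the assertion above, that each tuple holds two labeling function indices followed by a dependency type code, and that code 0 corresponds to the pairwise correlations planted in the synthetic data. The name example_deps is hypothetical and stands in for the deps returned by select.

```python
from collections import defaultdict

# Stand-in for the result of ds.select(...); same value the assertion above expects
example_deps = {(0, 1, 0), (2, 3, 0)}

# Group the labeling function pairs by dependency type code
# (assumed tuple layout: index, index, type code)
by_type = defaultdict(list)
for lf_i, lf_j, dep_type in sorted(example_deps):
    by_type[dep_type].append((lf_i, lf_j))

for dep_type, pairs in sorted(by_type.items()):
    print("type", dep_type, "->", pairs)
```

Grouping by type becomes more useful once higher_order=True is passed, since the collection may then contain more than one kind of dependency.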

Using the Learned Structure

To incorporate the selected dependencies into your generative model, just pass them in as a keyword argument:



In [ ]:

    
from snorkel.learning import GenerativeModel

gen_model = GenerativeModel()
# Train on the label matrix, modeling the selected dependencies
gen_model.train(L, deps=deps)



In [ ]:

    
# Inspect the learned accuracy weight of each labeling function
print(gen_model.weights.lf_accuracy)