Learning the Structure of Generative Models

In this notebook, we'll use structure learning to find the dependency structure of a generative model, that is, which labeling functions depend on one another. The same procedure works for any label matrix.

See the blog post or the paper for more details.

Generating Some Data

We'll generate data from a known model of noisy labels with 10 labeling functions, in which two pairs of labeling functions, (0, 1) and (2, 3), are correlated.


In [ ]:
from snorkel.learning import GenerativeModelWeights
from snorkel.learning.structure import generate_label_matrix

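# Ground-truth model with 10 labeling functions
weights = GenerativeModelWeights(10)
for i in range(10):
    # Same accuracy weight for every labeling function
    weights.lf_accuracy[i] = 1.0
# Correlate two pairs of labeling functions: (0, 1) and (2, 3)
weights.dep_similar[0, 1] = 0.5
weights.dep_similar[2, 3] = 0.5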

y, L = generate_label_matrix(weights, 10000)
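
To build intuition for what "correlated labeling functions" means in the data, here is a minimal sketch in plain NumPy. It does not use Snorkel's generative model: the toy matrix `L_toy`, the accuracy `acc`, and the copy probability of 0.5 are all illustrative assumptions, and votes are restricted to ±1 with no abstentions. It shows that a correlated pair agrees with each other more often than accuracy alone would explain.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10000, 10          # data points, labeling functions (toy values)
acc = 0.75                # assumed accuracy of every toy labeling function

# True labels in {-1, +1}
y_toy = rng.choice([-1, 1], size=n)

# Independent noisy votes: each vote equals the true label with probability acc
correct = rng.random((n, m)) < acc
L_toy = np.where(correct, y_toy[:, None], -y_toy[:, None])

# Correlate pairs (0, 1) and (2, 3): the second function copies the first
# on a random half of the data points (an illustrative mechanism only)
for src, dst in [(0, 1), (2, 3)]:
    copy = rng.random(n) < 0.5
    L_toy[copy, dst] = L_toy[copy, src]

# agreement[i, j] = fraction of data points on which functions i and j vote alike
agreement = (L_toy[:, :, None] == L_toy[:, None, :]).mean(axis=0)

print("correlated pair (0, 1):", round(agreement[0, 1], 3))
print("correlated pair (2, 3):", round(agreement[2, 3], 3))
print("independent pair (0, 2):", round(agreement[0, 2], 3))
```

Under these assumptions, two independent functions agree with probability acc² + (1 − acc)² = 0.625, while a copied pair agrees noticeably more often. Structure learning looks for exactly this kind of excess agreement without access to the true labels.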

Structure Learning

L is a label matrix. Here it is the synthetic one generated above; in an application it would be the label matrix produced by a LabelManager. A few notes:

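  • The deps object returned by select is a collection of tuples specifying which labeling functions are related by which types of dependencies. Judging from the output below, each tuple takes the form (i, j, dependency type).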
  • The keyword argument threshold is a positive float that indicates how strong the dependency has to be for it to be returned in the collection. Too many dependencies? Turn it up. Too few? Turn it down.
  • By default, the DependencySelector looks for pairwise correlations between labeling functions. Pass the keyword argument higher_order=True to the select method to also look for reinforcing and fixing dependencies (described in the data programming paper).
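
To make the effect of threshold concrete, here is a minimal sketch of thresholded pair selection. It is not the estimator that DependencySelector uses: `select_pairs` is a hypothetical helper that scores each pair of columns by absolute Pearson correlation, and the toy matrix `L_toy` is invented for illustration. The third tuple element is fixed to 0 only to mirror the shape of the tuples in this notebook.

```python
import numpy as np

def select_pairs(L, threshold):
    """Hypothetical helper: return (i, j, 0) for every pair of columns
    whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(L, rowvar=False)   # columns are the variables
    m = corr.shape[0]
    return {
        (i, j, 0)
        for i in range(m)
        for j in range(i + 1, m)
        if abs(corr[i, j]) > threshold
    }

rng = np.random.default_rng(1)
n = 5000

# Four toy labeling functions with independent +/-1 votes
L_toy = rng.choice([-1, 1], size=(n, 4))

# Strong dependency: column 1 copies column 0 on about 80% of rows
strong = rng.random(n) < 0.8
L_toy[strong, 1] = L_toy[strong, 0]

# Weak dependency: column 3 copies column 2 on about 20% of rows
weak = rng.random(n) < 0.2
L_toy[weak, 3] = L_toy[weak, 2]

low = select_pairs(L_toy, threshold=0.1)    # permissive: keeps weak pairs too
high = select_pairs(L_toy, threshold=0.5)   # strict: keeps only strong pairs

print("threshold=0.1:", sorted(low))
print("threshold=0.5:", sorted(high))
```

Raising the threshold drops the weaker dependency and keeps the stronger one, which is the same trade-off the threshold argument controls in the real selector.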

In [ ]:
from snorkel.learning.structure import DependencySelector
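ds = DependencySelector()
# Keep only dependencies whose strength exceeds the threshold
deps = ds.select(L, threshold=0.05)
print(deps)
# Structure learning should recover exactly the two pairs we correlated above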
assert deps == set([(0, 1, 0), (2, 3, 0)])
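
If you want to inspect the selected dependencies rather than only pass them along, a small sketch like the following may help. It assumes each tuple has the form (i, j, dependency type), which is inferred from the assertion above rather than taken from the library documentation; the names `lf_i`, `lf_j`, `dep_type`, and `by_type` are illustrative.

```python
from collections import defaultdict

# Same value as in the assertion above
deps = {(0, 1, 0), (2, 3, 0)}

# Assumption: each tuple is (lf_i, lf_j, dep_type)
by_type = defaultdict(list)
for lf_i, lf_j, dep_type in sorted(deps):
    by_type[dep_type].append((lf_i, lf_j))

# One line per dependency type, listing the pairs of labeling functions
for dep_type, pairs in by_type.items():
    print(f"dependency type {dep_type}: {pairs}")
```

Grouping by type becomes more useful once higher_order=True is passed to select, since the result can then contain more than one kind of dependency.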

Using the Learned Structure

To incorporate the selected dependencies into your generative model, pass them to train through the deps keyword argument:


In [ ]:
from snorkel.learning import GenerativeModel
gen_model = GenerativeModel()
gen_model.train(L, deps=deps)

In [ ]:
print(gen_model.weights.lf_accuracy)