In [ ]:
from snorkel.learning import GenerativeModelWeights
from snorkel.learning.structure import generate_label_matrix
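# Synthetic generative model over 10 labeling functions (LFs)
weights = GenerativeModelWeights(10)
# Give every LF the same accuracy weight
for i in range(10):
    weights.lf_accuracy[i] = 1.0
# Plant two pairwise dependencies: LFs 0 and 1, and LFs 2 and 3
weights.dep_similar[0, 1] = 0.5
weights.dep_similar[2, 3] = 0.5
# Sample 10,000 data points: y holds the labels, L the label matrix
y, L = generate_label_matrix(weights, 10000)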
L stands in for the label matrix that a LabelManager would produce; here it is generated synthetically so that the true dependencies are known.
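To make the structure of `L` concrete, here is a minimal pure-Python sketch of a synthetic label matrix containing one correlated pair of labeling functions. It is not Snorkel's `generate_label_matrix`: the names `toy_label_matrix`, `agreement`, `accuracy`, and `copy_prob` are hypothetical, and the sketch assumes every labeling function votes either -1 or +1 on every data point.

```python
import random


def toy_label_matrix(m, n_lfs=4, accuracy=0.8, copy_prob=0.5, seed=0):
    """Return (y, L): m labels and an m x n_lfs matrix of LF votes.

    Hypothetical illustration only: LF 1 repeats LF 0's vote with
    probability copy_prob, which makes the two LFs correlated.
    """
    rng = random.Random(seed)
    y, L = [], []
    for _ in range(m):
        label = rng.choice([-1, 1])
        # Each LF independently votes for the true label with probability `accuracy`.
        row = [label if rng.random() < accuracy else -label for _ in range(n_lfs)]
        # Plant a dependency: LF 1 sometimes just copies LF 0.
        if rng.random() < copy_prob:
            row[1] = row[0]
        y.append(label)
        L.append(row)
    return y, L


def agreement(L, i, j):
    """Fraction of rows on which LFs i and j cast the same vote."""
    return sum(row[i] == row[j] for row in L) / len(L)


y_toy, L_toy = toy_label_matrix(1000)
```

Comparing `agreement(L_toy, 0, 1)` with `agreement(L_toy, 2, 3)` should show the planted pair agreeing noticeably more often than an independent pair, which is the kind of signal structure learning looks for.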
A few notes:
- The deps object is a collection of tuples specifying which labeling functions are related by which types of dependencies.
- threshold is a positive float that indicates how strong a dependency has to be for it to be returned in the collection. Too many dependencies? Turn it up. Too few? Turn it down.
- DependencySelector looks for pairwise correlations between labeling functions. Pass the keyword argument higher_order=True to the select method to also look for reinforcing and fixing dependencies (described in the data programming paper).
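The effect of `threshold` can be sketched without Snorkel. The snippet below is a toy stand-in, not the library's selection algorithm: `strengths` is a made-up mapping from `(lf_i, lf_j, dep_type)` tuples to estimated dependency strengths, and `filter_deps` is a hypothetical helper that keeps only the tuples whose absolute strength exceeds the threshold.

```python
def filter_deps(strengths, threshold):
    """Keep dependencies whose absolute strength exceeds `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be a positive float")
    return {dep for dep, w in strengths.items() if abs(w) > threshold}


# Made-up strengths for illustration; tuples are (lf_i, lf_j, dep_type).
strengths = {
    (0, 1, 0): 0.48,
    (2, 3, 0): 0.51,
    (4, 7, 0): 0.03,
    (5, 6, 0): -0.08,
}

# A lower threshold admits weaker (possibly spurious) dependencies.
loose = filter_deps(strengths, threshold=0.05)
# A higher threshold keeps only the strongest ones.
strict = filter_deps(strengths, threshold=0.10)
```

With these invented numbers, raising the threshold from 0.05 to 0.10 drops the weak `(5, 6, 0)` entry while the two strong pairs remain, mirroring the "too many? turn it up" advice.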
In [ ]:
from snorkel.learning.structure import DependencySelector
ds = DependencySelector()
deps = ds.select(L, threshold=0.05)
print(deps)
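# Expect exactly the two planted pairs; the trailing 0 matches the
# similarity dependencies set via dep_similar in the synthetic weights.
assert deps == {(0, 1, 0), (2, 3, 0)}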
In [ ]:
from snorkel.learning import GenerativeModel
gen_model = GenerativeModel()
gen_model.train(L, deps=deps)
In [ ]:
print(gen_model.weights.lf_accuracy)