In [ ]:
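from snorkel.learning import GenerativeModelWeights
from snorkel.learning.structure import generate_label_matrix

# Ground-truth weights for 10 synthetic labeling functions
weights = GenerativeModelWeights(10)
for i in range(10):
    weights.lf_accuracy[i] = 1.0

# Plant two pairwise dependencies: LFs 0 and 1, and LFs 2 and 3
weights.dep_similar[0, 1] = 0.5
weights.dep_similar[2, 3] = 0.5

# Sample 10,000 data points: y holds the true labels, L the label matrix
y, L = generate_label_matrix(weights, 10000)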
`L` is the label matrix produced by a LabelManager.

A few notes:
- The `deps` object is a collection of tuples specifying which labeling functions are related by which types of dependencies.
- `threshold` is a positive float that indicates how strong the dependency has to be for it to be returned in the collection. Too many dependencies? Turn it up. Too few? Turn it down.
- `DependencySelector` looks for pairwise correlations between labeling functions. Pass the keyword argument `higher_order=True` to the `select` method to also look for reinforcing and fixing dependencies (described in the data programming paper).
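To make the role of `threshold` concrete, here is a minimal, standard-library sketch. It is not Snorkel's estimator: it swaps in a plain Pearson correlation between vote columns, and the names `pearson`, `toy_select`, and the toy data are hypothetical. It only illustrates the knob described in the notes: a higher threshold returns fewer pairs.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def toy_select(columns, threshold):
    """Hypothetical stand-in for a selector: return the index pairs whose
    absolute correlation exceeds the threshold."""
    return {
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if abs(pearson(columns[i], columns[j])) > threshold
    }


rng = random.Random(0)
m = 2000

# Four toy labeling functions that vote +1 or -1 at random (assumed data).
cols = [[rng.choice([-1, 1]) for _ in range(m)] for _ in range(4)]
# Make column 1 copy column 0 about half the time, so the pair is dependent.
cols[1] = [a if rng.random() < 0.5 else b for a, b in zip(cols[0], cols[1])]

loose = toy_select(cols, threshold=0.0)   # low threshold: noise pairs pass too
strict = toy_select(cols, threshold=0.3)  # higher threshold: only strong pairs

print("threshold=0.0:", sorted(loose))
print("threshold=0.3:", sorted(strict))
```

Raising the threshold can only shrink the returned set, which is why "too many dependencies" is addressed by turning it up.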
In [ ]:
from snorkel.learning.structure import DependencySelector

ds = DependencySelector()
deps = ds.select(L, threshold=0.05)
print(deps)

# The selector should recover the two dependencies set in the synthetic weights
assert deps == {(0, 1, 0), (2, 3, 0)}
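The tuples in `deps` can be hard to read at a glance, so a small helper that prints them may be useful. This is a sketch under assumptions: each tuple is read as (first labeling function, second labeling function, dependency type), and the only type code used is 0 for "similar", which is inferred from the example above (the `dep_similar` pairs come back with a trailing 0). The helper `describe` and the `DEP_NAMES` table are hypothetical and not part of Snorkel.

```python
# Assumed mapping from dependency type code to a name. Only code 0 is
# inferred from the example above; any other code is reported generically.
DEP_NAMES = {0: "similar"}


def describe(deps):
    """Return one readable line per (i, j, dep_type) tuple, in sorted order."""
    lines = []
    for i, j, dep_type in sorted(deps):
        name = DEP_NAMES.get(dep_type, "other (code %d)" % dep_type)
        lines.append("LF %d and LF %d: %s" % (i, j, name))
    return lines


# The set that the assertion above expects
example_deps = {(0, 1, 0), (2, 3, 0)}
for line in describe(example_deps):
    print(line)
```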
In [ ]:
from snorkel.learning import GenerativeModel

# Train the generative model, passing in the selected dependencies
gen_model = GenerativeModel()
gen_model.train(L, deps=deps)
In [ ]:
print(gen_model.weights.lf_accuracy)