CrowdTruth for Sparse Multiple Choice Tasks: Relation Extraction

In this tutorial, we will apply CrowdTruth metrics to a sparse multiple choice crowdsourcing task for Relation Extraction from sentences. The workers were asked to read a sentence with 2 highlighted terms, then pick from a multiple choice list what are the relations expressed between the 2 terms in the sentence. The options available in the multiple choice list change with the input sentence. The task was executed on FigureEight. For more crowdsourcing annotation task examples, click here.

To replicate this experiment, the code used to design and implement this crowdsourcing annotation template is available here: template, css, javascript.

This is a screenshot of the task as it appeared to workers:

A sample dataset for this task is available in this file, containing raw output from the crowd on FigureEight. Download the file and place it in a folder named data that has the same root as this notebook. Now you can check your data:



In [1]:

    
import pandas as pd

test_data = pd.read_csv("../data/relex-sparse-multiple-choice.csv")
test_data.head()









    Out[1]:







  
    
      
      _unit_id
      _created_at
      _id
      _started_at
      _tainted
      _channel
      _trust
      _worker_id
      _country
      _region
      ...
      term1
      b1
      e1
      b2
      term2
      e2
      sent_id
      sentence
      input_relations
      output_relations
    
  
  
    
      0
      897534786
      3/24/2016 17:57:02
      1933498788
      3/24/2016 17:56:23
      False
      prodege
      0.9724
      3587109
      NaN
      NaN
      ...
      Karim Benzema
      6
      8
      3
      Lyon
      4
      UAD-A-1535
      On Wednesday , Lyon led through Karim Benzema ...
      employee_or_member_of top_member_employee_of_o...
      top_member_employee_of_org
    
    
      1
      897534786
      3/24/2016 18:05:23
      1933504209
      3/24/2016 18:05:01
      False
      clixsense
      0.9667
      21665495
      NaN
      NaN
      ...
      Karim Benzema
      6
      8
      3
      Lyon
      4
      UAD-A-1535
      On Wednesday , Lyon led through Karim Benzema ...
      employee_or_member_of top_member_employee_of_o...
      place_of_death
    
    
      2
      897534786
      3/24/2016 18:07:31
      1933505542
      3/24/2016 18:04:46
      False
      neodev
      0.9443
      33110177
      NaN
      NaN
      ...
      Karim Benzema
      6
      8
      3
      Lyon
      4
      UAD-A-1535
      On Wednesday , Lyon led through Karim Benzema ...
      employee_or_member_of top_member_employee_of_o...
      employee_or_member_of
    
    
      3
      897534786
      3/24/2016 19:02:00
      1933542791
      3/24/2016 18:58:54
      False
      neodev
      0.9417
      16854635
      NaN
      NaN
      ...
      Karim Benzema
      6
      8
      3
      Lyon
      4
      UAD-A-1535
      On Wednesday , Lyon led through Karim Benzema ...
      employee_or_member_of top_member_employee_of_o...
      NONE
    
    
      4
      897534786
      3/24/2016 19:31:55
      1933572756
      3/24/2016 19:30:21
      False
      clixsense
      0.9543
      6344072
      NaN
      NaN
      ...
      Karim Benzema
      6
      8
      3
      Lyon
      4
      UAD-A-1535
      On Wednesday , Lyon led through Karim Benzema ...
      employee_or_member_of top_member_employee_of_o...
      employee_or_member_of
    
  

5 rows × 22 columns

Declaring a pre-processing configuration

The pre-processing configuration defines how to interpret the raw crowdsourcing input. To do this, we need to define a configuration class. First, we import the default CrowdTruth configuration class:



In [2]:

    
import crowdtruth
from crowdtruth.configuration import DefaultConfig

Our test class inherits the default configuration DefaultConfig, while also declaring some additional attributes that are specific to the Relation Extraction task:

inputColumns: list of input columns from the .csv file with the input data
outputColumns: list of output columns from the .csv file with the answers from the workers
annotation_separator: string that separates between the crowd annotations in outputColumns
open_ended_task: boolean variable defining whether the task is open-ended (i.e. the possible crowd annotations are not known beforehand, like in the case of free text input); in the task that we are processing, workers pick the answers from a pre-defined list, therefore the task is not open ended, and this variable is set to False
annotation_vector: list of possible crowd answers, mandatory to declare when open_ended_task is False; for our task, this is the list of all relations that were given as input to the crowd in at least one sentence
processJudgments: method that defines processing of the raw crowd data; for this task, we process the crowd answers to correspond to the values in annotation_vector

The complete configuration class is declared below:



In [3]:

    
class TestConfig(DefaultConfig):
    inputColumns = ["sent_id", "term1", "b1", "e1", "term2", "b2", "e2", "sentence", "input_relations"]
    outputColumns = ["output_relations"]
    
    annotation_separator = "\n"
    
    # processing of a closed task
    open_ended_task = False
    annotation_vector = [
        "title", "founded_org", "place_of_birth", "children", "cause_of_death",
        "top_member_employee_of_org", "employee_or_member_of", "spouse",
        "alternate_names", "subsidiaries", "place_of_death", "schools_attended",
        "place_of_headquarters", "charges", "origin", "places_of_residence",
        "none"]
    
    def processJudgments(self, judgments):
        # pre-process output to match the values in annotation_vector
        for col in self.outputColumns:
            # transform to lowercase
            judgments[col] = judgments[col].apply(lambda x: str(x).lower())
        return judgments

Pre-processing the input data

After declaring the configuration of our input file, we are ready to pre-process the crowd data:



In [4]:

    
data, config = crowdtruth.load(
    file = "../data/relex-sparse-multiple-choice.csv",
    config = TestConfig()
)

data['judgments'].head()









    Out[4]:







  
    
      
      output.output_relations
      output.output_relations.count
      output.output_relations.unique
      submitted
      started
      worker
      unit
      duration
      job
    
    
      judgment
      
      
      
      
      
      
      
      
      
    
  
  
    
      1933498788
      {u'top_member_employee_of_org': 1, u'title': 0...
      1
      17
      2016-03-24 17:57:02
      2016-03-24 17:56:23
      3587109
      897534786
      39
      ../data/relex-sparse-multiple-choice
    
    
      1933504209
      {u'place_of_death': 1, u'title': 0, u'founded_...
      1
      17
      2016-03-24 18:05:23
      2016-03-24 18:05:01
      21665495
      897534786
      22
      ../data/relex-sparse-multiple-choice
    
    
      1933505542
      {u'employee_or_member_of': 1, u'title': 0, u'f...
      1
      17
      2016-03-24 18:07:31
      2016-03-24 18:04:46
      33110177
      897534786
      165
      ../data/relex-sparse-multiple-choice
    
    
      1933542791
      {u'none': 1, u'title': 0, u'founded_org': 0, u...
      1
      17
      2016-03-24 19:02:00
      2016-03-24 18:58:54
      16854635
      897534786
      186
      ../data/relex-sparse-multiple-choice
    
    
      1933572756
      {u'employee_or_member_of': 1, u'title': 0, u'f...
      1
      17
      2016-03-24 19:31:55
      2016-03-24 19:30:21
      6344072
      897534786
      94
      ../data/relex-sparse-multiple-choice

Computing the CrowdTruth metrics

The pre-processed data can then be used to calculate the CrowdTruth metrics:



In [5]:

    
results = crowdtruth.run(data, config)

results is a dict object that contains the quality metrics for sentences, relations and crowd workers.

The sentence metrics are stored in results["units"]:



In [6]:

    
results["units"].head()









    Out[6]:







  
    
      
      duration
      input.b1
      input.b2
      input.e1
      input.e2
      input.input_relations
      input.sent_id
      input.sentence
      input.term1
      input.term2
      job
      output.output_relations
      output.output_relations.annotations
      output.output_relations.unique_annotations
      worker
      uqs
      unit_annotation_score
      uqs_initial
      unit_annotation_score_initial
    
    
      unit
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      897534786
      140.800000
      6
      3
      8
      4
      employee_or_member_of top_member_employee_of_o...
      UAD-A-1535
      On Wednesday , Lyon led through Karim Benzema ...
      Karim Benzema
      Lyon
      ../data/relex-sparse-multiple-choice
      {u'origin': 0, u'none': 5, u'spouse': 0, u'tit...
      16
      5
      15
      0.338982
      {u'origin': 0.0, u'none': 0.284653188515, u'su...
      0.285236
      {u'origin': 0.0, u'none': 0.333333333333, u'su...
    
    
      897534787
      48.533333
      23
      30
      25
      32
      top_member_employee_of_org employee_or_member_...
      UAD-A-2322
      `` We have all this library content , and we '...
      Jeff Zucker
      NBC Universal
      ../data/relex-sparse-multiple-choice
      {u'origin': 0, u'cause_of_death': 0, u'subsidi...
      19
      2
      15
      0.956508
      {u'origin': 0.0, u'cause_of_death': 0.0, u'spo...
      0.877264
      {u'origin': 0.0, u'cause_of_death': 0.0, u'spo...
    
    
      897534788
      190.933333
      0
      14
      2
      17
      places_of_residence origin subsidiaries cause_...
      UAD-A-0024
      Addie Wagenknecht ( born Portland , Oregon ) i...
      Addie Wagenknecht
      New York City
      ../data/relex-sparse-multiple-choice
      {u'origin': 1, u'cause_of_death': 0, u'subsidi...
      16
      2
      15
      0.983052
      {u'origin': 0.0682240976102, u'cause_of_death'...
      0.960948
      {u'origin': 0.0666666666667, u'cause_of_death'...
    
    
      897534789
      51.800000
      2
      0
      4
      1
      top_member_employee_of_org employee_or_member_...
      UAD-A-2211
      Toyota President Katsuaki Watanabe said Thursd...
      Katsuaki Watanabe
      Toyota
      ../data/relex-sparse-multiple-choice
      {u'origin': 0, u'cause_of_death': 1, u'subsidi...
      19
      4
      15
      0.922168
      {u'origin': 0.0, u'cause_of_death': 0.01837446...
      0.662288
      {u'origin': 0.0, u'cause_of_death': 0.06666666...
    
    
      897534790
      128.600000
      0
      23
      2
      26
      places_of_residence place_of_birth origin top_...
      UAD-A-0115
      Andrea Bargnani , nicknamed `` Il Mago '' ( tr...
      Andrea Bargnani
      Rome , Italy
      ../data/relex-sparse-multiple-choice
      {u'origin': 6, u'cause_of_death': 0, u'subsidi...
      21
      3
      15
      0.625736
      {u'origin': 0.368788829677, u'cause_of_death':...
      0.516149
      {u'origin': 0.4, u'cause_of_death': 0.0, u'spo...

The uqs column in results["units"] contains the sentence quality scores, capturing the overall workers agreement over each sentence. Here we plot its histogram:



In [7]:

    
import matplotlib.pyplot as plt
%matplotlib inline

plt.hist(results["units"]["uqs"])
plt.xlabel("Sentence Quality Score")
plt.ylabel("Sentences")









    Out[7]:





Text(0,0.5,u'Sentences')

The unit_annotation_score column in results["units"] contains the sentence-relation scores, capturing the likelihood that a relation is expressed in a sentence. For each sentence, we store a dictionary mapping each relation to its sentence-relation score.



In [8]:

    
results["units"]["unit_annotation_score"].head(10)









    Out[8]:





unit
897534786    {u'origin': 0.0, u'none': 0.284653188515, u'su...
897534787    {u'origin': 0.0, u'cause_of_death': 0.0, u'spo...
897534788    {u'origin': 0.0682240976102, u'cause_of_death'...
897534789    {u'origin': 0.0, u'cause_of_death': 0.01837446...
897534790    {u'origin': 0.368788829677, u'cause_of_death':...
897534791    {u'origin': 0.336839775711, u'cause_of_death':...
897534792    {u'origin': 0.137244254252, u'cause_of_death':...
897534793    {u'origin': 0.00423979229263, u'none': 0.97627...
897534794    {u'origin': 0.0812306534817, u'none': 0.052716...
897534795    {u'origin': 0.000757014504229, u'cause_of_deat...
Name: unit_annotation_score, dtype: object

The worker metrics are stored in results["workers"]:



In [9]:

    
results["workers"].head()









    Out[9]:







  
    
      
      duration
      job
      judgment
      unit
      wqs
      wwa
      wsa
      wqs_initial
      wwa_initial
      wsa_initial
    
    
      worker
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      3587109
      25.333333
      1
      3
      3
      0.501284
      0.682185
      7.348208e-01
      0.216892
      0.428170
      0.506556
    
    
      4316379
      30.000000
      1
      1
      1
      0.962213
      0.970964
      9.909865e-01
      0.881060
      0.916316
      0.961524
    
    
      4688131
      136.000000
      1
      1
      1
      0.973723
      0.973766
      9.999561e-01
      0.733078
      0.785714
      0.933008
    
    
      4711962
      35.000000
      1
      1
      1
      0.000000
      0.000000
      1.000000e-08
      0.000000
      0.000000
      0.000000
    
    
      6336109
      122.000000
      1
      1
      1
      0.973723
      0.973766
      9.999561e-01
      0.733078
      0.785714
      0.933008

The wqs columns in results["workers"] contains the worker quality scores, capturing the overall agreement between one worker and all the other workers.



In [10]:

    
plt.hist(results["workers"]["wqs"])
plt.xlabel("Worker Quality Score")
plt.ylabel("Workers")









    Out[10]:





Text(0,0.5,u'Workers')

The relation metrics are stored in results["annotations"]. The aqs column contains the relation quality scores, capturing the overall worker agreement over one relation.



In [11]:

    
results["annotations"]









    Out[11]:







  
    
      
      output.output_relations
      aqs
      aqs_initial
    
  
  
    
      alternate_names
      150
      1.000000e-08
      1.000000e-08
    
    
      cause_of_death
      150
      1.000000e-08
      1.000000e-08
    
    
      charges
      150
      1.000000e-08
      1.000000e-08
    
    
      children
      150
      1.000000e-08
      1.000000e-08
    
    
      employee_or_member_of
      150
      2.481372e-01
      2.492401e-01
    
    
      founded_org
      150
      1.000000e-08
      1.000000e-08
    
    
      none
      150
      8.383719e-01
      6.056911e-01
    
    
      origin
      150
      2.340059e-01
      2.371795e-01
    
    
      place_of_birth
      150
      6.943889e-01
      5.794393e-01
    
    
      place_of_death
      150
      3.706113e-03
      1.071429e-01
    
    
      place_of_headquarters
      150
      1.584687e-01
      9.523810e-02
    
    
      places_of_residence
      150
      7.787079e-01
      6.767554e-01
    
    
      schools_attended
      150
      1.000000e-08
      1.000000e-08
    
    
      spouse
      150
      1.000000e-08
      1.000000e-08
    
    
      subsidiaries
      150
      1.000000e-08
      1.000000e-08
    
    
      title
      150
      1.000000e-08
      1.000000e-08
    
    
      top_member_employee_of_org
      150
      9.444763e-01
      7.951334e-01

	_unit_id	_created_at	_id	_started_at	_tainted	_channel	_trust	_worker_id	_country	_region	...	term1	b1	e1	b2	term2	e2	sent_id	sentence	input_relations	output_relations
0	897534786	3/24/2016 17:57:02	1933498788	3/24/2016 17:56:23	False	prodege	0.9724	3587109	NaN	NaN	...	Karim Benzema	6	8	3	Lyon	4	UAD-A-1535	On Wednesday , Lyon led through Karim Benzema ...	employee_or_member_of top_member_employee_of_o...	top_member_employee_of_org
1	897534786	3/24/2016 18:05:23	1933504209	3/24/2016 18:05:01	False	clixsense	0.9667	21665495	NaN	NaN	...	Karim Benzema	6	8	3	Lyon	4	UAD-A-1535	On Wednesday , Lyon led through Karim Benzema ...	employee_or_member_of top_member_employee_of_o...	place_of_death
2	897534786	3/24/2016 18:07:31	1933505542	3/24/2016 18:04:46	False	neodev	0.9443	33110177	NaN	NaN	...	Karim Benzema	6	8	3	Lyon	4	UAD-A-1535	On Wednesday , Lyon led through Karim Benzema ...	employee_or_member_of top_member_employee_of_o...	employee_or_member_of
3	897534786	3/24/2016 19:02:00	1933542791	3/24/2016 18:58:54	False	neodev	0.9417	16854635	NaN	NaN	...	Karim Benzema	6	8	3	Lyon	4	UAD-A-1535	On Wednesday , Lyon led through Karim Benzema ...	employee_or_member_of top_member_employee_of_o...	NONE
4	897534786	3/24/2016 19:31:55	1933572756	3/24/2016 19:30:21	False	clixsense	0.9543	6344072	NaN	NaN	...	Karim Benzema	6	8	3	Lyon	4	UAD-A-1535	On Wednesday , Lyon led through Karim Benzema ...	employee_or_member_of top_member_employee_of_o...	employee_or_member_of

	output.output_relations	output.output_relations.count	output.output_relations.unique	submitted	started	worker	unit	duration	job
judgment
1933498788	{u'top_member_employee_of_org': 1, u'title': 0...	1	17	2016-03-24 17:57:02	2016-03-24 17:56:23	3587109	897534786	39	../data/relex-sparse-multiple-choice
1933504209	{u'place_of_death': 1, u'title': 0, u'founded_...	1	17	2016-03-24 18:05:23	2016-03-24 18:05:01	21665495	897534786	22	../data/relex-sparse-multiple-choice
1933505542	{u'employee_or_member_of': 1, u'title': 0, u'f...	1	17	2016-03-24 18:07:31	2016-03-24 18:04:46	33110177	897534786	165	../data/relex-sparse-multiple-choice
1933542791	{u'none': 1, u'title': 0, u'founded_org': 0, u...	1	17	2016-03-24 19:02:00	2016-03-24 18:58:54	16854635	897534786	186	../data/relex-sparse-multiple-choice
1933572756	{u'employee_or_member_of': 1, u'title': 0, u'f...	1	17	2016-03-24 19:31:55	2016-03-24 19:30:21	6344072	897534786	94	../data/relex-sparse-multiple-choice

	duration	input.b1	input.b2	input.e1	input.e2	input.input_relations	input.sent_id	input.sentence	input.term1	input.term2	job	output.output_relations	output.output_relations.annotations	output.output_relations.unique_annotations	worker	uqs	unit_annotation_score	uqs_initial	unit_annotation_score_initial
unit
897534786	140.800000	6	3	8	4	employee_or_member_of top_member_employee_of_o...	UAD-A-1535	On Wednesday , Lyon led through Karim Benzema ...	Karim Benzema	Lyon	../data/relex-sparse-multiple-choice	{u'origin': 0, u'none': 5, u'spouse': 0, u'tit...	16	5	15	0.338982	{u'origin': 0.0, u'none': 0.284653188515, u'su...	0.285236	{u'origin': 0.0, u'none': 0.333333333333, u'su...
897534787	48.533333	23	30	25	32	top_member_employee_of_org employee_or_member_...	UAD-A-2322	`` We have all this library content , and we '...	Jeff Zucker	NBC Universal	../data/relex-sparse-multiple-choice	{u'origin': 0, u'cause_of_death': 0, u'subsidi...	19	2	15	0.956508	{u'origin': 0.0, u'cause_of_death': 0.0, u'spo...	0.877264	{u'origin': 0.0, u'cause_of_death': 0.0, u'spo...
897534788	190.933333	0	14	2	17	places_of_residence origin subsidiaries cause_...	UAD-A-0024	Addie Wagenknecht ( born Portland , Oregon ) i...	Addie Wagenknecht	New York City	../data/relex-sparse-multiple-choice	{u'origin': 1, u'cause_of_death': 0, u'subsidi...	16	2	15	0.983052	{u'origin': 0.0682240976102, u'cause_of_death'...	0.960948	{u'origin': 0.0666666666667, u'cause_of_death'...
897534789	51.800000	2	0	4	1	top_member_employee_of_org employee_or_member_...	UAD-A-2211	Toyota President Katsuaki Watanabe said Thursd...	Katsuaki Watanabe	Toyota	../data/relex-sparse-multiple-choice	{u'origin': 0, u'cause_of_death': 1, u'subsidi...	19	4	15	0.922168	{u'origin': 0.0, u'cause_of_death': 0.01837446...	0.662288	{u'origin': 0.0, u'cause_of_death': 0.06666666...
897534790	128.600000	0	23	2	26	places_of_residence place_of_birth origin top_...	UAD-A-0115	Andrea Bargnani , nicknamed `` Il Mago '' ( tr...	Andrea Bargnani	Rome , Italy	../data/relex-sparse-multiple-choice	{u'origin': 6, u'cause_of_death': 0, u'subsidi...	21	3	15	0.625736	{u'origin': 0.368788829677, u'cause_of_death':...	0.516149	{u'origin': 0.4, u'cause_of_death': 0.0, u'spo...

	duration	job	judgment	unit	wqs	wwa	wsa	wqs_initial	wwa_initial	wsa_initial
worker
3587109	25.333333	1	3	3	0.501284	0.682185	7.348208e-01	0.216892	0.428170	0.506556
4316379	30.000000	1	1	1	0.962213	0.970964	9.909865e-01	0.881060	0.916316	0.961524
4688131	136.000000	1	1	1	0.973723	0.973766	9.999561e-01	0.733078	0.785714	0.933008
4711962	35.000000	1	1	1	0.000000	0.000000	1.000000e-08	0.000000	0.000000	0.000000
6336109	122.000000	1	1	1	0.973723	0.973766	9.999561e-01	0.733078	0.785714	0.933008

	output.output_relations	aqs	aqs_initial
alternate_names	150	1.000000e-08	1.000000e-08
cause_of_death	150	1.000000e-08	1.000000e-08
charges	150	1.000000e-08	1.000000e-08
children	150	1.000000e-08	1.000000e-08
employee_or_member_of	150	2.481372e-01	2.492401e-01
founded_org	150	1.000000e-08	1.000000e-08
none	150	8.383719e-01	6.056911e-01
origin	150	2.340059e-01	2.371795e-01
place_of_birth	150	6.943889e-01	5.794393e-01
place_of_death	150	3.706113e-03	1.071429e-01
place_of_headquarters	150	1.584687e-01	9.523810e-02
places_of_residence	150	7.787079e-01	6.767554e-01
schools_attended	150	1.000000e-08	1.000000e-08
spouse	150	1.000000e-08	1.000000e-08
subsidiaries	150	1.000000e-08	1.000000e-08
title	150	1.000000e-08	1.000000e-08
top_member_employee_of_org	150	9.444763e-01	7.951334e-01