A simple (i.e. no error checking or sensible engineering) notebook to extract the student answer data from an XML file.
The SemEval data used here was obtained from the SemEval 2013 website.
I'm not 100% sure what we actually need at the moment, so I'm just going to extract the student answer data from a single file. That is, I'm not going to use the reference answers etc. at first.
In [1]:
filename='semeval2013-task7/semeval2013-Task7-5way/beetle/train/Core/FaultFinding-BULB_C_VOLTAGE_EXPLAIN_WHY1.xml'
In [2]:
import pandas as pd
In [3]:
from xml.etree import ElementTree as ET
In [4]:
tree=ET.parse(filename)
The student answers are the third child node of the root of the tree:
In [5]:
r=tree.getroot()
r[2]
Out[5]:
<Element 'studentAnswers' at 0x114b01598>
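Indexing the root by position works for this file, but it depends on the order of the child elements. A minimal sketch of looking the element up by tag name instead, assuming the container is tagged studentAnswers as in the output above. The inline XML, the question root and the studentAnswer child tag are made up for illustration and are not copied from the dataset:

```python
from xml.etree import ElementTree as ET

# Hypothetical, cut-down stand-in for one of the task files.
sample_xml = """
<question>
  <questionText>Why?</questionText>
  <referenceAnswers/>
  <studentAnswers>
    <studentAnswer accuracy="correct">there is a gap</studentAnswer>
    <studentAnswer accuracy="non_domain">no</studentAnswer>
  </studentAnswers>
</question>
""".strip()

root = ET.fromstring(sample_xml)

# find() returns the first direct child with this tag, or None if there is none.
answers_el = root.find('studentAnswers')
if answers_el is None:
    raise ValueError('no studentAnswers element found')

# Same record layout as the notebook uses: accuracy label, text and a running index.
sample_ls = [{'accuracy': a.attrib['accuracy'], 'text': a.text, 'idx': i}
             for i, a in enumerate(answers_el)]
print(sample_ls)
```

Looking up by name also fails loudly if the element is missing, rather than silently picking up whatever happens to be third.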
Now iterate over the student answers to get the specific responses. For the moment, we'll just stick to the text and the accuracy. I'll also add an index term to make it a bit easier to convert to a dataframe.
In [6]:
responses_ls=[{'accuracy':a.attrib['accuracy'], 'text':a.text, 'idx':i} for (i, a) in enumerate(r[2])]
responses_ls
Out[6]:
[{'accuracy': 'correct',
'idx': 0,
'text': 'positive battery terminal is separated by a gap from terminal 1'},
{'accuracy': 'correct',
'idx': 1,
'text': 'terminal 1 is not connected to the positive terminal'},
{'accuracy': 'contradictory',
'idx': 2,
'text': 'Because terminal 1 is connected to the positive battery terminal'},
{'accuracy': 'contradictory',
'idx': 3,
'text': 'because terminal 1 is not seperated by any gaps'},
{'accuracy': 'contradictory',
'idx': 4,
'text': 'because terminal one is connected to both the negative and positive battery terminal'},
{'accuracy': 'non_domain', 'idx': 5, 'text': 'no'},
{'accuracy': 'non_domain', 'idx': 6, 'text': 'i do not understand'},
{'accuracy': 'correct',
'idx': 7,
'text': 'Terminal 1 is seperated from the positive terminal'},
{'accuracy': 'correct',
'idx': 8,
'text': 'the positive battery terminal is not connected to terminal 1.'},
{'accuracy': 'contradictory',
'idx': 9,
'text': 'because terminal one and the positive terminal are connected'},
{'accuracy': 'contradictory',
'idx': 10,
'text': 'because terminal one is connected to the positive battery terminal'},
{'accuracy': 'contradictory',
'idx': 11,
'text': 'because terminal one and the positive battery terminal are on a closed path'},
{'accuracy': 'correct',
'idx': 12,
'text': 'because there is a gap between terminal one and the positive battery terminal'},
{'accuracy': 'correct',
'idx': 13,
'text': 'termianl 1 is not connected to the positive battery terminal.'},
{'accuracy': 'contradictory',
'idx': 14,
'text': 'because there was no separation in the positive battery terminal and terminal 1.'},
{'accuracy': 'contradictory',
'idx': 15,
'text': 'because there was no gap in the positive battery terminal and terminal 1.'},
{'accuracy': 'correct',
'idx': 16,
'text': 'because there is a gap between the positive battery terminal and terminal 1.'},
{'accuracy': 'correct',
'idx': 17,
'text': 'Because there was a gap between the positive battery terminal and terminal 1.'},
{'accuracy': 'correct',
'idx': 18,
'text': 'because there was not direct connection between the positive terminal and bulb terminal 1'},
{'accuracy': 'partially_correct_incomplete',
'idx': 19,
'text': 'becaquse there was a gap in the connection'},
{'accuracy': 'partially_correct_incomplete',
'idx': 20,
'text': 'there was a gap'},
{'accuracy': 'correct',
'idx': 21,
'text': 'Terminal 1 is not connected to the positive terminal.'},
{'accuracy': 'contradictory',
'idx': 22,
'text': 'because it was connected to the positive terminal'},
{'accuracy': 'contradictory',
'idx': 23,
'text': 'terminal 2 was connected to the negative terminal'},
{'accuracy': 'contradictory', 'idx': 24, 'text': 'its connected'},
{'accuracy': 'contradictory',
'idx': 25,
'text': 'because the terminals are in the same state'},
{'accuracy': 'partially_correct_incomplete',
'idx': 26,
'text': 'the terminal is connected to the negative terminal of the battery'},
{'accuracy': 'contradictory',
'idx': 27,
'text': 'Terminal 1 is connected to the positive terminal.'},
{'accuracy': 'correct',
'idx': 28,
'text': 'Terminal 1 is connected to the negative terminal.'},
{'accuracy': 'contradictory', 'idx': 29, 'text': 'no damaged bulb'},
{'accuracy': 'contradictory',
'idx': 30,
'text': 'I get a 1.5 V reading because the negative terminal of battery is connected to terminal of bulb and then bulb is then connected to postive terminal of battery'},
{'accuracy': 'contradictory', 'idx': 31, 'text': '1 has no gap'},
{'accuracy': 'partially_correct_incomplete',
'idx': 32,
'text': 'the terminals are seperated'},
{'accuracy': 'partially_correct_incomplete',
'idx': 33,
'text': 'there is a gap'},
{'accuracy': 'partially_correct_incomplete',
'idx': 34,
'text': 'there is no connection'},
{'accuracy': 'partially_correct_incomplete',
'idx': 35,
'text': 'there is a separation'},
{'accuracy': 'correct',
'idx': 36,
'text': 'the positive battery terminal and terminal 1 are not connected'},
{'accuracy': 'partially_correct_incomplete',
'idx': 37,
'text': 'Because there was a gap'},
{'accuracy': 'contradictory',
'idx': 38,
'text': 'it was connected to a positive battery terminal'},
{'accuracy': 'partially_correct_incomplete',
'idx': 39,
'text': 'positive battery terminal'},
{'accuracy': 'correct',
'idx': 40,
'text': 'terminal one is connected to the negative terminal and terminal 1 is seperated from the positive terminal by a gap'},
{'accuracy': 'contradictory',
'idx': 41,
'text': 'there is no gap between the posittive terminal and terminal 1'},
{'accuracy': 'non_domain', 'idx': 42, 'text': 'tell me the answer'},
{'accuracy': 'contradictory',
'idx': 43,
'text': 'Terminal 1 is connected tot he positive terminal.'},
{'accuracy': 'irrelevant',
'idx': 44,
'text': 'Voltage is the difference between a positive and negative end on a battery.'},
{'accuracy': 'correct',
'idx': 45,
'text': 'the terminal is connected to the positive battery terminal'},
{'accuracy': 'partially_correct_incomplete',
'idx': 46,
'text': 'the terminal is connected to the battery'},
{'accuracy': 'partially_correct_incomplete',
'idx': 47,
'text': 'the terminal is connected to the battery terminal'},
{'accuracy': 'contradictory',
'idx': 48,
'text': 'there is not gap between the terminals'},
{'accuracy': 'correct',
'idx': 49,
'text': 'the terminal is not connected to the positive battery terminal'},
{'accuracy': 'correct',
'idx': 50,
'text': 'terminal 1 is not connected to the positive battery terminal'},
{'accuracy': 'contradictory',
'idx': 51,
'text': 'the battery is not connected to the terminal'},
{'accuracy': 'correct',
'idx': 52,
'text': 'the positive terminal is not connected to terminal 1'},
{'accuracy': 'contradictory',
'idx': 53,
'text': 'the positve battery terminal is separated by a gap at terminal 1'},
{'accuracy': 'irrelevant',
'idx': 54,
'text': 'Because there is no gap at terminal 1'},
{'accuracy': 'contradictory',
'idx': 55,
'text': 'there is a gap at terminal 1'},
{'accuracy': 'partially_correct_incomplete',
'idx': 56,
'text': 'the positive battery terminal is separated by a gap'},
{'accuracy': 'contradictory',
'idx': 57,
'text': 'The voltage is 1.5 because thebulb terminal is connected to the battery terminal.'},
{'accuracy': 'contradictory',
'idx': 58,
'text': 'the voltage is 1.5 because he bulb terminal is connected to the battery terminal.'},
{'accuracy': 'contradictory',
'idx': 59,
'text': 'there is no gap between terminal 1 and the psoitive terminal'},
{'accuracy': 'non_domain', 'idx': 60, 'text': 'i dint know.'},
{'accuracy': 'partially_correct_incomplete',
'idx': 61,
'text': 'the two components are seperated.'},
{'accuracy': 'contradictory',
'idx': 62,
'text': 'The positive terminal is not seperated by a gap.'},
{'accuracy': 'contradictory',
'idx': 63,
'text': 'A terminal is connected to a bulb.'},
{'accuracy': 'partially_correct_incomplete',
'idx': 64,
'text': 'A terminal is not connected to the positive battery terminal.'},
{'accuracy': 'correct',
'idx': 65,
'text': 'the positive terminal and terminal 1 are seperated by a gap'},
{'accuracy': 'correct',
'idx': 66,
'text': 'because the positive terminal is separated with a gap from terminal 1'},
{'accuracy': 'correct',
'idx': 67,
'text': 'the gap separates the positive battery terminal from terminal 1'},
{'accuracy': 'contradictory',
'idx': 68,
'text': 'positive connected to negative terminal'},
{'accuracy': 'contradictory', 'idx': 69, 'text': 'positive charge'},
{'accuracy': 'partially_correct_incomplete',
'idx': 70,
'text': 'terminal connected to negative'},
{'accuracy': 'correct',
'idx': 71,
'text': 'terminal 1 is connected to the negative terminal'},
{'accuracy': 'contradictory',
'idx': 72,
'text': 'The terminal is connected to a positive circuit.'},
{'accuracy': 'partially_correct_incomplete',
'idx': 73,
'text': 'It was separted by a gap.'},
{'accuracy': 'correct', 'idx': 74, 'text': 'the terminals are not connected'},
{'accuracy': 'correct',
'idx': 75,
'text': 'The positive battery was not connected to terminal one.'},
{'accuracy': 'irrelevant', 'idx': 76, 'text': 'Different electrical states.'},
{'accuracy': 'correct',
'idx': 77,
'text': 'Terminal 1 is not connected to the positive battery terminal.'},
{'accuracy': 'contradictory',
'idx': 78,
'text': 'because there was a positive and negative connection'},
{'accuracy': 'partially_correct_incomplete',
'idx': 79,
'text': 'Because they have different electrical states'},
{'accuracy': 'correct',
'idx': 80,
'text': 'The positive battery terminal is separated by a gap from terminal 1.'},
{'accuracy': 'partially_correct_incomplete',
'idx': 81,
'text': 'because it is not connected to the positive battery'},
{'accuracy': 'contradictory', 'idx': 82, 'text': 'the bulb was not damaged'},
{'accuracy': 'non_domain', 'idx': 83, 'text': 'I dont know'},
{'accuracy': 'contradictory',
'idx': 84,
'text': 'Terminal 1 is connected to the positive battery terminal.'},
{'accuracy': 'contradictory',
'idx': 85,
'text': 'Terminal 1 was connected to the positive terminal.'},
{'accuracy': 'contradictory',
'idx': 86,
'text': 'it is connected to the batteries positive terminal'},
{'accuracy': 'contradictory',
'idx': 87,
'text': 'Because they are not damaged.'},
{'accuracy': 'correct', 'idx': 88, 'text': 'Because they are not connected.'},
{'accuracy': 'correct',
'idx': 89,
'text': 'terminal 1 is connected to the negative battery terminal'},
{'accuracy': 'correct', 'idx': 90, 'text': 'the two aren"t connected'},
{'accuracy': 'correct',
'idx': 91,
'text': 'the terminals are not connected to each other'},
{'accuracy': 'correct',
'idx': 92,
'text': 'the terminals are not connected to each other.'},
{'accuracy': 'partially_correct_incomplete',
'idx': 93,
'text': 'because there is a gap'},
{'accuracy': 'contradictory',
'idx': 94,
'text': 'Because their was a connected circuit.'},
{'accuracy': 'partially_correct_incomplete',
'idx': 95,
'text': 'Terminal 1 and the positive terminal had different electrical states.'},
{'accuracy': 'contradictory',
'idx': 96,
'text': 'Becuase the + was making contact with terminal 1, closing the battery"s circuit.'},
{'accuracy': 'contradictory',
'idx': 97,
'text': 'Because bulb c created a gap so then it had both a positive and negative charge.'},
{'accuracy': 'contradictory',
'idx': 98,
'text': 'Its connected to the positive terminal'},
{'accuracy': 'correct',
'idx': 99,
'text': 'The positive battery is separated by a gap with terminal 1.'},
{'accuracy': 'contradictory',
'idx': 100,
'text': 'Terminal 1 is not connected to the battery.'},
{'accuracy': 'contradictory',
'idx': 101,
'text': 'There was a connection between terminal 1 and the positive terminal'},
{'accuracy': 'correct',
'idx': 102,
'text': 'Terminal one is not connected to the positive battery terminal'}]
Next, we need to carry out whatever analysis we want on the answers. In this case, we'll split on whitespace, convert to lower case, and strip punctuation. Feel free to redefine the to_tokens function to do whatever analysis you prefer.
In [7]:
from string import punctuation
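def to_tokens(textIn):
    '''Convert the input textIn to a list of tokens'''
    # lower-case each whitespace-separated chunk and strip leading/trailing punctuation
    tokens_ls=[t.lower().strip(punctuation) for t in textIn.split()]
    # remove any empty tokens
    return [t for t in tokens_ls if t]
# called test_str rather than str, so the built-in str is not shadowed
test_str='"Help!" yelped the banana, who was obviously scared out of his skin.'
print(test_str)
print(to_tokens(test_str))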
"Help!" yelped the banana, who was obviously scared out of his skin.
['help', 'yelped', 'the', 'banana', 'who', 'was', 'obviously', 'scared', 'out', 'of', 'his', 'skin']
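It is worth seeing what this tokeniser does at the edges. The sketch below repeats the same logic (with a renamed argument, text_in) and runs it on fragments of three of the student answers listed earlier. It only illustrates the behaviour and is not a recommendation for any particular tokenisation:

```python
from string import punctuation


def to_tokens(text_in):
    '''Split on whitespace, lower-case, strip punctuation from both ends.'''
    tokens_ls = [t.lower().strip(punctuation) for t in text_in.split()]
    # remove any empty tokens
    return [t for t in tokens_ls if t]


# strip() only touches the ends of a token, so inner punctuation survives:
# the decimal point in 1.5 and the stray double quote in aren"t are kept.
print(to_tokens('The voltage is 1.5 because'))
print(to_tokens('the two aren"t connected'))

# A token made up only of punctuation becomes empty and is dropped.
print(to_tokens('Becuase the + was making contact'))
```

So the "+" in the last answer disappears from the token list altogether, which may or may not matter for later analysis.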
So now we can apply the to_tokens function to each of the student responses:
In [8]:
for resp_dict in responses_ls:
    resp_dict['tokens']=to_tokens(resp_dict['text'])
responses_ls
Out[8]:
[{'accuracy': 'correct',
'idx': 0,
'text': 'positive battery terminal is separated by a gap from terminal 1',
'tokens': ['positive',
'battery',
'terminal',
'is',
'separated',
'by',
'a',
'gap',
'from',
'terminal',
'1']},
{'accuracy': 'correct',
'idx': 1,
'text': 'terminal 1 is not connected to the positive terminal',
'tokens': ['terminal',
'1',
'is',
'not',
'connected',
'to',
'the',
'positive',
'terminal']},
{'accuracy': 'contradictory',
'idx': 2,
'text': 'Because terminal 1 is connected to the positive battery terminal',
'tokens': ['because',
'terminal',
'1',
'is',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 3,
'text': 'because terminal 1 is not seperated by any gaps',
'tokens': ['because',
'terminal',
'1',
'is',
'not',
'seperated',
'by',
'any',
'gaps']},
{'accuracy': 'contradictory',
'idx': 4,
'text': 'because terminal one is connected to both the negative and positive battery terminal',
'tokens': ['because',
'terminal',
'one',
'is',
'connected',
'to',
'both',
'the',
'negative',
'and',
'positive',
'battery',
'terminal']},
{'accuracy': 'non_domain', 'idx': 5, 'text': 'no', 'tokens': ['no']},
{'accuracy': 'non_domain',
'idx': 6,
'text': 'i do not understand',
'tokens': ['i', 'do', 'not', 'understand']},
{'accuracy': 'correct',
'idx': 7,
'text': 'Terminal 1 is seperated from the positive terminal',
'tokens': ['terminal',
'1',
'is',
'seperated',
'from',
'the',
'positive',
'terminal']},
{'accuracy': 'correct',
'idx': 8,
'text': 'the positive battery terminal is not connected to terminal 1.',
'tokens': ['the',
'positive',
'battery',
'terminal',
'is',
'not',
'connected',
'to',
'terminal',
'1']},
{'accuracy': 'contradictory',
'idx': 9,
'text': 'because terminal one and the positive terminal are connected',
'tokens': ['because',
'terminal',
'one',
'and',
'the',
'positive',
'terminal',
'are',
'connected']},
{'accuracy': 'contradictory',
'idx': 10,
'text': 'because terminal one is connected to the positive battery terminal',
'tokens': ['because',
'terminal',
'one',
'is',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 11,
'text': 'because terminal one and the positive battery terminal are on a closed path',
'tokens': ['because',
'terminal',
'one',
'and',
'the',
'positive',
'battery',
'terminal',
'are',
'on',
'a',
'closed',
'path']},
{'accuracy': 'correct',
'idx': 12,
'text': 'because there is a gap between terminal one and the positive battery terminal',
'tokens': ['because',
'there',
'is',
'a',
'gap',
'between',
'terminal',
'one',
'and',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'correct',
'idx': 13,
'text': 'termianl 1 is not connected to the positive battery terminal.',
'tokens': ['termianl',
'1',
'is',
'not',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 14,
'text': 'because there was no separation in the positive battery terminal and terminal 1.',
'tokens': ['because',
'there',
'was',
'no',
'separation',
'in',
'the',
'positive',
'battery',
'terminal',
'and',
'terminal',
'1']},
{'accuracy': 'contradictory',
'idx': 15,
'text': 'because there was no gap in the positive battery terminal and terminal 1.',
'tokens': ['because',
'there',
'was',
'no',
'gap',
'in',
'the',
'positive',
'battery',
'terminal',
'and',
'terminal',
'1']},
{'accuracy': 'correct',
'idx': 16,
'text': 'because there is a gap between the positive battery terminal and terminal 1.',
'tokens': ['because',
'there',
'is',
'a',
'gap',
'between',
'the',
'positive',
'battery',
'terminal',
'and',
'terminal',
'1']},
{'accuracy': 'correct',
'idx': 17,
'text': 'Because there was a gap between the positive battery terminal and terminal 1.',
'tokens': ['because',
'there',
'was',
'a',
'gap',
'between',
'the',
'positive',
'battery',
'terminal',
'and',
'terminal',
'1']},
{'accuracy': 'correct',
'idx': 18,
'text': 'because there was not direct connection between the positive terminal and bulb terminal 1',
'tokens': ['because',
'there',
'was',
'not',
'direct',
'connection',
'between',
'the',
'positive',
'terminal',
'and',
'bulb',
'terminal',
'1']},
{'accuracy': 'partially_correct_incomplete',
'idx': 19,
'text': 'becaquse there was a gap in the connection',
'tokens': ['becaquse',
'there',
'was',
'a',
'gap',
'in',
'the',
'connection']},
{'accuracy': 'partially_correct_incomplete',
'idx': 20,
'text': 'there was a gap',
'tokens': ['there', 'was', 'a', 'gap']},
{'accuracy': 'correct',
'idx': 21,
'text': 'Terminal 1 is not connected to the positive terminal.',
'tokens': ['terminal',
'1',
'is',
'not',
'connected',
'to',
'the',
'positive',
'terminal']},
{'accuracy': 'contradictory',
'idx': 22,
'text': 'because it was connected to the positive terminal',
'tokens': ['because',
'it',
'was',
'connected',
'to',
'the',
'positive',
'terminal']},
{'accuracy': 'contradictory',
'idx': 23,
'text': 'terminal 2 was connected to the negative terminal',
'tokens': ['terminal',
'2',
'was',
'connected',
'to',
'the',
'negative',
'terminal']},
{'accuracy': 'contradictory',
'idx': 24,
'text': 'its connected',
'tokens': ['its', 'connected']},
{'accuracy': 'contradictory',
'idx': 25,
'text': 'because the terminals are in the same state',
'tokens': ['because',
'the',
'terminals',
'are',
'in',
'the',
'same',
'state']},
{'accuracy': 'partially_correct_incomplete',
'idx': 26,
'text': 'the terminal is connected to the negative terminal of the battery',
'tokens': ['the',
'terminal',
'is',
'connected',
'to',
'the',
'negative',
'terminal',
'of',
'the',
'battery']},
{'accuracy': 'contradictory',
'idx': 27,
'text': 'Terminal 1 is connected to the positive terminal.',
'tokens': ['terminal',
'1',
'is',
'connected',
'to',
'the',
'positive',
'terminal']},
{'accuracy': 'correct',
'idx': 28,
'text': 'Terminal 1 is connected to the negative terminal.',
'tokens': ['terminal',
'1',
'is',
'connected',
'to',
'the',
'negative',
'terminal']},
{'accuracy': 'contradictory',
'idx': 29,
'text': 'no damaged bulb',
'tokens': ['no', 'damaged', 'bulb']},
{'accuracy': 'contradictory',
'idx': 30,
'text': 'I get a 1.5 V reading because the negative terminal of battery is connected to terminal of bulb and then bulb is then connected to postive terminal of battery',
'tokens': ['i',
'get',
'a',
'1.5',
'v',
'reading',
'because',
'the',
'negative',
'terminal',
'of',
'battery',
'is',
'connected',
'to',
'terminal',
'of',
'bulb',
'and',
'then',
'bulb',
'is',
'then',
'connected',
'to',
'postive',
'terminal',
'of',
'battery']},
{'accuracy': 'contradictory',
'idx': 31,
'text': '1 has no gap',
'tokens': ['1', 'has', 'no', 'gap']},
{'accuracy': 'partially_correct_incomplete',
'idx': 32,
'text': 'the terminals are seperated',
'tokens': ['the', 'terminals', 'are', 'seperated']},
{'accuracy': 'partially_correct_incomplete',
'idx': 33,
'text': 'there is a gap',
'tokens': ['there', 'is', 'a', 'gap']},
{'accuracy': 'partially_correct_incomplete',
'idx': 34,
'text': 'there is no connection',
'tokens': ['there', 'is', 'no', 'connection']},
{'accuracy': 'partially_correct_incomplete',
'idx': 35,
'text': 'there is a separation',
'tokens': ['there', 'is', 'a', 'separation']},
{'accuracy': 'correct',
'idx': 36,
'text': 'the positive battery terminal and terminal 1 are not connected',
'tokens': ['the',
'positive',
'battery',
'terminal',
'and',
'terminal',
'1',
'are',
'not',
'connected']},
{'accuracy': 'partially_correct_incomplete',
'idx': 37,
'text': 'Because there was a gap',
'tokens': ['because', 'there', 'was', 'a', 'gap']},
{'accuracy': 'contradictory',
'idx': 38,
'text': 'it was connected to a positive battery terminal',
'tokens': ['it',
'was',
'connected',
'to',
'a',
'positive',
'battery',
'terminal']},
{'accuracy': 'partially_correct_incomplete',
'idx': 39,
'text': 'positive battery terminal',
'tokens': ['positive', 'battery', 'terminal']},
{'accuracy': 'correct',
'idx': 40,
'text': 'terminal one is connected to the negative terminal and terminal 1 is seperated from the positive terminal by a gap',
'tokens': ['terminal',
'one',
'is',
'connected',
'to',
'the',
'negative',
'terminal',
'and',
'terminal',
'1',
'is',
'seperated',
'from',
'the',
'positive',
'terminal',
'by',
'a',
'gap']},
{'accuracy': 'contradictory',
'idx': 41,
'text': 'there is no gap between the posittive terminal and terminal 1',
'tokens': ['there',
'is',
'no',
'gap',
'between',
'the',
'posittive',
'terminal',
'and',
'terminal',
'1']},
{'accuracy': 'non_domain',
'idx': 42,
'text': 'tell me the answer',
'tokens': ['tell', 'me', 'the', 'answer']},
{'accuracy': 'contradictory',
'idx': 43,
'text': 'Terminal 1 is connected tot he positive terminal.',
'tokens': ['terminal',
'1',
'is',
'connected',
'tot',
'he',
'positive',
'terminal']},
{'accuracy': 'irrelevant',
'idx': 44,
'text': 'Voltage is the difference between a positive and negative end on a battery.',
'tokens': ['voltage',
'is',
'the',
'difference',
'between',
'a',
'positive',
'and',
'negative',
'end',
'on',
'a',
'battery']},
{'accuracy': 'correct',
'idx': 45,
'text': 'the terminal is connected to the positive battery terminal',
'tokens': ['the',
'terminal',
'is',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'partially_correct_incomplete',
'idx': 46,
'text': 'the terminal is connected to the battery',
'tokens': ['the', 'terminal', 'is', 'connected', 'to', 'the', 'battery']},
{'accuracy': 'partially_correct_incomplete',
'idx': 47,
'text': 'the terminal is connected to the battery terminal',
'tokens': ['the',
'terminal',
'is',
'connected',
'to',
'the',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 48,
'text': 'there is not gap between the terminals',
'tokens': ['there', 'is', 'not', 'gap', 'between', 'the', 'terminals']},
{'accuracy': 'correct',
'idx': 49,
'text': 'the terminal is not connected to the positive battery terminal',
'tokens': ['the',
'terminal',
'is',
'not',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'correct',
'idx': 50,
'text': 'terminal 1 is not connected to the positive battery terminal',
'tokens': ['terminal',
'1',
'is',
'not',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 51,
'text': 'the battery is not connected to the terminal',
'tokens': ['the',
'battery',
'is',
'not',
'connected',
'to',
'the',
'terminal']},
{'accuracy': 'correct',
'idx': 52,
'text': 'the positive terminal is not connected to terminal 1',
'tokens': ['the',
'positive',
'terminal',
'is',
'not',
'connected',
'to',
'terminal',
'1']},
{'accuracy': 'contradictory',
'idx': 53,
'text': 'the positve battery terminal is separated by a gap at terminal 1',
'tokens': ['the',
'positve',
'battery',
'terminal',
'is',
'separated',
'by',
'a',
'gap',
'at',
'terminal',
'1']},
{'accuracy': 'irrelevant',
'idx': 54,
'text': 'Because there is no gap at terminal 1',
'tokens': ['because', 'there', 'is', 'no', 'gap', 'at', 'terminal', '1']},
{'accuracy': 'contradictory',
'idx': 55,
'text': 'there is a gap at terminal 1',
'tokens': ['there', 'is', 'a', 'gap', 'at', 'terminal', '1']},
{'accuracy': 'partially_correct_incomplete',
'idx': 56,
'text': 'the positive battery terminal is separated by a gap',
'tokens': ['the',
'positive',
'battery',
'terminal',
'is',
'separated',
'by',
'a',
'gap']},
{'accuracy': 'contradictory',
'idx': 57,
'text': 'The voltage is 1.5 because thebulb terminal is connected to the battery terminal.',
'tokens': ['the',
'voltage',
'is',
'1.5',
'because',
'thebulb',
'terminal',
'is',
'connected',
'to',
'the',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 58,
'text': 'the voltage is 1.5 because he bulb terminal is connected to the battery terminal.',
'tokens': ['the',
'voltage',
'is',
'1.5',
'because',
'he',
'bulb',
'terminal',
'is',
'connected',
'to',
'the',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 59,
'text': 'there is no gap between terminal 1 and the psoitive terminal',
'tokens': ['there',
'is',
'no',
'gap',
'between',
'terminal',
'1',
'and',
'the',
'psoitive',
'terminal']},
{'accuracy': 'non_domain',
'idx': 60,
'text': 'i dint know.',
'tokens': ['i', 'dint', 'know']},
{'accuracy': 'partially_correct_incomplete',
'idx': 61,
'text': 'the two components are seperated.',
'tokens': ['the', 'two', 'components', 'are', 'seperated']},
{'accuracy': 'contradictory',
'idx': 62,
'text': 'The positive terminal is not seperated by a gap.',
'tokens': ['the',
'positive',
'terminal',
'is',
'not',
'seperated',
'by',
'a',
'gap']},
{'accuracy': 'contradictory',
'idx': 63,
'text': 'A terminal is connected to a bulb.',
'tokens': ['a', 'terminal', 'is', 'connected', 'to', 'a', 'bulb']},
{'accuracy': 'partially_correct_incomplete',
'idx': 64,
'text': 'A terminal is not connected to the positive battery terminal.',
'tokens': ['a',
'terminal',
'is',
'not',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'correct',
'idx': 65,
'text': 'the positive terminal and terminal 1 are seperated by a gap',
'tokens': ['the',
'positive',
'terminal',
'and',
'terminal',
'1',
'are',
'seperated',
'by',
'a',
'gap']},
{'accuracy': 'correct',
'idx': 66,
'text': 'because the positive terminal is separated with a gap from terminal 1',
'tokens': ['because',
'the',
'positive',
'terminal',
'is',
'separated',
'with',
'a',
'gap',
'from',
'terminal',
'1']},
{'accuracy': 'correct',
'idx': 67,
'text': 'the gap separates the positive battery terminal from terminal 1',
'tokens': ['the',
'gap',
'separates',
'the',
'positive',
'battery',
'terminal',
'from',
'terminal',
'1']},
{'accuracy': 'contradictory',
'idx': 68,
'text': 'positive connected to negative terminal',
'tokens': ['positive', 'connected', 'to', 'negative', 'terminal']},
{'accuracy': 'contradictory',
'idx': 69,
'text': 'positive charge',
'tokens': ['positive', 'charge']},
{'accuracy': 'partially_correct_incomplete',
'idx': 70,
'text': 'terminal connected to negative',
'tokens': ['terminal', 'connected', 'to', 'negative']},
{'accuracy': 'correct',
'idx': 71,
'text': 'terminal 1 is connected to the negative terminal',
'tokens': ['terminal',
'1',
'is',
'connected',
'to',
'the',
'negative',
'terminal']},
{'accuracy': 'contradictory',
'idx': 72,
'text': 'The terminal is connected to a positive circuit.',
'tokens': ['the',
'terminal',
'is',
'connected',
'to',
'a',
'positive',
'circuit']},
{'accuracy': 'partially_correct_incomplete',
'idx': 73,
'text': 'It was separted by a gap.',
'tokens': ['it', 'was', 'separted', 'by', 'a', 'gap']},
{'accuracy': 'correct',
'idx': 74,
'text': 'the terminals are not connected',
'tokens': ['the', 'terminals', 'are', 'not', 'connected']},
{'accuracy': 'correct',
'idx': 75,
'text': 'The positive battery was not connected to terminal one.',
'tokens': ['the',
'positive',
'battery',
'was',
'not',
'connected',
'to',
'terminal',
'one']},
{'accuracy': 'irrelevant',
'idx': 76,
'text': 'Different electrical states.',
'tokens': ['different', 'electrical', 'states']},
{'accuracy': 'correct',
'idx': 77,
'text': 'Terminal 1 is not connected to the positive battery terminal.',
'tokens': ['terminal',
'1',
'is',
'not',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 78,
'text': 'because there was a positive and negative connection',
'tokens': ['because',
'there',
'was',
'a',
'positive',
'and',
'negative',
'connection']},
{'accuracy': 'partially_correct_incomplete',
'idx': 79,
'text': 'Because they have different electrical states',
'tokens': ['because', 'they', 'have', 'different', 'electrical', 'states']},
{'accuracy': 'correct',
'idx': 80,
'text': 'The positive battery terminal is separated by a gap from terminal 1.',
'tokens': ['the',
'positive',
'battery',
'terminal',
'is',
'separated',
'by',
'a',
'gap',
'from',
'terminal',
'1']},
{'accuracy': 'partially_correct_incomplete',
'idx': 81,
'text': 'because it is not connected to the positive battery',
'tokens': ['because',
'it',
'is',
'not',
'connected',
'to',
'the',
'positive',
'battery']},
{'accuracy': 'contradictory',
'idx': 82,
'text': 'the bulb was not damaged',
'tokens': ['the', 'bulb', 'was', 'not', 'damaged']},
{'accuracy': 'non_domain',
'idx': 83,
'text': 'I dont know',
'tokens': ['i', 'dont', 'know']},
{'accuracy': 'contradictory',
'idx': 84,
'text': 'Terminal 1 is connected to the positive battery terminal.',
'tokens': ['terminal',
'1',
'is',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']},
{'accuracy': 'contradictory',
'idx': 85,
'text': 'Terminal 1 was connected to the positive terminal.',
'tokens': ['terminal',
'1',
'was',
'connected',
'to',
'the',
'positive',
'terminal']},
{'accuracy': 'contradictory',
'idx': 86,
'text': 'it is connected to the batteries positive terminal',
'tokens': ['it',
'is',
'connected',
'to',
'the',
'batteries',
'positive',
'terminal']},
{'accuracy': 'contradictory',
'idx': 87,
'text': 'Because they are not damaged.',
'tokens': ['because', 'they', 'are', 'not', 'damaged']},
{'accuracy': 'correct',
'idx': 88,
'text': 'Because they are not connected.',
'tokens': ['because', 'they', 'are', 'not', 'connected']},
{'accuracy': 'correct',
'idx': 89,
'text': 'terminal 1 is connected to the negative battery terminal',
'tokens': ['terminal',
'1',
'is',
'connected',
'to',
'the',
'negative',
'battery',
'terminal']},
{'accuracy': 'correct',
'idx': 90,
'text': 'the two aren"t connected',
'tokens': ['the', 'two', 'aren"t', 'connected']},
{'accuracy': 'correct',
'idx': 91,
'text': 'the terminals are not connected to each other',
'tokens': ['the',
'terminals',
'are',
'not',
'connected',
'to',
'each',
'other']},
{'accuracy': 'correct',
'idx': 92,
'text': 'the terminals are not connected to each other.',
'tokens': ['the',
'terminals',
'are',
'not',
'connected',
'to',
'each',
'other']},
{'accuracy': 'partially_correct_incomplete',
'idx': 93,
'text': 'because there is a gap',
'tokens': ['because', 'there', 'is', 'a', 'gap']},
{'accuracy': 'contradictory',
'idx': 94,
'text': 'Because their was a connected circuit.',
'tokens': ['because', 'their', 'was', 'a', 'connected', 'circuit']},
{'accuracy': 'partially_correct_incomplete',
'idx': 95,
'text': 'Terminal 1 and the positive terminal had different electrical states.',
'tokens': ['terminal',
'1',
'and',
'the',
'positive',
'terminal',
'had',
'different',
'electrical',
'states']},
{'accuracy': 'contradictory',
'idx': 96,
'text': 'Becuase the + was making contact with terminal 1, closing the battery"s circuit.',
'tokens': ['becuase',
'the',
'was',
'making',
'contact',
'with',
'terminal',
'1',
'closing',
'the',
'battery"s',
'circuit']},
{'accuracy': 'contradictory',
'idx': 97,
'text': 'Because bulb c created a gap so then it had both a positive and negative charge.',
'tokens': ['because',
'bulb',
'c',
'created',
'a',
'gap',
'so',
'then',
'it',
'had',
'both',
'a',
'positive',
'and',
'negative',
'charge']},
{'accuracy': 'contradictory',
'idx': 98,
'text': 'Its connected to the positive terminal',
'tokens': ['its', 'connected', 'to', 'the', 'positive', 'terminal']},
{'accuracy': 'correct',
'idx': 99,
'text': 'The positive battery is separated by a gap with terminal 1.',
'tokens': ['the',
'positive',
'battery',
'is',
'separated',
'by',
'a',
'gap',
'with',
'terminal',
'1']},
{'accuracy': 'contradictory',
'idx': 100,
'text': 'Terminal 1 is not connected to the battery.',
'tokens': ['terminal',
'1',
'is',
'not',
'connected',
'to',
'the',
'battery']},
{'accuracy': 'contradictory',
'idx': 101,
'text': 'There was a connection between terminal 1 and the positive terminal',
'tokens': ['there',
'was',
'a',
'connection',
'between',
'terminal',
'1',
'and',
'the',
'positive',
'terminal']},
{'accuracy': 'correct',
'idx': 102,
'text': 'Terminal one is not connected to the positive battery terminal',
'tokens': ['terminal',
'one',
'is',
'not',
'connected',
'to',
'the',
'positive',
'battery',
'terminal']}]
OK, good. So now let's see how big the vocabulary is for the complete set:
In [9]:
vocab_set=set()
for resp_dict in responses_ls:
    vocab_set=vocab_set.union(set(resp_dict['tokens']))
len(vocab_set)
Out[9]:
97
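The same vocabulary can also be built with a single set comprehension. A small sketch on a toy list (sample_ls is a made-up stand-in for responses_ls, holding the tokens of three of the answers above):

```python
# Toy stand-in for responses_ls: only the 'tokens' key is needed here.
sample_ls = [
    {'tokens': ['there', 'was', 'a', 'gap']},
    {'tokens': ['there', 'is', 'a', 'gap']},
    {'tokens': ['its', 'connected']},
]

# One pass over every token of every response; the set drops the duplicates.
sample_vocab = {t for resp in sample_ls for t in resp['tokens']}

print(len(sample_vocab))
print(sorted(sample_vocab))
```

This avoids creating a new set on every iteration, although for a file of this size the difference is negligible.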
Now we can set up a document frequency dict:
In [10]:
docFreq_dict={}
for t in vocab_set:
    docFreq_dict[t]=len([resp_dict for resp_dict in responses_ls if t in resp_dict['tokens']])
docFreq_dict
Out[10]:
{'1': 40,
'1.5': 3,
'2': 1,
'a': 31,
'and': 20,
'answer': 1,
'any': 1,
'are': 12,
'aren"t': 1,
'at': 3,
'batteries': 1,
'battery': 39,
'battery"s': 1,
'becaquse': 1,
'because': 28,
'becuase': 1,
'between': 9,
'both': 2,
'bulb': 7,
'by': 10,
'c': 1,
'charge': 2,
'circuit': 3,
'closed': 1,
'closing': 1,
'components': 1,
'connected': 50,
'connection': 5,
'contact': 1,
'created': 1,
'damaged': 3,
'difference': 1,
'different': 3,
'dint': 1,
'direct': 1,
'do': 1,
'dont': 1,
'each': 2,
'electrical': 3,
'end': 1,
'from': 6,
'gap': 27,
'gaps': 1,
'get': 1,
'had': 2,
'has': 1,
'have': 1,
'he': 2,
'i': 4,
'in': 4,
'is': 54,
'it': 6,
'its': 2,
'know': 2,
'making': 1,
'me': 1,
'negative': 13,
'no': 9,
'not': 26,
'of': 2,
'on': 2,
'one': 8,
'other': 2,
'path': 1,
'positive': 52,
'posittive': 1,
'positve': 1,
'postive': 1,
'psoitive': 1,
'reading': 1,
'same': 1,
'separated': 6,
'separates': 1,
'separation': 2,
'separted': 1,
'seperated': 7,
'so': 1,
'state': 1,
'states': 3,
'tell': 1,
'termianl': 1,
'terminal': 68,
'terminals': 6,
'the': 71,
'thebulb': 1,
'their': 1,
'then': 2,
'there': 20,
'they': 3,
'to': 42,
'tot': 1,
'two': 2,
'understand': 1,
'v': 1,
'voltage': 3,
'was': 18,
'with': 3}
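The loop above scans every response once per vocabulary item. One alternative, sketched here on a toy list (sample_ls and doc_freq are illustrative names, not part of the notebook), is to count in a single pass with collections.Counter:

```python
from collections import Counter

# Toy stand-in for responses_ls: only the 'tokens' key is needed here.
sample_ls = [
    {'tokens': ['there', 'was', 'a', 'gap']},
    {'tokens': ['there', 'is', 'a', 'gap']},
    {'tokens': ['its', 'connected']},
]

doc_freq = Counter()
for resp in sample_ls:
    # set() makes sure a token repeated within one response is counted once,
    # so the result is a document frequency rather than a raw term count.
    doc_freq.update(set(resp['tokens']))

print(dict(doc_freq))
```

The counts should match what the list-comprehension version gives, since both count the number of responses containing each token.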
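Now add a tf.idf dict to each of the responses. The weight used here is simply the number of times the token occurs in the response divided by its document frequency: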
In [11]:
for resp_dict in responses_ls:
    resp_dict['tfidf']={t:resp_dict['tokens'].count(t)/docFreq_dict[t] for t in resp_dict['tokens']}
responses_ls[6]
Out[11]:
{'accuracy': 'non_domain',
'idx': 6,
'text': 'i do not understand',
'tfidf': {'do': 1.0,
'i': 0.25,
'not': 0.038461538461538464,
'understand': 1.0},
'tokens': ['i', 'do', 'not', 'understand']}
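For comparison, a commonly used variant of tf.idf multiplies the term count by log(N / df), where N is the number of documents, instead of dividing by df. The sketch below shows that variant on a toy list. It is not what this notebook computes, and sample_ls, n_docs, doc_freq and tfidf_log are illustrative names only:

```python
import math

# Toy stand-in for responses_ls, using the tokens of three of the answers above.
sample_ls = [
    {'tokens': ['there', 'is', 'a', 'gap']},
    {'tokens': ['there', 'is', 'no', 'connection']},
    {'tokens': ['there', 'was', 'a', 'gap']},
]

n_docs = len(sample_ls)

# Document frequency: number of responses that contain each token.
doc_freq = {}
for resp in sample_ls:
    for t in set(resp['tokens']):
        doc_freq[t] = doc_freq.get(t, 0) + 1


def tfidf_log(tokens):
    '''Term count times log(N / df) for each distinct token.'''
    return {t: tokens.count(t) * math.log(n_docs / doc_freq[t])
            for t in set(tokens)}


weights = tfidf_log(sample_ls[1]['tokens'])
print(weights)
```

With the log form, a token that occurs in every response (here 'there') gets a weight of zero, whereas the count / df form used in the notebook still gives it a small positive weight.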
Finally, convert the response data into a dataframe:
In [14]:
out_df=pd.DataFrame(index=docFreq_dict.keys())
for resp_dict in responses_ls:
    out_df[resp_dict['idx']]=pd.Series(resp_dict['tfidf'], index=out_df.index)
out_df=out_df.fillna(0).T
out_df.head()
Out[14]:
   its       the      from  components    one  are  other  bulb  any  different  ...  understand   no    v  postive   he  circuit       to  get   at  making
0  0.0  0.000000  0.166667         0.0  0.000  0.0    0.0   0.0  0.0        0.0  ...         0.0  0.0  0.0      0.0  0.0      0.0  0.00000  0.0  0.0     0.0
1  0.0  0.014085  0.000000         0.0  0.000  0.0    0.0   0.0  0.0        0.0  ...         0.0  0.0  0.0      0.0  0.0      0.0  0.02381  0.0  0.0     0.0
2  0.0  0.014085  0.000000         0.0  0.000  0.0    0.0   0.0  0.0        0.0  ...         0.0  0.0  0.0      0.0  0.0      0.0  0.02381  0.0  0.0     0.0
3  0.0  0.000000  0.000000         0.0  0.000  0.0    0.0   0.0  1.0        0.0  ...         0.0  0.0  0.0      0.0  0.0      0.0  0.00000  0.0  0.0     0.0
4  0.0  0.014085  0.000000         0.0  0.125  0.0    0.0   0.0  0.0        0.0  ...         0.0  0.0  0.0      0.0  0.0      0.0  0.02381  0.0  0.0     0.0
5 rows × 97 columns
In [15]:
accuracy_ss=pd.Series({r['idx']:r['accuracy'] for r in responses_ls})
accuracy_ss.head()
Out[15]:
0 correct
1 correct
2 contradictory
3 contradictory
4 contradictory
dtype: object
Content source: undercertainty/ou_nlp