A simple (i.e. no error checking or sensible engineering) notebook to extract the student answer data from a single XML file.

I'll also export the data to a CSV file at the end, so that it's easy to read in at the beginning of another notebook.

Following discussions with Suraj, we want the representation to take into account the student's response, the official answer, and the grade. So there'll be a little fiddliness in linking each student response back to the gold-standard answer it matches.

So, first read the file:


In [25]:
filename='semeval2013-task7/semeval2013-Task7-5way/beetle/train/Core/FaultFinding-BULB_C_VOLTAGE_EXPLAIN_WHY1.xml'

It's an XML file, so we'll need the xml.etree parser, and pandas so that we can import the results into a dataframe:


In [26]:
import pandas as pd

from xml.etree import ElementTree as ET

In [27]:
tree=ET.parse(filename)

r=tree.getroot()

Now, the reference answers are in the second daughter of the root node. We can extract these and store them in a dictionary. To distinguish between reference answer tokens and student response tokens, I'm going to suffix each token in the reference answers with _RA, and each token in a student response with _SR.


In [28]:
from string import punctuation

def to_tokens(textIn):
    '''Convert the input textIn to a list of tokens'''
    tokens_ls=[t.lower().strip(punctuation) for t in textIn.split()]
    # remove any empty tokens
    return [t for t in tokens_ls if t]

str='"Help!" yelped the banana, who was obviously scared out of his skin.'
print(str)
print(to_tokens(str))


"Help!" yelped the banana, who was obviously scared out of his skin.
['help', 'yelped', 'the', 'banana', 'who', 'was', 'obviously', 'scared', 'out', 'of', 'his', 'skin']
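
Note that strip(punctuation) only removes punctuation from the ends of tokens, so punctuation inside a word survives (the document frequencies below contain tokens like aren"t_SR and battery"s_SR for exactly this reason). If that mattered, a regex-based tokenizer would be stricter. A minimal sketch, not what's used in the rest of this notebook:

import re

def to_tokens_re(textIn):
    '''Keep only runs of lowercase letters, digits and apostrophes'''
    return re.findall(r"[a-z0-9']+", textIn.lower())

print(to_tokens_re('"Help!" yelped the banana.'))
# ['help', 'yelped', 'the', 'banana']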

In [29]:
refAnswers_dict={refAnswer.attrib['id']:[t+'_RA' for t in to_tokens(refAnswer.text)] 
                 for refAnswer in r[1]}    
refAnswers_dict


Out[29]:
{'answer204': ['terminal_RA',
  '1_RA',
  'and_RA',
  'the_RA',
  'positive_RA',
  'terminal_RA',
  'are_RA',
  'separated_RA',
  'by_RA',
  'the_RA',
  'gap_RA'],
 'answer205': ['terminal_RA',
  '1_RA',
  'and_RA',
  'the_RA',
  'positive_RA',
  'terminal_RA',
  'are_RA',
  'not_RA',
  'connected_RA'],
 'answer206': ['terminal_RA',
  '1_RA',
  'is_RA',
  'connected_RA',
  'to_RA',
  'the_RA',
  'negative_RA',
  'battery_RA',
  'terminal_RA'],
 'answer207': ['terminal_RA',
  '1_RA',
  'is_RA',
  'not_RA',
  'separated_RA',
  'from_RA',
  'the_RA',
  'negative_RA',
  'battery_RA',
  'terminal_RA'],
 'answer207.NEW': ['terminal_RA',
  '1_RA',
  'and_RA',
  'the_RA',
  'positive_RA',
  'battery_RA',
  'terminal_RA',
  'are_RA',
  'in_RA',
  'different_RA',
  'electrical_RA',
  'states_RA']}

Next, we need to extract each of the student responses. These are in the third daughter of the root node:


In [30]:
print(r[2][0].text)
r[2][0].attrib


positive battery terminal is separated by a gap from terminal 1
Out[30]:
{'accuracy': 'correct',
 'answerMatch': 'answer204',
 'count': '1',
 'id': 'FaultFinding-BULB_C_VOLTAGE_EXPLAIN_WHY1.sbj3-l1.qa193'}

In [31]:
responses_ls=[]
for studentResponse in r[2]:
    if 'answerMatch' in studentResponse.attrib:
        matchTokens_ls=refAnswers_dict[studentResponse.attrib['answerMatch']]
    else:
        # No gold-standard match recorded, so only the student's own tokens are kept
        matchTokens_ls=[]
    responses_ls.append({'accuracy':studentResponse.attrib['accuracy'],
                         'text':studentResponse.text,
                         'tokens':[t+'_SR' for t in to_tokens(studentResponse.text)] + matchTokens_ls})

responses_ls[36]


Out[31]:
{'accuracy': 'correct',
 'text': 'the positive battery terminal and terminal 1 are not connected',
 'tokens': ['the_SR',
  'positive_SR',
  'battery_SR',
  'terminal_SR',
  'and_SR',
  'terminal_SR',
  '1_SR',
  'are_SR',
  'not_SR',
  'connected_SR',
  'terminal_RA',
  '1_RA',
  'and_RA',
  'the_RA',
  'positive_RA',
  'terminal_RA',
  'are_RA',
  'not_RA',
  'connected_RA']}

That seems to work. Now, let's define a function that takes a filename and returns the list of token dictionaries:


In [32]:
def extract_token_dictionaries(filenameIn):
    
    # Localise the to_tokens function
    def to_tokens_local(textIn):
        '''Convert the input textIn to a list of tokens'''
        tokens_ls=[t.lower().strip(punctuation) for t in textIn.split()]
        # remove any empty tokens
        return [t for t in tokens_ls if t]

    tree=ET.parse(filenameIn)
    root=tree.getroot()
    
    refAnswers_dict={refAnswer.attrib['id']:[t+'_RA' for t in to_tokens_local(refAnswer.text)]
                     for refAnswer in root[1]}

    responsesOut_ls=[]
    for studentResponse in root[2]:
        if 'answerMatch' in studentResponse.attrib:
            matchTokens_ls=refAnswers_dict[studentResponse.attrib['answerMatch']]
        else:
            matchTokens_ls=[]
        responsesOut_ls.append({'accuracy':studentResponse.attrib['accuracy'],
                                'text':studentResponse.text,
                                'tokens':[t+'_SR' for t in to_tokens_local(studentResponse.text)]
                                         + matchTokens_ls})
    return responsesOut_ls

We now have a function which takes a filename and returns a list of tokenised student responses and reference answers:


In [33]:
extract_token_dictionaries(filename)[:2]


Out[33]:
[{'accuracy': 'correct',
  'text': 'positive battery terminal is separated by a gap from terminal 1',
  'tokens': ['positive_SR',
   'battery_SR',
   'terminal_SR',
   'is_SR',
   'separated_SR',
   'by_SR',
   'a_SR',
   'gap_SR',
   'from_SR',
   'terminal_SR',
   '1_SR',
   'terminal_RA',
   '1_RA',
   'and_RA',
   'the_RA',
   'positive_RA',
   'terminal_RA',
   'are_RA',
   'separated_RA',
   'by_RA',
   'the_RA',
   'gap_RA']},
 {'accuracy': 'correct',
  'text': 'terminal 1 is not connected to the positive terminal',
  'tokens': ['terminal_SR',
   '1_SR',
   'is_SR',
   'not_SR',
   'connected_SR',
   'to_SR',
   'the_SR',
   'positive_SR',
   'terminal_SR',
   'terminal_RA',
   '1_RA',
   'and_RA',
   'the_RA',
   'positive_RA',
   'terminal_RA',
   'are_RA',
   'not_RA',
   'connected_RA']}]

So next we need to be able to build a document frequency dictionary from a list of tokenised documents: for each token, the number of documents it appears in.


In [34]:
def document_frequencies(listOfTokenLists):
    # Build the set of all tokens used:
    token_set=set()
    for tokenList in listOfTokenLists:
        token_set=token_set.union(set(tokenList))

    # Then return, for each token, the number of documents it appears in
    return {t:len([l for l in listOfTokenLists if t in l])
            for t in token_set}
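
(As an aside, the same counts can be built in a single pass by feeding each document's set of distinct tokens into a collections.Counter — a sketch, equivalent in output but quicker on larger collections:)

from collections import Counter

def document_frequencies_fast(listOfTokenLists):
    # Each document contributes each of its distinct tokens exactly once
    df_counter=Counter()
    for tokenList in listOfTokenLists:
        df_counter.update(set(tokenList))
    return dict(df_counter)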

In [35]:
tokenLists_ls=[x['tokens'] for x in extract_token_dictionaries(filename)]
document_frequencies(tokenLists_ls)


Out[35]:
{'1.5_SR': 3,
 '1_RA': 55,
 '1_SR': 40,
 '2_SR': 1,
 'a_SR': 31,
 'and_RA': 48,
 'and_SR': 20,
 'answer_SR': 1,
 'any_SR': 1,
 'are_RA': 48,
 'are_SR': 12,
 'aren"t_SR': 1,
 'at_SR': 3,
 'batteries_SR': 1,
 'battery"s_SR': 1,
 'battery_RA': 7,
 'battery_SR': 39,
 'becaquse_SR': 1,
 'because_SR': 28,
 'becuase_SR': 1,
 'between_SR': 9,
 'both_SR': 2,
 'bulb_SR': 7,
 'by_RA': 26,
 'by_SR': 10,
 'c_SR': 1,
 'charge_SR': 2,
 'circuit_SR': 3,
 'closed_SR': 1,
 'closing_SR': 1,
 'components_SR': 1,
 'connected_RA': 29,
 'connected_SR': 50,
 'connection_SR': 5,
 'contact_SR': 1,
 'created_SR': 1,
 'damaged_SR': 3,
 'difference_SR': 1,
 'different_SR': 3,
 'dint_SR': 1,
 'direct_SR': 1,
 'do_SR': 1,
 'dont_SR': 1,
 'each_SR': 2,
 'electrical_SR': 3,
 'end_SR': 1,
 'from_SR': 6,
 'gap_RA': 26,
 'gap_SR': 27,
 'gaps_SR': 1,
 'get_SR': 1,
 'had_SR': 2,
 'has_SR': 1,
 'have_SR': 1,
 'he_SR': 2,
 'i_SR': 4,
 'in_SR': 4,
 'is_RA': 7,
 'is_SR': 54,
 'it_SR': 6,
 'its_SR': 2,
 'know_SR': 2,
 'making_SR': 1,
 'me_SR': 1,
 'negative_RA': 7,
 'negative_SR': 13,
 'no_SR': 9,
 'not_RA': 22,
 'not_SR': 26,
 'of_SR': 2,
 'on_SR': 2,
 'one_SR': 8,
 'other_SR': 2,
 'path_SR': 1,
 'positive_RA': 48,
 'positive_SR': 52,
 'posittive_SR': 1,
 'positve_SR': 1,
 'postive_SR': 1,
 'psoitive_SR': 1,
 'reading_SR': 1,
 'same_SR': 1,
 'separated_RA': 26,
 'separated_SR': 6,
 'separates_SR': 1,
 'separation_SR': 2,
 'separted_SR': 1,
 'seperated_SR': 7,
 'so_SR': 1,
 'state_SR': 1,
 'states_SR': 3,
 'tell_SR': 1,
 'termianl_SR': 1,
 'terminal_RA': 55,
 'terminal_SR': 68,
 'terminals_SR': 6,
 'the_RA': 55,
 'the_SR': 71,
 'thebulb_SR': 1,
 'their_SR': 1,
 'then_SR': 2,
 'there_SR': 20,
 'they_SR': 3,
 'to_RA': 7,
 'to_SR': 42,
 'tot_SR': 1,
 'two_SR': 2,
 'understand_SR': 1,
 'v_SR': 1,
 'voltage_SR': 3,
 'was_SR': 18,
 'with_SR': 3}

Next, define a function which takes a list of tokens and a document frequency dictionary, and returns a dictionary of the tf.idf values for each of the tokens in the list. (Strictly speaking, what's computed below is term frequency divided by raw document frequency, rather than the textbook tf times log(N/df) weighting, but it serves the same purpose here.) Note: for this function, if a token isn't in the document frequency dictionary, then it won't be returned in the tf.idf dictionary.

We can use the collections.Counter object to get the tf values.


In [36]:
from collections import Counter

In [37]:
def get_tfidf(tokens_ls, docFreq_dict):
    tf_dict=Counter(tokens_ls)
    return {t:tf_dict[t]/docFreq_dict[t] for t in tf_dict if t in docFreq_dict}

In [38]:
get_tfidf('the cat sat on the mat'.split(), {'cat':2, 'the':1})


Out[38]:
{'cat': 0.5, 'the': 2.0}
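
For comparison, a version with the usual logarithmic idf weighting would look something like this (a sketch, assuming the total document count nDocs is also passed in; it isn't used below):

from collections import Counter
from math import log

def get_tfidf_log(tokens_ls, docFreq_dict, nDocs):
    '''Term frequency weighted by log(N/df), the textbook tf.idf'''
    tf_dict=Counter(tokens_ls)
    return {t:tf_dict[t]*log(nDocs/docFreq_dict[t])
            for t in tf_dict if t in docFreq_dict}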

Finally, we want to convert the outputs for all of the responses into a dataframe.


In [39]:
# Extract the data from the file:
tokenDictionaries_ls=extract_token_dictionaries(filename)

# Build the lists of tokens, one per response (reusing the data we just extracted):
tokenLists_ls=[x['tokens'] for x in tokenDictionaries_ls]

# Build the document frequency dict
docFreq_dict=document_frequencies(tokenLists_ls)

# Create the tf.idf for each response:
tfidf_ls=[get_tfidf(tokens_ls, docFreq_dict) for tokens_ls in tokenLists_ls]

# Now, create a dataframe which is indexed by the tokens
# in the document frequency dictionary:
trainingText_df=pd.DataFrame(index=docFreq_dict.keys())

# Use the index of each response in the list as its column header:
for (i, tfidf_dict) in enumerate(tfidf_ls):
    trainingText_df[i]=pd.Series(tfidf_dict, index=trainingText_df.index)

# Finally, transpose, and replace the NaNs with 0:
trainingText_df.fillna(0).T


Out[39]:
positive_RA same_SR 1_SR the_RA is_SR batteries_SR negative_RA connected_SR positve_SR psoitive_SR ... v_SR have_SR path_SR end_SR dont_SR so_SR tell_SR are_RA terminal_RA created_SR
0 0.020833 0.0 0.025 0.036364 0.018519 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
1 0.020833 0.0 0.025 0.018182 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
2 0.000000 0.0 0.025 0.000000 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
3 0.000000 0.0 0.025 0.000000 0.018519 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
4 0.000000 0.0 0.000 0.000000 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
5 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
6 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
7 0.020833 0.0 0.025 0.036364 0.018519 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
8 0.020833 0.0 0.025 0.018182 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
9 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
10 0.000000 0.0 0.000 0.000000 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
11 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
12 0.020833 0.0 0.000 0.036364 0.018519 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
13 0.020833 0.0 0.025 0.018182 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
14 0.000000 0.0 0.025 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
15 0.000000 0.0 0.025 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
16 0.020833 0.0 0.025 0.036364 0.018519 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
17 0.020833 0.0 0.025 0.036364 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
18 0.020833 0.0 0.025 0.018182 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
19 0.020833 0.0 0.000 0.036364 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
20 0.020833 0.0 0.000 0.036364 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
21 0.020833 0.0 0.025 0.018182 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
22 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
23 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
24 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
25 0.000000 1.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
26 0.000000 0.0 0.000 0.018182 0.018519 0.0 0.142857 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.036364 0.0
27 0.000000 0.0 0.025 0.000000 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
28 0.000000 0.0 0.025 0.018182 0.018519 0.0 0.142857 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.036364 0.0
29 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
73 0.020833 0.0 0.000 0.036364 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
74 0.020833 0.0 0.000 0.018182 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
75 0.020833 0.0 0.000 0.018182 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
76 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
77 0.020833 0.0 0.025 0.018182 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
78 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
79 0.020833 0.0 0.000 0.036364 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
80 0.020833 0.0 0.025 0.036364 0.018519 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
81 0.020833 0.0 0.000 0.018182 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
82 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
83 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.000000 0.000000 0.0
84 0.000000 0.0 0.025 0.000000 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
85 0.000000 0.0 0.025 0.000000 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
86 0.000000 0.0 0.000 0.000000 0.018519 1.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
87 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
88 0.020833 0.0 0.000 0.018182 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
89 0.000000 0.0 0.025 0.018182 0.018519 0.0 0.142857 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.036364 0.0
90 0.020833 0.0 0.000 0.018182 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
91 0.020833 0.0 0.000 0.018182 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
92 0.020833 0.0 0.000 0.018182 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
93 0.020833 0.0 0.000 0.036364 0.018519 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
94 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
95 0.020833 0.0 0.025 0.036364 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
96 0.000000 0.0 0.025 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
97 0.020833 0.0 0.000 0.036364 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.020833 0.036364 1.0
98 0.000000 0.0 0.000 0.000000 0.000000 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
99 0.020833 0.0 0.025 0.036364 0.018519 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
100 0.020833 0.0 0.025 0.018182 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0
101 0.000000 0.0 0.025 0.000000 0.000000 0.0 0.000000 0.00 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0
102 0.020833 0.0 0.000 0.018182 0.018519 0.0 0.000000 0.02 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.020833 0.036364 0.0

103 rows × 112 columns

Cool, that seems to work. Now we just need to do it for the complete set of files; we'll use beetle/train/Core for the time being.


In [40]:
!ls semeval2013-task7/semeval2013-Task7-5way/beetle/train/Core/


FaultFinding-BULB_C_VOLTAGE_EXPLAIN_WHY1.xml
FaultFinding-BULB_C_VOLTAGE_EXPLAIN_WHY2.xml
FaultFinding-BULB_C_VOLTAGE_EXPLAIN_WHY6.xml
FaultFinding-BULB_ONLY_EXPLAIN_WHY2.xml
FaultFinding-BULB_ONLY_EXPLAIN_WHY4.xml
FaultFinding-BULB_ONLY_EXPLAIN_WHY6.xml
FaultFinding-BURNED_BULB_LOCATE_EXPLAIN_Q.xml
FaultFinding-OTHER_TERMINAL_STATE_EXPLAIN_Q.xml
FaultFinding-TERMINAL_STATE_EXPLAIN_Q.xml
FaultFinding-VOLTAGE_AND_GAP_DISCUSS_Q.xml
FaultFinding-VOLTAGE_DEFINE_Q.xml
FaultFinding-VOLTAGE_DIFF_DISCUSS_1_Q.xml
FaultFinding-VOLTAGE_DIFF_DISCUSS_2_Q.xml
FaultFinding-VOLTAGE_GAP_EXPLAIN_WHY1.xml
FaultFinding-VOLTAGE_GAP_EXPLAIN_WHY3.xml
FaultFinding-VOLTAGE_GAP_EXPLAIN_WHY4.xml
FaultFinding-VOLTAGE_GAP_EXPLAIN_WHY5.xml
FaultFinding-VOLTAGE_GAP_EXPLAIN_WHY6.xml
FaultFinding-VOLTAGE_INCOMPLETE_CIRCUIT_2_Q.xml
SwitchesBulbsParallel-BURNED_BULB_PARALLEL_EXPLAIN_Q1.xml
SwitchesBulbsParallel-BURNED_BULB_PARALLEL_EXPLAIN_Q2.xml
SwitchesBulbsParallel-BURNED_BULB_PARALLEL_EXPLAIN_Q3.xml
SwitchesBulbsParallel-BURNED_BULB_PARALLEL_WHY_Q.xml
SwitchesBulbsParallel-GIVE_CIRCUIT_TYPE_HYBRID_EXPLAIN_Q2.xml
SwitchesBulbsParallel-GIVE_CIRCUIT_TYPE_HYBRID_EXPLAIN_Q3.xml
SwitchesBulbsParallel-GIVE_CIRCUIT_TYPE_PARALLEL_EXPLAIN_Q2.xml
SwitchesBulbsParallel-HYBRID_BURNED_OUT_EXPLAIN_Q1.xml
SwitchesBulbsParallel-HYBRID_BURNED_OUT_EXPLAIN_Q3.xml
SwitchesBulbsParallel-HYBRID_BURNED_OUT_WHY_Q2.xml
SwitchesBulbsParallel-HYBRID_BURNED_OUT_WHY_Q3.xml
SwitchesBulbsParallel-OPT1_EXPLAIN_Q2.xml
SwitchesBulbsParallel-OPT2_EXPLAIN_Q.xml
SwitchesBulbsParallel-PARALLEL_SWITCH_EXPLAIN_Q1.xml
SwitchesBulbsParallel-PARALLEL_SWITCH_EXPLAIN_Q2.xml
SwitchesBulbsParallel-PARALLEL_SWITCH_EXPLAIN_Q3.xml
SwitchesBulbsParallel-SWITCH_TABLE_EXPLAIN_Q1.xml
SwitchesBulbsParallel-SWITCH_TABLE_EXPLAIN_Q2.xml
SwitchesBulbsParallel-SWITCH_TABLE_EXPLAIN_Q3.xml
SwitchesBulbsSeries-CONDITIONS_FOR_BULB_TO_LIGHT.xml
SwitchesBulbsSeries-DAMAGED_BUILD_EXPLAIN_Q.xml
SwitchesBulbsSeries-DAMAGED_BULB_EXPLAIN_2_Q.xml
SwitchesBulbsSeries-GIVE_CIRCUIT_TYPE_SERIES_EXPLAIN_Q.xml
SwitchesBulbsSeries-SHORT_CIRCUIT_EXPLAIN_Q_2.xml
SwitchesBulbsSeries-SHORT_CIRCUIT_EXPLAIN_Q_4.xml
SwitchesBulbsSeries-SHORT_CIRCUIT_EXPLAIN_Q_5.xml
SwitchesBulbsSeries-SHORT_CIRCUIT_X_Q.xml
SwitchesBulbsSeries-SWITCH_OPEN_EXPLAIN_Q.xml

Use os.walk to get the files:


In [41]:
import os

We can now do the same as before, but this time using all the files to construct the final dataframe. We also need a series containing the accuracy measures.


In [42]:
tokenDictionaries_ls=[]

# glob would have been easier...
for (root, dirs, files) in os.walk('semeval2013-task7/semeval2013-Task7-5way/beetle/train/Core/'):
    for filename in files:
        if filename.endswith('.xml'):
            tokenDictionaries_ls.extend(extract_token_dictionaries(os.path.join(root, filename)))

# Now we've extracted the information from all the files. We can now construct the dataframe
# in the same way as before:

# Build the lists of responses:
tokenLists_ls=[x['tokens'] for x in tokenDictionaries_ls]

# Build the document frequency dict
docFreq_dict=document_frequencies(tokenLists_ls)

# Now, create a dataframe which is indexed by the tokens
# in the document frequency dictionary:
trainingText_df=pd.DataFrame(index=docFreq_dict.keys())

# Populate the dataframe with the tf.idf for each response. Also,
# create a dictionary of the accuracy values while we're at it.
accuracy_dict={}
for (i, response_dict) in enumerate(tokenDictionaries_ls):
    trainingText_df[i]=pd.Series(get_tfidf(response_dict['tokens'], docFreq_dict), 
                                 index=trainingText_df.index)
    accuracy_dict[i]=response_dict['accuracy']

# Finally, transpose, and replace the NaNs with 0:
trainingText_df=trainingText_df.fillna(0).T

# Also, to make it easier to store in a single csv file, let's put the accuracy
# values in a column called "accuracy_txt":

trainingText_df['accuracy_txt']=pd.Series(accuracy_dict)

# And a final column containing a numerical equivalent of the
# accuracy_txt column (called accuracy_num). Note that the mapping from
# label to number depends on set iteration order, so it isn't stable
# across runs:

labels_dict={label:i for (i, label) in enumerate(set(trainingText_df['accuracy_txt']))}
trainingText_df['accuracy_num']=[labels_dict[l] for l in trainingText_df['accuracy_txt']]
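
(For the record, the glob version that the comment above alludes to would be something like this — a sketch, equivalent for this flat directory:)

from glob import glob

tokenDictionaries_ls=[]
for filename in glob('semeval2013-task7/semeval2013-Task7-5way/beetle/train/Core/*.xml'):
    tokenDictionaries_ls.extend(extract_token_dictionaries(filename))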

In [43]:
trainingText_df.head()


Out[43]:
pr_SR anotehr_SR or_SR a_RA incomplete_RA is_SR wehere_SR postive_SR termials_SR is_RA ... seriously_SR cuts_SR allows_SR versa_SR seeing_SR copntained_SR though_SR opposing_SR accuracy_txt accuracy_num
0 0.0 0.0 0.0 0.0 0.0 0.000648 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 correct 3
1 0.0 0.0 0.0 0.0 0.0 0.000648 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 correct 3
2 0.0 0.0 0.0 0.0 0.0 0.000648 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 contradictory 2
3 0.0 0.0 0.0 0.0 0.0 0.000648 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 contradictory 2
4 0.0 0.0 0.0 0.0 0.0 0.000648 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 contradictory 2

5 rows × 1118 columns

And finish by exporting to a CSV file:


In [44]:
trainingText_df.to_csv('beetleTrainingData.csv', index=False)

Done! The data can now be read back into a dataframe with:

pd.read_csv('beetleTrainingData.csv')
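
and, if the next notebook wants the features and labels separated straight away, something along these lines would do (a sketch; the column names are the ones created above):

import pandas as pd

data_df=pd.read_csv('beetleTrainingData.csv')

# Separate the tf.idf columns from the two label columns
labels_df=data_df[['accuracy_txt', 'accuracy_num']]
features_df=data_df.drop(['accuracy_txt', 'accuracy_num'], axis=1)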