In [88]:
import os
import pickle
import collections
import pandas as pd
import numpy as np
After this, I split the file into one-line parts using the following command:
$ split -l 1 -a 7 speechacts.txt
and then removed the original speechacts.txt.
This ensures that each speechact is in its own file and receives its own CoreNLP analysis.
Then we compute the overall sentiment for each speechact by batch-parsing the files. parsed is a generator, so no computation is done until we start iterating over it.
The split files are in alphabetical order, and batch_parse iterates through the files alphabetically, so iterating over the parsed results should retain the order from the dataframe.
This takes roughly 3-4 hours to compute, so be sure to save intermediary results to a file. (108 seconds for 26 speechacts, and ~26*100 speechacts total, gives about 100 * 108 = 10,800 seconds, or ~3 hours.)
(see batch-parse-sentiment.py)
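For reference, batch-parse-sentiment.py looked roughly like the sketch below. This is only a sketch: it assumes the corenlp-python wrapper's batch_parse generator, and the dictionary keys ('file_name', 'sentences', 'sentiment', 'sentimentValue'), the CoreNLP directory, and the checkpoint interval are assumptions rather than a copy of the actual script.

import pickle
from corenlp import batch_parse

CORENLP_DIR = 'stanford-corenlp-full/'  # assumed location of the CoreNLP install
results = []
# parsed is a generator; nothing is computed until we iterate over it
parsed = batch_parse('split_speechacts/', CORENLP_DIR)
for n, doc in enumerate(parsed):
    fname = doc['file_name']
    # one (filename, sentiment category, sentiment score) tuple per sentence
    results.append([(fname, s['sentiment'], float(s['sentimentValue']))
                    for s in doc['sentences']])
    if n % 100 == 0:  # save intermediary results as we go
        pickle.dump(results, open('corenlp_sentiment_partial.p', 'wb'))
pickle.dump(results, open('corenlp_sentiment_FINAL.p', 'wb'))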
In [3]:
sent = pickle.load(open('pickles/corenlp_sentiment/corenlp_sentiment_FINAL.p', 'rb'))
In [5]:
sent[:10]
Out[5]:
In [22]:
s = []
for l in sent:
    if type(l) != list:
        # non-list entries are passed through unchanged
        s.append(l)
        continue
    # each tuple starts with the filename; regroup as (filename, [(category, score), ...])
    fname = l[0][0]
    newl = []
    for tup in l:
        newl.append(tup[1:])
    s.append((fname, newl))
s[:10]
Out[22]:
Make sure we only have unique filenames
In [15]:
s_no_dupes = [l for n,l in enumerate(s) if l not in s[:n] and l not in s[n+1:]]
In [17]:
print len(s_no_dupes)
print len(s)
In [18]:
a = [1,2,3,4, 4, 8, 9, 3, 1, 10]
print a[:2]
print a[2:]
a_no_dupes = [l for n,l in enumerate(a) if l not in a[:n] and l not in a[n+1:]]
print a_no_dupes
In [43]:
filenames = [t[0] for t in s if type(t) == tuple]
In [44]:
len(list(set(filenames)))
Out[44]:
Make a dict of filename -> sentiment
In [46]:
s = [item for item in s if type(item) == tuple]
sent_filename_dict = dict(s)
Get the speechacts for each filename
In [47]:
fpath = '/Users/dan/classes/research/huac-testimony/pickles/speechacts_old/'
speech_dict = {}
for filename in filenames:
    filepath = os.path.join(fpath, filename)
    speechact = ""
    with open(filepath, 'rb') as f:
        speechact = f.readline()
    speech_dict[speechact] = sent_filename_dict[filename]
In [48]:
speech_dict.items()[:5]
Out[48]:
Read the pickle
In [180]:
df = pd.read_pickle('pickles/final/final_analysis.p')
Add the column for the CoreNLP sentiment data, matching each dataframe speechact to its file by Levenshtein similarity.
In [49]:
import Levenshtein

corenlp_sentiment = []
for n, row in df.iterrows():
    speechact = row['speechact']
    found_match = False
    for sa, sent in speech_dict.items():
        if Levenshtein.ratio(sa, speechact) > 0.85:
            corenlp_sentiment.append(sent)
            found_match = True
            break  # don't want multiple matches; just take the first.
    if not found_match:
        corenlp_sentiment.append(np.nan)
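As a quick illustration of the 0.85 threshold (the strings here are hypothetical, not from the data):

# Levenshtein.ratio returns a similarity between 0 (completely different)
# and 1 (identical); near-identical strings score close to 1.
print Levenshtein.ratio("Mr. Chairman, I refuse to answer that question.",
                        "Mr Chairman, I refuse to answer that question")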
In [50]:
corenlp_sentiment[:10]
Out[50]:
In [53]:
pickle.dump(corenlp_sentiment, open('pickles/final/corenlp_sentiment_on_speechacts_list.p', 'wb'))
In [156]:
sentiment = pickle.load(open('pickles/final/corenlp_sentiment_on_speechacts_list.p', 'rb'))
In [165]:
sentiment[:5]
Out[165]:
In [171]:
def base_and_multiplier_from_category(category):
    categories = ["Verynegative", "Negative", "Positive", "Verypositive"]
    # Python 2 integer division: bases are -5, 0, 0, 5 for the four categories.
    base = (categories.index(category) - 1)/2 * 5
    # positive categories add the score, negative categories subtract it
    multiplier = categories.index(category) > 1
    if multiplier:
        return (base, 1)
    else:
        return (base, -1)

def category_and_score_to_number(t):
    "produces a number that represents the given intensity and score."
    category, score = t
    base, multiplier = base_and_multiplier_from_category(category)
    return base + multiplier*score
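A quick sanity check of the coding under Python 2 division (the score 0.8 is an arbitrary illustrative value, not taken from the data):

for cat in ["Verynegative", "Negative", "Positive", "Verypositive"]:
    print cat, category_and_score_to_number((cat, 0.8))
# expected: Verynegative -5.8, Negative -0.8, Positive 0.8, Verypositive 5.8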
In [172]:
coded_sentiment = []
for s in sentiment:
    new_s = []
    if type(s) == float:
        coded_sentiment.append(np.nan)
        continue
    for pair in s:
        if pair[0] == "Neutral":
            new_s.append(np.nan)
        else:
            new_s.append(category_and_score_to_number(pair))
    coded_sentiment.append(new_s)
In [173]:
coded_sentiment[:5]
Out[173]:
In [174]:
pickle.dump(coded_sentiment, open('pickles/final/corenlp_sentiment_on_speechacts_list_coded.p', 'wb'))
In [181]:
sentiment = pickle.load(open('pickles/final/corenlp_sentiment_on_speechacts_list_coded.p', 'rb'))
In [182]:
df['corenlp_sentiment_by_sentence'] = sentiment
In [183]:
df.head()
Out[183]:
In [184]:
df.to_pickle('pickles/final/with_corenlp_sentiment_df.p')
We want to construct a graph that contains every edge CoreNLP picked up as having sentiment. Each edge should carry the CoreNLP sentiment measure as well as the LIWC sentiment measure and the LIWC pos/neg categories.
Then we can compare the two measures on the same graph.
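Roughly, each edge should end up looking like the sketch below; the attribute names follow the construction code later in this notebook, and the numbers are made up.

import networkx as nx
G_sketch = nx.DiGraph()
# networkx 1.x style: attributes passed as a dict, as in the construction code below
G_sketch.add_edge('witness_a', 'person_b', {'stanford_sent': -3.2, 'liwc_sent': -1.0})
print G_sketch['witness_a']['person_b']  # prints the attribute dict for that edge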
Name disambiguation
In [185]:
disambiguated_names = pickle.load(open('pickles/final/disambiguated_names.p', 'rb'))

def get_key(mention, l):
    "returns the numerical key for the given mention"
    mention = mention.lower()
    for chunk in l:
        if mention in chunk:
            return l.index(chunk)
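For illustration, assuming each entry of disambiguated_names groups the lowercased variants of one person's name (the actual structure comes from the name-disambiguation step), get_key behaves like this:

# Hypothetical structure; the real list is loaded from the pickle above.
example_names = [['john smith', 'mr. smith'], ['jane doe', 'miss doe']]
print get_key('Mr. Smith', example_names)        # 0
print get_key('JANE DOE', example_names)         # 1
print get_key('someone unknown', example_names)  # None (no match)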
Some utilities
In [ ]:
import difflib

transcript_dir = os.path.join("testimony/text/hearings")
interviewee_names = [f.replace(".txt", "") for f in os.listdir(transcript_dir)]
interviewee_names = map(lambda s: s.replace("-", " "), interviewee_names)

def is_interviewer(name):
    return not difflib.get_close_matches(name, interviewee_names)

graph_data = sentiment_graph_data = collections.defaultdict(lambda : collections.defaultdict(int))
In [192]:
import difflib

transcript_dir = os.path.join("testimony/text/hearings")
interviewee_names = [f.replace(".txt", "") for f in os.listdir(transcript_dir)]
interviewee_names = map(lambda s: s.replace("-", " "), interviewee_names)
relevant_categories = ['Posemo', 'Negemo', 'Anger', 'Posfeel']

def is_interviewer(name):
    return not difflib.get_close_matches(name, interviewee_names)

# the graph data has to be stored in separate dicts until we
# construct the graph; adding multiple edges between two nodes just
# replaces the attributes. We want to average/accumulate them.
nn_categories_by_sentence = []
nn_graph_data = collections.defaultdict(lambda : collections.defaultdict(dict))
sentiment_graph_data = collections.defaultdict(lambda : collections.defaultdict(int))
liwc_sentiment_graph_data = collections.defaultdict(lambda : collections.defaultdict(int))
count_data = collections.defaultdict(int)
skipped = 0
for n, row in df.iterrows():
    if n % 500 == 0:
        print n, "rows analyzed"
    speaker = row['speaker']
    if is_interviewer(speaker):
        skipped += 1
        continue
    # mention lists with and without anaphora resolution
    mention_list_w_anaphora = row['mention_list_by_sentence_with_anaphora']
    mention_list_wo_anaphora = row['mention_list_by_sentence_without_anaphora']
    corenlp_sentiment_by_sentence = row['corenlp_sentiment_by_sentence']
    liwc_sentiment_by_sentence = row['liwc_sentiment_by_sentence']
    if type(corenlp_sentiment_by_sentence) == float:  # np.nan: no corenlp data for this row
        skipped += 1
        continue
    speaker = get_key(speaker, disambiguated_names)
    if type(mention_list_w_anaphora) == list:
        sentiment_towards_mentions = {}
        for n, mentions in enumerate(mention_list_w_anaphora):
            for mention in mentions:
                mention = get_key(mention, disambiguated_names)
                if speaker == mention or not mention or type(corenlp_sentiment_by_sentence[n]) == float:
                    skipped += 1
                    continue
                sentiment_graph_data[speaker][mention] += corenlp_sentiment_by_sentence[n]
                liwc_sentiment_graph_data[speaker][mention] += liwc_sentiment_by_sentence[n]
                count_data[(speaker, mention)] += 1
    elif type(mention_list_wo_anaphora) == list:
        categories_towards_mentions = {}
        for n, mentions in enumerate(mention_list_wo_anaphora):
            for mention in mentions:
                mention = get_key(mention, disambiguated_names)
                if speaker == mention or not mention or type(corenlp_sentiment_by_sentence[n]) == float:
                    skipped += 1
                    continue
                sentiment_graph_data[speaker][mention] += corenlp_sentiment_by_sentence[n]
                liwc_sentiment_graph_data[speaker][mention] += liwc_sentiment_by_sentence[n]
                count_data[(speaker, mention)] += 1
print "skipped", skipped
print "sentiment_graph_data and liwc_sentiment_graph_data are now populated."
In [206]:
sentiment_graph_data.items()[:5]
Out[206]:
In [212]:
pos_stanford = 0
neg_stanford = 0
pos_liwc = 0
neg_liwc = 0
import networkx as nx

G = nx.DiGraph()
for source, targets in sentiment_graph_data.items():
    for accused, sent in targets.items():
        attrs = {'stanford_sent': sent, 'liwc_sent': liwc_sentiment_graph_data[source][accused]}
        if sent > 0:
            pos_stanford += 1
        elif sent < 0:
            neg_stanford += 1
        if liwc_sentiment_graph_data[source][accused] > 0:
            pos_liwc += 1
        elif liwc_sentiment_graph_data[source][accused] < 0:
            neg_liwc += 1
        G.add_edge(source, accused, attrs)

for speaker, mentions in anaphora_graph_data.items():
    if n % 10 == 0:
        print "analyzing", n, "mentions."
    for mentioned, attrs in mentions.items():
        count = anaphora_count_data[(speaker, mentioned)]
        normalized_attrs = normalize_dict(attrs, count)
        filtered = filter_categories(normalized_attrs, relevant_categories)
        n += 1
        try:
            dominant = max(filtered.items(), key=lambda p: p[1])
            dominant = {dominant[0]: dominant[1]}
            G_only_anaphora_with_dominant_categories.add_edge(speaker, mentioned, dominant)
        except ValueError:
            G_only_anaphora_with_dominant_categories.add_edge(speaker, mentioned)
In [211]:
nx.write_gml(G, 'graphs/final/corenlp_vs_liwc_sentiment.gml')
In [209]:
G.node
Out[209]:
In [204]:
count_data[(229, 287)]
Out[204]:
In [ ]: