In [6]:
import pandas as pd

In [2]:
import pymongo
from pymongo import InsertOne, DeleteMany, ReplaceOne, UpdateOne
from pymongo.errors import BulkWriteError
client = pymongo.MongoClient('192.168.31.87:27017')
db = client.tweet

In [26]:
# Show the first 10 events: description, raw OpenIE triples, and the triple chosen by best_openie.
for i in db.current_event.find({}).limit(10):
    print i['event']['description']
    print pd.DataFrame.from_dict(i['ie']['openie'])
    print best_openie(i['ie']['openie'])


South Thailand insurgency:
Empty DataFrame
Columns: []
Index: []
{}
France plans to make vaccines mandatory for children in 2018. (Newsweek)
                                    object relation   subject
0                                 vaccines     make    France
1            vaccines for children in 2018     make    France
2                    vaccines for children     make    France
3          vaccines mandatory for children     make    France
4  vaccines mandatory for children in 2018     make    France
5                       vaccines mandatory     make    France
6                                     2018    is in  children
{'object': u'vaccines mandatory for children in 2018', 'relation': u'make', 'subject': u'France'}
United States President Barack Obama orders 1500 more troops into Iraq. (The New York Times)
          object         relation       subject
0  United States  is President of  Barack Obama
1  United States               is  Barack Obama
2           Iraq          is into  Barack Obama
{'object': u'United States', 'relation': u'is into', 'subject': u'Barack Obama'}
At least eight people have been killed in violence across Iraq. (AFP via Al-Aribya)
                                object relation subject
0              have killed in violence     have  people
1  have killed in violence across Iraq     have  people
2                          have killed     have  people
3              have killed across Iraq     have  people
{'object': u'have killed in violence across Iraq', 'relation': u'have', 'subject': u'people'}
A former aide of Chris Christie accuses the governor of having knowledge of the event. (USA Today)
Empty DataFrame
Columns: []
Index: []
{}
Voters in Costa Rica go to the polls for a general election. (TicoTimes)
                       object relation subject
0                  Costa Rica    is in  Voters
1          polls for election    go to  Voters
2                       polls    go to  Voters
3  polls for general election    go to  Voters
{'object': u'polls for general election', 'relation': u'go to', 'subject': u'Voters'}
The event is concluded. (One India)
      object relation subject
0  concluded       is   event
{'object': u'concluded', 'relation': u'is', 'subject': u'event'}
Algeria officially lifts its 19-year-old state of emergency. (CNN)
                               object          relation  subject
0              its state of emergency  officially lifts  Algeria
1                           its state  officially lifts  Algeria
2  its 19-year-old state of emergency  officially lifts  Algeria
3  its 19-year-old state of emergency             lifts  Algeria
4              its state of emergency             lifts  Algeria
5                           its state             lifts  Algeria
6               its 19-year-old state             lifts  Algeria
7               its 19-year-old state  officially lifts  Algeria
{'object': u'its 19-year-old state of emergency', 'relation': u'lifts', 'subject': u'Algeria'}
The death toll from the wildfire that hit the U.S. town of Gatlinburg, Tennessee, rises to 13. (AP)
  object  relation     subject
0     13  rises to  death toll
{'object': u'13', 'relation': u'rises to', 'subject': u'death toll'}
A series of attacks across Iraq kills 22 Shiite pilgrims. (The Daily Star)
               object relation  subject
0         22 pilgrims    kills  attacks
1  22 Shiite pilgrims    kills  attacks
{'object': u'22 Shiite pilgrims', 'relation': u'kills', 'subject': u'attacks'}

In [17]:
def get_most_common(items):
    # Return the most frequent value in items; assumes items is non-empty.
    count = Counter(items)
    return count.most_common(1)[0][0]
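
A note on edge cases: `get_most_common` raises `IndexError` when `items` is empty. The sketch below is a hypothetical variant (the name `get_most_common_safe` and the `default` parameter are assumptions, not part of the notebook) that returns a fallback value instead. Which value wins on a tie may depend on the Python version, so the example avoids ties.

```python
from collections import Counter


def get_most_common_safe(items, default=None):
    """Hypothetical variant of get_most_common that tolerates empty input."""
    counts = Counter(items)
    if not counts:
        # Nothing to count: return the caller-supplied fallback.
        return default
    # most_common(1) returns a one-element list of (value, count) pairs.
    return counts.most_common(1)[0][0]


print(get_most_common_safe(['attacks', 'attacks', 'Iraq']))
print(get_most_common_safe([]))
```

In this notebook `best_openie` already returns early on an empty list, so the guard only matters when the helper is called directly.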

In [12]:
from collections import Counter

In [13]:
count = Counter(openie_df.subject)

In [16]:
count.most_common()


Out[16]:
[(u'attacks', 2)]

In [18]:
get_most_common(openie_df.subject)


Out[18]:
u'attacks'

In [22]:
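def best_openie(openies):
    # Collapse a list of OpenIE triples into one representative triple.
    if len(openies) == 0:
        return {}
    openie_df = pd.DataFrame.from_dict(openies)
    # Keep the longest object string (the first one wins on ties).
    object_list = openie_df['object'].tolist()
    object_, length = object_list[0], len(object_list[0])
    for candidate in object_list[1:]:
        if len(candidate) > length:
            object_, length = candidate, len(candidate)
    # Subject and relation are the most frequent values across the triples.
    return {'subject':get_most_common(openie_df.subject),
            'relation':get_most_common(openie_df.relation),
            'object':object_}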
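
The same selection rule can be written without pandas. This is only a sketch, assuming each entry in `openies` is a dict with `subject`, `relation` and `object` keys (inferred from the DataFrame columns printed above); `best_openie_plain` is a hypothetical name, and the sample triples are copied from the Costa Rica output.

```python
from collections import Counter


def best_openie_plain(openies):
    """Pick the most frequent subject and relation, and the longest object."""
    if not openies:
        return {}
    subjects = Counter(t['subject'] for t in openies)
    relations = Counter(t['relation'] for t in openies)
    # max() with key=len returns the first of the longest strings.
    longest_object = max((t['object'] for t in openies), key=len)
    return {'subject': subjects.most_common(1)[0][0],
            'relation': relations.most_common(1)[0][0],
            'object': longest_object}


# Triples taken from the "Voters in Costa Rica" example output.
sample = [
    {'object': 'Costa Rica', 'relation': 'is in', 'subject': 'Voters'},
    {'object': 'polls for election', 'relation': 'go to', 'subject': 'Voters'},
    {'object': 'polls', 'relation': 'go to', 'subject': 'Voters'},
    {'object': 'polls for general election', 'relation': 'go to', 'subject': 'Voters'},
]
result = best_openie_plain(sample)
print(result)
```

Note that subject, relation and object are picked independently, so the combined triple is not guaranteed to be one of the extracted triples.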

In [23]:
best_openie(i['ie']['openie'])


Out[23]:
{}
