('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')
Here "And" is tagged CC, a coordinating conjunction; "now" and "completely" are RB, adverbs; "for" is IN, a preposition; "something" is NN, a noun; and "different" is JJ, an adjective.
('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')
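"refuse" and "permit" each appear twice, once as a verb and once as a noun (NN): "refuse" is tagged VBP (present-tense verb) and "permit" is tagged VB (base-form verb, after "to"). The tagger separates the two senses by context: refUSE is a verb meaning "deny," while REFuse is a noun meaning "trash."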
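As a quick check on the point above, the following sketch groups the tags each word received and reports the words tagged with more than one part of speech. The tagged list is copied by hand from the output shown earlier; the helper names `tags_by_word` and `ambiguous_words` are made up for this illustration and are not part of any tagging library.

```python
from collections import defaultdict

# Tagged tokens copied from the example sentence above.
tagged = [
    ('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'),
    ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'),
    ('refuse', 'NN'), ('permit', 'NN'),
]

def tags_by_word(pairs):
    """Map each lowercased word to the set of tags it received."""
    seen = defaultdict(set)
    for word, pos in pairs:
        seen[word.lower()].add(pos)
    return dict(seen)

def ambiguous_words(pairs):
    """Return only the words tagged with more than one part of speech."""
    return {w: sorted(t) for w, t in tags_by_word(pairs).items() if len(t) > 1}

print(ambiguous_words(tagged))
# {'refuse': ['NN', 'VBP'], 'permit': ['NN', 'VB']}
```

"to" appears twice but always as TO, so it is not reported.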
In [1]:
from pattern.web import Twitter
from pattern.en import tag
from pattern.vector import NB, count
import sys
import time

twitter, classifier = Twitter(language="en"), NB(baseline="UNDEFINED")

def train_model(n_pts, search_terms, category, category_count):
    # Fetch n_pts pages of up to 100 tweets each and train the classifier on them.
    total = n_pts * 100
    print("Training " + str(total) + " data points for " + str(category))
    for i in range(1, n_pts + 1):  # pages 1..n_pts, so the total above is accurate
        for tweet in twitter.search(search_terms, start=i, count=100):
            s = tweet.text.lower()
            category_count += 1
            # Keep only words tagged NN (singular noun) or VB (base-form verb).
            v = [word for word, pos in tag(s) if pos in ("NN", "VB")]
            v = count(v)  # e.g. {'sweet': 1}
            if v:
                classifier.train(v, type=category)
            # Overwrite the same console line with the progress so far.
            sys.stdout.write('\r')
            sys.stdout.write(str(100 * category_count // total) + "% : " + str(v))
    sys.stdout.write('\r')
    print("Finished!")
    print("Number of data points: " + str(category_count) + "\n")
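The feature-extraction step inside `train_model` (filter by tag, then count) can be tried without a Twitter connection. The sketch below is only an approximation: `collections.Counter` stands in for `pattern.vector.count`, which may apply extra filtering that `Counter` does not, and the tagged tokens are a hand-written, hypothetical example rather than real tagger output. The name `extract_features` is likewise invented for illustration.

```python
from collections import Counter

def extract_features(tagged_tokens, keep=("NN", "VB")):
    """Keep words whose tag is in `keep` and count how often each occurs."""
    return dict(Counter(word for word, pos in tagged_tokens if pos in keep))

# Hypothetical tagged tweet, written by hand for this example.
tagged_tweet = [
    ("the", "DT"), ("bus", "NN"), ("was", "VBD"), ("late", "JJ"),
    ("so", "RB"), ("i", "PRP"), ("will", "MD"), ("take", "VB"),
    ("the", "DT"), ("bus", "NN"),
]

print(extract_features(tagged_tweet))
# {'bus': 2, 'take': 1}
```

Because only the exact tags NN and VB are kept, plural nouns (NNS) and inflected verbs such as "was" (VBD) are dropped, which is why many tweets yield few or no features.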
In [2]:
n = 30
happy = 0
sad = 0
train_model(n, "shiok OR swee OR perfect OR #happy", "HAPPY", happy)
time.sleep(1)
train_model(n, "sian OR shag OR suay OR fml OR #sad", "SAD", sad)
In [3]:
def evaluate(word):
    # Training text was lowercased, so lowercase the query to match it.
    category = classifier.classify(word.lower())
    return "The word " + str(word) + " is " + str(category)

words = ("pangseh", "nasi lemak", "food", "breakfast", "lunch", "dinner", "MRT", "school", "trip", "work", "home", "family", "garden", "play", "train", "bus", "KFC", "SAF", "book out", "camp", "army", "navy", "air force")
for word in words:
    print(evaluate(word))
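For intuition about what a Naive Bayes classifier does with the word counts it is trained on, here is a toy version in plain Python. It is a sketch under stated assumptions, not pattern's `NB` implementation: the class `ToyNaiveBayes`, its add-one smoothing, and its rule of returning the baseline when no query word was seen in training are all choices made for this example.

```python
import math
from collections import Counter, defaultdict

class ToyNaiveBayes:
    """Multinomial Naive Bayes over {word: count} dicts (illustrative only)."""

    def __init__(self):
        self.class_docs = Counter()              # training documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()                       # every word seen in training

    def train(self, features, label):
        self.class_docs[label] += 1
        for word, n in features.items():
            self.word_counts[label][word] += n
            self.vocab.add(word)

    def classify(self, features, baseline=None):
        # Assumption: fall back to the baseline if nothing in the query is known.
        if not any(word in self.vocab for word in features):
            return baseline
        total_docs = sum(self.class_docs.values())
        best_label, best_score = baseline, -math.inf
        for label, n_docs in self.class_docs.items():
            # log P(class) + sum of count * log P(word | class), add-one smoothed
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word, n in features.items():
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = ToyNaiveBayes()
nb.train({"holiday": 1, "beach": 1}, "HAPPY")
nb.train({"exam": 1, "rain": 1}, "SAD")
print(nb.classify({"beach": 1}))                          # HAPPY
print(nb.classify({"exam": 1}))                           # SAD
print(nb.classify({"durian": 1}, baseline="UNDEFINED"))   # UNDEFINED
```

A query word only influences the result if it appeared as a feature during training, so words that the tag filter or lowercasing never let through will fall back to the baseline class.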