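Questions

What are lexical categories, and how are they used?
Which Python data structures are suited to storing words and their categories?
How can words be tagged with their part of speech automatically?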



In [9]:

    
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag, map_tag

Meaning of the tags

CC: coordinating conjunction
RB: adverb
IN: preposition (or subordinating conjunction)
NN: noun (singular or mass)
JJ: adjective



In [6]:

    
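# Split the sentence into word tokens.
text = word_tokenize("And now for something completely different")
print(text)
print(type(text))
# Tag each token; the default tags follow the Penn Treebank tagset.
nltk.pos_tag(text)

['And', 'now', 'for', 'something', 'completely', 'different']
<class 'list'>

    Out[6]:
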
[('And', 'CC'),
 ('now', 'RB'),
 ('for', 'IN'),
 ('something', 'NN'),
 ('completely', 'RB'),
 ('different', 'JJ')]
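
The second question at the top asks which Python data structure suits words and their categories. One common choice, offered here as a sketch and not as the notebook's own answer, is a dictionary. The list of (word, tag) pairs converts directly into a word-to-tag dict, and a `defaultdict(list)` gives the reverse index from tag to words. The pairs below are copied from Out[6] so the block runs without NLTK; the names `word_to_tag` and `tag_to_words` are illustrative.

```python
from collections import defaultdict

# (word, tag) pairs copied from Out[6] above.
tagged = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'),
          ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

# Word -> tag lookup. A plain dict keeps only the last tag seen for a word.
word_to_tag = dict(tagged)

# Tag -> list of words, in order of appearance.
tag_to_words = defaultdict(list)
for word, tag in tagged:
    tag_to_words[tag].append(word)

print(word_to_tag['something'])
print(tag_to_words['RB'])
```

Because a plain dict stores one tag per word, a word that is tagged differently in different contexts would need a dict of lists or a counter instead.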



In [14]:

    
# tagset='universal' returns coarse universal tags instead of Penn Treebank tags.
pos_tag(word_tokenize("John's big idea is not all that bad."), tagset='universal')

    Out[14]:

[('John', 'NOUN'),
 ("'s", 'PRT'),
 ('big', 'ADJ'),
 ('idea', 'NOUN'),
 ('is', 'VERB'),
 ('not', 'ADV'),
 ('all', 'DET'),
 ('that', 'ADP'),
 ('bad', 'ADJ'),
 ('.', '.')]
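
To make the effect of `tagset='universal'` concrete, the sketch below maps Penn Treebank tags to universal tags with a small hand-written table. The table is an assumption for illustration: it covers only the five tags glossed earlier, and the names `PTB_TO_UNIVERSAL` and `to_universal` are hypothetical. NLTK's own mapping is available through the `map_tag` function imported in the first cell, e.g. `map_tag('en-ptb', 'universal', 'NN')`.

```python
from collections import Counter

# Partial, hand-written table (assumption: only the five tags glossed above).
PTB_TO_UNIVERSAL = {
    'CC': 'CONJ',
    'RB': 'ADV',
    'IN': 'ADP',
    'NN': 'NOUN',
    'JJ': 'ADJ',
}

def to_universal(tagged, table=PTB_TO_UNIVERSAL, unknown='X'):
    """Replace each Penn Treebank tag with its universal tag, or `unknown`."""
    return [(word, table.get(tag, unknown)) for word, tag in tagged]

# (word, tag) pairs copied from Out[6].
tagged = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'),
          ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

coarse = to_universal(tagged)
print(coarse)

# Count how often each coarse category occurs.
print(Counter(tag for _, tag in coarse))
```

Tags missing from the table fall back to `'X'`, the universal tagset's catch-all category, so an incomplete table degrades gracefully instead of raising a `KeyError`.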


