Title: Tag Parts Of Speech
Slug: tag_parts_of_speech
Summary: How to tag parts of speech in unstructured text data for machine learning in Python.
Date: 2016-09-09 12:00
Category: Machine Learning
Tags: Preprocessing Text
Authors: Chris Albon
In [1]:
# Load libraries
from nltk import pos_tag
from nltk import word_tokenize
In [2]:
# Create text
text_data = "Chris loved outdoor running"
In [3]:
# Use pre-trained part of speech tagger
text_tagged = pos_tag(word_tokenize(text_data))
# Show parts of speech
text_tagged
Out[3]:
The output is a list of tuples, each pairing a word with its part-of-speech tag. NLTK's tagger uses the Penn Treebank part-of-speech tags. Some common ones are:
Tag | Part Of Speech |
---|---|
NNP | Proper noun, singular |
NN | Noun, singular or mass |
RB | Adverb |
VBD | Verb, past tense |
VBG | Verb, gerund or present participle |
JJ | Adjective |
PRP | Personal pronoun |
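Penn Treebank tags that share a prefix belong to the same broad word class. For example, `NN`, `NNS`, `NNP` and `NNPS` are all nouns, and `VBD` and `VBG` are both verbs. That makes it easy to pull one kind of word out of the tagged output.

The snippet below is a minimal sketch of that idea. It makes a few assumptions:

* The helper `words_with_class` and the `TAG_PREFIXES` mapping are hypothetical names introduced here. They are not part of NLTK.
* The input sentence and its tags are written by hand to illustrate the `(word, tag)` shape. They are not output from `pos_tag`.

```python
# Hypothetical helper (not part of NLTK): group Penn Treebank tags into
# coarse word classes by tag prefix, e.g. NN, NNS, NNP, NNPS -> noun.
from typing import List, Tuple

TAG_PREFIXES = {
    "noun": "NN",       # NN, NNS, NNP, NNPS
    "verb": "VB",       # VB, VBD, VBG, VBN, VBP, VBZ
    "adjective": "JJ",  # JJ, JJR, JJS
    "adverb": "RB",     # RB, RBR, RBS
}


def words_with_class(tagged: List[Tuple[str, str]], word_class: str) -> List[str]:
    """Return the words whose Penn Treebank tag falls in word_class."""
    prefix = TAG_PREFIXES[word_class]
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Illustrative input in the same (word, tag) shape that pos_tag returns;
# these tags were assigned by hand, not produced by the tagger.
tagged = [
    ("She", "PRP"),
    ("quickly", "RB"),
    ("finished", "VBD"),
    ("the", "DT"),
    ("long", "JJ"),
    ("race", "NN"),
]

print(words_with_class(tagged, "noun"))       # ['race']
print(words_with_class(tagged, "verb"))       # ['finished']
print(words_with_class(tagged, "adverb"))     # ['quickly']
print(words_with_class(tagged, "adjective"))  # ['long']
```

In practice you would pass the list returned by `pos_tag(word_tokenize(text_data))` in place of the hand-written `tagged` list.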