Title: Remove Stop Words
Slug: remove_stop_words
Summary: How to remove stop words from unstructured text data for machine learning in Python.
Date: 2016-09-09 12:00
Category: Machine Learning
Tags: Preprocessing Text
Authors: Chris Albon
In [1]:
# Load libraries
import nltk
from nltk.corpus import stopwords

# Download the set of stop words (only needed the first time)
nltk.download('stopwords')
Out[1]:
In [2]:
# Create word tokens
tokenized_words = ['i', 'am', 'going', 'to', 'go', 'to', 'the', 'store', 'and', 'park']
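The tokens above are written out by hand. When starting from a raw string, one rough way to get a comparable lowercase token list is a regular expression. This is only a sketch: `simple_tokenize` is a hypothetical helper, not part of NLTK, and a real tokenizer handles punctuation and contractions more carefully.

```python
import re

def simple_tokenize(text):
    """Lowercase the text and return runs of letters or apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Rebuild the hand-written token list from a sentence
tokenized_words = simple_tokenize("I am going to go to the store and park")
```

Lowercasing matters here because the NLTK English stop word list is lowercase.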
In [3]:
# Load stop words
stop_words = stopwords.words('english')
# Show stop words
stop_words[:5]
Out[3]:
In [4]:
# Remove stop words
[word for word in tokenized_words if word not in stop_words]
Out[4]:
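For longer documents, it can help to convert the stop word list to a set, since membership tests against a list scan every element. A minimal sketch, assuming a small stand-in stop word set so the example runs without the NLTK download; `remove_stop_words` and `example_stop_words` are hypothetical names used only for illustration.

```python
# Stand-in stop words for illustration only; in practice, pass
# stopwords.words('english') from NLTK instead
example_stop_words = {'i', 'am', 'to', 'the', 'and'}

def remove_stop_words(tokens, stop_words):
    """Return tokens whose lowercase form is not in stop_words."""
    stop_set = set(stop_words)  # set lookup avoids scanning a list per token
    return [token for token in tokens if token.lower() not in stop_set]

# Mixed-case tokens are compared in lowercase but returned unchanged
tokens = ['I', 'am', 'going', 'to', 'go', 'to', 'The', 'store', 'and', 'park']
filtered = remove_stop_words(tokens, example_stop_words)
```

Comparing the lowercase form of each token means capitalized words such as "The" are also filtered out.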