Title: Tokenize Text
Slug: tokenize_text
Summary: How to tokenize unstructured text data into words and sentences for machine learning in Python.
Date: 2016-09-08 12:00
Category: Machine Learning
Tags: Preprocessing Text
Authors: Chris Albon
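```python
# Load library
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
nltk.download(['punkt', 'punkt_tab'])  # Punkt sentence models both tokenizers need
```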
```python
# Create text
string = "The science of today is the technology of tomorrow. Tomorrow is today."
```
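```python
# Tokenize words
word_tokenize(string)
# ['The', 'science', 'of', 'today', 'is', 'the', 'technology', 'of', 'tomorrow', '.', 'Tomorrow', 'is', 'today', '.']
```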
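To see roughly what word tokenization adds over plain whitespace splitting, here is a minimal sketch using only the standard library. `naive_word_tokenize` is a hypothetical helper for illustration, not NLTK's algorithm (`word_tokenize` uses a Treebank-style tokenizer); it treats each run of word characters, and each punctuation mark, as a separate token.

```python
import re

# Hypothetical helper (not part of NLTK): a run of word characters is one token,
# and every non-space, non-word character (punctuation) is its own token.
def naive_word_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

string = "The science of today is the technology of tomorrow. Tomorrow is today."

# Whitespace splitting leaves punctuation glued to words
print(string.split())
# ['The', 'science', 'of', 'today', 'is', 'the', 'technology', 'of', 'tomorrow.', 'Tomorrow', 'is', 'today.']

# The regex separates the periods, matching word_tokenize on this sentence
print(naive_word_tokenize(string))
# ['The', 'science', 'of', 'today', 'is', 'the', 'technology', 'of', 'tomorrow', '.', 'Tomorrow', 'is', 'today', '.']

# Contractions are where it diverges from NLTK
print(naive_word_tokenize("Don't stop."))
# ['Don', "'", 't', 'stop', '.']
```

The regex splits `Don't` into three pieces, whereas NLTK's Treebank-style rules produce `Do` and `n't`, which is one reason to prefer `word_tokenize` on real text.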
```python
# Tokenize sentences
sent_tokenize(string)
# ['The science of today is the technology of tomorrow.', 'Tomorrow is today.']
```
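`sent_tokenize` relies on Punkt, a model that learns cues such as abbreviations instead of splitting at every period. A naive rule-based splitter shows why that matters. `naive_sent_tokenize` below is a hypothetical helper for illustration, not part of NLTK: it splits after `.`, `!` or `?` whenever whitespace follows.

```python
import re

# Hypothetical helper (not part of NLTK): split wherever whitespace follows
# a ., ! or ? character.
def naive_sent_tokenize(text):
    return re.split(r"(?<=[.!?])\s+", text.strip())

string = "The science of today is the technology of tomorrow. Tomorrow is today."

# Works on the example text
print(naive_sent_tokenize(string))
# ['The science of today is the technology of tomorrow.', 'Tomorrow is today.']

# An abbreviation fools the rule: "Mr." is treated as the end of a sentence
print(naive_sent_tokenize("Mr. Smith is here. He left."))
# ['Mr.', 'Smith is here.', 'He left.']
```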