Title: Remove Punctuation
Slug: remove_punctuation
Summary: How to remove punctuation from unstructured text data for machine learning in Python.
Date: 2016-09-08 12:00
Category: Machine Learning
Tags: Preprocessing Text

Authors: Chris Albon

Preliminaries


In [1]:
# Load libraries
import string

Create Text Data


In [2]:
# Create text
text_data = ['Hi!!!! I. Love. This. Song....',
             '10000% Agree!!!! #LoveIT',
             'Right?!?!']

Remove Punctuation


In [3]:
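# Build the translation table once; it maps every character in
# string.punctuation (ASCII punctuation only) to None
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)

# Create function that removes all ASCII punctuation
def remove_punctuation(sentence: str) -> str:
    return sentence.translate(PUNCTUATION_TABLE)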

# Apply function
[remove_punctuation(sentence) for sentence in text_data]


Out[3]:
['Hi I Love This Song', '10000 Agree LoveIT', 'Right']
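
string.punctuation contains only the 32 ASCII punctuation characters, so curly quotes, ellipsis characters, and other non-ASCII marks pass through untouched. One possible way to handle them, shown here as a minimal sketch rather than the method used above, is to build a translation table from every code point whose Unicode general category starts with 'P'. The names unicode_punctuation_table and remove_unicode_punctuation are hypothetical and chosen for illustration. Note that this is not a superset of string.punctuation: characters such as $, +, and ~ belong to Unicode symbol categories ('S'), so this version leaves them in place.

```python
# Load libraries
import sys
import unicodedata

# Map every code point in a Unicode punctuation category (Pc, Pd, Ps, Pe,
# Pi, Pf, Po) to None so that str.translate deletes it
unicode_punctuation_table = dict.fromkeys(
    code_point
    for code_point in range(sys.maxunicode + 1)
    if unicodedata.category(chr(code_point)).startswith('P')
)

# Create function (hypothetical name) that removes Unicode punctuation
def remove_unicode_punctuation(sentence: str) -> str:
    return sentence.translate(unicode_punctuation_table)

# Apply function to text with non-ASCII punctuation
[remove_unicode_punctuation(sentence)
 for sentence in ['“Hello,” she said…', 'Right?!?!']]
```

Building the table scans the whole Unicode range, so it is worth doing once at import time and reusing the table for every sentence.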