Title: Replace Characters
Slug: replace_characters
Summary: How to remove and replace characters to clean unstructured text data for machine learning in Python.
Date: 2016-09-06 12:00
Category: Machine Learning
Tags: Preprocessing Text
Authors: Chris Albon
In [1]:
# Import library
import re
In [2]:
# Create text
text_data = ['Interrobang. By Aishwarya Henriette',
             'Parking And Going. By Karl Gautier',
             'Today Is The night. By Jarek Prakash']
In [3]:
# Remove periods
remove_periods = [string.replace('.', '') for string in text_data]
# Show text
remove_periods
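Out[3]:
['Interrobang By Aishwarya Henriette',
 'Parking And Going By Karl Gautier',
 'Today Is The night By Jarek Prakash']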
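Calling `str.replace` once per character gets tedious when several punctuation marks need to go. A minimal sketch, assuming plain ASCII punctuation is what should be removed, builds a deletion table with `str.maketrans` (its third argument lists the characters to delete) and applies it with `str.translate`. The name `delete_punctuation` is illustrative, not part of the original notebook.

```python
import string

# Same strings as text_data above, redefined so this block runs on its own
text_data = ['Interrobang. By Aishwarya Henriette',
             'Parking And Going. By Karl Gautier',
             'Today Is The night. By Jarek Prakash']

# Translation table that deletes every character in string.punctuation
# (ASCII punctuation only), not just periods
delete_punctuation = str.maketrans('', '', string.punctuation)

# Apply the table to each string
no_punctuation = [s.translate(delete_punctuation) for s in text_data]

# Show text
print(no_punctuation)
```

On this data the result matches the period-only version, because periods are the only punctuation present; on messier text the table also strips commas, quotes, hyphens and so on.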
In [4]:
# Create function that replaces every ASCII letter with 'X'
def replace_letters_with_X(string: str) -> str:
    return re.sub(r'[a-zA-Z]', 'X', string)
# Apply function
[replace_letters_with_X(string) for string in remove_periods]
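Out[4]:
['XXXXXXXXXXX XX XXXXXXXXX XXXXXXXXX',
 'XXXXXXX XXX XXXXX XX XXXX XXXXXXX',
 'XXXXX XX XXX XXXXX XX XXXXX XXXXXXX']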
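`re.sub` also accepts a function in place of the replacement string; the function is called once per match with the match object and returns the text to substitute. The sketch below uses that to mask letters while preserving case, and compiles the pattern once with `re.compile`. The helper `mask_letters_keep_case` and the constant `LETTER` are hypothetical names introduced for illustration.

```python
import re

# Same strings as remove_periods above, redefined so this block runs on its own
remove_periods = ['Interrobang By Aishwarya Henriette',
                  'Parking And Going By Karl Gautier',
                  'Today Is The night By Jarek Prakash']

# Compile the pattern once; [a-zA-Z] matches ASCII letters only
LETTER = re.compile(r'[a-zA-Z]')

def mask_letters_keep_case(string: str) -> str:
    # The lambda receives each match and returns 'X' for an uppercase
    # letter and 'x' for a lowercase one
    return LETTER.sub(lambda m: 'X' if m.group().isupper() else 'x', string)

masked = [mask_letters_keep_case(s) for s in remove_periods]
print(masked)
```

Because the character class is `[a-zA-Z]`, accented letters such as `é` pass through unchanged in both this sketch and `replace_letters_with_X`; a broader pattern would be needed for non-ASCII text.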