Title: Replace Characters
Slug: replace_characters
Summary: How to remove and replace characters to clean unstructured text data for machine learning in Python.
Date: 2016-09-06 12:00
Category: Machine Learning
Tags: Preprocessing Text
Authors: Chris Albon
In [1]:
    
# Import library
import re
    
In [2]:
    
# Create text
text_data = ['Interrobang. By Aishwarya Henriette',
             'Parking And Going. By Karl Gautier',
             'Today Is The night. By Jarek Prakash']
    
In [3]:
    
# Remove periods
remove_periods = [string.replace('.', '') for string in text_data]
# Show text
remove_periods
    
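    Out[3]:
    ['Interrobang By Aishwarya Henriette',
     'Parking And Going By Karl Gautier',
     'Today Is The night By Jarek Prakash']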
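Periods are only one kind of punctuation. As a rough sketch (not part of the original notebook), `str.translate` with a table built by `str.maketrans` can strip every ASCII punctuation character in one pass. The names `punctuation_table` and `remove_punctuation` are illustrative assumptions, and the loop variable is `s` so it does not shadow the `string` module.

```python
# Import library
import string

# Create text
text_data = ['Interrobang. By Aishwarya Henriette',
             'Parking And Going. By Karl Gautier',
             'Today Is The night. By Jarek Prakash']

# Build a translation table that deletes every ASCII punctuation character
punctuation_table = str.maketrans('', '', string.punctuation)

# Remove punctuation from each string
remove_punctuation = [s.translate(punctuation_table) for s in text_data]
```

For this sample text the result should match the period-only version, since periods are the only punctuation present.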
In [4]:
    
# Create function that replaces every ASCII letter with an X
def replace_letters_with_X(string: str) -> str:
    return re.sub(r'[a-zA-Z]', 'X', string)
# Apply function
[replace_letters_with_X(string) for string in remove_periods]
    
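    Out[4]:
    ['XXXXXXXXXXX XX XXXXXXXXX XXXXXXXXX',
     'XXXXXXX XXX XXXXX XX XXXX XXXXXXX',
     'XXXXX XX XXX XXXXX XX XXXXX XXXXXXX']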
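Both steps above are the same operation with different patterns, so they could be folded into one helper. The following is a minimal sketch under that assumption; `replace_characters` and its parameters are hypothetical names, not something the original notebook defines.

```python
# Import libraries
import re
from typing import List

# Hypothetical helper: replace every regex match in each string
def replace_characters(strings: List[str], pattern: str, replacement: str) -> List[str]:
    compiled = re.compile(pattern)
    return [compiled.sub(replacement, s) for s in strings]

# Create text
sample = ['Interrobang. By Aishwarya Henriette']

# Remove periods (the dot is escaped so it matches a literal period)
no_periods = replace_characters(sample, r'\.', '')

# Replace ASCII letters with X
masked = replace_characters(no_periods, r'[a-zA-Z]', 'X')
```

Unlike `str.replace`, the pattern here is a regular expression, so characters such as `.` must be escaped to be matched literally.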