Lesson 28:

Regex `.sub()` Method and Verbose Mode

re.compile() returns a regex object. Its .search() method finds the first match in a string (call .group() on the returned match object to get the matched text), and its .findall() method returns a list of every match in the string.

These are analogous to a typical find feature.

The .sub() method is therefore analogous to the replace feature: it returns a new string with every match replaced.
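As a quick illustration of the "find" half, here is a minimal sketch contrasting .search() with .findall(). The sample sentence and the variable names are only illustrative.

```python
import re

# Illustrative sample text; any string containing 'Agent <name>' would do.
text = 'Agent Alice gave the secret documents to Agent Bob.'
names_regex = re.compile(r'Agent \w+')

first_match = names_regex.search(text)   # match object for the first match (or None)
print(first_match.group())               # the text of the first match only
print(names_regex.findall(text))         # a list with the text of every match
```

Note that .group() belongs to the match object returned by .search(), not to the regex object itself.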



In [3]:

    
import re

namesRegex = re.compile(r'Agent \w+') # Match 'Agent ' followed by 1 or more word characters

print(namesRegex.findall('Agent Alice gave the secret documents to Agent Bob.')) # List matches

print(namesRegex.sub('REDACTED', 'Agent Alice gave the secret documents to Agent Bob.')) # Replace every match









    



['Agent Alice', 'Agent Bob']
REDACTED gave the secret documents to REDACTED.

You can also keep part of each match by capturing it in a group and referring to that group in the replacement string with a placeholder such as \1.



In [9]:

    
import re

namesRegex = re.compile(r'Agent (\w)\w*') # Capture the first letter in its own group, then match 0 or more further word characters

print(namesRegex.findall('Agent Alice gave the secret documents to Agent Bob.')) # With a group in the pattern, findall() returns only the group 1 text, not the whole match

print(namesRegex.sub(r'Agent \1***', 'Agent Alice gave the secret documents to Agent Bob.')) # Replace each match, keeping the letter captured in group 1









    



['A', 'B']
Agent A*** gave the secret documents to Agent B***.

This is basically a find and replace feature with regex.
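The same idea extends to more than one group. The following is a small sketch, using a made-up sentence with first and last names, in which \1 and \2 reorder the two captured names in the replacement string.

```python
import re

# Hypothetical example text: each agent has a first and a last name.
text = 'Agent Alice Smith met Agent Bob Jones.'

# Group 1 captures the first name, group 2 the last name.
full_name_regex = re.compile(r'Agent (\w+) (\w+)')

# \2 and \1 in the replacement insert the text of groups 2 and 1.
reordered = full_name_regex.sub(r'Agent \2, \1', text)
print(reordered)  # Agent Smith, Alice met Agent Jones, Bob.
```

The replacement is written as a raw string so that \1 and \2 reach the regex engine as placeholders rather than being treated as string escapes.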

re.compile() also accepts the re.VERBOSE flag as a second argument. It lets you spread a complicated regex pattern over multiple lines and add comments, which helps readability.



In [10]:

    
phoneRegex = re.compile(r'''
(\d\d\d-|\(\d\d\d\)\s)   # area code: 415- (dash, no parentheses) or (415) plus a space; \s is needed because verbose mode ignores literal spaces
\d\d\d                   # first 3 digits
-                        # dash
\d\d\d\d                 # last 4 digits
\sx\d{2,4}               # Extension, like x1234, with at least 2 and at most 4 digits
'''
, re.VERBOSE) # Ignores whitespace and newlines in the pattern, allowing a comment on every line.
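To check that a verbose pattern behaves like its one-line equivalent, here is a self-contained sketch that runs a similar phone pattern over a made-up message. The message text and variable names are assumptions for illustration. It uses finditer(), because findall() would return only the captured area-code group.

```python
import re

phone_regex = re.compile(r'''
    (\d{3}-|\(\d{3}\)\s)   # area code: 415- or (415) followed by whitespace
    \d{3}                  # first 3 digits
    -                      # dash
    \d{4}                  # last 4 digits
    \sx\d{2,4}             # extension with 2 to 4 digits
''', re.VERBOSE)

# Hypothetical message containing both area-code styles.
message = 'Call 415-555-1234 x99 or (415) 555-9999 x1234.'

# finditer() yields match objects, so .group() gives each full match.
matches = [m.group() for m in phone_regex.finditer(message)]
print(matches)  # ['415-555-1234 x99', '(415) 555-9999 x1234']
```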

The re.compile() function accepts only one flags argument as its second parameter, so if you want to use re.I to ignore case, re.DOTALL to let . match newlines, and re.VERBOSE to write multiline regex, you have to combine them with the bitwise OR operator, |.



In [ ]:

    
phoneRegex = re.compile(r'''
(\d\d\d-|\(\d\d\d\)\s)   # area code: 415- (dash, no parentheses) or (415) plus a space; \s is needed because verbose mode ignores literal spaces
\d\d\d                   # first 3 digits
-                        # dash
\d\d\d\d                 # last 4 digits
\sx\d{2,4}               # Extension, like x1234, with at least 2 and at most 4 digits
'''
, re.I | re.DOTALL | re.VERBOSE) # Activates the ignore-case, dotall, and verbose flags simultaneously.

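Combining options with | is an older, bit-flag style that you will rarely see elsewhere in Python. In the re module it applies to the flags argument of re.compile() and of the other functions that accept flags.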
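To see the combined flags take effect, here is a minimal sketch with a made-up pattern and two-line text chosen so that each flag matters. The names are illustrative only.

```python
import re

# Hypothetical pattern: lowercase letters, spread over lines with comments.
span_regex = re.compile(r'''
    agent     # matches Agent or AGENT too, because of re.I
    .*        # any characters, newlines included, because of re.DOTALL
    bob       # also matched case-insensitively
''', re.I | re.DOTALL | re.VERBOSE)

text = 'AGENT Alice\nmet Agent Bob.'

with_dotall = span_regex.search(text).group()
# Same pattern on one line without re.DOTALL: .* cannot cross the newline.
without_dotall = re.compile(r'agent.*bob', re.I).search(text).group()

print(repr(with_dotall))     # 'AGENT Alice\nmet Agent Bob'
print(repr(without_dotall))  # 'Agent Bob'
```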

Recap

The .sub() regex method substitutes every match with some other text.
Using \1, \2, and so on in the replacement string inserts the text of group 1, 2, etc. from each match.
Passing re.VERBOSE lets you add whitespace and comments to the regex string passed to re.compile() (even in raw strings).
If you want to pass multiple flags to re.compile() (like re.DOTALL, re.IGNORECASE, and re.VERBOSE), combine them with the | bitwise operator.
