The findall() method of a regex object finds every matching string in a text, not just the first one.
In [3]:
import re
phoneRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') # \d matches one digit; note the backslash, not a forward slash
#phoneRegex.search() # finds first match
#phoneRegex.findall() # finds all matches
When the regex has no groups, findall() returns a list of strings.
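A minimal sketch of the difference between search() and findall(). The sample message and its phone numbers are made up for illustration and are not from the original notes:

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# Hypothetical sample text containing two phone numbers.
message = 'Call 415-555-1011 today or 415-555-9999 tomorrow.'

first = phone_regex.search(message)         # match object for the first hit only
all_numbers = phone_regex.findall(message)  # list of every matching string

print(first.group())   # 415-555-1011
print(all_numbers)     # ['415-555-1011', '415-555-9999']
```

search() returns a match object (or None when nothing matches), so the text has to be pulled out with group(), while findall() hands back plain strings directly.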
It behaves differently when the regex contains two or more groups: each match becomes a tuple with one string per group.
In [4]:
import re
phoneRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') # Two groups, so findall() returns tuples
#phoneRegex.findall() # finds all matches in pairs; [('group1', 'group2'),...]
To get the full matched string as well, wrap the entire regex in its own group, so you get [(totalstring, group1, group2),...].
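As a rough sketch of both behaviours, again using a made-up sample message (an assumption, not part of the original notes):

```python
import re

# Hypothetical sample text containing two phone numbers.
message = 'Call 415-555-1011 today or 415-555-9999 tomorrow.'

# Two groups: each match becomes a (group1, group2) tuple.
grouped_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
pairs = grouped_regex.findall(message)
print(pairs)    # [('415', '555-1011'), ('415', '555-9999')]

# An extra outer group captures the whole number as the first tuple item.
whole_regex = re.compile(r'((\d\d\d)-(\d\d\d-\d\d\d\d))')
triples = whole_regex.findall(message)
print(triples)  # [('415-555-1011', '415', '555-1011'), ('415-555-9999', '415', '555-9999')]
```

Groups are numbered by the position of their opening parenthesis, which is why the outer group comes first in each tuple.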
In [15]:
#digitRegex = re.compile(r'(0|1|2|3|...|9)') is equivalent to
#digitRegex = re.compile(r'\d')
Other shorthand character classes are:
\D
Any character that is NOT a numeric digit from 0 to 9.
\w
Any letter, numeric digit, or the underscore character (word characters).
\W
Any character that is NOT a letter, numeric digit, or the underscore character.
\s
Any space, tab, or newline character (space characters).
\S
Any character that is NOT a space character.
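A small sketch that runs each shorthand class over the same string so their differences are easier to see. The sample string is an assumption chosen for illustration:

```python
import re

# Hypothetical sample string mixing letters, digits, underscore, punctuation and spaces.
sample = 'Room 42_b, open 9-5!'

digits = re.findall(r'\d', sample)        # single digits
non_digits = re.findall(r'\D+', sample)   # runs of anything except digits
words = re.findall(r'\w+', sample)        # runs of letters, digits and underscores
non_words = re.findall(r'\W+', sample)    # runs of everything else (spaces, punctuation)
spaces = re.findall(r'\s', sample)        # single whitespace characters
non_spaces = re.findall(r'\S+', sample)   # runs of anything except whitespace

print(digits)      # ['4', '2', '9', '5']
print(non_digits)  # ['Room ', '_b, open ', '-', '!']
print(words)       # ['Room', '42_b', 'open', '9', '5']
print(non_words)   # [' ', ', ', ' ', '-', '!']
print(spaces)      # [' ', ' ', ' ']
print(non_spaces)  # ['Room', '42_b,', 'open', '9-5!']
```

Note that the comma, hyphen and exclamation mark show up under \W and \S but never under \w.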
In [23]:
# Example using lyrics from The Twelve Days of Christmas
lyrics = '''
12 Drummers Drumming
11 Pipers Piping
10 Lords a Leaping
9 Ladies Dancing
8 Maids a Milking
7 Swans a Swimming
6 Geese a Laying
5 Golden Rings
4 Calling Birds
3 French Hens
2 Turtle Doves
and 1 Partridge in a Pear Tree
'''
xmasRegex = re.compile(r'\d+\s\w+') # 1 or more digits, one whitespace character, 1 or more word characters
xmasRegex.findall(lyrics) # Returns every 'number gift' pair; each match stops at the next space because \w+ does not match spaces
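Out[23]:
['12 Drummers',
 '11 Pipers',
 '10 Lords',
 '9 Ladies',
 '8 Maids',
 '7 Swans',
 '6 Geese',
 '5 Golden',
 '4 Calling',
 '3 French',
 '2 Turtle',
 '1 Partridge']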
It is possible to create your own character classes, beyond these shorthand classes, using square brackets []:
In [26]:
vowelRegex = re.compile(r'[aeiouAEIOU]') # RegEx for lowercase and uppercase vowels
alphabetRegex = re.compile(r'[a-zA-Z]') # RegEx for lowercase and uppercase alphabet using ranges
print(vowelRegex.findall('Robocop eats baby food.')) # Finds a list of all vowels in string
doublevowelRegex = re.compile(r'[aeiouAEIOU]{2}') # RegEx for two vowels (either case) in a row; {2} means exactly two repetitions
print(doublevowelRegex.findall('Robocop eats baby food.')) # Prints a list of all two-vowel runs in the string
A useful feature of custom character classes is the negative character class:
In [30]:
consonantsRegex = re.compile(r'[^aeiouAEIOU]') # RegEx for finding all characters that are NOT vowels
print(consonantsRegex.findall('Robocop eats baby food.')) # Output includes consonants, spaces, and punctuation.
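Because [^aeiouAEIOU] matches anything that is not a vowel, spaces and the full stop end up in the result too. One possible way to keep only consonant letters, sketched here as an assumption rather than as part of the original notes, is to also exclude non-word characters, digits and the underscore inside the same negative class:

```python
import re

text = 'Robocop eats baby food.'

# Not a vowel, not a non-word character, not a digit, not an underscore:
# what remains are consonant letters only.
letters_only_regex = re.compile(r'[^aeiouAEIOU\W\d_]')
consonants = letters_only_regex.findall(text)

print(consonants)  # ['R', 'b', 'c', 'p', 't', 's', 'b', 'b', 'y', 'f', 'd']
```

Shorthand classes such as \W and \d can be placed inside square brackets, so a single negative class can rule out several kinds of character at once.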
The .findall() method is passed a string, and returns a list of all matches in it, not just the first match.
If the regex has zero or one group, .findall() returns a list of strings.
If the regex has two or more groups, .findall() returns a list of tuples of strings.
\d is the shorthand character class that matches digits.
\w is the shorthand character class that matches word characters (letters, digits, and the underscore).
\s is the shorthand character class that matches whitespace.
\D is the shorthand character class that matches anything that is NOT a digit.
\W is the shorthand character class that matches anything that is NOT a word character.
\S is the shorthand character class that matches anything that is NOT whitespace.
You can make your own character classes with square brackets: [aeiou]
A ^ caret symbol at the start makes it a negative character class, matching anything NOT in the brackets: [^aeiou]