A regex does not have to match the whole string: search() finds the first place the pattern occurs anywhere in the text.
In [4]:
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
phoneNumRegex.search('My number is 415-555-4242') # returns a match object
mo = phoneNumRegex.search('My number is 415-555-4242') # store match object
mo.group() # returns the full matched text as a string
Out[4]:
'415-555-4242'
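To make the idea of partial matching concrete, the sketch below contrasts search(), which finds the pattern anywhere in the text, with fullmatch(), which succeeds only when the entire string fits the pattern. The names phone_regex, text, partial, whole, and exact are illustrative and do not come from the cells above.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
text = 'My number is 415-555-4242'

partial = phone_regex.search(text)             # scans the string for the pattern
whole = phone_regex.fullmatch(text)            # requires the entire string to match
exact = phone_regex.fullmatch('415-555-4242')  # the string is exactly a phone number

print(partial.group())  # 415-555-4242
print(whole)            # None
print(exact.group())    # 415-555-4242
```

search() is the usual choice for pulling a pattern out of longer text, while fullmatch() is better suited to validating that an input is a phone number and nothing else.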
To extract separate parts of the matched text, wrap each part of the pattern in parentheses to create a group:
In [12]:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') # The first () is group 1, the second () is group 2
phoneNumRegex.search('My number is 415-555-4242') # returns a match object with subgroups
mo = phoneNumRegex.search('My number is 415-555-4242') # store match object with subgroups
print('The area code is ' + mo.group(1)) # group(1) is the text matched by the first ()
print('The rest of it is ' + mo.group(2)) # group(2) is the text matched by the second ()
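When every group is needed at once, the match object's groups() method returns them all as a tuple. The following is a small sketch of that approach; the names phone_regex, all_groups, area_code, and main_number are chosen here for illustration.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242')

all_groups = mo.groups()             # tuple of every captured group, in order
area_code, main_number = all_groups  # unpack the tuple into two names

print(all_groups)   # ('415', '555-4242')
print(mo.group(0))  # 415-555-4242
```

Unpacking works here because the pattern has exactly two groups; a pattern with a different number of groups would need a matching number of names.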
If parentheses are actually part of the text to match, they need to be escaped with a backslash:
In [18]:
phoneNumRegex = re.compile(r'\(\d\d\d\)-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242') # no parentheses in the text, so search() returns None
print(mo)
mo = phoneNumRegex.search('My number is (415)-555-4242') # returns a match object
mo.group()
Out[18]:
'(415)-555-4242'
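Escaped and unescaped parentheses can appear in the same pattern. As a minimal sketch, the variant below matches the literal parentheses around the area code while still capturing the digits inside them as group 1; the name paren_regex is made up for this example.

```python
import re

# Escaped \( and \) match literal parentheses; the unescaped pairs create groups.
paren_regex = re.compile(r'\((\d\d\d)\)-(\d\d\d-\d\d\d\d)')
mo = paren_regex.search('My number is (415)-555-4242')

print(mo.group())   # (415)-555-4242
print(mo.group(1))  # 415
print(mo.group(2))  # 555-4242
```

The full match still includes the parentheses, but group 1 holds only the digits, because the literal parentheses sit outside the capturing pair.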
In [27]:
batRegex = re.compile(r'Bat(man|mobile|copter|cat)') # The pipe character separates the alternative suffixes; any one of them can match.
mo = batRegex.search('Batmobile lost a wheel.')
print(mo.group()) # Print the full matching string
print(mo.group(1)) # Pass 1 to group() to see which suffix actually matched
#mo2 = batRegex.search('Batmotorcycle lost a wheel.') # no suffix matches, so mo2 is None
#mo2.group() # raises AttributeError because None has no group() method
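Because search() returns None when nothing matches, calling group() on the result without checking it first raises an AttributeError. One way to guard against that is sketched below; find_suffix is a hypothetical helper written for this example, not part of the re module.

```python
import re

bat_regex = re.compile(r'Bat(man|mobile|copter|cat)')

def find_suffix(text):
    """Return the suffix that matched, or None when no alternative matches."""
    mo = bat_regex.search(text)
    if mo is None:  # search() found nothing, so there is no match object
        return None
    return mo.group(1)

print(find_suffix('Batmobile lost a wheel.'))      # mobile
print(find_suffix('Batmotorcycle lost a wheel.'))  # None
```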
Parentheses () in a regex string create groups.
.group() or .group(0) returns the full matching string; .group(1) returns the text matched by the first group.
Use \( and \) to match literal parentheses in a regex string; unescaped, they are treated as group delimiters.
The | pipe character matches any one of several alternatives.