Lesson 24:

RegEx groups and the Pipe Character

Besides matching a whole pattern, a regex can also pull out individual parts of a match. We start from the basic phone number regex:


In [4]:
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
phoneNumRegex.search('My number is 415-555-4242') # returns a match object
mo = phoneNumRegex.search('My number is 415-555-4242') # store match object
mo.group() # return the matched text from the match object


Out[4]:
'415-555-4242'

To extract different sections of the matched text, we can create groups with parentheses:


In [12]:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') # The first () is group 1, the second () is group 2
phoneNumRegex.search('My number is 415-555-4242') # returns a match object with subgroups
mo = phoneNumRegex.search('My number is 415-555-4242') # store match object with subgroups
print('The area code is ' + mo.group(1)) # print out the subgroup matching the parameter
print('The rest of it is ' + mo.group(2)) # print out the other subgroup


The area code is 415
The rest of it is 555-4242
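
As a small extension of the example above, a match object also has a groups() method that returns the text of every group at once as a tuple. The sketch below unpacks that tuple into two variables; the names phone_regex, area_code and main_number are illustrative and not part of the lesson.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242')

# groups() returns a tuple with the text of every group, in order
area_code, main_number = mo.groups()
print(area_code)    # 415
print(main_number)  # 555-4242
```

Unpacking like this only works when search() found a match, so it assumes the input is known to contain a phone number.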

If parentheses are actually part of the text to match, they need to be escaped with a backslash:


In [18]:
phoneNumRegex = re.compile(r'\(\d\d\d\)-\d\d\d-\d\d\d\d')

mo = phoneNumRegex.search('My number is 415-555-4242') # returns no match
print(mo)
mo = phoneNumRegex.search('My number is (415)-555-4242') # returns a match object
mo.group()


None
Out[18]:
'(415)-555-4242'
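
Escaped and unescaped parentheses can appear in the same pattern. The following is a minimal sketch, assuming we want to match the literal parentheses around the area code while still capturing the digits inside them as group 1; the name phone_regex is illustrative.

```python
import re

# \( and \) match literal parentheses; the unescaped ( ) pairs create groups
phone_regex = re.compile(r'\((\d\d\d)\)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is (415)-555-4242')

print(mo.group())   # (415)-555-4242
print(mo.group(1))  # 415
print(mo.group(2))  # 555-4242
```

The full match keeps the literal parentheses, while group 1 contains only the digits, because the literal parentheses sit outside the capturing pair.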

The '|' RegEx Operator

The pipe character, |, matches one of several patterns in a group.


In [27]:
batRegex = re.compile(r'Bat(man|mobile|copter|cat)') # The pipe character separates the suffixes; any one of them can match.

mo = batRegex.search('Batmobile lost a wheel.')
print(mo.group()) # Print matching string

print(mo.group(1)) # Pass 1 to group() to find which suffix actually matched

#mo2 = batRegex.search('Batmotorcycle lost a wheel.')
#mo2.group() # would raise AttributeError: search() returns None when there is no match


Batmobile
mobile
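
The commented-out lines in the cell above fail because search() returns None when nothing matches, and None has no group() method. One way to guard against that is to check the result before using it. This is a sketch only; find_bat_word is a hypothetical helper introduced here for illustration.

```python
import re

bat_regex = re.compile(r'Bat(man|mobile|copter|cat)')

def find_bat_word(text):
    """Return (full match, matched suffix), or None when nothing matches."""
    mo = bat_regex.search(text)
    if mo is None:  # search() found nothing, so there is no match object
        return None
    return mo.group(), mo.group(1)

print(find_bat_word('Batmobile lost a wheel.'))      # ('Batmobile', 'mobile')
print(find_bat_word('Batmotorcycle lost a wheel.'))  # None
```

Returning None from the helper mirrors what search() itself does, so callers can use the same kind of check.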

Recap

  • Groups are created in regex strings with parentheses ().
  • The first set of parentheses is group 1, the second is group 2, and so on.
  • Calling .group() or .group(0) returns the full matching string; .group(1) returns the text matched by group 1.
  • Use \( and \) to match literal parentheses in a regex string; otherwise they will be treated as groups.
  • The | pipe character matches any one of several alternative patterns.