Regular Expressions


In [22]:
import re
Match

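re.match() attempts to match the RE pattern at the beginning of string, with optional flags. It returns a match object on success and None if the start of the string does not match.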


In [23]:
# re.match(pattern, string, flags=0)

line = "Cats are smarter than dogs"

matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    # group() with no argument (or group(0)) is the entire match
    print("matchObj.group() : ", matchObj.group())
    # group(1) and group(2) are the first and second parenthesised groups
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")


matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter

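Search

re.search() scans through string for the first location where the RE pattern matches, with optional flags. It returns a match object on success and None if no position in the string matches.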


In [24]:
# re.search(pattern, string, flags=0)

line = "Cats are smarter than dogs"

searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    # group() with no argument (or group(0)) is the entire match
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("No match!!")


searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter

Python offers two different primitive operations based on regular expressions: match checks for a
match only at the beginning of the string, while search checks for a match anywhere in the string.
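
The two examples above give identical results because their pattern starts with (.*), which can match from position 0. A minimal sketch of a case where the two functions differ, using the same sentence; the pattern r'dogs' is chosen here only for illustration:

```python
import re

line = "Cats are smarter than dogs"

# match() only succeeds if the pattern matches at position 0 of the string.
match_result = re.match(r'dogs', line)

# search() scans forward and returns the first location where the pattern matches.
search_result = re.search(r'dogs', line)

print(match_result)            # None
print(search_result.group())   # dogs
print(search_result.start())   # 22
```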

Findall

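Two pattern methods return all of the matches for a pattern. findall() returns a list of matching strings, while finditer() returns an iterator of match objects. If the pattern contains more than one capturing group, findall() returns a list of tuples, one tuple of group contents per match.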


In [25]:
# findall(pattern, string, flags=0)
line = "Cats are smarter than dogs"
re.findall( r'(.*) are (.*?) .*', line, re.M|re.I)


Out[25]:
[('Cats', 'smarter')]
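
As a small illustration, the sketch below runs findall() without capturing groups and then finditer() on the same sentence. The pattern r'\w+' is an illustrative choice, not taken from the example above.

```python
import re

line = "Cats are smarter than dogs"

# With no capturing groups, findall() returns the matched substrings themselves.
words = re.findall(r'\w+', line)
print(words)  # ['Cats', 'are', 'smarter', 'than', 'dogs']

# finditer() yields match objects, so the position of each match is available too.
spans = [(m.group(), m.start()) for m in re.finditer(r'\w+', line)]
print(spans)  # [('Cats', 0), ('are', 5), ('smarter', 9), ('than', 17), ('dogs', 22)]
```
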
Split

Split string by the occurrences of pattern. If capturing parentheses are used in pattern,
then the text of all groups in the pattern are also returned as part of the resulting list.


In [26]:
# re.split(pattern, string, maxsplit=0, flags=0)
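# Raw strings keep the backslash in \W from being treated as a string escape
print(re.split(r'\W+', 'Words, words, words.'))
# Capturing parentheses: the separators are kept in the result
print(re.split(r'(\W+)', 'Words, words, words.'))
# maxsplit=1: split at most once; the rest of the string is the last element
print(re.split(r'\W+', 'Words, words, words.', maxsplit=1))
print(re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE))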


['Words', 'words', 'words', '']
['Words', ', ', 'words', ', ', 'words', '.', '']
['Words', 'words, words.']
['0', '3', '9']
Sub

re.sub() returns the string obtained by replacing the leftmost non-overlapping occurrences of the RE in string with the replacement repl. If the pattern isn't found, string is returned unchanged. re.subn() performs the same operation but returns a tuple (new_string, number_of_subs_made).


In [27]:
# re.sub(pattern, repl, string, count=0, flags=0)
re.sub('(blue|white|red)', 'colour', 'blue socks and red shoes')


Out[27]:
'colour socks and colour shoes'

In [28]:
re.subn('(blue|white|red)', 'colour', 'blue socks and red shoes')


Out[28]:
('colour socks and colour shoes', 2)
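
A short sketch of two further options of re.sub(): the count argument and a backreference in the replacement. The variable names and the bracketed replacement format are illustrative only.

```python
import re

text = 'blue socks and red shoes'

# count=1 limits the substitution to the leftmost occurrence.
first_only = re.sub(r'(blue|white|red)', 'colour', text, count=1)
print(first_only)  # colour socks and red shoes

# A backreference (\1) in the replacement reuses the text captured by group 1.
tagged = re.sub(r'(blue|white|red)', r'[\1]', text)
print(tagged)  # [blue] socks and [red] shoes
```
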
Compile

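re.compile() compiles a regular expression pattern and returns a pattern object. The pattern object has methods such as match(), search(), findall(), split() and sub(), which is convenient when the same expression is used several times.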


In [29]:
# compile(pattern, flags=0)
prog = re.compile('(blue|white|red)')
prog.sub('colour', 'blue socks and red shoes')


Out[29]:
'colour socks and colour shoes'
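
A minimal sketch of reusing one compiled pattern across several strings; the sample sentences and the name colour_re are made up for illustration.

```python
import re

# Compile once, with the IGNORECASE flag, then reuse the pattern object's methods.
colour_re = re.compile(r'(blue|white|red)', re.IGNORECASE)

sentences = ['Blue socks and red shoes', 'green hat', 'a WHITE scarf']

# With a single capturing group, findall() returns the text of that group per match.
results = [colour_re.findall(s) for s in sentences]
print(results)  # [['Blue', 'red'], [], ['WHITE']]
```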

References