Besides negating a character class, the ^ character can also anchor a pattern to the start of a string.
The $ character, likewise, anchors a pattern to the end of a string.
In [11]:
import re
beginsWithTheHelloRegex = re.compile(r'^Hello') # String must start exactly with 'Hello'
print(beginsWithTheHelloRegex.findall('Hello there'))
print(beginsWithTheHelloRegex.findall('Wait, did he say Hello just now?'))
print(beginsWithTheHelloRegex.findall('He said Hello'))
endsWithTheHelloRegex = re.compile(r'Hello$') # String must end exactly with 'Hello'
print(endsWithTheHelloRegex.findall('Hello there'))
print(endsWithTheHelloRegex.findall('Wait, did he say Hello just now?'))
print(endsWithTheHelloRegex.findall('He said Hello'))
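To make the two roles of ^ concrete, here is a minimal sketch that puts them side by side. The names negatedClassRegex and startAnchorRegex and the sample strings are made up for illustration.

```python
import re

# ^ inside square brackets negates the class: any character that is NOT a lowercase vowel
negatedClassRegex = re.compile(r'[^aeiou]')
# ^ outside square brackets anchors the pattern to the start of the string
startAnchorRegex = re.compile(r'^He')

negatedHits = negatedClassRegex.findall('Hello')
anchoredHits = startAnchorRegex.findall('Hello, He said')

print(negatedHits)   # ['H', 'l', 'l']
print(anchoredHits)  # ['He'] (the second 'He' is not at the start, so it is skipped)
```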
They can be used in combination:
In [35]:
allDigitsRegex = re.compile(r'^\d+$') # The entire string must consist of one or more digits
print(allDigitsRegex.findall('2153234623462561514')) # Matches entire string
print(allDigitsRegex.findall('21532346234letters!62561514')) # No match: the string contains non-digit characters
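One subtlety worth knowing: $ also matches just before a newline at the very end of the string, so ^\d+$ still finds a match in '2153\n'. If a strict whole-string check is needed, re.fullmatch() is one option. The sketch below uses a hypothetical helper, isAllDigits, which is not part of the original notebook.

```python
import re

allDigitsRegex = re.compile(r'^\d+$')

def isAllDigits(text):
    """Hypothetical helper: True only if the whole string is one or more digits."""
    return re.fullmatch(r'\d+', text) is not None

anchoredClean = allDigitsRegex.findall('2153')
anchoredNewline = allDigitsRegex.findall('2153\n')  # $ matches just before the trailing newline

print(anchoredClean)          # ['2153']
print(anchoredNewline)        # ['2153']
print(isAllDigits('2153'))    # True
print(isAllDigits('2153\n'))  # False
print(isAllDigits('21a53'))   # False
```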
The . character matches any single character except a newline.
In [18]:
atRegex = re.compile(r'.at') # Any single character followed by at
print(atRegex.findall('The cat in the hat sat on the flat mat.')) # matches anything ending with at
atRegex = re.compile(r'.{2}at') # Any two characters followed by at
print(atRegex.findall('The cat in the hat sat on the flat mat.')) # matches anything ending with at, including spaces
The pattern .* is therefore used to match anything: any number of any character.
In [23]:
name = 'First Name: Al, Last Name: Sweigart' # To pull names from this string would require a lot of indexing code
name2 = 'First Name: Vivek, Last Name: Menon' # To pull names from this string would require a lot of indexing code
nameRegex = re.compile(r'First Name: (.*), Last Name: (.*)') # Each group captures whatever appears at that position in a string formatted exactly like this
print(nameRegex.findall(name))
print(nameRegex.findall(name2))
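With two groups in the pattern, findall() returns a list of tuples. When only one record is expected, search() together with groups() may be more convenient. A minimal sketch, where firstName and lastName are illustrative variable names:

```python
import re

nameRegex = re.compile(r'First Name: (.*), Last Name: (.*)')
name = 'First Name: Al, Last Name: Sweigart'

allMatches = nameRegex.findall(name)   # list with one tuple per match
match = nameRegex.search(name)         # first match only, as a match object
firstName, lastName = match.groups()   # unpack the two captured groups

print(allMatches)           # [('Al', 'Sweigart')]
print(firstName, lastName)  # Al Sweigart
```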
.* is greedy by default, but you can switch to non-greedy matching with .*?
In [26]:
serve = '<To serve humans> for dinner.>'
greedyRegex = re.compile(r'<(.*)>') # Greedy: captures as much as possible between angle brackets.
nongreedyRegex = re.compile(r'<(.*?)>') # Non-greedy: captures as little as possible between angle brackets.
print(greedyRegex.findall(serve)) # Matches the longest string
print(nongreedyRegex.findall(serve)) # Matches the shortest string
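The difference becomes more visible when the string contains several bracketed pieces. The sketch below reuses the same two patterns on a made-up HTML-like string (the sample text is an assumption, not from the original notebook).

```python
import re

greedyRegex = re.compile(r'<(.*)>')
nongreedyRegex = re.compile(r'<(.*?)>')

markup = '<b>bold</b> and <i>italic</i>'

greedyHits = greedyRegex.findall(markup)        # one match, from the first < to the last >
nongreedyHits = nongreedyRegex.findall(markup)  # one match per bracketed tag

print(greedyHits)     # ['b>bold</b> and <i>italic</i']
print(nongreedyHits)  # ['b', '/b', 'i', '/i']
```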
.* matches any character except the newline (\n) character.
In [30]:
primeDirectives = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'
print(primeDirectives)
dotStar = re.compile(r'.*')
print(dotStar.findall(primeDirectives))
We can pass re.DOTALL as the second argument to re.compile() to make the . truly match any character, including newlines:
In [34]:
dotStar = re.compile(r'.*', re.DOTALL)
print(dotStar.findall(primeDirectives))
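Because .* can also match an empty string, findall() may include empty matches in its result. To compare the two modes without that noise, one option is to look only at the first match with search(). A minimal sketch, using a shortened two-line sample string:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'

firstLine = re.search(r'.*', text).group()               # stops at the first newline
everything = re.search(r'.*', text, re.DOTALL).group()   # runs through the newline

print(repr(firstLine))   # 'Serve the public trust.'
print(repr(everything))  # 'Serve the public trust.\nProtect the innocent.'
```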
Similarly, we can pass re.IGNORECASE or re.I to ignore case:
In [38]:
vowelRegex = re.compile(r'[aeiou]', re.I) # Match any vowel, regardless of case
print(vowelRegex.findall('Al, why does your programming book talk about RoboCop so much?'))
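The flag applies to whole words just as it does to character classes. The sketch below compares the same pattern with and without re.IGNORECASE; the sample sentence is made up for illustration.

```python
import re

sentence = 'RoboCop is part man, part machine. I said ROBOCOP, not robocop.'

caseSensitiveHits = re.compile(r'robocop').findall(sentence)
caseInsensitiveHits = re.compile(r'robocop', re.IGNORECASE).findall(sentence)

print(caseSensitiveHits)    # ['robocop']
print(caseInsensitiveHits)  # ['RoboCop', 'ROBOCOP', 'robocop']
```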
The ^ regex character means the string must start with the pattern; $ means the string must end with the pattern.
The . regex character is a wildcard; it matches any character except a newline.
Pass re.DOTALL as the second argument to re.compile() to make the . match newlines as well.
Pass re.I as the second argument to re.compile() to make the matching case-insensitive.
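Flags can be combined with the | operator, so one pattern can use several of the ideas above at once. The following is a minimal sketch; the pattern and the name wholeTextRegex are illustrative assumptions.

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the LAW.'

# ^ and $ anchor the pattern, .* spans the newlines thanks to re.DOTALL,
# and re.I lets 'serve' and 'law' match regardless of case
wholeTextRegex = re.compile(r'^serve.*law\.$', re.I | re.DOTALL)
withoutDotall = re.compile(r'^serve.*law\.$', re.I)

combinedMatch = wholeTextRegex.search(text)
plainMatch = withoutDotall.search(text)

print(combinedMatch.group() == text)  # True
print(plainMatch)                     # None, because . cannot cross the newlines
```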