Lesson 27:

RegEx `.*` Dot-Star, `^` Caret, & `$` Dollar Sign Characters

Besides negating a character class, the ^ character can also anchor a pattern to the start of a string.

The $ character does the same for the end of a string: the pattern must match right at the end.



In [11]:

    
import re

beginsWithTheHelloRegex = re.compile(r'^Hello') # String must start exactly with 'Hello'

print(beginsWithTheHelloRegex.findall('Hello there'))
print(beginsWithTheHelloRegex.findall('Wait, did he say Hello just now?'))
print(beginsWithTheHelloRegex.findall('He said Hello'))

endsWithTheHelloRegex = re.compile(r'Hello$') # String must end exactly with 'Hello'

print(endsWithTheHelloRegex.findall('Hello there'))
print(endsWithTheHelloRegex.findall('Wait, did he say Hello just now?'))
print(endsWithTheHelloRegex.findall('He said Hello'))









    



['Hello']
[]
[]
[]
[]
['Hello']
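
To see what the anchors change, it can help to run the same word against one string with and without them. The following is a minimal sketch; the variable names are made up for illustration, and it uses search(), which returns a match object on success and None otherwise.

```python
import re

text = 'He said Hello'

# Same word, three different anchoring choices.
unanchored = re.compile(r'Hello')    # may match anywhere
starts_with = re.compile(r'^Hello')  # only at the start
ends_with = re.compile(r'Hello$')    # only at the end

# search() returns a match object on success and None otherwise.
found_anywhere = unanchored.search(text) is not None
found_at_start = starts_with.search(text) is not None
found_at_end = ends_with.search(text) is not None

print(found_anywhere, found_at_start, found_at_end)  # True False True
```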

They can be used in combination:



In [35]:

    
allDigitsRegex = re.compile(r'^\d+$') # The entire string must be one or more digits

print(allDigitsRegex.findall('2153234623462561514')) # Matches the entire string
print(allDigitsRegex.findall('21532346234letters!62561514')) # No match: the string contains non-digit characters









    Out[35]:





['2153234623462561514']
[]
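
A pattern anchored at both ends is a natural fit for simple input validation. The sketch below wraps it in a helper; is_all_digits and ALL_DIGITS are hypothetical names chosen for this example, not part of the lesson.

```python
import re

ALL_DIGITS = re.compile(r'^\d+$')

def is_all_digits(text):
    """Return True when text is made up only of digits.

    Note: $ also matches just before a single trailing newline,
    so '123\\n' passes this check.
    """
    return ALL_DIGITS.search(text) is not None

print(is_all_digits('2153234623462561514'))          # True
print(is_all_digits('21532346234letters!62561514'))  # False
print(is_all_digits(''))                             # False
```

The empty string fails because + requires at least one digit.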

The . character matches any character.



In [18]:

    
atRegex = re.compile(r'.at') # Any single character followed by 'at'

print(atRegex.findall('The cat in the hat sat on the flat mat.')) # Each match is exactly 3 characters, so 'flat' yields 'lat'

atRegex = re.compile(r'.{2}at') # Any two characters followed by 'at'

print(atRegex.findall('The cat in the hat sat on the flat mat.')) # Each match is 4 characters; the dot also matches spaces









    



['cat', 'hat', 'sat', 'lat', 'mat']
[' cat', ' hat', ' sat', 'flat', ' mat']
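
Two details of the dot are easy to miss: it always consumes exactly one character, and it has to be escaped as \. when a literal period is wanted. A short sketch with made-up sample strings:

```python
import re

# '.' needs exactly one character in front of 'at', so the bare
# word 'at' at the start of the string is not matched.
one_char = re.findall(r'.at', 'at bat')

# An unescaped dot matches any character; an escaped dot (\.)
# matches only a literal period.
any_char = re.findall(r'mat.', 'mat. mats matt')
literal_dot = re.findall(r'mat\.', 'mat. mats matt')

print(one_char)     # ['bat']
print(any_char)     # ['mat.', 'mats', 'matt']
print(literal_dot)  # ['mat.']
```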

Since . matches any single character and * means zero or more repetitions, .* matches any run of characters of any length:



In [23]:

    
name = 'First Name: Al, Last Name: Sweigart' # To pull names from this string would require a lot of indexing code
name2 = 'First Name: Vivek, Last Name: Menon' # To pull names from this string would require a lot of indexing code

nameRegex = re.compile(r'First Name: (.*), Last Name: (.*)') # Each group captures whatever text appears at that position in a string formatted exactly like this

print(nameRegex.findall(name))
print(nameRegex.findall(name2))









    



[('Al', 'Sweigart')]
[('Vivek', 'Menon')]
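
When only one name per string is expected, search() together with groups() may be more convenient than findall(). The sketch below assumes that use case; parse_name and NAME_REGEX are hypothetical names introduced for illustration.

```python
import re

NAME_REGEX = re.compile(r'First Name: (.*), Last Name: (.*)')

def parse_name(line):
    """Return (first, last) from a formatted line, or None if it does not match."""
    match = NAME_REGEX.search(line)
    if match is None:
        return None
    return match.groups()

print(parse_name('First Name: Al, Last Name: Sweigart'))  # ('Al', 'Sweigart')
print(parse_name('Name: Al Sweigart'))                    # None
```

Checking for None before calling groups() avoids an AttributeError on strings that do not follow the expected format.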

.* is greedy by default: it matches as much text as possible. Adding a ? (.*?) makes it non-greedy, so it matches as little as possible.



In [26]:

    
serve = '<To serve humans> for dinner.>'

greedyRegex = re.compile(r'<(.*)>') # Greedy: captures as much as possible between a < and a >
nongreedyRegex = re.compile(r'<(.*?)>') # Non-greedy: captures as little as possible between a < and a >

print(greedyRegex.findall(serve)) # Matches the longest string
print(nongreedyRegex.findall(serve)) # Matches the shortest string









    



['To serve humans> for dinner.']
['To serve humans']
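
The difference becomes more visible when a string contains several bracketed pieces. In this sketch (the markup string is invented for the example), the greedy pattern produces one long match, while the non-greedy pattern finds each bracketed piece separately.

```python
import re

markup = '<b>bold</b> and <i>italic</i>'

# Greedy: runs from the first '<' to the last '>'.
greedy = re.findall(r'<(.*)>', markup)

# Non-greedy: stops at the first '>' after each '<'.
non_greedy = re.findall(r'<(.*?)>', markup)

print(greedy)      # ['b>bold</b> and <i>italic</i']
print(non_greedy)  # ['b', '/b', 'i', '/i']
```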

By default, . matches any character except the newline (\n) character, so .* never matches across line breaks.



In [30]:

    
primeDirectives = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

print(primeDirectives)

dotStar = re.compile(r'.*')
print(dotStar.findall(primeDirectives))









    



Serve the public trust.
Protect the innocent.
Uphold the law.
['Serve the public trust.', '', 'Protect the innocent.', '', 'Uphold the law.', '']
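
The empty strings in that output appear because .* is also allowed to match zero characters, which it does at each newline and at the end of the string. If those empty results are unwanted, one option is to require at least one character with .+ instead. A minimal sketch, reusing the same text under a new variable name:

```python
import re

prime_directives = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# '+' requires at least one character, so no empty matches are produced.
lines = re.findall(r'.+', prime_directives)

print(lines)
# ['Serve the public trust.', 'Protect the innocent.', 'Uphold the law.']
```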

Passing re.DOTALL as the second argument to re.compile() makes . match truly any character, including newlines:



In [34]:

    
dotStar = re.compile(r'.*', re.DOTALL)
print(dotStar.findall(primeDirectives))









    Out[34]:





['Serve the public trust.\nProtect the innocent.\nUphold the law.', '']
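
The flag matters most when a pattern has to span several lines. As a rough illustration (the pattern here is made up for the example), the same search fails without re.DOTALL and succeeds with it:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# The pattern has to cross two newlines to get from 'Serve' to 'law.'.
single_line = re.search(r'Serve.*law\.', text)
across_lines = re.search(r'Serve.*law\.', text, re.DOTALL)

print(single_line)                   # None
print(across_lines.group() == text)  # True
```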

Similarly, pass re.IGNORECASE (or its short form, re.I) to ignore case:



In [38]:

    
vowelRegex = re.compile(r'[aeiou]', re.I) # Match any vowel, regardless of case 

print(vowelRegex.findall('Al, why does your programming book talk about RoboCop so much?'))









    



['A', 'o', 'e', 'o', 'u', 'o', 'a', 'i', 'o', 'o', 'a', 'a', 'o', 'u', 'o', 'o', 'o', 'o', 'u']
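
Case-insensitive matching is also handy for whole words whose capitalization varies. A small sketch with an invented sentence, comparing the same pattern with and without re.I:

```python
import re

sentence = 'RoboCop, ROBOCOP, and robocop are all the same movie.'

case_sensitive = re.findall(r'robocop', sentence)
case_insensitive = re.findall(r'robocop', sentence, re.I)

print(case_sensitive)    # ['robocop']
print(case_insensitive)  # ['RoboCop', 'ROBOCOP', 'robocop']
```

Each match is returned with the capitalization it has in the text, not the capitalization used in the pattern.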

Recap

The ^ regex character means the match must begin at the start of the string; $ means the match must finish at the end of the string.
Using both means the entire string must match the pattern.
The . regex character is a wildcard; it matches anything except newlines.
The re.DOTALL parameter can be used in re.compile() to make the . match newlines as well.
Pass re.I to re.compile() to make the matching case-insensitive.
