Regular expressions describe text patterns so that you can search for them. They typically take far less code than the alternative, which is to check each character by hand as the function below does.
In [2]:
def isPhoneNumber(text):
    if len(text) != 12:
        return False # not phone number-sized
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False # no area code
    if text[3] != '-':
        return False # missing dash
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False # no first 3 digits
    if text[7] != '-':
        return False # missing second dash
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False # missing last 4 digits
    return True
You can then test which strings count as phone numbers using this function.
In [4]:
print(isPhoneNumber('415-555-1234')) # True
print(isPhoneNumber('Hello')) # False
print(isPhoneNumber('415551234')) # False
You can use isPhoneNumber() to check whether a longer string contains phone numbers:
In [22]:
message = 'Call me at 415-555-1011 tomorrow, or at 415-555-9999 any other day.'
message2 = 'There are no phone numbers in this message.'
def findNumber(message):
    foundNumber = False # set False to start
    for i in range(len(message)):
        chunk = message[i:i+12] # Take a phone number size 'chunk' of the string, character by character
        #print(chunk) # debug
        if isPhoneNumber(chunk):
            print('Phone number found: ' + chunk)
            foundNumber = True
    if not foundNumber: # Run after loop, not during loop
        print('Could not find any phone numbers.')
findNumber(message)
findNumber(message2)
This is a lot of code for text pattern matching, which is a very common task in programming. Regular expressions simplify this process.
The re module provides Python's RegEx functions. Patterns are usually written as raw strings (r'') so that backslashes are passed to the regex engine unchanged.
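To see why raw strings are the usual choice, the short sketch below compares a normal string with a raw string. The variable names are illustrative only and are not part of the notebook above.

```python
# In a normal string, '\n' is a single newline character.
normal = '\n'
# In a raw string the backslash is kept, so r'\n' is two characters.
raw = r'\n'

print(len(normal))  # 1
print(len(raw))     # 2

# Without the r prefix, every regex backslash has to be doubled.
doubled = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'
raw_pattern = r'\d\d\d-\d\d\d-\d\d\d\d'
print(doubled == raw_pattern)  # True
```

Both spellings produce the same pattern; the raw string is simply easier to read and harder to get wrong.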
In [33]:
import re
print(message)
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') # Compiles the pattern into a Regex object
mo = phoneNumRegex.search(message) # .search() scans the string and returns a Match object for the first match (or None if there is none)
print(mo.group()) # .group() returns the matched text from the Match object
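One detail worth noting: .search() returns None when the pattern does not occur, so calling .group() on that result raises an AttributeError. A minimal sketch of a guard, reusing the two message strings from earlier (the helper name first_phone_number is an assumption made for illustration):

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def first_phone_number(text):
    """Return the first phone number in text, or None if there is none."""
    mo = phoneNumRegex.search(text)
    if mo is None:  # .search() found nothing
        return None
    return mo.group()

message = 'Call me at 415-555-1011 tomorrow, or at 415-555-9999 any other day.'
message2 = 'There are no phone numbers in this message.'

print(first_phone_number(message))   # 415-555-1011
print(first_phone_number(message2))  # None
```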
This takes 3 lines of code in place of roughly 30, which is significantly more efficient to write and to read.
You can also use the .findall() method to find all RegEx matches, not just the first.
In [38]:
mo = phoneNumRegex.findall(message) # .findall() returns a list of matched strings, not Match objects
print(mo) # the result is already a list of strings, so it doesn't need .group()
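As a rough comparison with the earlier findNumber function, the same search can be sketched with .findall(). The function name find_numbers_regex is hypothetical, and unlike findNumber it returns the matches instead of printing them.

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_numbers_regex(text):
    """Return every phone-number-shaped substring in text as a list of strings."""
    return phoneNumRegex.findall(text)

message = 'Call me at 415-555-1011 tomorrow, or at 415-555-9999 any other day.'
message2 = 'There are no phone numbers in this message.'

for text in (message, message2):
    numbers = find_numbers_regex(text)
    if numbers:
        for number in numbers:
            print('Phone number found: ' + number)
    else:  # an empty list means no matches
        print('Could not find any phone numbers.')
```

Returning a list keeps the search separate from the printing, so the caller decides what to do when nothing is found.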
Write regex patterns as raw strings, for example r'\d'.
Use the re.compile() function to create a regex object.
Use the .search() method to create a match object.
Use the .group() method to get the matched string.
Use the .findall() method to get a list of matched strings.