Regular expressions use pattern matching to find text. They are typically faster to write than the alternative, which is checking a string by hand, character by character, as in the function below.
In [2]:
def isPhoneNumber(text):
    if len(text) != 12:
        return False  # not phone number-sized
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False  # no area code
    if text[3] != '-':
        return False  # missing dash
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False  # no first 3 digits
    if text[7] != '-':
        return False  # missing second dash
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False  # missing last 4 digits
    return True
You can then test which strings count as phone numbers using this function.
In [4]:
print(isPhoneNumber('415-555-1234')) # True
print(isPhoneNumber('Hello')) # False
print(isPhoneNumber('415551234')) # False
Use isPhoneNumber() to check whether longer strings contain phone numbers:
In [22]:
message = 'Call me at 415-555-1011 tomorrow, or at 415-555-9999 any other day.'
message2 = 'There are no phone numbers in this message.'
def findNumber(message):
    foundNumber = False  # set False to start
    for i in range(len(message)):
        chunk = message[i:i+12]  # Take a phone number size 'chunk' of the string, character by character
        #print(chunk) # debug
        if isPhoneNumber(chunk):
            print('Phone number found: ' + chunk)
            foundNumber = True
    if not foundNumber:  # Run after loop, not during loop
        print('Could not find any phone numbers.')
findNumber(message)
findNumber(message2)
This is a lot of code for text pattern matching, which is a very common activity in programming. Regular expressions are used to simplify this process.
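Python's re module provides the RegEx functions. Patterns are usually passed as raw strings (r''), so that backslashes in sequences such as \d reach the regex engine unchanged.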
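As a small illustration of why raw strings are preferred, the sketch below compares a raw string with its escaped equivalent. The variable names are chosen only for this example and are not part of the original notebook.

```python
# Illustrative only: these names are not part of the original notes.
raw_pattern = r'\d'        # raw string: the backslash is kept as-is
escaped_pattern = '\\d'    # normal string: the backslash must be doubled

print(raw_pattern == escaped_pattern)  # True
print(len(raw_pattern))                # 2

# In a normal string '\n' is a single newline character;
# in a raw string it is two characters, a backslash and an 'n'.
print(len('\n'), len(r'\n'))           # 1 2
```

Both spellings give the same two-character pattern, but the raw form is easier to read once a pattern contains several backslashes.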
In [33]:
import re
print(message)
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') # Compiles the pattern into a Regex object
mo = phoneNumRegex.search(message) # .search() scans the string and returns a Match object for the first match
print(mo.group()) # the .group() method returns the actual matched text from the Match object
This is 3 lines of code in place of about 30, which is significantly more efficient to write and read.
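One detail worth keeping in mind: .search() returns None when the pattern does not appear in the string, so calling .group() directly on the result would raise an AttributeError. A minimal sketch of a guard follows. The helper name first_phone_number is hypothetical and not part of the original notebook.

```python
import re

phone_num_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def first_phone_number(text):
    """Return the first phone number found in text, or None if there is none.

    Hypothetical helper, written only to illustrate the None check.
    """
    mo = phone_num_regex.search(text)
    if mo is None:      # no match: search() returned None
        return None
    return mo.group()   # the matched text

print(first_phone_number('Call me at 415-555-1011 tomorrow.'))            # 415-555-1011
print(first_phone_number('There are no phone numbers in this message.'))  # None
```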
You can also use the .findall() method to find all RegEx matches, not just the first.
In [38]:
mo = phoneNumRegex.findall(message) # .findall() returns a list of matched strings, not Match objects
print(mo) # the result is already a list of strings, so it doesn't need .group()
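To make the return types concrete, here is a small sketch that assumes the same pattern as above (the snake_case variable names are illustrative). For a pattern without groups, .findall() gives back plain strings, and an empty list when nothing matches. The related .finditer() method is the variant that yields Match objects.

```python
import re

phone_num_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
message = 'Call me at 415-555-1011 tomorrow, or at 415-555-9999 any other day.'

numbers = phone_num_regex.findall(message)
print(numbers)            # ['415-555-1011', '415-555-9999']
print(type(numbers[0]))   # <class 'str'>

# No matches gives an empty list rather than None
print(phone_num_regex.findall('There are no phone numbers in this message.'))  # []

# finditer() yields Match objects, so .group() and .start() are available
starts = []
for mo in phone_num_regex.finditer(message):
    starts.append(mo.start())
    print(mo.group(), 'starts at index', mo.start())
```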
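Regex patterns are usually written as raw strings, such as r'\d'.
Use the re.compile() function to create a regex object.
Use the regex object's .search() method to create a match object.
Use the match object's .group() method to get the matched string.
Use the regex object's .findall() method to get a list of all matched strings.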
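Tying the steps together, the hand-written isPhoneNumber() check could be sketched with a regex as below. This is only one possible approach: it uses the regex object's .fullmatch() method, which is not covered in the notes above, and the function name is_phone_number_regex is hypothetical.

```python
import re

phone_num_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def is_phone_number_regex(text):
    # fullmatch() succeeds only if the entire string fits the pattern,
    # which mirrors the length and position checks done by hand earlier.
    return phone_num_regex.fullmatch(text) is not None

print(is_phone_number_regex('415-555-1234'))  # True
print(is_phone_number_regex('Hello'))         # False
print(is_phone_number_regex('415551234'))     # False
```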