In [3]:

    
import re



In [1]:

    
# Sample message
sample = "This is Roger. My contact numbers are 415-555-4242, 415-555-4243 and 416-555-4244"



In [12]:

    
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# search() returns the first match as a Match object; group() gives its text
mob = phoneNumRegex.search(sample)
print(mob.group())

415-555-4242
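When nothing matches, search() returns None instead of a Match object, so calling .group() straight away would raise an AttributeError. Here is a minimal sketch of guarding against that. The text string is made up for illustration.

```python
import re

# Same pattern as above: three digits, dash, three digits, dash, four digits
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# A made-up message that contains no phone number
text = 'Call me tomorrow; I have no number to share.'
mo = phoneNumRegex.search(text)

# search() gives None when there is no match, so check before calling group()
if mo is None:
    print('No phone number found')
else:
    print(mo.group())
```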




In [15]:

    
# findall() returns a list of all matches
mob = phoneNumRegex.findall(sample)
print(mob)

['415-555-4242', '415-555-4243', '416-555-4244']
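findall() only gives back the matched strings. If you also need to know where each match sits in the text, re's finditer() returns one Match object per match. This is a small sketch using a made-up text string.

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
text = 'Office: 415-555-4242, home: 416-555-4244'

# finditer() yields a Match object for each match, so start() is available too
found = []
for mo in phoneNumRegex.finditer(text):
    found.append((mo.group(), mo.start()))
    print(mo.group(), 'starts at index', mo.start())
```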

Understanding Groups

Let's say we want to split each result into parts. Each phone number above consists of an area code and the rest of the number. Putting parentheses around parts of the pattern creates groups, and we can then pull out each part separately.



In [24]:

    
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mob = phoneNumRegex.findall(sample)
print(mob)
# When the pattern contains groups, findall() returns a list of tuples:
# one tuple per match, holding one string per group

[('415', '555-4242'), ('415', '555-4243'), ('416', '555-4244')]
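Tuples only appear when there are two or more groups. With exactly one group, findall() returns a plain list holding that group's text for each match. In the sketch below, only the area code is in parentheses. The text is made up.

```python
import re

# Only the area code is inside parentheses, so there is a single group
areaCodeRegex = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')
text = '415-555-4242, 415-555-4243, 416-555-4244'

# With one group, findall() returns a list of strings, not tuples
areaCodes = areaCodeRegex.findall(text)
print(areaCodes)  # ['415', '415', '416']
```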



In [26]:

    
mo = phoneNumRegex.search(sample)
# group(1) is the first parenthesized group, group(2) the second
print('area code: ', mo.group(1))
print('phone no :', mo.group(2))

area code:  415
phone no : 555-4242
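Besides group(1) and group(2), a Match object also offers group(0) for the whole match and groups() for every group at once. Python's re module also supports named groups written as (?P<name>...). The sketch below shows all three. The group names area and number are illustrative choices, not anything required.

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')

print(mo.group(0))   # 415-555-4242: the whole match, same as mo.group()
print(mo.groups())   # ('415', '555-4242'): all groups as a tuple
areaCode, mainNumber = mo.groups()  # tuple unpacking into two variables

# Named groups let you refer to a group by name instead of by number
namedRegex = re.compile(r'(?P<area>\d\d\d)-(?P<number>\d\d\d-\d\d\d\d)')
namedMo = namedRegex.search('My number is 415-555-4242.')
print(namedMo.group('area'))  # 415
print(namedMo.groupdict())    # maps each group name to its matched text
```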

Matching Multiple Groups with the Pipe

The | character is called a pipe. The regular expression r'Batman|Tina Fey' will match either 'Batman' or 'Tina Fey'. When both Batman and Tina Fey occur in the searched string, the first occurrence of matching text is returned as the Match object.



In [27]:

    
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')
print(mo1.group())

Batman



In [29]:

    
mo2 = heroRegex.search('Tina Fey and Batman.')
print(mo2.group())

Tina Fey



In [31]:

    
# findall() returns every match of either alternative, in order of appearance
mo3 = heroRegex.findall('Tina Fey and Batman.')
print(mo3)

['Tina Fey', 'Batman']
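Because an unescaped | means "or", matching a literal pipe character requires escaping it as \|. This is a minimal sketch with a made-up sentence.

```python
import re

# \| matches a literal pipe character instead of meaning "or"
pipeRegex = re.compile(r'yes\|no')
mo = pipeRegex.search('Answer with yes|no only.')
print(mo.group())  # yes|no

# Without the backslash, the pattern means 'yes' OR 'no', and 'yes' is found first
plain = re.search(r'yes|no', 'Answer with yes|no only.')
print(plain.group())  # yes
```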

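More Examples of the Pipe

The pipe can also go inside a group to match one of several endings that share a prefix. For example, r'Bat(man|mobile|copter|bat)' matches 'Bat' followed by any one of the listed endings.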



In [34]:

    
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
print(mo.group())

Batmobile



In [35]:

    
mo = batRegex.search('Batbat lost a wheel')
print(mo.group())

Batbat
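The parentheses around the alternatives also form a group. So group() returns the full match, and group(1) returns only the part that matched inside the parentheses. The sketch below shows both, along with a case where none of the alternatives fit. The 'Batmotorcycle' string is made up.

```python
import re

batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')

print(mo.group())   # Batmobile: the full match
print(mo.group(1))  # mobile: only the text matched inside the parentheses

# No alternative fits after 'Bat', so search() returns None
noMatch = batRegex.search('Batmotorcycle lost a wheel')
print(noMatch)  # None
```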

Optional Matching with the Question Mark



In [36]:

    
batRegex = re.compile(r'Bat(wo)?man')
# The ? after (wo) makes that group optional: it may appear zero times or once

mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())

Batman



In [37]:

    
mo1 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())

Batwoman
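A practical use of ? is a phone number whose area code may or may not be present. This sketch makes the area-code group optional. The messages are made up.

```python
import re

# The group (\d\d\d-) may appear zero times or once
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

mo1 = phoneRegex.search('My number is 415-555-4242')
print(mo1.group())  # 415-555-4242

mo2 = phoneRegex.search('My number is 555-4242')
print(mo2.group())   # 555-4242
print(mo2.group(1))  # None: the optional group did not take part in the match
```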

Matching Zero or More with the Star

The * (called the star or asterisk) means “match zero or more”: the group that precedes the star can occur any number of times in the text, including not at all.



In [38]:

    
batRegex = re.compile(r'Bat(wo)*man')

mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())

Batman



In [39]:

    
mo1 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())

Batwoman



In [40]:

    
mo1 = batRegex.search('The Adventures of Batwowowoman')
print(mo1.group())

Batwowowoman
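One thing to watch for when combining a starred group with findall(): if the pattern has a capturing group, findall() returns what that group captured (its last repetition, or '' if it never matched) instead of the whole match. A non-capturing group (?:...) repeats the same way but captures nothing. The text below is made up.

```python
import re

text = 'Batman, Batwoman and Batwowowoman'

# Capturing group: findall() returns the group's text, not the full match
captured = re.findall(r'Bat(wo)*man', text)
print(captured)  # ['', 'wo', 'wo']

# Non-capturing group: findall() returns the full matches
full = re.findall(r'Bat(?:wo)*man', text)
print(full)  # ['Batman', 'Batwoman', 'Batwowowoman']
```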

Note: While * means “match zero or more,” the + (or plus) means “match one or more”: the group that precedes a plus must appear at least once.
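The earlier star example can be rerun with + in place of *. Because (wo)+ needs at least one 'wo', plain 'Batman' no longer matches. This is a small sketch.

```python
import re

batRegex = re.compile(r'Bat(wo)+man')

# (wo)+ requires at least one "wo"
noMatch = batRegex.search('The Adventures of Batman')
print(noMatch)  # None

mo1 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())  # Batwoman

mo2 = batRegex.search('The Adventures of Batwowowoman')
print(mo2.group())  # Batwowowoman
```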

Matching Specific Repetitions with Curly Brackets

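Curly brackets set how many times the preceding group repeats: a single number gives an exact count, and two numbers separated by a comma give a minimum and a maximum. The regex (Ha){3} will match the string 'HaHaHa'. The regex (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'.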



In [41]:

    
haRegex = re.compile(r'(Ha){3}')
mo1 = haRegex.search('HaHaHa')
print(mo1.group())

HaHaHa
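An exact count such as {3} will not match fewer repetitions. Either bound of a range can be omitted: {3,} means three or more, and {,5} means zero to five. Curly brackets also shorten the phone-number pattern from the top of this notebook, because \d{3} is equivalent to \d\d\d. This is a sketch using made-up strings.

```python
import re

haRegex = re.compile(r'(Ha){3}')
# Exactly three repetitions are required, so two are not enough
tooFew = haRegex.search('HaHa')
print(tooFew)  # None

# {3,} means three or more repetitions
atLeastThree = re.search(r'(Ha){3,}', 'HaHaHaHaHaHa').group()
print(atLeastThree)  # HaHaHaHaHaHa

# \d{3}-\d{3}-\d{4} is a shorter way to write \d\d\d-\d\d\d-\d\d\d\d
phoneNumRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
numbers = phoneNumRegex.findall('415-555-4242, 416-555-4244')
print(numbers)  # ['415-555-4242', '416-555-4244']
```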



In [42]:

    
haRegex = re.compile(r'(Ha){3,5}')
mo1 = haRegex.search('HaHaHaHa')
print(mo1.group())

HaHaHaHa
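When a string allows more than one valid match length, Python's quantifiers are greedy by default and take the longest one. Adding a ? after the closing curly bracket makes the quantifier non-greedy, so it takes the shortest one instead. This is a minimal sketch.

```python
import re

text = 'HaHaHaHaHa'

# Greedy (default): {3,5} takes as many repetitions as it can
greedy = re.search(r'(Ha){3,5}', text).group()
print(greedy)  # HaHaHaHaHa

# Non-greedy: {3,5}? takes as few repetitions as it can
lazy = re.search(r'(Ha){3,5}?', text).group()
print(lazy)  # HaHaHa
```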
