We use Python's re module. This module provides the following functions:

  • split: split a string on a regex
  • findall: find all occurrences of a pattern in a string
  • search: search for the first occurrence of a pattern anywhere in a string
  • match: match a pattern at the beginning of a string
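
As a quick illustration of how the four functions differ, here is a minimal sketch on a made-up string (the variable names and the sample text are only for illustration):

```python
import re

text = "one1two22three"

# split: break the string wherever the pattern matches
parts = re.split(r"\d+", text)

# findall: return every non-overlapping match as a list of strings
numbers = re.findall(r"\d+", text)

# search: return a match object for the first match anywhere in the string
first = re.search(r"\d+", text)

# match: succeeds only if the pattern matches at the start of the string
at_start = re.match(r"\d+", text)

print(parts)          # ['one', 'two', 'three']
print(numbers)        # ['1', '22']
print(first.group())  # 1
print(at_start)       # None, because the string starts with letters
```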

Common patterns

  • Alternation (or) is written with |
  • A group is defined with ()
  • A character class or range is written with []
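
The three building blocks above can be sketched on a small, made-up sample string (the names here are illustrative, not part of the original notebook):

```python
import re

sample = "cat bat rat 42"

# | means "or": match either alternative
either = re.findall(r"cat|rat", sample)

# () defines a group; with exactly one group, findall returns the group's text
grouped = re.findall(r"(c|b)at", sample)

# [] defines a character class: any single character from the set
in_class = re.findall(r"[cbr]at", sample)

# a range inside []: one or more digits from 0 to 9
digits = re.findall(r"[0-9]+", sample)

print(either)    # ['cat', 'rat']
print(grouped)   # ['c', 'b']
print(in_class)  # ['cat', 'bat', 'rat']
print(digits)    # ['42']
```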

In [1]:
import re

In [2]:
my_string="Let's write RegEx!  Won't that be fun?  I sure think so.  Can you find 4 sentences?  Or perhaps, all 19 words?"

In [3]:
# Split the text on sentence-ending punctuation
sentence_endings = r"[.?!]"
print(re.split(sentence_endings, my_string))


["Let's write RegEx", "  Won't that be fun", '  I sure think so', '  Can you find 4 sentences', '  Or perhaps, all 19 words', '']
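
Splitting on the punctuation leaves leading spaces on each piece and an empty string at the end, because the text finishes with a delimiter. One possible cleanup, shown here on a shortened copy of the text, is to strip each piece and drop the empty ones:

```python
import re

text = "Let's write RegEx!  Won't that be fun?  I sure think so."

pieces = re.split(r"[.?!]", text)

# Strip surrounding whitespace and discard pieces that end up empty
sentences = [p.strip() for p in pieces if p.strip()]

print(pieces)     # the last element is an empty string
print(sentences)  # ["Let's write RegEx", "Won't that be fun", 'I sure think so']
```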

In [4]:
# Find all capitalized words (\w+ needs at least one more character, so "I" is skipped)
capitalized_words = r"[A-Z]\w+"
print(re.findall(capitalized_words, my_string))


['Let', 'RegEx', 'Won', 'Can', 'Or']
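
The single-letter word "I" is missing from the result because \w+ requires at least one character after the capital letter. A small variation, assuming one-letter words should count as well, swaps + for * and adds a word boundary so the match can only begin at the start of a word:

```python
import re

text = "Won't that be fun?  I sure think so."

# + requires at least one word character after the capital letter
two_or_more = re.findall(r"[A-Z]\w+", text)

# * allows zero or more; \b keeps the match at the start of a word
one_or_more = re.findall(r"\b[A-Z]\w*", text)

print(two_or_more)  # ['Won']
print(one_or_more)  # ['Won', 'I']
```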

In [5]:
# Split on runs of whitespace
spaces = r"\s+"
print(re.split(spaces, my_string))


["Let's", 'write', 'RegEx!', "Won't", 'that', 'be', 'fun?', 'I', 'sure', 'think', 'so.', 'Can', 'you', 'find', '4', 'sentences?', 'Or', 'perhaps,', 'all', '19', 'words?']

In [6]:
# Find runs of digits
digits = r"\d+"
print(re.findall(digits, my_string))


['4', '19']

In [9]:
# Find all words (the apostrophe is not a word character, so contractions are split)
word = r"\w+"
re.findall(word, my_string)


Out[9]:
['Let',
 's',
 'write',
 'RegEx',
 'Won',
 't',
 'that',
 'be',
 'fun',
 'I',
 'sure',
 'think',
 'so',
 'Can',
 'you',
 'find',
 '4',
 'sentences',
 'Or',
 'perhaps',
 'all',
 '19',
 'words']
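
Because the apostrophe is not matched by \w, contractions such as "Let's" come back as two tokens. If contractions should stay together, one option is to add the apostrophe to a character class. This is a sketch on a shortened copy of the text:

```python
import re

text = "Let's write RegEx!  Won't that be fun?"

# \w+ stops at the apostrophe
split_words = re.findall(r"\w+", text)

# [\w']+ treats the apostrophe as part of a word
whole_words = re.findall(r"[\w']+", text)

print(split_words)  # ['Let', 's', 'write', 'RegEx', 'Won', 't', 'that', 'be', 'fun']
print(whole_words)  # ["Let's", 'write', 'RegEx', "Won't", 'that', 'be', 'fun']
```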

In [1]:
# Match digits and words
import re
# Raw string, so the backslashes reach the regex engine unchanged
match_digits_and_words = r"(\d+|\w+)"
re.findall(match_digits_and_words, 'He has 11 cats.')


Out[1]:
['He', 'has', '11', 'cats']

In [3]:
# Match alphabetic words only (digits are excluded)
pattern = r"[A-Za-z]+"
re.findall(pattern, 'He has 11 cats.')


Out[3]:
['He', 'has', 'cats']

In [7]:
# Allow hyphens and dots inside a token, so the domain name stays in one piece
pattern = r"[A-Za-z\-\.]+"
re.findall(pattern, 'My website is My-website.com')


Out[7]:
['My', 'website', 'is', 'My-website.com']

In [8]:
# Letters only: the hyphen and the dot now split the domain name
pattern = r"[A-Za-z]+"
re.findall(pattern, 'My website is My-website.com')


Out[8]:
['My', 'website', 'is', 'My', 'website', 'com']

In [15]:
# A group that matches the literal text "has"
pattern = r"(has)"
re.findall(pattern, 'He has 11 cats.')


Out[15]:
['has']

In [16]:
# Match either a run of whitespace or a comma (this sentence has no commas)
space_or_comma = r"(\s+|,)"
re.findall(space_or_comma, 'He has 11 cats.')


Out[16]:
[' ', ' ', ' ']
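
Parentheses also change what findall returns: with one capturing group it returns the text of that group, and with several it returns tuples. A short sketch of that behavior follows; (?:...) is the non-capturing form, which groups without affecting the result:

```python
import re

text = "He has 11 cats."

# One capturing group: findall returns only the group's text
one_group = re.findall(r"(\d+) cats", text)

# Non-capturing group: findall returns the whole match
no_group = re.findall(r"(?:\d+) cats", text)

# Two capturing groups: findall returns a list of tuples
two_groups = re.findall(r"(\w+) (\d+)", text)

print(one_group)   # ['11']
print(no_group)    # ['11 cats']
print(two_groups)  # [('has', '11')]
```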

In [23]:
my_str = 'match lowercase spaces nums like 12, but no commas'
# match starts at the beginning and stops at the first character outside the class (the comma)
re.match(r'[a-z0-9\s]+', my_str)


Out[23]:
<_sre.SRE_Match object; span=(0, 35), match='match lowercase spaces nums like 12'>
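
Note that re.match only anchors the start of the string, so it happily returns a prefix. When the whole string has to fit the pattern, re.fullmatch (available since Python 3.4) is one way to check that. A minimal comparison:

```python
import re

my_str = 'match lowercase spaces nums like 12, but no commas'
pattern = r"[a-z0-9\s]+"

# match: anchored at the start only, so a prefix is enough
prefix = re.match(pattern, my_str)

# fullmatch: the pattern must cover the entire string
whole = re.fullmatch(pattern, my_str)

print(prefix.group())  # match lowercase spaces nums like 12
print(whole)           # None, because the comma is not in the character class
```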

In [27]:
my_string = "SOLDIER #1: Found them? In Mercea? The coconut's tropical!"
# Tokenize into words, speaker numbers such as #1, question marks and exclamation marks
re.findall(r"(\w+|#\d|\?|!)", my_string)


Out[27]:
['SOLDIER',
 '#1',
 'Found',
 'them',
 '?',
 'In',
 'Mercea',
 '?',
 'The',
 'coconut',
 's',
 'tropical',
 '!']

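Difference between search and match

re.match only tries the pattern at the beginning of the string, while re.search scans the whole string and returns the first match it finds.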


In [2]:
import re
print(re.match('abc','abcde'))
print(re.search('abc','abcde'))


<_sre.SRE_Match object; span=(0, 3), match='abc'>
<_sre.SRE_Match object; span=(0, 3), match='abc'>

In [3]:
import re
print(re.match('cd','abcde'))
print(re.search('cd','abcde'))


None
<_sre.SRE_Match object; span=(2, 4), match='cd'>
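
Put differently, match behaves like a search that is pinned to the start of the string. The sketch below makes the same comparison and adds a search with an explicit ^ anchor, which fails for the same reason match does:

```python
import re

text = "abcde"

from_start = re.match(r"cd", text)        # only tries position 0
anywhere = re.search(r"cd", text)         # scans forward until it finds a match
anchored_search = re.search(r"^cd", text) # ^ pins the search to the start

print(from_start)       # None
print(anywhere.span())  # (2, 4)
print(anchored_search)  # None
```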

In [4]:
scene_one='''SCENE 1: [wind] [clop clop clop] \nKING ARTHUR: Whoa there!  [clop clop clop] \nSOLDIER #1: Halt!  Who goes there?\nARTHUR: It is I, Arthur, son of Uther Pendragon, from the castle of Camelot.  King of the Britons, defeator of the Saxons, sovereign of all England!\nSOLDIER #1: Pull the other one!\nARTHUR: I am, ...  and this is my trusty servant Patsy.  We have ridden the length and breadth of the land in search of knights who will join me in my court at Camelot.  I must speak with your lord and master.\nSOLDIER #1: What?  Ridden on a horse?\nARTHUR: Yes!\nSOLDIER #1: You're using coconuts!\nARTHUR: What?\nSOLDIER #1: You've got two empty halves of coconut and you're bangin' 'em together.\nARTHUR: So?  We have ridden since the snows of winter covered this land, through the kingdom of Mercea, through--\nSOLDIER #1: Where'd you get the coconuts?\nARTHUR: We found them.\nSOLDIER #1: Found them?  In Mercea?  The coconut's tropical!\nARTHUR: What do you mean?\nSOLDIER #1: Well, this is a temperate zone.\nARTHUR: The swallow may fly south with the sun or the house martin or the plover may seek warmer climes in winter, yet these are not strangers to our land?\nSOLDIER #1: Are you suggesting coconuts migrate?\nARTHUR: Not at all.  They could be carried.\nSOLDIER #1: What?  A swallow carrying a coconut?\nARTHUR: It could grip it by the husk!\nSOLDIER #1: It's not a question of where he grips it!  It's a simple question of weight ratios!  A five ounce bird could not carry a one pound coconut.\nARTHUR: Well, it doesn't matter.  Will you go and tell your master that Arthur from the Court of Camelot is here.\nSOLDIER #1: Listen.  In order to maintain air-speed velocity, a swallow needs to beat its wings forty-three times every second, right?\nARTHUR: Please!\nSOLDIER #1: Am I right?\nARTHUR: I'm not interested!\nSOLDIER #2: It could be carried by an African swallow!\nSOLDIER #1: Oh, yeah, an African swallow maybe, but not a European swallow.  
That's my point.\nSOLDIER #2: Oh, yeah, I agree with that.\nARTHUR: Will you ask your master if he wants to join my court at Camelot?!\nSOLDIER #1: But then of course a-- African swallows are non-migratory.\nSOLDIER #2: Oh, yeah...\nSOLDIER #1: So they couldn't bring a coconut back anyway...  [clop clop clop] \nSOLDIER #2: Wait a minute!  Supposing two swallows carried it together?\nSOLDIER #1: No, they'd have to have it on a line.\nSOLDIER #2: Well, simple!  They'd just use a strand of creeper!\nSOLDIER #1: What, held under the dorsal guiding feathers?
\nSOLDIER #2: Well, why not?\n'''

In [6]:
# Print the start and end indexes of match
match = re.search("coconuts",scene_one)
print(match.start(), match.end())


580 588
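
The start and end indexes can be used to slice the original string, and the match object exposes the same information through span() and group(). Since search returns None when nothing is found, it is worth checking the result first. A small sketch on a single line of the script:

```python
import re

line = "SOLDIER #1: Where'd you get the coconuts?"

m = re.search(r"coconuts", line)

if m is not None:
    # span() is the (start, end) pair; end is exclusive, as in slicing
    start, end = m.span()
    sliced = line[start:end]
    print(m.group())  # coconuts
    print(sliced)     # coconuts
else:
    print("no match")
```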

In [7]:
# Search for anything in square brackets
# .* is greedy, so it runs from the first "[" to the last "]" on the line
pattern1 = r"\[.*\]"
# Use re.search to find the first bracketed span
re.search(pattern1,scene_one)


Out[7]:
<_sre.SRE_Match object; span=(9, 32), match='[wind] [clop clop clop]'>
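
The match covers both bracketed notes because .* is greedy: it extends as far as it can while still ending in "]". To stop at the first closing bracket, a non-greedy quantifier .*? can be used instead. A sketch on the first line of the scene:

```python
import re

text = "SCENE 1: [wind] [clop clop clop] "

# Greedy: from the first "[" to the last "]" on the line
greedy = re.search(r"\[.*\]", text).group()

# Non-greedy: from the first "[" to the nearest "]"
lazy = re.search(r"\[.*?\]", text).group()

# With findall, the non-greedy pattern returns each bracketed note separately
each_note = re.findall(r"\[.*?\]", text)

print(greedy)     # [wind] [clop clop clop]
print(lazy)       # [wind]
print(each_note)  # ['[wind]', '[clop clop clop]']
```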

In [9]:
# Match script notation: a speaker name followed by a colon at the start of a line
sentence='''ARTHUR: It is I, Arthur, son of Uther Pendragon, from the castle of Camelot.'''
pattern2 = r"[\w\s]+:"
print(re.match(pattern2,sentence))


<_sre.SRE_Match object; span=(0, 7), match='ARTHUR:'>
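
The pattern above does not cover speakers such as "SOLDIER #1", because "#" is neither a word character nor whitespace. One possible extension, assuming "#" is the only extra character needed, adds it to the character class and uses re.MULTILINE so that ^ matches at the start of every line. The three-line script below is a shortened excerpt used only for illustration:

```python
import re

script = (
    "KING ARTHUR: Whoa there!\n"
    "SOLDIER #1: Halt!  Who goes there?\n"
    "ARTHUR: It is I, Arthur."
)

# The original pattern fails on a speaker name that contains "#"
original = re.match(r"[\w\s]+:", "SOLDIER #1: Halt!")

# Letters, digits, spaces and "#" before the colon, anchored to each line start
speakers = re.findall(r"^([\w #]+):", script, flags=re.MULTILINE)

print(original)  # None
print(speakers)  # ['KING ARTHUR', 'SOLDIER #1', 'ARTHUR']
```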