Regular Expressions

An experiment in Jupyter slides.

J. Roberts

$E = mc^2$

Regular expressions are built into Python via the re module. A super simple example:



In [7]:

    
import re
match = re.search(r'\d+', r'abc123def')  # note the "r" prefix
print(match.span())  # what do the numbers represent?

(3, 6)
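To answer the question in the comment, here is a minimal sketch showing that the two numbers are the start index and the one-past-the-end index of the match. The variable names `text`, `start`, `end`, and `matched` are illustrative and not part of the original example.

```python
import re

text = 'abc123def'
match = re.search(r'\d+', text)

# span() returns (start, end); end is one past the last matched character
start, end = match.span()

# slicing the original string with the span recovers the matched text
matched = text[start:end]

print(matched)        # 123
print(match.group())  # 123, the same substring without manual slicing
```

Because the end index is exclusive, the length of the match is simply `end - start`.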

Special Sequences

Several special sequences serve as shorthand for commonly used character classes in the re module.

sequence	description
`\d`	any digit, i.e., `[0-9]`
`\D`	any non-digit, i.e., `[^0-9]`
`\s`	any whitespace, i.e., `[ \t\n\r\f\v]`
`\S`	any non-whitespace, i.e., `[^ \t\n\r\f\v]`
`\w`	any alphanumeric character or underscore, i.e., `[a-zA-Z0-9_]`
`\W`	any character that is not alphanumeric or underscore, i.e., `[^a-zA-Z0-9_]`
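The sequences in the table can be tried out with a short sketch like the one below. The string `sample` is made up for illustration; `re.findall` and `re.split` are standard functions of the re module.

```python
import re

# hypothetical sample text containing digits, an underscore, a tab, and spaces
sample = 'Room 12, floor_3\tbuilding B'

digits = re.findall(r'\d+', sample)    # runs of digits
words = re.findall(r'\w+', sample)     # runs of word characters (underscore included)
pieces = re.split(r'\s+', sample)      # split on any run of whitespace
non_word = re.findall(r'\W', sample)   # each single non-word character

print(digits)    # ['12', '3']
print(words)     # ['Room', '12', 'floor_3', 'building', 'B']
print(pieces)    # ['Room', '12,', 'floor_3', 'building', 'B']
print(non_word)  # [' ', ',', ' ', '\t', ' ']
```

Note that in Python 3, with ordinary (str) patterns, `\d`, `\s`, and `\w` also match Unicode digits, whitespace, and letters unless the `re.ASCII` flag is given, so the bracketed classes in the table describe the ASCII case.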

Metacharacters

Several special "metacharacters" are used to define regular expressions with the re module.

character	description
`.`	any character but `\n`
`^`	match at beginning of string, or class complement inside `[]`
`$`	match at end of string
`*`	match 0 or more times
`+`	match 1 or more times
`?`	match 0 or 1 times
`\`	escape character
`|`	"or"
`[]`	defines character class, e.g., `[a-z]`
`{}`	repetition quantifier, e.g., `ab{2,3}` matches `abb` or `abbb`
`()`	for groups
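A few of these metacharacters in action, as a rough sketch. All of the test strings and variable names below are invented for illustration.

```python
import re

# '.' matches any character except a newline, so 'c\nt' is skipped
dot = re.findall(r'c.t', 'cat cut c\nt')

# '^' and '$' anchor to the start and end of the string
anchored = re.findall(r'^\w+|\w+$', 'first middle last')

# '?' makes the preceding 'u' optional
optional = re.findall(r'colou?r', 'color colour')

# '{2,3}' asks for two or three copies of the preceding 'b'
repeated = re.findall(r'ab{2,3}', 'ab abb abbb abbbb')

# '|' means "or"
either = re.findall(r'cat|dog', 'cat, dog, cow')

# '()' captures groups; '\.' escapes the dot so it is taken literally
grouped = re.search(r'(\d+)-(\d+)', 'pages 10-12').groups()
escaped = re.findall(r'\.', 'a.b.c')

print(dot)       # ['cat', 'cut']
print(anchored)  # ['first', 'last']
print(optional)  # ['color', 'colour']
print(repeated)  # ['abb', 'abbb', 'abbb']
print(either)    # ['cat', 'dog']
print(grouped)   # ('10', '12')
print(escaped)   # ['.', '.']
```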

Example 1

Consider the pattern ca*t. Does it match the following? If so, what is the match?

ct
cat
caaat
go cats!



In [8]:

    
pattern = r'ca*t'
print(re.match(pattern, r'ct').span())
print(re.match(pattern, r'cat').span())
print(re.match(pattern, r'caaat').span())
print(re.match(pattern, r'go cats!'))  # re.match only matches at the start of the string

(0, 2)
(0, 3)
(0, 5)
None
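The None result for go cats! comes from re.match, which only tries to match at the very beginning of the string. As a small sketch of the difference, re.search scans the whole string instead. The names `text`, `at_start`, and `anywhere` are illustrative.

```python
import re

pattern = r'ca*t'
text = 'go cats!'

at_start = re.match(pattern, text)   # anchored at index 0, so no match here
anywhere = re.search(pattern, text)  # scans forward until a match is found

print(at_start)          # None
print(anywhere.span())   # (3, 6)
print(anywhere.group())  # cat
```

So the pattern does occur in go cats!; whether it counts as a "match" depends on which function is asked.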

Example 2

How about this slight modification? Consider ca*[\w ]+t applied to the string 'catenkerous cat!'. Is it a match? If so, how much of the string is matched?



In [9]:

    
print(re.match(r'ca*[\w ]+t', r'catenkerous cat!').span())

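The match spans (0, 15): everything up to, but not including, the final !. This highlights the fact that quantifiers such as * and + are greedy. In other words, they grab as large a match as possible.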
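Greedy matching can be contrasted with its non-greedy ("lazy") counterpart, written by appending ? to the quantifier. The sketch below uses a simplified pattern, c[\w ]+t, rather than the one from Example 2, so that the difference shows up clearly; that simplification and the variable names are assumptions made for illustration.

```python
import re

text = 'catenkerous cat!'

# greedy: '+' takes as many characters as possible, then backs off to the last 't'
greedy = re.match(r'c[\w ]+t', text)

# lazy: '+?' takes as few characters as possible, stopping at the first 't'
lazy = re.match(r'c[\w ]+?t', text)

print(greedy.span())   # (0, 15)
print(greedy.group())  # catenkerous cat
print(lazy.span())     # (0, 3)
print(lazy.group())    # cat
```

The same ? suffix works on the other quantifiers too, giving `*?`, `??`, and `{m,n}?`.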

Now, for the fun stuff. Do

  cd /path/to/ME701_examples
  git pull

You should now have a new folder re with some fun, real-world data to munge!