Title: Regular Expression By Example
Slug: regex_by_example
Summary: Regular Expression By Example
Date: 2016-05-01 12:00
Category: Python
Tags: Basics
Authors: Chris Albon

This tutorial is based on: http://www.tutorialspoint.com/python/python_reg_expressions.htm


In [1]:
# Import regex
import re

In [2]:
# Create some data
text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

^ Matches the beginning of the string (and of each line when the re.MULTILINE flag is set).


In [3]:
re.findall('^A', text)


Out[3]:
['A']

$ Matches the end of the string or just before a trailing newline (and the end of each line when re.MULTILINE is set).


In [4]:
re.findall(r'bears\.$', text)


Out[4]:
['bears.']
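Because `text` is a single line, ^ and $ above only ever anchor at the ends of the whole string. The sketch below uses a made-up three-line string (`lines` is hypothetical) to show how the re.MULTILINE flag changes that:

```python
import re

# A hypothetical multi-line string, made up for this example
lines = 'fox one\nfox two\nbear three'

# Without flags, ^ anchors only at the start of the whole string
starts = re.findall(r'^\w+', lines)
# With re.MULTILINE, ^ also anchors right after each newline
starts_ml = re.findall(r'^\w+', lines, flags=re.MULTILINE)
# With re.MULTILINE, $ also anchors right before each newline
ends_ml = re.findall(r'\w+$', lines, flags=re.MULTILINE)

print(starts)     # ['fox']
print(starts_ml)  # ['fox', 'fox', 'bear']
print(ends_ml)    # ['one', 'two', 'three']
```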

. Matches any single character except newline.


In [5]:
re.findall('f..es', text)


Out[5]:
['foxes']
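The "except newline" part is easy to miss. A small sketch with a hypothetical string that has a newline inside the word shows the default behavior and the re.DOTALL flag, which lets . match newlines too:

```python
import re

# A hypothetical string with a newline in the middle of the word
s = 'fox\nes'

# By default . will not match the newline, so nothing is found
default = re.findall(r'f...es', s)
# With re.DOTALL, . matches any character, including newline
dotall = re.findall(r'f...es', s, flags=re.DOTALL)

print(default)  # []
print(dotall)   # ['fox\nes']
```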

[...] Matches any single character in brackets.


In [6]:
# Find all vowels
re.findall('[aeiou]', text)


Out[6]:
['o', 'o', 'u', 'i', 'o', 'o', 'e', 'u', 'e', 'o', 'e', 'a', 'o', 'e', 'a']

[^...] Matches any single character not in brackets.


In [7]:
# Find all characters that are not lower-case vowels
re.findall('[^aeiou]', text)


Out[7]:
['A',
 ' ',
 'f',
 'l',
 'c',
 'k',
 ' ',
 'f',
 ' ',
 '1',
 '2',
 '0',
 ' ',
 'q',
 'c',
 'k',
 ' ',
 'b',
 'r',
 'w',
 'n',
 ' ',
 'f',
 'x',
 's',
 ' ',
 'j',
 'm',
 'p',
 'd',
 ' ',
 'v',
 'r',
 ' ',
 '3',
 '0',
 ' ',
 'l',
 'z',
 'y',
 ' ',
 'b',
 'r',
 'w',
 'n',
 ',',
 ' ',
 'b',
 'r',
 's',
 '.']
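Since [^aeiou] is the exact complement of [aeiou], every character of the string should land in exactly one of the two results. A quick sanity check, reusing the same `text`:

```python
import re

text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

vowels = re.findall('[aeiou]', text)
# The class is case-sensitive, so the capital 'A' is counted here
others = re.findall('[^aeiou]', text)

# Every character falls into exactly one of the two classes
print(len(vowels), len(others), len(text))  # 15 51 66
```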

a|b Matches either a or b.


In [8]:
re.findall('a|A', text)


Out[8]:
['A', 'a', 'a']

(re) Groups regular expressions and remembers matched text.


In [9]:
# Find any instance of 'foxes' (findall returns the text captured by the group)
re.findall('(foxes)', text)


Out[9]:
['foxes']
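Groups become more useful when a pattern has several of them. This sketch, reusing `text`, shows that findall returns one tuple per match when there are multiple groups, and that re.search gives a match object whose groups can be read individually:

```python
import re

text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

# With two groups, findall returns one tuple of groups per match
pairs = re.findall(r'(\d+) (\w+)', text)

# re.search returns the first match; group(0) is the whole match
m = re.search(r'(\d+) (\w+)', text)

print(pairs)       # [('120', 'quick'), ('30', 'lazy')]
print(m.group(0))  # 120 quick
print(m.group(2))  # quick
```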

\w Matches word characters.


In [10]:
# Find runs of five consecutive word characters
re.findall(r'\w\w\w\w\w', text)


Out[10]:
['flock', 'quick', 'brown', 'foxes', 'jumpe', 'brown', 'bears']
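With a + quantifier, \w grabs whole words rather than fixed-size blocks. The sketch below also shows that in str patterns \w is Unicode-aware, while the re.ASCII flag restricts it to [a-zA-Z0-9_] (the string 'café' is just an illustrative input):

```python
import re

text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

# \w+ grabs whole runs of word characters: the words and the numbers
words = re.findall(r'\w+', text)

# In str patterns \w matches accented letters; re.ASCII does not
unicode_words = re.findall(r'\w+', 'caf\u00e9')
ascii_words = re.findall(r'\w+', 'caf\u00e9', flags=re.ASCII)

print(len(words))     # 13
print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']
```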

\W Matches nonword characters.


In [11]:
re.findall(r'\W\W', text)


Out[11]:
[', ']

\s Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].


In [12]:
re.findall(r'\s', text)


Out[12]:
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
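A common use of \s is splitting on uneven whitespace. This sketch uses a made-up string mixing a tab, a newline, and repeated spaces, and checks which of the ASCII whitespace characters \s covers:

```python
import re

# A hypothetical string mixing a tab, a newline and two spaces
messy = 'fox\tbear\n  owl'

# \s+ treats any run of whitespace as a single separator
parts = re.split(r'\s+', messy)

# \s covers space, tab, newline, carriage return, form feed and vertical tab
ws = re.findall(r'\s', ' \t\n\r\f\v')

print(parts)    # ['fox', 'bear', 'owl']
print(len(ws))  # 6
```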

\S Matches nonwhitespace.


In [13]:
re.findall(r'\S\S', text)


Out[13]:
['fl',
 'oc',
 'of',
 '12',
 'qu',
 'ic',
 'br',
 'ow',
 'fo',
 'xe',
 'ju',
 'mp',
 'ed',
 'ov',
 'er',
 '30',
 'la',
 'zy',
 'br',
 'ow',
 'n,',
 'be',
 'ar',
 's.']

\d Matches digits. Equivalent to [0-9] for ASCII text; in str patterns it also matches other Unicode decimal digits.


In [14]:
re.findall(r'\d\d\d', text)


Out[14]:
['120']
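To pull out whole numbers of any length, \d+ is usually more practical than a fixed count, and the matches can be converted with int(). The sketch also shows the Unicode behavior of \d using ARABIC-INDIC DIGIT THREE (U+0663) as an illustrative non-ASCII digit:

```python
import re

text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

# \d+ grabs whole numbers, which int() can then convert
numbers = [int(n) for n in re.findall(r'\d+', text)]

# In str patterns \d also matches non-ASCII decimal digits; re.ASCII does not
mixed = '12\u0663'
unicode_digits = re.findall(r'\d', mixed)
ascii_digits = re.findall(r'\d', mixed, flags=re.ASCII)

print(numbers)              # [120, 30]
print(len(unicode_digits))  # 3
print(ascii_digits)         # ['1', '2']
```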

\D Matches nondigits.


In [15]:
re.findall(r'\D\D\D\D\D', text)


Out[15]:
['A flo',
 'ck of',
 ' quic',
 'k bro',
 'wn fo',
 'xes j',
 'umped',
 ' over',
 ' lazy',
 ' brow',
 'n, be']

\A Matches beginning of string.


In [16]:
re.findall(r'\AA', text)


Out[16]:
['A']

\Z Matches only at the very end of the string. Unlike $, it does not match before a trailing newline.


In [17]:
re.findall(r'bears\.\Z', text)


Out[17]:
['bears.']
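On `text`, \A and \Z give the same results as ^ and $. The difference shows up with newlines, as in this sketch using two made-up strings:

```python
import re

# A hypothetical string that ends with a newline
s = 'bears.\n'

# $ can match just before a trailing newline...
dollar = re.findall(r'bears\.$', s)
# ...but \Z matches only at the very end of the string
end_z = re.findall(r'bears\.\Z', s)

# With re.MULTILINE, ^ matches at every line start; \A only at the string start
two_lines = 'fox\nfox'
caret = re.findall(r'^fox', two_lines, flags=re.MULTILINE)
start_a = re.findall(r'\Afox', two_lines, flags=re.MULTILINE)

print(dollar)   # ['bears.']
print(end_z)    # []
print(caret)    # ['fox', 'fox']
print(start_a)  # ['fox']
```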

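\b Matches a word boundary: the empty position between a word character and a nonword character (or the start or end of the string).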


In [19]:
re.findall(r'\bfoxes\b', text)


Out[19]:
['foxes']
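The \b escape is easy to get wrong in Python because of string escaping: in a normal string literal '\b' is a backspace character, not a word boundary. This sketch, with a made-up sample string, shows the difference and what a word boundary actually does:

```python
import re

# In a normal string '\b' is one backspace character; a raw string keeps
# the backslash and the b, which re reads as a word boundary
print(len('\b'), len(r'\b'))  # 1 2

sample = 'brown brownie'
# Without boundaries, 'brown' also matches inside 'brownie'
loose = re.findall(r'brown', sample)
# \b on both sides requires 'brown' to be a whole word
whole = re.findall(r'\bbrown\b', sample)

print(loose)  # ['brown', 'brown']
print(whole)  # ['brown']
```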

\n, \t, \r, etc. Match newlines, tabs, carriage returns, etc.


In [20]:
re.findall('\n', text)


Out[20]:
[]

[Pp]ython Match "Python" or "python"


In [21]:
re.findall('[Ff]oxes', 'foxes Foxes Doxes')


Out[21]:
['foxes', 'Foxes']

[0-9] Match any digit; same as [0123456789]


In [22]:
re.findall('[0-9]', text)


Out[22]:
['1', '2', '0', '3', '0']

[a-z] Match any lowercase ASCII letter


In [23]:
re.findall('[a-z]', 'foxes Foxes')


Out[23]:
['f', 'o', 'x', 'e', 's', 'o', 'x', 'e', 's']
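Character ranges are case-sensitive, which is why the capital F is skipped above. As a sketch, the re.IGNORECASE flag is one way to make the same range match both cases:

```python
import re

sample = 'foxes Foxes'

# Case-sensitive: [a-z] skips the capital F
lower_only = re.findall('[a-z]', sample)
# With re.IGNORECASE, [a-z] matches uppercase letters too
any_case = re.findall('[a-z]', sample, flags=re.IGNORECASE)

print(len(lower_only))  # 9
print(len(any_case))    # 10
```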

[A-Z] Match any uppercase ASCII letter


In [24]:
re.findall('[A-Z]', 'foxes Foxes')


Out[24]:
['F']

[a-zA-Z0-9] Match any of the above


In [25]:
re.findall('[a-zA-Z0-9]', 'foxes Foxes')


Out[25]:
['f', 'o', 'x', 'e', 's', 'F', 'o', 'x', 'e', 's']

[^aeiou] Match anything other than a lowercase vowel


In [26]:
re.findall('[^aeiou]', 'foxes Foxes')


Out[26]:
['f', 'x', 's', ' ', 'F', 'x', 's']

[^0-9] Match anything other than a digit


In [27]:
re.findall('[^0-9]', 'foxes Foxes')


Out[27]:
['f', 'o', 'x', 'e', 's', ' ', 'F', 'o', 'x', 'e', 's']

ruby? Match "rub" or "ruby": the y is optional


In [28]:
re.findall('foxes?', 'foxes Foxes')


Out[28]:
['foxes']

ruby* Match "rub" plus 0 or more ys


In [29]:
re.findall('ox*', 'foxes Foxes')


Out[29]:
['ox', 'ox']

ruby+ Match "rub" plus 1 or more ys


In [30]:
re.findall('ox+', 'foxes Foxes')


Out[30]:
['ox', 'ox']
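The three quantifiers are easiest to compare side by side on the "ruby" strings from the headings. The sketch uses a made-up sample, and also shows that quantifiers are greedy by default while a trailing ? makes them lazy:

```python
import re

sample = 'rub ruby rubyy'

optional = re.findall(r'ruby?', sample)  # y zero or one time
star = re.findall(r'ruby*', sample)      # y zero or more times
plus = re.findall(r'ruby+', sample)      # y one or more times

# Greedy .+ runs as far as it can; lazy .+? stops at the first '>'
tags = '<a><b>'
greedy = re.findall(r'<.+>', tags)
lazy = re.findall(r'<.+?>', tags)

print(optional)  # ['rub', 'ruby', 'ruby']
print(star)      # ['rub', 'ruby', 'rubyy']
print(plus)      # ['ruby', 'rubyy']
print(greedy)    # ['<a><b>']
print(lazy)      # ['<a>', '<b>']
```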

\d{3} Match exactly 3 digits


In [31]:
re.findall(r'\d{3}', text)


Out[31]:
['120']

\d{2,} Match 2 or more digits


In [32]:
re.findall(r'\d{2,}', text)


Out[32]:
['120', '30']

\d{2,3} Match 2 or 3 digits


In [33]:
re.findall(r'\d{2,3}', text)


Out[33]:
['120', '30']
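A {m,n} range is also greedy, which can be surprising with findall on longer numbers. This sketch with a made-up sample contrasts findall with re.fullmatch, which only succeeds when the whole string fits the pattern:

```python
import re

sample = '1 12 123 1234'

# {2,3} takes 3 digits when it can, so '1234' yields '123'
# and the leftover '4' is too short to match
found = re.findall(r'\d{2,3}', sample)

# re.fullmatch requires the entire string to fit the pattern
fits = [bool(re.fullmatch(r'\d{2,3}', n)) for n in sample.split()]

print(found)  # ['12', '123', '123']
print(fits)   # [False, True, True, False]
```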

^Python Match "Python" at the start of a string, or at the start of any internal line when re.MULTILINE is set


In [34]:
re.findall('^A', text)


Out[34]:
['A']

Python$ Match "Python" at the end of a string, or at the end of any line when re.MULTILINE is set


In [35]:
re.findall(r'bears\.$', text)


Out[35]:
['bears.']

\APython Match "Python" at the start of a string


In [36]:
re.findall(r'\AA', text)


Out[36]:
['A']

Python\Z Match "Python" at the end of a string


In [37]:
re.findall(r'bears\.\Z', text)


Out[37]:
['bears.']

Python(?=!) Match "Python", if followed by an exclamation point


In [38]:
re.findall(r'bears(?=\.)', text)


Out[38]:
['bears']
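A lookahead only checks what follows; it does not consume it, so the checked text is not part of the match. A sketch reusing `text`, plus the 'foxes foxes!' string from the next example:

```python
import re

text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

# (?=,) requires a comma after the word but leaves it out of the match
before_comma = re.findall(r'\w+(?=,)', text)

# Only the second 'foxes' is followed by '!'
m = re.search(r'foxes(?=!)', 'foxes foxes!')

print(before_comma)  # ['brown']
print(m.group())     # foxes
print(m.start())     # 6
```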

Python(?!!) Match "Python", if not followed by an exclamation point


In [39]:
re.findall('foxes(?!!)', 'foxes foxes!')


Out[39]:
['foxes']

python|perl Match "python" or "perl"


In [40]:
re.findall('foxes|foxes!', 'foxes foxes!')


Out[40]:
['foxes', 'foxes']
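Note that 'foxes!' never shows up above. Alternatives are tried left to right and the first one that matches wins, so putting the longer alternative first changes the result:

```python
import re

sample = 'foxes foxes!'

# 'foxes' is tried first and always succeeds, so 'foxes!' is never used
short_first = re.findall(r'foxes|foxes!', sample)
# Trying the longer alternative first lets it win where it fits
long_first = re.findall(r'foxes!|foxes', sample)

print(short_first)  # ['foxes', 'foxes']
print(long_first)   # ['foxes', 'foxes!']
```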

rub(y|le) Match "ruby" or "ruble"


In [41]:
re.findall('fox(es!)', 'foxes foxes!')


Out[41]:
['es!']
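Because the pattern has a capturing group, findall returns only the group's text ('es!'), not the whole match. The sketch below applies the heading's own rub(y|le) pattern to a made-up sample and shows the non-capturing form (?:...), which keeps the alternation but returns the full match:

```python
import re

sample = 'ruby ruble rub'

# A capturing group makes findall return only the group's text
captured = re.findall(r'rub(y|le)', sample)
# A non-capturing group keeps the alternation but returns the whole match
whole = re.findall(r'rub(?:y|le)', sample)

print(captured)  # ['y', 'le']
print(whole)     # ['ruby', 'ruble']
```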

Python(!+|\?) "Python" followed by one or more ! or one ?


In [42]:
re.findall(r'foxes(!+|\?)', 'foxes foxes!')


Out[42]:
['!']
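To see both branches of the alternation, and to get the whole match back instead of just the group, here is a sketch using a non-capturing group on a made-up sample:

```python
import re

sample = 'foxes!!! foxes? foxes'

# One or more '!' or a single '?' must follow; the bare final 'foxes' is skipped
found = re.findall(r'foxes(?:!+|\?)', sample)

print(found)  # ['foxes!!!', 'foxes?']
```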