Title: Regular Expression By Example
Slug: regex_by_example
Summary: Regular Expression By Example
Date: 2016-05-01 12:00
Category: Python
Tags: Basics
Authors: Chris Albon

This tutorial is based on: http://www.tutorialspoint.com/python/python_reg_expressions.htm



In [1]:

    
# Import regex
import re



In [2]:

    
# Create some data
text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

^ Matches the beginning of the string; with re.MULTILINE, it also matches at the beginning of each line.



In [3]:

    
re.findall('^A', text)









    Out[3]:





['A']
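
To make the line-versus-string distinction concrete, here is a minimal sketch using a hypothetical two-line string (`two_lines` is not part of the original data). It assumes only the standard `re.MULTILINE` flag.

```python
import re

# Hypothetical two-line string, used only for this illustration
two_lines = 'A fox\nA bear'

# Default: ^ anchors only at the very start of the string
default_matches = re.findall(r'^A', two_lines)

# re.MULTILINE: ^ also anchors right after each newline
multiline_matches = re.findall(r'^A', two_lines, flags=re.MULTILINE)

print(default_matches)    # ['A']
print(multiline_matches)  # ['A', 'A']
```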

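$ Matches the end of the string (or just before a trailing newline); with re.MULTILINE, it also matches at the end of each line.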



In [4]:

    
re.findall('bears.$', text)









    Out[4]:





['bears.']
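
One detail worth noting about the pattern above: the `.` before `$` is itself a metacharacter, so it accepts any final character, not just a period. The sketch below uses a hypothetical string ending in `!` to show the difference between the unescaped and escaped dot.

```python
import re

# Hypothetical string that ends with '!' instead of '.'
shout = 'brown bears!'

# Unescaped dot: any single character may precede the end of the string
loose = re.findall(r'bears.$', shout)

# Escaped dot: only a literal period is accepted
strict = re.findall(r'bears\.$', shout)

print(loose)   # ['bears!']
print(strict)  # []
```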

. Matches any single character except newline.



In [5]:

    
re.findall('f..es', text)









    Out[5]:





['foxes']

[...] Matches any single character in brackets.



In [6]:

    
# Find all vowels
re.findall('[aeiou]', text)









    Out[6]:





['o', 'o', 'u', 'i', 'o', 'o', 'e', 'u', 'e', 'o', 'e', 'a', 'o', 'e', 'a']

[^...] Matches any single character not in brackets.



In [7]:

    
# Find all characters that are not lower-case vowels
re.findall('[^aeiou]', text)









    Out[7]:





['A',
 ' ',
 'f',
 'l',
 'c',
 'k',
 ' ',
 'f',
 ' ',
 '1',
 '2',
 '0',
 ' ',
 'q',
 'c',
 'k',
 ' ',
 'b',
 'r',
 'w',
 'n',
 ' ',
 'f',
 'x',
 's',
 ' ',
 'j',
 'm',
 'p',
 'd',
 ' ',
 'v',
 'r',
 ' ',
 '3',
 '0',
 ' ',
 'l',
 'z',
 'y',
 ' ',
 'b',
 'r',
 'w',
 'n',
 ',',
 ' ',
 'b',
 'r',
 's',
 '.']

a | b Matches either a or b.



In [8]:

    
re.findall('a|A', text)









    Out[8]:





['A', 'a', 'a']

(re) Groups regular expressions and remembers matched text.



In [9]:

    
# Find any instance of 'foxes'
re.findall('(foxes)', text)









    Out[9]:





['foxes']
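
The "remembers matched text" part becomes more visible with more than one group. As a rough sketch (the pattern is an illustrative choice, not taken from the original tutorial), `re.findall` returns one tuple per match when the pattern contains several groups:

```python
import re

text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

# Two groups: a run of digits, then the word that follows it
pairs = re.findall(r'(\d+) (\w+)', text)

print(pairs)  # [('120', 'quick'), ('30', 'lazy')]
```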

\w Matches word characters.



In [10]:

    
# Find non-overlapping runs of five consecutive word characters
re.findall(r'\w\w\w\w\w', text)









    Out[10]:





['flock', 'quick', 'brown', 'foxes', 'jumpe', 'brown', 'bears']

\W Matches nonword characters.



In [11]:

    
re.findall(r'\W\W', text)









    Out[11]:





[', ']

\s Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].



In [12]:

    
re.findall(r'\s', text)









    Out[12]:





[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
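
The example text only contains spaces, so the output hides the fact that `\s` covers other whitespace too. A small sketch with a hypothetical string containing a space, a tab, and a newline:

```python
import re

# Hypothetical string mixing three kinds of whitespace
mixed = 'a b\tc\nd'

whitespace = re.findall(r'\s', mixed)

print(whitespace)  # [' ', '\t', '\n']
```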

\S Matches nonwhitespace.



In [13]:

    
re.findall(r'\S\S', text)









    Out[13]:





['fl',
 'oc',
 'of',
 '12',
 'qu',
 'ic',
 'br',
 'ow',
 'fo',
 'xe',
 'ju',
 'mp',
 'ed',
 'ov',
 'er',
 '30',
 'la',
 'zy',
 'br',
 'ow',
 'n,',
 'be',
 'ar',
 's.']

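\d Matches digits. For ASCII text, equivalent to [0-9]; with Python 3 string patterns it also matches other Unicode decimal digits unless re.ASCII is set.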



In [14]:

    
re.findall(r'\d\d\d', text)









    Out[14]:





['120']
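
The equivalence with `[0-9]` holds for ASCII input. As a hedged illustration for Python 3 string patterns, the sketch below uses a hypothetical sample containing an Arabic-Indic digit (U+0663) and compares the default behaviour with the `re.ASCII` flag.

```python
import re

# Hypothetical sample: an ASCII digit and an Arabic-Indic digit three
sample = '3 \u0663'

# Default (Unicode) matching: both characters count as digits
unicode_digits = re.findall(r'\d', sample)

# re.ASCII restricts \d to [0-9]
ascii_digits = re.findall(r'\d', sample, flags=re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['3']
```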

\D Matches nondigits.



In [15]:

    
re.findall(r'\D\D\D\D\D', text)









    Out[15]:





['A flo',
 'ck of',
 ' quic',
 'k bro',
 'wn fo',
 'xes j',
 'umped',
 ' over',
 ' lazy',
 ' brow',
 'n, be']

\A Matches beginning of string.



In [16]:

    
re.findall(r'\AA', text)









    Out[16]:





['A']

\Z Matches only at the end of the string. Unlike $, it does not match just before a trailing newline.



In [17]:

    
re.findall(r'bears.\Z', text)









    Out[17]:





['bears.']
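
In Python, `$` and `\Z` differ when the string ends with a newline. A minimal sketch, assuming a hypothetical string with a trailing `\n`:

```python
import re

# Hypothetical line that ends with a newline
line = 'bears.\n'

# $ may match just before the trailing newline
dollar = re.findall(r'bears\.$', line)

# \Z matches only at the very end of the string, after the newline
slash_z = re.findall(r'bears\.\Z', line)

print(dollar)   # ['bears.']
print(slash_z)  # []
```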

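\b Matches a word boundary: the empty string at the start or end of a word.



In [19]:

    
# Use a raw string; in a regular string literal, '\b' is a backspace character
re.findall(r'\b[foxes]', text)









    Out[19]:





['f', 'o', 'f', 'o']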
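
A common use of `\b` is whole-word matching. The sketch below uses a hypothetical string and a raw-string pattern (needed because `'\b'` in a regular string literal is a backspace character) to compare a plain search with one anchored at word boundaries on both sides.

```python
import re

# Hypothetical string where 'over' appears inside longer words
words = 'over clover overt'

# Plain pattern: matches 'over' wherever it occurs
anywhere = re.findall(r'over', words)

# \b on both sides: matches 'over' only as a complete word
whole_word = re.findall(r'\bover\b', words)

print(anywhere)    # ['over', 'over', 'over']
print(whole_word)  # ['over']
```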

\n, \t, etc. Matches newlines, carriage returns, tabs, etc.



In [20]:

    
re.findall('\n', text)









    Out[20]:





[]
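
The example text has no newlines or tabs, which is why the result is empty. Here is a short sketch with a hypothetical string that does contain them; `re.split` is used as one possible way to break the string apart on those characters.

```python
import re

# Hypothetical string with a tab and a newline between words
multi = 'foxes\tbears\nwolves'

newlines = re.findall(r'\n', multi)

# Split on either a tab or a newline
pieces = re.split(r'[\t\n]', multi)

print(newlines)  # ['\n']
print(pieces)    # ['foxes', 'bears', 'wolves']
```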

[Pp]ython Match "Python" or "python"



In [21]:

    
re.findall('[Ff]oxes', 'foxes Foxes Doxes')









    Out[21]:





['foxes', 'Foxes']

[0-9] Match any digit; same as [0123456789]



In [22]:

    
re.findall('[0-9]', text)









    Out[22]:





['1', '2', '0', '3', '0']

[a-z] Match any lowercase ASCII letter



In [23]:

    
re.findall('[a-z]', 'foxes Foxes')









    Out[23]:





['f', 'o', 'x', 'e', 's', 'o', 'x', 'e', 's']

[A-Z] Match any uppercase ASCII letter



In [24]:

    
re.findall('[A-Z]', 'foxes Foxes')









    Out[24]:





['F']

[a-zA-Z0-9] Match any of the above



In [25]:

    
re.findall('[a-zA-Z0-9]', 'foxes Foxes')









    Out[25]:





['f', 'o', 'x', 'e', 's', 'F', 'o', 'x', 'e', 's']

[^aeiou] Match anything other than a lowercase vowel



In [26]:

    
re.findall('[^aeiou]', 'foxes Foxes')









    Out[26]:





['f', 'x', 's', ' ', 'F', 'x', 's']

[^0-9] Match anything other than a digit



In [27]:

    
re.findall('[^0-9]', 'foxes Foxes')









    Out[27]:





['f', 'o', 'x', 'e', 's', ' ', 'F', 'o', 'x', 'e', 's']

ruby? Match "rub" or "ruby": the y is optional



In [28]:

    
re.findall('foxes?', 'foxes Foxes')









    Out[28]:





['foxes']
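
To see the optional character actually being absent in one match and present in another, consider this small sketch with a hypothetical sentence:

```python
import re

# Hypothetical sentence with singular and plural forms
sentence = 'one bear, two bears'

# The trailing s is optional, so both forms match
optional = re.findall(r'bears?', sentence)

print(optional)  # ['bear', 'bears']
```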

ruby* Match "rub" plus 0 or more ys



In [29]:

    
re.findall('ox*', 'foxes Foxes')









    Out[29]:





['ox', 'ox']

ruby+ Match "rub" plus 1 or more ys



In [30]:

    
re.findall('ox+', 'foxes Foxes')









    Out[30]:





['ox', 'ox']
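
On the string above, `*` and `+` give the same result, so the difference is easy to miss. A sketch with a hypothetical string where the `x` is missing, single, or doubled shows where they diverge:

```python
import re

# Hypothetical string: 'o' followed by zero, one, and two x's
sample = 'fo fox foxx'

# * allows zero x's, so the bare 'o' in 'fo' matches too
star = re.findall(r'ox*', sample)

# + requires at least one x
plus = re.findall(r'ox+', sample)

print(star)  # ['o', 'ox', 'oxx']
print(plus)  # ['ox', 'oxx']
```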

\d{3} Match exactly 3 digits



In [31]:

    
re.findall(r'\d{3}', text)









    Out[31]:





['120']

\d{2,} Match 2 or more digits



In [32]:

    
re.findall(r'\d{2,}', text)









    Out[32]:





['120', '30']

\d{2,3} Match 2 or 3 digits



In [33]:

    
re.findall(r'\d{2,3}', text)









    Out[33]:





['120', '30']
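
The three counted quantifiers can be compared side by side on a hypothetical string of numbers with one to four digits. Note that a bounded quantifier can match part of a longer run of digits.

```python
import re

# Hypothetical numbers with 1, 2, 3, and 4 digits
numbers = '7 42 365 2016'

exactly_three = re.findall(r'\d{3}', numbers)
two_or_more = re.findall(r'\d{2,}', numbers)
two_to_three = re.findall(r'\d{2,3}', numbers)

print(exactly_three)  # ['365', '201']
print(two_or_more)    # ['42', '365', '2016']
print(two_to_three)   # ['42', '365', '201']
```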

^Python Match "Python" at the start of a string, or at the start of an internal line when re.MULTILINE is set



In [34]:

    
re.findall('^A', text)









    Out[34]:





['A']

Python$ Match "Python" at the end of a string, or at the end of a line when re.MULTILINE is set



In [35]:

    
re.findall('bears.$', text)









    Out[35]:





['bears.']

\APython Match "Python" at the start of a string



In [36]:

    
re.findall(r'\AA', text)









    Out[36]:





['A']

Python\Z Match "Python" at the end of a string



In [37]:

    
re.findall(r'bears.\Z', text)









    Out[37]:





['bears.']

Python(?=!) Match "Python", if followed by an exclamation point



In [38]:

    
re.findall(r'bears(?=\.)', text)









    Out[38]:





['bears']

Python(?!!) Match "Python", if not followed by an exclamation point



In [39]:

    
re.findall('foxes(?!!)', 'foxes foxes!')









    Out[39]:





['foxes']
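
Lookaheads check what follows without including it in the match. One way to see this, sketched here with `re.finditer` and match spans, is to look at where each match starts and ends in the string:

```python
import re

sample = 'foxes foxes!'

# Positive lookahead: only the second 'foxes' (followed by '!') matches,
# and the span stops before the '!' at index 11
followed = [m.span() for m in re.finditer(r'foxes(?=!)', sample)]

# Negative lookahead: only the first 'foxes' (not followed by '!') matches
not_followed = [m.span() for m in re.finditer(r'foxes(?!!)', sample)]

print(followed)      # [(6, 11)]
print(not_followed)  # [(0, 5)]
```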

python|perl Match "python" or "perl"



In [40]:

    
re.findall('foxes|foxes!', 'foxes foxes!')









    Out[40]:





['foxes', 'foxes']
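
The result above never contains `'foxes!'` because alternatives are tried left to right and the first one that succeeds wins. A brief sketch of how the order of alternatives changes the outcome:

```python
import re

sample = 'foxes foxes!'

# Shorter alternative first: 'foxes' always wins, the '!' is never consumed
short_first = re.findall(r'foxes|foxes!', sample)

# Longer alternative first: 'foxes!' is tried before falling back to 'foxes'
long_first = re.findall(r'foxes!|foxes', sample)

print(short_first)  # ['foxes', 'foxes']
print(long_first)   # ['foxes', 'foxes!']
```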

rub(y|le) Match "ruby" or "ruble"



In [41]:

    
re.findall('fox(es!)', 'foxes foxes!')









    Out[41]:





['es!']
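
Because the pattern contains a group, `re.findall` returns only the grouped part rather than the whole match. As a sketch with a hypothetical string, a non-capturing group `(?:...)` keeps the grouping for the alternation while returning the full match:

```python
import re

# Hypothetical string with two endings after 'fox'
sample = 'foxes foxy'

# Capturing group: findall returns only what the group matched
captured = re.findall(r'fox(es|y)', sample)

# Non-capturing group: findall returns the whole match
whole = re.findall(r'fox(?:es|y)', sample)

print(captured)  # ['es', 'y']
print(whole)     # ['foxes', 'foxy']
```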

Python(!+|\?) "Python" followed by one or more ! or one ?



In [42]:

    
re.findall('foxes(!)', 'foxes foxes!')









    Out[42]:





['!']
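
The cell above only tests a single `!`. For the full pattern in the heading, here is a sketch applied to a hypothetical string containing each kind of ending:

```python
import re

# Hypothetical string: no punctuation, one '!', several '!', and a '?'
sample = 'foxes foxes! foxes!!! foxes?'

# One or more '!' or exactly one '?'; findall returns the group contents
endings = re.findall(r'foxes(!+|\?)', sample)

print(endings)  # ['!', '!!!', '?']
```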