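The pipe symbol (|) matches any one of several alternative patterns: at|home matches at or home
The dot (.) matches any single character except a newline: t.o matches tao, tzo
Matching at the start and end of strings and words:
(^) matches the start of the string: ^From matches strings that begin with From
($) matches the end of the string: /bin/tsch$ matches strings that end with /bin/tsch
(\b) matches a word boundary: \bthe matches words that begin with the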
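As a quick check of the notation above, here is a minimal sketch that exercises each metacharacter with Python's `re` module. The sample strings (`'From: someone'`, `'/usr/bin/tsch'`, `'bathe'`, and so on) are made up for illustration and do not come from the original notes.

```python
import re

# Alternation: either alternative may match.
alt = [re.match(r'at|home', s).group() for s in ('at', 'home')]

# '.' matches any single character except a newline.
dot = [bool(re.match(r't.o', s)) for s in ('tao', 'tzo', 't\no')]

# '^' anchors at the start of the string, '$' at the end.
starts = bool(re.search(r'^From', 'From: someone'))
ends = bool(re.search(r'/bin/tsch$', '/usr/bin/tsch'))

# '\b' is a word boundary: 'the' must begin a word.
boundary = [bool(re.search(r'\bthe', s)) for s in ('the dog', 'bathe')]

print(alt)       # ['at', 'home']
print(dot)       # [True, True, False]
print(starts, ends)
print(boundary)  # [True, False]
```

Raw strings (`r'...'`) are used so that backslash sequences such as `\b` reach the regex engine unchanged.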
In [2]:
import re
m = re.match('foo', 'foo')
if m is not None: m.group()
In [3]:
m
Out[3]:
In [4]:
m = re.match('foo', 'bar')
if m is not None: m.group()
In [5]:
re.match('foo', 'foo on the table').group()
Out[5]:
In [9]:
# raises AttributeError: there is no match, so re.match returns None
re.match('bar', 'foo on the table').group()
In [11]:
m = re.match('foo','seafood')
if m is not None: m.group()
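Unlike match, the search function scans the string from the start and returns the first occurrence of the pattern, wherever it appears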
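The difference between the two functions can be sketched side by side. This is only an illustration using the same `'seafood'` string as the surrounding cells; the variable names are arbitrary.

```python
import re

text = 'seafood'

# match() only succeeds if the pattern matches at index 0.
at_start = re.match('foo', text)

# search() scans forward and returns the first match anywhere.
anywhere = re.search('foo', text)

print(at_start)          # None
print(anywhere.group())  # foo
print(anywhere.span())   # (3, 6)
```

Checking the result against `None` before calling `group()` avoids the `AttributeError` that a failed match would otherwise cause.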
In [15]:
re.search('foo', 'seafood').group()
Out[15]:
In [16]:
bt = 'bat|bet|bit'
In [17]:
re.match(bt,'bat').group()
Out[17]:
In [18]:
# raises AttributeError: 'blt' matches none of the alternatives
re.match(bt, 'blt').group()
In [19]:
# raises AttributeError: match only tries the start of the string
re.match(bt, 'He bit me!').group()
In [21]:
re.search(bt, 'He bit me!').group()
Out[21]:
In [22]:
anyend='.end'
In [23]:
re.match(anyend, 'bend').group()
Out[23]:
In [24]:
# raises AttributeError: '.' needs one character before 'end'
re.match(anyend, 'end').group()
In [25]:
# raises AttributeError: '.' does not match the newline
re.search(anyend, '\nend').group()
In [26]:
pattern = '[cr][23][dp][o2]'
In [27]:
re.match(pattern, 'c3po').group()
Out[27]:
In [28]:
re.match(pattern, 'c3do').group()
Out[28]:
In [29]:
# raises AttributeError: 'c2do' is neither 'r2d2' nor 'c3po'
re.match('r2d2|c3po', 'c2do').group()
In [30]:
re.match('r2d2|c3po', 'r2d2').group()
Out[30]:
In [31]:
patt = r'\w+@(\w+\.)?\w+\.com'
re.match(patt, 'nobody@xxx.com').group()
Out[31]:
In [32]:
re.match(patt, 'nobody@www.xxx.com').group()
Out[32]:
In [33]:
# match any number of subdomains
patt = r'\w+@(\w+\.)*\w+\.com'
re.match(patt, 'nobody@www.xxx.yyy.zzz.com').group()
Out[33]:
In [37]:
patt = r'(\w\w\w)-(\d\d\d)'
m = re.match(patt, 'abc-123')
In [38]:
m.group()
Out[38]:
In [39]:
m.group(1)
Out[39]:
In [40]:
m.group(2)
Out[40]:
In [41]:
m.groups()
Out[41]:
In [42]:
m = re.match('ab', 'ab')
m.group()
Out[42]:
In [43]:
m.groups()
Out[43]:
In [44]:
m = re.match('(ab)','ab')
m.groups()
Out[44]:
In [45]:
m.group(1)
Out[45]:
In [46]:
m = re.match('(a(b))', 'ab')
m.group()
Out[46]:
In [48]:
m.group(1)
Out[48]:
In [49]:
m.group(2)
Out[49]:
In [50]:
m.groups()
Out[50]:
In [51]:
re.match('^The', 'The end.').group()
Out[51]:
In [52]:
# raises AttributeError: 'The' is not at the start of the string
re.match('^The', 'end. The').group()
In [53]:
re.search(r'\bthe', 'bite the dog').group()
Out[53]:
In [54]:
# raises AttributeError: 'the' is not at a word boundary here
re.search(r'\bthe', 'bitethe dog').group()
In [55]:
re.search(r'\Bthe', 'bitthe dog').group()
Out[55]:
In [56]:
re.findall('car', 'car')
Out[56]:
In [57]:
re.findall('car', 'scary')
Out[57]:
In [58]:
re.findall('car', 'carry, the barcardi to the car')
Out[58]:
In [62]:
re.sub('X', 'Mr. Smith', 'attn: X\n\nDear X, \n')
Out[62]:
In [63]:
print(re.subn('X', 'Mr. Smith', 'attn: X\n\nDear X, \n'))
In [64]:
re.sub('[ae]', 'X', 'abcdedf')
Out[64]:
In [66]:
re.split(':','str1:str2:str3')
Out[66]:
In [68]:
from os import popen
from re import split
# 'who' is a Unix command; split its columns on runs of spaces or a tab
f = popen('who', 'r')
for eachLine in f:
    print(split(r'\s\s+|\t', eachLine.strip()))
f.close()
In [70]:
string = 'Thu Feb 15 17:46:04 2007::gaufung@cumt.edu.cn::1171590364-6-8'
patt = r'.+\d+-\d+-\d+'
re.match(patt, string).group()
Out[70]:
In [72]:
patt = r'.+(\d+-\d+-\d+)'
re.match(patt, string).group(1)
Out[72]:
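Because quantifiers are greedy by default, '.+' matches as many characters as possible, so
Thu Feb 15 17:46:04 2007::gaufung@cumt.edu.cn::117159036
is consumed by '.+' and the group captures only '4-6-8'. Appending '?' to the quantifier ('.+?') makes it non-greedy.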
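The greedy and non-greedy behaviour described above can be compared directly. In this sketch an extra capturing group is wrapped around `.+` (an addition for inspection only, not part of the original pattern) so that the text consumed by the wildcard is visible.

```python
import re

string = 'Thu Feb 15 17:46:04 2007::gaufung@cumt.edu.cn::1171590364-6-8'

# Greedy: '.+' takes as much as it can and leaves the minimum for the digits.
greedy = re.match(r'(.+)(\d+-\d+-\d+)', string)

# Non-greedy: '.+?' takes as little as it can, so the digits get the full number.
lazy = re.match(r'(.+?)(\d+-\d+-\d+)', string)

print(greedy.group(2))  # 4-6-8
print(lazy.group(2))    # 1171590364-6-8
```

Both patterns match the whole string; only the split between the two groups changes.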
In [73]:
patt = r'.+?(\d+-\d+-\d+)'
re.match(patt, string).group(1)
Out[73]: