Strings and Text



In [1]:

    
# Display the result of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"



In [2]:

    
line = 'asdf fjdk; afed, fjek,asdf, foo'
import re
re.split(r'[;,\s]\s*', line)









    Out[2]:





['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

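re.split() is especially useful because it lets you specify several patterns for the separator. Here the separator is a comma, a semicolon, or a whitespace character, followed by any amount of extra whitespace.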
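As a small extension of the call above, the sketch below shows how a capture group in the pattern changes what re.split() returns: the matched separators are kept in the result. The names fields, values, delimiters, and plain are illustrative only.

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'

# With a capture group, the matched separators appear in the result too.
fields = re.split(r'(;|,|\s)\s*', line)

# Even positions hold the values, odd positions hold the separators.
values = fields[::2]
delimiters = fields[1::2]

# A non-capturing group (?:...) groups the alternatives without keeping them.
plain = re.split(r'(?:,|;|\s)\s*', line)

print(values)
print(delimiters)
print(plain)
```

Keeping the separators can help when the original string has to be rebuilt later with different spacing.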
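A simple way to check the start or end of a string is the str.startswith() or str.endswith() method.
To check against several candidates, put them all in a tuple and pass that tuple to startswith() or endswith().
startswith() and endswith() are a convenient way to test prefixes and suffixes. Slices can do the same job, but the resulting code is less elegant.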



In [9]:

    
filename = 'spam.txt'
filename.endswith('.txt')
filename.startswith('file:')
url = 'http://www.python.org'
url.startswith('http:')









    Out[9]:





True






    Out[9]:





False






    Out[9]:





True



In [10]:

    
filenames = [ 'Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h' ]
[name.endswith('.py') for name in filenames]
# Is any entry True?
any(name.endswith('.py') for name in filenames)









    Out[10]:





[False, False, True, False, False]






    Out[10]:





True
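The cells above test one prefix or suffix at a time. A minimal sketch of the tuple form and of the slice-based alternative follows; the names c_files, schemes, is_remote, and by_slice are made up for illustration.

```python
filenames = ['Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h']

# A tuple of suffixes matches if any one of them matches.
c_files = [name for name in filenames if name.endswith(('.c', '.h'))]

# A list of candidates must be converted to a tuple first;
# passing the list itself raises TypeError.
schemes = ['http:', 'https:', 'ftp:']
url = 'http://www.python.org'
is_remote = url.startswith(tuple(schemes))

# The slice-based equivalent works, but the length has to be kept in sync by hand.
filename = 'spam.txt'
by_slice = filename[-4:] == '.txt'

print(c_files, is_remote, by_slice)
```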

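Matching strings with shell wildcard patterns

If your code needs to match the names of files that actually exist on disk, the glob module is the better choice.
For complex matching, use regular expressions and the re module.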



In [12]:

    
from fnmatch import fnmatch, fnmatchcase
fnmatch('foo.txt', '*.txt')









    Out[12]:





True



In [13]:

    
fnmatch('foo.txt', '?oo.txt')
fnmatch('Dat45.csv', 'Dat[0-9]*')









    Out[13]:





True






    Out[13]:





True



In [14]:

    
names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py']
[name for name in names if fnmatch(name, 'Dat*.csv')]









    Out[14]:





['Dat1.csv', 'Dat2.csv']
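The import in the cells above also brings in fnmatchcase, which is never used. The sketch below shows where it differs from fnmatch(), and adds fnmatch.filter() as a shortcut for the list comprehension. The alias filter_names is an assumption, chosen here to avoid shadowing the built-in filter().

```python
from fnmatch import fnmatchcase, filter as filter_names

names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py']

# fnmatch() follows the case rules of the underlying operating system,
# while fnmatchcase() always compares case-sensitively.
exact = fnmatchcase('foo.txt', '*.txt')
wrong_case = fnmatchcase('foo.txt', '*.TXT')

# filter() from the fnmatch module returns the names that match a pattern.
csv_files = filter_names(names, 'Dat*.csv')

print(exact, wrong_case, csv_files)
```

Using fnmatchcase() keeps results identical across platforms when the data being matched is not really a file name.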

Searching and replacing text

For simple literal patterns, the str.replace() method is enough.
For more complicated patterns, use the sub() function in the re module.



In [18]:

    
text = 'yeah, but no, but yeah, but no, but yeah'
text
# Replace every occurrence
text.replace('yeah', 'yep')
# Replace only the first two occurrences
text.replace('yeah', 'yep', 2)









    Out[18]:





'yeah, but no, but yeah, but no, but yeah'






    Out[18]:





'yep, but no, but yep, but no, but yep'






    Out[18]:





'yep, but no, but yep, but no, but yeah'
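The section mentions re.sub() for complicated patterns but only demonstrates str.replace(). Here is a minimal sketch of the regular-expression route, using an invented sample sentence and rewriting dates of the form month/day/year; date_pattern, rewritten, and count are illustrative names.

```python
import re

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# Three capture groups: month, day, year.
date_pattern = re.compile(r'(\d+)/(\d+)/(\d+)')

# Backreferences \1, \2, \3 in the replacement refer to those groups.
rewritten = date_pattern.sub(r'\3-\1-\2', text)

# subn() does the same substitution and also reports how many were made.
_, count = date_pattern.subn(r'\3-\1-\2', text)

print(rewritten)
print(count)
```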

Case-insensitive search and replace

To ignore case in text operations, use the re module and pass the re.IGNORECASE flag to each operation.



In [20]:

    
text = 'UPPER PYTHON, lower python, Mixed Python'
# Search
re.findall('python', text, flags=re.IGNORECASE)
# Replace
re.sub('python', 'snake', text, flags=re.IGNORECASE)









    Out[20]:





['PYTHON', 'python', 'Python']






    Out[20]:





'UPPER snake, lower snake, Mixed snake'
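In the output above, the replacement is always lower case, whatever the case of the matched text. One possible way to carry the case over is to pass a function to re.sub(). The helper matchcase below is a hypothetical support function, not part of the re module.

```python
import re

def matchcase(word):
    """Return a replacement function that copies the case of each match."""
    def replace(m):
        matched = m.group()
        if matched.isupper():
            return word.upper()
        elif matched.islower():
            return word.lower()
        elif matched[0].isupper():
            return word.capitalize()
        return word
    return replace

text = 'UPPER PYTHON, lower python, Mixed Python'

# re.sub() calls the function once per match and inserts its return value.
result = re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE)
print(result)
```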

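Stripping unwanted characters from strings

The strip() method removes characters from the beginning and end of a string; lstrip() and rstrip() strip only from the left or the right side respectively. By default these methods remove whitespace, but you can specify other characters.
Stripping never touches the middle of the string. To deal with inner whitespace you need another technique, such as the replace() method or a regular-expression substitution.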



In [25]:

    
# Whitespace stripping
s = ' hello world \n'
s
s.strip()









    Out[25]:





' hello world \n'






    Out[25]:





'hello world'



In [27]:

    
# Strip from the left
s.lstrip()
# Strip from the right
s.rstrip()









    Out[27]:





'hello world \n'






    Out[27]:





' hello world'



In [29]:

    
# Character stripping
t = '-----hello====='
# Strip '-' from the left
t.lstrip('-')
# Strip any leading or trailing '-' and '=' characters
t.strip('-=')









    Out[29]:





'hello====='






    Out[29]:





'hello'
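The cells above only strip the ends of a string. The following sketch covers inner whitespace, which the section names as a job for replace() or a regular expression. The sample string and the variable names are illustrative.

```python
import re

s = ' hello     world \n'

# strip() leaves the run of spaces in the middle untouched.
ends_only = s.strip()

# replace() removes every space, including the ones between words,
# but it does not touch the trailing newline.
no_spaces = s.replace(' ', '')

# A regex can collapse each run of whitespace to a single space.
collapsed = re.sub(r'\s+', ' ', s).strip()

print(repr(ends_only))
print(repr(no_spaces))
print(repr(collapsed))
```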

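Aligning text strings

For basic alignment, use the ljust(), rjust(), and center() string methods.
The format() function can also align strings easily: use <, >, or ^ followed by the desired width.
To pad with a character other than a space, put the fill character immediately before the alignment character.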



In [34]:

    
text = 'Hello World'
text.ljust(20)
text.rjust(20)
text.center(20)
text.ljust(20, '*')
text.center(20, '*')









    Out[34]:





'Hello World         '






    Out[34]:





'         Hello World'






    Out[34]:





'    Hello World     '






    Out[34]:





'Hello World*********'






    Out[34]:





'****Hello World*****'



In [39]:

    
format(text, '>20')
format(text, '<20')
format(text, '^20')
format(text, '=>20s')









    Out[39]:





'         Hello World'






    Out[39]:





'Hello World         '






    Out[39]:





'    Hello World     '






    Out[39]:





'=========Hello World'



In [40]:

    
'{:>10s} {:>10s}'.format('Hello', 'World')









    Out[40]:





'     Hello      World'
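The alignment codes are not limited to strings. As a sketch, the example below applies them to a float and combines a fill character, an alignment, a width, and a precision; the names starred, right, centered, and row are illustrative.

```python
text = 'Hello World'

# The fill character goes before the alignment character.
starred = format(text, '*^20s')

# format() aligns any value, not just strings.
x = 1.2345
right = format(x, '>10')
centered = format(x, '^10.2f')

# The same codes work inside str.format() replacement fields.
row = '{:<10s}|{:>8.2f}'.format('price', x)

print(repr(starred))
print(repr(right))
print(repr(centered))
print(repr(row))
```

Compared with ljust(), rjust(), and center(), format() has the advantage of working on numbers without converting them to strings first.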

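Combining and concatenating strings

If the strings you want to combine are in a sequence or other iterable, the fastest way to combine them is the join() method.
To combine just a few strings, the + operator is usually good enough.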



In [50]:

    
str1 = 'hello'
str2 = 'world'
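# Unpack both strings into a tuple of characters
*str1, *str2
(*str1, *str2)
# list
[*str1, *str2]
# set
{*str1, *str2}
# dict: pair characters by position; a repeated key keeps its last value
{key: value for key, value in zip(str1, str2)}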









    Out[50]:





('h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd')






    Out[50]:





('h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd')






    Out[50]:





['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd']






    Out[50]:





{'d', 'e', 'h', 'l', 'o', 'r', 'w'}






    Out[50]:





{'e': 'o', 'h': 'w', 'l': 'l', 'o': 'd'}



In [52]:

    
str1.join(str2)









    Out[52]:





'whelloohellorhellolhellod'



In [53]:

    
str1 + str2









    Out[53]:





'helloworld'



In [54]:

    
parts = ['Is', 'Chicago', 'Not', 'Chicago?']
' '.join(parts)









    Out[54]:





'Is Chicago Not Chicago?'



In [57]:

    
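x = [1, 2, 4, -1, 0, 2]
# Generator that yields its argument once, as a single item
def sam(arg):
    yield arg
# The loop body therefore runs once and displays the whole list
for o in sam(x):
    o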









    Out[57]:





[1, 2, 4, -1, 0, 2]
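To tie the generator above back to join(): join() accepts any iterable of strings, so a generator can feed it directly. The sketch below assumes a hypothetical generator sample() that yields string fragments, and also shows converting non-string items on the fly.

```python
def sample():
    """Hypothetical generator that produces string fragments one at a time."""
    yield 'Is'
    yield 'Chicago'
    yield 'Not'
    yield 'Chicago?'

# join() consumes the generator without building a list first.
sentence = ' '.join(sample())

# Items that are not strings must be converted before joining.
data = ['ACME', 50, 91.1]
csv_row = ','.join(str(d) for d in data)

# The same result with +=; join() avoids the intermediate strings
# that repeated concatenation can create.
slow = ''
for part in sample():
    slow += part + ' '
slow = slow.rstrip()

print(sentence)
print(csv_row)
```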

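Interpolating variables in strings

Python does not substitute variable values into ordinary string literals automatically, but the format() method of strings handles this. Since Python 3.6, f-strings provide the same substitution directly in a literal.
If the values to substitute are held in variables in the current scope, you can combine format_map() with vars().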



In [58]:

    
s = '{name} has {n} messages.'
s.format(name='Guido', n=37)









    Out[58]:





'Guido has 37 messages.'



In [59]:

    
name = 'Guido'
n = 37
s.format_map(vars())









    Out[59]:





'Guido has 37 messages.'
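format() and format_map() raise KeyError when a placeholder has no matching value. One way to tolerate missing values is a dict subclass with a __missing__() method. The class name SafeSub is an assumption made for this sketch, not a standard-library type.

```python
class SafeSub(dict):
    """dict that leaves unknown placeholders in place (hypothetical helper)."""
    def __missing__(self, key):
        return '{' + key + '}'

s = '{name} has {n} messages.'

# Only name is supplied; the {n} placeholder survives unchanged.
partial = s.format_map(SafeSub(name='Guido'))

# With an f-string (Python 3.6+), names are looked up where the literal appears.
name = 'Guido'
n = 37
direct = f'{name} has {n} messages.'

print(partial)
print(direct)
```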

Reformatting text to a fixed number of columns

Use the textwrap module to reformat text for output.



In [62]:

    
# As in a Linux shell, a trailing \ continues the line in Python; no newline is added to the string
s = "Look into my eyes, look into my eyes, the eyes, the eyes, \
the eyes, not around the eyes, don't look around the eyes, \
look into my eyes, you're under."



In [63]:

    
s









    Out[63]:





"Look into my eyes, look into my eyes, the eyes, the eyes, the eyes, not around the eyes, don't look around the eyes, look into my eyes, you're under."

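The cells above define the long string but stop before calling textwrap. A minimal sketch of the wrapping step follows; the width of 40 columns and the four-space indent are arbitrary choices for illustration.

```python
import textwrap

s = ("Look into my eyes, look into my eyes, the eyes, the eyes, "
     "the eyes, not around the eyes, don't look around the eyes, "
     "look into my eyes, you're under.")

# fill() returns one string with newlines, each line at most 40 columns.
narrow = textwrap.fill(s, 40)

# wrap() returns the same lines as a list instead.
lines = textwrap.wrap(s, 40)

# initial_indent is a prefix for the first line only.
indented = textwrap.fill(s, 40, initial_indent='    ')

print(narrow)
print()
print(indented)
```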
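Performing text operations on byte strings

Byte strings support most of the same built-in operations as text strings.
The same operations also work on byte arrays.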



In [65]:

    
data = b'Hello World'
data
data[0:5]
data.startswith(b'Hello')
data.split()
data.replace(b'Hello', b'Hello Cruel')









    Out[65]:





b'Hello World'






    Out[65]:





b'Hello'






    Out[65]:





True






    Out[65]:





[b'Hello', b'World']






    Out[65]:





b'Hello Cruel World'



In [67]:

    
data = bytearray(b'Hello World')
data
data[0:5]
data.startswith(b'Hello')
data.split()
data.replace(b'Hello', b'Hello Cruel')









    Out[67]:





bytearray(b'Hello World')






    Out[67]:





bytearray(b'Hello')






    Out[67]:





True






    Out[67]:





[bytearray(b'Hello'), bytearray(b'World')]






    Out[67]:





bytearray(b'Hello Cruel World')
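"Most" operations is not "all". The sketch below points out a few places where byte strings behave differently from text strings: indexing, regular-expression patterns, and string formatting. The variable names are illustrative.

```python
import re

text = 'Hello World'
data = b'Hello World'

# Indexing a text string gives a one-character string,
# while indexing a byte string gives an integer.
first_char = text[0]
first_byte = data[0]

# Regular expressions on bytes need a bytes pattern as well.
fields = re.split(rb'[:,]', b'FOO:BAR,SPAM')

# Byte strings have no format() method; decode to text before formatting.
decoded = data.decode('ascii')
padded = '{:>15s}'.format(decoded)

print(first_char, first_byte)
print(fields)
print(repr(padded))
```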


