The Burrows–Wheeler transform is an algorithm used to prepare data for use with data compression techniques such as bzip2.
When a character string is transformed by the BWT, the transformation permutes the order of the characters. If the original string had several substrings that occurred often, then the transformed string will have several places where a single character is repeated multiple times in a row.
function BWT (string s)
create a table, rows are all possible rotations of s
sort rows alphabetically
return (last column of the table)
function inverseBWT (string s)
create empty table
repeat length(s) times
// first insert creates first column
insert s as a column of table before first column of the table
sort rows of the table alphabetically
return (row that ends with the 'EOF' character)
In [48]:
END_CHAR = '$'
def rotate(s, n):
return s[n:] + s[:n]
In [43]:
def bwt(text):
return ''.join(s[-1] for s in sorted([rotate(text, i) for i in range(len(text))]))
In [71]:
def ibwt(text):
table = [''] * len(text)
for _ in range(len(text)):
table = sorted([c + s for c, s in zip(text, table)])
return next(filter(lambda s: s.endswith(END_CHAR), table)).strip(END_CHAR)
In [69]:
bwt('alex' + END_CHAR)
Out[69]:
In [72]:
ibwt('x$lae')
Out[72]:
In [75]:
bwt('abababababaababababaaabababbababbabcabababbcaccacabcabcbabab' + END_CHAR)
Out[75]:
In [76]:
ibwt('bbbabbbbbbbbab$acbbbbbbbccccaaaaaacaaaaaabaaaabaaaaabababcbba')
Out[76]: