Consider a string of finite length $m$ Let it be $T$. Finding whether a string $P$ of length $n$ exsists in $T$ is known as String Matching, Following is some of the comparision based String Matching Algorithms.
Before looking at the analysis part, we shall examine the Language in built methods to sorting
in
operator and str.index()
We have already seen the in
operator in several contexts. Let's see the working of in
operator again
In [1]:
x = 'this is some random text used for illustrative purposes'
In [2]:
x
Out[2]:
In [3]:
'this' in x
Out[3]:
In [4]:
'not' in x
Out[4]:
In [5]:
x.index('is')
Out[5]:
In [6]:
x.index('not')
In [10]:
from openanalysis.string_matching import StringMatchingAlgorithm,StringMatchingAnalyzer
%matplotlib inline
%config InlineBackend.figure_formats={"svg", "pdf"}
StringMatchingAlgorithm
is the base class providing the standards to implement sorting algorithms, SearchVisualizer
visualizes and analyses the algorithm
StringMatchingAlgorithm
classAny String Matching Algorithm, which has to be implemented, has to be derived from this class. Now we shall see data members and member functions of this class.
name
- Name of the Searching Algorithmcount
- Holds the number of basic operations performed__init__(self, name):
- Initializes algorithm with a name
match(self, text, pattern)
_ The base String Matching function. Sets count
to 0. Now we shall implement the class Horspool
In [11]:
class Horspool(StringMatchingAlgorithm): # Must derive from StringMatchingAlgorithm
def __init__(self):
StringMatchingAlgorithm.__init__(self, "Hosrpool String Matching")
self.shift_table = {}
self.pattern = ''
def generate_shift_table(self, pattern): # class is needed to include helper methods
self.pattern = pattern
for i in range(0, len(pattern) - 1):
self.shift_table[pattern[i]] = len(pattern) -i - 1
def match(self, text: str, pattern: str):
StringMatchingAlgorithm.match(self, text, pattern)
self.generate_shift_table(pattern)
i = len(self.pattern) - 1
while i < len(text):
j = 0
while j < len(self.pattern) and text[i-j] == self.pattern[len(self.pattern)-1-j]:
j += 1
self.count += j # Increment count here
if j == len(self.pattern):
return i-len(self.pattern)+1
if text[i] in self.shift_table:
i += self.shift_table[text[i]]
else:
i += len(self.pattern)
return -1
StringMatchingAnalyzer
classThis class provides the visualization and analysis methods. Let's see its methods in detail
__init__(self, matching):
Initializes visualizer with a String Matching Algorithm. searcher
is a class, which is derived from StringMatchingAlgorithm
analyze(self,progress = True):
progress
indicates whether Progress Bar has to be shown or not
In [13]:
StringMatchingAnalyzer(Horspool).analyze(progress=False)
Note $\mathcal{O}(n)$ performance of algorithm
You can see more examples at Github