My first jupyter notebook, demonstrating some code from class and the second chapter.
We explore somewhat how to write efficient algorithms. How do we define efficient though? Typically order of magnitude, or O(n**2) is least efficient. This is because we are squaring n with each iteration or function of O. On the scale of algorithm efficieny O(n**2) is closer to being an exponential algorithm. Which is bad, becase if n is greater than a certain number modern computers would not be able to calculate for n, we want algorithms as close to linear as possible. In the case that we are working with large amounts of data or doing huge mathematical expressions the problem becomes exponentially harder to solve, which takes more time.</span>
In [22]:
# Importing some things
import matplotlib.patches as patches
from matplotlib import pyplot
from time import time
import math
from executiontime import execution_time
from issubstring import *
from random_string import *
In [42]:
title = 'Plotting Big O(): '
label_y = ''
label_x = ''
#"Lower bound"
lb = 1
#"Upper bound"
ub = 30000
vals = range(lb, ub)
ub = ub * 100
# Exponential
exp = [x ** 2 for x in vals]
# Linear
fn = [10 * x + 0.1 * x * math.log(x) for x in vals]
# Logorithmic
gn = [0.01 * x ** 2 for x in vals]
expplot = pyplot.plot(exp, label='Exponential')
linplot = pyplot.plot(fn, label='Linear')
logplot = pyplot.plot(gn, label='Logorithmic')
pyplot.legend(loc='best')
pyplot.ylabel(label_y)
pyplot.xlabel(label_x)
pyplot.title(title)
pyplot.grid(False)
pyplot.show()
In this case the Linear function which is green on the plot above, is the fastest algorithm. Another way to think about the problem of writing efficient algorithms: an ideal algorithm would be written such that it both takes the least amount of steps and that the steps taken also take the least amount of steps themselves. Since python is a very high level language as n becomes larger so too do the amount of steps exponentially, making processing times grow exponentially as well. So an ideal algorithm scales in order of magnitude as linearly as possible.
This one was tricky, here's how Dylan or this guy wrote the algorithm.
But I didn't want to do something so simple with so many lines of code. I wrote my own. This is my attempt at the Boyer-Moore string search algorithm:
In [5]:
string = 'bazbarthequickbrownfoxjumpsoverthelazydogoffthequickbrownfoxjumpsoverthelazydogthequickbrownoofjumpsoverthelazyfoo'
substring = 'foo'
print(isSubStr().is_substring(substring, string))
print(isSubStr().is_substring('bleh', 'blah'))
Say we have the string "ANPANMAN" and want to find the substring "PAN", this is the Boyer-Moore algorithm in a nutshell:
In [6]:
string = 'ANPANMAN'
print(string)
fstring = string[0:3]
print(fstring + '_____')
fstring = string[1:4]
print('.' + fstring + '____')
fstring = string[2:5]
print('..' + fstring + '___')
fstring = string[3:6]
print('...' + fstring + '__')
fstring = string[4:7]
print('....' + fstring + '_')
fstring = string[5:8]
print('.....' + fstring)
fstring = string[6:9]
print('......' + fstring)
Demonstrating how I want my algorithm to behave:
In [7]:
print(isSubStr().is_substring('foo', 'FOO'))
print(isSubStr().is_substring('foo', 'oof'))
print(isSubStr().is_substring('foo', 'foofoofoo'))
Of course this is just to demonstrate learning how to implement an algorithm from scratch (and testing for O()), but there's the default function:
In [8]:
print(isSubStr().default_is_substring('foo', 'FOO'))
print(isSubStr().default_is_substring('foo', 'oof'))
print(isSubStr().default_is_substring('foo', 'foofoofoo'))
""" issubstring.py | Tue, Jan 31, 2017 | Roman S. Collins Implementation of Boyer-Moore string search algorithm in Python. Source: https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm https://duckduckgo.com/?q=python+substring+function&t=ffab&ia=q://duckduckgo.com/?q=python+substring+function&t=ffab&ia=qa """ class isSubStr: string = 'checkoffcheckoofcheckfool' substring = 'foo' def is_substring(self, substring, string): lcolon = 0; rcolon = len(substring); found = False for a_iteration in range(len(string) + 1): astring = string[lcolon:rcolon] lcolon += 1; rcolon += 1; check_char = 0 for b_iteration in range(len(astring)): if astring[b_iteration] == substring[b_iteration]: check_char += 1 if check_char == len(substring): found = True if found == True: return found #if astring == substring: # Still comparing whole strings # found = True # return found # pass if rcolon >= (len(string) + 1): found = False return found def default_is_substring(self, a, b): return a in b def main(): """ Doctest: >>> string = 'ANPANMAN' >>> substring = 'PAN' >>> print(isSubStr().is_substring(substring, string)) True >>> print(isSubStr().default_is_substring(substring, string)) True """ string = 'ANPANMAN' substring = 'PAN' while isSubStr().is_substring(substring, string): break print(isSubStr().is_substring(substring, string)) if __name__=='__main__': main()
This was shown in the book as the monkey theorem problem. (And if I recall correctly in Jim showed an algorithm in class...)
This is mine. The function takes n equal to the length of the string you would like. Possible characters are simply every letter in the English alphabet plus space:
In [9]:
print(randStr().rand_string(10))
""" random_string.py | Wed, Feb 01, 2017 | Roman S. Collins Demonstrating returning random strings in Python. """ import random class randStr: def rand_string(self, n): abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' '] abc_copy = list(abc[x] for x in range(len(abc))) return ''.join(random.sample(abc_copy, n)) def main(): """ Again only sanity checks, because output of the function changes. Doctest: >>> len(randStr().rand_string(10)) 10 >>> isinstance(randStr().rand_string(10), str) True """ # Fill main with something other than pass print(randStr().rand_string(10)) if __name__=='__main__': main()
Let's find out how my algorithm scales compared to Python's builtin version, and some others:
In [14]:
from executiontime import execution_time
from issubstring import *
from matplotlib import pyplot
def main():
""" n is equal to the sample size, each sample is 10,000 iterations of each function
to attempt to get a more accurate average execution time from system time.
"""
n = 30000
substring = 'PAN'
fullstring = 'ANPANMAN'
is_substring_times = []
for i in range(1, n + 1):
is_substring_time = execution_time(isSubStr().is_substring(substring, fullstring), 10000)
is_substring_times.append(is_substring_time[1])
#print(is_substring_times)
pyplot.plot(is_substring_times, label='is_substring()')
default_is_substring_times = []
for i in range(1, n + 1):
default_is_substring_time = execution_time(isSubStr().default_is_substring(substring, fullstring), 10000)
default_is_substring_times.append(default_is_substring_time[1])
#print(default_is_substring_times)
pyplot.plot(default_is_substring_times, label='default_is_substring()')
dylan_is_substring_times = []
for i in range(1, n + 1):
dylan_is_substring_time = execution_time(isSubStr().dylan_is_substring(substring, fullstring), 10000)
dylan_is_substring_times.append(dylan_is_substring_time[1])
#print(is_substring_times)
pyplot.plot(dylan_is_substring_times, label='dylan_is_substring()')
jim_is_substring_times = []
for i in range(1, n + 1):
jim_is_substring_time = execution_time(isSubStr().jim_is_substring(substring, fullstring), 10000)
jim_is_substring_times.append(jim_is_substring_time[1])
#print(is_substring_times)
pyplot.plot(jim_is_substring_times, label='jim_is_substring()')
title = 'Plotting Substring Search Algorithms: '
label_y = ''
label_x = ''
pyplot.legend(loc='best')
pyplot.ylabel(label_y)
pyplot.xlabel(label_x)
pyplot.title(title)
pyplot.grid(False)
pyplot.show()
if __name__ == '__main__':
main()
In [ ]:
def main():
n = 100
substring = 'fjwefiojxcvnm'
fullstring = """asdfkljasdflkalsdkfjlasdflkjasdflkjasdflasdflkjasdflkjasdflkjasdfljkasdflkjasdflkjasdflkjas
dflkjfghdsfkljljweoidfjwefiojxcvnm"""
is_substring_times = []
for i in range(1, n + 1):
is_substring_time = execution_time(isSubStr().is_substring(substring, fullstring), 10000)
is_substring_times.append(is_substring_time[1])
#print(is_substring_times)
pyplot.plot(is_substring_times, label='is_substring()')
default_is_substring_times = []
for i in range(1, n + 1):
default_is_substring_time = execution_time(isSubStr().default_is_substring(substring, fullstring), 10000)
default_is_substring_times.append(default_is_substring_time[1])
#print(default_is_substring_times)
pyplot.plot(default_is_substring_times, label='default_is_substring()')
dylan_is_substring_times = []
for i in range(1, n + 1):
dylan_is_substring_time = execution_time(isSubStr().dylan_is_substring(substring, fullstring), 10000)
dylan_is_substring_times.append(dylan_is_substring_time[1])
#print(is_substring_times)
pyplot.plot(dylan_is_substring_times, label='dylan_is_substring()')
jim_is_substring_times = []
for i in range(1, n + 1):
jim_is_substring_time = execution_time(isSubStr().jim_is_substring(substring, fullstring), 10000)
jim_is_substring_times.append(jim_is_substring_time[1])
#print(is_substring_times)
pyplot.plot(jim_is_substring_times, label='jim_is_substring()')
title = 'Plotting Substring Search Algorithms: '
label_y = ''
label_x = ''
pyplot.legend(loc='best')
pyplot.ylabel(label_y)
pyplot.xlabel(label_x)
pyplot.title(title)
pyplot.grid(False)
pyplot.show()
if __name__ == '__main__':
main()