Group-ID: 15
Group members: Emanuele Rossi, Nikolay Yotov, Tencho Tenev
The aim of this coursework is to enhance your algorithmic skills by mastering the divide and conquer and dynamic programming strategies. You are asked to show that you can:
This notebook is the coursework. It contains cells with function definitions that you will need to complete. You will submit this notebook as your coursework.
The comparisons of different algorithms involve textual descriptions and graphical plots. For graphing you will be using matplotlib to generate plots. This tutorial will be useful to go through to get you up to speed. For the textual descriptions you may wish to use LaTeX inline like $\mathcal{O}(n\log{}n)$. Double click this cell to reveal the required markup - and see here for useful guidance on producing common symbols used in asymptotic run time analysis.
Here we define a collection of functions that will be useful for the rest of the coursework. You'll need to run this cell to get started.
In [ ]:
# so our plots get drawn in the notebook
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
from pathlib import Path
from sys import maxsize
from time import clock
from urllib.request import urlretrieve
# a timer - runs the provided function and reports the
# run time in ms
def time_f(f):
before = clock()
f()
after = clock()
return after - before
# we can get a word list from here - we download it once
# to 'wordlist.txt' and then reuse this file.
url = 'http://www.doc.ic.ac.uk/~bglocker/teaching/wordlist.txt'
if not Path('wordlist.txt').exists():
print("downloading word list...")
urlretrieve(url, 'wordlist.txt')
print('acquired word list.')
with open('wordlist.txt') as f:
# here we use a *set* comprehension - just
# like we've done with lists in the past but
# the result is a set so each element is
# guaranteed to be unique.
# https://docs.python.org/3/tutorial/datastructures.html#sets
# note that you can loop over a set just like you would a list
wordlist = {l.strip() for l in f.readlines()}
print("loaded set of words into 'wordlist' variable")
Complete the below definition for levenshtein_distance
. Do not change the name of the function or it's arguments.
Hints:
np
). Numpy is the crown jewel of the scientific Python community - it provides a multidimensional array (np.array()
) which can be very convenient to solve problems involving matrices.
In [ ]:
def levenshtein_distance(x, y):
# complete function without changing signature
n = len(x)
m = len(y)
d = np.zeros((n + 1, m + 1))
for i in range (n + 1):
d[i, 0] = i
for j in range (m + 1):
d[0, j] = j
for i in range (1, m + 1):
for j in range (1, n + 1):
c = 0 if x[j - 1] == y[i - 1] else 1
d[j, i] = min(d[j - 1, i] + 1, min(d[j, i - 1] + 1, d[j - 1, i - 1] + c))
return d[n, m]
Use this test to confirm your implementation is correct.
In [ ]:
print(levenshtein_distance('sunny', 'snowy') == 3)
print(levenshtein_distance('algorithm', 'altruistic') == 6)
print(levenshtein_distance('imperial', 'empirical') == 3)
print(levenshtein_distance('weird', 'wired') == 2)
Use your levenshtein_distance
function to find the closest_match
between a candidate
word and an iterable of words
. Note that if multiple words from words
share the minimal edit distance to the candidate
, you should return the word which would come first in a dictionary.
As a concrete example, zark
has an edit distance of 1 with both ark
and bark
, but you would return ark
as it comes lexicographically before bark
.
Your function should return a tuple of two values - first the closest word match, and secondly the edit distance between this word and the candidate.
closest, distance = closest_match('zark', ['ark', 'bark', ...])
assert closest == 'ark'
assert distance == 1
In [ ]:
def closest_match(candidate, words):
# complete function without changing signature
n = len(words)
max_edit_dist = len(candidate)+1
closest_word = ""
for word in words:
temp = levenshtein_distance(candidate, word)
if temp < max_edit_dist :
max_edit_dist = temp
closest_word = word
return closest_word, max_edit_dist
Run the below cell to test your implementation
In [ ]:
# A one liner that queries closest_match and then prints the result
print_closest = lambda w, wl: print('{}: {} ({})'.format(w, *closest_match(w, wl)))
print_closest('zilophone', wordlist)
print_closest('inconsidrable', wordlist)
print_closest('bisamfiguatd', wordlist)
Discuss in a few lines the running time of closest_match
. Can you propose any ideas for making this faster? (Only discuss those in words, no need to do any implementations, unless you want to.)
If we have a string $s_1$ of length $n$ and a string $s_2$ of length $m$, then we know that the complexity of the levenshtein_distance algorithm is $\mathcal{O}(nm)$.
Since levenshtein_distance() is called in a loop of length $l$, where $l$ is the size of the list $words$, then a call to closestmatch(candidate, words) will have a complexity of $\mathcal{O}(nlm{max})$, where $n$ is the size of candidate, and $m_{max}$ is the size of the longest word in the list.
Coin change is the problem of finding the least number of coins for a given amount of money.
For example, the UK coin set contains the following coins: 1p, 2p, 5p, 10p, 20p, 50p, £1, £2, and £5 (very uncommon). For £2.82, the optimal change is £2, 50p, 20p, 10p, 2p.
i) Implement the coin_change
function and answer the following questions by running an experimental analysis.
ii) How many coins are needed on average to represent any amounts between £0.01 and £5.00 with the UK coin set?
iii) How many more coins are needed on average to represent any amounts between £0.01 and £5.00 if we were to remove both the 10p and 20p coins from the UK coin set?
iv) If you had to decide whether to keep the 10p or the 20p coin in the UK coin set, which one would you choose?
Let $n$ be the amount you want to find the change for, and $c$ be the size of the coin set.
In [ ]:
def coin_change_naive(n,coins):
change = [0 for x in range(n + 1)]
def coin_change_helper(n, coins):
# n is 0, basecase. We need no coins
if n == 0:
return 0, change
sol = -1
# Try every possible coin and recursively compute the best solution
for c in reversed(coins):
if n - c >= 0:
sol_t, _ = coin_change_helper(n - c, coins)
sol_t = sol_t + 1
if sol_t < sol or sol == -1:
sol = sol_t
change[n] = c
return sol, change
return coin_change_helper(n, coins)
In [ ]:
def coin_change_top_down(n, coins):
change = [0 for x in range(n + 1)]
sol = [0 for x in range(n + 1)]
def coin_change_helper(n, coins):
# n is 0, basecase. We need no coins
if n == 0:
return 0, change
# We have already computed the solution, just return it
if sol[n] != 0:
return sol[n], change
# Try every possible coin and recursively compute the best solution
for c in reversed(coins):
if n - c >= 0:
sol_t, _ = coin_change_helper(n - c, coins)
sol_t = sol_t + 1
if sol_t < sol[n] or sol[n] == 0:
sol[n] = sol_t
change[n] = c
return sol[n], change
return coin_change_helper(n, coins)
In [ ]:
def coin_change_top_down_increasing_ord(n, coins):
change = [0 for x in range(n + 1)]
sol = [0 for x in range(n + 1)]
def coin_change_helper(n, coins):
# n is 0, basecase. We need no coins
if n == 0:
return 0, change
# We have already computed the solution, just return it
if sol[n] != 0:
return sol[n], change
# Try every possible coin and recursively compute the best solution
for c in coins:
if n - c >= 0:
sol_t, _ = coin_change_helper(n - c, coins)
sol_t = sol_t + 1
if sol_t < sol[n] or sol[n] == 0:
sol[n] = sol_t
change[n] = c
return sol[n], change
return coin_change_helper(n, coins)
In [ ]:
def coin_change_bottom_up(n,coins):
dp = (n + max(coins))*[n]
dp_path = (n + max(coins))*[0]
for coin in coins:
if coin <= n:
dp[coin] = 1
dp_path[coin] = coin
for i in range(n):
for coin in coins:
if dp[i] + 1 < dp[i + coin]:
dp[i + coin] = dp[i] + 1
dp_path[i + coin] = coin
return (dp[n], dp_path)
It is important to notice that this greedy solution only works for particular sets of coins (including the given one), but it is not a generally correct solution. We include it here only for analysis against the dynamic programming approach and to show a more efficient way to solve the problem by making use of additional information in advance (the coin set in this case).
In [ ]:
def coin_change_greedy(n, coins):
sol = 0
for c in reversed(coins):
if n >= c:
mod = n % c
times = (n - mod) / c
n = mod
sol = sol + times
return sol
General implementation (can choose which specific one to call)
In [ ]:
def coin_change(n, coins):
return coin_change_top_down(n, coins)
In [ ]:
def print_change(n,coins):
counts, change = coin_change_naive(n,coins)
while n > 0:
print(change[n])
n = n - change[n]
UK_coin_set = [1,2,5,10,20,50,100,200,500]
print_change(25,UK_coin_set)
In [ ]:
test_coin_set = [1,2,5,10,20,50,100,200,500]
test_data = [i for i in range(22)]
for t in test_data:
assert coin_change_naive(t, test_coin_set)[0] == coin_change(t, test_coin_set)[0]
assert coin_change_naive(t, test_coin_set)[0] == coin_change_bottom_up(t, test_coin_set)[0]
assert coin_change_bottom_up(t, test_coin_set)[0] == coin_change_greedy(t, test_coin_set)
In [ ]:
test_coin_set = [1,2,5,10,20,50,100,200,500]
test_data = [i for i in range(1200)]
for t in test_data:
assert coin_change_bottom_up(t, test_coin_set)[0] == coin_change(t, test_coin_set)[0]
assert coin_change_bottom_up(t, test_coin_set)[0] == coin_change_greedy(t, test_coin_set)
In [ ]:
data = range(1,23)
In [ ]:
dp_top_down_res = []
naive_res = []
for i in data:
dp_top_down_res.append(time_f(lambda: coin_change_top_down(i, UK_coin_set)[0]))
naive_res.append(time_f(lambda: coin_change_naive(i, UK_coin_set)[0]))
In [ ]:
%matplotlib inline
from matplotlib import pyplot as plt
top_down = plt.scatter(data, dp_top_down_res, c='red')
naive = plt.scatter(data, naive_res, c='blue')
plt.xlabel('n')
plt.ylabel('time (/s)')
plt.xlim(0)
plt.ylim(0)
plt.legend((top_down, naive),
('Top Down', 'Naive'))
From the chart above we can see the exponential nature of the naive solution with respect to the top down dp approach.
In [ ]:
data = range(1,20000,100)
In [ ]:
dp_top_down_res = []
dp_bottom_up_res = []
greedy_res = []
for i in data:
dp_top_down_res.append(time_f(lambda: coin_change_top_down(i, UK_coin_set)))
dp_bottom_up_res.append(time_f(lambda: coin_change_bottom_up(i, UK_coin_set)))
greedy_res.append(time_f(lambda: coin_change_greedy(i, UK_coin_set)))
In [ ]:
%matplotlib inline
from matplotlib import pyplot as plt
top_down = plt.scatter(data, dp_top_down_res, c='red')
bottom_up = plt.scatter(data, dp_bottom_up_res, c='blue')
greedy = plt.scatter(data, greedy_res, c='green')
plt.xlabel('n')
plt.ylabel('time (/s)')
plt.xlim(0)
plt.ylim(0)
plt.legend((top_down, bottom_up, greedy),
('Top Down', 'Bottom Up', 'Greedy'))
From the chart above we can see how both the top down and the bottom up dp approaches are linear (given a fixed $c$, as it is in our case). Moreover, we can see that the greedy approach is constant in time, and therefore much more efficient.
However, even though the two dp approaches have the same complexity, from the chart it is easy to notice how the bottom up approach is faster in practice. This is not a contradiction, since it simply means that they have different coefficients in their linearity.
In [ ]:
data = range(1,4000,1)
In [ ]:
import sys
sys.setrecursionlimit(23000)
dp_top_down_res = []
dp_top_down_incr_res = []
for i in data:
dp_top_down_res.append(time_f(lambda: coin_change_top_down(i, UK_coin_set)[0]))
dp_top_down_incr_res.append(time_f(lambda: coin_change_top_down_increasing_ord(i, UK_coin_set)[0]))
In [ ]:
%matplotlib inline
from matplotlib import pyplot as plt
top_down = plt.scatter(data, dp_top_down_res, c='red')
top_down_incr = plt.scatter(data, dp_top_down_incr_res, c='blue')
plt.xlabel('n')
plt.ylabel('time (/s)')
plt.xlim(0)
plt.ylim(0)
plt.legend((top_down, top_down_incr),
('Top Down', 'Top Down Increasing Order'))
Answer questions (ii)-(iv) here.
In [ ]:
# Avg number of coins from 0.01 to 5.00
def get_avg(coin_set, start_amount=1, end_amount=500):
vals = []
for val in range (start_amount, end_amount + 1):
c, _ = coin_change(val, coin_set)
vals.append(c)
return sum(vals) / len(vals)
print('Average number of coins needed to represent amounts between 0.01 and 5.00 is', get_avg(UK_coin_set))
# Avg number of coins from 0.01 to 5.00 if coins 10 and 20 are removed from coin set
reduced_coin_set = [coin for coin in UK_coin_set if coin not in [10,20]]
print('''Average number of coins needed to represent amounts between 0.01 and 5.00 \
without using coins of 10 and 20 is''', get_avg(reduced_coin_set))
coin_set_wo_20_avg = get_avg(reduced_coin_set + [10])
coin_set_wo_10_avg = get_avg(reduced_coin_set + [20])
if coin_set_wo_20_avg <= coin_set_wo_10_avg:
if coin_set_wo_20_avg == coin_set_wo_10_avg:
print("It doesn't matter which one of the two coins is removed")
else:
print("It's better to keep the coin 10")
else:
print("It's better to keep the coin 20")
In [ ]: