Typing quickly and accurately on a smartphone screen is hard! One invention to make it easier is gesture typing, in which your finger traces a path consisting of letter-to-letter segments. When you lift your finger the path (and the word) is complete. Below we see the path for the word "hello." Note that the path is imprecise; it didn't quite hit the "L", but the word was recognized anyway, because "Hello" is a known word, whereas "Hekko", "Hwerklo", etc., are not.
My colleague Nicolas Schank examined (and answered) the question of what word has the longest path length. I mentioned this to Shumin Zhai, the pioneer of gesture typing, and between the three of us we expanded the list of questions:
Let's look at each of these questions, but first, let's get a rough idea of the concepts we will need to model.
We will need to talk about the following concepts:
Note: Before we get started writing any real code, I've taken all the imports I will need throughout this notebook and gathered them together here:
In [1]:
%pylab --no-import-all inline
from __future__ import division
from collections import Counter
import matplotlib.pyplot as plt
import urllib
import itertools
import random
import re
The representation for a keyboard needs to describe the location of each of the letters. Using the principle of "Do the simplest thing that could possibly work," I represent a keyboard as a dict
of {letter: point}
pairs, where there will be 26 letters, A-Z,
and each point will mark the x-y coordinates of the center of the corresponding key. In a standard keyboard the letters are not all on a strict rectangular grid; the A key is half way between the Q and W in the horizontal direction. I would like to have a programmer-friendly way of defining keyboard layouts. For example, a programmer should be able to write:
Keyboard(('Q W E R T Y U I O P',
' A S D F G H J K L ',
' Z X C V B N M '))
and this would be equivalent to the dict
:
{'Q': Point(0, 0), 'W': Point(1, 0), ...
'A': Point(0.5, 1), 'S': Point(1.5, 1), ...
'Z': Point(1.5, 2), 'X': Point(2.5, 2), ...}
Note that one key width is two characters in the input to Keyboard
. Here is the implementation:
In [2]:
def Keyboard(rows):
"A keyboard is a {letter: location} map, e.g. {'Q':Point(0, 0), 'A': Point(0.5, 1)}."
return {ch: Point(x/2, y)
for (y, row) in enumerate(rows)
for (x, ch) in enumerate(row)
if ch != ' '}
What about Point
? At first glance, Python does not appear to have a two-dimensional point as a builtin data type, but
on second thought, it does: complex
. A complex number is a point in the two-dimensional complex plane;
we can use that to model the two-dimensional (x, y) plane. Because complex numbers are built in, manipulating them will be efficient. A bonus is that the distance between points A
and B
is simply abs(A-B)
; easier than the usual formula involving squares and a square root. Thus, the simplest possible thing I could do to represent points is
Point = complex
That would work fine. However, I would like to refer to the x coordinate of point p
as p.x
rather than p.real
, and I would like points to display nicely, so I will do the second-simplest thing: make Point
be a subclass of complex
with x
and y
properties and a __repr__
method:
In [3]:
class Point(complex):
"A point in the (x, y) plane."
def __repr__(self): return 'Point({}, {})'.format(self.x, self.y)
x = property(lambda p: p.real)
y = property(lambda p: p.imag)
def distance(A, B):
"The distance between two points."
return abs(A - B)
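A quick illustration of Point and distance (the values are chosen just for this example): the x and y properties read off the coordinates, and the familiar 3-4-5 right triangle gives a distance of 5:
A, B = Point(0, 0), Point(3, 4)
A.x, B.y          # -> (0.0, 4.0)
distance(A, B)    # -> 5.0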
Design notes for Point: Alternative representations for points include an (x, y)
tuple, or a NumPy two-element array, or a namedtuple.
Design notes for Keyboard: Alternatives for Keyboard
include a subclass of dict
, or a class that contains a dict
.
I considered support for keyboards
in which one row is, say, offset 1/3 of a key rather than 1/2 key from the row above. In the end I decided to keep the function Keyboard
as simple as possible; you can always define a function, FancyKeyboard
, if you want more.
Now we can check that Keyboard
works:
In [4]:
qwerty = Keyboard(('Q W E R T Y U I O P',
' A S D F G H J K L ',
' Z X C V B N M '))
qwerty
Out[4]:
In [5]:
W, O, R, D = qwerty['W'], qwerty['O'], qwerty['R'], qwerty['D']
distance(W, O) + distance(O, R) + distance(R, D)
Out[5]:
Let's make a function to compute this:
In [6]:
def path_length(word, kbd=qwerty):
"The total path length for a word on this keyboard: the sum of the segment lengths."
return sum(distance(kbd[word[i]], kbd[word[i+1]])
for i in range(len(word)-1))
In [7]:
path_length('WORD')
Out[7]:
Let's check with a simpler example that we know the answer to:
In [8]:
path_length('TO')
Out[8]:
That makes it clearer—the O is four keys to the right of the T, on the same row, so the distance between them is 4.
Here's another one that you can verify on your own:
In [9]:
path_length('TYPEWRITER') == 1 + 4 + 7 + 1 + 2 + 4 + 3 + 2 + 1 == 25
Out[9]:
In [10]:
WORDS = set(urllib.urlopen('http://norvig.com/ngrams/TWL06.txt').read().split())
In [11]:
len(WORDS)
Out[11]:
That's a lot of words; which one has the longest path?
In [12]:
max(WORDS, key=path_length)
Out[12]:
And the longest ten paths? Including the lengths? We'll use a helper function, print_top
, which prints the top n items in a sequence according to some key function:
In [13]:
def print_top(n, sequence, key=None, fmt='{:.1f} {}'):
"Find the top n elements of sequence as ranked by key function, and print them."
for x in sorted(sequence, key=key, reverse=True)[:n]:
print(fmt.format(key(x), x))
print_top(10, WORDS, path_length)
In [14]:
def path_length_ratio(word, kbd=qwerty): return path_length(word, kbd) / len(word)
In [15]:
print_top(10, WORDS, path_length_ratio)
How did I find "TYPEWRITER," a long word whose letters are all on the top row? I could have looked it up, but instead I used the following code:
In [16]:
def on_top_row(word): return all(L in 'QWERTYUIOP' for L in word)
print_top(10, filter(on_top_row, WORDS), len)
What is the average segment length for a typical typing workload? To answer that, we need to know what a typical workload is.
We will read a file of "typical" text, and count up how many times each segment is used. We'll define a Workload
as a dict
of the form {segment: proportion, ...},
e.g. {'AB': 0.02}
, where each key is a two-letter string representing a segment, and each value is the proportion of time that segment appears in the workload. Since the distance from A
to B
on a keyboard is the same as the distance from B
to A
, we don't need both AB
and BA
as keys; we'll combine the counts for both of them under the alphabetically first one (in this case, AB
). So Workload
will first generate all two-character bigrams in the text, then filter out the non-alphabetic ones, then take the segment
of each pair, and count up (with Counter
) how many times each segment occurs, and finally normalize these counts so that they sum to 1.0, but retain their proportion to each other.
In [17]:
def Workload(text):
    """Create a Workload--a dict of the form {'AB': 0.02, ...}
    giving the proportion of time each segment occurs in text."""
    pairs = filter(str.isalpha, bigrams(text))
    return normalize(Counter(segment(A, B) for (A, B) in pairs))
def bigrams(sequence):
"Every sequence of two adjacent elements (overlapping) in sequence."
return (sequence[i:i+2] for i in range(len(sequence) - 1))
def segment(A, B):
"Canonical two-character segment: segment('A', 'B') == segment('B', 'A') == 'AB'."
return A+B if (A < B) else B+A
def normalize(dictionary):
"Normalize a {key: val} dict so that the sum of the vals is 1.0."
total = sum(dictionary.values())
for k in dictionary:
dictionary[k] /= total
return dictionary
Let's see what a workload looks like for a tiny text:
In [18]:
Workload('SHOT IS GOOD -- GOOOOOOOOOOOAL!')
Out[18]:
I happened to have a file of about a megabyte of random text, smaller.txt
; that should work fine as a typical workload:
In [19]:
TEXT = urllib.urlopen('http://norvig.com/ngrams/smaller.txt').read().upper()
WORKLOAD = Workload(TEXT)
Let's peek at the most common segments:
In [20]:
WORKLOAD.most_common(10)
Out[20]:
How many distinct segments are there?
In [21]:
len(WORKLOAD)
Out[21]:
Now we want to compute the average segment length, on a given keyboard, over all the segments that appear in a workload: multiply the length of each segment by the proportion of time it appears, and sum these products.
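Since normalize divides each count by the total, the proportions in WORKLOAD should sum to 1.0, up to floating-point rounding. A quick check:
sum(WORKLOAD.values())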
Close enough. Now we can compute the workload average:
In [22]:
def workload_average(kbd, workload=WORKLOAD):
"The average segment length over a workload of segments."
return sum(distance(kbd[A], kbd[B]) * workload[A+B]
for (A, B) in workload)
In [23]:
workload_average(qwerty)
Out[23]:
So, on average, your finger has to travel a little over 3 keys from one letter to the next over a typical workload.
We'll need a way of visualizing what a keyboard looks like. I could just print
letters, but I think it is more compelling to use IPython's plot
facility. In the function plot_kbd
we'll draw a square around the center point of each key, and annotate the square with the key letter. In show_kbd
, we'll call plot_kbd
and also print out the keyboard's name and workload average:
In [24]:
def plot_kbd(kbd, K=20):
"Plot the keyboard with square keys, K units on a side."
H = K / 2 ## (K is Key width/height; H is half K)
# Draw a square for each key
for L in kbd:
x, y = K * kbd[L].x, -K * kbd[L].y
plot_square(x, y, H, label=L)
# Show the plot
plt.axis('equal'); plt.axis('off'); plt.show()
def plot_square(x, y, H, label='', style='k-'):
"Plot a square with center (x, y), half-width H, and optional label."
plt.plot([x-H, x+H, x+H, x-H, x-H],
[y-H, y-H, y+H, y+H, y-H], style)
plt.annotate(label, (x-H/4, y-H/4)) # H/4 seems to place label well.
def show_kbd(kbd, name='keyboard'):
"Plot a keyboard and print statistics about it."
plot_kbd(kbd)
print('workload average = {:.1f} for {}'
.format(workload_average(kbd), name))
In [25]:
show_kbd(qwerty)
Now for a much harder question: can we find a different keyboard layout that has a smaller average segment length over the workload? First, let's note that there are two ways to modify a keyboard: we can swap which letters are assigned to the existing keys, or we can move the keys themselves to different physical locations.
Let's start by limiting ourselves to just swapping letters.
This is an optimization problem. There are many permutations of letters; too many to try them all. To be precise, there are 26! (26 factorial) permutations, which is about 10^26 (fun fact: 25 and 26 are the only integers n for which n! ≈ 10^n). If we can't try them all, we need some way to sample the configurations, trying to make progress towards a better one. Again, we'll try the simplest thing that could possibly work:
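To back up that count (a small illustrative check; the math module is imported here just for this snippet):
import math
math.factorial(25)   # 15511210043330985984000000, about 1.6 * 10**25
math.factorial(26)   # 403291461126605635584000000, about 4 * 10**26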
In [26]:
def improve(kbd, swaps=1000, scorer=workload_average):
"Minimize scorer(kbd) by swapping keys and keeping improvements."
score = scorer(kbd)
letters = list(kbd)
for _ in range(swaps):
A, B = random.sample(letters, 2) # Step 1: pick two keys
swap(kbd, A, B) # Step 2: swap them
score2 = scorer(kbd)
if score2 < score: # Step 3: If better, keep them
score = score2 # (and record the new best total)
else:
swap(kbd, B, A) # Step 4: swap back if not better
return kbd
def swap(kbd, A, B): kbd[A], kbd[B] = kbd[B], kbd[A]
Note 1: This strategy is called hillclimbing, drawing on the metaphor of reaching a high peak by repeatedly taking a step, keeping it if it goes uphill, and stepping back if it does not. This technique often finds a local maximum—a solution that is better than all its neighbors, but not as good as another solution that is many steps away.
Note 2: I make scorer
be a parameter, in case we later decide we want to minimize something other than workload_average
(such as confusingness).
Note 3: The procedure improve
modifies its argument. If you called improve(qwerty)
, then qwerty
would be modified. We
probably don't want that. So, in analogy with the built-in function sorted
, I will define the pure function improved
to return a new keyboard, without modifying its argument:
In [27]:
def improved(kbd, swaps=1000, scorer=workload_average):
"Minimize scorer(kbd) by swapping keys and keeping improvements; don't modify kbd."
return improve(kbd.copy(), swaps, scorer)
Let's see how well we can do:
In [28]:
show_kbd(improved(qwerty, 1000))
That's a pretty good improvement! We decreased the workload average by about a third. (If you are reading this in an active IPython notebook, you can re-run the cell above and see a different result each time.)
Let's get a better feeling for what improved does. We will keep track of the workload average after each swap, and plot that as a line. (We know this line should be monotonically decreasing, starting at 3.23.) We will then plot a dozen instances of this line (different each time because the swaps are random).
I'll copy-and-paste improve
(regrettably violating the don't repeat yourself principle), and make improve_scores
, adding code that yields the average workload after each swap.
In [29]:
def improve_scores(kbd, swaps=1000, scorer=workload_average):
    "A version of 'improve' that yields the workload average after each swap."
    score = scorer(kbd)
    letters = list(kbd)
    for _ in range(swaps):
        A, B = random.sample(letters, 2)  # Step 1: pick two keys
        swap(kbd, A, B)                   # Step 2: swap them
        score2 = scorer(kbd)
        if score2 < score:                # Step 3: If better, keep them
            score = score2                #         (and record the new best score)
        else:
            swap(kbd, B, A)               # Step 4: swap back if not better
        yield score                       # <<< NEW: report the current score after each swap
def workload_plot(kbd, swaps):
plt.ylabel('Workload average segment length')
plt.xlabel('Number of swaps');
plt.plot(list(improve_scores(kbd.copy(), swaps)))
workload_plot(qwerty, 3000)
That's interesting; it looks like most of the progress is in the first few hundred swaps. But this is just one random run of the algorithm. Let's compare ten runs:
In [30]:
for run in range(10):
workload_plot(qwerty, 3000)
This plot is even more interesting. (Note: on most browsers you can drag the lower-right corner of the plot to make it bigger.) I note the following:
This last point suggests a strategy: repeat the hillclimbing search multiple times, and take the best result. Let's investigate that strategy.
Suppose we have a certain budget of time that we are willing to spend to try to find a keyboard
with a low workload average. Suppose the budget is, say, 3,000 swaps and scorings. How should we spend this budget? We could have a single run of improved(qwerty, 3000)
. Or we could repeat improved(qwerty, 1000)
three times and take the best of the three results. Or repeat improved(qwerty, 500)
six times. From the plot above, it is not clear which of these approaches would be better. (From the plot it is clear that there is rapid progress in the first 500 swaps, so I wouldn't want each run to be much less than that.)
The function repeated_improved
allows you to decide how to split up your budget between long runs and repeated runs, and always takes the best of the repeated runs:
In [31]:
def repeated_improved(kbd, repeats=20, swaps=1000, scorer=workload_average):
"Try improved(kbd, swaps) multiple times and take the best result."
return min([improved(kbd, swaps, scorer) for run in range(repeats)],
key=scorer)
Let's see what this can do:
In [32]:
show_kbd(repeated_improved(qwerty, 1, 3000), "3000 swaps repeated once")
show_kbd(repeated_improved(qwerty, 3, 1000), "1000 swaps repeated three times")
show_kbd(repeated_improved(qwerty, 6, 500), "500 swaps repeated six times")
Let's do it again, to see how different the result is, and this time let's also check how long it takes:
In [33]:
%%time
show_kbd(repeated_improved(qwerty, 1, 3000), "3000 swaps repeated once")
show_kbd(repeated_improved(qwerty, 3, 1000), "1000 swaps repeated three times")
show_kbd(repeated_improved(qwerty, 6, 500), "500 swaps repeated six times")
So it looks like in about a second, we can repeatedly get down from a workload average of 3.2 to about 1.9. The resulting keyboard will not be the same each time, but will tend to have the rare letters in the corners and the common ones in the middle.
What happens if we work a lot harder? Say we have a budget of 30,000 instead of 3,000?
In [34]:
%%time
show_kbd(repeated_improved(qwerty, 1, 30000), "30000 swaps repeated once")
show_kbd(repeated_improved(qwerty, 15, 2000), "2000 swaps repeated fifteen times")
show_kbd(repeated_improved(qwerty, 60, 500), "500 swaps repeated sixty times")
It looks like spending more time doesn't help. I can't say we've found the best possible keyboard, but I can say it will take either a very long run time or a more clever algorithm to do much better than 1.9. I can also say that it doesn't seem to matter too much exactly how you manage the budget between repetitions and swaps.
Now let's allow keys to be in different physical locations. Rather than allowing complete freedom of movement, we'll start from a few different fixed key layouts and swap keys from there. I'll define three layouts and gather them into a dict
:
In [35]:
keyboards = {
'qwerty': Keyboard(('Q W E R T Y U I O P',
' A S D F G H J K L ',
' Z X C V B N M ')),
'4-by-7': Keyboard((' A B C D E F ',
'G H I J K L M',
' N O P Q R S ',
'T U V W X Y Z')),
'5-by-6': Keyboard((' A B C D E ',
' F G H I J ',
'K L M N O P',
' Q R S T U ',
' V W X Y Z '))
}
In [36]:
# Compare keyboards
for name in keyboards:
show_kbd(keyboards[name], name)
(Note: The plots of different-shaped keyboards have different-sized squares around the keys. Some of the plots have a lot of whitespace around them. If anyone knows an easy way to tell plot
to display them better, let me know.)
And here is a function to iterate over these layouts, applying repeated_improved
to each one and displaying the improved keyboard, the workload average before and after improvement, and the time taken:
In [37]:
def report(keyboards, scorer=workload_average):
"Iterate through a dict of {name: kbd} pairs, showing kbd before and after repeated_improved(kbd)."
for (name, kbd) in keyboards.items():
show_kbd(repeated_improved(kbd, scorer=scorer),
'repeated improved ' + name)
In [38]:
%%time
report(keyboards)
So, any keyboard can be improved, but the more compact keyboards with a smaller diameter (4-by-7
and 5-by-6
) perform slightly better than qwerty
.
But ... I still want to know: how close are we coming to the optimal keyboard? Can we do better than about 1.7 or 1.8?
One thing to try is to start from a better position. It is easier to do hillclimbing of Mt. Everest if you start at Everest Base Camp (elevation 17,000 ft) rather than at sea level. Let's see what we can do.
What if we started with a keyboard where the most common letters (as measured by the workload text) are in the center? Then, on average, many of the common segments would tend to be short. To make this a little more precise:
In [39]:
cat = ''.join ## Function to join strings together into one big string
# Order letters by frequency in TEXT
ordered_letters = cat(sorted(list(qwerty), reverse=True, key=lambda L: TEXT.count(L)))
def recentered(kbd):
"Put the most frequent letters in the center of this keyboard."
center = mean(kbd.values())
ordered_locations = sorted(kbd.values(), key=lambda point: abs(point-center))
return dict(zip(ordered_letters, ordered_locations))
def mean(numbers): return sum(numbers) / len(numbers)
In [40]:
ordered_letters
Out[40]:
In [41]:
show_kbd(recentered(qwerty), 'recentered qwerty')
In [42]:
for _ in range(10):
workload_plot(recentered(qwerty), 3000)
That's interesting. At first glance it looks like there is much more variability in the lines, compared to the previous plot for the plain qwerty
keyboard. But that's just an illusion. The Y axis is much more compressed on this plot; if you look at the numbers on the Y axis, you see the variation is actually smaller on this plot.
Now let's run repeated_improved on the recentered keyboard:
In [43]:
show_kbd(repeated_improved(recentered(qwerty)), 'repeated improved recentered qwerty')
So the starting layout of keys seems not to matter very much; you can start with a workload average of 3.2 or 2.1, but either way you end up with an improved version somewhere close to 1.9.
The best improvement on Qwerty that I've seen so far has a workload average of 1.86. But can we estimate a lower bound on the best possible permutation of letters? Here's one approach: for each letter L on the keyboard, figure out the best possible rearrangement of the other letters to minimize the workload on segments that involve L. The best rearrangement will have the letter that appears in the most segments with L closest to L; the letter that appears in the next most segments with L the next closest to L, and so on. Summing these per-letter minimums over all letters (and dividing by two, since every segment gets counted once from each end) gives an estimate that no single consistent arrangement can beat.
I can't quite prove this is a lower bound, because of the assumption that we hold one end of the segments fixed in the recentered-keyboard layout. I'm pretty confident that the freedom to optimize the other end of each segment for each letter independently means that it is a lower bound, but I can't prove it. Still, here it is:
In [44]:
def workload_lower_bound(kbd=recentered(qwerty), workload=WORKLOAD):
"Given a keyboard, find a lower bound on the workload average."
letters = list(kbd)
def distances(L):
"Distances to L key, shortest distance first."
return sorted(abs(kbd[A] - kbd[L]) for A in letters if A != L)
def counts(L):
"Workload counts for segments involving L, biggest count first."
return sorted([workload[segment(A, L)] for A in letters if A != L],
reverse=True)
return (sum(dot_product(distances(L), counts(L)) for L in kbd)
/ sum(workload.values()) / 2) # Divide by 2 because we counted each segment twice
def dot_product(A, B):
"Σ_i{A_i * B_i}"
return sum(A[i] * B[i] for i in range(len(A)))
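For example, a small illustrative check of dot_product:
dot_product([1, 2, 3], [4, 5, 6])   # -> 1*4 + 2*5 + 3*6 = 32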
In [45]:
for name in keyboards:
print name, workload_lower_bound(keyboards[name])
So there's a gap of around 0.2 or 0.3 of a key distance between the best keyboard we've found so far, and the lower bound we computed for each keyboard shape. We don't know where within this gap the actual best keyboard lies. But rather than mind the gap, I'll turn my attention to other matters.
When can one word be confused with another? When their paths are similar (which means that their corresponding letters are in similar locations). For example, on a Qwerty keyboard, the paths for "HELLO" and "JELLO" are similar, because H and J are adjacent, and the other letters are the same.
We'd like to know, for a given keyboard, how confusing is it? How many words have paths on the keyboard that can be confused for other words? We have our work cut out for us.
So, as a first step, we will make a mapping from each key to the keys that it can be confused with. I'll say that any key within a distance of 1.5 units on the keyboard is a neighboring key, and thus a potential confusion:
In [46]:
def neighboring_keys(kbd, radius=1.5):
"Build a dict of {Letter:NeighboringLetters}, e.g. {'Q':'QAW', ...}."
def neighboring_letters(A):
return cat(B for B in kbd if distance(kbd[A], kbd[B]) <= radius)
return {A: neighboring_letters(A) for A in kbd}
cat = ''.join ## Function to join letters (or strings) into one string
In [47]:
qwerty_neighbors = neighboring_keys(qwerty)
qwerty_neighbors
Out[47]:
In [48]:
choices = [qwerty_neighbors[L] for L in 'HELLO']
choices
Out[48]:
Think of this as five "columns" of letters, and if we pick one letter from each column, we get a path that is formed by letters that are each confusions of letters in the original word, and so the whole path is a confusion for the original word. So "JELLO" is a confusion for "HELLO", as would be "BEKKI" (formed by taking the first possible neighbor for each of the five letters) and "YWPPP" (formed by taking the last possible neighbors), except they are not words.
It turns out that there is a library function, itertools.product
, that will take an iterable of alternative choices, and generate all possible ways of assembling a sequence consisting of one selection (letter) from each alternative choice.
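As a tiny illustration with made-up choices (two "columns" of two letters each):
[cat(p) for p in itertools.product('AB', 'XY')]   # -> ['AX', 'AY', 'BX', 'BY']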
In [49]:
paths = {cat(letters) for letters in itertools.product(*choices)}
How many paths are there?
In [50]:
len(paths)
Out[50]:
Let's look at a few of them:
In [51]:
random.sample(paths, 8)
Out[51]:
And let's see all the paths that are also words:
In [52]:
WORDS & paths
Out[52]:
Only 4 out of 5000! That's pretty good, but it means "HELLO" is still a confusing word. We can wrap this up as the function confusions
:
In [53]:
def confusions(word, neighbors=neighboring_keys(qwerty), words=WORDS):
"All valid words whose paths could be confused with the path for the given word."
choices = [neighbors[L] for L in word]
return {cat(letters) for letters in itertools.product(*choices)} & words
In [54]:
confusions('HELLO')
Out[54]:
In [55]:
confusions('WORLD')
Out[55]:
In [56]:
confusions('TESTING')
Out[56]:
In [57]:
%time confusions('SOMETHING')
Out[57]:
It took (on my computer) 4 seconds to compute this. Why so long? Let's count:
In [58]:
[len(neighboring_keys(qwerty)[L]) for L in 'SOMETHING']
Out[58]:
There are 7 × 5 × 5 × 5 × 5 × 8 × 5 × 6 × 8 = 8,400,000 paths for confusions
to consider. Looking at them all takes 4 seconds, but for most of these paths, we're wasting our time. For example, one choice for the first two neighboring letters of 'SOMETHING' is 'XP', but 'XP' does not start any word in the dictionary. Nevertheless, itertools.product
will generate 240,000 combinations that start with 'XP', and will then rule them out one at a time. It would be better to stop as soon as we see 'XP', and never consider continuations of this path.
So that gives us the key idea for a more efficient version of confusions
: only follow paths that form a prefix of at least one word.
So, first we need to define the set of prefixes:
In [59]:
def prefixes(words):
"Return a set of prefixes (1 to N characters) of this collection of words."
return {word[:i] for word in words for i in range(1, len(word)+1)}
In [60]:
prefixes(['THESE', 'THEY', 'THOSE'])
Out[60]:
Let's generate the prefixes of all the words, and check how many there are:
In [61]:
PREFIXES = prefixes(WORDS)
len(PREFIXES), len(WORDS)
Out[61]:
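This also lets us verify the 'XP' observation from above:
'XP' in PREFIXES   # -> False: no word starts with 'XP', so a search can prune that branch immediately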
We can describe the more efficient version of the confusions
algorithm:
In [62]:
def confusions(word, neighbors=neighboring_keys(qwerty), words=WORDS, prefixes=PREFIXES):
"All valid words whose paths could be confused with the path for this word."
results = set() # A set of words that are confusions of 'word'
Q = [''] # A queue of partial paths
while Q:
path = Q.pop()
if len(path) < len(word):
for L in neighbors[word[len(path)]]:
if path + L in prefixes:
Q.append(path + L)
elif path in words:
results.add(path)
return results
Let's do one quick check to see if we are getting the same answers as before:
In [63]:
confusions('HELLO')
Out[63]:
And let's see how much faster this version is:
In [64]:
%time confusions('SOMETHING')
Out[64]:
We went from about 4 seconds to under a millisecond: that's more than 4000 times faster! We can do a bigger test:
In [65]:
def words(text):
"Extract a list of all words from a text. Make everything uppercase."
return re.findall('[A-Z]+', text.upper())
test_words = words("""Hello world! Testing something: confusion paths on
multiple distinct inputs of various lengths. See if everything works
perfect, or poorly, or somewhere between the extremes.""")
def test_confusions(words=test_words):
"Run through some test words, printing and summarizing the confusions."
total = 0
for word in sorted(set(words)):
others = sorted(set(confusions(word)) - {word})
total += len(others)
print('{}({}): {}'.format(word, len(others), ' '.join(others)))
print 'Total of {} confusions for {} words'.format(total, len(words))
In [66]:
%time test_confusions()
So, 108 confusions for 25 test words, or an average of 4 confusions per word.
To determine if these confusions
are good ones, let's build something to visualize paths on a keyboard.
I'll add functionality to plot_kbd
to call the new function plot_paths
and implement that:
In [67]:
def plot_kbd(kbd, K=20, words=()):
"Plot the keyboard with square keys, K units on a side."
H = K / 2 ## (K is Key width/height; H is half K)
# Draw a square for each key, and plot paths for words
for L in kbd:
x, y = K * kbd[L].x, -K * kbd[L].y
plot_square(x, y, H, label=L)
plot_paths(kbd, K, words)
# Show the plot
plt.axis('equal'); plt.axis('off'); plt.show()
def plot_paths(kbd, K, words):
"Plot paths for each word."
for (i, word) in enumerate(words):
Xs = [+K * kbd[L].x for L in word]
Ys = [-K * kbd[L].y for L in word]
plt.plot(Xs, Ys, '-o')
Let's see how it works:
In [68]:
plot_kbd(qwerty, words=['HELLO', 'JELLO'])
OK, we're on the right track, but I see three problems: first, the letters are obscured by the circles. Second, when the paths are the same (for the "ELLO" part), they overwrite each other. And third, there is no indication what direction the path is going in ("OLLEH" would look the same as "HELLO").
I have an idea to fix the first two problems (and I will ignore the third for now). Instead of plotting the circle in the center of the key, offset it to one of the corners of the key. Have each path offset to a different corner; that way they won't overlap (at least for up to four paths).
In [69]:
def plot_paths(kbd, K, words):
"Plot paths for each word, each with a different offset (and color)."
Q = K / 4 ## Q is a quarter of a key width
offsets = [Point(-Q, -Q), Point(-Q, +Q), Point(Q, +Q), Point(Q, -Q)]
for (i, word) in enumerate(words):
Xs = [+K * kbd[L].x + offsets[i % 4].x for L in word]
Ys = [-K * kbd[L].y + offsets[i % 4].y for L in word]
plt.plot(Xs, Ys, '-o')
In [70]:
plot_kbd(qwerty, words=['HELLO', 'JELLO'])
That looks much better! Let's look at all four confusions of "HELLO":
In [71]:
plot_kbd(qwerty, words=confusions('HELLO'))
In [72]:
confusions('JELLO')
Out[72]:
Two confusions are 'JELLO' and 'BROOK':
In [73]:
plot_kbd(qwerty, words=['JELLO', 'BROOK'])
I think these two paths should not be confusions of each other. To see why, I'll start with just the first segment of each path:
In [74]:
plot_kbd(qwerty, words=['JE', 'BR'])
Note that 'J' and 'B' are within 1.5 units of each other, as are 'E' and 'R' (as all neighboring letters must be). The issue, as I see it, is that 'B' is offset to the left from 'J' while 'R' is offset to the right. Together, that seems like too much difference to allow these segments to be confusions of each other. How much difference is it? We can define JE
and BR
as the vectors for each segment, and find the difference (or distance) between these two vectors:
In [75]:
JE = qwerty['E'] - qwerty['J']
BR = qwerty['R'] - qwerty['B']
distance(JE, BR)
Out[75]:
That seems like too big a difference—maybe we should limit the difference.
Let's experiment with a function, similar_segments
, that decides whether the last segment in a path is similar to the corresponding segment in a word:
In [76]:
def similar_segments(kbd, path, word):
"Do the last two letters of path form a similar segment to corresponding letters of word?"
n = len(path)
if n < 2: return True
# Define endpoints (P1, P2), (W1, W2); then vectors P and W; then difference between.
(P1, P2), (W1, W2) = (path[n-2:n], word[n-2:n])
P = kbd[P2] - kbd[P1]
W = kbd[W2] - kbd[W1]
return distance(P, W) < 2
We can play with similar_segments
to make sure it looks right:
In [77]:
similar_segments(qwerty, 'BR', 'JELLO')
Out[77]:
In [78]:
similar_segments(qwerty, 'HE', 'JELLO')
Out[78]:
In [79]:
similar_segments(qwerty, 'HEP', 'JELLO')
Out[79]:
In [80]:
similar_segments(qwerty, 'J', 'JELLO')
Out[80]:
Now we will modify confusions
to call similar_segments
, and then re-test:
In [81]:
def confusions(word, neighbors=neighboring_keys(qwerty),
words=WORDS, prefixes=PREFIXES, kbd=qwerty):
"All valid words whose paths could be confused with the path for this word."
results = set() # A set of words that are confusions of 'word'
Q = [''] # A queue of partial paths
while Q:
path = Q.pop()
if len(path) < len(word):
for L in neighbors[word[len(path)]]:
newpath = path + L
if (newpath in prefixes) and similar_segments(kbd, newpath, word):
Q.append(newpath)
elif path in words:
results.add(path)
return results
In [82]:
test_confusions()
We went from 108 down to 75 confusions. Studying the results, this looks like an improvement. For "HELLO", only "JELLO" remains, which looks good. But there are still some confusions that don't look right. For example:
In [83]:
plot_kbd(qwerty, words=['SEE', 'DEE'])
To me, these look different because the SE segment is going to the right while the DE segment is going to the left. Yet they are considered similar:
In [84]:
similar_segments(qwerty, 'DE', 'SEE')
Out[84]:
Let's extend similar_segments
to make it check that two corresponding segments aren't going in opposite directions. If one goes right, the other can't go left; if one goes down, the other can't go up.
How do I test that? We have the vectors for each segment; we want to make sure that the x
components of the two vectors don't have opposite signs, and that the y
components don't have opposite signs. One way to tell if two numbers have opposite sign is to multiply them together and see if the result is negative.
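For instance, with the 'SEE'/'DEE' segments above (a small check in the style of the earlier JE/BR cell):
SE = Point(qwerty['E'] - qwerty['S'])   # Point(0.5, -1.0)
DE = Point(qwerty['E'] - qwerty['D'])   # Point(-0.5, -1.0)
SE.x * DE.x                             # -> -0.25: negative, so opposite horizontal directions
Note that the test below uses >= 0 rather than > 0, so a zero component (a purely horizontal or vertical segment) is never treated as going in the opposite direction.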
In [85]:
def similar_segments(kbd, path, word):
"Do the last two letters of path form a similar segment to corresponding letters of word?"
n = len(path)
if n < 2: return True
# Define endpoints (P1, P2), (W1, W2); then vectors P and W; then difference between.
(P1, P2), (W1, W2) = (path[n-2:n], word[n-2:n])
P = Point(kbd[P2] - kbd[P1])
W = Point(kbd[W2] - kbd[W1])
return (distance(P, W) < 2) and (P.x * W.x >= 0) and (P.y * W.y >= 0)
We can re-test once more:
In [86]:
test_confusions()
Now we're down to 68 confusions, from our initial 108. I think I'll declare victory on confusions
, and move on to answer the question we really want to answer.
The question is: how confusing is a keyboard across a workload of words? First we'll need a workload of words. That's easy because we already have some TEXT
, and the function words
to extract words, Counter
to tally them up, and normalize
to make them sum to 1.0. We'll call the workload of words WORDLOAD
:
In [87]:
WORDLOAD = normalize(Counter(words(TEXT)))
In [88]:
len(WORDLOAD)
Out[88]:
This tells us there were 12,020 distinct words. We can see the most common words:
In [89]:
WORDLOAD.most_common(10)
Out[89]:
And we can find all the confusing words:
In [90]:
WORDLOAD_PREFIXES = prefixes(WORDLOAD)
CONFUSING = [w for w in WORDLOAD
if len(confusions(w, words=WORDLOAD, prefixes=WORDLOAD_PREFIXES)) > 1]
One metric for confusingness is the percentage of words that are confused with other words:
In [91]:
print(len(CONFUSING))
print(len(WORDLOAD))
print(len(CONFUSING) / len(WORDLOAD))
So 26% of the distinct words in the workload are confusing (on a Qwerty keyboard). But it seems more important if "THE" is confusing than if "REMONSTRATED" is. We can weight each word by its frequency in WORDLOAD
:
In [92]:
sum(WORDLOAD[w] for w in WORDLOAD
if len(confusions(w, words=WORDLOAD, prefixes=WORDLOAD_PREFIXES)) > 1)
Out[92]:
That's unfortunate. A majority (55%) of words in running text are confusing. (Unfortunate—but it shouldn't be surprising. A long word is unlikely to be confused with other words, because a long word has many segments, and if just one segment differs, then there is no confusion. A short word has only a few segments, and thus fewer chances to differentiate itself from other words. Running text is mostly short words, while the dictionary is mostly long words; that's why running text has a higher confusingness percentage.)
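One rough way to quantify that difference, using pieces we already have (a sketch; the exact numbers depend on the word list and the text):
print(mean([len(w) for w in WORDS]))                  # average length of a dictionary word
print(sum(len(w) * WORDLOAD[w] for w in WORDLOAD))    # average word length, weighted by frequency in running text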
We can wrap this computation up in a function:
In [93]:
def confusingness(kbd, wordload=WORDLOAD, prefixes=WORDLOAD_PREFIXES):
    "The proportion of words in wordload that are confused with other words."
    neighbors = neighboring_keys(kbd)
    return sum(wordload[w] for w in wordload
               if len(confusions(w, neighbors, wordload, prefixes)) > 1)
And we can make it part of show_kbd
:
In [94]:
def show_kbd(kbd, name='keyboard'):
"Plot a keyboard and print statistics about it."
plot_kbd(kbd)
print('workload average = {:.1f}; confusingness = {:.0%} for {}'
.format(workload_average(kbd), confusingness(kbd), name))
Let's see how our keyboards do on this new metric:
In [95]:
for name in keyboards:
show_kbd(keyboards[name], name)
It looks like Qwerty is the worst on all counts. Now we can try the improved keyboards. Remember that we are still working to improve the workload average, so any change to confusingness will be incidental, not planned.
In [96]:
report(keyboards)
Good to see that the "improved" keyboards actually improve confusingness, as well as workload average. Of course, we haven't yet explicitly searched for a keyboard with a good confusingness score. Could we?
Unfortunately, we won't get very far towards solving this problem. Consider:
In [97]:
%time confusingness(qwerty)
Out[97]:
In [98]:
%time workload_average(qwerty)
Out[98]:
Computing confusingness
takes thousands of times longer than computing workload_average
, so confusingness
, as it stands, is not a good candidate for a scorer function in the inside loop of improved
. We'd need some way of factoring confusingness
into pieces that can be recomputed incrementally to make things go faster. So for now, I'll make a token effort of scoring with confusingness, but with a paltry 20 swaps (designed to take around a minute of computation):
In [99]:
%time kbd = improved(qwerty, swaps=20, scorer=confusingness)
show_kbd(kbd)
This did indeed reduce confusingness; not bad for only 20 swaps. Too bad the workload average got worse.
What is user satisfaction? I don't know, but for now I'll approximate satisfaction (or rather, dissatisfaction, since lower scores are better) with a combined score that is the average of workload average and 6 times confusingness. Why 6 times? Because our best scores are around 1.8 for workload average and 0.3 for confusingness, and I want both factors to weigh about the same.
First we'll define the combined scorer function:
In [100]:
def combined_scorer(kbd):
"The average of workload average and 6 * confusingness."
return mean([workload_average(kbd), 6 * confusingness(kbd)])
We can incorporate the combined score into show_kbd
:
In [101]:
def show_kbd(kbd, name='keyboard'):
"Plot a keyboard and print statistics about it."
plot_kbd(kbd)
W = workload_average(kbd)
C = confusingness(kbd)
print('workload average = {:.1f}; confusingness = {:.0%}; combined = {:.1f} for {}'
.format(W, C, mean([W, 6 * C]), name))
Let's get a baseline for our keyboards to see which performs best on the combined metric.
In [102]:
for name in keyboards:
show_kbd(keyboards[name], name)
It looks like 4-by-7 gets the best score. Let's try a few swaps to improve it:
In [104]:
%time kbd = improved(keyboards['4-by-7'], swaps=30, scorer=combined_scorer)
show_kbd(kbd)
Success! We were in fact able to make progress on the combined metric. If we could run 30,000 swaps instead of just 30, we could probably do even better. To do so would require either (1) more computers, (2) more patience, or (3) more efficient algorithms. I'll leave it up to you to make more progress.
Note: Each time this notebook is run, different random results can occur. I'll record a keyboard found by one of the good runs (only 6% confusingness) here, just in case another run is not as good:
In [107]:
show_kbd(Keyboard((' A B E D Q F ',
'G U M J I L H',
' N O P W C S ',
'T X R V Y K Z')))
So where are we? Let's revisit our initial questions and see what answers we have:
Now it is your turn to answer the open questions, or make up some questions of your own. Good luck! Here are a few ideas to get you started:
Hillclimbing just keeps the one best keyboard it has found so far. Other optimization techniques such as beam search or genetic algorithms or ant colony optimization maintain several candidates at a time. Is that a good idea?
The code in this notebook emphasizes clarity, not efficiency. Can you modify the code (or perhaps port it to another language) and make it twice as efficient? 10 times? 100 times?
What other factors do you think are important to user satisfaction with a keyboard? Can you measure them?
Consider the 5 paths below. They all start at 'P', move in a straight line to 'T', and then go to 'S', but they all make different stops along the top row. In other words, the 5 paths all trace exactly the same lines, so they are very confusing, but our definition of confusions
makes most of them different. Can you think
of a better way to handle confusions for paths like this?
In [108]:
plot_kbd(qwerty, words=['PUTS', 'POTS', 'PITS', 'POUTS', 'PUTTS'])