Can You Find the Fish in State Names?

The Riddler - 2020-05-22

Ohio is the only state whose name doesn’t share any letters with the word “mackerel.” It’s strange, but it’s true.

But that isn’t the only pairing of a state and a word you can say that about — it’s not even the only fish! Kentucky has “goldfish” to itself, Montana has “jellyfish” and Delaware has “monkfish,” just to name a few.

What is the longest “mackerel?” That is, what is the longest word that doesn’t share any letters with exactly one state? (If multiple “mackerels” are tied for being the longest, can you find them all?)

Extra credit: Which state has the most “mackerels?” That is, which state has the most words for which it is the only state without any letters in common with those words?


In [65]:
from collections import Counter
from joblib import Parallel, delayed
import json
import pandas as pd
import requests

To get our word list, we will use a list provided by Peter Norvig. For the 50 states, I found a list on GitHub.


In [5]:
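def get_word_list():
    # Download Peter Norvig's word list (one word per line).
    r = requests.get('https://norvig.com/ngrams/word.list')
    r.raise_for_status()  # fail loudly if the download did not succeed
    # str.split() already strips the whitespace around each word.
    return r.text.split()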

word_list = get_word_list()
len(word_list)


Out[5]:
263533

In [11]:
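def get_state_list():
    # Download a JSON list of state records and keep only the names.
    r = requests.get('https://gist.githubusercontent.com/tvpmb/4734703/raw/b54d03154c339ed3047c66fefcece4727dfc931a/US%2520State%2520List')
    r.raise_for_status()  # fail loudly if the download did not succeed
    state_dct_list = json.loads(r.text)
    return [s.get('name') for s in state_dct_list]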

state_list = get_state_list()
state_list[:10]


Out[11]:
['Alabama',
 'Alaska',
 'Arizona',
 'Arkansas',
 'California',
 'Colorado',
 'Connecticut',
 'Delaware',
 'District of Columbia',
 'Florida']
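
The first ten entries above include 'District of Columbia', which is not a state, and the full list (visible in the later outputs) spells two names as 'Kansa' and 'Lousiana'. The misspellings contain the same letters as the correct names, so they do not change any letter overlaps, but the extra entry may affect which words count as having exactly one matching state. Below is a minimal cleanup sketch. The helper clean_state_list and the two lookup tables are hypothetical and not part of the original notebook, whose results were produced with the uncleaned list.

```python
# Hypothetical helper (not in the original notebook): tidy the downloaded names.
NAME_FIXES = {'Kansa': 'Kansas', 'Lousiana': 'Louisiana'}  # misspellings seen in the outputs
NON_STATES = {'District of Columbia'}                      # entries that are not states

def clean_state_list(names):
    """Drop non-state entries and correct the known misspellings."""
    return [NAME_FIXES.get(name, name) for name in names if name not in NON_STATES]

print(clean_state_list(['Kansa', 'Lousiana', 'District of Columbia', 'Ohio']))
# prints ['Kansas', 'Louisiana', 'Ohio']
```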

First, let's start simple by creating a function that takes a state name and a word and returns the set of letters they share.


In [18]:
def get_shared_letters(a, b):
    return set(a.lower()).intersection(set(b.lower()))

get_shared_letters('mackerel', 'Mississippi')


Out[18]:
{'m'}

Now we will make a function that takes a word and the list of states and returns a dictionary mapping each state name to the set of letters it shares with the word.


In [21]:
def get_state_to_shared_letters(word, state_list):
    return {state_name: get_shared_letters(word, state_name) for state_name in state_list}

get_state_to_shared_letters('mackerel', state_list)


Out[21]:
{'Alabama': {'a', 'l', 'm'},
 'Alaska': {'a', 'k', 'l'},
 'Arizona': {'a', 'r'},
 'Arkansas': {'a', 'k', 'r'},
 'California': {'a', 'c', 'l', 'r'},
 'Colorado': {'a', 'c', 'l', 'r'},
 'Connecticut': {'c', 'e'},
 'Delaware': {'a', 'e', 'l', 'r'},
 'District of Columbia': {'a', 'c', 'l', 'm', 'r'},
 'Florida': {'a', 'l', 'r'},
 'Georgia': {'a', 'e', 'r'},
 'Hawaii': {'a'},
 'Idaho': {'a'},
 'Illinois': {'l'},
 'Indiana': {'a'},
 'Iowa': {'a'},
 'Kansa': {'a', 'k'},
 'Kentucky': {'c', 'e', 'k'},
 'Lousiana': {'a', 'l'},
 'Maine': {'a', 'e', 'm'},
 'Maryland': {'a', 'l', 'm', 'r'},
 'Massachusetts': {'a', 'c', 'e', 'm'},
 'Michigan': {'a', 'c', 'm'},
 'Minnesota': {'a', 'e', 'm'},
 'Mississippi': {'m'},
 'Missouri': {'m', 'r'},
 'Montana': {'a', 'm'},
 'Nebraska': {'a', 'e', 'k', 'r'},
 'Nevada': {'a', 'e'},
 'New Hampshire': {'a', 'e', 'm', 'r'},
 'New Jersey': {'e', 'r'},
 'New Mexico': {'c', 'e', 'm'},
 'New York': {'e', 'k', 'r'},
 'North Carolina': {'a', 'c', 'l', 'r'},
 'North Dakota': {'a', 'k', 'r'},
 'Ohio': set(),
 'Oklahoma': {'a', 'k', 'l', 'm'},
 'Oregon': {'e', 'r'},
 'Pennsylvania': {'a', 'e', 'l'},
 'Rhode Island': {'a', 'e', 'l', 'r'},
 'South Carolina': {'a', 'c', 'l', 'r'},
 'South Dakota': {'a', 'k'},
 'Tennessee': {'e'},
 'Texas': {'a', 'e'},
 'Utah': {'a'},
 'Vermont': {'e', 'm', 'r'},
 'Virginia': {'a', 'r'},
 'Washington': {'a'},
 'West Virginia': {'a', 'e', 'r'},
 'Wisconsin': {'c'},
 'Wyoming': {'m'}}

Now we can make a function that keeps only the state names whose set of shared letters is empty.


In [22]:
def filter_empty_sets(state_to_shared_letters_dct):
    return {state_name: shared_set for state_name, shared_set in state_to_shared_letters_dct.items() if len(shared_set) == 0}


filter_empty_sets(get_state_to_shared_letters('mackerel', state_list))


Out[22]:
{'Ohio': set()}
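
The two steps above (build every overlap set, then keep the empty ones) can also be folded into a single check with set.isdisjoint, which avoids building the intermediate sets. This is only a sketch of an alternative, not the approach the notebook uses; the name states_sharing_no_letters and the three-state sample list are made up for illustration.

```python
def states_sharing_no_letters(word, states):
    """Return the states whose names have no letter in common with `word`."""
    word_letters = set(word.lower())
    # isdisjoint accepts any iterable, so the lowercased name can be passed directly.
    return [state for state in states if word_letters.isdisjoint(state.lower())]

sample_states = ['Ohio', 'Iowa', 'Utah']  # illustrative subset, not the full list
print(states_sharing_no_letters('mackerel', sample_states))
# prints ['Ohio']
```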

Finally, we can iterate through all the words in the word list and find the words for which exactly one state has no shared letters.


In [56]:
def get_words_with_one_state_no_shared_letters(word_list, state_list, use_joblib_parallel=True):
    def _get_single_state(word):
        # Return (word, state) when exactly one state shares no letters with the word.
        empty_set_dict = filter_empty_sets(get_state_to_shared_letters(word, state_list))
        if len(empty_set_dict) == 1:
            return (word, next(iter(empty_set_dict)))
        return None

    if use_joblib_parallel:
        # Spread the per-word checks across two workers.
        results = Parallel(n_jobs=2)(delayed(_get_single_state)(word) for word in word_list)
    else:
        results = [_get_single_state(word) for word in word_list]
    # Drop the words that did not match exactly one state.
    return {item[0]: item[1] for item in results if item is not None}


words_with_one_state_no_shared_letters = get_words_with_one_state_no_shared_letters(word_list, state_list)
len(words_with_one_state_no_shared_letters)


Out[56]:
45385
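
The search above rebuilds every state's letter set for every word. A possible variation, sketched here under the assumption that a plain serial loop is acceptable, is to compute each state's letters once up front and then test each word against them. The function name find_single_state_words and the small sample inputs are hypothetical; whether this removes the need for joblib on the full word list is not something the notebook measures.

```python
def find_single_state_words(words, states):
    """Map each word to its state when exactly one state shares no letters with it."""
    # Precompute each state's letters once instead of once per word.
    state_letters = {state: frozenset(state.lower()) for state in states}
    result = {}
    for word in words:
        word_letters = set(word.lower())
        matches = [state for state, letters in state_letters.items()
                   if word_letters.isdisjoint(letters)]
        if len(matches) == 1:
            result[word] = matches[0]
    return result

sample_words = ['mackerel', 'goldfish', 'zzz']   # illustrative inputs only
sample_states = ['Ohio', 'Kentucky', 'Alabama']
print(find_single_state_words(sample_words, sample_states))
# prints {'mackerel': 'Ohio', 'goldfish': 'Kentucky'}
```

In this sample, 'zzz' is left out because it shares no letters with any of the three states, so it has more than one match.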

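Now we can find the longest words. Sorting by length, two 23-letter words tie for first place: “counterproductivenesses” (Alabama) and “hydrochlorofluorocarbon” (Mississippi).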


In [63]:
for word in sorted(words_with_one_state_no_shared_letters.keys(), key=len, reverse=True)[:100]:
    print(word, words_with_one_state_no_shared_letters[word])


counterproductivenesses Alabama
hydrochlorofluorocarbon Mississippi
counterproductiveness Alabama
unconscientiousnesses Alabama
counterconditionings Alabama
deoxycorticosterones Alabama
expressionlessnesses Utah
hyperconsciousnesses Alabama
hypersensitivenesses Alabama
incompressiblenesses Utah
interconnectednesses Alabama
microelectrophoretic Kansa
nondestructivenesses Alabama
overprotectivenesses Alabama
overscrupulousnesses Hawaii
supposititiousnesses Alabama
transcendentalnesses Ohio
underconsciousnesses Alabama
untranslatablenesses Ohio
conscientiousnesses Alabama
counterinstitutions Alabama
counterinsurgencies Alabama
deoxycorticosterone Alabama
discontinuousnesses Alabama
heterogeneousnesses Alabama
inconsecutivenesses Alabama
inconspicuousnesses Alabama
indiscerniblenesses Utah
inexpressiblenesses Utah
intersubjectivities Oklahoma
introspectivenesses Alabama
irrepressiblenesses Utah
irresponsiblenesses Utah
noninterventionists Alabama
nonproductivenesses Alabama
oversensitivenesses Alabama
photoconductivities Alabama
preternaturalnesses Ohio
psychophysiologists Nevada
stereospecificities Alabama
superconductivities Alabama
superintendentships Alabama
superstitiousnesses Alabama
surreptitiousnesses Alabama
tetrachloroethylene Mississippi
thoroughgoingnesses Alabama
unconditionednesses Alabama
unconscientiousness Alabama
unpersuadablenesses Ohio
unpretentiousnesses Alabama
unpreventablenesses Ohio
unprogressivenesses Alabama
untrustworthinesses Alabama
biobibliographical Tennessee
chlorofluorocarbon Mississippi
coccidioidomycoses Utah
coccidioidomycosis Utah
compressiblenesses Utah
constructivenesses Alabama
contemptuousnesses Hawaii
counterconventions Alabama
counterdeployments Hawaii
countergovernments Hawaii
counterinstitution Alabama
counterquestioning Alabama
countersuggestions Alabama
creditworthinesses Alabama
deconstructionists Alabama
disconnectednesses Alabama
discontentednesses Alabama
discourteousnesses Alabama
disingenuousnesses Alabama
distributivenesses Oklahoma
expressionlessness Utah
feeblemindednesses Utah
hyperconsciousness Alabama
hypercorrectnesses Alabama
hypersensitiveness Alabama
hypersensitivities Alabama
incompressibleness Utah
incontiguousnesses Alabama
incorrigiblenesses Utah
indefensiblenesses Utah
interconnectedness Alabama
introductorinesses Alabama
introspectionistic Alabama
irremissiblenesses Utah
irreversiblenesses Utah
micromorphological Tennessee
nondestructiveness Alabama
noninterventionist Alabama
obstreperousnesses Hawaii
overprotectiveness Alabama
overscrupulousness Hawaii
photoperiodicities Alabama
photoreproductions Alabama
photosensitivities Alabama
preconsciousnesses Alabama
presumptuousnesses Hawaii
propertylessnesses Hawaii
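
Because the puzzle asks for every word tied for the longest length, it can help to pull out the ties explicitly instead of reading them off the top of a sorted list. A minimal sketch, assuming a dictionary shaped like words_with_one_state_no_shared_letters (word mapped to state); the helper name longest_words is hypothetical and the sample dictionary reuses three entries from the output above.

```python
def longest_words(word_to_state):
    """Return (word, state) pairs for all words tied for the maximum length."""
    if not word_to_state:
        return []
    max_len = max(len(word) for word in word_to_state)
    return sorted((word, state) for word, state in word_to_state.items()
                  if len(word) == max_len)

sample = {
    'counterproductivenesses': 'Alabama',
    'hydrochlorofluorocarbon': 'Mississippi',
    'counterproductiveness': 'Alabama',
}
print(longest_words(sample))
# prints [('counterproductivenesses', 'Alabama'), ('hydrochlorofluorocarbon', 'Mississippi')]
```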

Which states have the most words associated with them? Counting the values shows Ohio in the lead with 11,342 words, followed by Alabama with 8,274.


In [68]:
Counter(words_with_one_state_no_shared_letters.values()).most_common()


Out[68]:
[('Ohio', 11342),
 ('Alabama', 8274),
 ('Utah', 6619),
 ('Mississippi', 4863),
 ('Hawaii', 1763),
 ('Kentucky', 1580),
 ('Wyoming', 1364),
 ('Tennessee', 1339),
 ('Alaska', 1261),
 ('Nevada', 1229),
 ('Kansa', 884),
 ('Oregon', 682),
 ('Montana', 648),
 ('Texas', 639),
 ('Indiana', 482),
 ('Colorado', 481),
 ('Delaware', 399),
 ('Oklahoma', 369),
 ('New Jersey', 337),
 ('Iowa', 201),
 ('Virginia', 107),
 ('New York', 105),
 ('Illinois', 79),
 ('Missouri', 73),
 ('Maryland', 67),
 ('Wisconsin', 60),
 ('North Dakota', 54),
 ('New Mexico', 30),
 ('Vermont', 27),
 ('Maine', 14),
 ('Connecticut', 9),
 ('Michigan', 4)]
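
A Counter only lists states that have at least one word, so states with none are simply absent from the output above. One way to list them, sketched with a hypothetical helper named states_without_words and a tiny sample input, is to look each state up in the counter, which returns 0 for missing keys.

```python
from collections import Counter

def states_without_words(states, word_to_state):
    """Return the states that are not the unique match for any word."""
    counts = Counter(word_to_state.values())
    # Looking up a missing key in a Counter returns 0 without adding the key.
    return [state for state in states if counts[state] == 0]

sample_states = ['Ohio', 'Kentucky', 'Idaho']             # illustrative subset
sample_words = {'mackerel': 'Ohio', 'goldfish': 'Kentucky'}
print(states_without_words(sample_states, sample_words))
# prints ['Idaho']
```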
