A dictionary is a mapping

  • A dictionary (or map) is a mapping between keys and values
  • Real-world examples are dictionaries of words and phone-books
  • A key-value pair is an association between a key (e.g., a word) and a value (e.g., a definition)
  • The function dict creates a new dictionary object with no items
  • For example, we can create a dictionary that maps birthdays to a person's name

In [1]:
birthdays = dict()
print( birthdays )


{}
  • To add an item to the dictionary, use square brackets like a list

In [2]:
birthdays['0704'] = 'Steve'
birthdays['0529'] = 'Tony'
print( birthdays )


{'0704': 'Steve', '0529': 'Tony'}
  • Note that order isn't preserved in a dictionary (unlike a list)
  • The values can be retrieved using the same notation

In [3]:
print( birthdays['0529'] )


Tony

In [4]:
# Get the number of key-value pairs
print( len( birthdays ) )

# Get the values in the dictionary
print( birthdays.values() )

# Get the keys in the dictionary
print( birthdays.keys() )


2
dict_values(['Steve', 'Tony'])
dict_keys(['0704', '0529'])

Dictionary as a set of counters

  • Often there are many alternatives for implementing a computation
  • Some are better than others (as we saw with the Fibonacci series)
  • If you need to count how many times a letter appears in a string, you could:
    • Create 26 variables with each holding the number of times a letter occurs
    • Create a list with 26 elements with each index corresponding to a letter
    • Create a dictionary with the letters as keys and the values as the counters
  • What are the pros and cons with each approach?

Looping and dictionaries

  • If you use a dictionary in a for statement, the loop traverses the dictionary's keys
  • Remember that there is no implied ordering of keys in a dictionary

In [5]:
for a_date in birthdays:
    print( a_date )


0704
0529

Reverse lookup

  • Dictionaries are designed to find a value given a key
  • This is called a lookup
  • What if you want to do the reverse? (Note: this is referred to as a reverse lookup)
  • For example, what if you want to lookup a word by its definition?
  • Unfortunately there is no simple function to do that
  • Why might that be?
  • The book presents a simple function that implements this functionality

In [6]:
def reverse_lookup( a_dict, value ):
    for key in a_dict:
        if a_dict[key] == value:
            return key
    raise ValueError
  • Why is this approach inefficient?
  • Note the raise keyword
  • It causes an exception (specifically a ValueError) if the value isn't in the dictionary
  • The raise statement also takes an optional argument that is a detailed error message

Dictionaries and lists

  • Lists can appear as values in a dictionary
  • For example, people often have the same birthday

In [7]:
birthdays['0704'] = [ 'Steve', 'Nick' ]
  • Sometimes you may want to invert a dictionary
  • This means that you turn the keys into values and values into keys
  • However, remember that keys are unique, but values aren't necessarily unique
  • This means that the original dictionary can have multiple keys with the same value
  • The inversion process is therefore more complex than simply switching things around.

In [8]:
def invert_dict( a_dict ):
    inverted_dict = dict()
    for key in a_dict:
        value = a_dict[key]
        if value not in inverted_dict:
            inverted_dict[value] = [ key ]
        else:
            inverted_dict[value].append( key )
    return inverted_dict
  • A list can be a value in a dictionary, but it can’t be a key
  • Keys are required to be hashable
  • A hash function takes a value and returns an integer
  • Dictionaries use these to store a look up keys in a very efficient manner
  • Hash functions operate on all types of values, not just numbers
  • This approach works great if the keys are immutable, but breaks if they are mutable (like a list)

Global variables

  • Disclaimer: Global variables usually bad and are generally frowned upon
  • If you don't declare a variable in a function, it belongs to a special frame/function called __main__
  • These variables are called global and can be accessed from any code
  • Why might this be bad?
  • The textbook illustrates how this can be a problem on pg. 110-111
  • If you want to change a global variable in a function, you need to declare the variable as global before you use it
  • Otherwise, a new variable with the same name is temporarily created while the function is executing
  • Don't use global variables

Debugging

  • As we saw with the words.txt exercises, debugging programs that use large amounts of data is difficult
  • The book presents a few suggestions on how to debug such programs:
    • Scale down the input: Reduce the size of the dataset like we did with our sample tests
    • Check summaries and types: Debug using aggregate data or datatypes instead of all the data
    • Write self-checks: Have your code test things automatically (e.g., sanity or consistency checks)
    • Prety print the output: Format the debugging output to make it easier to read

Exercises

  • Write a function that processes the words.txt file and computes the relative frequency of each letter in the alphabet. For example, if the letter e occurs 100 times and there are a total of 500 letters in the file, the letter e has a relative frequency of 20\%.
  • Write a function that processes the words.txt file and computes the relative frequency of the length of all the words. You must use a dictionary.
  • Write a function that prints a dictionary such that the keys are ordered in alphabetical order.