Sets and Dictionaries in Python: Dictionaries (Instructor Version)

Objectives

  • Explain the similarities and differences between sets and dictionaries.
  • Perform common operations on dictionaries.

Lesson

Now that we know how to find out what kinds of atoms are in our inventory, we want to find out how many of each we have. Our input is a list of several thousand atomic symbols, and the output we want is a list of names and counts.

Once again, we could use a list to store names and counts, but the right solution is to use another new data structure called a dictionary. A dictionary is an unordered collection of key-value pairs. The keys are immutable, unique, and unordered, just like the elements of a set. There are no restrictions on the values stored with those keys: they don't have to be immutable or unique. However, we can only look up entries by their keys, not by their values.

We create a new dictionary by putting key-value pairs inside curly braces with a colon between the two parts of each pair:


In [1]:
birthdays = {'Newton' : 1642, 'Darwin' : 1809}

The dictionary's keys are the strings 'Newton' and 'Darwin'. The value associated with 'Newton' is 1642, while the value associated with 'Darwin' is 1809. We can think of this as a two-column table:

Key       Value
'Newton'  1642
'Darwin'  1809

but it's important to remember that the entries aren't necessarily stored in this order (or any other specific order).

We can get the value associated with a key by putting the key in square brackets:


In [2]:
print birthdays['Newton']


1642

This looks just like subscripting a string or list, except dictionary keys don't have to be integers—they can be strings, tuples, or any other immutable object. It's just like using a phonebook or a real dictionary: instead of looking things up by location using an integer index, we look things up by name.
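
The rule about immutable keys can be checked directly. The snippet below is a minimal sketch rather than part of the lesson's program: the dictionary `years` and the flag `list_key_accepted` are names made up for this illustration. It uses a string and a tuple as keys, then shows that a list is rejected.

```python
# Strings and tuples are immutable, so both work as keys.
years = {'Newton': 1642}
years[('Charles', 'Darwin')] = 1809

# A list is mutable, so Python rejects it as a key with a TypeError.
try:
    years[['Alan', 'Turing']] = 1912
    list_key_accepted = True
except TypeError:
    list_key_accepted = False

print(list_key_accepted)  # False
```

The failed assignment leaves the dictionary unchanged, so `years` still has only its two valid entries.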

If we want to add another entry to a dictionary, we assign a value to the key, just as we create a new variable in a program by assigning it a value:


In [3]:
birthdays['Turing'] = 1612
print birthdays


{'Turing': 1612, 'Newton': 1642, 'Darwin': 1809}

If the key is already in the dictionary, assignment replaces the value associated with it rather than adding another entry (since each key can appear at most once). Let's fix Turing's birthday by replacing 1612 with 1912:


In [4]:
birthdays['Turing'] = 1912
print birthdays


{'Turing': 1912, 'Newton': 1642, 'Darwin': 1809}
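
One way to see the difference between adding and replacing is to watch the number of entries. This is a small sketch under the assumption that we start from the original two-entry dictionary; the variable names `size_after_add` and `size_after_replace` are invented for the example.

```python
birthdays = {'Newton': 1642, 'Darwin': 1809}

birthdays['Turing'] = 1612          # new key: adds a third entry
size_after_add = len(birthdays)

birthdays['Turing'] = 1912          # existing key: overwrites the value
size_after_replace = len(birthdays)

print(size_after_add)       # 3
print(size_after_replace)   # 3
print(birthdays['Turing'])  # 1912
```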

Trying to get the value associated with a key that isn't in the dictionary is an error, just like trying to access a nonexistent variable or get an out-of-bounds element from a list. For example, let's try to look up Florence Nightingale's birthday:


In [5]:
print birthdays['Nightingale']


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-44ca8cabb590> in <module>()
----> 1 print birthdays['Nightingale']

KeyError: 'Nightingale'

If we're not sure whether a key is in a dictionary or not, we can test for it using in:


In [6]:
print 'Nightingale' in birthdays


False

In [7]:
print 'Darwin' in birthdays


True
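
The membership test is most useful as a guard in front of a lookup. The sketch below assumes the three-entry `birthdays` dictionary built so far; `lookup` is a hypothetical helper written for this illustration, not part of the lesson's code. The last line uses the dictionary's built-in `get` method, which returns `None` for a missing key instead of raising `KeyError`.

```python
birthdays = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

def lookup(name):
    '''Return the birth year for name, or None if name is not a key.'''
    if name in birthdays:
        return birthdays[name]
    return None

print(lookup('Darwin'))       # 1809
print(lookup('Nightingale'))  # None

# dict.get performs the same check-then-look-up in a single call.
print(birthdays.get('Nightingale'))  # None
```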

We can see how many entries are in the dictionary using len, and loop over the keys in a dictionary using for:


In [8]:
print len(birthdays)
for name in birthdays:
    print name, birthdays[name]


3
Turing 1912
Newton 1642
Darwin 1809

This is a little bit different from looping over a list. When we loop over a list we get the values in the list. When we loop over a dictionary, on the other hand, the loop gives us the keys, which we can use to look up the values.
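
Because the loop hands us keys in no particular order, output can vary from one dictionary to the next. If a predictable order matters, one option is to sort the keys first. This is a sketch, assuming the same three-entry dictionary; `ordered_names` is a name chosen for the example.

```python
birthdays = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

# sorted() returns the keys as a new list in alphabetical order.
ordered_names = sorted(birthdays)
for name in ordered_names:
    print('%s %d' % (name, birthdays[name]))
```

This prints Darwin, Newton, and Turing in that order, each followed by its year, regardless of how the dictionary stores them internally.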

We're now ready to count atoms. The main body of our program looks like this:


In [14]:
def main(filename):
    counts = count_atoms(filename)
    for atom in counts:
        print atom, counts[atom]

count_atoms reads atomic symbols from a file, one per line, and creates a dictionary of atomic symbols and counts. Once we have that dictionary, we use a loop like the one we just saw to print out its contents.

Here's the function that does the counting:


In [15]:
def count_atoms(filename):
    '''Count unique atoms, returning a dictionary.'''

    result = {}
    with open(filename, 'r') as reader:
        for line in reader:
            atom = line.strip()
            if atom not in result:
                result[atom] = 1
            else:
                result[atom] = result[atom] + 1
    return result

We start with a docstring to explain the function's purpose to whoever has to read it next. We then create an empty dictionary to fill with data, and use a loop to process the lines from the input file one by one. Notice that the empty dictionary is written {}: this is the "previous use" we referred to when explaining why an empty set had to be written set().

After stripping whitespace off the atom's symbol, we check to see if we've seen it before. If we haven't, we set its count to 1, because we've now seen that atom one time. If we have seen it before, we add one to the previous count and store that new value back in the dictionary. When the loop is done, we return the dictionary we have created.

Let's watch this function in action. Before we read any data, our dictionary is empty. After we see 'Na' for the first time, our dictionary has one entry: its key is 'Na', and its value is 1. When we see 'Fe', we add another entry to the dictionary with that string as a key and 1 as a value. Finally, when we see 'Na' for the second time, we add one to its count.

Input  Dictionary
start  {}
Na     {'Na' : 1}
Fe     {'Na' : 1, 'Fe' : 1}
Na     {'Na' : 2, 'Fe' : 1}

In [16]:
main('some_atoms.txt')


Na 3
Si 1
Fe 1
Pd 1
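
The step-by-step trace described earlier can be reproduced without a file. The sketch below is a variation on count_atoms, not a replacement for it: `count_symbols` and `snapshots` are hypothetical names, and the function takes a list of symbols instead of a filename so that it can record a copy of the dictionary after each symbol.

```python
def count_symbols(symbols):
    '''Count symbols from a list, recording the dictionary after each one.'''
    result = {}
    snapshots = []
    for atom in symbols:
        if atom not in result:
            result[atom] = 1
        else:
            result[atom] = result[atom] + 1
        # Store a copy, so later updates to result don't alter the record.
        snapshots.append(dict(result))
    return result, snapshots

counts, snapshots = count_symbols(['Na', 'Fe', 'Na'])
for snapshot in snapshots:
    print(snapshot)
```

Each printed dictionary corresponds to one row of the trace after the starting row.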

Just as we use tuples for multi-part entries in sets, we can use them for multi-part keys in dictionaries. For example, if we want to store the years in which scientists were born using their full names, we could do this:


In [17]:
birthdays = {
    ('Isaac', 'Newton') : 1642,
    ('Charles', 'Robert', 'Darwin') : 1809,
    ('Alan', 'Mathison', 'Turing') : 1912
}
print birthdays


{('Charles', 'Robert', 'Darwin'): 1809, ('Isaac', 'Newton'): 1642, ('Alan', 'Mathison', 'Turing'): 1912}

If we do this, though, we always have to look things up by the full key: there is no way to ask the dictionary directly for all the entries whose keys contain the word 'Darwin', because a lookup only succeeds when the whole tuple matches.
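
A search on part of a key therefore has to be written by hand, by looping over every key and testing each one. The following is a minimal sketch of that idea; `find_by_name_part` is a hypothetical helper invented for this example, and it examines every entry, so it is slower than a direct lookup.

```python
birthdays = {
    ('Isaac', 'Newton'): 1642,
    ('Charles', 'Robert', 'Darwin'): 1809,
    ('Alan', 'Mathison', 'Turing'): 1912,
}

def find_by_name_part(table, part):
    '''Return the entries of table whose tuple key contains part.'''
    matches = {}
    for key in table:
        if part in key:  # tuple membership: part equals one element of key
            matches[key] = table[key]
    return matches

darwins = find_by_name_part(birthdays, 'Darwin')
print(darwins)  # {('Charles', 'Robert', 'Darwin'): 1809}
```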

If we think of a dictionary as a two-column table, it is occasionally useful to get one or the other column, i.e., just the keys or just the values:


In [18]:
all_keys = birthdays.keys()
print all_keys


[('Charles', 'Robert', 'Darwin'), ('Isaac', 'Newton'), ('Alan', 'Mathison', 'Turing')]

In [20]:
all_values = birthdays.values()
print all_values


[1809, 1642, 1912]

These methods should be used sparingly: the dictionary doesn't store its keys or values in a list, so each of these methods creates a new list as its result. In particular, we shouldn't loop over a dictionary's entries like this, since looping over the dictionary itself gives us the same keys without building an extra list:


In [2]:
for key in birthdays.keys():
    print key, birthdays[key]


('Charles', 'Robert', 'Darwin') 1809
('Isaac', 'Newton') 1642
('Alan', 'Mathison', 'Turing') 1912
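
For comparison, here is a sketch of the same loop written over the dictionary itself, assuming the full-name version of `birthdays`. The list `visited` is added only so that the example can record which keys the loop saw.

```python
birthdays = {
    ('Isaac', 'Newton'): 1642,
    ('Charles', 'Robert', 'Darwin'): 1809,
    ('Alan', 'Mathison', 'Turing'): 1912,
}

visited = []
# Iterate over the dictionary directly instead of calling keys() first.
for key in birthdays:
    visited.append(key)
    print('%s %d' % (' '.join(key), birthdays[key]))
```

The loop visits exactly the same keys as the keys() version.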

Key Points

  • Use dictionaries to store key-value pairs with distinct keys.
  • Create dictionaries using {k1:v1, k2:v2, ...}.
  • Dictionaries are mutable, i.e., they can be updated in place.
  • Dictionary keys must be immutable, but values can be anything.
  • Use tuples to store multi-part keys in dictionaries.
  • dict[key] refers to the dictionary entry with a particular key.
  • key in dict tests whether a key is in a dictionary.
  • len(dict) returns the number of entries in a dictionary.
  • A loop over a dictionary produces each key once, in arbitrary order.
  • dict.keys() creates a list of the keys in a dictionary.
  • dict.values() creates a list of the values in a dictionary.