Once again, we could use a list to store names and counts, but the right solution is to use another new data structure called a dictionary. A dictionary is an unordered collection of key-value pairs. The keys are immutable, unique, and unordered, just like the elements of a set. There are no restrictions on the values stored with those keys: they don't have to be immutable or unique. However, we can only look up entries by their keys, not by their values.
We create a new dictionary by putting key-value pairs inside curly braces with a colon between the two parts of each pair:
In [1]:
birthdays = {'Newton' : 1642, 'Darwin' : 1809}
The dictionary's keys are the strings 'Newton' and 'Darwin'. The value associated with 'Newton' is 1642, while the value associated with 'Darwin' is 1809. We can think of this as a two-column table:
Key | Value |
---|---|
'Newton' | 1642 |
'Darwin' | 1809 |
but it's important to remember that the entries aren't necessarily stored in this order (or any other specific order).
We can get the value associated with a key by putting the key in square brackets:
In [2]:
print birthdays['Newton']
This looks just like subscripting a string or list, except dictionary keys don't have to be integers—they can be strings, tuples, or any other immutable object. It's just like using a phonebook or a real dictionary: instead of looking things up by location using an integer index, we look things up by name.
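As a small sketch of which objects can serve as keys, the made-up dictionary `lookup` below mixes string, integer, and tuple keys, and then shows that a list is rejected because it is mutable. Each `print` takes a single parenthesized argument so that the code runs under both Python 2, which this lesson uses, and Python 3:

```python
# Hypothetical dictionary mixing several immutable key types.
lookup = {
    'Newton': 1642,            # string key
    1809: 'Darwin',            # integer key
    ('Alan', 'Turing'): 1912,  # tuple key
}

print(lookup['Newton'])
print(lookup[1809])
print(lookup[('Alan', 'Turing')])

# A list is mutable, so Python refuses to use it as a key.
try:
    lookup[['Florence', 'Nightingale']] = 1820
    list_key_accepted = True
except TypeError:
    list_key_accepted = False

print(list_key_accepted)
```

The failed assignment leaves the dictionary unchanged, so it still holds three entries.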
If we want to add another entry to a dictionary, we assign a value to the key, in the same way that we create a new variable in a program by assigning it a value:
In [3]:
birthdays['Turing'] = 1612
print birthdays
If the key is already in the dictionary, assignment replaces the value associated with it rather than adding another entry (since each key can appear at most once). Let's fix Turing's birthday by replacing 1612 with 1912:
In [4]:
birthdays['Turing'] = 1912
print birthdays
At this point, our dictionary holds three entries: 'Newton' with 1642, 'Darwin' with 1809, and 'Turing' with 1912.
Trying to get the value associated with a key that isn't in the dictionary is an error, just like trying to access a nonexistent variable or get an out-of-bounds element from a list. For example, let's try to look up Florence Nightingale's birthday:
In [5]:
print birthdays['Nightingale']
If we're not sure whether a key is in a dictionary or not, we can test for it using in:
In [6]:
print 'Nightingale' in birthdays
In [7]:
print 'Darwin' in birthdays
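One way to combine the membership test with a lookup is sketched below. The helper name `year_of` and the choice of returning `None` for a missing name are assumptions made for this illustration, not part of the lesson's code:

```python
birthdays = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

def year_of(name, table):
    '''Return the year stored for name, or None if name is not a key.'''
    if name in table:
        return table[name]
    return None

print(year_of('Darwin', birthdays))
print(year_of('Nightingale', birthdays))

# Looking up a missing key directly raises KeyError.
try:
    birthdays['Nightingale']
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```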
We can see how many entries are in the dictionary using len, and loop over the keys in a dictionary using for:
In [8]:
print len(birthdays)
for name in birthdays:
    print name, birthdays[name]
This is a little bit different from looping over a list. When we loop over a list we get the values in the list. When we loop over a dictionary, on the other hand, the loop gives us the keys, which we can use to look up the values.
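Because the keys come back in no guaranteed order, a loop that needs predictable output can sort them first. A minimal sketch, assuming the three single-name entries built up above and using the built-in `sorted` (the variable name `ordered_names` is chosen only for this example):

```python
birthdays = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

# sorted() returns a new list of the keys in alphabetical order.
ordered_names = sorted(birthdays)
for name in ordered_names:
    # Build one string so the output looks the same in Python 2 and 3.
    print('%s %d' % (name, birthdays[name]))
```

Sorting only affects the order in which the loop visits the keys; the dictionary itself is left as it was.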
We're now ready to count atoms. The main body of our program looks like this:
In [14]:
def main(filename):
    counts = count_atoms(filename)
    for atom in counts:
        print atom, counts[atom]
count_atoms reads atomic symbols from a file, one per line, and creates a dictionary of atomic symbols and counts. Once we have that dictionary, we use a loop like the one we just saw to print out its contents.
Here's the function that does the counting:
In [15]:
def count_atoms(filename):
    '''Count unique atoms, returning a dictionary.'''
    result = {}
    with open(filename, 'r') as reader:
        for line in reader:
            atom = line.strip()
            if atom not in result:
                result[atom] = 1
            else:
                result[atom] = result[atom] + 1
    return result
We start with a docstring to explain the function's purpose to whoever has to read it next. We then create an empty dictionary to fill with data, and use a loop to process the lines from the input file one by one. Notice that the empty dictionary is written {}: this is the "previous use" we referred to when explaining why an empty set had to be written set().
After stripping whitespace off the atom's symbol, we check to see if we've seen it before. If we haven't, we set its count to 1, because we've now seen that atom one time. If we have seen it before, we add one to the previous count and store that new value back in the dictionary. When the loop is done, we return the dictionary we have created.
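The same seen-before test can also be folded into one line with the dictionary method `get`, which returns a default value when the key is missing. This is an alternative to the `if`/`else` above rather than the lesson's own version, and the function name `count_symbols` and its list input are assumptions chosen so the sketch runs without a data file:

```python
def count_symbols(symbols):
    '''Count how often each symbol appears in a list of strings.'''
    result = {}
    for atom in symbols:
        # get() returns 0 the first time a symbol is seen.
        result[atom] = result.get(atom, 0) + 1
    return result

counts = count_symbols(['Na', 'Fe', 'Na'])
print(counts['Na'])
print(counts['Fe'])
```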
Let's watch this function in action. Before we read any data, our dictionary is empty. After we see 'Na' for the first time, our dictionary has one entry: its key is 'Na', and its value is 1. When we see 'Fe', we add another entry to the dictionary with that string as a key and 1 as a value. Finally, when we see 'Na' for the second time, we add one to its count.
Input | Dictionary |
---|---|
start | {} |
Na | {'Na' : 1} |
Fe | {'Na' : 1, 'Fe' : 1} |
Na | {'Na' : 2, 'Fe' : 1} |
In [16]:
main('some_atoms.txt')
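The contents of some_atoms.txt are not shown here, so the following sketch writes its own small input file before counting. The three-line sample data and the use of the standard `tempfile` module are assumptions for illustration; `count_atoms` is repeated from above so that the block runs on its own:

```python
import os
import tempfile

def count_atoms(filename):
    '''Count unique atoms, returning a dictionary.'''
    result = {}
    with open(filename, 'r') as reader:
        for line in reader:
            atom = line.strip()
            if atom not in result:
                result[atom] = 1
            else:
                result[atom] = result[atom] + 1
    return result

# Write a made-up sample file with one atomic symbol per line.
handle, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(handle, 'w') as writer:
    writer.write('Na\nFe\nNa\n')

counts = count_atoms(path)
os.remove(path)  # clean up the temporary file

# Sort the keys so the output order is predictable.
for atom in sorted(counts):
    print('%s %d' % (atom, counts[atom]))
```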
Just as we use tuples for multi-part entries in sets, we can use them for multi-part keys in dictionaries. For example, if we want to store the years in which scientists were born using their full names, we could do this:
In [17]:
birthdays = {
    ('Isaac', 'Newton') : 1642,
    ('Charles', 'Robert', 'Darwin') : 1809,
    ('Alan', 'Mathison', 'Turing') : 1912
}
print birthdays
If we do this, though, we always have to look things up by the full key: there is no way to ask the dictionary directly for all the entries whose keys contain the word 'Darwin', because a lookup compares the whole key and cannot match part of a tuple.
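A single lookup cannot do this, but a loop over every key can test each tuple in turn. A minimal sketch, in which the variable name `matches` is chosen only for this example:

```python
birthdays = {
    ('Isaac', 'Newton'): 1642,
    ('Charles', 'Robert', 'Darwin'): 1809,
    ('Alan', 'Mathison', 'Turing'): 1912,
}

# Collect every entry whose tuple key contains the string 'Darwin'.
matches = {}
for full_name in birthdays:
    if 'Darwin' in full_name:   # 'in' also works on tuples
        matches[full_name] = birthdays[full_name]

print(matches)
```

This examines every key, so it takes longer as the dictionary grows, unlike a direct lookup by the full key.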
If we think of a dictionary as a two-column table, it is occasionally useful to get one or the other column, i.e., just the keys or just the values:
In [18]:
all_keys = birthdays.keys()
print all_keys
In [20]:
all_values = birthdays.values()
print all_values
These methods should be used sparingly: the dictionary doesn't store the keys or values in a list, so each of these methods creates a new list as its result. In particular, we shouldn't loop over a dictionary's entries like this:
In [2]:
for key in birthdays.keys():
    print key, birthdays[key]
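The preferred form loops over the dictionary itself, which visits the same keys without first building a separate list. A short sketch, using a small single-name dictionary and a hypothetical list `seen` that records which keys the loop visited:

```python
birthdays = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

# Loop over the dictionary directly instead of calling keys().
seen = []
for key in birthdays:
    seen.append(key)
    print('%s %d' % (key, birthdays[key]))
```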
{k1:v1, k2:v2, ...} creates a dictionary.
dict[key] refers to the dictionary entry with a particular key.
key in dict tests whether a key is in a dictionary.
len(dict) returns the number of entries in a dictionary.
dict.keys() creates a list of the keys in a dictionary.
dict.values() creates a list of the values in a dictionary.