We can now solve Fan's original nanotech inventory problem. As explained in the introduction, our goal is to find out how many molecules of various kinds we can make using the atoms in our warehouse. The number of molecules of any particular type we can make is limited by the scarcest atom that molecule requires. For example, if we have five nitrogen atoms and ten hydrogen atoms, we can only make three ammonia molecules, because we need three hydrogen atoms for each.
The formulas for the molecules we know how to make are stored in a file like this:
In [11]:
!cat formulas-03.txt
and our inventory is stored in a file like this:
In [12]:
!cat inventory-03.txt
Let's start by reading in our inventory. It consists of pairs of strings and numbers, which by now should suggest using a dictionary for storage. The keys will be atomic symbols, and the values will be the number of atoms of that kind we currently have. If an atom isn't listed in our inventory, we'll assume that we don't have any.
What about the formulas for the molecules we know how to make? Once again, we want to use strings—the names of molecules—as indices, which suggests a dictionary. Each of its values will be something storing atomic symbols and the number of atoms of that type in the molecule—the same structure, in fact, that we're using for our inventory. The diagram below shows what this looks like in memory if the only molecules we know how to make are water and ammonia.
Finally, we'll store the results of our calculation in yet another dictionary, this one mapping the names of molecules to how many molecules of that kind we can make, and display like this:
helium 1
hydrogen 2
water 2
The main body of the program is straightforward. It reads in the input files, does our calculation, and prints the result:
In [13]:
def main(inventory_file, formula_file):
inventory = read_inventory(inventory_file)
formulas = read_formulas(formula_file)
counts = calculate_counts(inventory, formulas)
show_counts(counts)
Reading the inventory file is simple. We take each interesting line in the file, split it to get an atomic symbol and a count, and store them together in a dictionary:
In [14]:
def read_inventory(inventory_file):
result = {}
with open(inventory_file, 'r') as reader:
for line in reader:
name, count = line.strip().split()
result[name] = int(count)
return result
Let's test it:
In [15]:
print read_inventory('inventory-03.txt')
Our mistake was to forget that files can contain blank lines and comments. It's easy enough to modify the function to handle them, though it complicates the logic:
In [16]:
def read_inventory(inventory_file):
result = {}
with open(inventory_file, 'r') as reader:
for line in reader:
line = line.strip()
if (not line) or line.startswith('#'):
continue
name, count = line.split()
result[name] = int(count)
return result
print read_inventory('inventory-03.txt')
The next step is to read the files containing formulas. Since the file format is more complicated, the function is as well. In fact, it's complicated enough that we'll come back later and simplify it.
FIXME: diagram
In [17]:
def read_formulas(formula_file):
result = {}
with open(formula_file, 'r') as reader:
for line in reader:
line = line.strip()
if (not line) or line.startswith('#'):
continue
name, atoms = line.split(':')
name = name.strip()
atoms = atoms.strip().split()
formula = {}
for i in range(0, len(atoms), 2):
symbol = atoms[i].strip()
count = int(atoms[i+1])
formula[symbol] = count
result[name] = formula
return result
We start by creating a dictionary to hold our results. We then split each interesting line in the data file on the colon ':' to separate the molecule's name (which may contain spaces) from its formula. We then split the formulas into a list of strings. These alternate between atomic symbols and numbers, so in the inner loop, we move forward through those values two elements at a time, storing the atomic symbol and count in a dictionary. Once we're done, we store that dictionary as the value for the molecule name in the main dictionary. When we've processed all the lines, we return the final result. Here's a simple test:
In [18]:
print read_formulas('formulas-03.txt')
Now that we have all our data, it's time to calculate how many molecules
of each kind we can make. inventory
maps atomic symbols to counts, and
so does formulas[name]
, so let's loop over all the molecules we know
how to make and "divide" the inventory by each one:
In [19]:
def calculate_counts(inventory, formulas):
counts = {}
for name in formulas:
counts[name] = dict_divide(inventory, formulas[name])
return counts
We say we're "dividing" the inventory by each molecule because we're trying to find out how many of that molecule we can make without requiring more of any particular atom than we actually have. (By analogy, when we divide 11 by 3, we're trying to find out how many 3's we can make from 11.) The function that does the division is:
In [20]:
def dict_divide(inventory, molecule):
number = None
for atom in molecule:
required = molecule[atom]
available = inventory.get(atom, 0)
limit = available / required
if (number is None) or (limit < number):
number = limit
return number
This function loops over all the atoms in the molecule we're trying to build, see what limit the available inventory puts on us, and return the minimum of all those results. This function uses a few patterns that come up frequently in many kinds of programs:
None
, then test for that value inside the loop to make sure we
re-set it to a legal value the first time we have real data. In this
case, we could just as easily use -1 or some other impossible value
as an "uninitialized" flag for number
.molecule
, we know that we can
get the value stored in molecule[atom]
. However, that atom might
not be a key in inventory
, so we use inventory.get(atom, 0)
to
get either the stored value or a sensible default. In this case
zero, the sensible default is 0, because if the atom's symbol isn't
in the dictionary, we don't have any of it. This is our second
pattern.The last step in building our program is to show how many molecules of each kind we can make. We could just loop over our result dictionary, printing each molecule's name and the number of times we could make it, but let's put the results in alphabetical order to make it easier to read:
In [21]:
def show_counts(counts):
names = counts.keys()
names.sort()
for name in names:
print name, counts[name]
It's time to test our code. Let's start by using an empty inventory and a single formula:
In [22]:
!cat inventory-00.txt
In [23]:
!cat formulas-00.txt
In [24]:
main('inventory-00.txt', 'formulas-00.txt')
There's no output, which is what we expect. Let's add a formula but no atoms:
In [26]:
!cat formulas-01.txt
In [27]:
main('inventory-00.txt', 'formulas-01.txt')
and now an atom:
In [29]:
!cat inventory-01.txt
In [30]:
main('inventory-01.txt', 'formulas-01.txt')
That seems right as well. Let's add some hydrogen and another formula:
In [31]:
!cat inventory-02.txt
In [33]:
!cat formulas-02.txt
In [34]:
main('inventory-02.txt', 'formulas-02.txt')
The output doesn't change, which is correct. Our final test adds some oxygen:
In [35]:
!cat inventory-03.txt
In [36]:
main('inventory-03.txt', 'formulas-02.txt')
That's right too: we can make two water molecules (because we don't have enough hydrogen to pair with our three oxygen atoms).
There are quite a few other interesting tests still to run, but before we do that, we should clean up our code. Both of our input functions handle comments and blank lines the same way; let's put that code in a helper function:
In [37]:
def readlines(filename):
result = []
with open(filename, 'r') as reader:
for line in reader:
line = line.strip()
if line and (not line.startswith('#')):
result.append(line)
return result
If we convert read_inventory
to use it, the result is six lines long
instead of ten. More importantly, the logic of what we're doing is much
clearer:
In [38]:
def read_inventory(inventory_file):
result = {}
for line in readlines(inventory_file):
name, count = line.split()
result[name] = int(count)
return result
The converted version of read_formulas
is 15 lines instead of 19:
In [ ]:
def read_formulas(formula_file):
result = {}
for line in readlines(formula_file):
name, atoms = line.split(':')
name = name.strip()
atoms = atoms.strip().split()
formula = {}
for i in range(0, len(atoms), 2):
symbol = atoms[i].strip()
count = int(atoms[i+1])
formula[symbol] = count
result[name] = formula
return result
We can do better still by putting the code that handles atom/count pairs in a helper function of its own:
In [39]:
def read_formulas(formula_file):
result = {}
for line in readlines(formula_file):
name, atoms = line.split(':')
name = name.strip()
result[name] = make_formula(atoms)
return result
def make_formula(atoms):
formula = {}
atoms = atoms.strip().split()
for i in range(0, len(atoms), 2):
symbol = atoms[i].strip()
count = int(atoms[i+1])
formula[symbol] = count
return formula
This change has actually made the code slightly longer, but each
function now does one small job, and as a bonus, the code in
make_formula
(which is moderately complex) can now be tested on its
own. We're not done until we test our changes, though:
In [40]:
main('inventory-00.txt', 'formulas-00.txt')
In [41]:
main('inventory-01.txt', 'formulas-01.txt')
In [42]:
main('inventory-02.txt', 'formulas-02.txt')
In [43]:
main('inventory-03.txt', 'formulas-03.txt')
In fact, we could (and should) have tested right after refactoring read_inventory
, and only then refactored read_formulas
.