Sets and Dictionaries in Python: Nanotech Inventory (Learner Version)

Objectives

Create and manipulate nested dictionaries.
Explain the similarities and differences between nested dictionaries and nested lists.
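
As a rough illustration of the second objective, the sketch below stores the same two formulas both ways. The names `formulas_dict`, `formulas_list`, and `lookup_in_list` are invented for this example and are not part of the lesson's code.

```python
# Nested dictionaries: look values up by name at both levels.
formulas_dict = {
    'water': {'H': 2, 'O': 1},
    'helium': {'He': 1},
}

# Nested lists: the same data as [name, [[symbol, count], ...]] entries.
formulas_list = [
    ['water', [['H', 2], ['O', 1]]],
    ['helium', [['He', 1]]],
]

def lookup_in_list(formulas, molecule, symbol):
    """Hypothetical helper: search the nested lists for one atomic count."""
    for name, atoms in formulas:
        if name == molecule:
            for atom, count in atoms:
                if atom == symbol:
                    return count
    return None

# Both structures nest one collection inside another, but the dictionary
# version finds a value by key while the list version has to search.
hydrogen_from_dict = formulas_dict['water']['H']
hydrogen_from_list = lookup_in_list(formulas_list, 'water', 'H')
print(hydrogen_from_dict == hydrogen_from_list)
```

Both lookups return the same count; the difference is in how the value is reached.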

Lesson

How many molecules of various kinds can we make from the atoms in our warehouse?
Inventory stored in files like this:



In [7]:

    
!cat inventory-03.txt

# Atomic inventory file
He 1
H 4
O 3

Figure: Inventory in Memory

Formulas stored in files like this:



In [8]:

    
!cat formulas-03.txt

# Molecular formula file

helium : He 1
water : H 2 O 1
hydrogen : H 2

Figure: Formulas in Memory

Result is the name of each molecule and how many of it we can make:

helium 1
hydrogen 2
water 2
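
The counts above can be reproduced by hand: divide each available atom count by the count the formula requires, then take the smallest quotient. The sketch below does this arithmetic for two of the molecules, using in-memory dictionaries typed in from the files shown earlier; the variable names are assumptions made for illustration.

```python
inventory = {'He': 1, 'H': 4, 'O': 3}
formulas = {'water': {'H': 2, 'O': 1}, 'hydrogen': {'H': 2}}

results = {}
for name in formulas:
    formula = formulas[name]
    # Whole molecules each atom could support on its own.
    per_atom = [inventory[symbol] // formula[symbol] for symbol in formula]
    # The scarcest atom sets the limit.
    results[name] = min(per_atom)

print(results['water'])
print(results['hydrogen'])
```

Water is limited by hydrogen (4 // 2 = 2) rather than oxygen (3 // 1 = 3). Each molecule is computed against the full inventory, which is why water and hydrogen can both report 2.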

Main body of program reads files, calculates result, and prints it:



In [9]:

    
def main(inventory_file, formula_file):
    inventory = read_inventory(inventory_file)
    formulas = read_formulas(formula_file)
    counts = calculate_counts(inventory, formulas)
    show_counts(counts)

Inventory format is simpler than formula format, so write that function first



In [10]:

    
def read_inventory(inventory_file):
    result = {}
    with open(inventory_file, 'r') as reader:
        for line in reader:
            name, count = line.strip().split()
            result[name] = int(count)
    return result

And test:



In [11]:

    
print read_inventory('inventory-03.txt')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-c05b7b912bfb> in <module>()
----> 1 print read_inventory('inventory-03.txt')

<ipython-input-10-d5dd028eb45b> in read_inventory(inventory_file)
      3     with open(inventory_file, 'r') as reader:
      4         for line in reader:
----> 5             name, count = line.strip().split()
      6             result[name] = int(count)
      7     return result

ValueError: too many values to unpack

Whoops: forgot to skip comments and blank lines
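
To see where the error comes from, it may help to count the fields that `split()` produces for each kind of line. The list of sample lines below is made up for illustration.

```python
lines = ['# Atomic inventory file', 'He 1', '']

field_counts = []
for line in lines:
    fields = line.strip().split()
    field_counts.append(len(fields))

# Only a line with exactly two fields can be unpacked into name, count.
print(field_counts)
```

The comment line splits into four fields (too many to unpack), and a blank line splits into none (too few).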



In [12]:

    
def read_inventory(inventory_file):
    result = {}
    with open(inventory_file, 'r') as reader:
        for line in reader:
            line = line.strip()
            if (not line) or line.startswith('#'):
                continue
            name, count = line.split()
            result[name] = int(count)
    return result

print read_inventory('inventory-03.txt')

{'H': 4, 'O': 3, 'He': 1}

Now read formulas (taking blank lines and comments into account)
Function is complex enough that we'll come back later and simplify it



In [13]:

    
def read_formulas(formula_file):
    result = {}
    with open(formula_file, 'r') as reader:
        for line in reader:
            line = line.strip()
            if (not line) or line.startswith('#'):
                continue
            name, atoms = line.split(':')
            name = name.strip()
            atoms = atoms.strip().split()
    
            formula = {}
            for i in range(0, len(atoms), 2):
                symbol = atoms[i].strip()
                count = int(atoms[i+1])
                formula[symbol] = count
            result[name] = formula

    return result

And test:



In [14]:

    
print read_formulas('formulas-03.txt')

{'water': {'H': 2, 'O': 1}, 'hydrogen': {'H': 2}, 'helium': {'He': 1}}
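
As a step-by-step trace of how one line becomes one entry in the nested dictionary, the sketch below applies the same operations to a single hard-coded line. The variable `nested` is introduced only for this illustration.

```python
line = 'water : H 2 O 1'

name, atoms = line.split(':')   # 'water ' and ' H 2 O 1'
name = name.strip()
atoms = atoms.strip().split()   # ['H', '2', 'O', '1']

# Even positions hold symbols; the following odd positions hold counts.
formula = {}
for i in range(0, len(atoms), 2):
    formula[atoms[i]] = int(atoms[i + 1])

# The outer dictionary maps the molecule name to its inner dictionary.
nested = {name: formula}
print(nested['water']['O'])
```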

Calculate how many molecules of each kind we could make given available inventory
- Again, calculate for each molecule independently (atoms used for one molecule are not subtracted from what is available for the others)



In [15]:

    
def calculate_counts(inventory, formulas):
    counts = {}
    for name in formulas:
        counts[name] = dict_divide(inventory, formulas[name])
    return counts

And write helper function:



In [16]:

    
def dict_divide(inventory, molecule):
    number = None
    for atom in molecule:
        required = molecule[atom]
        available = inventory.get(atom, 0)
        # Floor division: only whole molecules count
        limit = available // required
        if (number is None) or (limit < number):
            number = limit
    return number

Again, initialize with None rather than some arbitrary large value, so the first limit seen always replaces it
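
To see why an arbitrary large starting value is risky, the sketch below compares two ways of finding a smallest value. Both function names and the sentinel of 1000 are hypothetical, chosen only to make the failure visible.

```python
def smallest_with_sentinel(values, sentinel=1000):
    """Hypothetical: start from an arbitrary 'large' value."""
    smallest = sentinel
    for v in values:
        if v < smallest:
            smallest = v
    return smallest

def smallest_with_none(values):
    """Start from None, meaning 'nothing seen yet'."""
    smallest = None
    for v in values:
        if (smallest is None) or (v < smallest):
            smallest = v
    return smallest

limits = [5000, 2500]
print(smallest_with_sentinel(limits))  # capped at the sentinel
print(smallest_with_none(limits))
```

With a well-stocked warehouse, the sentinel version reports 1000 even though the true limit is 2500. The None version has no such ceiling, but it returns None for empty input, so callers need to be ready for that.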

Finally, display counts in alphabetical order:



In [17]:

    
def show_counts(counts):
    # sorted() returns the names in alphabetical order
    for name in sorted(counts):
        print name, counts[name]
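
Because this function prints directly, its output is awkward to check automatically. One possible variant, sketched below under the hypothetical name `format_counts`, builds the lines first so they can be compared against expected values and printed afterwards.

```python
def format_counts(counts):
    """Hypothetical variant of show_counts: return lines instead of printing."""
    lines = []
    for name in sorted(counts):
        lines.append('{0} {1}'.format(name, counts[name]))
    return lines

counts = {'water': 2, 'helium': 1, 'hydrogen': 2}
for line in format_counts(counts):
    print(line)
```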

Simplest test we could do is no inventory and no formulas



In [18]:

    
!cat inventory-00.txt



In [19]:

    
!cat formulas-00.txt



In [20]:

    
main('inventory-00.txt', 'formulas-00.txt')

No output is correct, but hardly reassuring
What about no inventory, and one formula?



In [21]:

    
!cat formulas-01.txt

helium : He 1



In [22]:

    
main('inventory-00.txt', 'formulas-01.txt')

helium 0

That's encouraging
Try some inventory



In [23]:

    
!cat inventory-01.txt



In [24]:

    
main('inventory-01.txt', 'formulas-01.txt')

helium 1

Now something more complex



In [25]:

    
!cat inventory-02.txt



In [26]:

    
!cat formulas-02.txt

helium : He 1
water : H 2 O 1



In [27]:

    
main('inventory-02.txt', 'formulas-02.txt')

helium 1
water 0

Use the inventory file that included some oxygen



In [28]:

    
!cat inventory-03.txt

# Atomic inventory file
He 1
H 4
O 3



In [29]:

    
main('inventory-03.txt', 'formulas-02.txt')

helium 1
water 2
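
Some of the manual checks above could also be written as assertions on in-memory dictionaries, so they can be re-run without reading the printed output. This is only a sketch: the dictionaries are typed in from the file contents shown earlier, `formulas_02` and `inventory_03` are names made up here, and the division is written as `//` so the result is a whole number regardless of Python version.

```python
def dict_divide(inventory, molecule):
    number = None
    for atom in molecule:
        required = molecule[atom]
        available = inventory.get(atom, 0)
        limit = available // required   # whole molecules only
        if (number is None) or (limit < number):
            number = limit
    return number

def calculate_counts(inventory, formulas):
    counts = {}
    for name in formulas:
        counts[name] = dict_divide(inventory, formulas[name])
    return counts

formulas_02 = {'helium': {'He': 1}, 'water': {'H': 2, 'O': 1}}
inventory_03 = {'He': 1, 'H': 4, 'O': 3}

# No inventory and no formulas: nothing to report.
assert calculate_counts({}, {}) == {}
# No inventory, one formula: zero molecules.
assert calculate_counts({}, {'helium': {'He': 1}}) == {'helium': 0}
# Inventory that includes oxygen.
assert calculate_counts(inventory_03, formulas_02) == {'helium': 1, 'water': 2}
print('all checks passed')
```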

Now refactor
Write a single function to read interesting lines from files
Then rewrite read_inventory and read_formulas to use it



In [33]:

    
def readlines(filename):
    result = []
    with open(filename, 'r') as reader:
        for line in reader:
            line = line.strip()
            if line and (not line.startswith('#')):
                result.append(line)
    return result



In [34]:

    
def read_inventory(inventory_file):
    result = {}
    for line in readlines(inventory_file):
        name, count = line.split()
        result[name] = int(count)
    return result

6 lines instead of 10
And only one level of nesting instead of two



In [ ]:

    
def read_formulas(formula_file):
    result = {}
    for line in readlines(formula_file):
        name, atoms = line.split(':')
        name = name.strip()
        atoms = atoms.strip().split()

        formula = {}
        for i in range(0, len(atoms), 2):
            symbol = atoms[i].strip()
            count = int(atoms[i+1])
            formula[symbol] = count
        result[name] = formula

    return result

15 lines instead of 19, and all of those lines are devoted to reading meaningful content
We can do better still



In [ ]:

    
def read_formulas(formula_file):
    result = {}
    for line in readlines(formula_file):
        name, atoms = line.split(':')
        name = name.strip()
        result[name] = make_formula(atoms)
    return result

def make_formula(atoms):
    formula = {}
    atoms = atoms.strip().split()
    for i in range(0, len(atoms), 2):
        symbol = atoms[i].strip()
        count = int(atoms[i+1])
        formula[symbol] = count
    return formula

16 lines instead of 15, but each function does one job with one level of nesting
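
If the index arithmetic in `make_formula` feels fiddly, one alternative is to pair symbols with counts using slices and `zip`. The function name `make_formula_zip` is hypothetical, and this is a sketch of one option rather than the lesson's approach.

```python
def make_formula_zip(atoms):
    """Hypothetical alternative to make_formula using slices and zip."""
    fields = atoms.strip().split()
    symbols = fields[0::2]   # every other field, starting with the first
    counts = fields[1::2]    # every other field, starting with the second
    formula = {}
    for symbol, count in zip(symbols, counts):
        formula[symbol] = int(count)
    return formula

print(make_formula_zip(' H 2 O 1') == {'H': 2, 'O': 1})
```

One trade-off is that `zip` stops at the shorter sequence, so a malformed line such as `H 2 O` silently loses the trailing symbol. The index-based loop would raise an IndexError instead.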

Now test



In [35]:

    
main('inventory-00.txt', 'formulas-00.txt')



In [36]:

    
main('inventory-01.txt', 'formulas-01.txt')

helium 1



In [37]:

    
main('inventory-02.txt', 'formulas-02.txt')

helium 1
water 0



In [38]:

    
main('inventory-03.txt', 'formulas-03.txt')

helium 1
hydrogen 2
water 2

Could have (should have) tested right after refactoring read_inventory
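
One way to act on this is to keep the old version around briefly and compare its result with the refactored one on the same input. The sketch below does that with a throwaway temporary file; the `_original` and `_refactored` suffixes are added here only to let both versions coexist.

```python
import os
import tempfile

def read_inventory_original(inventory_file):
    result = {}
    with open(inventory_file, 'r') as reader:
        for line in reader:
            line = line.strip()
            if (not line) or line.startswith('#'):
                continue
            name, count = line.split()
            result[name] = int(count)
    return result

def readlines(filename):
    result = []
    with open(filename, 'r') as reader:
        for line in reader:
            line = line.strip()
            if line and (not line.startswith('#')):
                result.append(line)
    return result

def read_inventory_refactored(inventory_file):
    result = {}
    for line in readlines(inventory_file):
        name, count = line.split()
        result[name] = int(count)
    return result

# Write a small inventory file, read it both ways, then clean up.
handle, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(handle, 'w') as writer:
    writer.write('# Atomic inventory file\n\nHe 1\nH 4\nO 3\n')

before = read_inventory_original(path)
after = read_inventory_refactored(path)
os.remove(path)

assert before == after, 'refactoring changed the result'
print(before == after)
```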

Key Points

Whenever names are used to label things, consider using dictionaries to store them.
Use nested dictionaries to store hierarchical values (like molecule names and atomic counts).
Get it right, then refactor to make each part simple.
Test after each refactoring step.