Sets and Dictionaries in Python: Nanotech Inventory (Learner Version)

Objectives

  • Create and manipulate nested dictionaries.
  • Explain the similarities and differences between nested dictionaries and nested lists.
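
  • A minimal sketch of the second objective, using made-up data rather than anything from the lesson's files. The names formulas_as_dicts, formulas_as_lists, and lookup are hypothetical. Both structures nest one collection inside another; the difference is whether values are reached by key or by searching.

```python
# Hypothetical data, invented for illustration only.
# Nested dictionary: look up by molecule name, then by atomic symbol.
formulas_as_dicts = {
    'water': {'H': 2, 'O': 1},
    'helium': {'He': 1},
}

# Nested list: the same information as [name, [[symbol, count], ...]] entries.
formulas_as_lists = [
    ['water', [['H', 2], ['O', 1]]],
    ['helium', [['He', 1]]],
]

# Dictionary lookup goes straight to the value by key.
hydrogen_in_water = formulas_as_dicts['water']['H']

# List lookup has to search for the matching labels.
def lookup(nested, molecule, symbol):
    for name, atoms in nested:
        if name == molecule:
            for atom, count in atoms:
                if atom == symbol:
                    return count
    return None

assert hydrogen_in_water == lookup(formulas_as_lists, 'water', 'H') == 2
```

  • The nested list keeps its entries in a fixed order; the nested dictionary trades that for direct access by name.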

Lesson

  • How many molecules of various kinds can we make from the atoms in our warehouse?
  • Inventory stored in files like this:

In [7]:
!cat inventory-03.txt


# Atomic inventory file
He 1
H 4
O 3

  • Formulas stored in files like this:

In [8]:
!cat formulas-03.txt


# Molecular formula file

helium : He 1
water : H 2 O 1
hydrogen : H 2

  • Result is molecule names and counts:

helium 1
hydrogen 2
water 2

  • Main body of program reads files, calculates result, and prints it:

In [9]:
def main(inventory_file, formula_file):
    inventory = read_inventory(inventory_file)
    formulas = read_formulas(formula_file)
    counts = calculate_counts(inventory, formulas)
    show_counts(counts)
  • Inventory format is simpler than formula format, so write that function first

In [10]:
def read_inventory(inventory_file):
    result = {}
    with open(inventory_file, 'r') as reader:
        for line in reader:
            name, count = line.strip().split()
            result[name] = int(count)
    return result
  • And test:

In [11]:
print read_inventory('inventory-03.txt')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-c05b7b912bfb> in <module>()
----> 1 print read_inventory('inventory-03.txt')

<ipython-input-10-d5dd028eb45b> in read_inventory(inventory_file)
      3     with open(inventory_file, 'r') as reader:
      4         for line in reader:
----> 5             name, count = line.strip().split()
      6             result[name] = int(count)
      7     return result

ValueError: too many values to unpack
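  • Whoops: read_inventory does not skip comments or blank lines, so the comment line splits into more than two fields and cannot be unpacked into name and count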

In [12]:
def read_inventory(inventory_file):
    result = {}
    with open(inventory_file, 'r') as reader:
        for line in reader:
            line = line.strip()
            if (not line) or line.startswith('#'):
                continue
            name, count = line.split()
            result[name] = int(count)
    return result

print read_inventory('inventory-03.txt')


{'H': 4, 'O': 3, 'He': 1}
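
  • The earlier ValueError can be reproduced without any files. This is a small sketch, not part of the original notebook; the variable names are chosen for illustration.

```python
comment_line = '# Atomic inventory file'
data_line = 'He 1'

# A data line splits into exactly two fields, so unpacking works.
name, count = data_line.strip().split()

# The comment line splits into four fields, so unpacking into two names fails.
fields = comment_line.strip().split()
try:
    name2, count2 = fields
    unpack_failed = False
except ValueError:
    unpack_failed = True

# A blank line splits into zero fields, which would also fail to unpack.
blank_fields = ''.strip().split()
```

  • Skipping both kinds of line before splitting, as the corrected function does, avoids the problem.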
  • Now read formulas (taking blank lines and comments into account)
  • Function is complex enough that we'll come back later and simplify it

In [13]:
def read_formulas(formula_file):
    result = {}
    with open(formula_file, 'r') as reader:
        for line in reader:
            line = line.strip()
            if (not line) or line.startswith('#'):
                continue
            name, atoms = line.split(':')
            name = name.strip()
            atoms = atoms.strip().split()
    
            formula = {}
            for i in range(0, len(atoms), 2):
                symbol = atoms[i].strip()
                count = int(atoms[i+1])
                formula[symbol] = count
            result[name] = formula

    return result
  • And test:

In [14]:
print read_formulas('formulas-03.txt')


{'water': {'H': 2, 'O': 1}, 'hydrogen': {'H': 2}, 'helium': {'He': 1}}
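
  • To see how the inner loop pairs symbols with counts, here is the same parsing applied to a single line, step by step. It is a sketch for tracing only; the input string is copied from the formulas file shown earlier.

```python
line = 'water : H 2 O 1'           # one line from the formulas file

name, atoms = line.split(':')      # 'water ' and ' H 2 O 1'
name = name.strip()                # 'water'
atoms = atoms.strip().split()      # ['H', '2', 'O', '1']

formula = {}
for i in range(0, len(atoms), 2):  # i takes the values 0 and 2
    symbol = atoms[i]              # even positions hold symbols
    count = int(atoms[i + 1])      # odd positions hold counts
    formula[symbol] = count
```

  • Stepping by 2 works because every symbol is followed by exactly one count.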
  • Calculate how many molecules of each kind we could make given available inventory
    • Again, calculate for each molecule independently

In [15]:
def calculate_counts(inventory, formulas):
    counts = {}
    for name in formulas:
        counts[name] = dict_divide(inventory, formulas[name])
    return counts
  • And write helper function:

In [16]:
def dict_divide(inventory, molecule):
    number = None
    for atom in molecule:
        required = molecule[atom]
        available = inventory.get(atom, 0)
        limit = available // required  # floor division: whole molecules only
        if (number is None) or (limit < number):
            number = limit
    return number
  • Again, initializing with None rather than some arbitrary large value, so the first limit computed always replaces it
  • Finally, display counts in alphabetical order:

In [17]:
def show_counts(counts):
    names = sorted(counts)  # keys in alphabetical order
    for name in names:
        print name, counts[name]
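
  • The cells in this lesson use Python 2 syntax (print is a statement). For readers on Python 3, the display step might be sketched as below. The helper format_counts is hypothetical and not part of the lesson; it returns the lines instead of printing them so the ordering can be checked.

```python
def format_counts(counts):
    """Return one 'name count' string per molecule, in alphabetical order."""
    return ['{0} {1}'.format(name, counts[name]) for name in sorted(counts)]

def show_counts(counts):
    # Print each formatted line; print() with one argument also works in Python 2.
    for line in format_counts(counts):
        print(line)

show_counts({'water': 2, 'helium': 1, 'hydrogen': 2})
```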
  • Simplest test we could do is no inventory and no formulas

In [18]:
!cat inventory-00.txt

In [19]:
!cat formulas-00.txt

In [20]:
main('inventory-00.txt', 'formulas-00.txt')
  • No output is correct, but hardly reassuring
  • What about no inventory, and one formula?

In [21]:
!cat formulas-01.txt


helium : He 1

In [22]:
main('inventory-00.txt', 'formulas-01.txt')


helium 0
  • That's encouraging
  • Try some inventory

In [23]:
!cat inventory-01.txt


He 1

In [24]:
main('inventory-01.txt', 'formulas-01.txt')


helium 1
  • Now something more complex

In [25]:
!cat inventory-02.txt


He 1
H 4

In [26]:
!cat formulas-02.txt


helium : He 1
water : H 2 O 1

In [27]:
main('inventory-02.txt', 'formulas-02.txt')


helium 1
water 0
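
  • The zero for water comes from the missing oxygen entry: inventory.get(atom, 0) reports 0 for an atom that is not in stock instead of raising an error. A short sketch of that behavior, with the dictionaries typed in by hand to match the files above:

```python
inventory = {'He': 1, 'H': 4}      # contents of inventory-02.txt
water = {'H': 2, 'O': 1}           # formula for water

# Indexing a missing key raises KeyError...
try:
    inventory['O']
    missing_key_raises = False
except KeyError:
    missing_key_raises = True

# ...while get() with a default quietly reports zero atoms in stock.
available_oxygen = inventory.get('O', 0)

# Zero oxygen atoms allow zero water molecules, however much hydrogen there is.
limits = {atom: inventory.get(atom, 0) // water[atom] for atom in water}
```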
  • Use the inventory file that included some oxygen

In [28]:
!cat inventory-03.txt


# Atomic inventory file
He 1
H 4
O 3

In [29]:
main('inventory-03.txt', 'formulas-02.txt')


helium 1
water 2
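
  • Why 2 and not 3? The scarcest atom sets the limit. The sketch below writes the same calculation with min(); it is an alternative formulation for illustration, not the lesson's dict_divide, and limiting_count is a hypothetical name.

```python
def limiting_count(inventory, molecule):
    """Largest number of molecules the inventory allows (hypothetical helper)."""
    # One limit per atom: atoms available divided by atoms needed, rounded down.
    limits = [inventory.get(atom, 0) // needed
              for atom, needed in molecule.items()]
    # The scarcest atom decides; an empty formula has no limit, so return None.
    return min(limits) if limits else None

inventory = {'He': 1, 'H': 4, 'O': 3}
water = {'H': 2, 'O': 1}

# H allows 4 // 2 = 2 molecules, O allows 3 // 1 = 3, so hydrogen is the limit.
water_count = limiting_count(inventory, water)
```

  • Returning None for an empty formula mirrors what dict_divide does when its loop never runs.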
  • Now refactor
  • Write a single function to read interesting lines from files
  • Then rewrite read_inventory and read_formulas to use it

In [33]:
def readlines(filename):
    result = []
    with open(filename, 'r') as reader:
        for line in reader:
            line = line.strip()
            if line and (not line.startswith('#')):
                result.append(line)
    return result

In [34]:
def read_inventory(inventory_file):
    result = {}
    for line in readlines(inventory_file):
        name, count = line.split()
        result[name] = int(count)
    return result
  • 6 lines instead of 10
  • And only one level of nesting instead of two
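
  • A quick check of the refactored reader at this point might look like the sketch below. It repeats the two functions so that it runs on its own, and writes a throwaway file with tempfile instead of relying on inventory-03.txt being present; that choice is an assumption made for the example.

```python
import os
import tempfile

def readlines(filename):
    result = []
    with open(filename, 'r') as reader:
        for line in reader:
            line = line.strip()
            if line and (not line.startswith('#')):
                result.append(line)
    return result

def read_inventory(inventory_file):
    result = {}
    for line in readlines(inventory_file):
        name, count = line.split()
        result[name] = int(count)
    return result

# Write a throwaway file with a comment, a blank line, and three entries.
handle, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(handle, 'w') as writer:
    writer.write('# Atomic inventory file\n\nHe 1\nH 4\nO 3\n')

try:
    inventory = read_inventory(path)
finally:
    os.remove(path)  # clean up even if reading fails

# Same dictionary as the version before refactoring produced.
assert inventory == {'He': 1, 'H': 4, 'O': 3}
```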

In [ ]:
def read_formulas(formula_file):
    result = {}
    for line in readlines(formula_file):
        name, atoms = line.split(':')
        name = name.strip()
        atoms = atoms.strip().split()

        formula = {}
        for i in range(0, len(atoms), 2):
            symbol = atoms[i].strip()
            count = int(atoms[i+1])
            formula[symbol] = count
        result[name] = formula

    return result
  • 15 lines instead of 19, and all of those lines now deal with meaningful content rather than skipping blanks and comments
  • We can do better still

In [ ]:
def read_formulas(formula_file):
    result = {}
    for line in readlines(formula_file):
        name, atoms = line.split(':')
        name = name.strip()
        result[name] = make_formula(atoms)
    return result

def make_formula(atoms):
    formula = {}
    atoms = atoms.strip().split()
    for i in range(0, len(atoms), 2):
        symbol = atoms[i].strip()
        count = int(atoms[i+1])
        formula[symbol] = count
    return formula
  • 16 lines instead of 15, but each function does one job with one level of nesting
  • Now test

In [35]:
main('inventory-00.txt', 'formulas-00.txt')

In [36]:
main('inventory-01.txt', 'formulas-01.txt')


helium 1

In [37]:
main('inventory-02.txt', 'formulas-02.txt')


helium 1
water 0

In [38]:
main('inventory-03.txt', 'formulas-03.txt')


helium 1
hydrogen 2
water 2
  • Could have (should have) tested right after refactoring read_inventory
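
  • One benefit of splitting out make_formula is that it takes a plain string, so it can be checked without any files. A sketch of such checks, with the function repeated so the block runs on its own; the malformed input at the end is an invented case, not one from the lesson.

```python
def make_formula(atoms):
    formula = {}
    atoms = atoms.strip().split()
    for i in range(0, len(atoms), 2):
        symbol = atoms[i].strip()
        count = int(atoms[i+1])
        formula[symbol] = count
    return formula

# Each check exercises one case; none of them touches the file system.
assert make_formula(' He 1') == {'He': 1}
assert make_formula(' H 2 O 1') == {'H': 2, 'O': 1}
assert make_formula('') == {}

# A symbol with no count is malformed input; this version raises IndexError.
try:
    make_formula('H 2 O')
    rejected = False
except IndexError:
    rejected = True
```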

Key Points

  • Whenever names are used to label things, consider using dictionaries to store them.
  • Use nested dictionaries to store hierarchical values (like molecule names and atomic counts).
  • Get it right, then refactor to make each part simple.
  • Test after each refactoring step.