Fan Fullerene has just joined Molecules'R'Us, a nanotechnology startup that fabricates molecules using only the highest quality atoms. His first job is to build a simple inventory management system that compares incoming orders for molecules to the stock of atoms in the company's supercooled warehouse to see how many of those molecules we can build. For example, if the warehouse holds 20 hydrogen atoms, 5 oxygen atoms, and 11 nitrogen atoms, Fan could make 10 water molecules (H2O) or 6 ammonia molecules (NH3), but could not make any methane (CH4) because there isn't any carbon.
Fan could solve this problem using the tools we've seen so far. As we'll see, though, it's a lot more efficient to do it using a different data structure. And "efficient" means both "takes less programmer time to create" and "takes less computer time to execute": the data structures introduced in this chapter are both simpler to use and faster than the lists most programmers are introduced to first.
The ostensible goal of this set of lessons is to introduce learners to non-linear data structures. Most have ony ever seen arrays or lists, i.e., things that are accessed using sequential numeric indices. Sets and dictionaries are usually their first exposure to accessing content by value rather than by location, and to the bigger idea that there are lots of other data structures they might want to learn about. (Unfortunately, there still isn't a good data structure handbook for Python programmers that we can point them at.)
These lessons also introduce JSON as a general-purpose data format that requires less effort to work with than flat text or CSV. We discuss its shortcomings as well as its benefits to help learners see what forces are at play when designing and/or choosing data representations.
Finally, these lessons are also our first chance to introduce the idea of computational complexity via back-of-the-envelope calculations of how the number of steps required to look things up in an unordered list grows with the number of things being looked up. We return to this idea in the extended example of invasion percolation, and to the notion that algorithmic improvements help more than tuning code, but this is a chance to touch on the idea in classes that don't get to that example. The discussion of hash tables is also good preparation for the discussion of relational databases, but isn't required.
Everything in this lesson except the final example on phylogenetic trees can be covered in two hours, assuming that only three short exercises are given (one for sets, one for basic dictionary operations, and one related to aggregation).
Basic data types (strings, numbers, lists); loops; file I/O; conditionals; string operations; references and aliasing; creating functions; top-down development.
FIXME: add links
Every programmer meets lists (or arrays or matrices) early in her career. Many in science never meet sets and dictionaries, and that's a shame: they often make programs easier to write and faster to run at the same time.
Before we leave this topic, try running the function globals at an
interactive Python prompt:
In [1]:
globals()
Out[1]:
That's right—Python actually stores the program's variables in a dictionary. In fact, it uses one dictionary for the global variables and one for each currently-active function call:
In [3]:
def example(first, second):
print 'locals in example:', locals()
example(22, 33)
You now know everything you need to know in order to build a programming language of your own. But please don't: the world will be much better off if you keep doing science instead.