Sets and Dictionaries in Python: Storage (Learner Version)

Objectives

  • Draw a diagram showing how hash tables are implemented, and correctly label the main parts.
  • Explain the purpose of a hash function.
  • Explain why using mutable values as keys in a hash table can cause problems.
  • Correctly identify the error messages Python produces when programs try to put mutable values in hashed data structures.
  • Explain the similarities and differences between tuples and lists.
  • Explain why using tuples is better than concatenating values when storing multi-part data in hashed data structures.

Lesson

  • Can add a string to a set:

In [1]:
things = set()
things.add("lithium")
print things


set(['lithium'])
  • But can't add list:

In [2]:
things.add([1, 2, 3])


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-5c34a9426fc7> in <module>()
----> 1 things.add([1, 2, 3])

TypeError: unhashable type: 'list'
  • Computer allocates a block of memory to store set
  • Then uses a hash function to determine where to store each item

  • If we put a reference to a list into a set, then change the list, the reference is in the wrong place

  • Can only happen with mutable values (like lists)
  • Can't affect immutable values (like numbers and strings)
  • Use tuples to store multi-part values

In [3]:
full_name = ('Charles', 'Darwin')
print full_name[0]


Charles

In [4]:
print len(full_name)


2
  • Cannot modify tuple after creation

In [6]:
full_name[0] = 'Erasmus'


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-a7488b670820> in <module>()
----> 1 full_name[0] = 'Erasmus'

TypeError: 'tuple' object does not support item assignment
  • So tuples can safely be added to sets

In [7]:
names = set()
names.add(('Charles', 'Darwin'))
print names


set([('Charles', 'Darwin')])

Key Points

  • Sets are stored in hash tables, which guarantee fast access for arbitrary values.
  • The values in sets must be immutable to prevent hash tables misplacing them.
  • Use tuples to store multi-part values in sets.