Sets and Dictionaries in Python: Sets (Learner Version)

Objectives

  • Explain why some programs that use lists become proportionally slower as data sizes increase.
  • Explain the three adjectives in "unordered collection of distinct values".
  • Use a set to eliminate duplicate values from data.

Lesson

  • Keeping track of unique atoms using a list:

In [1]:
def another_atom(seen, atom):
    for i in range(len(seen)):
        if seen[i] == atom:
            return # atom is already present, so do not re-add
    seen.append(atom)
  • Average time to check or insert grows in proportion to the number of distinct values V
  • So with N insertions, running time is NV/2
  • If most values are distinct, this is N^2/2
  • Better way to solve the problem is to use a set
  • An unordered collection of distinct values
  • Create a set:

In [1]:
primes = {3, 5, 7}

  • Must use set() to create an empty set instead of {}

In [2]:
even_primes = set()
  • Create some sets for examples

In [4]:
ten = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
lows = {0, 1, 2, 3, 4}
odds = {1, 3, 5, 7, 9}
  • Display values

In [6]:
print lows


set([0, 1, 2, 3, 4])
  • Usual set operations work as expected

In [7]:
print lows.union(odds)


set([0, 1, 2, 3, 4, 5, 7, 9])

In [8]:
print lows.intersection(odds)


set([1, 3])

In [9]:
print lows.difference(odds)


set([0, 2, 4])

In [10]:
print lows.symmetric_difference(odds)


set([0, 2, 4, 5, 7, 9])

In [11]:
print lows.issubset(ten)


True

In [12]:
print lows.issuperset(odds)


False

In [13]:
print len(odds)


5

In [15]:
print 6 in odds


False

In [16]:
lows.add(9)
print lows


set([0, 1, 2, 3, 4, 9])

In [17]:
lows.remove(0)
print lows


set([1, 2, 3, 4, 9])

In [18]:
lows.clear()
print lows


set([])
  • Re-initialize example sets

In [19]:
ten = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
lows = {0, 1, 2, 3, 4}
odds = {1, 3, 5, 7, 9}
  • Use operators instead of methods

In [20]:
print lows - odds # difference


set([0, 2, 4])

In [21]:
print lows & odds # intersection


set([1, 3])

In [22]:
print lows <= ten # subset


True

In [23]:
print lows < ten # strict subset


True

In [24]:
print lows >= ten # superset


False

In [25]:
print lows > ten # strict superset


False

In [26]:
print lows ^ odds # symmetric difference


set([0, 2, 4, 5, 7, 9])

In [27]:
print lows | odds # union


set([0, 1, 2, 3, 4, 5, 7, 9])

In [28]:
def unique_atoms(filename):
    atoms = set()
    with open(filename, 'r') as source:
        for line in source:
            name = line.strip()
            atoms.add(name)
    return atoms

In [2]:
!cat some_atoms.txt


Na
Fe
Na
Si
Pd
Na

In [30]:
print unique_atoms('some_atoms.txt')


set(['Na', 'Si', 'Fe', 'Pd'])
  • Can construct a set from anything that can be looped over

In [31]:
print set('lithium')


set(['i', 'h', 'm', 'l', 'u', 't'])
  • But why are letters out of order?

Key Points

  • Use sets to store distinct unique values.
  • Create sets using set() or {v1, v2, ...}.
  • Sets are mutable, i.e., they can be updated in place like lists.
  • A loop over a set produces each element once, in arbitrary order.
  • Use sets to find unique things.