Collections

Let's start with lists


In [ ]:
spam = ["eggs", 7.12345]  # This is a list, a comma-separated sequence of values between square brackets
print spam

In [ ]:
print type(spam)

In [ ]:
eggs = [spam,
        1.2345,
        "fooo"]           # No problem with multi-line declaration
print eggs
  • You can mix all kinds of types inside a list
    • Even other lists, of course

In [ ]:
spam = []  # And this is an empty list
print spam

What about tuples?


In [ ]:
spam = ("eggs", 7.12345)  # This is a tuple, a comma-separated sequence of values between parentheses
print spam

print type(spam)

In [ ]:
eggs = (spam,
        1.2345,
        "fooo")           # Again, no problem with multiline declaration
print eggs

In [ ]:
spam = ("eggs", )  # Single item tuple requires the comma
print spam

In [ ]:
# What will be the output of this?
spamOne = ("eggs")
print spamOne

In [ ]:
spam = "eggs",  # Actually, the comma alone is enough
print spam

In [ ]:
spam = "eggs", 7.12345  # This is called tuple packing
print spam

In [ ]:
val1, val2 = spam  # And this is the opposite, tuple unpacking
print val1
print val2

What about both together?


In [ ]:
spam = "spam"
eggs = "eggs"
eggs, spam = spam, eggs
print spam
print eggs
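
The swap above works because the right-hand side is packed into a tuple before any name is rebound. Here is a minimal sketch of the equivalent two steps; the name `packed` is only illustrative, and print is written with parentheses so the cell runs on both Python 2 and Python 3.

```python
spam = "spam"
eggs = "eggs"

# Step 1: the right-hand side is evaluated first and packed into a tuple
packed = (spam, eggs)       # 'packed' is an illustrative name

# Step 2: the tuple is unpacked into the names on the left-hand side
eggs, spam = packed

print(spam)
print(eggs)
```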

Let's go back to lists


In [ ]:
spam = ["eggs", 7.12345]
val1, val2 = spam         # Unpacking also works with lists (but packing always generates tuples)
print val1
print val2

In [ ]:
# And what about strings? Remember they are sequences too...

spam = "spam"
s, p, a, m = spam  # Unpacking even works with strings
print s
print p
print a
print m
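
Unpacking needs exactly as many target names as the sequence has items. A small sketch of what happens otherwise, assuming we just want to capture the error; `unpack_error` is an illustrative name, and the parenthesized print keeps it runnable on Python 2 and 3.

```python
spam = "spam"
unpack_error = None         # Illustrative name, keeps the exception for inspection

try:
    s, p, a = spam          # Four characters but only three target names
except ValueError as error:
    unpack_error = error
    print("Unpacking failed: %s" % error)
```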

str and unicode are also sequences

Python ordered sequence types (arrays in other languages, not linked lists):

  • They are arrays, not linked lists, so they have constant O(1) time for index access
  • list:
    • Comma-separated with square brackets
    • Mutable
    • Kind of dynamic array implementation (reserves space in advance)
      • Resizing is O(n)
      • Arbitrary insertion is O(n)
      • Appending is amortized O(1)
  • tuple:
    • Comma-separated with parentheses
    • Parentheses are only mandatory for the empty tuple and where a bare comma would be ambiguous (e.g. inside a function call)
    • Immutable
    • Slightly better traversal performance than lists
  • str and unicode:
    • Delimited by one or three single or double quotes
    • They have special methods
    • Immutable
  • Python also provides other built-in and standard library collection types:
    • set and frozenset: unordered, without repeated values (content must be hashable)
      • Highly performant in operations like intersection, union, difference, membership check
    • bytearray, buffer, xrange: special sequences for concrete use cases
    • collections module, with deque, namedtuple, Counter, OrderedDict and defaultdict
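
As a quick taste of the last bullets, here is a hedged sketch of sets and a few `collections` types. All names (`breakfast`, `lunch`, `Point`, `queue`, `letters`, `groups`) are made up for illustration, and `Counter` assumes Python 2.7 or later.

```python
from collections import Counter, defaultdict, deque, namedtuple

breakfast = set(["spam", "eggs", "spam"])     # Repeated values are dropped
lunch = frozenset(["eggs", "bacon"])          # Immutable (and hashable) variant

common = breakfast & lunch                    # Intersection
everything = breakfast | lunch                # Union
only_breakfast = breakfast - lunch            # Difference
print("eggs" in breakfast)                    # Membership check, O(1) on average

Point = namedtuple("Point", ["x", "y"])       # Tuple with named fields
origin = Point(x=0, y=0)

queue = deque([1, 2, 3])
queue.appendleft(0)                           # Cheap insertion at both ends

letters = Counter("spam and eggs")            # Counts repetitions of each item
groups = defaultdict(list)                    # Missing keys get a default value
groups["s"].append("spam")
```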

Let's play a bit with sequence operations


In [ ]:
spam = ["1st", "2nd", "3rd", "4th", "5th"]
eggs = (spam, 1.2345, "fooo")

In [ ]:
print "eggs" in spam
print "fooo" not in eggs
print "am" in "spam"      # Check items membership
print "spam".find("am")   # NOT recommended for membership

In [ ]:
print spam.count("1st")   # Count repetitions (slow)

In [ ]:
print spam + spam
print eggs + eggs
print "spam" + "eggs"  # Concatenation (shallow copy), must be of the same type

In [ ]:
print spam * 5
print eggs * 3
print "spam" * 3  # Also "multiply" creating shallow copies concatenated
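
"Shallow" matters as soon as the repeated items are mutable: multiplication repeats references, not copies. A minimal sketch of the pitfall, with illustrative names `row`, `grid` and `independent` (parenthesized print so it runs on Python 2 and 3):

```python
row = [0, 0]
grid = [row] * 3            # Three references to the SAME inner list
grid[0][0] = 1              # Modifying one "row"...
print(grid)                 # ...shows up in all of them

# Building each inner list separately avoids the sharing
independent = [[0, 0] for _ in range(3)]
independent[0][0] = 1
print(independent)
```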

In [ ]:
print len(spam)
print len(eggs)
print len("spam")  # Obtain its length

In [ ]:
# Let's obtain min and max values (slow)
print min([5, 6, 2])
print max("xyzw abcd XYZW ABCD")

In [ ]:
# Let's see how indexing works
spam = ["1st", "2nd", "3rd", "4th", "5th"]
eggs = (spam, 1.2345, "fooo")

In [ ]:
print spam[0]
print eggs[1]
print "spam"[2]  # Access by index, starting from 0 to length - 1, may raise an exception

In [ ]:
print spam[-1]
print eggs[-2]
print "spam"[-3]  # Access by index, even negative

In [ ]:
print eggs[0]
print eggs[0][0]
print eggs[0][0][-1]  # Chain index accesses

In [ ]:
# Let's see how slicing works
spam = ("1st", "2nd", "3rd", "4th", "5th")
print spam[1:3]                             # Use colon and a second index for slicing
print type(spam[1:4])                       # It generates a brand new object (shallow copy)

In [ ]:
spam = ["1st", "2nd", "3rd", "4th", "5th"]
print spam[:3]
print spam[1:7]
print spam[-2:7]                            # Negative indexes are also valid
print spam[3:-2]

In [ ]:
print spam[:]                               # Without indexes it performs a shallow copy

print spam[1:7:2]                           # Use another colon and a third int to specify the step

print spam[::2]
print spam[::-2]                             # A negative step traverses the sequence backwards

print spam[::-1]                             # Useful to reverse a sequence

When slicing, Python fills in and adjusts the indexes for you

  • No IndexError when a slicing index is out of range
  • Omitted start and stop indexes default to the beginning and the end of the sequence (swapped when the step is negative)
  • Step is 1 by default and does not need to be a multiple of the sequence length
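
The rules above can be checked with a short sketch like this one; the out-of-range value 100 is arbitrary, and print uses parentheses so the cell runs on Python 2 and 3.

```python
spam = ["1st", "2nd", "3rd", "4th", "5th"]

print(spam[3:100])          # Out-of-range stop is clipped to the end
print(spam[100:])           # Out-of-range start gives an empty list

try:
    spam[100]               # Plain indexing is NOT that forgiving
except IndexError as error:
    print("IndexError: %s" % error)

# Omitted indexes are equivalent to spelling out the full range
print(spam[:] == spam[0:len(spam):1])
```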

In [ ]:
# Let's try something different

spam = ["1st", "2nd", "3rd", "4th", "5th"]
spam[3] = 1
print spam                                  # Direct modification by index, may raise an exception
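
Item assignment only works on mutable sequences. A small sketch of what tuples and strings do instead; the `errors` list is just an illustrative way to collect the messages.

```python
spam = ("1st", "2nd", "3rd")
eggs = "eggs"
errors = []                 # Illustrative: collects the error messages

try:
    spam[0] = "first"       # Tuples are immutable
except TypeError as error:
    errors.append(str(error))

try:
    eggs[0] = "E"           # Strings are immutable too
except TypeError as error:
    errors.append(str(error))

print(len(errors))
```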

Let's see some slice modifications


In [ ]:
spam = [1, 2, 3, 4, 5]
eggs = ['a', 'b', 'c']
spam[1:3] = eggs
print spam              # We can use slicing here too!

In [ ]:
spam = [1, 2, 3, 4, 5, 6, 7, 8]
eggs = ['a', 'b', 'c']
spam[1:7:2] = eggs
print spam                       # We can use even slicing with step!!

In [ ]:
spam = [1, 2, 3, 4, 5]
spam.append("a")
print spam              # We can append an element at the end (amortized O(1))

In [ ]:
spam = [1, 2, 3, 4, 5]
eggs = ['a', 'b', 'c']
spam.extend(eggs)
print spam              # We can append the elements of another sequence at the end (amortized O(k) for k new elements)

In [ ]:
spam = [1, 2, 3, 4, 5]
eggs = ['a', 'b', 'c']
spam.append(eggs)
print spam              # Take care to not mix both commands!!

In [ ]:
spam = [1, 2, 3, 4, 5]
spam.insert(3, "a")
print spam              # The same as spam[3:3] = ["a"]

In [ ]:
spam = [1, 2, 3, 4, 5]
print spam.pop()
print spam              # Pop (remove and return) last item
print spam.pop(2)
print spam              # Pop (remove and return) the item at the given index

In [ ]:
spam = [1, 2, 3, 4, 5]
del spam[3]
print spam              # Delete an item

In [ ]:
spam = tuple([1, 2, 3, 4, 5, 6, 7, 8])
eggs = list(('a', 'b', 'c'))            # Shallow copy constructors
print spam
print eggs
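
"Shallow" means the constructor builds a new outer container but keeps references to the same items. A minimal sketch with an illustrative nested list called `inner`:

```python
inner = ["a", "b"]
spam = [inner, 1, 2]
eggs = list(spam)           # New outer list, same inner objects

eggs.append(3)              # Only affects the copy
inner.append("c")           # Visible through both, the inner list is shared

print(spam)
print(eggs)
print(spam[0] is eggs[0])
```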

DICTIONARIES


In [ ]:
spam = {"one": 1, "two": 2, "three": 3}  # This is a dictionary
print spam

In [ ]:
print type(spam)

In [ ]:
eggs = {1: "one",
        2: "two",
        3: "three"}  # Again, no problem with multiline declaration
print eggs

Still more ways to declare dictionaries


In [ ]:
spam = dict(one=1, two=2, three=3)  # Use keyword arguments (we will talk about them shortly)
print spam

In [ ]:
eggs = dict([(1, "one"), (2, "two"), (3, "three")])  # Sequence of two-element sequences (key and value)
print eggs                                           # Note that these tuples require the parentheses just to group

In [ ]:
spam = dict(eggs)  # Shallow copy constructor
print spam

Python mappings

  • dict:
    • Comma-separated key: value pairs between curly brackets (keys must be hashable, values can be any object)
    • Mutable
    • Unordered
    • Access by key
    • Heavily optimized:
      • Creation with n items is O(n)
      • Arbitrary access is O(1)
      • Adding a new key is amortized O(1)
  • dictview:
    • Dynamic view of a dictionary's data which is kept up to date
    • Improved in Py3k (especially the items, keys and values methods, which return views there)
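
The "hashable key" requirement is easy to see in practice. A small sketch, with made-up keys and values, showing which objects can be used as keys:

```python
spam = {}
spam[("one", 1)] = "tuple key"               # Tuples of hashable items are hashable
spam[frozenset([1, 2])] = "frozenset key"    # So are frozensets

try:
    spam[["one", 1]] = "list key"            # Lists are mutable, hence unhashable
except TypeError as error:
    print("TypeError: %s" % error)

print(len(spam))
```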

Let's play a bit with dictionaries


In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
print spam["two"]                        # Access by key, may raise an exception

In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
print "two" in spam                      # Check keys membership
print 2 not in spam                      # Check keys membership

In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
print spam.get("two")
print spam.get("four")
print spam.get("four", 4)                # Safer access by key, never raises KeyError, optional default value
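
For comparison, here is a sketch of what `get` saves you from writing by hand; `value` is an illustrative name, and print uses parentheses so the cell runs on Python 2 and 3.

```python
spam = {"one": 1, "two": 2, "three": 3}

try:
    value = spam["four"]             # Plain access raises KeyError for a missing key
except KeyError:
    value = None

print(value == spam.get("four"))     # get() returns None by default for missing keys
print(spam.get("four", 4))           # Or the given default value
print("four" in spam)                # get() never inserts the key
```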

In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
print spam.keys()                        # Retrieve keys list (copy) in arbitrary order
print spam.values()                      # Retrieve values list (copy) in arbitrary order
print spam.items()                       # Retrieve (key, value) pairs list (copy) in arbitrary order

Let's play a bit with in-place modifications of dict contents


In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
spam["two"] = 22                         # Set or replace a key value
spam["four"] = 44                        # Set or replace a key value
print spam

In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
print spam.popitem()                     # Pop (remove and return) an arbitrary (key, value) pair
print spam

In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
print spam.pop("two")                    # Pop (remove and return) given item, may raise an exception
print spam.pop("four", 4)                # Pop (remove and return) given item with optional default value
print spam

In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
eggs = {"three": 33, "four": 44}
spam.update(eggs)                        # Update dictionary with other dict content
print spam

In [ ]:
spam = {"one": 1, "two": 2, "three": 3}
eggs = {1: "one", 2: "two", 3: "three"}
spam.update(two=22, four=44)             # Like dict constructor, it accepts keyword arguments
eggs.update([(0, "ZERO"), (1, "ONE")])   # Like dict constructor, it accepts a sequence of pairs
print spam
print eggs