Lists are simultaneously among the most useful and most confusing data structures in Python. Why? Because of mutability.
Mutating. So what is that and why can't we use lists as keys in dictionaries? Let's see by creating some lists and "mutating" them. Though do be careful: mutations may be hazardous to your (program's) health!
In [ ]:
a = list(range(5))
print ("The list we created:", a, "of length", len(a))
b = list(range(6,10))
print ("The second list we created:", b, "of length", len(b))
a[1:3] = b # Line 7
print ("The first list after we changed a couple of elements is", a, "with length", len(a))
Interesting... We wanted to change two elements (line 7), but added four instead! Rather, we mutated list a with list b. "Mutability" means, simply, that we can change the structure of the object.
It's easy to change, or mutate, a list. Virtually any action on the list changes its structure, length, the order of its elements, or something else that is likely important. After mutation, it's the same list with the same variable name, but different content.
Hashing. Besides being mutable, objects may or may not be hashable.
As you should know by know, Python provides built-in collections, like dict and set, which provide "high-speed" access to their elements. The way they do that is through the __hash__() method. A hash function is one that takes an object as input and returns an (arbitrary) number. You can read more about hash functions on Wikipedia. If an object has a __hash__() method then it is hashable. We can check how it works trying to call hash() function on different data types:
In [ ]:
print ("hash of int(42) is", hash(42))
print ("hash of float(42.001) is", hash(42.001))
print ("hash of str('42') is", hash('42'))
try:
print ("hash of list(42) is", hash([42]))
except TypeError:
print("TypeError: unhashable type: 'list'")
We see that we can get a hash from int (exact int value), from float, and string (although hash for them is not as obvious as for int), but not for a list. You can see that trying to call hash() function on a list returns a
TypeError. That happens because list doesn't implement __hash__() method, which is a consequence of list mutability. Why? Because if a list can mutate, there isn't a way to create a unique mapping of a given list to a consistent value, because the list could change at any time. So, when you have some collection of elements in a list and you need to make it hashable (for example, to use a key for dict), turn it into tuple:
In [ ]:
print ("hash of tuple(42, '42') is", hash((42, '42')))
Aliasing. Another important concept is aliasing. Let's start with example:
In [ ]:
a = list(range(5))
print ("The list we created:", a)
b = a
print ("The second list we created:", b)
a[3] = 10
print ("The first list after changing it:", a)
print ("And the second list:", b)
What happened?
Apparently, when we create a new variable and assign it the name of another variable, Python does not create a new variable. Instead it creates an alias, which is a new name that points to the same object. Literally, both a and b point to the same list, so when we change the list using the name a, accessing it through the name b also reflects these changes.
To avoid such behavior we need to make a copy using, for instance, a copy constructor, like so:
In [ ]:
a = list(range(5))
print ("The list we created:", a)
b = a[:]
print ("The second list we created:", b)
a[3] = 10
print ("The first list after changing it:", a)
print ("And the second list:", b)
Almost all list methods in Python do not return a new list, but modify (or mutate) it. For that reason, if you have several aliases, all of them reflect the changes after list mutation.
In [ ]:
a = list(range(5))
print ("The list we created:", a)
b = a
print ("The second list we created:", b)
b.append(11)
a.extend([12, 13])
print ("The first list after mutation:", a)
print ("The second list after mutation:", b)
In [ ]:
a = list(range(5))
b = a + [11, 12]
print ("The list we created:", a)
print ("The second list we created:", b)
b.append(21)
a.extend([22, 23])
print ("The first list after mutation:", a)
print ("The second list after mutation:", b)
Exercise. Aliasing also happens at function call boundaries. Try to predict what the following code will do before you run it.
In [ ]:
def rem_sublist(L, i, j):
L[i:j] = []
a = list(range(10))
print(a)
rem_sublist(a, 2, 5)
print(a)
In [ ]: