Data aliasing

One of the trickiest things about programming is figuring out exactly what data a variable refers to. Remember that we use names like data and salary to represent memory cells holding data values. The names are easier to remember than the physical memory addresses, but we can get fooled. For example, it's obvious that two variables x and y can both have the same integer value 7:


In [1]:
x = y = 7
print(x,y)


7 7

But did you know that they both refer to the same 7 object? In other words, variables in Python are always references, or pointers, to data, so the variables do not technically hold the value. Pointers are like phone numbers that "point at" phones, but a phone number is not the phone itself.

We can uncover this secret level of indirection using the built-in id(x) function, which returns the identity of the object that x points at. In CPython, the standard Python interpreter, that identity is the object's memory address. To demonstrate, let's ask what x and y point at:


In [2]:
x = y = 7
print(id(x))
print(id(y))


4468307488
4468307488

Wow! They are the same. That number represents the memory location where Python has stored the shared 7 object.
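
Comparing id values by eye works, but Python's `is` operator asks the same question directly: do two names refer to the same object? The following is a minimal sketch reusing x and y from above; the names same_object and same_id are introduced only for this illustration.

```python
x = y = 7

# `is` tests object identity: do both names point at the same object?
same_object = x is y
# Comparing id() values is the equivalent longhand form
same_id = id(x) == id(y)

print(same_object, same_id)
```

Both checks should agree, since `is` is defined in terms of object identity.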

Of course, as programmers we don't think of these atomic elements as referring to the same object; just keep in mind that they do. We are more likely to view them as copies of the same number, as lolviz shows visually:


In [21]:
from lolviz import *
callviz(varnames=['x','y'])


Out[21]:
[lolviz diagram: a globals table showing x = 7 and y = 7]

Let's verify that the same thing happens for strings:


In [4]:
name = 'parrt'
userid = name # userid now points at the same memory as name
print(id(name))
print(id(userid))


4506178760
4506178760

Ok, great, so we are in fact sharing the same memory address to hold the string 'parrt' and both of the variable names point at that same shared space. We call this aliasing, in the language implementation business.
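
An alias to a string never sees a surprise change, because operations such as concatenation build a new string object instead of altering the existing one. As a small sketch (the appended '2' is just a made-up suffix for illustration):

```python
name = 'parrt'
userid = name            # alias: both names refer to the same string object
userid = userid + '2'    # builds a new string and rebinds userid to it

print(name)              # the original string is untouched
print(userid)
print(name is userid)    # the two names no longer refer to the same object
```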

Things only get freaky when we start changing shared data. This can't happen with integers and strings because they are immutable (can't be changed). Let's look at two identical copies of a single list:


In [5]:
you = [1,3,5]
me  = [1,3,5]
print(id(you))
print(id(me))
callviz(varnames=['you','me'])


4508962504
4508962440
Out[5]:
[lolviz diagram: globals you and me each point at a separate list [1, 3, 5]]

Those lists have the same value but live at different memory addresses. They are not aliased; they are not shared. Consequently, changing one does not change the other:


In [6]:
you = [1,3,5]
me  = [1,3,5]
print(you, me)
you[0] = 99
print(you, me)


[1, 3, 5] [1, 3, 5]
[99, 3, 5] [1, 3, 5]
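
The difference between "same value" and "same object" can also be checked in code. As a brief sketch, `==` compares the contents of the two lists, whereas `is` compares their identities:

```python
you = [1, 3, 5]
me = [1, 3, 5]

print(you == me)   # True: the lists hold equal elements
print(you is me)   # False: they are two distinct list objects
```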

On the other hand, let's see what happens if we make you and me share a single list (point at the same memory location):


In [23]:
you = [1,3,5]
me  = you
print(id(you))
print(id(me))
print(you, me)
callviz(varnames=['you','me'])


4509139464
4509139464
[1, 3, 5] [1, 3, 5]
Out[23]:
[lolviz diagram: globals you and me both point at the same list [1, 3, 5]]

Now, changing one appears to change the other, but in fact both simply refer to the same location in memory:


In [8]:
you[0] = 99
print(you, me)
callviz(varnames=['you','me'])


[99, 3, 5] [99, 3, 5]
Out[8]:
[lolviz diagram: globals you and me both point at the same list [99, 3, 5]]

Don't confuse changing the pointer to the list with changing the list elements:


In [24]:
you = [1,3,5]
me  = you
callviz(varnames=['you','me'])


Out[24]:
[lolviz diagram: globals you and me both point at the same list [1, 3, 5]]

In [27]:
me = [9,7,5] # doesn't affect `you` at all
print(you)
print(me)
callviz(varnames=['you','me'])


[1, 3, 5]
[9, 7, 5]
Out[27]:
[lolviz diagram: global you points at the list [1, 3, 5]; global me points at a separate list [9, 7, 5]]

This aliasing of data happens a great deal when we pass lists or other data structures to functions. Passing a list named Quantity to a function whose parameter is called data means that the two names are aliased. We'll look at this in more detail in the "Visibility of symbols" section of Organizing your code with functions.
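
To make the function case concrete, here is a minimal sketch. The function zero_first and the values in Quantity are hypothetical, made up for this illustration; only the names Quantity and data come from the text above.

```python
def zero_first(data):
    # `data` is an alias for whatever list the caller passed in,
    # so this assignment changes the caller's list
    data[0] = 0

Quantity = [6, 49, 27]   # made-up values for illustration
zero_first(Quantity)
print(Quantity)          # [0, 49, 27]
```

No copy is made at the call, so the function's change is visible through Quantity after it returns.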

Shallow copies


In [35]:
X = [[1,2],[3,4]]
Y = X.copy() # shallow copy
callviz(varnames=['X','Y'])


Out[35]:
[lolviz diagram: globals X and Y point at two separate outer lists, whose elements 0 and 1 point at the same two inner lists [1, 2] and [3, 4]]

In [37]:
X[0][1] = 99
callviz(varnames=['X','Y'])
print(Y)


[[1, 99], [3, 4]]
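
Y changed even though we assigned only through X, because X.copy() duplicates just the outer list; the inner lists are still shared between X and Y. When fully independent data is needed, one option is copy.deepcopy from the standard library, which also copies the nested lists. The sketch below contrasts the two; the name Z is introduced only for this illustration.

```python
import copy

X = [[1, 2], [3, 4]]
Y = X.copy()           # shallow: new outer list, same inner lists
Z = copy.deepcopy(X)   # deep: new outer list and new inner lists

X[0][1] = 99
print(Y)   # [[1, 99], [3, 4]]  (shares inner lists with X)
print(Z)   # [[1, 2], [3, 4]]   (fully independent of X)
print(X is Y, X[0] is Y[0], X[0] is Z[0])   # False True False
```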