Deep and Shallow Copies

This short lecture addresses a common source of bugs in Python (and other languages): assigning a mutable object to a second variable does not copy it. The example below illustrates the problem.


In [1]:
# BUGGY Code
# Create a list
list1 = [1, 2, 3]
# Make a copy
list2 = list1
# Update the copy
list2.append(4)
# BUG: list1 has changed too!
print(list1, list2)


[1, 2, 3, 4] [1, 2, 3, 4]

There are different ways in which an object can be copied. Typically, these are characterized as:

  • Deep copy: Allocate new memory and copy the object into it. Modifications to the copy have no effect on the original object.
  • Shallow copy: Create a new reference to the existing object. Modifications to the object are seen by all code that holds a reference to it.

In general, assigning one Python variable to another creates a shallow copy in this sense: no data is copied, and both names refer to the same object.


In [2]:
# Create a list
list1 = [1, 2, 3]
# Assign list1 to a second variable (no data is copied)
list2 = list1
# Use "id" to see if list1 and list2 point to the same object
print(id(list1), id(list2))


140209619885000 140209619885000

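To get an independent copy, you must create one explicitly. For lists, use the list function; it builds a new list, but any nested objects inside it are still shared, so use copy.deepcopy when the list contains other mutable objects. For a pandas.DataFrame, use the copy method.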


In [4]:
# Making a copy of a list with list().
list1 = [1, 2, 3]
list2 = list(list1)  # Make a copy of list1 to manipulate it
print(id(list1), id(list2))
list2.append(-1)
# Changing list2 does not change list1
print(list1, list2)


140209619467464 140209619467528
[1, 2, 3] [1, 2, 3, -1]
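
The list function copies only the outer list. A minimal sketch, assuming a nested list (the names nested, outer_copy, and full_copy are illustrative and not part of the lecture), suggests how copy.deepcopy from the standard library differs:

```python
import copy

# Hypothetical nested list, used only for illustration
nested = [[1, 2], [3, 4]]

# list() builds a new outer list, but its elements are the same inner lists
outer_copy = list(nested)
# copy.deepcopy recursively copies the inner lists as well
full_copy = copy.deepcopy(nested)

# Mutate an inner list through the original
nested[0].append(99)

print(outer_copy[0])  # the inner list is shared, so the change is visible
print(full_copy[0])   # the deep copy is unaffected
print(outer_copy[0] is nested[0], full_copy[0] is nested[0])
```

For a flat list of immutable values such as [1, 2, 3], list() is enough; the difference only matters when the elements themselves are mutable.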

In [8]:
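# Shallow copy of a DataFrame: assignment shares the object
import pandas as pd
df1 = pd.DataFrame({'a': range(10), 'b': range(10)})
# df2 is a second name for the same DataFrame, not a copy
df2 = df1
# Deleting a column through df2 also removes it from df1
del df2['a']
df1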


Out[8]:
b
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9

In [9]:
print(id(df1), id(df2))


140209091735056 140209091735056
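
For comparison, the same deletion can be applied to an independent copy. The following is a minimal sketch, assuming the DataFrame is rebuilt with both columns; it relies on DataFrame.copy, whose deep parameter defaults to True:

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(10), 'b': range(10)})
# DataFrame.copy() returns a new DataFrame (deep=True is the default)
df2 = df1.copy()
del df2['a']

# df1 still has both columns; only the copy lost column 'a'
print(list(df1.columns), list(df2.columns))
# The two names now refer to different objects
print(df1 is df2)
```

Because df2 is a separate object here, id(df1) and id(df2) would also differ, unlike in the assignment case above.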