This notebook explores the speed of searching for values in sets and lists.
After reading this notebook, watch Brandon Rhodes' videos All Your Ducks In A Row: Data Structures in the Standard Library and Beyond and The Mighty Dictionary.
In [1]:
def make_list(n):
if True:
return list(range(n))
else:
return list(str(i) for i in range(n))
In [2]:
n = int(25e6)
# n = 5
m = (0, n // 2, n-1, n)
a_list = make_list(n)
a_set = set(a_list)
In [3]:
n, m
Out[3]:
In [4]:
# Finding something that is in a set is fast.
# The key one is looking for has little effect on the speed.
beginning = 0
middle = n//2
end = n-1
%timeit beginning in a_set
%timeit middle in a_set
%timeit end in a_set
# Finding something that is _not_ in a set is also fast.
%timeit n in a_set
In [5]:
# Searching for something in a list
# starts at the beginning and compares each value.
# The search time depends on where the value is in the list.
# That can be slow.
beginning = 0
middle = n//2
end = n-1
%timeit beginning in a_list
%timeit middle in a_list
%timeit end in a_list
# Finding something that is not is a list is the worst case.
# It has to be compared to all values of the list.
%timeit n in a_list
In [6]:
max_exponent = 6
In [7]:
for n in (10 ** i for i in range(1, max_exponent+1)):
a_list = make_list(n)
a_set = set(a_list)
m = (0, n // 2, n-1, n)
for j in m:
print('length is %s, looking for %s' % (n, j))
%timeit j in a_set
Notice that the difference between searching small sets and large sets in not large. This is the magic of Python sets and dictionaries. Read the hash table Wikipedia article for an explanation of how this works.
In [8]:
for n in (10 ** i for i in range(1, max_exponent+1)):
a_list = make_list(n)
a_set = set(a_list)
m = (0, n // 2, n-1, n)
for j in m:
print('length is %s, looking for %s' % (n, j))
%timeit j in a_list
Notice that the search time for finding something in a list is proportional to the length of the list and how far the item is from the beginning of the list.