This notebook explores the speed of searching for values in sets and lists.
After reading this notebook, watch Brandon Rhodes' videos All Your Ducks In A Row: Data Structures in the Standard Library and Beyond and The Mighty Dictionary.
In [1]:
def make_list(n):
if True:
return list(range(n))
else:
return list(str(i) for i in range(n))
In [2]:
n = int(25e6)
m = int(n*.9)
m = n // 2
a_list = make_list(n)
a_set = set(a_list)
In [3]:
# Finding something that is in a set is fast.
%timeit m in a_set
# Finding something that is _not_ in a set is also fast.
%timeit n+1 in a_set
In [4]:
# Finding something in a list can be slow.
# It starts at the beginning and compares each value.
# The search time depends on where the value is in the list.
%timeit m in a_list
# Finding something that is not is a list is the worst case.
# It has to be compared to all values of the list.
%timeit n+1 in a_list
In [5]:
max_exponent = 6
In [6]:
for n in (10 ** i for i in range(1, max_exponent+1)):
m = n // 2
print('%d:' % n)
a_list = make_list(n)
a_set = set(a_list)
%timeit m in a_set
Notice that the difference between searching small sets and large sets in not large. This is the magic of Python sets and dictionaries. Read the hash table Wikipedia article for an explanation of how this works.
In [7]:
for n in (10 ** i for i in range(1, max_exponent+1)):
m = n // 2
print('%d:' % n)
a_list = make_list(n)
a_set = set(a_list)
%timeit m in a_list
Notice that the search time for finding something in the middle of a list is proportional to the size of the list.
In [8]:
def set_foo(n):
for i in range(n):
i in a_set
def list_foo(n):
for i in range(n):
i in a_list
In [9]:
for n in (10 ** i for i in range(1, max_exponent+1)):
print('%d:' % n)
a_list = list(range(n))
a_set = set(a_list)
%timeit set_foo(n)
In [ ]:
for n in (10 ** i for i in range(1, max_exponent+1)):
print('%d:' % n)
a_list = list(range(n))
a_set = set(a_list)
%timeit list_foo(n)
The above cell does not finish overnight, so what you see above is what was saved before it finished.