Built-In Data Structures

Python also has several built-in compound types, which act as containers for other types.

Type Name Example Description
list [1, 2, 3] Ordered collection
tuple (1, 2, 3) Immutable ordered collection
dict {'a':1, 'b':2, 'c':3} Unordered (key,value) mapping
set {1, 2, 3} Unordered collection of unique values

Note, round, square, and curly brackets have distinct meanings.

Lists

Lists are the basic ordered and mutable data collection type in Python.


In [1]:
a = [2, 3, 5, 7]

Lists have a number of useful properties and methods available to them.


In [2]:
# Length of a list
len(a)


Out[2]:
4

In [3]:
# Append a value to the end
a.append(11)
a


Out[3]:
[2, 3, 5, 7, 11]

In [4]:
# Addition concatenates lists
a + [13, 17, 19]


Out[4]:
[2, 3, 5, 7, 11, 13, 17, 19]

In [5]:
# sort() method sorts in-place
a = [2, 5, 1, 6, 3, 4]
a.sort()
a


Out[5]:
[1, 2, 3, 4, 5, 6]

One of the powerful features of Python's compound objects is that they can contain a mix of objects of any type.


In [6]:
a = [1, 'two', 3.14, [0, 3, 5]]
a


Out[6]:
[1, 'two', 3.14, [0, 3, 5]]

This flexibility is a consequence of Python's dynamic type system. Creating such a mixed sequence in a statically-typed language like C can be much more of a headache! We see that lists can even contain other lists as elements. Such type flexibility is an essential piece of what makes Python code relatively quick and easy to write.

List indexing and slicing

Python provides access to elements in compound types through indexing for single elements, and slicing for multiple elements.


In [7]:
a = [2, 3, 5, 7, 11]

Python uses zero-based indexing, so we can access the first and second element in using the following syntax:


In [8]:
a[0]


Out[8]:
2

In [9]:
a[1]


Out[9]:
3

Elements at the end of the list can be accessed with negative numbers, starting from -1:


In [10]:
a[-1]


Out[10]:
11

In [11]:
a[-2]


Out[11]:
7

You can visualize this indexing scheme this way:

Here values in the list are represented by large numbers in the squares; list indices are represented by small numbers above and below. In this case, L[2] returns 5, because that is the next value at index 2.

Where indexing is a means of fetching a single value from the list, slicing is a means of accessing multiple values in sub-lists. It uses a colon to indicate the start point (inclusive) and end point (non-inclusive) of the sub-array. For example, to get the first three elements of the list, we can write:


In [12]:
a[0:3]


Out[12]:
[2, 3, 5]

we can equivalently write:


In [13]:
a[:3]


Out[13]:
[2, 3, 5]

Similarly, if we leave out the last index, it defaults to the length of the list. Thus, the last three elements can be accessed as follows:


In [14]:
a[-3:]


Out[14]:
[5, 7, 11]

Finally, it is possible to specify a third integer that represents the step size; for example, to select every second element of the list, we can write:


In [15]:
a[::2]  # equivalent to a[0:len(a):2]


Out[15]:
[2, 5, 11]

A particularly useful version of this is to specify a negative step, which will reverse the array:


In [16]:
a[::-1]


Out[16]:
[11, 7, 5, 3, 2]

Both indexing and slicing can be used to set elements as well as access them. The syntax is as you would expect:


In [17]:
a[0] = 100
print(a)


[100, 3, 5, 7, 11]

In [18]:
a[1:3] = [55, 56]
print(a)


[100, 55, 56, 7, 11]

A very similar slicing syntax is also used in other data containers, such as NumPy arrays, as we will see in Day 2 sessions.

Tuples

Tuples are in many ways similar to lists, but they are defined with parentheses and cannot be changed!


In [19]:
t = (1, 2, 3)

They can also be defined without any brackets at all:


In [20]:
t = 1, 2, 3
print(t)


(1, 2, 3)

Like the lists discussed before, tuples have a length, and individual elements can be extracted using square-bracket indexing:


In [21]:
len(t)


Out[21]:
3

In [22]:
t[0]


Out[22]:
1

The main distinguishing feature of tuples is that they are immutable: this means that once they are created, their size and contents cannot be changed:

Tuples are often used in a Python program; e.g. in functions that have multiple return values.

Dictionaries

Dictionaries are extremely flexible mappings of keys to values, and form the basis of much of Python's internal implementation. They can be created via a comma-separated list of key:value pairs within curly braces:


In [23]:
numbers = {'one':1, 'two':2, 'three':3}
# or
numbers = dict(one=1, two=2, three=2)

Items are accessed and set via the indexing syntax used for lists and tuples, except here the index is not a zero-based order but valid key in the dictionary:


In [24]:
# Access a value via the key
numbers['two']


Out[24]:
2

New items can be added to the dictionary using indexing as well:


In [25]:
# Set a new key:value pair
numbers['ninety'] = 90
print(numbers)


{'ninety': 90, 'three': 2, 'two': 2, 'one': 1}

Keep in mind that dictionaries do not maintain any order for the input parameters.

Sets

The 4th basic data container is the set, which contains unordered collections of unique items. They are defined much like lists and tuples, except they use the curly brackets of dictionaries.

They do not contain duplicate entries. Which means they are significantly faster than lists! http://stackoverflow.com/questions/2831212/python-sets-vs-lists


In [26]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

In [27]:
a = {1, 1, 2}

In [28]:
a


Out[28]:
{1, 2}

If you're familiar with the mathematics of sets, you'll be familiar with operations like the union, intersection, difference, symmetric difference, and others. Python's sets have all of these operations built-in, via methods or operators. For each, we'll show the two equivalent methods:


In [29]:
# union: items appearing in either
primes | odds      # with an operator
primes.union(odds) # equivalently with a method


Out[29]:
{1, 2, 3, 5, 7, 9}

In [30]:
# intersection: items appearing in both
primes & odds             # with an operator
primes.intersection(odds) # equivalently with a method


Out[30]:
{3, 5, 7}

In [31]:
# difference: items in primes but not in odds
primes - odds           # with an operator
primes.difference(odds) # equivalently with a method


Out[31]:
{2}

In [32]:
# symmetric difference: items appearing in only one set
primes ^ odds                     # with an operator
primes.symmetric_difference(odds) # equivalently with a method


Out[32]:
{1, 2, 9}

More Specialized Data Structures

Python contains several other data structures that you might find useful; these can generally be found in the built-in collections module. The collections module is fully-documented in Python's online documentation, and you can read more about the various objects available there.

In particular, I've found the following very useful on occasion:

  • collections.namedtuple: Like a tuple, but each value has a name
  • collections.defaultdict: Like a dictionary, but unspecified keys have a user-specified default value
  • collections.OrderedDict: Like a dictionary, but the order of keys is maintained

References

  • A Whirlwind Tour of Python by Jake VanderPlas (O’Reilly). Copyright 2016 O’Reilly Media, Inc., 978-1-491-96465-1
  • The python documentation of standard types