Compound Data Types: Lists, Dictionaries, Sets, Tuples, and Reading Files

Based on lecture materials by Milad Fatenejad, Joshua R. Smith, and Will Trimble

The ability to refer to collections of data will be critical to almost any science - at least if you're collecting more than a single measurement of something.

The two main collections are lists and dictionaries, but I'll mention sets and tuples as well. I'll also go over reading text data from files.

Lists

A list is an ordered, indexable collection of data. Let's say you have collected some current and voltage data that looks like this:

voltage:

-2.0
-1.0
0.0
1.0
2.0

current:

-1.0
-0.5
0.0
0.5
1.0

So you could put that data into lists like this:


In [1]:
voltage_list = [-2.0, -1.0, 0.0, 1.0, 2.0]

current_list = [-1.0, -0.5, 0.0, 0.5, 1.0]

As expected, voltage_list is of type list:


In [2]:
type(voltage_list)


Out[2]:
list

Python lists have the charming (annoying?) feature that they are indexed from zero (ask one of us later if you want to understand why). Therefore, to find the value of the first item in voltage_list:


In [3]:
voltage_list[0]


Out[3]:
-2.0

And to find the value of the third item


In [ ]:
voltage_list[2]

Lists can be indexed from the back using a negative index. The last item of current_list:


In [ ]:
current_list[-1]

and the next-to-last


In [ ]:
current_list[-2]
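A negative index is shorthand for counting back from the length of the list. The short sketch below re-creates current_list so it runs on its own, checks that equivalence, and shows what happens when an index runs off the end.

```python
current_list = [-1.0, -0.5, 0.0, 0.5, 1.0]

# -1 is the last item, -2 the next-to-last, and so on
last = current_list[-1]
next_to_last = current_list[-2]

# In general, current_list[-k] is the same as current_list[len(current_list) - k]
same = current_list[-2] == current_list[len(current_list) - 2]
print(same)  # True

# An index past either end raises IndexError; it does not wrap around
try:
    current_list[5]
except IndexError as err:
    print("IndexError: " + str(err))
```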

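You can "slice" items from within a list. Let's say we wanted the second through fourth items from voltage_list. A slice includes the item at its start index but stops just before its stop index, so [1:4] gives the items at indices 1, 2, and 3: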


In [ ]:
voltage_list[1:4]

Or from the third item to the end


In [ ]:
voltage_list[2:]

and so on.
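The general form of a slice is [start:stop:step], and any of the three parts may be left out. A small sketch of the common variations, re-creating the original five-item voltage_list:

```python
voltage_list = [-2.0, -1.0, 0.0, 1.0, 2.0]

middle = voltage_list[1:4]        # indices 1, 2, 3 (stop index 4 is excluded)
first_two = voltage_list[:2]      # omitted start means "from the beginning"
every_other = voltage_list[::2]   # a step of 2 takes every second item
last_two = voltage_list[-2:]      # negative indices work inside slices too
reversed_copy = voltage_list[::-1]  # a step of -1 walks backwards

print(middle)         # [-1.0, 0.0, 1.0]
print(first_two)      # [-2.0, -1.0]
print(every_other)    # [-2.0, 0.0, 2.0]
print(last_two)       # [1.0, 2.0]
print(reversed_copy)  # [2.0, 1.0, 0.0, -1.0, -2.0]
```

Slicing always builds a new list, so none of these change voltage_list itself.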

Append and Extend

Just like strings have methods, lists do too. IPython lets us do tab completion after a dot ('.') to see what an object has to offer. Try it now!


In [ ]:
list.

One useful method is append. Let's say we want to stick the following data on the end of both our lists:

voltage:

3.0
4.0

current:

1.5
2.0

If you want to append items to the end of a list, use the append method.


In [ ]:
voltage_list.append(3.)

In [ ]:
voltage_list.append(4.)

In [ ]:
voltage_list

You can see how adding one item at a time might get tedious. If you want to concatenate a list onto the end of another one, use extend:


In [ ]:
current_list.extend([1.5, 2.0])

In [ ]:
current_list
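The difference between append and extend matters when the thing you add is itself a list: append adds it as one nested item, while extend adds its items one by one. A minimal comparison using two throwaway lists:

```python
appended = [1.0, 2.0]
extended = [1.0, 2.0]

appended.append([3.0, 4.0])  # adds ONE new item: the whole list
extended.extend([3.0, 4.0])  # adds each item of the list separately

print(appended)  # [1.0, 2.0, [3.0, 4.0]]
print(extended)  # [1.0, 2.0, 3.0, 4.0]

# For lists, += behaves like extend
extended += [5.0]
print(extended)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```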

Length of Lists

Sometimes you want to know how many items are in a list. Use the built-in len function:


In [ ]:
len(voltage_list)
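len counts the top-level items of a collection. The sketch below re-creates voltage_list, adds the two extra values from above, and shows that a nested list is counted as a single item.

```python
voltage_list = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(len(voltage_list))  # 5

voltage_list.extend([3.0, 4.0])
print(len(voltage_list))  # 7

# A nested list counts as ONE item, however long it is
nested = [1, [2, 3, 4]]
print(len(nested))        # 2

# len() works on other collections too, for example strings
print(len("volts"))       # 5
```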

Heterogeneous Data

Lists can contain heterogeneous data.


In [5]:
data_list = ["experiment: current vs. voltage",
            "run", 47,
            "temperature", 372.756,
            "current", [-1.0, -0.5, 0.0, 0.5, 1.0],
            "voltage", [-2.0, -1.0, 0.0, 1.0, 2.0]]

We've got strings, ints, floats, and even other lists in there. Notice that the list is spread over several lines: inside square brackets (and parentheses or curly braces), Python automatically continues a statement onto the next line, so no backslash (\) line-continuation character is needed. Splitting it up like this just makes things look better.


In [6]:
print(data_list)


['experiment: current vs. voltage', 'run', 47, 'temperature', 372.756, 'current', [-1.0, -0.5, 0.0, 0.5, 1.0], 'voltage', [-2.0, -1.0, 0.0, 1.0, 2.0]]

When defining a long list in a notebook or a file, it is often useful to put each item on its own line. It can make the code much more readable, especially when the items are of different types.


In [ ]:
another_long_list = [
    "something",
    132,
    23.5,
    "another long string",
    ['a','other','list'],
    {'mydict':32},
    ]
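Since an item of a list can itself be a list, you can chain indices to reach inside it. A sketch using the same contents as data_list above (index 6 holds the current data):

```python
data_list = ["experiment: current vs. voltage",
             "run", 47,
             "temperature", 372.756,
             "current", [-1.0, -0.5, 0.0, 0.5, 1.0],
             "voltage", [-2.0, -1.0, 0.0, 1.0, 2.0]]

currents = data_list[6]          # item 6 is the whole current list
first_current = data_list[6][0]  # a second index reaches inside it
print(first_current)  # -1.0

# isinstance() checks what kind of object an item is
print(isinstance(data_list[2], int))    # True
print(isinstance(data_list[4], float))  # True
print(isinstance(data_list[6], list))   # True
```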

Assigning Variables to Other Variables

Something that might cause you headaches in the future is how python deals with assigning one variable to another. When you set a variable equal to another, no copy is made: both names refer to the same object. So if you change that object through one name (for example, by appending to a list), the change shows up through the other name too. Be careful about this fact.


In [7]:
a = [1,2]

In [8]:
b = a

In [9]:
a.append(10)

In [10]:
b


Out[10]:
[1, 2, 10]

In [ ]:
# what will be the result of this?
a is b

In [11]:
## this will create a new object
d = a + b

In [12]:
## test if two items contain the same thing
d == a + b


Out[12]:
True

In [13]:
## test if two variables point to the same object
d is a + b


Out[13]:
False

Explanation:

The above evaluates to False because a + b builds a brand-new list every time it is evaluated. So even though the contents are the same (which == checks), d and that new list are different objects (which is checks).
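If you actually want an independent copy of a list, you have to ask for one. The sketch below shows two common ways, how rebinding a name differs from changing the object it refers to, and the standard copy module's deepcopy for lists that contain other lists.

```python
import copy

a = [1, 2]
b = a          # b is a second name for the SAME list
c = a[:]       # a full slice builds a new list (a shallow copy)
d = list(a)    # so does passing the list to list()

a.append(10)   # change the one list that a and b share
print(b)       # [1, 2, 10]
print(c)       # [1, 2]

# Rebinding a name to a new object does not affect other names
a = [99]
print(b)       # [1, 2, 10]

# A shallow copy still shares any nested lists; deepcopy copies them too
nested = [[1, 2], [3, 4]]
shallow = nested[:]
deep = copy.deepcopy(nested)
nested[0].append(5)
print(shallow[0])  # [1, 2, 5]
print(deep[0])     # [1, 2]
```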

There's a ton more to know about lists, but let's press on. Check out the help / documentation linked from our course README.md for more info.

Reading From Files

At this point it is useful to take a detour regarding files. Let's say you have a file with some current and voltage data and some metadata.

data.dat:

experiment: current vs. voltage
run: 47
temperature: 372.756
current: [-1.0, -0.5, 0.0, 0.5, 1.0]
voltage: [-2.0, -1.0, 0.0, 1.0, 2.0]

This file should be in the same directory as these notebooks. If it is, we can read its lines into a list pretty easily:


In [14]:
f = open("data.dat")

In [15]:
ivdata = f.readlines()

In [16]:
f.close()

In [17]:
ivdata


Out[17]:
['experiment: current vs. voltage\n',
 'run: 47\n',
 'temperature: 372.756\n',
 'current: [-1.0, -0.5, 0.0, 0.5, 1.0]\n',
 'voltage: [-2.0, -1.0, 0.0, 1.0, 2.0]\n',
 '\n']

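Right now the data in ivdata isn't in a particularly useful format: each element is one whole line of the file as a string, still ending in a newline character ('\n'), and even the numbers are just text. You can imagine that with some additional programming we could straighten it out. We will eventually learn to do that easily!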
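Two habits make file reading less error-prone: opening the file in a with block, which closes it automatically, and stripping the newline from each line before using it. The sketch below first writes a temporary copy of data.dat (using the standard tempfile module) so it runs anywhere; with the real file you would simply call open("data.dat"). Splitting at the first ": " is just one possible way to separate each label from its value, assuming every non-blank line has that form; the values are still strings afterwards.

```python
import os
import tempfile

# Write a throwaway copy of data.dat so this example is self-contained
text = ("experiment: current vs. voltage\n"
        "run: 47\n"
        "temperature: 372.756\n"
        "current: [-1.0, -0.5, 0.0, 0.5, 1.0]\n"
        "voltage: [-2.0, -1.0, 0.0, 1.0, 2.0]\n"
        "\n")
path = os.path.join(tempfile.mkdtemp(), "data.dat")
with open(path, "w") as f:
    f.write(text)

# The with block closes the file automatically, even if an error occurs
with open(path) as f:
    ivdata = f.readlines()

# Strip the trailing newline and split each non-blank line into [label, value]
pairs = []
for line in ivdata:
    line = line.strip()
    if line:  # skip the empty last line
        label, value = line.split(": ", 1)
        pairs.append([label, value])

print(pairs[1])  # ['run', '47']
```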

Tuples

Tuples are another of python's basic compound data types, and they are almost like lists. The difference is that a tuple is immutable: once you create it, it cannot be changed. You define a tuple with parentheses:


In [ ]:
tup = ("red", "white", "blue")

In [ ]:
type(tup)

You can slice and index a tuple exactly like you would a list. Python uses tuples a lot in its inner workings, and a tuple can be used as a key in a dictionary (as long as everything inside it is immutable too), whereas a list cannot, as we will see in a moment.

See if you can retrieve the third element of tup:


In [ ]:


In [ ]:
## What happens when you try to do the following? (Remember a tuple is immutable)

tup[0] = 'aquamarine'
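Two everyday uses of tuples are sketched below: unpacking a tuple into separate names, and using a tuple as a dictionary key (dictionaries are covered further down). The grid dictionary and its (row, column) keys are made up purely for illustration. The last lines show a common pitfall: a one-item tuple needs a trailing comma.

```python
tup = ("red", "white", "blue")

# Unpacking: assign each item of the tuple to its own name
first, second, third = tup
print(first)  # red

# A tuple can be a dictionary key; here a hypothetical (row, column) lookup
grid = {(0, 0): "origin", (1, 2): "sample A"}
print(grid[(1, 2)])  # sample A

# A list cannot be a key, because lists can be changed (they are unhashable)
try:
    bad = {[1, 2]: "oops"}
except TypeError as err:
    print("TypeError: " + str(err))

# A one-item tuple needs a trailing comma; ("red") is just a string in parentheses
single = ("red",)
print(type(single) is tuple)  # True
print(type(("red")) is str)   # True
```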

Sets

Most introductory python courses (including Codecademy) do not go over sets this early (or at all), but I've found this data type to be useful. The python set type is similar to the idea of a mathematical set: it is an unordered collection of unique things. Consider:


In [ ]:
mystuff = [1,2,3,4,4,4,4,4,4,4] # I like fours, even if this is a little redundant
my_unique_stuff = set(mystuff)

In [ ]:
## how many items are in my_unique_stuff?

In [18]:
# Note: the {"apple", ...} literal notation for sets is new in python 2.7.x
# For older pythons, you must use set(['apple', 'banana', ...])
fruit = {"apple", "banana", "pear", "banana"}

Since sets contain only unique items, there's only one banana in the set fruit.


In [22]:
print(fruit)
## what do you think happens when you try to get the first item in this set?
## fruit[0]


set(['pear', 'banana', 'apple'])
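Beyond removing duplicates, sets are good at answering "is this item in here?", and they can grow or shrink after they are created. A minimal sketch that re-creates the fruit set (the "kiwi" lines are only for illustration):

```python
fruit = {"apple", "banana", "pear", "banana"}
print(len(fruit))         # 3, because the duplicate banana is dropped

# Membership tests with "in"
print("banana" in fruit)  # True
print("kiwi" in fruit)    # False

# Sets are mutable: add() and discard() change them in place
fruit.add("kiwi")
fruit.discard("pear")     # discard() does nothing if the item is absent
print(sorted(fruit))      # ['apple', 'banana', 'kiwi']

# Careful: {} on its own is an empty DICTIONARY; use set() for an empty set
empty = set()
print(type({}) is dict)   # True
print(type(empty) is set) # True
```

Sets print in no particular order, which is why the sketch uses sorted() to get a predictable display.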

You can do things like intersections, unions, etc. on sets just like in math. Here's an example of an intersection of two sets (the common items in both sets).


In [ ]:
firstBowl = {"apple", "banana", "pear", "peach"}

In [ ]:
secondBowl = {"peach", "watermelon", "orange", "apple"}

Set operations can be performed by calling the set class's methods directly and passing in your sets:

In [ ]:
set.intersection(firstBowl, secondBowl)

Or you can call the method on one of your sets:

In [ ]:
firstBowl.intersection(secondBowl)

You can find more in the help docs (try help(set)). We won't be returning to sets, but it's good for you to know they exist.
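Intersection is only one of the set operations; union, difference, and symmetric difference work the same way, and each also has a shorthand operator. A sketch with the same two bowls (renamed first_bowl and second_bowl here):

```python
first_bowl = {"apple", "banana", "pear", "peach"}
second_bowl = {"peach", "watermelon", "orange", "apple"}

both = first_bowl & second_bowl           # intersection: in both bowls
either = first_bowl | second_bowl         # union: in at least one bowl
only_first = first_bowl - second_bowl     # difference: in the first bowl only
one_bowl_only = first_bowl ^ second_bowl  # symmetric difference: in exactly one bowl

# Sets have no fixed order, so sort them for a stable display
print(sorted(both))           # ['apple', 'peach']
print(len(either))            # 6
print(sorted(only_first))     # ['banana', 'pear']
print(sorted(one_bowl_only))  # ['banana', 'orange', 'pear', 'watermelon']

# The method forms give the same results
print(first_bowl.union(second_bowl) == either)  # True
```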

Dictionaries

Recall our file data.dat, which contained our current-voltage data and also some metadata. We were able to read the data into a list, but clearly the list type is not the optimal choice for a data model. The dictionary is a much better choice. A python dictionary is a collection of key-value pairs. The key is a name for a piece of data, and the value is the data itself. Here's a way to create a dictionary that holds all the data in our data.dat file in a more sensible way than a list.


In [23]:
data_dict = {"experiment": "current vs. voltage",
            "run": 47,
            "temperature": 372.756,
            "current": [-1.0, -0.5, 0.0, 0.5, 1.0],
            "voltage": [-2.0, -1.0, 0.0, 1.0, 2.0]}

In [24]:
print(data_dict)


{'current': [-1.0, -0.5, 0.0, 0.5, 1.0], 'experiment': 'current vs. voltage', 'run': 47, 'temperature': 372.756, 'voltage': [-2.0, -1.0, 0.0, 1.0, 2.0]}

In [ ]:
# a dictionary has keys and values. You use the keys to access the values
print(data_dict.keys())
print(data_dict.values())

This model is clearly better because you no longer have to remember that the run number is stored at index 2 of the list; you just refer directly to "run":


In [ ]:
data_dict["run"]

If you wanted the voltage data list:


In [ ]:
data_dict["voltage"]

Or perhaps you wanted the last element of the current data list


In [ ]:
data_dict["current"][-1]
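Looking up a key that isn't in the dictionary raises a KeyError. The sketch below re-creates data_dict and shows two ways to guard against that: checking with in first, or using the get method, which returns None (or a default you supply) for a missing key. The "pressure" key is just an example of a key that doesn't exist.

```python
data_dict = {"experiment": "current vs. voltage",
             "run": 47,
             "temperature": 372.756,
             "current": [-1.0, -0.5, 0.0, 0.5, 1.0],
             "voltage": [-2.0, -1.0, 0.0, 1.0, 2.0]}

# Asking for a missing key raises KeyError
try:
    data_dict["pressure"]
except KeyError as err:
    print("KeyError: " + str(err))

# Check first with "in", or use .get() with an optional fallback value
print("pressure" in data_dict)          # False
print(data_dict.get("pressure"))        # None
print(data_dict.get("pressure", 0.0))   # 0.0
print(data_dict.get("run", 0))          # 47
```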

Once a dictionary has been created, you can change the values of the data if you like.


In [ ]:
data_dict["temperature"] = 3275.39

You can also add new keys to the dictionary by assigning to them. Note that dictionaries are indexed with square brackets, just like lists; they look the same, even though they're very different (a dictionary is looked up by key, a list by position).


In [ ]:
data_dict["user"] = "Johann G. von Ulm"

Dictionaries, like strings, lists, and all the rest, have built-in methods (we saw this above when we accessed the keys and values).

You can also get the keys and values together as (key, value) pairs with items():


In [ ]:
data_dict.items()
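A small sketch pulling together changing, adding, and removing entries, then looping over the pairs that items() returns. It uses a trimmed-down data_dict so it runs on its own; del is not covered above but is standard Python. Keys come out of the loop in insertion order in Python 3.7 and later, but in arbitrary order in Python 2, so older code should not rely on the order.

```python
data_dict = {"run": 47, "temperature": 372.756}

data_dict["temperature"] = 3275.39       # existing key: the value is replaced
data_dict["user"] = "Johann G. von Ulm"  # new key: a new entry is added
del data_dict["run"]                     # del removes a key and its value

# items() hands back (key, value) pairs, which a for loop can unpack
for key, value in data_dict.items():
    print(key + ": " + str(value))
```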

One thing to be careful of is the in-place method dict.update().

It also lets you add entries to your dictionary, but it does NOT return a copy of the dictionary with the new key:value pairs added. Instead, it modifies the dictionary it is called on.


In [ ]:
## so look at this example
new_data_dict = data_dict.update({'newthing':42})
## What does the variable **new_data_dict** contain??  Why?

## What does data_dict contain??


# What would happen if you did this?
# data_dict = data_dict.update({'newkey':'newvalue'})
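If what you want is a new dictionary that includes an extra entry while leaving the original alone, here are two ways to get one. The second uses dictionary unpacking, which needs Python 3.5 or later (Python 3.9 also adds a | operator for merging dictionaries). The trimmed-down data_dict is just to keep the sketch short.

```python
data_dict = {"run": 47, "temperature": 372.756}

# Option 1: copy the dictionary, then update the copy in place
new_data_dict = data_dict.copy()
new_data_dict.update({"newthing": 42})

# Option 2 (Python 3.5+): unpack the old entries into a new dictionary literal
other_dict = {**data_dict, "newthing": 42}

print("newthing" in data_dict)      # False: the original is untouched
print(new_data_dict == other_dict)  # True
```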