Based on lecture materials by Milad Fatenejad, Joshua R. Smith, and Will Trimble
The ability to refer to collections of data will be critical to almost any science - at least if you're collecting more than a single measurement of something.
The two main collections are lists and dictionaries, but I'll mention sets and tuples as well. I'll also go over reading text data from files.
In [1]:
voltage_list = [-2.0, -1.0, 0.0, 1.0, 2.0]
current_list = [-1.0, -0.5, 0.0, 0.5, 1.0]
Obviously, voltage_list is of type list:
In [2]:
type(voltage_list)
Out[2]:
list
Python lists have the charming (annoying?) feature that they are indexed from zero (ask one of us later if you want to understand why). Therefore, to find the value of the first item in voltage_list:
In [3]:
voltage_list[0]
Out[3]:
-2.0
And to find the value of the third item:
In [ ]:
voltage_list[2]
Lists can be indexed from the back using a negative index. The last item of current_list:
In [ ]:
current_list[-1]
and the next-to-last
In [ ]:
current_list[-2]
You can "slice" items from within a list. Let's say we wanted the second through fourth items from voltage_list:
In [ ]:
voltage_list[1:4]
Or from the third item to the end
In [ ]:
voltage_list[2:]
and so on.
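Here is a small sketch that collects the slicing rules in one place. The variable names on the left (middle, first_two, and so on) are just illustrative; only voltage_list comes from the cells above.

```python
# Same starting list as in the cells above.
voltage_list = [-2.0, -1.0, 0.0, 1.0, 2.0]

# A slice [start:stop] includes index start but stops just before index stop.
middle = voltage_list[1:4]        # items at indices 1, 2 and 3
first_two = voltage_list[:2]      # leaving out start means "from the beginning"
last_two = voltage_list[-2:]      # negative indices work in slices too
every_other = voltage_list[::2]   # an optional third number is the step

print(middle)       # [-1.0, 0.0, 1.0]
print(first_two)    # [-2.0, -1.0]
print(last_two)     # [1.0, 2.0]
print(every_other)  # [-2.0, 0.0, 2.0]
```

Note that slicing always gives you a new list; voltage_list itself is unchanged.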
In [ ]:
dir(list)  # in IPython, typing "list." and pressing Tab shows the same method names
One useful method is append. Let's say we want to stick the following data on the end of both our lists:
voltage:
3.0
4.0
current:
1.5
2.0
If you want to append items to the end of a list, use the append method.
In [ ]:
voltage_list.append(3.)
In [ ]:
voltage_list.append(4.)
In [ ]:
voltage_list
You can see how that approach might be tedious in certain cases. If you want to concatenate a list onto the end of another one, use extend.
In [ ]:
current_list.extend([1.5, 2.0])
In [ ]:
current_list
In [ ]:
len(voltage_list)
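The difference between append and extend is easy to trip over, so here is a minimal sketch using two throwaway lists (readings_a and readings_b are made-up names, not part of the lesson data).

```python
# append adds exactly ONE item to the end, even if that item is itself a list.
readings_a = [1.0, 2.0]
readings_a.append([3.0, 4.0])
print(readings_a)        # [1.0, 2.0, [3.0, 4.0]]
print(len(readings_a))   # 3

# extend adds each item of the other list one by one.
readings_b = [1.0, 2.0]
readings_b.extend([3.0, 4.0])
print(readings_b)        # [1.0, 2.0, 3.0, 4.0]
print(len(readings_b))   # 4
```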
In [5]:
data_list = ["experiment: current vs. voltage",\
"run", 47,\
"temperature", 372.756,\
"current", [-1.0, -0.5, 0.0, 0.5, 1.0],
"voltage", [-2.0, -1.0, 0.0, 1.0, 2.0]]
We've got strings, ints, floats, and even other lists in there. The backslashes are there so we can continue on the next line. They aren't necessary inside the square brackets, but they can sometimes make things look better.
In [6]:
print(data_list)
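Since a list can hold other lists, you can chain indices to reach inside. A quick sketch using the same data_list (the names run_number, first_voltage, and last_current are just for illustration):

```python
data_list = ["experiment: current vs. voltage",
             "run", 47,
             "temperature", 372.756,
             "current", [-1.0, -0.5, 0.0, 0.5, 1.0],
             "voltage", [-2.0, -1.0, 0.0, 1.0, 2.0]]

# The label "run" sits at index 1, so its value is at index 2.
run_number = data_list[2]

# Index 8 is the voltage list; a second index reaches inside it.
first_voltage = data_list[8][0]
last_current = data_list[6][-1]

print(run_number)     # 47
print(first_voltage)  # -2.0
print(last_current)   # 1.0
```

Having to count positions like this is exactly the kind of bookkeeping that gets painful as the list grows.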
When defining a long list in a notebook or a file, it is often useful to put each field on its own line. It can make the code much more readable, especially when mixing different data types.
In [ ]:
another_long_list = [
"something",
132,
23.5,
"another long string",
['a','other','list'],
{'mydict':32},
]
Something that might cause you headaches in the future is how Python handles assigning one variable to another. When you set one variable equal to another, both names point to the same object. If that object is mutable (like a list), changing it through one name changes what you see through the other. Be careful about this fact.
In [7]:
a = [1,2]
In [8]:
b = a
In [9]:
a.append(10)
In [10]:
b
Out[10]:
[1, 2, 10]
In [ ]:
# what will be the result of this?
a is b
In [11]:
## this will create a new object
d = a + b
In [12]:
## test if two items contain the same thing
d == a + b
Out[12]:
True
In [13]:
## test if two variables point to the same object
d is a + b
Out[13]:
False
The above resolves to False because calling a + b generates a new object each time. So even though the contents are the same, the two sides refer to different objects.
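If you actually want an independent copy rather than a second name for the same list, one common approach is to build a new list from the old one. This is a minimal sketch; alias and copy_of_a are names I made up for the example.

```python
a = [1, 2]
alias = a            # a second name for the SAME list
copy_of_a = list(a)  # a NEW list with the same contents (a[:] also works)

a.append(10)

print(alias)           # [1, 2, 10]  (changed along with a)
print(copy_of_a)       # [1, 2]      (unaffected)
print(alias is a)      # True
print(copy_of_a is a)  # False
```

This is a shallow copy: if the list contains other lists, those inner lists are still shared between the original and the copy.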
There's a ton more to know about lists, but let's press on. Check out the help / documentation linked from our course README.md for more info.
At this point it is useful to take a detour regarding files. Let's say you have a file with some current and voltage data and some metadata.
data.dat:
experiment: current vs. voltage
run: 47
temperature: 372.756
current: [-1.0, -0.5, 0.0, 0.5, 1.0]
voltage: [-2.0, -1.0, 0.0, 1.0, 2.0]
This file should be in the same directory as these notebooks. If so, we can read this data into a list type variable pretty easily.
In [14]:
f = open("data.dat")
In [15]:
ivdata = f.readlines()
In [16]:
f.close()
In [17]:
ivdata
Out[17]:
Right now the data in ivdata isn't in a particularly useful format, but you can imagine that with some additional programming we could straighten it out. We will eventually learn to do that easily!
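To give a taste of what that "additional programming" might look like, here is a rough sketch, not the approach the course will necessarily take. It recreates data.dat in a temporary directory so it runs anywhere (the path and fields names are my own), reads it with a with block so the file is closed automatically, and splits each line at the first colon.

```python
import os
import tempfile

# Recreate the example file in a temporary directory (an assumption made
# so this sketch is self-contained; normally data.dat already exists).
contents = (
    "experiment: current vs. voltage\n"
    "run: 47\n"
    "temperature: 372.756\n"
    "current: [-1.0, -0.5, 0.0, 0.5, 1.0]\n"
    "voltage: [-2.0, -1.0, 0.0, 1.0, 2.0]\n"
)
path = os.path.join(tempfile.mkdtemp(), "data.dat")
with open(path, "w") as f:
    f.write(contents)

# The with block closes the file for us, so no f.close() is needed.
with open(path) as f:
    ivdata = f.readlines()

# Each line still ends in a newline character; strip it off, then split
# at the first colon into a name and a value.
fields = []
for line in ivdata:
    name, value = line.strip().split(":", 1)
    fields.append([name, value.strip()])

print(fields[1])  # ['run', '47']
```

Notice that every value is still a string ('47', not 47); turning them into numbers and lists is a separate step.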
Tuples are another of Python's basic compound data types, and they are almost like lists. The difference is that a tuple is immutable; once you set the data in it, the tuple cannot be changed. You define a tuple as follows.
In [ ]:
tup = ("red", "white", "blue")
In [ ]:
type(tup)
You can slice and index a tuple exactly like you would a list. Tuples are used in the inner workings of Python, and a tuple can be used as a key in a dictionary, whereas a list cannot.
See if you can retrieve the third element of tup:
In [ ]:
In [ ]:
## What happens when you try to do the following? (Remember a tuple is immutable)
tup[0] = 'aquamarine'
In [ ]:
mystuff = [1,2,3,4,4,4,4,4,4,4] # I like fours, even if this is a little redundant
my_unique_stuff = set(mystuff)
In [ ]:
## how many items are in my_unique_stuff?
In [18]:
# Note: the {} notation for sets is new in python 2.7.x
# For older pythons, you must use set(['apple', 'banana', ...])
fruit = {"apple", "banana", "pear", "banana"}
Since sets contain only unique items, there's only one banana in the set fruit.
In [22]:
print(fruit)
## what do you think happens when you try to get the first item in this set?
## fruit[0]
You can do things like intersections, unions, etc. on sets just like in math. Here's an example of an intersection of two sets (the common items in both sets).
In [ ]:
firstBowl = {"apple", "banana", "pear", "peach"}
In [ ]:
secondBowl = {"peach", "watermelon", "orange", "apple"}
In [ ]:
set.intersection(firstBowl, secondBowl)
In [ ]:
firstBowl.intersection(secondBowl)
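Sets also support operator shorthand for the common operations. Here is a short sketch using the same two bowls; in_both, in_either, and only_first are illustrative names, and I sort the results only so the printed order is predictable (sets themselves have no order).

```python
firstBowl = {"apple", "banana", "pear", "peach"}
secondBowl = {"peach", "watermelon", "orange", "apple"}

in_both = firstBowl & secondBowl     # intersection, same as firstBowl.intersection(secondBowl)
in_either = firstBowl | secondBowl   # union, same as firstBowl.union(secondBowl)
only_first = firstBowl - secondBowl  # difference, same as firstBowl.difference(secondBowl)

print(sorted(in_both))     # ['apple', 'peach']
print(sorted(in_either))   # ['apple', 'banana', 'orange', 'peach', 'pear', 'watermelon']
print(sorted(only_first))  # ['banana', 'pear']
```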
You can check out more info using the help docs. We won't be returning to sets, but it's good for you to know they exist.
Recall our file data.dat, which contained our current-voltage data and also some metadata. We were able to import the data as a list, but clearly the list type is not the optimal choice for a data model. The dictionary is a much better choice. A Python dictionary is a collection of key-value pairs. The key is a way to name the data, and the value is the data itself. Here's a way to create a dictionary that contains all the data in our data.dat file in a more sensible way than a list.
In [23]:
data_dict = {"experiment": "current vs. voltage", \
"run": 47, \
"temperature": 372.756, \
"current": [-1.0, -0.5, 0.0, 0.5, 1.0], \
"voltage": [-2.0, -1.0, 0.0, 1.0, 2.0]}
In [24]:
print(data_dict)
In [ ]:
# a dictionary has keys and values. You use the keys to access the values
print(data_dict.keys())
print(data_dict.values())
This model is clearly better because you no longer have to remember that the run number is in the third position of the list (index 2); you just refer directly to "run":
In [ ]:
data_dict["run"]
If you wanted the voltage data list:
In [ ]:
data_dict["voltage"]
Or perhaps you wanted the last element of the current data list
In [ ]:
data_dict["current"][-1]
Once a dictionary has been created, you can change the values of the data if you like.
In [ ]:
data_dict["temperature"] = 3275.39
You can also add new keys to the dictionary. Note that dictionaries are indexed with square brackets, just like lists. They look the same, even though they're very different.
In [ ]:
data_dict["user"] = "Johann G. von Ulm"
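Keys don't have to be strings. As mentioned in the tuple section, a tuple can serve as a key but a list cannot. Here is a hedged sketch with a made-up dictionary (readings, keyed by a hypothetical (run, quantity) pair) that shows both cases.

```python
# Hypothetical example: label each measurement by a (run, quantity) pair.
readings = {}
readings[(47, "current")] = 1.0
readings[(47, "voltage")] = 2.0

print(readings[(47, "voltage")])  # 2.0

# A list is mutable, so Python refuses to use it as a key.
try:
    readings[[47, "current"]] = 1.0
except TypeError as err:
    print("lists cannot be keys: %s" % err)
```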
Dictionaries, like strings, lists, and all the rest, have built-in methods. (We saw this above when we accessed the keys and values.)
You can also get the (key, value) pairs together:
In [ ]:
data_dict.items()
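The pairs from items() are handy for looping. Below is a small sketch on a cut-down version of the dictionary; the loop variable names and the "unknown" default are my own choices for the example.

```python
# A shortened version of the dictionary from above.
data_dict = {"run": 47, "temperature": 372.756}

# Loop over the (key, value) pairs.
for key, value in data_dict.items():
    print("%s -> %s" % (key, value))

# "in" tests whether a key is present.
print("run" in data_dict)    # True
print("user" in data_dict)   # False

# get() returns a fallback instead of raising KeyError for a missing key.
print(data_dict.get("user", "unknown"))  # unknown
```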
One thing to be careful of is the in-place method dict.update().
This also allows you to add something to your dictionary. It does NOT return a copy of the dictionary with the new key:value pair added. Instead, it just updates your current variable.
In [ ]:
## so look at this example
new_data_dict = data_dict.update({'newthing':42})
## What does the variable **new_data_dict** contain?? Why?
## What does data_dict contain??
# What would happen if you did this?
# data_dict = data_dict.update({'newkey':'newvalue'})
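If what you want is a new dictionary with the extra key while leaving the original alone, one possible approach is to copy first and then update the copy. This is only a sketch on a shortened dictionary; the name updated is made up.

```python
# A shortened version of the dictionary from above.
data_dict = {"run": 47, "temperature": 372.756}

# Make a (shallow) copy first, then update the copy in place.
updated = dict(data_dict)
updated.update({"newthing": 42})

print("newthing" in updated)    # True
print("newthing" in data_dict)  # False
```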