Python's efficient key/value hash table structure is called a "dict". The contents of a dict can be written as a series of key:value pairs within braces { }, e.g. dict = {key1:value1, key2:value2, ... }. The "empty dict" is just an empty pair of curly braces {}.
Looking up or setting a value in a dict uses square brackets, e.g. dict['foo'] looks up the value under the key 'foo'. Strings, numbers, and tuples work as keys, and any type can be a value. Other types may or may not work correctly as keys (strings and tuples work cleanly since they are immutable). Looking up a value which is not in the dict throws a KeyError -- use "in" to check if the key is in the dict, or use dict.get(key) which returns the value or None if the key is not present (or get(key, not-found) allows you to specify what value to return in the not-found case).
In [ ]:
## Can build up a dict by starting with the the empty dict {}
## and storing key/value pairs into the dict like this:
## dict[key] = value-for-that-key
dict = {}
dict['a'] = 'alpha'
dict['g'] = 'gamma'
dict['o'] = 'omega'
print dict
In [ ]:
print dict['a']
dict['a'] = 6
print dict['a']
'a' in dict
In [ ]:
print dict['z'] ## Throws KeyError
In [ ]:
if 'z' in dict: print dict['z'] ## Avoid KeyError
In [ ]:
print dict.get('z') ## None (instead of KeyError)
A for loop on a dictionary iterates over its keys by default. The keys will appear in an arbitrary order. The methods dict.keys() and dict.values() return lists of the keys or values explicitly. There's also an items() which returns a list of (key, value) tuples, which is the most efficient way to examine all the key value data in the dictionary. All of these lists can be passed to the sorted() function.
In [ ]:
## By default, iterating over a dict iterates over its keys.
## Note that the keys are in a random order.
for key in dict: print key
In [ ]:
## Exactly the same as above
for key in dict.keys(): print key
In [ ]:
## Get the .keys() list:
print dict.keys()
In [ ]:
## Likewise, there's a .values() list of values
print dict.values() ## ['alpha', 'omega', 'gamma']
In [ ]:
## Common case -- loop over the keys in sorted order,
## accessing each key/value
for key in sorted(dict.keys()):
print key, dict[key]
In [ ]:
## .items() is the dict expressed as (key, value) tuples
print dict.items()
In [ ]:
## This loop syntax accesses the whole dict by looping
## over the .items() tuple list, accessing one (key, value)
## pair on each iteration.
for k, v in dict.items(): print k, '>', v
There are "iter" variants of these methods called iterkeys(), itervalues() and iteritems() which avoid the cost of constructing the whole list -- a performance win if the data is huge. However, I generally prefer the plain keys() and values() methods with their sensible names.
Strategy note: from a performance point of view, the dictionary is one of your greatest tools, and you should use where you can as an easy way to organize data. For example, you might read a log file where each line begins with an ip address, and store the data into a dict using the ip address as the key, and the list of lines where it appears as the value. Once you've read in the whole file, you can look up any ip address and instantly see its list of lines. The dictionary takes in scattered data and make it into something coherent.
In [ ]:
hash = {}
hash['word'] = 'garfield'
hash['count'] = 42
s = 'I want %(count)d copies of %(word)s' % hash
s
In [ ]:
var = 6
del var # var no more!
var
In [ ]:
list = ['a', 'b', 'c', 'd']
del list[0] ## Delete first element
del list[-2:] ## Delete last two elements
print list
In [ ]:
dict = {'a':1, 'b':2, 'c':3}
del dict['b'] ## Delete 'b' entry
print dict
The open() function opens and returns a file handle that can be used to read or write a file in the usual way. The code f = open('name', 'r') opens the file into the variable f, ready for reading operations, and use f.close() when finished. Instead of 'r', use 'w' for writing, and 'a' for append. The special mode 'rU' is the "Universal" option for text files where it's smart about converting different line-endings so they always come through as a simple '\n'. The standard for-loop works for text files, iterating through the lines of the file (this works only for text files, not binary files). The for-loop technique is a simple and efficient way to look at all the lines in a text file:
In [ ]:
# Echo the contents of a file
f = open('data/poem.txt', 'rU')
for line in f: ## iterates over the lines of the file
print line, ## trailing , so print does not add an end-of-line char
## since 'line' already includes the end-of line.
f.close()
Reading one line at a time has the nice quality that not all the file needs to fit in memory at one time -- handy if you want to look at every line in a 10 gigabyte file without using 10 gigabytes of memory. The f.readlines() method reads the whole file into memory and returns its contents as a list of its lines. The f.read() method reads the whole file into a single string, which can be a handy way to deal with the text all at once, such as with regular expressions we'll see later.
For writing, f.write(string) method is the easiest way to write data to an open output file. Or you can use "print" with an open file, but the syntax is nasty: "print >> f, string". In python 3 the print syntax is fixed to be a regular function call with a file= optional argument: "print(string, file=f)".
Exercise: wordcount.py
Combining all the basic Python material -- strings, lists, dicts, tuples, files -- try the summary wordcount.py exercise in the Basic Exercises.
For practice with everything you've seen so far, go to the notebook 5.5 - Wordcount
In [ ]:
In [ ]:
In [ ]:
Note: This notebook is based on Google's python tutorial https://developers.google.com/edu/python
In [ ]: