In this lesson we'll continue our exploration of more advanced data structures. Last time we took a peek at a way to represent ordered collections of items via lists.
This time we'll use dictionaries to create collections of unordered items (this is just an easy distinction - there's much more to it - but it's a good way to start wrapping your head around the subject).
Dictionaries are another data type in Python that, like lists, contain multiple items. The difference is that while lists use an index to access ordered items, dictionaries use 'keys' to access unordered values.
Like lists, dictionaries are also found in other programming languages, often under a different name. For instance, Python dictionaries might be referred to elsewhere as "maps", "hashes", or "associative arrays".
According to the Official Docs:
It is best to think of a dictionary as an unordered set of key-value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}
In other words, dictionaries are not lists: instead of just a checklist, we now have a key and a value. We use the key to find the value. So a generic dictionary looks like this:
theDictionary = {
key1: value1,
key2: value2,
key3: value3,
...
}
Each key/value pair is linked by a ':', and each pair is separated by a ','. It doesn't really matter if you put each pair on a new line (as we do here) or everything on the same line. We're just doing it this way to make it easier to read.
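As a quick check of that point about layout, here is a minimal sketch that writes the same dictionary on one line and across several lines, then compares the two. The names one_line and multi_line, and their contents, are made up for illustration.

```python
# Hypothetical example data: the same three pairs, laid out two ways
one_line = {"a": 1, "b": 2, "c": 3}

multi_line = {
    "a": 1,
    "b": 2,
    "c": 3,
}

# Layout doesn't affect the contents, so the two compare as equal
print(one_line == multi_line)
```

The multi-line version is usually easier to read once a dictionary has more than a handful of pairs.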
Here's a more useful implementation of a dictionary:
In [ ]:
myDict = {
"key1": "Value 1",
3: "3rd Value",
"key2": "2nd Value",
"Fourth Key": [4.0, 'Jon']
}
print(myDict)
At the risk of repeating ourselves (but to really make the point!): an important difference between dictionaries and lists is that you should treat dictionaries as un-ordered. Always remember that you have no idea where things are stored in a dictionary and you can't rely on positional indexing like you can with a list: myDict[0] looks for a key equal to 0, not for the first item. (Since Python 3.7 a dictionary does remember the order in which its keys were inserted, but it still offers no access by position.) From this perspective a Python dictionary is not like a real dictionary (as a real dictionary presents the keys, i.e. words, in alphabetical order).
And notice too that every type of data can go into a dictionary: strings, integers, and floats. There's even a list in this dictionary ([4.0, 'Jon'])! The only constraint is that the key must be immutable; this means that it is a simple, static identifier that can't change.
In [ ]:
# this will result in an error
myFaultyDict = {
["key1", 1]: "Value 1",
"key2": "2nd Value",
3: "3rd Value",
8.0: [5, 'jon']
}
This doesn't work because you can't use a list (["key1",1]) as a key, though as you saw above you can use a list as a value. For more on the subject of (im)mutability check out this SO answer.
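To see the immutability rule from the other side, here is a small sketch in which the list key is swapped for a tuple, which is immutable and therefore allowed. The dictionary name myTupleDict is made up for this example, and the try/except construct is a preview of something this lesson doesn't cover, used here only so that the cell keeps running after the error.

```python
# Hypothetical example: a tuple is immutable, so it can be used as a key
myTupleDict = {
    ("key1", 1): "Value 1",
    "key2": "2nd Value",
}
print(myTupleDict[("key1", 1)])

# A list is mutable, so using one as a key raises a TypeError
try:
    badDict = {["key1", 1]: "Value 1"}
except TypeError as err:
    errorName = type(err).__name__
    print(errorName + ":", err)
```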
Like lists, we access an element in a dictionary using a 'location' marked out by a pair of square brackets ([...]). The difference is that the index is no longer an integer indicating the position of the item that we want to access, but is a key in the key:value pair:
In [ ]:
print(myDict["key1"])
print(myDict["Fourth Key"])
In [ ]:
print(myDict["key2"])
When it comes to error messages, dicts and lists behave in similar ways. If you try to access a dictionary using a key that doesn't exist then Python raises an exception.
What is the name of the exception generated by the following piece of code? Can you find it in the Official Docs?
In [ ]:
print(myDict[99])
Handy, no? Again, Python's error messages are giving you helpful clues about where the problem it's encountering might be! Up above we had a TypeError when we tried to create a key using a list. Here, we have a KeyError that tells us something must be wrong with using 99 as a key in myDict. In this case, it's that there is no key 99!
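If you'd rather have your program carry on than stop at the error, one option is to catch the exception. The sketch below is only a preview (try/except isn't covered in this lesson) and the variable name missing is made up for illustration.

```python
# Same dictionary as earlier in the lesson
myDict = {
    "key1": "Value 1",
    3: "3rd Value",
    "key2": "2nd Value",
    "Fourth Key": [4.0, 'Jon']
}

try:
    print(myDict[99])
except KeyError as err:
    # The exception carries the key that could not be found
    missing = err.args[0]
    print("No such key:", missing)
```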
One of the simplest uses of a dictionary is as a phone book! (If you're not sure what a phone book is here's a handy guide and here's an example of someone using one).
So here are some useful contact numbers:
Now, how would you create a dictionary that allowed us to look up and print out an emergency phone number based on the two-character ISO country code? It's going to look a little like this:
eNumbers = {
...
}
print("The Icelandic emergency number is " + eNumbers['IS'])
print("The American emergency number is " + eNumbers['US'])
In [ ]:
eNumbers = {
"IS": '112', # It's not very important here whether we use single- or double-quotes
"US": '911'
}
print("The Icelandic emergency number is " + eNumbers['IS'])
print("The American emergency number is " + eNumbers['US'])
We are going to see in the next couple of notebooks how to systematically access values in a dictionary (amongst other things). For now, let's also take in the fact that dictionaries have utility methods similar to what we saw with the list. And as with the list, these methods are functions that only make sense when you're working with a dictionary, so they're bundled up in a way that makes them easy to use.
Let's say that you have forgotten what keys you put in your dictionary...
In [ ]:
programmers = {
"Charles": "Babbage",
"Ada": "Lovelace",
"Alan": "Turing"
}
print(programmers.keys())
Or maybe you just need to access all of the values without troubling to ask for each key:
In [ ]:
print(programmers.values())
Or maybe you even need to get them as pairs:
In [ ]:
# Output is a view of (key, value) pairs!
print(programmers.items())
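In Python 3 these three methods return 'view' objects rather than plain lists. If you want an ordinary list that you can index or sort, one approach is to wrap the result in list(), as in this sketch (the variable names firstNames, surnames and pairs are made up for illustration):

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

# list() turns each view into an ordinary list
firstNames = list(programmers.keys())
surnames = list(programmers.values())
pairs = list(programmers.items())

print(firstNames)
print(sorted(surnames))   # sorted() gives an alphabetical copy
print(pairs[0])           # each pair is a (key, value) tuple
print(len(programmers))   # number of key/value pairs
```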
In [ ]:
print("Charles" in programmers)
print("Babbage" in programmers)
print(True not in programmers)
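The cell above uses the in operator, which checks the keys of a dictionary and not its values. That is why "Babbage" is reported as missing even though it is stored as a value. A minimal sketch of the difference:

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

# 'in' on a dictionary only looks at the keys
keyFound = "Babbage" in programmers

# To search the values, ask for them explicitly
valueFound = "Babbage" in programmers.values()

print(keyFound)
print(valueFound)
```

The first result is False and the second is True, because "Babbage" only ever appears as a value.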
One challenge with dictionaries is that sometimes we have no real idea if a key exists or not. With a list it's pretty easy to figure out whether or not an index exists because we can just ask Python to tell us the length of the list. So that makes it fairly easy to avoid having the list 'blow up' by throwing an exception.
It's rather harder for a dictionary though, so that's why we have the dedicated get() method: it not only allows us to fetch the value associated with a key, it also allows us to specify a default value in case the key does not exist:
In [ ]:
print(programmers.get("Lady Ada", "Are you sure you spelled that right?") )
See how this works: the key doesn't exist, but unlike what happened when we asked for myDict[99] we don't get an exception, we get the default value specified as the second input to the method get.
So you've learned two things here: that functions can take more than one input (this one takes both the key that we're looking for, and a value to return if Python can't find the key); and that different types (or classes) of data have different methods (there's no get for lists).
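It's also worth knowing what get does in the two cases not shown above: when the key does exist, and when you leave out the default. A short sketch (the key "Grace" and the default "Unknown" are made up for illustration):

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

# The key exists, so the default is ignored
found = programmers.get("Ada", "Unknown")

# The key is missing and there is no default: get returns None
noDefault = programmers.get("Grace")

# The key is missing, so the default comes back
withDefault = programmers.get("Grace", "Unknown")

print(found)
print(noDefault)
print(withDefault)
```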
OK, this is where it's going to get a little weird but you're also going to see how programming is a little like Lego: once you get the building blocks, you can make lots of cool/strange/useful contraptions from some pretty simple concepts.
Remember that a list or dictionary can store anything: so the first item in your list could itself be a list! For most people starting out on programming this is the point where their brain starts hurting (it happened to us) and you might want to throw up your hands in frustration thinking "I'm never going to understand this!" But if you stick with it, you will.
And this is really the start of the power of computation.
Let's start out with what some (annoying) people would call a 'trivial' example of how a list-of-lists (LoLs, though most people aren't laughing) can be useful. Let's think through what's going on below: what happens if we write cityData[0]?
In [ ]:
# Format: city, country, population, area (km^2)
cityData = [
['London','U.K.',8673713,1572],
['Paris','France',2229621,105],
['Washington, D.C.','U.S.A.',672228,177],
['Abuja','Nigeria',1235880,1769],
['Beijing','China',21700000,16411],
]
print(cityData[0])
So how would we access something inside the list returned from cityData[0]?
Why not try:
cityData[0][1]
See if you can figure out how to retrieve and print the following from cityData:
In [ ]:
print(cityData[1][1])
print(cityData[4][3])
print(cityData[2][0])
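Once you can pull single items out of a list-of-lists, you can combine them. As a rough sketch, the cell below picks out the row for Paris and works out a population density from it; the variable names paris and density are made up for illustration.

```python
# Format: city, country, population, area (km^2)
cityData = [
    ['London','U.K.',8673713,1572],
    ['Paris','France',2229621,105],
    ['Washington, D.C.','U.S.A.',672228,177],
    ['Abuja','Nigeria',1235880,1769],
    ['Beijing','China',21700000,16411],
]

# cityData[1] is itself a list, so we can give it a name...
paris = cityData[1]

# ...and then index into it: item 2 is population, item 3 is area
density = paris[2] / paris[3]

print(paris[0] + " has about " + str(round(density)) + " people per square km")
```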
In [ ]:
# American Emergency Number: 911
# British Emergency Number: 999
# Icelandic Emergency Number: 112
# French Emergency Number: 112
# Russian Emergency Number: 102
eNumbers = {
'IS': ['Icelandic',112],
'US': ['American',911],
'FR': ['French',112],
'RU': ['Russian',102],
'UK': ['British',999]
}
print("The " + eNumbers['IS'][0] + " emergency number is " + str(eNumbers['IS'][1]))
print("The " + eNumbers['RU'][0] + " emergency number is " + str(eNumbers['RU'][1]))
print("The " + eNumbers['UK'][0] + " emergency number is " + str(eNumbers['UK'][1]))
See if you can create the rest of the eNumbers dictionary and then print out the Russian and British emergency numbers.
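One way to avoid repeating the country code three times on each line is to store it in a variable first. This is just a sketch: the variable names code and entry are made up for illustration.

```python
eNumbers = {
    'IS': ['Icelandic', 112],
    'US': ['American', 911],
    'FR': ['French', 112],
    'RU': ['Russian', 102],
    'UK': ['British', 999]
}

# Change this one value to look up a different country
code = 'FR'

# The value stored under each key is a list: [adjective, number]
entry = eNumbers[code]

print("The " + entry[0] + " emergency number is " + str(entry[1]))
```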
OK, this is the last thing we're going to throw at you today: getting your head around 'nested' lists and dictionaries is hard. Really hard. But it's the all-important first step to thinking about data the way that a computer 'thinks' about it. This is really abstract: something that you access by keys, which in turn give you access to other keys... it's got a name: recursion. And it's probably one of the cleverest things about computing.
Here's a bit of a complex dictionary-of-dictionaries (DoD), combined with a dictionary-of-lists (DoL), and other nasties:
In [ ]:
cityData2 = {
'London' : {
'population': 8673713,
'area': 1572,
'location': [51.507222, -0.1275],
'country': {
'ISO2': 'UK',
'Full': 'United Kingdom',
},
},
'Paris' : {
'population': 2229621,
'area': 105.4,
'location': [48.8567, 2.3508],
'country': {
'ISO2': 'FR',
'Full': 'France',
},
}
}
Try the following code in the code cell below:
print(cityData2['Paris'])
print(cityData2['Paris']['country']['ISO2'])
print(cityData2['Paris']['location'][0])
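If the chained square brackets are hard to follow, it can help to take them one step at a time. The sketch below uses a cut-down copy of cityData2 containing only Paris, and the intermediate names paris and country are made up for illustration.

```python
# Cut-down copy of cityData2 with just one city
cityData2 = {
    'Paris': {
        'population': 2229621,
        'area': 105.4,
        'location': [48.8567, 2.3508],
        'country': {
            'ISO2': 'FR',
            'Full': 'France',
        },
    }
}

paris = cityData2['Paris']      # a dictionary
country = paris['country']      # another dictionary, one level down
print(country['ISO2'])          # a string

# Each pair of brackets works on whatever the previous pair returned,
# so the chained version gives the same answer
print(cityData2['Paris']['country']['ISO2'])

# 'location' holds a list, so the last step uses a position, not a key
print(paris['location'][0])
```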
Now, figure out how to print:
The population of Paris, the capital of France (FR), is 2229621. It has a density of 21153.899 persons per square km.
Do the same for London.
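The target sentence shows the density to three decimal places, which plain division won't give you on its own. One possible approach, sketched below with the Paris figures typed in directly, is to use a format specification; the variable names are made up for illustration.

```python
# Figures for Paris, copied from cityData2 above
population = 2229621
area = 105.4

density = population / area

# ".3f" means: a floating point number shown with 3 decimal places
threeDecimals = "{0:.3f}".format(density)
print("It has a density of " + threeDecimals + " persons per square km.")

# round() is an alternative that returns a number rather than a string
print(round(density, 3))
```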
In [ ]:
# Note that we can tweak the formatting a bit: because everything
# sits inside the parentheses of print(), Python lets the expression
# run over several lines, so ending a line with '+' simply
# continues the concatenation on the next line...
print("The population of " + 'London' + ", the capital of " +
cityData2['London']['country']['Full'] + " (" + cityData2['London']['country']['ISO2'] + "), is " +
str(cityData2['London']['population']) + ". It has a density of " +
str(cityData2['London']['population']/cityData2['London']['area']) + " persons per square km")
# But a _better_ way to do this might be one in which we don't
# hard-code 'London' into the output -- by changing the variable
# 'c' to Paris we can change the output completely...
c = 'Paris'
cd = cityData2[c]
print("The population of " + c + ", the capital of " +
cd['country']['Full'] + " (" + cd['country']['ISO2'] + "), is " +
str(cd['population']) + ". It has a density of " +
"{0:8.1f}".format(cd['population']/cd['area']) + " persons per square km")
Let's continue our trips around the world! This time though, we'll do things better, and instead of using a simple URL, we are going to use a real-world geographic data type that you can use on a web-map or in your favourite GIS software.
If you look down below at the KCL_position variable you'll see that we're assigning it an apparently complex and scary data structure. Don't be afraid! If you look closely enough you will notice that it is just made out of the "building blocks" that we've seen so far: floats, lists, strings... all wrapped comfortably in a cozy dictionary!
This is simply a formalised way to represent a geographic marker (a pin on the map!) in a format called GeoJSON.
According to the awesome Lizy Diamond
GeoJSON is an open and popular geographic data format commonly used in web applications. It is an extension of a format called JSON, which stands for JavaScript Object Notation. Basically, JSON is a table turned on its side. GeoJSON extends JSON by adding a section called "geometry" such that you can define coordinates for the particular object (point, line, polygon, multi-polygon, etc). A point in a GeoJSON file might look like this:
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-122.65335738658904,
45.512083676585156
]
},
"properties": {
"name": "Hungry Heart Cupcakes",
"address": "1212 SE Hawthorne Boulevard",
"website": "http://www.hungryheartcupcakes.com",
"gluten free": "no"
}
}
GeoJSON files have to have both a "geometry" section and a "properties" section. The "geometry" section houses the geographic information of the feature (its location and type) and the "properties" section houses all of the descriptive information about the feature (like fields in an attribute table). Source
Now, in order to have our first "webmap", we have to re-create such a GeoJSON structure.
As you can see there are two variables containing the King's College Longitude/Latitude coordinate position. Unfortunately they are in the wrong data type. Also, the variable longitude is not included in the list KCLCoords and the list itself is not assigned as a value to the KCLGeometry dictionary.
Take all the necessary steps to fix the code, using the functions we've seen so far.
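As a hint for the data type problem, the built-in float() function turns a piece of text that looks like a number into an actual number. A tiny sketch with a made-up value (asText and asNumber are hypothetical names):

```python
# Hypothetical value, stored as text (note the quotes)
asText = '51.5'
asNumber = float(asText)

print(type(asText))
print(type(asNumber))

# Only the number supports arithmetic in the way you'd expect
print(asNumber + 1)
```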
In [ ]:
# don't worry about the following line
# I'm simply requesting a module from Python
# to have additional functions at my disposal
# which usually are not immediately available
import json
# King's College coordinates
# What format are they in? Does it seem appropriate?
# How would you convert them back to numbers?
longitude = '-0.11596798896789551'
latitude = '51.51130657591914'
# Set this up as a coordinate pair, converting each
# string to a number with float() along the way
KCLCoords = [float(longitude), float(latitude)]
# How can you assign KCLCoords to
# the key KCLGeometry["coordinates"]?
KCLGeometry = {
"type": "Point",
"coordinates": KCLCoords
}
KCL_position = {
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"marker-color": "#7e7e7e",
"marker-size": "medium",
"marker-symbol": "building",
"name": "KCL"
},
"geometry": KCLGeometry
}
]
}
# OUTPUT
# -----------------------------------------------------------
# I'm just using the "imported" module to print the output
# in a nice and formatted way
print(json.dumps(KCL_position, indent=4))
# here I'm saving the variable to a file on your local machine
# You should see it if you click on the 'Home' tab in your open
# browser window (it's the one where you started this notebook)
with open('my-first-marker.geojson', 'w') as outfile:
    # json.dump writes the dictionary itself to the file as JSON
    json.dump(KCL_position, outfile, indent=4)
# And we can also show this in Jupyter directly (it won't show
# up in the PDF version though)
from IPython.display import GeoJSON
GeoJSON(KCL_position)
After you've run the code, Python will have saved a file called my-first-marker.geojson in the folder where you are running the notebook. Try to upload it on this website (Geojson.io) and see what it shows!
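If you want to check a saved file from Python before uploading it, one option is to read it back with json.load and confirm you get a dictionary again. The sketch below is self-contained: it writes a small, made-up marker (the name marker, its rounded coordinates and the file name check-marker.geojson are all hypothetical) to a temporary folder instead of touching your own files.

```python
import json
import os
import tempfile

# Hypothetical minimal feature, standing in for KCL_position
marker = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.116, 51.511]},
    "properties": {"name": "KCL"}
}

# Write to a temporary folder so the sketch leaves your files alone
path = os.path.join(tempfile.mkdtemp(), 'check-marker.geojson')
with open(path, 'w') as outfile:
    json.dump(marker, outfile, indent=4)

# Read it back: json.load turns the file's text into a dictionary again
with open(path) as infile:
    loaded = json.load(infile)

print(type(loaded))
print(loaded["geometry"]["coordinates"])
```

If the file had been written as one long quoted string instead, json.load would hand back a str rather than a dict, which is a quick way to spot that kind of mistake.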
The following individuals have contributed to these teaching materials:
The content and structure of this teaching project itself is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, and the contributing source code is licensed under The MIT License.
Supported by the Royal Geographical Society (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.
This notebook may depend on the following libraries: None