In [1]:
%matplotlib inline

In [2]:
import numpy as np
import matplotlib.pyplot as pl

Dictionaries

A dictionary is a way to store data. Each entry pairs a key (usually a word or a number) with a value. Think of a real dictionary: you look up a word (the key) to find its definition (the value).

The keys must be immutable (hashable) types, such as int, float, str, or a tuple of these.
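A quick check of which types work as keys. Python requires dictionary keys to be hashable: the immutable types above qualify, but a mutable list does not. The name keyDemo and its contents are purely illustrative.

```python
# Keys of several immutable (hashable) types
keyDemo = {1: 'int', 2.5: 'float', 'Oslo': 'str', (59.9, 10.7): 'tuple'}
print(keyDemo[(59.9, 10.7)])  # prints: tuple

# A list is mutable, hence unhashable, so it cannot be used as a key
try:
    keyDemo[[59.9, 10.7]] = 'list'
except TypeError as err:
    print('TypeError:', err)
```

A tuple only works as a key if everything inside it is hashable too, so (1, [2]) would fail the same way.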

Dictionaries can be nested like lists: a value can itself be a dictionary.
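A small sketch of nesting, using the city temperatures from the example below. The extra 'country' field is illustrative and not part of the original data. Each lookup picks one level.

```python
cityInfo = {
    'Oslo':   {'country': 'Norway', 'temp': 13},
    'London': {'country': 'UK',     'temp': 15.4},
    'Paris':  {'country': 'France', 'temp': 17.5},
}

# The first key picks the city, the second key picks the field
print(cityInfo['London']['temp'])    # prints: 15.4
print(cityInfo['Paris']['country'])  # prints: France
```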

With a list, we access items by position, so we must remember the order of the items in the list.

Let's look at average temperatures in European cities as an example.


In [3]:
cityList = ['Oslo', 'London', 'Paris']
tempList = [13, 15.4, 17.5]

In this case, if we were interested in picking out the average temperature in London, the lists are small enough that we can determine the corresponding index visually.

Q. How could we get to the average temperature of London?


In [5]:
print(cityList[1])
print(tempList[1])


London
15.4

Q. What if our cityList and tempList lists had 1000 elements? How could we pick out the average temperature in a city of our choosing?


In [7]:
# We could have sublists consisting of cities and temperatures
# and search for London, then its corresponding temperature.

# Alternatively, we could find London's position in the city list with index():

i = cityList.index('London')
print("Index:", i)
print("Value:", tempList[i])


Index: 1
Value: 15.4
Or, we could do this using a dictionary; note the curly braces {}. Dictionaries have the form {keyA: valueA, keyB: valueB, ...}, where each key has an associated value. In this example, the keys are strings and the values are numbers.

In [8]:
tempDict = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}
tempDict


Out[8]:
{'London': 15.4, 'Oslo': 13, 'Paris': 17.5}
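or, using dict() with keyword arguments (this works when every key is a string that is also a valid Python name):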

In [9]:
temp2Dict = dict(Oslo=13, London=15.4, Paris=17.5)
temp2Dict


Out[9]:
{'London': 15.4, 'Oslo': 13, 'Paris': 17.5}
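If the data already live in two parallel lists like cityList and tempList, zip() pairs them up and dict() turns the pairs into a dictionary. The lists are redefined here with the same values so the snippet runs on its own. The name tempFromLists is illustrative.

```python
cityList = ['Oslo', 'London', 'Paris']
tempList = [13, 15.4, 17.5]

# zip() yields (city, temp) tuples; dict() uses each tuple as a key-value pair
tempFromLists = dict(zip(cityList, tempList))
print(tempFromLists['Paris'])  # prints: 17.5
```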
Now, rather than first determining the index of a particular city and then accessing that element in tempList, we access the temperature by name:

In [10]:
print("London:", tempDict['London'])


London: 15.4
If we'd like, we can add another city to tempDict:

In [12]:
tempDict['Madrid'] = 26.0
tempDict


Out[12]:
{'London': 15.4, 'Madrid': 26.0, 'Oslo': 13, 'Paris': 17.5}

In [13]:
print(cityList)
print(tempList)


['Oslo', 'London', 'Paris']
[13, 15.4, 17.5]

Q. What is the analogous list operation to do this?


In [14]:
cityList.append("Madrid")
tempList.append(26.0)
print(cityList)
print(tempList)


['Oslo', 'London', 'Paris', 'Madrid']
[13, 15.4, 17.5, 26.0]
Looping over the keys is analogous to looping over the indices of lists and arrays:

In [16]:
for city in sorted(tempDict, reverse=True):
    print('The average temperature in {} is {:g}.'.format(city, tempDict[city]))


The average temperature in Paris is 17.5.
The average temperature in Oslo is 13.
The average temperature in Madrid is 26.
The average temperature in London is 15.4.
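Looping over items() yields each key together with its value, so the loop body doesn't have to look the value up again. A stand-in dictionary, sampleTemps, holds the same four cities so the example runs on its own.

```python
sampleTemps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5, 'Madrid': 26.0}

# items() yields (key, value) tuples, unpacked here into city and temp;
# sorting the tuples orders them by city name
for city, temp in sorted(sampleTemps.items()):
    print('The average temperature in {} is {:g}.'.format(city, temp))
```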

We can check whether 'Berlin' is a key in tempDict with a Boolean expression:


In [17]:
'Berlin' in tempDict


Out[17]:
False

or test for 'Oslo' inside an if statement, so the lookup only happens when the key exists:


In [18]:
if 'Oslo' in tempDict:
    print('Oslo:', tempDict['Oslo'])


Oslo: 13
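An alternative to testing with in first is the get() method. It returns the value when the key exists and a default (None, or a value you pass) when it doesn't. Square-bracket lookup of a missing key raises KeyError instead. sampleTemps is a stand-in dictionary for this sketch.

```python
sampleTemps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5}

print(sampleTemps.get('Oslo'))           # prints: 13
print(sampleTemps.get('Berlin'))         # prints: None
print(sampleTemps.get('Berlin', 'n/a'))  # prints: n/a

# Bracket lookup of a missing key raises KeyError
try:
    sampleTemps['Berlin']
except KeyError as err:
    print('KeyError:', err)              # prints: KeyError: 'Berlin'
```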

The keys and values can be extracted:


In [20]:
print(tempDict.keys())
print(tempDict.values())
print(tempDict)


dict_keys(['London', 'Oslo', 'Paris', 'Madrid'])
dict_values([15.4, 13, 17.5, 26.0])
{'London': 15.4, 'Oslo': 13, 'Paris': 17.5, 'Madrid': 26.0}
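keys() and values() return view objects rather than lists. A view reflects later changes to the dictionary and can't be indexed by position. Wrap it in list() when you need a fixed snapshot or positional access. The names below are illustrative.

```python
sampleTemps = {'Oslo': 13, 'London': 15.4}
keysView = sampleTemps.keys()           # live view of the keys
keysSnapshot = list(sampleTemps.keys()) # independent list, copied now

sampleTemps['Paris'] = 17.5             # modify the dictionary afterwards

print(len(keysView))      # prints: 3  (the view sees the new key)
print(len(keysSnapshot))  # prints: 2  (the list was copied earlier)
```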

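There is an important distinction between dictionaries and lists:

Dictionaries are accessed by key, not by position, and in Python versions before 3.7 (as in the output below) the order of the keys is not guaranteed! Since Python 3.7, dictionaries do preserve insertion order, but you still cannot index them by position.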


In [22]:
for city in tempDict:
    print(city, tempDict[city])


London 15.4
Oslo 13
Paris 17.5
Madrid 26.0
If you want to loop over the keys in a particular order, sort them first, e.g. with sorted():

In [23]:
# This accomplishes an alphabetical sorting:
for city in sorted(tempDict, reverse = False):
    print(city, tempDict[city])


London 15.4
Madrid 26.0
Oslo 13
Paris 17.5
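sorted() also accepts a key function. Passing the dictionary's own get method ranks the cities by temperature instead of alphabetically. sampleTemps is a stand-in with the same four cities.

```python
sampleTemps = {'Oslo': 13, 'London': 15.4, 'Paris': 17.5, 'Madrid': 26.0}

# key=sampleTemps.get means each city is ranked by sampleTemps.get(city)
for city in sorted(sampleTemps, key=sampleTemps.get):
    print(city, sampleTemps[city])
```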

To delete a key and its associated value:


In [24]:
del tempDict['Madrid']
tempDict


Out[24]:
{'London': 15.4, 'Oslo': 13, 'Paris': 17.5}

Q. Now, what do you predict this will be?


In [25]:
len(tempDict)


Out[25]:
3

As with arrays (and lists), assigning a dictionary to a new variable does not copy it: both names refer to the same dictionary, so a change made through one name shows up through the other.


In [27]:
temp3Dict = tempDict
tempDict


Out[27]:
{'London': 15.4, 'Oslo': 13, 'Paris': 17.5}

In [28]:
temp3Dict['London'] = 0.0

Q. What will this yield?


In [29]:
tempDict


Out[29]:
{'London': 0.0, 'Oslo': 13, 'Paris': 17.5}

The reason for this: temp3Dict is not a copy. It is a second name (a reference) for the same dictionary object as tempDict.

If this is not the desired behavior, create a copy:


In [31]:
tempCopyDict = tempDict.copy()

tempCopyDict['London'] = 1e6
tempDict['London'] = 15.4

print(tempCopyDict)
print(tempDict)


{'London': 1000000.0, 'Oslo': 13, 'Paris': 17.5}
{'London': 15.4, 'Oslo': 13, 'Paris': 17.5}
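One caveat: copy() makes a shallow copy. The new dictionary has its own keys, but mutable values such as lists are still shared with the original. The standard-library function copy.deepcopy() copies those as well. The readings dictionary holds made-up values for illustration, and the is operator checks whether two names refer to the same object.

```python
import copy

readings = {'Oslo': [12.1, 13.9], 'London': [15.0, 15.8]}  # made-up values

shallow = readings.copy()       # new dict, but the same list objects inside
deep = copy.deepcopy(readings)  # new dict and new lists

readings['Oslo'].append(14.2)   # mutate a list stored in the original

print(shallow['Oslo'])  # prints: [12.1, 13.9, 14.2]  (list is shared)
print(deep['Oslo'])     # prints: [12.1, 13.9]        (list was copied)
print(shallow is readings, shallow['Oslo'] is readings['Oslo'])  # prints: False True
```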

Earlier in the semester I pointed you to a blog post about this. We've now covered all concepts that appeared in that discussion:

http://nedbatchelder.com/text/names.html

Final note: dictionary values can be anything (ints, floats, lists, arrays, dictionaries, class instances...), although keys are limited to immutable types. More in today's tutorial!

Strings

We will discuss string manipulation with a series of examples. There are many more examples in Section 6.3 of the book; read these, since we'll only show some important ones here. Operations we'll cover only briefly or skip are joining strings, replacing upper case with lower case and vice versa, stripping out spaces, and testing for the presence of numbers. Substrings are selected with slice notation, as with lists; each index refers to a character position, counting from 0:

In [32]:
temp = 'One Two Three'

Q. What will this print?


In [34]:
temp[2:5]


Out[34]:
'e T'
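Slices follow the same rules as for lists. The start index is included and the stop index is excluded. Negative indices count from the end, and an optional third number sets the step.

```python
temp = 'One Two Three'

print(temp[:3])    # prints: One
print(temp[-5:])   # prints: Three
print(temp[::2])   # prints: OeToTre  (every second character)
print(temp[::-1])  # prints: eerhT owT enO  (reversed)
```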

To search for a substring, use the find() method, which reports the index of the start of the first appearance of the substring.

Q. What will this print?


In [35]:
temp.find('Two')


Out[35]:
4

If the string is not found:


In [36]:
temp.find('Four')


Out[36]:
-1
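index() does the same search as find() but raises ValueError instead of returning -1 when the substring is missing. find() also takes an optional start position, which lets you search past the first match, and count() reports how many non-overlapping matches there are.

```python
temp = 'One Two Three'

print(temp.index('Two'))  # prints: 4
print(temp.find('e'))     # prints: 2   (first 'e')
print(temp.find('e', 3))  # prints: 11  (first 'e' at or after index 3)
print(temp.count('e'))    # prints: 3

# Unlike find(), index() raises an error for a missing substring
try:
    temp.index('Four')
except ValueError as err:
    print('ValueError:', err)
```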

To test whether a substring occurs within a string:


In [37]:
'Four' in temp


Out[37]:
False

In [38]:
# An example:

if 'Four' in temp:
    print('Four is in temp')
else:
    print('Four not found')


Four not found

The in test works for lists (is the item present?), dictionaries (is the key present?), and strings (is the substring present?)!

The startswith and endswith methods test whether a string starts or ends with a specified substring:


In [39]:
temp


Out[39]:
'One Two Three'

In [40]:
# Q. What should this be?

temp.endswith('Three')


Out[40]:
True
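Both methods also accept a tuple of candidates and return True if any of them matches, which is handy for checks such as file extensions. The comparison is case-sensitive.

```python
temp = 'One Two Three'
print(temp.startswith(('Two', 'One')))  # prints: True
print(temp.startswith('one'))           # prints: False (case-sensitive)

filename = 'string_example.dat'
print(filename.endswith(('.dat', '.txt')))  # prints: True
```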

The replace method returns a new string with the substrings replaced; the original is unchanged, because strings are immutable:


In [43]:
print(temp)
temp2 = temp.replace('e', '3')
print(temp2)
print(temp)


One Two Three
On3 Two Thr33
One Two Three
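replace() also takes an optional third argument that limits how many replacements are made, counting from the left.

```python
temp = 'One Two Three'
print(temp.replace('e', '3', 1))  # prints: On3 Two Three
print(temp.replace('e', '3', 2))  # prints: On3 Two Thr3e
```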

String splitting is very useful for reading text files.

The split method splits a string into words. By default it splits on whitespace (spaces, tabs, newlines), but other separator characters can be specified too.


In [44]:
new_list = temp.split()
new_list


Out[44]:
['One', 'Two', 'Three']
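With no argument, split() treats any run of whitespace as a single separator. An explicit separator splits at every occurrence, so repeated separators produce empty strings. The optional maxsplit argument limits the number of splits.

```python
print('One  Two   Three'.split())      # prints: ['One', 'Two', 'Three']
print('One  Two'.split(' '))           # prints: ['One', '', 'Two']
print('13,15.4,17.5'.split(','))       # prints: ['13', '15.4', '17.5']
print('x=1=2'.split('=', maxsplit=1))  # prints: ['x', '1=2']
```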

For multiline text (such as the contents of a file), the splitlines method splits at the line breaks:


In [47]:
temp3 = 'One\r\nTwo\r\nThree'
temp4 = temp3.splitlines()
print(temp3)
print(temp4)


One
Two
Three
['One', 'Two', 'Three']
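splitlines() treats the Windows-style \r\n line ending as a single break. Splitting on '\n' alone would leave a stray \r at the end of every line except the last.

```python
temp3 = 'One\r\nTwo\r\nThree'
print(temp3.split('\n'))   # prints: ['One\r', 'Two\r', 'Three']
print(temp3.splitlines())  # prints: ['One', 'Two', 'Three']
```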
Joining strings together: call join() on the delimiter (the string placed between each piece of text) and pass it a list of the strings to join.

In [48]:
text1 = "May"
text2 = "the"
text3 = "force"
text4 = "be with you"

combined = " uh ".join([text1, text2, text3, text4])
combined


Out[48]:
'May uh the uh force uh be with you'
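join() only accepts strings, so numbers must be converted first, for example with a generator expression that calls str(). This is one way to build a line of values to write back out to a text file.

```python
temps = [13, 15.4, 17.5]

# ", ".join(temps) would raise TypeError because the items are not strings
row = ", ".join(str(t) for t in temps)
print(row)  # prints: 13, 15.4, 17.5
```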

Manipulating text's uppercase/lowercase:


In [49]:
marvelText = "it's CLOBBERIN' time!"

print("Upper case:  ", marvelText.upper())
print("Lower case:  ", marvelText.lower())
print("Reverse case:", marvelText.swapcase())
print("Title case:  ", marvelText.title())
print("Capital Case:", marvelText.capitalize())


Upper case:   IT'S CLOBBERIN' TIME!
Lower case:   it's clobberin' time!
Reverse case: IT'S clobberin' TIME!
Title case:   It'S Clobberin' Time!
Capital Case: It's clobberin' time!
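Note the "It'S" in the title-case line: title() starts a new word after any non-letter, including an apostrophe. The standard-library function string.capwords() instead splits on whitespace and capitalizes each piece, which avoids this.

```python
import string

marvelText = "it's CLOBBERIN' time!"
print(marvelText.title())           # prints: It'S Clobberin' Time!
print(string.capwords(marvelText))  # prints: It's Clobberin' Time!
```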

In [52]:
'Pl4nck'.upper()


Out[52]:
'PL4NCK'

Splitting strings up:


In [50]:
dcText = "Have you ever danced with the devil in the pale moonlight?"
print(dcText.split())
print(dcText.split("the"))


['Have', 'you', 'ever', 'danced', 'with', 'the', 'devil', 'in', 'the', 'pale', 'moonlight?']
['Have you ever danced with ', ' devil in ', ' pale moonlight?']

Let's do an example reading a data file string_example.dat and converting the data to numbers. First, let's see what's in the file:


In [58]:
cat string_example.dat


# x    y 
0.67614818  0.06612175
0.72452745  0.91697594
0.49721092  0.74321783
0.11647754  0.31299047
0.74159754  0.58894824
0.88066725  0.98678083
0.24524501  0.59950154
0.65850022  0.75670362
0.15512891  0.57762052
0.32754741  0.53567974

Q. Would this code successfully read in the data? Why or why not?


In [54]:
pairs = []

with open("string_example.dat", "r") as FILE:
    for line in FILE:
        words = line.split()
        
        number1 = float(words[0])
        number2 = float(words[1])
        pair = [number1, number2]
        pairs.append(pair)

pairs


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-0dc609cdcc10> in <module>()
      5         words = line.split()
      6 
----> 7         number1 = float(words[0])
      8         number2 = float(words[1])
      9         pair = [number1, number2]

ValueError: could not convert string to float: '#'

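The header line starts with '#', which cannot be converted to a float, so we need to skip it. This is also a nice place for a dictionary: we can store each column in a list under its name!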


In [57]:
data = {'x': [], 'y': []}

with open("string_example.dat", "r") as FILE:
    for line in FILE:
        print(line)
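        # Skip the header (comment) line and any blank lines;
        # line[0] would raise IndexError on an empty line
        if line.startswith('#') or not line.strip():
            continue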
        
        words = line.split()
        
        data['x'].append(float(words[0]))
        data['y'].append(float(words[1]))

print(data)


# x    y 

0.67614818  0.06612175

0.72452745  0.91697594

0.49721092  0.74321783

0.11647754  0.31299047

0.74159754  0.58894824

0.88066725  0.98678083

0.24524501  0.59950154

0.65850022  0.75670362

0.15512891  0.57762052

0.32754741  0.53567974
{'y': [0.06612175, 0.91697594, 0.74321783, 0.31299047, 0.58894824, 0.98678083, 0.59950154, 0.75670362, 0.57762052, 0.53567974], 'x': [0.67614818, 0.72452745, 0.49721092, 0.11647754, 0.74159754, 0.88066725, 0.24524501, 0.65850022, 0.15512891, 0.32754741]}
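NumPy can also do this parsing: np.loadtxt() skips lines starting with '#' by default, and unpack=True returns each column as its own array. To keep the sketch runnable without the data file, the first three rows are held in an in-memory io.StringIO object. With the real file you would pass "string_example.dat" instead.

```python
import io
import numpy as np

# First rows of string_example.dat, kept in memory so the example is self-contained
fileText = """# x    y
0.67614818  0.06612175
0.72452745  0.91697594
0.49721092  0.74321783
"""

# comments='#' is the default, so the header line is skipped;
# unpack=True transposes the result into one array per column
x, y = np.loadtxt(io.StringIO(fileText), unpack=True)
print(x)
print(y)
```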

In [59]:
pl.scatter(data['x'], data['y'])


Out[59]:
<matplotlib.collections.PathCollection at 0x118082be0>
