Week 02 - Basic Python data structures and Viz

Lists

Lists are ordered collections that can hold objects of different types. They can be appended to, iterated over, sorted, and so on, and we will use them for lots of fun things. They're especially useful when you don't know in advance how big something is going to be or what types of objects will be in it.

We'll set a simple one up that includes the numbers 1 through 9.



In [1]:

    
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

Now let's call dir on it to see what things we can do to it. Note that this will include lots of things starting with two underscores; for the most part these are "hidden" methods that we will use implicitly when we do things. The main methods you'll use directly are the ones that don't start with underscores.



In [2]:

    
dir(a)









    Out[2]:





['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Lists can be reversed in-place. This means the return value is empty (None) but that the list has been changed. An important thing that this means is that lists are mutable -- you can change them without copying them into a new thing.



In [3]:

    
a.reverse()



In [4]:

    
a









    Out[4]:





[9, 8, 7, 6, 5, 4, 3, 2, 1]
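To make "in-place" concrete, here is a minimal sketch. The names nums, alias and independent are made up for illustration and are separate from the list a used in this notebook. It shows that reverse() returns None, and that every name bound to the same list sees the change.

```python
# Hypothetical names, separate from the list `a` used in the notebook.
nums = [1, 2, 3]
alias = nums                # a second name for the same list object, not a copy

result = nums.reverse()     # reverses in place

print(result)               # None: the method changes the list instead of returning one
print(nums)                 # [3, 2, 1]
print(alias)                # [3, 2, 1], because alias and nums are the same object

independent = nums.copy()   # an actual copy
nums.reverse()              # reverse the original a second time
print(nums)                 # [1, 2, 3]
print(independent)          # [3, 2, 1], unaffected by the second reverse
```

If you need both the original and the changed version, copy first and then modify the copy.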

We can sort them, too. Here the sorting is trivial -- it'll end up just reversing it back to what it was. But, we can sort a more complex list as well.



In [5]:

    
a.sort()



In [6]:

    
a









    Out[6]:





[1, 2, 3, 4, 5, 6, 7, 8, 9]
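As a sketch of sorting something less trivial, the example below sorts a made-up list of strings (the name words is an assumption for illustration). It uses the key and reverse arguments of sort, and the built-in sorted, which returns a new list instead of changing the original.

```python
# Made-up data for illustration; not part of the notebook session.
words = ["pear", "fig", "banana", "kiwi"]

words.sort()                 # alphabetical, in place
print(words)                 # ['banana', 'fig', 'kiwi', 'pear']

words.sort(key=len)          # by length; ties keep their previous order
print(words)                 # ['fig', 'kiwi', 'pear', 'banana']

# sorted() builds a new list and leaves `words` untouched.
by_length_desc = sorted(words, key=len, reverse=True)
print(by_length_desc)        # ['banana', 'kiwi', 'pear', 'fig']
```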

Because lists are mutable, we can insert things into them. Lists are zero-indexed, which means that the very first place is 0, not 1. Insertion is easiest to reason about in terms of the position the new item will occupy: inserting at 0 puts the new item in front of the current first item, inserting at 1 puts it in front of the current second item, and so on. Here, we'll insert at position 3, which is between the numbers 3 and 4 in this list.



In [7]:

    
a.insert(3, 3.9)



In [8]:

    
a









    Out[8]:





[1, 2, 3, 3.9, 4, 5, 6, 7, 8, 9]

We can also append values.



In [9]:

    
a.append(10)



In [10]:

    
a









    Out[10]:





[1, 2, 3, 3.9, 4, 5, 6, 7, 8, 9, 10]

We can also remove an item; note that using pop here will not only remove the item, but return it as a return value. If we were to use del then it would not return it.



In [11]:

    
a.pop(3)









    Out[11]:





3.9



In [12]:

    
a









    Out[12]:





[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
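The difference between pop and del can be sketched side by side. The list items below is made up for illustration; remove is included for comparison because it deletes by value rather than by index.

```python
# Hypothetical list, separate from `a` in the notebook.
items = ['a', 'b', 'c', 'd']

removed = items.pop(1)   # removes index 1 and hands the value back
print(removed)           # b
print(items)             # ['a', 'c', 'd']

del items[0]             # removes index 0; del is a statement, so nothing is returned
print(items)             # ['c', 'd']

items.remove('d')        # removes the first item equal to 'd' (by value, not index)
print(items)             # ['c']
```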

We can also use negative indices, which count from the end of the list: -1 is the last item, -2 is the second to last, and so on.



In [13]:

    
a.pop(-1)









    Out[13]:





10

Slices

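We can also slice lists. This uses the square brackets [] to supply a start index, a stop index, and a step, separated by colons. This lets us choose subsets. The item at the start index is included, but the item at the stop index is not. If you leave one of the items out, it defaults to the maximum selection -- i.e., start at the first item, stop after the last, and use a step of 1.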



In [14]:

    
a[1:5:2]









    Out[14]:





[2, 4]

Here we just start at the beginning and take every other item.



In [15]:

    
a[::2]









    Out[15]:





[1, 3, 5, 7, 9]

Every other item, starting from the second:



In [16]:

    
a[1::2]









    Out[16]:





[2, 4, 6, 8]

A negative step walks through the list in reverse:



In [17]:

    
a[::-1]









    Out[17]:





[9, 8, 7, 6, 5, 4, 3, 2, 1]

In reverse, but every second.



In [18]:

    
a[::-2]









    Out[18]:





[9, 7, 5, 3, 1]
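A few more slice patterns, as a sketch on a made-up list (the name nums is an assumption and is separate from a). Note that the stop index is excluded, that negative indices work inside slices too, and that a slice always produces a new list.

```python
# Hypothetical list with the same values as the notebook's `a`.
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(nums[1:5])      # [2, 3, 4, 5]  (index 5 itself is excluded)
print(nums[:3])       # [1, 2, 3]     (first three items)
print(nums[-3:])      # [7, 8, 9]     (last three items)
print(nums[7:2:-2])   # [8, 6, 4]     (indices 7, 5, 3, walking backwards)

shallow_copy = nums[:]    # a full slice is a new list with the same items
shallow_copy.append(10)
print(nums)           # [1, 2, 3, 4, 5, 6, 7, 8, 9], the original is unchanged
```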

Lists can include objects of different types.



In [19]:

    
a.append("blast off")



In [20]:

    
a









    Out[20]:





[1, 2, 3, 4, 5, 6, 7, 8, 9, 'blast off']



In [21]:

    
a.pop(-1)









    Out[21]:





'blast off'

A common problem you may run into is that sometimes, numbers look like strings. This can cause problems, as we'll see:



In [22]:

    
a.append('10')



In [23]:

    
a









    Out[23]:





[1, 2, 3, 4, 5, 6, 7, 8, 9, '10']

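If it were the number 10, this would work. Unfortunately, strings and numbers can't be sorted together, so sort raises a TypeError. (The exact wording of the error message varies between Python versions.)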



In [24]:

    
a.sort()









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-e7eb8b51a6fa> in <module>()
----> 1 a.sort()

TypeError: unorderable types: str() < int()
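Two possible ways around this error are sketched below on a made-up list (the name mixed is an assumption). Both rely on every item being convertible with int, which is true here but would not hold for arbitrary strings.

```python
# Hypothetical list that mixes ints with a number stored as a string.
mixed = [3, 1, '10', 2]

# Option 1: convert everything to int first, then sort the converted list.
as_ints = [int(v) for v in mixed]
as_ints.sort()
print(as_ints)                  # [1, 2, 3, 10]

# Option 2: keep the original objects but compare them by their int value.
by_value = sorted(mixed, key=int)
print(by_value)                 # [1, 2, 3, '10']
```

The first option is usually what you want for numeric data, because later arithmetic on the values will then work as well.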

Dictionaries

Dictionaries (dict objects) are hash tables, where a key is looked up to find a value. Both keys and values can be of heterogeneous types within a given dict, but there are some restrictions on what can be used as a key. (The type must be "hashable," which among other things means that it can't be a list.)

We can initialize an empty dict with the curly brackets, {}, and then we can assign things to this dict.



In [25]:

    
b = {}

Here, we can just use an integer key that gets us to a string.



In [26]:

    
b[0] = 'a'

If we look at the dict, we can see what it includes.



In [27]:

    
b









    Out[27]:





{0: 'a'}

We can get a view of all the keys using .keys():



In [28]:

    
b.keys()









    Out[28]:





dict_keys([0])

If we just want to see what all the values are, we can use .values():



In [29]:

    
b.values()









    Out[29]:





dict_values(['a'])

If we ask for a key that doesn't exist, we get a KeyError:



In [30]:

    
b[1]









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-33e961e0e4ea> in <module>()
----> 1 b[1]

KeyError: 1
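If a missing key is an expected situation rather than a bug, the KeyError can be avoided. The sketch below uses a made-up dict named lookup and shows the in operator and the get method, which returns None (or a default you supply) instead of raising.

```python
# Hypothetical dict, separate from `b` in the notebook.
lookup = {0: 'a'}

has_key = 1 in lookup                  # membership test on the keys
print(has_key)                         # False

plain = lookup.get(1)                  # no error; returns None for a missing key
print(plain)                           # None

fallback = lookup.get(1, 'missing')    # second argument is the default to return
print(fallback)                        # missing
```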

Earlier, I noted that lists can't be used as keys in dicts, but they can be used as values. For example:



In [31]:

    
b = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}



In [32]:

    
b









    Out[32]:





{0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}
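The restriction on keys can be checked directly. In this sketch (the name grid is made up for illustration), a tuple works as a key because it is hashable, while using a list as a key raises a TypeError. The exact error text depends on the Python version, so it is printed rather than assumed.

```python
# Hypothetical dict keyed by (row, column) tuples.
grid = {}
grid[(0, 1)] = [1, 2, 3]       # tuple key, list value: fine

try:
    grid[[0, 1]] = [4, 5, 6]   # list key: not hashable
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

print(grid)                    # only the tuple key was stored
```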

We can also iterate over the keys in a dict, simply by iterating over the dict itself. The loop yields each of the keys in turn, and we can use each key to look up the value it is associated with.



In [33]:

    
for key in b:
    print(b[key])









    



[1, 2, 3]
[4, 5, 6]
[7, 8, 9]

Sets

Sometimes, we need to keep track of all unique values in something. This is where sets come in. These are unordered collections that contain at most one of every item. This means you can do neat set operations on them. Let's initialize two sets that overlap:



In [34]:

    
c = set([1,2,3,4,5])
d = set([4,5,6,7,8])

We can now subtract one from the other, to see all objects in one but not the other.



In [35]:

    
c - d









    Out[35]:





{1, 2, 3}

We can also union them:



In [36]:

    
e = c.union(d)



In [37]:

    
e









    Out[37]:





{1, 2, 3, 4, 5, 6, 7, 8}
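The other common set operations follow the same pattern. This sketch uses the names left and right (made up here, holding the same values as c and d above) and the operator forms &, |, ^ and -. Results are passed through sorted only so that the printed order is predictable.

```python
# Hypothetical names holding the same values as `c` and `d` above.
left = {1, 2, 3, 4, 5}
right = {4, 5, 6, 7, 8}

print(sorted(left & right))   # [4, 5]                     in both (intersection)
print(sorted(left | right))   # [1, 2, 3, 4, 5, 6, 7, 8]   in either (union)
print(sorted(left ^ right))   # [1, 2, 3, 6, 7, 8]         in exactly one (symmetric difference)
print(sorted(right - left))   # [6, 7, 8]                  in right but not left
```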

An interesting feature of sets is that they can be built from any iterable. This means that if you give set a string, it will treat each character of the string as an independent object. So we can create two sets from two strings, and see what they contain -- all the unique characters in each of the strings.



In [38]:

    
s1 = "Hello there, how are you?"
s2 = "I am fine, how are you doing today?"
v1 = set(s1)
v2 = set(s2)



In [39]:

    
v1









    Out[39]:





{' ', ',', '?', 'H', 'a', 'e', 'h', 'l', 'o', 'r', 't', 'u', 'w', 'y'}



In [40]:

    
v2









    Out[40]:





{' ',
 ',',
 '?',
 'I',
 'a',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'm',
 'n',
 'o',
 'r',
 't',
 'u',
 'w',
 'y'}

Let's see how many there are in each:



In [41]:

    
len(s1), len(v1)









    Out[41]:





(25, 14)



In [42]:

    
len(s2), len(v2)









    Out[42]:





(35, 19)

If we take the union, we can see how many unique characters there are in the two strings combined:



In [43]:

    
len(v1.union(v2))









    Out[43]:





21
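As a cross-check on that number, the size of a union equals the sizes of the two sets added together minus the size of their overlap. The sketch below rebuilds the two sets from the same strings; the name shared is introduced here for illustration.

```python
s1 = "Hello there, how are you?"
s2 = "I am fine, how are you doing today?"
v1 = set(s1)
v2 = set(s2)

shared = v1 & v2                            # characters that appear in both strings
print(len(shared))                          # 12

# |A union B| = |A| + |B| - |A intersect B|
print(len(v1) + len(v2) - len(shared))      # 21
print(sorted(v1 - v2))                      # ['H', 'l'], the characters only in s1
```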

Iteration

We can use the for construct to iterate over objects. What each step of the loop gives us depends on the type of the object. If we iterate over a list, we get each item in order.



In [44]:

    
for value in a:
    print(value)

If we iterate over a dictionary, we get the keys. We can also explicitly iterate over keys:



In [45]:

    
for name in b.keys():
    print(b[name])









    



[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
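When both the key and the value are needed, the items method yields them together as pairs, which avoids the extra lookup. This is a sketch on a made-up dict named table with the same contents as b; totals is also a name introduced here.

```python
# Hypothetical dict with the same contents as `b` above.
table = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}

totals = {}
for key, values in table.items():   # each step yields a (key, value) pair
    totals[key] = sum(values)
    print(key, values, totals[key])

print(totals)                       # {0: 6, 1: 15, 2: 24}
```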

If we iterate over a set, we get all the values in that set. Note, however, that this iteration order is not guaranteed to be consistent, and should not be relied upon.



In [46]:

    
for value in v1:
    print(value)









    



H
u
?
h
a
t
y
,
w
l
 
r
e
o

Dataset 1: Building Inventory in IL

We will start out using a dataset from the Illinois Open Data repository about the buildings under state ownership in Illinois. You can download it here: https://data.illinois.gov/Housing/Building-Inventory/utd5-tdr2 by clicking "Export" or by going to our class data repository.

At this point in the class, we will be utilizing very simple data reading and visualization techniques, so that we have the opportunity to see basic data structures, simple visualization, and so forth, before we start getting into pandas and other more advanced libraries.

We will use the built-in csv module to read in the data.



In [47]:

    
import csv

Here, we'll use next to get the first line, then we will proceed to read the rest. The first line in this file is the header.



In [48]:

    
f = open("Building_Inventory.csv")
csv_reader = csv.reader(f)
header = next(csv_reader)



In [49]:

    
header









    Out[49]:





['Agency Name',
 'Location Name',
 'Address',
 'City',
 'Zip code',
 'County',
 'Congress Dist',
 'Congressional Full Name',
 'Rep Dist',
 'Rep Full Name',
 'Senate Dist',
 'Senator Full Name',
 'Bldg Status',
 'Year Acquired',
 'Year Constructed',
 'Square Footage',
 'Total Floors',
 'Floors Above Grade',
 'Floors Below Grade',
 'Usage Description',
 'Usage Description 2',
 'Usage Description 3']

We will now pre-initialize a dict with the header values so that we can subsequently iterate and fill it. This will help us transform from a row-based store to a column-based store.



In [50]:

    
data = {}
for name in header:
    data[name] = []
data









    Out[50]:





{'Address': [],
 'Agency Name': [],
 'Bldg Status': [],
 'City': [],
 'Congress Dist': [],
 'Congressional Full Name': [],
 'County': [],
 'Floors Above Grade': [],
 'Floors Below Grade': [],
 'Location Name': [],
 'Rep Dist': [],
 'Rep Full Name': [],
 'Senate Dist': [],
 'Senator Full Name': [],
 'Square Footage': [],
 'Total Floors': [],
 'Usage Description': [],
 'Usage Description 2': [],
 'Usage Description 3': [],
 'Year Acquired': [],
 'Year Constructed': [],
 'Zip code': []}

We're going to use zip to simultaneously iterate over two iterables. It works as follows, where you can see it "zip" up the two lists and yield one pair of items at a time.



In [51]:

    
list1 = ['a', 'b', 'c', 'd']
list2 = [1, 2, 3, 4]
for v in zip(list1, list2):
    print(v)









    



('a', 1)
('b', 2)
('c', 3)
('d', 4)
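Two further properties of zip are worth knowing, sketched here with made-up lists named letters and numbers: it stops as soon as the shortest input runs out, and its pairs can be passed straight to dict to build a lookup table.

```python
# Hypothetical lists; `numbers` is one item shorter on purpose.
letters = ['a', 'b', 'c', 'd']
numbers = [1, 2, 3]

pairs = list(zip(letters, numbers))
print(pairs)      # [('a', 1), ('b', 2), ('c', 3)]  ('d' has no partner and is dropped)

mapping = dict(zip(letters, numbers))
print(mapping)    # {'a': 1, 'b': 2, 'c': 3}
```

The silent truncation matters for CSV data: a row with fewer fields than the header would leave some columns shorter than others.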

Now, for every row, we append to the appropriate list.



In [52]:

    
for row in csv_reader:
    for name, value in zip(header, row):
        data[name].append(value)
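The whole row-to-column transformation can be tried without the real file. This sketch uses io.StringIO as a stand-in for an open file; the two data rows are invented for illustration and do not come from the dataset, and the names fake_file, reader and columns are assumptions.

```python
import csv
import io

# Invented stand-in for the real CSV file; these rows are not real data.
fake_file = io.StringIO("City,Zip code\nTown A,00001\nTown B,00002\n")

reader = csv.reader(fake_file)
header = next(reader)                      # first line: the column names

columns = {name: [] for name in header}    # one empty list per column
for row in reader:                         # each row is a list of strings
    for name, value in zip(header, row):
        columns[name].append(value)

print(columns)
# {'City': ['Town A', 'Town B'], 'Zip code': ['00001', '00002']}
```

Note that every value comes back as a string, including the ones that look like numbers.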

This gives us results like this:



In [53]:

    
data['Zip code']









    Out[53]:





['61501',
 '61501',
 '61501',
 '61501',
 '61501',
 '61501',
 '62938',
 '62938',
 '61373',
 '61373',
 '61373',
 '61373',
 '62231',
 '62311',
 '62311',
 '62311',
 '62311',
 '62311',
 '62311',
 '62311',
 '62311',
 '62534',
 '62534',
 '60420',
 '60432',
 '62203',
 '61844',
 '62454',
 '61341',
 '62832',
 '62832',
 '62832',
 '60466',
 '60466',
 '60466',
 '62685',
 '62707',
 '61943',
 '61943',
 '61943',
 '61943',
 '61943',
 '61943',
 '61943',
 '61943',
 '61858',
 '61727',
 '61727',
 '61727',
 '61727',
 '61727',
 '61727',
 '61727',
 '61727',
 '61727',
 '61727',
 '60434',
 '60048',
 '62056',
 '62702',
 '62254',
 '62401',
 '61233',
 '61063',
 '62706',
 '61761',
 '61761',
 '61761',
 '61761',
 '61761',
 '61761',
 '62324',
 '61491',
 '61491',
 '61491',
 '61491',
 '61491',
 '61491',
 '61491',
 '61491',
 '62701',
 '61846',
 '62627',
 '62203',
 '61943',
 '61943',
 '61943',
 '61061',
 '61341',
 '62231',
 '62701',
 '62701',
 '60434',
 '60434',
 '61443',
 '61858',
 '60434',
 '61951',
 '61951',
 '62897',
 '62448',
 '62448',
 '61735',
 '61735',
 '61735',
 '61735',
 '61735',
 '61858',
 '60174',
 '60174',
 '62656',
 '62288',
 '62201',
 '62201',
 '60612',
 '62354',
 '60450',
 '62241',
 '62241',
 '62241',
 '62241',
 '62241',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '62354',
 '61350',
 '60481',
 '60481',
 '61443',
 '62995',
 '60196',
 '60196',
 '62901',
 '62864',
 '62901',
 '62958',
 '60612',
 '61542',
 '61542',
 '62644',
 '62958',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '62960',
 '62960',
 '62960',
 '62960',
 '62960',
 '62960',
 '62960',
 '62960',
 '62960',
 '61520',
 '61520',
 '62584',
 '61834',
 '61920',
 '62938',
 '62938',
 '62966',
 '62471',
 '62471',
 '62471',
 '62471',
 '62037',
 '62906',
 '62906',
 '60441',
 '62454',
 '62231',
 '62231',
 '62217',
 '62217',
 '62401',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62274',
 '62231',
 '62908',
 '62217',
 '62217',
 '62217',
 '62217',
 '62217',
 '62217',
 '62217',
 '62217',
 '61567',
 '61567',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '61615',
 '62037',
 '62233',
 '62233',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '62534',
 '61443',
 '62259',
 '62966',
 '60062',
 '60031',
 '62702',
 '62702',
 '62702',
 '61764',
 '61764',
 '62832',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '61820',
 '60607',
 '60612',
 '61820',
 '62901',
 '62901',
 '62901',
 '62901',
 '61761',
 '61301',
 '60612',
 '62448',
 '62448',
 '62919',
 '62919',
 '62919',
 '62919',
 '62919',
 '62919',
 '62656',
 '62656',
 '62656',
 '62656',
 '62821',
 '62837',
 '62821',
 '62821',
 '62832',
 '62832',
 '62832',
 '62832',
 '61350',
 '61858',
 '61858',
 '61858',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '61764',
 '62801',
 '62801',
 '60426',
 '61401',
 '61401',
 '62914',
 '62988',
 '62901',
 '60628',
 '60115',
 '60115',
 '60115',
 '60115',
 '60115',
 '60115',
 '60115',
 '60115',
 '60115',
 '62301',
 '62259',
 '62056',
 '61341',
 '62801',
 '61341',
 '61341',
 '62940',
 '62940',
 '62940',
 '61820',
 '61820',
 '60085',
 '60085',
 '60085',
 '60085',
 '60085',
 '60085',
 '60085',
 '60085',
 '60085',
 '61735',
 '61735',
 '61735',
 '61735',
 '61735',
 '62274',
 '62274',
 '62274',
 '62864',
 '62864',
 '62864',
 '62864',
 '62864',
 '62864',
 '62859',
 '62901',
 '61341',
 '60099',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '62046',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '60081',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62233',
 '61201',
 '62259',
 '62995',
 '62259',
 '61418',
 '62821',
 '62821',
 '62938',
 '62702',
 '62832',
 '62832',
 '61201',
 '62534',
 '61846',
 '61846',
 '61846',
 '61846',
 '61341',
 '61341',
 '61341',
 '62354',
 '62354',
 '61801',
 '60950',
 '61550',
 '62049',
 '61048',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62046',
 '62939',
 '60099',
 '62201',
 '61738',
 '62702',
 '61801',
 '62448',
 '62448',
 '62448',
 '62448',
 '62448',
 '62448',
 '60174',
 '60628',
 '60103',
 '62454',
 '62259',
 '62259',
 '61550',
 '61341',
 '61341',
 '61341',
 '61341',
 '61341',
 '61341',
 '62908',
 '60481',
 '60481',
 '60481',
 '60481',
 '60481',
 '60481',
 '60481',
 '61801',
 '62627',
 '62627',
 '62627',
 '62627',
 '62627',
 '62627',
 '62627',
 '62627',
 '62627',
 '61517',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '61235',
 '1235',
 '61235',
 '60432',
 '60432',
 '61310',
 '61329',
 '60556',
 '62363',
 '61341',
 '62026',
 '62471',
 '60644',
 '62901',
 '62954',
 '62954',
 '62954',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62704',
 '62656',
 '62301',
 '62301',
 '61701',
 '61455',
 '61063',
 '60607',
 '62919',
 '62919',
 '62919',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62441',
 '62440',
 '60950',
 '62465',
 '62048',
 '61752',
 '61752',
 '62448',
 '62448',
 '60083',
 '60083',
 '60181',
 '62205',
 '62685',
 '62685',
 '62685',
 '62685',
 '62685',
 '62685',
 '62685',
 '62685',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62231',
 '62988',
 '62702',
 '62702',
 '62702',
 '62702',
 '62702',
 '62702',
 '62702',
 '62901',
 '62703',
 '62859',
 '62859',
 '62859',
 '62859',
 '62859',
 '62859',
 '62859',
 '61727',
 '61727',
 '61727',
 '61727',
 '61727',
 '62946',
 '62812',
 '62812',
 '62812',
 '61021',
 '61021',
 '62702',
 '62901',
 '62812',
 '62812',
 '62812',
 '62656',
 '62656',
 '62656',
 '62656',
 '61832',
 '62530',
 '62901',
 '62301',
 '62850',
 '62812',
 '62812',
 '62812',
 '62812',
 '62812',
 '62812',
 '62812',
 '62812',
 '61501',
 '62534',
 '62534',
 '62433',
 '62433',
 '62433',
 '62433',
 '62433',
 '62433',
 '62433',
 '62433',
 '60628',
 '62702',
 '62704',
 '61401',
 '62354',
 '62354',
 '62354',
 '62812',
 '62910',
 '62294',
 '60634',
 '62201',
 '60477',
 '61520',
 '61341',
 '61341',
 '61341',
 '62703',
 '62274',
 '62274',
 '62274',
 '61485',
 '62897',
 '62897',
 '62897',
 '62897',
 '62850',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62897',
 '62701',
 '62465',
 '62850',
 '62850',
 '62850',
 '62850',
 '62850',
 '61356',
 '61265',
 '61858',
 '61858',
 '61858',
 '61360',
 '61360',
 '62627',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62286',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 '62260',
 ...]

We have one name/list pair for every header entry:



In [54]:

    
data.keys()









    Out[54]:





dict_keys(['Rep Dist', 'County', 'Rep Full Name', 'Year Acquired', 'Total Floors', 'Floors Below Grade', 'Bldg Status', 'Usage Description 3', 'Usage Description 2', 'Year Constructed', 'Senator Full Name', 'Floors Above Grade', 'Congressional Full Name', 'Usage Description', 'Location Name', 'Address', 'Zip code', 'Congress Dist', 'Senate Dist', 'Square Footage', 'Agency Name', 'City'])

We can see how many zip code entries (rows) there are:



In [55]:

    
len(data['Zip code'])









    Out[55]:





8849

As well as how many unique zip codes there are.



In [56]:

    
len(set(data['Zip code']))









    Out[56]:





457

Now, the same thing with congressional districts:



In [57]:

    
len(set(data['Congress Dist']))









    Out[57]:





19



In [58]:

    
len(set(data['Congressional Full Name']))









    Out[58]:





19

There's a special data structure called a Counter that we can use to figure out how many of each unique item there are.



In [59]:

    
from collections import Counter



In [60]:

    
c = Counter(data['Zip code'])

It associates each item with a count, as it iterates, and then we can get that information back.



In [61]:

    
max(c.values())









    Out[61]:





265

We can sort by the most common:



In [62]:

    
c.most_common()









    Out[62]:





[('62702', 265),
 ('62901', 258),
 ('62037', 235),
 ('61801', 220),
 ('62958', 199),
 ('61820', 146),
 ('62471', 130),
 ('62656', 126),
 ('62259', 126),
 ('62260', 123),
 ('60434', 106),
 ('61764', 104),
 ('61761', 102),
 ('60477', 100),
 ('61373', 99),
 ('62627', 96),
 ('62995', 92),
 ('62650', 89),
 ('62675', 88),
 ('62703', 84),
 ('61341', 83),
 ('61615', 83),
 ('61858', 83),
 ('62897', 81),
 ('60466', 79),
 ('62231', 79),
 ('61021', 76),
 ('62233', 76),
 ('62832', 75),
 ('61074', 73),
 ('60551', 71),
 ('61920', 70),
 ('60081', 68),
 ('62801', 67),
 ('60085', 67),
 ('60901', 67),
 ('60115', 64),
 ('62301', 63),
 ('62448', 60),
 ('60099', 60),
 ('61201', 59),
 ('62906', 58),
 ('60174', 57),
 ('61054', 57),
 ('60432', 56),
 ('61752', 55),
 ('60420', 52),
 ('61061', 52),
 ('62466', 51),
 ('62274', 50),
 ('62049', 50),
 ('62534', 50),
 ('60612', 49),
 ('60950', 49),
 ('62563', 49),
 ('60481', 48),
 ('60914', 48),
 ('62952', 47),
 ('62938', 44),
 ('62812', 44),
 ('61455', 44),
 ('61520', 44),
 ('61401', 43),
 ('62026', 43),
 ('62286', 43),
 ('61443', 42),
 ('62263', 41),
 ('62326', 41),
 ('61244', 40),
 ('61001', 40),
 ('61270', 40),
 ('61517', 40),
 ('60607', 40),
 ('61727', 39),
 ('62939', 36),
 ('60050', 36),
 ('62859', 36),
 ('62441', 36),
 ('61536', 36),
 ('60634', 35),
 ('60123', 34),
 ('62401', 34),
 ('61874', 34),
 ('62454', 33),
 ('62217', 33),
 ('62205', 32),
 ('62568', 31),
 ('62526', 30),
 ('60141', 30),
 ('62241', 30),
 ('60450', 30),
 ('61834', 30),
 ('62353', 29),
 ('62704', 29),
 ('62440', 29),
 ('60560', 28),
 ('60550', 28),
 ('62960', 28),
 ('61048', 27),
 ('62934', 26),
 ('62864', 26),
 ('62854', 26),
 ('62354', 25),
 ('62002', 25),
 ('62584', 24),
 ('60051', 24),
 ('62846', 24),
 ('62919', 24),
 ('62988', 23),
 ('62046', 23),
 ('61350', 23),
 ('62202', 22),
 ('61943', 22),
 ('62685', 22),
 ('60633', 22),
 ('62966', 22),
 ('61361', 21),
 ('60436', 21),
 ('62946', 21),
 ('61701', 21),
 ('62203', 20),
 ('61036', 20),
 ('61277', 20),
 ('62040', 19),
 ('60628', 19),
 ('62558', 19),
 ('60555', 19),
 ('62201', 18),
 ('60625', 17),
 ('61567', 17),
 ('61542', 17),
 ('62908', 17),
 ('61235', 16),
 ('61951', 16),
 ('62056', 16),
 ('62234', 15),
 ('62959', 15),
 ('62850', 15),
 ('60449', 15),
 ('62644', 15),
 ('61071', 15),
 ('62221', 15),
 ('61491', 15),
 ('61490', 14),
 ('61735', 14),
 ('62681', 14),
 ('61031', 14),
 ('62324', 13),
 ('62701', 13),
 ('60540', 13),
 ('62363', 13),
 ('62962', 13),
 ('61080', 13),
 ('61379', 13),
 ('62706', 13),
 ('62311', 13),
 ('61540', 13),
 ('62465', 12),
 ('61264', 12),
 ('61614', 12),
 ('61105', 12),
 ('61419', 12),
 ('61545', 12),
 ('62943', 12),
 ('62707', 11),
 ('61301', 11),
 ('60912', 11),
 ('61546', 11),
 ('61528', 11),
 ('62549', 11),
 ('60435', 11),
 ('61944', 10),
 ('62984', 10),
 ('62931', 10),
 ('61501', 10),
 ('61550', 10),
 ('61042', 10),
 ('60506', 10),
 ('62433', 10),
 ('62914', 10),
 ('62048', 9),
 ('60175', 9),
 ('61938', 9),
 ('61866', 9),
 ('61265', 9),
 ('61462', 9),
 ('61914', 9),
 ('61856', 9),
 ('61356', 9),
 ('62439', 8),
 ('61322', 8),
 ('62017', 8),
 ('60103', 8),
 ('61533', 8),
 ('61433', 8),
 ('61111', 8),
 ('62080', 8),
 ('62881', 8),
 ('61032', 7),
 ('61548', 7),
 ('60441', 7),
 ('60482', 7),
 ('62863', 7),
 ('62917', 7),
 ('61254', 7),
 ('60426', 7),
 ('60911', 7),
 ('60447', 7),
 ('60098', 7),
 ('61532', 7),
 ('60616', 7),
 ('60120', 6),
 ('60918', 6),
 ('62206', 6),
 ('62052', 6),
 ('60532', 6),
 ('62910', 6),
 ('60007', 6),
 ('62821', 6),
 ('61832', 6),
 ('62450', 6),
 ('62236', 6),
 ('62249', 6),
 ('62521', 6),
 ('61107', 6),
 ('62837', 6),
 ('62321', 6),
 ('60630', 6),
 ('61953', 6),
 ('62288', 5),
 ('61611', 5),
 ('62835', 5),
 ('62565', 5),
 ('61231', 5),
 ('62691', 5),
 ('62294', 5),
 ('60644', 5),
 ('60041', 5),
 ('62684', 5),
 ('60062', 5),
 ('61822', 5),
 ('62428', 5),
 ('61318', 5),
 ('60608', 5),
 ('62694', 5),
 ('62024', 5),
 ('62954', 5),
 ('61607', 5),
 ('60439', 5),
 ('60525', 5),
 ('61846', 5),
 ('62929', 5),
 ('62278', 5),
 ('60473', 5),
 ('60178', 5),
 ('62839', 5),
 ('60162', 4),
 ('62246', 4),
 ('61310', 4),
 ('61842', 4),
 ('60410', 4),
 ('62676', 4),
 ('61852', 4),
 ('61063', 4),
 ('60016', 4),
 ('60970', 4),
 ('62940', 4),
 ('61085', 4),
 ('61418', 4),
 ('61081', 4),
 ('62082', 4),
 ('60451', 4),
 ('61844', 4),
 ('60936', 4),
 ('61362', 4),
 ('60093', 4),
 ('61434', 4),
 ('62816', 4),
 ('60113', 4),
 ('62501', 4),
 ('61115', 4),
 ('62016', 4),
 ('62208', 4),
 ('61863', 4),
 ('61030', 4),
 ('61442', 4),
 ('61353', 4),
 ('61836', 4),
 ('62626', 4),
 ('61028', 4),
 ('61377', 4),
 ('62926', 4),
 ('60048', 4),
 ('61053', 4),
 ('60005', 4),
 ('60638', 3),
 ('62269', 3),
 ('60181', 3),
 ('63312', 3),
 ('61282', 3),
 ('62664', 3),
 ('60033', 3),
 ('61725', 3),
 ('61020', 3),
 ('60424', 3),
 ('61445', 3),
 ('61262', 3),
 ('61741', 3),
 ('60030', 3),
 ('61754', 3),
 ('60656', 3),
 ('61776', 3),
 ('62025', 3),
 ('61738', 3),
 ('62053', 3),
 ('61109', 3),
 ('60621', 3),
 ('61008', 3),
 ('62794', 3),
 ('62999', 3),
 ('60445', 3),
 ('61703', 2),
 ('62340', 2),
 ('61240', 2),
 ('61747', 2),
 ('61413', 2),
 ('60126', 2),
 ('62254', 2),
 ('61278', 2),
 ('60423', 2),
 ('60452', 2),
 ('61729', 2),
 ('60545', 2),
 ('60035', 2),
 ('60196', 2),
 ('60468', 2),
 ('60083', 2),
 ('61485', 2),
 ('61516', 2),
 ('62244', 2),
 ('61854', 2),
 ('61813', 2),
 ('60160', 2),
 ('60406', 2),
 ('62258', 2),
 ('62035', 2),
 ('60427', 2),
 ('60546', 2),
 ('61348', 2),
 ('61360', 2),
 ('61469', 2),
 ('62992', 2),
 ('60622', 2),
 ('62618', 2),
 ('61329', 2),
 ('61230', 2),
 ('60130', 2),
 ('60044', 2),
 ('61364', 2),
 ('61790', 2),
 ('61256', 2),
 ('60031', 2),
 ('61428', 2),
 ('60618', 2),
 ('60601', 2),
 ('60045', 2),
 ('60556', 2),
 ('60453', 1),
 ('61245', 1),
 ('61420', 1),
 ('61011', 1),
 ('62047', 1),
 ('62418', 1),
 ('60647', 1),
 ('62280', 1),
 ('62888', 1),
 ('60140', 1),
 ('61233', 1),
 ('60137', 1),
 ('61101', 1),
 ('60803', 1),
 ('60966', 1),
 ('61108', 1),
 ('61930', 1),
 ('60165', 1),
 ('60071', 1),
 ('62654', 1),
 ('61103', 1),
 ('62209', 1),
 ('61314', 1),
 ('61376', 1),
 ('62161', 1),
 ('62442', 1),
 ('60077', 1),
 ('62548', 1),
 ('62612', 1),
 ('60646', 1),
 ('60619', 1),
 ('60714', 1),
 ('60164', 1),
 ('60632', 1),
 ('62530', 1),
 ('60193', 1),
 ('61072', 1),
 ('60060', 1),
 ('60651', 1),
 ('62273', 1),
 ('60407', 1),
 ('60018', 1),
 ('61605', 1),
 ('60652', 1),
 ('62849', 1),
 ('62592', 1),
 ('62060', 1),
 ('61368', 1),
 ('60615', 1),
 ('60804', 1),
 ('62012', 1),
 ('60025', 1),
 ('60623', 1),
 ('62882', 1),
 ('60643', 1),
 ('61734', 1),
 ('60070', 1),
 ('62896', 1),
 ('62425', 1),
 ('62479', 1),
 ('61247', 1),
 ('62923', 1),
 ('62844', 1),
 ('68297', 1),
 ('61810', 1),
 ('60544', 1),
 ('61267', 1),
 ('60096', 1),
 ('60421', 1),
 ('60653', 1),
 ('61755', 1),
 ('62090', 1),
 ('60455', 1),
 ('61720', 1),
 ('60661', 1),
 ('62226', 1),
 ('60680', 1),
 ('60504', 1),
 ('60617', 1),
 ('60606', 1),
 ('1235', 1),
 ('60411', 1),
 ('60534', 1),
 ('62808', 1),
 ('60827', 1),
 ('62347', 1),
 ('62022', 1),
 ('62029', 1),
 ('60304', 1),
 ('60541', 1)]
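The full listing is long, so here is a smaller sketch of the same Counter operations on a made-up list (the names codes and counts are assumptions). Passing a number to most_common limits the result to that many entries, and asking for a key that never appeared gives 0 rather than a KeyError.

```python
from collections import Counter

# Made-up list for illustration; the values are just strings to be counted.
codes = ['62702', '62901', '62702', '61801', '62702', '62901']
counts = Counter(codes)

print(counts['62702'])          # 3
print(counts['99999'])          # 0, missing items count as zero
print(counts.most_common(2))    # [('62702', 3), ('62901', 2)]
print(sum(counts.values()))     # 6, the number of items that were counted
```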

We can try to compute the square footage of each building in multiples of 100, but we'll see that...



In [63]:

    
data['Square Footage'][0] / 100









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-63-8797ae715c03> in <module>()
----> 1 data['Square Footage'][0] / 100

TypeError: unsupported operand type(s) for /: 'str' and 'int'

...it's currently all strings. What this means is that we need to convert it. We could do this by making another list, but we'll try instead using numpy arrays. Numpy arrays can be thought of as fixed-size "lists" whose elements are all the same type and size (except for "object" arrays, which we won't cover), and that provide a number of operations that take advantage of these assumptions.

First we import numpy as np, as per convention.



In [64]:

    
import numpy as np

Now, we can convert our list to integers:



In [65]:

    
square_footage = np.array(data['Square Footage'], dtype='int')



In [66]:

    
square_footage









    Out[66]:





array([  144,   144,   144, ..., 13387,  3793,  3793])
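For comparison, the "make another list" route mentioned earlier could look like the sketch below. The list raw is made up for illustration, and the approach assumes every entry is a clean integer string; int raises a ValueError otherwise.

```python
# Made-up strings standing in for a few 'Square Footage' entries.
raw = ['144', '3360', '0']

as_ints = [int(value) for value in raw]   # new list; the original strings are untouched
print(as_ints)                            # [144, 3360, 0]
print(as_ints[1] / 100)                   # 33.6, arithmetic now works

try:
    int('12 sq ft')                       # not a clean integer string
except ValueError as err:
    conversion_failed = True
    print("ValueError:", err)
```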

There are a few operations we can call on numpy arrays, such as min/max, which we'll look at now:



In [67]:

    
square_footage.max()









    Out[67]:





1200000



In [68]:

    
square_footage.min()









    Out[68]:





0
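With an integer array, the division that failed on the string works, and it applies to every element at once. This sketch uses a small made-up array named sample rather than the real column, and assumes numpy is installed.

```python
import numpy as np

# Made-up strings standing in for a few 'Square Footage' entries.
sample = np.array(['144', '3360', '0', '1200000'], dtype='int')

print(sample.min(), sample.max())   # 0 1200000

hundreds = sample / 100             # element-wise division; the result is a float array
print(hundreds)

print(sample.mean())                # 300876.0
```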

Numpy arrays also allow for slicing. We'll look at every 10th value.



In [69]:

    
square_footage[::10]









    Out[69]:





array([   144,    144,    144,   3360,    220,    144,    900,    153,
        15000, 100000,   3520,    120,   2875,     24,    560,   2000,
          710,   2447,    665,    144,    616,   1400,   2159,    360,
         2418,    514,    702,    307,    308,    360,   1312,    144,
          144,    240,   3200,    720,   5655, 154000,    105,    950,
          144,    468,      0,    500,    120,   1958,   1050,    336,
          320,    150,    150,   1444,   1200,   2450, 118200,      0,
          400,    144,     18,    512,    144,    144,    144,    144,
         4956,    200,   4500,  12200,    104,    104,     64,    144,
          630,  45000,    144,    106,     80,    104,    452,  30000,
          420,    104,    144,    144,    104,    240,     75,     75,
           75,     75,     75,     75,     75,    290,    700,    700,
           75,     75,     75,     75,     75,     72,    512,    512,
         2520,    260,      0,     64,    144,   1680,   4335,    560,
          144,   4608,   3300,     20,      0,   2260,  11690,    126,
         8495,    520,    560,  11897,     27,   2040,    160,     30,
         3600,    254,   1360,    192,   1500,   2340,    800,    480,
          100,   1800,    120,     20,    300,    320,    560,   1264,
           19,    430,     18,     18,    154,    152,   2581,    504,
          320,   1012,   2700,   4420,   1920,     48,    943,    144,
         1950,     96,     20,     20,     78,    576,     80,   2656,
          615,    500,     80,    100,    850,    600,    228,    228,
          228,   2333,    640,    480,    144,    315,    228,   2196,
          900,    325,   1120,    360,    900,    560,     50,     48,
         3806,    576,    162,     12,    540,   1620,     16,    120,
         4400,     60,     40,     60,    110,   1152,    357,     20,
           48,     20,     20,    528,   4188,    408,    408,    672,
         9600,   1920,     36,   2700,    228,    228,    384,    800,
         2700,     60,    200,    600,  11700,      0,   3360,  45000,
         2710,    160,  16698,    993,  12336,   3540,  50885,    720,
         1089,      0,    207,    100,   8000,   1250,    480,     60,
         7980,   4000,    144,   1120,    108,    360,   8320,     40,
         1200,    560,   1980,     96,     35,    100,   1200,     20,
          230,   4712,     27,     36,   2300,    144,    864,    160,
         4200,    154,   3500,    100,   4900,    800,   4200,    200,
         3200,    392,      9,     42,   1922,     78,     24,   1006,
           42,    144,     77,    104,    144,    600,    630,     80,
          420,    144,   1680,  20221,    625,    680,    177,    200,
          120,   1000,   2322,     56,    150,    120,     28,   2268,
         7500,    750,     50,    750,    750,    384,     25,   3750,
          120,     16,    775,   1540,   1410,     99,   3520,     42,
          108,     84,    312,    128,    600,    984,    832,   2400,
           84,   1120,   1500,     80,    600,    144,     40,   3471,
         3950,   3000,     36,    196,     20,     20,     24,   1222,
           24,    140,     25,     25,   2400,    576,     20,    900,
          540,    100,    468,     20,     20,    660,     25,    300,
         1950,   1260,   3120,   4000,    200,     30,   4141,   2430,
         6000,    800,    768,    126,  34198,   2961,   1624,   2970,
         2920,   2970,   2970,   2970,   2920,   8350,    112,   2800,
         2800,   2260,   2800,   2800,   2800,   7300,   6860,    100,
        24121,  48525,  22689,  11112,   2268,  21888,    200,    422,
          144,     98,   6939,     80,  17831,  24431,   3683,   5476,
         8955,  28333,  25825,  27687,   2525,   4256,  24765,  23800,
        15000,   7390,   2909,   2909,   2909,   2909,   2909,  12089,
       128500,  36813,  13326, 104451,   6578,  49729,   4042,   7543,
       103740,  55527,   1500,   2600,  29375,  57569,  28993,   2510,
        13100,  11200,  44333,   7900,  17826,   1633,  44000,  19500,
        26775,  12625,    390,     52,    105,   8000,  12000,    362,
        23700,  69046,    117,    483,   3600,    204,   8604,   1518,
        10306,   1943,    158,   1575,    288,    144,    720,    144,
          432,    900,  50640,  21234,   3083,    432,   7940,    200,
         1466,    480,  11362,    900,   4338,    162,   7529,   9250,
          201,   6868,   4592,   2900,    446,   6261,    704,    100,
        31500,  16176,   4725,   2200,   7000,    225,   1600,    130,
         2460,   1440,  14400,    955,   1690,   3325,    346,   9915,
         4295,    782,  26230,   8135,   7770,    169,    174,    900,
         5192,    630,   7277,  10333,  49275,    560,   1800,  17549,
          330,    432,  17000,    576,  15335,   8895,    432,   1536,
        46000,    714,    100,   3450,   8895,  20250,    432,  44000,
        13266,    164,   1000,    288,    353,  43300,    144,   2800,
          120,  55500,    144,    240,  13000,    432,    280,  13266,
          164,  49275,    144,      0,  13000,   9000,    400,  49275,
          169,    144,  19894,    100,   1455,    560,   2400,  10526,
        30142,  29836,   3200,   4500,   1275,    450,     20,   4500,
         7776,   3600,  12000,   2000,   5280,    288,  16160,  10500,
         5930,   5936,    420,   1575,   4600,   4800,   1680,  58300,
         5184,   2400,     48,   1440,   5000,   2921,   2400,   6432,
         4069,   5760,   3116,   5408,   2921,   9462,   2400,   2160,
         8320,   7850,   8320,   1983,  58305,   4500,   4320,   6048,
         3000,   2922,   5281,   2880,  16000,  12500,   5760,   4032,
         4032,   2888,   2600,   4438,   2921,    532,   6720,  42715,
         5880,   2374,   7200,   1920,   5320,   4608,   4032,     64,
         5408,    720,   3000,  11132,   4069,   4100,   2921,    660,
         1920,  14000,   5976,   7376,   6840,  69600,   7500,   1216,
          294,    503,  36474,  24955,     64,  21860,   6008,  85312,
          956,    720,   1640,   5449,  18400,    168, 224959,    192,
          192,    192,  12272,  10944,  29828,   4600,    538,  57262,
         1092,  40852,    109,     80,   2940,  28962,   1250, 226062,
          150, 179118,  10500, 443865, 131400,  27672,  19303,    740,
         2114,    943,   2400,    851,   6480,  14000,   4716,   3600,
        14900, 107660,  56060, 166742,  96362,  80434,  98512,  45772,
       260237,   6314,   5925,   9688,  10478, 190977,  37761,   2419,
         1338,  21845,   4992, 171231,  11650,   4800,   6480,   9120,
          150,    912,   2072,   1552,  10265,   8627,  14484,    132,
          150,    220,  12894, 264105, 456722,  68343,   9104, 188839,
        50200,  21734,  19040,   8056,  54973, 331541,  43193,  98347,
        13863,  13488,   3307,   8217,   2265,   2076,   4551,  10112,
         9956,   1867,   4120,    132,    915,   5175, 166528,   1344,
        14400,   9344,  19318,    233,   1274,    280,    780,    780,
         1080,   1469,    500,  70995,  70995, 104520, 116334,   5970,
        36954,   7200,   3000, 183682,  63440,  16850,  20434,    495,
         3500,  12096,   2170, 132907,  49908, 109300,   2786, 110506,
           27,  27430,  18670,   4380,    224,   4904,   9704,  27465,
        16341,  36900,  17115,  17004,   2656,  34126,    156,   9200,
          218,   1136,  10000,    700,    300,    242,    720,    108,
          400,    341,    282,    639,   1500,    880,    240,   1152,
          300, 230665,     50,   3000,    144,    300,    144,    750,
         2100,    100,    108,    120,    500,    651,    400,    400,
          400,    600,    478,    665, 105000,   9420,   1315,      0,
            0,   4895,   1290,     10,  27500,    512,    800,    400,
        23040,  43000,    185,   2464,    180])
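The slice syntax is `start:stop:step`, and any of the three can be left out. As a rough sketch on a small made-up array:

```python
import numpy as np

values = np.arange(20)          # the integers 0 through 19

every_fifth = values[::5]       # whole array, step of 5
middle = values[5:15:5]         # start at index 5, stop before index 15
backwards = values[::-1]        # a negative step walks the array in reverse

print(every_fifth)
print(middle)
print(backwards[0])

# Unlike list slices, basic NumPy slices are views onto the same memory.
print(np.shares_memory(values, every_fifth))
```

Because a slice is a view, changing an element of `every_fifth` would also change `values`. Call `.copy()` on the slice if you want an independent array.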

Now let's find out the 10 most common square footages.



In [70]:

    
Counter(data['Square Footage']).most_common(10)









    Out[70]:





[('144', 360),
 ('20', 143),
 ('75', 123),
 ('100', 86),
 ('2400', 77),
 ('228', 76),
 ('80', 71),
 ('560', 65),
 ('400', 65),
 ('2800', 65)]
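Notice that the keys above are strings, since we counted the original list rather than our integer array. Here's a sketch, on a small made-up array, of two ways we might count the integers instead: `Counter` on the array converted to a list, and `np.unique` with `return_counts=True`.

```python
from collections import Counter

import numpy as np

# Made-up square footages, already converted to integers.
sizes = np.array([144, 20, 144, 75, 20, 144])

# Counter works on any iterable; tolist() gives plain Python ints as keys.
counts = Counter(sizes.tolist())
print(counts.most_common(2))

# np.unique returns the sorted unique values and how often each appears.
unique_values, n_times = np.unique(sizes, return_counts=True)
print(unique_values, n_times)
```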

Huh, that's odd! There are a lot of buildings that are all about 12'x12' (144 square feet). Let's see more about them, and find out which agencies they are in.



In [71]:

    
agencies = Counter()
for agency, sqfoot in zip(data['Agency Name'], data['Square Footage']):
    if int(sqfoot) == 144:
        agencies[agency] += 1



In [72]:

    
agencies









    Out[72]:





Counter({'Department of Corrections': 33,
         'Department of Human Services': 1,
         'Department of Military Affairs': 1,
         'Department of Natural Resources': 302,
         'Department of State Police': 4,
         'Department of Transportation': 14,
         'Historic Preservation Agency': 4,
         'Southern Illinois University': 1})

Interesting. Lots of Department of Natural Resources. I bet these are picnic bench shelters!

Now let's get the year acquired as integers.



In [73]:

    
year_acquired = np.array(data['Year Acquired'], dtype='int')

We now need to set up our matplotlib plotting.



In [74]:

    
%matplotlib inline



In [75]:

    
import matplotlib.pyplot as plt

And, let's make our first plot! We will just pass the two arrays in, and everything that could possibly go wrong will!



In [76]:

    
plt.plot(year_acquired, square_footage)









    Out[76]:





[<matplotlib.lines.Line2D at 0x7f4a089aeac8>]

Alright, let's try that again. This time, let's make it ever-so-slightly better. We'll use circle markers and we won't connect the points with lines.

One item to note here is that the size of these circle markers is set in display units (points), not data coordinates. This means that their relative size, overlap, etc, will depend on the figure size and resolution rather than on the data. This is not ideal.



In [77]:

    
plt.plot(year_acquired, square_footage, 'og')









    Out[77]:





[<matplotlib.lines.Line2D at 0x7f4a088ea320>]
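The `'og'` string is a format shorthand: `o` asks for circle markers and `g` for green, and because no line style is given, no connecting line is drawn. Here's a small sketch with made-up data; the `markersize` value and the `Agg` backend line are assumptions added so the example runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# Made-up data standing in for year_acquired and square_footage.
years = np.array([1950, 1960, 1970, 1980])
sizes = np.array([144, 2400, 75, 13387])

fig, ax = plt.subplots()

# 'og' = circle markers, green, no line; markersize is given in points.
(line,) = ax.plot(years, sizes, 'og', markersize=4)

print(line.get_marker(), line.get_markersize())
```

Since `markersize` is in points, the circles stay the same size on screen no matter how the axes are scaled.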

We see some obvious outliers in this plot. Let's take a look and see if we can clean up the data a bit. We will use indexing by boolean arrays here.



In [78]:

    
square_footage == 144









    Out[78]:





array([ True,  True,  True, ..., False, False, False], dtype=bool)



In [79]:

    
year_acquired == 0









    Out[79]:





array([False, False, False, ..., False, False, False], dtype=bool)



In [80]:

    
good = (year_acquired > 0)



In [81]:

    
year_acquired[good].min()









    Out[81]:





1753



In [82]:

    
np.where(year_acquired == 1753)









    Out[82]:





(array([2799]),)
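To recap the last few cells on a smaller scale, here's a sketch with a made-up array of years (the values are illustrative only). A comparison gives a boolean array, indexing with it keeps only the `True` entries, and `np.where` gives back the positions where the condition holds.

```python
import numpy as np

# Made-up years; zeros stand in for missing values.
years = np.array([0, 1975, 1753, 0, 2001])

good = years > 0                  # boolean array, one entry per element
valid_years = years[good]         # keeps only entries where good is True
print(valid_years)
print(valid_years.min())

# np.where returns a tuple with one index array per dimension.
positions = np.where(years == 1753)
print(positions[0])
```

The tuple is why the output above looks like `(array([2799]),)`: for a one-dimensional array, the indices we want are in the first (and only) element.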



In [83]:

    
for h in header:
    print("{}: {}".format(h, data[h][2799]))









    



Agency Name: Historic Preservation Agency
Location Name: Fort De Chartres Historic Site - Prairie Du Rocher
Address: 1350 State Hwy 155
City: Prairie Du Rocher
Zip code: 62241
County: Randolph
Congress Dist: 12
Congressional Full Name: Mike Bost
Rep Dist: 116
Rep Full Name: Costello, II Jerry F.
Senate Dist: 58
Senator Full Name: David S. Luechtefeld
Bldg Status: In Use
Year Acquired: 1753
Year Constructed: 1753
Square Footage: 1200
Total Floors: 1
Floors Above Grade: 1
Floors Below Grade: 0
Usage Description: Assembly
Usage Description 2: Assembly
Usage Description 3: Not provided

Alright, we've done a bit of cleaning, seen that the state owns a building from 1753, and we can make some more plots. We'll also do some scale modification here.



In [84]:

    
plt.plot(year_acquired[good], square_footage[good], '.g')
plt.title("State Buildings")
plt.xlabel("Year Acquired")
plt.ylabel("Square Footage")
plt.yscale("log")

Let's pull out all the "zero square footage" bits, as well. We'll use a logical operation to do this.



In [85]:

    
good_sqf = square_footage > 0.0
gpos = good & good_sqf
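Here's a small sketch of how the `&` operation combines two boolean arrays, using made-up values. One thing to watch: when writing the comparisons inline, each one needs its own parentheses, because `&` binds more tightly than `>`.

```python
import numpy as np

# Made-up data: one bad year and one zero square footage.
years = np.array([0, 1975, 1753, 1990])
sizes = np.array([144, 0, 1200, 560])

good_year = years > 0
good_size = sizes > 0

# Element-wise "and": True only where both conditions hold.
both = good_year & good_size
print(both)

# The same thing in one line; the parentheses are required.
both_inline = (years > 0) & (sizes > 0)
print(sizes[both_inline])
```

Similarly, `|` gives an element-wise "or" and `~` inverts a boolean array.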

The hexbin plot is our next step -- this shows the density of points at any given location in the plot. We will make it relatively coarse, with a gridsize of 32, which sets the number of hexagons across the x-axis.



In [86]:

    
plt.clf()
plt.hexbin(year_acquired[gpos], square_footage[gpos], yscale='log', bins='log', gridsize=32, cmap='viridis')
plt.title("State Buildings")
plt.xlabel("Year Acquired")
plt.ylabel("Square Footage")
plt.colorbar()
plt.yscale("log")
fig = plt.gcf()
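To experiment with hexbin away from the real dataset, here's a sketch using randomly generated data. The random data, the seed, and the `Agg` backend line are all assumptions for illustration; it also uses the `fig, ax` interface instead of the `plt` functions, which is just a different way of making the same calls.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: random years and sizes spread over several orders of magnitude.
rng = np.random.default_rng(0)
years = rng.integers(1900, 2016, size=500)
sizes = 10 ** rng.uniform(1, 5, size=500)

fig, ax = plt.subplots()

# gridsize sets the number of hexagons in x; bins='log' colors by log of counts.
hb = ax.hexbin(years, sizes, yscale='log', bins='log', gridsize=32, cmap='viridis')
ax.set_title("Random Buildings")
ax.set_xlabel("Year Acquired")
ax.set_ylabel("Square Footage")
fig.colorbar(hb, ax=ax)
```

Try changing `gridsize` to a smaller or larger number to see how the amount of detail changes.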

We can now make modifications to the plots to change different aspects. Let's look at the axes in our figure; there are two, one for the plot itself and one for the colorbar.



In [87]:

    
fig.axes









    Out[87]:





[<matplotlib.axes._subplots.AxesSubplot at 0x7f4a08701898>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7f4a08562fd0>]
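Here's a small sketch of why there are two axes: adding a colorbar creates a second axes object in the same figure. The scatter plot and its values are made up for illustration, and the `Agg` backend line is an assumption so the example runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
print(len(fig.axes))  # just the main plot so far

# Made-up points, colored by a third value so a colorbar makes sense.
sc = ax.scatter([1, 2, 3], [4, 5, 6], c=[10, 20, 30], cmap='viridis')
fig.colorbar(sc, ax=ax)

print(len(fig.axes))       # the colorbar added its own axes
print(fig.axes[0] is ax)   # the main plot is still first in the list
```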



In [88]:

    
ax = fig.axes[0]
fig









    Out[88]:

We will work more with ticks later, but for now, we'll just experiment a little bit with them and how they can be modified. Let's first see the current locations.



In [89]:

    
for xtick in ax.xaxis.majorTicks:
    print(xtick.get_loc())
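Another way to get the tick locations is `ax.get_xticks()`, which returns them all at once as an array. Here's a sketch on a made-up plot where we set the ticks ourselves first, so we know what to expect back (the data, tick values, and `Agg` backend line are assumptions for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1900, 2000], [1, 2])

# Set the major tick locations explicitly, then read them back.
ax.set_xticks([1900, 1950, 2000])
locations = ax.get_xticks()
print(locations)
```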



In [90]:

    
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
fig









    Out[90]:



In [91]:

    
ax.xaxis.set_visible(True)
ax.yaxis.set_visible(True)
fig









    Out[91]:

As an example, we can also turn off our major ticks. We'll pick up from this to see more plot modifications next week! Note here that we're also setting them en masse rather than modifying in-place.



In [92]:

    
new_ticks = []
for tick in ax.yaxis.majorTicks:
    tick.set_visible(False)
    new_ticks.append(tick)
ax.yaxis.majorTicks = new_ticks
fig









    Out[92]:
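As an alternative to looping over the tick objects, matplotlib also has `ax.tick_params`, which changes tick properties for a whole axis in one call. This is a sketch on a made-up plot, not the approach used in the cell above; the data and the `Agg` backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1900, 2000], [10, 1000])

# Turn off the major tick marks and their labels on the left-hand y axis.
ax.tick_params(axis='y', which='major', left=False, labelleft=False)

# Check the result on the tick objects themselves.
hidden = [not tick.tick1line.get_visible() for tick in ax.yaxis.get_major_ticks()]
print(all(hidden))
```

Passing `left=True, labelleft=True` in the same call turns them back on.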


