chorogrid tutorial, part 1:

Colorbin class

This class can be used independently of the Chorogrid class shown in part 2, and vice-versa, but they work well together.

It lets you input parallel lists of ids (e.g. state abbreviations) and quantities (e.g. unemployment rates), and it outputs a list of colors, one per quantity.


In [6]:
# import the class
from chorogrid import Colorbin

# read the docs
help(Colorbin)


Help on class Colorbin in module chorogrid.Colorbin:

class Colorbin(builtins.object)
 |  Instantiate with a list of quantities and colors, then retrieve 
 |  the following attributes:
 |  .colors_out : output list of colors, same length as quantities
 |  .fenceposts : divisions between bins
 |  .labels: one per color
 |  .fencepostlabels: one per fencepost
 |  .complements: list of colors, see set_complements, below
 |  
 |  attributes that can be changed:
 |  .proportional : if True, all bins have fenceposts same distance
 |                  apart (with default bin_min, bin_mid and bin_max)
 |                : if False, all bins have (insofar as possible) the same
 |                  number of members
 |                : note that this can break if not every quantity is 
 |                  unique
 |  .bin_min, .bin_max, .bin_mid
 |  .decimals : if None, no rounding; otherwise round to this number
 |  
 |  methods:
 |  .set_decimals(n): just what it sounds like
 |  .recalc(fenceposts=True): recalculate colors (and fenceposts, if True)
 |   based on attributes
 |  .calc_complements(cutoff [between 0 and 1], color_below, color_above):
 |      if the greyscale color is below the cutoff (i.e. darker),
 |      complement is assigned color_below, otherwise color_above.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, quantities, colors_in, proportional=True, decimals=None)
 |  
 |  calc_complements(self, cutoff, color_below, color_above)
 |  
 |  count_bins(self)
 |  
 |  recalc(self, fenceposts=True)
 |  
 |  set_decimals(self, decimals)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)

You'll need to supply a list of colors in hex format; I recommend the library colorlover. For now, I'll just enter a diverging list of six colors from the 'PuOr' scale, ranging from dark orange-brown through light orange and light lavender to dark purple.

With colorlover installed, they were produced with the following code:

import colorlover as cl
colors = cl.scales['6']['div']['PuOr']
def rgb2hex(list_of_tuples):
    output = []
    for r, g, b in list_of_tuples:
        output.append("#{0:02x}{1:02x}{2:02x}".format(int(r), int(g), int(b)))
    return output
colors = rgb2hex(cl.to_numeric(colors))

In [7]:
mycolors = ['#b35806', '#f1a340', '#fee0b6', '#d8daeb', '#998ec3', '#542788']

We also need some data. I've prepared one column from the U.S. Census.


In [8]:
import pandas as pd
df = pd.read_csv('chorogrid/sample_data/sample_state_data.csv')
df.tail()


Out[8]:
   state  Percent_living_in_same_home_as_one_year_ago
46    CA                                         84.2
47    AZ                                         80.4
48    AR                                         83.6
49    AL                                         85.0
50    AK                                         80.3

Now we can instantiate the Colorbin class with the quantity column in the dataframe, plus our list of six colors. Note that you don't have to use pandas; you could pass two plain lists instead.


In [9]:
# I've stated the proportional and decimals args, but left them at their defaults for now.
mybin = Colorbin(df['Percent_living_in_same_home_as_one_year_ago'], mycolors, proportional=True, decimals=None)

Colorbin has automatically chosen six 'bins' covering the range of quantities. You can retrieve their boundaries with the .fenceposts attribute. Note that because there are six bins in this case, there are seven fenceposts, numbered 0 to 6: bin #1 lies between fenceposts 0 and 1, bin #2 between 1 and 2 ... bin #6 between 5 and 6.


In [10]:
mybin.fenceposts


Out[10]:
[77.700000000000003,
 79.766666666666666,
 81.833333333333343,
 83.900000000000006,
 85.966666666666669,
 88.033333333333331,
 90.099999999999994]
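
To make the arithmetic behind proportional=True concrete, here is a minimal sketch of equal-width fenceposts. The function name equal_width_fenceposts is my own, and this is not necessarily how Colorbin computes them internally; it only reproduces the spacing seen above from the minimum (77.7) and maximum (90.1).

```python
def equal_width_fenceposts(quantities, n_bins):
    """Return n_bins + 1 evenly spaced fenceposts from min to max.

    Hypothetical helper for illustration; not part of chorogrid.
    """
    lo, hi = min(quantities), max(quantities)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


# Only the minimum and maximum matter for equal-width bins.
posts = equal_width_fenceposts([77.7, 90.1], n_bins=6)
print([round(p, 2) for p in posts])
```

Each bin is (90.1 - 77.7) / 6, or about 2.07, percentage points wide.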

Note that these are floats. To see what the class will output as text that can be used in a map legend, use the .fencepostlabels attribute.


In [11]:
mybin.fencepostlabels


Out[11]:
['77.7',
 '79.7666666667',
 '81.8333333333',
 '83.9',
 '85.9666666667',
 '88.0333333333',
 '90.1']

We also have a list of colors the same length as the quantities we passed to the constructor. Each color is that of the bin, as delimited by the fenceposts, into which the corresponding quantity falls.


In [12]:
print(mybin.colors_out[:10]+ ['...'])


['#f1a340', '#542788', '#d8daeb', '#fee0b6', '#998ec3', '#d8daeb', '#fee0b6', '#fee0b6', '#d8daeb', '#fee0b6', '...']
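
If you're curious how a quantity can be matched to a bin color, the sketch below does it with the standard library's bisect module. The helper color_for is hypothetical, and the treatment of values that sit exactly on a fencepost is my assumption, not necessarily what Colorbin does.

```python
from bisect import bisect_right


def color_for(quantity, fenceposts, colors):
    """Pick the color of the bin that contains quantity (illustrative only)."""
    # bisect_right counts the fenceposts <= quantity; subtracting 1 gives the bin index.
    index = bisect_right(fenceposts, quantity) - 1
    # Clamp so the maximum value (or anything out of range) lands in an end bin.
    index = max(0, min(index, len(colors) - 1))
    return colors[index]


mycolors = ['#b35806', '#f1a340', '#fee0b6', '#d8daeb', '#998ec3', '#542788']
# The proportional fenceposts shown earlier, rounded to two decimals.
fenceposts = [77.7, 79.77, 81.83, 83.9, 85.97, 88.03, 90.1]

print(color_for(84.2, fenceposts, mycolors))  # 84.2 lies between 83.9 and 85.97
```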

Back to the fenceposts. That's a lot of decimals, so let's change that.

After changing an attribute, the output fenceposts/labels and colors will not change until the .recalc() method is called. Calling .recalc(fenceposts=False) recalculates only the colors. That would have little or no effect in this case: we are just changing the decimals, so only quantities right next to a fencepost might change color. Calling .recalc(fenceposts=True) recalculates the fenceposts first, and then the colors.


In [13]:
mybin.set_decimals(1)
mybin.recalc(fenceposts=True)
mybin.fencepostlabels


Out[13]:
['77.7', '79.8', '81.8', '83.9', '86.0', '88.0', '90.1']
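
As a rough illustration of what the decimals setting does to the legend text, this sketch formats the unrounded fenceposts to one decimal place. The function fencepost_labels is a stand-in I wrote for this tutorial; Colorbin's own rounding may differ in detail (for example, it may also round the fenceposts themselves).

```python
def fencepost_labels(fenceposts, decimals=None):
    """Format fenceposts as strings, rounding when decimals is given.

    Hypothetical helper; not chorogrid's implementation.
    """
    if decimals is None:
        return [str(p) for p in fenceposts]
    return [f"{p:.{decimals}f}" for p in fenceposts]


posts = [77.7, 79.76666666666667, 81.83333333333334, 83.9,
         85.96666666666667, 88.03333333333333, 90.1]
labels = fencepost_labels(posts, decimals=1)
print(labels)
```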

You can see how many ids fall in each bin with the .count_bins() method.


In [14]:
mybin.count_bins()


count  label
=====  =====
    1  77.7-79.8
    4  79.8-81.8
   16  81.8-83.9
   17  83.9-86.0
    8  86.0-88.0
    5  88.0-90.1
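
The same kind of tally can be sketched in a few lines of plain Python. Here I count only the five rows shown by df.tail() earlier, so the numbers are much smaller than in the full table; count_per_bin is a hypothetical helper, and its handling of values exactly on a fencepost is an assumption.

```python
from bisect import bisect_right
from collections import Counter


def count_per_bin(quantities, fenceposts):
    """Return a list with the number of quantities in each bin (illustrative)."""
    n_bins = len(fenceposts) - 1
    counts = Counter()
    for q in quantities:
        # Bin index = number of fenceposts <= q, minus 1, clamped to a valid bin.
        index = min(max(bisect_right(fenceposts, q) - 1, 0), n_bins - 1)
        counts[index] += 1
    return [counts[i] for i in range(n_bins)]


fenceposts = [77.7, 79.8, 81.8, 83.9, 86.0, 88.0, 90.1]
sample = [84.2, 80.4, 83.6, 85.0, 80.3]  # CA, AZ, AR, AL, AK from df.tail()

bin_counts = count_per_bin(sample, fenceposts)
for count, lo, hi in zip(bin_counts, fenceposts, fenceposts[1:]):
    print(f"{count:5d}  {lo}-{hi}")
```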

If you want the fenceposts to be easier-to-read whole numbers, or any specific numbers, just reassign the .fenceposts attribute.


In [15]:
mybin.fenceposts = [77,80,82,84,86,88,91]
mybin.recalc(fenceposts=False) # with fenceposts=True, the fenceposts would be reset to the automatically calculated values
mybin.count_bins()


count  label
=====  =====
    1  77-80
    5  80-82
   15  82-84
   17  84-86
    8  86-88
    5  88-91
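
When typing fenceposts by hand it is easy to miscount, so a quick sanity check can help. The sketch below is my own helper (check_fenceposts is not part of chorogrid); it assumes you want exactly one more fencepost than colors and strictly increasing values.

```python
def check_fenceposts(fenceposts, colors):
    """Raise ValueError if the fenceposts cannot delimit one bin per color.

    Hypothetical helper for illustration only.
    """
    if len(fenceposts) != len(colors) + 1:
        raise ValueError(
            f"expected {len(colors) + 1} fenceposts, got {len(fenceposts)}")
    if any(a >= b for a, b in zip(fenceposts, fenceposts[1:])):
        raise ValueError("fenceposts must be strictly increasing")
    return True


mycolors = ['#b35806', '#f1a340', '#fee0b6', '#d8daeb', '#998ec3', '#542788']
print(check_fenceposts([77, 80, 82, 84, 86, 88, 91], mycolors))
```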

If you want each bin to contain (approximately) the same number of ids, you can pass proportional=False to the constructor. Note that a choropleth with unequally sized bins can mislead the viewer.


In [16]:
mybin = Colorbin(df['Percent_living_in_same_home_as_one_year_ago'], mycolors, proportional=False, decimals=None)

In [17]:
mybin.count_bins()


count  label
=====  =====
    8  77.7-82.7
    7  82.7-83.6
    8  83.6-84.7
   11  84.7-85.6
    8  85.6-86.6
    9  86.6-90.1
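
For intuition, equal-count bins can be approximated with quantiles. This sketch uses statistics.quantiles from the standard library (Python 3.8+) on obviously synthetic data; it is one possible approach under my own assumptions, not a description of how Colorbin picks its fenceposts, and like Colorbin it can give uneven bins when many values are tied.

```python
from statistics import quantiles


def equal_count_fenceposts(quantities, n_bins):
    """Return n_bins + 1 fenceposts that split the data into roughly equal groups.

    Hypothetical helper; the 'inclusive' method treats the data as the
    whole population, so the cut points stay within min and max.
    """
    cuts = quantiles(quantities, n=n_bins, method="inclusive")
    return [min(quantities)] + cuts + [max(quantities)]


values = list(range(1, 13))  # synthetic data: the integers 1 to 12
posts = equal_count_fenceposts(values, n_bins=6)
print(posts)
```

With twelve distinct values and six bins, each pair of neighboring fenceposts encloses two values.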

Finally, there is the method calc_complements, which can be used to ensure that text overlaying a choropleth is visible. Choose a cutoff level of 'darkness' for the greyscale version of the colors (0 is black, 1 is white), then a font color for dark backgrounds (below the cutoff) and a font color for light backgrounds (above it).


In [18]:
mybin.calc_complements(0.5, '#ffffff', '#000000')
for i, (color, complement) in enumerate(zip(mybin.colors_out, mybin.complements)):
    if i<10:
        print(color, complement)


#b35806 #ffffff
#542788 #ffffff
#998ec3 #000000
#f1a340 #000000
#542788 #ffffff
#d8daeb #000000
#f1a340 #000000
#f1a340 #000000
#fee0b6 #000000
#fee0b6 #000000
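
To show the idea behind the cutoff, here is a small sketch that converts a hex color to a greyscale brightness and picks a font color. The channel weights are the common Rec. 601 luma weights, which is an assumption on my part; chorogrid may compute greyscale differently. Both function names are hypothetical.

```python
def greyscale(hex_color):
    """Return an approximate brightness in [0, 1] for a '#rrggbb' color."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    # Rec. 601 luma weights: an assumption, not necessarily chorogrid's formula.
    return (0.299 * r + 0.587 * g + 0.114 * b) / 255


def complement(hex_color, cutoff, color_below, color_above):
    """Return color_below for dark backgrounds, color_above for light ones."""
    return color_below if greyscale(hex_color) < cutoff else color_above


for color in ['#b35806', '#542788', '#998ec3', '#f1a340']:
    print(color, complement(color, 0.5, '#ffffff', '#000000'))
```

Under these assumed weights, the four colors get the same complements as in the output above.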
