chorogrid tutorial, part 2:

Chorogrid class

This class can be used independently of the Colorbin class shown in part 1, and vice-versa, but they work well together.



In [1]:

    
# import the classes
from chorogrid import Colorbin, Chorogrid

# read the docs
help(Chorogrid)









    



Help on class Chorogrid in module chorogrid.Chorogrid:

class Chorogrid(builtins.object)
 |  An object which makes choropleth grids, instantiated with:
 |      csv_path: the path to a csv data file with the following columns:
 |          * ids: e.g., states or countries, corresponding to
 |                 the Colorbin.colorlist
 |          * coordinates or path
 |      ids: a listlike object of ids corresponding to colors
 |      colors: a listlike object of colors in hex (#123456) format
 |              corresponding to ids
 |      id_column: the name of the column in csv_path containing ids
 |                 if there is not a 1:1 map between the ids object
 |                 and the contents of id_column, you will be warned
 |      
 |  Methods (introspect to see arguments)
 |     set_colors: pass a new list of colors to replace the one
 |                 used when the class was instantiated
 |     set_title: set a title for the map
 |     set_legend: set a legend
 |     add_svg(): add some custom svg code. This must be called
 |                after the draw_... method, because it needs to know
 |                the margins.
 |     
 |     draw_squares(): draw a square grid choropleth
 |     draw_hex(): draw a hex-based choropleth
 |     draw_multihex: draw a multiple-hex-based choropleth
 |     draw_map: draw a regular, geographic choropleth
 |     
 |     show(): display the result in IPython notebook
 |  
 |  Methods defined here:
 |  
 |  __init__(self, csv_path, ids, colors, id_column='abbrev')
 |  
 |  add_svg(self, text, offset=[0, 0])
 |      Adds svg text to the final output". Can be called more than once.
 |  
 |  done(self, show=True, save_filename=None)
 |      if show == True, displays the svg in IPython notebook. If save_filename
 |      is specified, saves svg file
 |  
 |  draw_hex(self, x_column='hex_x', y_column='hex_y', **kwargs)
 |      Creates an SVG file based on a hexagonal grid, with coordinates 
 |      from the specified columns in csv_path (specified when Chorogrid class
 |      initialized).
 |      
 |      Note on kwarg dicts: defaults will be used for all keys unless 
 |      overridden, i.e. you don't need to state all the key-value pairs.
 |      
 |      kwarg: font_dict
 |          default: {'font-style': 'normal', 'font-weight': 'normal', 
 |                    'font-size': '12px', 'line-height': '125%', 
 |                    'text-anchor': 'middle', 'font-family': 'sans-serif', 
 |                    'letter-spacing': '0px', 'word-spacing': '0px', 
 |                    'fill-opacity': 1, 'stroke': 'none', 
 |                    'stroke-width': '1px', 'stroke-linecap': 'butt', 
 |                    'stroke-linejoin': 'miter', 'stroke-opacity': 1,
 |                    'fill': '#000000'}
 |                    
 |      kwarg: spacing_dict
 |          default: {'margin_left': 30,  'margin_top': 60,  
 |                    'margin_right': 40,  'margin_bottom': 20,  
 |                    'cell_width': 40,  'title_y_offset': 30,  
 |                    'name_y_offset': 15,  'stroke_width': 0
 |                    'gutter': 1,  'stroke_color': '#ffffff',  
 |                    'missing_color': '#a0a0a0',
 |                    'legend_offset': [0, -10]}
 |                    
 |      kwarg: font_colors
 |          default: "#000000"
 |          if specified, must be either listlike object of colors 
 |          corresponding to ids, a dict of hex colors to font color, or a 
 |          string of a single color.
 |  
 |  draw_map(self, path_column='map_path', **kwargs)
 |      Creates an SVG file based on SVG paths delineating a map, 
 |          with paths from the specified columns in csv_path 
 |          (specified when Chorogrid class initialized).
 |      
 |      Note on kwarg dict: defaults will be used for all keys unless 
 |      overridden, i.e. you don't need to state all the key-value pairs.
 |      
 |      Note that the map does not have an option for font_dict, as
 |      it will not print labels.
 |                    
 |      kwarg: spacing_dict
 |          # Note that total_width and total_height will depend on where 
 |          # the paths came from.
 |          # For the USA map included with this python module,
 |          # they are 959 and 593.
 |          default: {'map_width': 959, 'map_height': 593,
 |                      'margin_left': 10,  'margin_top': 20,  
 |                      'margin_right': 80,  'margin_bottom': 20,  
 |                      'title_y_offset': 45,
 |                      'stroke_color': '#ffffff', 'stroke_width': 0.5, 
 |                      'missing_color': '#a0a0a0',
 |                      'legend_offset': [0, 0]}
 |  
 |  draw_multihex(self, x_column='fourhex_x', y_column='fourhex_y', contour_column='fourhex_contour', x_label_offset_column='fourhex_label_offset_x', y_label_offset_column='fourhex_label_offset_y', **kwargs)
 |      Creates an SVG file based on a hexagonal grid, with contours
 |          described by the following pattern:
 |              a: up and to the right
 |              b: down and to the right
 |              c: down
 |              d: down and to the left
 |              e: up and to the left
 |              f: up
 |      
 |      Note on kwarg dicts: defaults will be used for all keys unless 
 |      overridden, i.e. you don't need to state all the key-value pairs.
 |      
 |      kwarg: font_dict
 |          default: {'font-style': 'normal', 'font-weight': 'normal', 
 |                    'font-size': '12px', 'line-height': '125%', 
 |                    'text-anchor': 'middle', 'font-family': 'sans-serif', 
 |                    'letter-spacing': '0px', 'word-spacing': '0px', 
 |                    'fill-opacity': 1, 'stroke': 'none', 
 |                    'stroke-width': '1px', 'stroke-linecap': 'butt', 
 |                    'stroke-linejoin': 'miter', 'stroke-opacity': 1,
 |                    'fill': '#000000'}
 |                    
 |      kwarg: spacing_dict
 |          default: {'margin_left': 30,  'margin_top': 60,  
 |                    'margin_right': 40,  'margin_bottom': 20,  
 |                    'cell_width': 30,  'title_y_offset': 30,  
 |                    'name_y_offset': 15,  'stroke_width': 1
 |                    'stroke_color': '#ffffff',  'missing_color': '#a0a0a0',
 |                    'legend_offset': [0, -10]}
 |          (note that there is no gutter)
 |                    
 |      kwarg: font_colors
 |          default = "#000000"
 |          if specified, must be either listlike object of colors 
 |          corresponding to ids, a dict of hex colors to font color, or a 
 |          string of a single color.
 |  
 |  draw_squares(self, x_column='square_x', y_column='square_y', **kwargs)
 |      Creates an SVG file based on a square grid, with coordinates from 
 |      the specified columns in csv_path (specified when Chorogrid class
 |      initialized).
 |      
 |      Note on kwarg dicts: defaults will be used for all keys unless
 |      overridden, i.e. you don't need to state all the key-value pairs.
 |      
 |      kwarg: font_dict
 |          default: {'font-style': 'normal', 'font-weight': 'normal', 
 |                    'font-size': '12px', 'line-height': '125%', 
 |                    'text-anchor': 'middle', 'font-family': 'sans-serif', 
 |                    'letter-spacing': '0px', 'word-spacing': '0px', 
 |                    'fill-opacity': 1, 'stroke': 'none', 
 |                    'stroke-width': '1px', 'stroke-linecap': 'butt', 
 |                    'stroke-linejoin': 'miter', 'stroke-opacity': 1,
 |                    'fill': '#000000'}
 |                    
 |      kwarg: spacing_dict
 |          default: {'margin_left': 30,  'margin_top': 60,  
 |                    'margin_right': 40,  'margin_bottom': 20,  
 |                    'cell_width': 40,  'title_y_offset': 30,  
 |                    'name_y_offset': 15,  'roundedness': 3,  
 |                    'gutter': 1,  'stroke_color': '#ffffff',  
 |                    'stroke_width': 0, 'missing_color': '#a0a0a0',
 |                    'legend_offset': [0, -10]}
 |                    
 |      kwarg: font_colors
 |          default = "#000000"
 |          if specified, must be either listlike object of colors 
 |          corresponding to ids, a dict of hex colors to font color, or a 
 |          string of a single color.
 |  
 |  set_colors(self, colors)
 |      change colors list specified when Chorogrid is instantiated
 |  
 |  set_legend(self, colors, labels, title=None, width='square', height=100, gutter=2, stroke_width=0.5, label_x_offset=2, label_y_offset=3, stroke_color='#303030', **kwargs)
 |      Creates a legend that will be included in any draw method.
 |      * width can be the text "square" or a number of pixels.
 |      * a gradient can be made with a large number of colors, and ''
 |        for each label that is not specified, and non-square width
 |      * height does not include title
 |      * if len(labels) can be len(colors) or len(colors)+1; the labels
 |        will be aside the boxes, or at the interstices/fenceposts, 
 |        respectively; alternately, if len(labels) == 2, two fenceposts
 |        will be assigned
 |      
 |      kwarg: font_dict
 |          default: {'font-style': 'normal', 'font-weight': 'normal', 
 |                    'font-size': '12px', 'line-height': '125%', 
 |                    'text-anchor': 'left', 'font-family': 'sans-serif', 
 |                    'letter-spacing': '0px', 'word-spacing': '0px', 
 |                    'fill-opacity': 1, 'stroke': 'none', 
 |                    'stroke-width': '1px', 'stroke-linecap': 'butt', 
 |                    'stroke-linejoin': 'miter', 'stroke-opacity': 1,
 |                    'fill': '#000000'}
 |  
 |  set_title(self, title, **kwargs)
 |      Set a title for the grid
 |      kwargs:
 |           font_dict
 |           default = {'font-style': 'normal', 'font-weight': 'normal', 
 |                 'font-size': '21px', 'line-height': '125%', 
 |                 'text-anchor': 'middle', 'font-family': 'sans-serif', 
 |                 'letter-spacing': '0px', 'word-spacing': '0px', 
 |                 'fill-opacity': 1, 'stroke': 'none', 
 |                 'stroke-width': '1px', 'stroke-linecap': 'butt', 
 |                 'stroke-linejoin': 'miter', 'stroke-opacity': 1,
 |                 'fill': '#000000'}
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)

We'll load some sample data reporting, for each U.S. state, the percentage of people living in the same home as one year ago, and use Colorbin to map these percentages to colors. See Tutorial 1 for information on how Colorbin works.



In [2]:

    
mycolors = ['#b35806', '#f1a340', '#fee0b6', '#d8daeb', '#998ec3', '#542788']
import pandas as pd
df = pd.read_csv('chorogrid/sample_data/sample_state_data.csv')
mybin = Colorbin(df['Percent_living_in_same_home_as_one_year_ago'], mycolors, proportional=True, decimals=None)
mybin.set_decimals(1)
mybin.recalc(fenceposts=True)
mybin.calc_complements(0.5, '#e0e0e0', '#101010')
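
For readers who skipped part 1, the idea behind binning values into colors can be sketched without the library. This is only an illustration of equal-width binning, not Colorbin's actual implementation; the function names `equal_width_fenceposts` and `color_for` are hypothetical, and the sample values are made up.

```python
from bisect import bisect_right

def equal_width_fenceposts(values, n_bins):
    """Return n_bins + 1 evenly spaced boundaries from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

def color_for(value, fenceposts, colors):
    """Pick the color of the bin that contains value."""
    # bisect_right counts the fenceposts <= value; subtract 1 for the bin index
    index = bisect_right(fenceposts, value) - 1
    # the maximum value sits on the last fencepost, so clamp it into the last bin
    index = min(max(index, 0), len(colors) - 1)
    return colors[index]

sample_values = [0.0, 2.5, 5.0, 7.5, 10.0]   # hypothetical data
sample_colors = ['#b35806', '#f1a340', '#fee0b6', '#d8daeb', '#998ec3']
posts = equal_width_fenceposts(sample_values, len(sample_colors))
binned = [color_for(v, posts, sample_colors) for v in sample_values]
```

Each value ends up with exactly one color, which is the shape of result that `mybin.colors_out` provides for the real data.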

Here's the tail of the dataframe.



In [3]:

    
df.tail()









    Out[3]:

    state  Percent_living_in_same_home_as_one_year_ago
46     CA                                         84.2
47     AZ                                         80.4
48     AR                                         83.6
49     AL                                         85.0
50     AK                                         80.3

Here are all the objects we'll use to make the maps that follow. Each is a list whose length equals either the number of observations (51) or the number of colors (6).



In [4]:

    
states = list(df.state)
colors_by_state = mybin.colors_out
font_colors_by_state = mybin.complements
legend_colors = mybin.colors_in
legend_labels = mybin.labels

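named_objects = [('states', states), ('colors_by_state', colors_by_state),
                 ('font_colors_by_state', font_colors_by_state),
                 ('legend_colors', legend_colors), ('legend_labels', legend_labels)]
for name, obj in named_objects:
    print("{:>20}: len {:2}: {}...".format(name, len(obj), obj[:3]))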









    



              states: len 51: ['WY', 'WV', 'WI']...
     colors_by_state: len 51: ['#fee0b6', '#542788', '#d8daeb']...
font_colors_by_state: len 51: ['#101010', '#e0e0e0', '#101010']...
       legend_colors: len  6: ['#b35806', '#f1a340', '#fee0b6']...
       legend_labels: len  6: ['77.7-79.8', '79.8-81.8', '81.8-83.9']...

Here is the database within Chorogrid for U.S. states, containing identifying information and instructions for hex, square, multihex and map choropleths.



In [5]:

    
# use a descriptive name; `_` already holds the last output in IPython
states_db = pd.read_csv('chorogrid/databases/usa_states.csv')
print(states_db.iloc[0])









    



abbrev                                                                   AK
full_name                                                            Alaska
long_abbrev                                                           Alas.
FIPS                                                                      2
pop                                                                  710231
sqmi                                                               663267.3
map_path                  m 135.58488,358.02208 -0.24846,65.59232 1.2422...
map_fill_default                                                          2
map_label_x                                                        99.76261
map_label_y                                                        398.1729
map_label_text_anchor                                                middle
map_label_line_path                                                     NaN
altmap_path               m 151.26632,459.09682 -0.31386,83.24785 1.5692...
square_x                                                                  0
square_y                                                                  0
altsquare_x                                                               0
altsquare_y                                                               0
hex_x                                                                     1
hex_y                                                                     0
althex_x                                                                  0
althex_y                                                                  0
fourhex_x                                                                 2
fourhex_y                                                                 1
fourhex_contour                                              ababcdcdedefaf
fourhex_label_offset_x                                                 0.25
fourhex_label_offset_y                                                  0.5
Name: 0, dtype: object

And here's a help file with descriptions of all the columns in the cell above.



In [6]:

    
with open('chorogrid/databases/usa_states_column_descriptions.txt') as f:
    print(f.read())









    



abbrev                       Postal abbreviation for 50 states and D.C.
full_name                    Full name
long_abbrev                  Abbreviation, based on but not identical to recommendations of Associated Press
FIPS                         Federal Information Processing Standards
pop                          Population in 2013
sqmi                         Area in square miles

map_path                     SVG path for geographic map
map_fill_default             Number, 1-4, so that no states sharing a border will have same fill
map_label_x                  X-coordinate for map label, e.g. state name
map_label_y                  Y-coordinate for map label
map_label_text_anchor        Text anchor (start, middle, end) for label
map_label_line_path          Path for line connecting state and label, if applicable

altmap_path                  Alternate SVG path, without labels

square_x                     Horizontal position of square grid
square_y                     Vertical position of square grid

altsquare_x                  Alternate horizontal position of square grid
altsquare_y                  Alternate vertical position of square grid

hex_x                        Horizontal position of hex grid
hex_y                        Vertical position of hex grid

fourhex_x                    Horizontal position of topmost, then leftmost, hex in four-hex multihex layout
fourhex_y                    Vertical position of topmost, then leftmost, hex in four-hex multihex layout
fourhex_contour              Contour of four-hex layout: a=up&right, b=down&right, c=down, d=down&left, e=up&left, f=up
fourhex_label_offset_x       Horizontal offset of label, in terms of hex width
fourhex_label_offset_y       Vertical offset of label, in terms of hex width



In [7]:

    
cg = Chorogrid('chorogrid/databases/usa_states.csv', states, colors_by_state)
cg.set_title('% Living at same address as one year ago', font_dict={'font-size': 19})
cg.set_legend(legend_colors, legend_labels, title='% of population')
cg.draw_squares(spacing_dict={'margin_right': 150}) # otherwise legend will be cut off
    # another strategy would be to pass a legend_offset to spacing_dict
cg.done(show=True)
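
The call above passes only `margin_right` in `spacing_dict`; per the help text, every key that is not overridden keeps its default. A minimal sketch of that merge behavior follows. The defaults are copied from the `draw_squares` help text, but the helper name `merge_spacing` is hypothetical and the library's internal code may be organized differently.

```python
# Defaults copied from the draw_squares help text above
DEFAULT_SPACING = {
    'margin_left': 30, 'margin_top': 60,
    'margin_right': 40, 'margin_bottom': 20,
    'cell_width': 40, 'title_y_offset': 30,
    'name_y_offset': 15, 'roundedness': 3,
    'gutter': 1, 'stroke_color': '#ffffff',
    'stroke_width': 0, 'missing_color': '#a0a0a0',
    'legend_offset': [0, -10],
}

def merge_spacing(overrides=None):
    """Hypothetical helper: start from the defaults, let the caller's keys win."""
    merged = dict(DEFAULT_SPACING)
    merged.update(overrides or {})
    return merged

spacing = merge_spacing({'margin_right': 150})
```

Only `margin_right` changes; the other twelve keys keep their documented defaults.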

Note that the text is very hard to read in the darkest-colored states. Luckily, we've already created a list of font colors with Colorbin's calc_complements method. Let's rerun the last two lines of the cell above, this time with font colors specified.



In [8]:

    
cg.draw_squares(spacing_dict={'margin_right': 150}, font_colors=font_colors_by_state)
cg.done(show=True)
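
One way to think about the font colors is as a brightness test on each fill color: dark fills get light text and light fills get dark text. The sketch below uses the common 0.299/0.587/0.114 brightness weights, which is an assumption; Colorbin's `calc_complements` may compute this differently. The function names are hypothetical. The threshold and the two font colors are the arguments passed to `calc_complements` in cell 2.

```python
def brightness(hex_color):
    """Perceived brightness in [0, 1] using the common 0.299/0.587/0.114 weights."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return (0.299 * r + 0.587 * g + 0.114 * b) / 255

def font_color_for(fill, threshold=0.5, light='#e0e0e0', dark='#101010'):
    """Light text on dark fills, dark text on light fills."""
    return light if brightness(fill) < threshold else dark

fills = ['#fee0b6', '#542788', '#d8daeb']   # first three colors_by_state values
font_colors = [font_color_for(f) for f in fills]
```

For these three fills the sketch gives the same font colors as the first three entries of `font_colors_by_state` printed in cell 4.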

Here's an alternate layout of squares.



In [9]:

    
cg.draw_squares(x_column='altsquare_x', y_column='altsquare_y', spacing_dict={'margin_right': 150},
                font_colors=font_colors_by_state)
cg.done(show=True)

And here's a hex layout.



In [10]:

    
cg.draw_hex(spacing_dict={'margin_right': 150}, font_colors=font_colors_by_state)
cg.done(show=True)

And an alternate hex layout.



In [11]:

    
cg.draw_hex(x_column='althex_x', y_column='althex_y', spacing_dict={'margin_right': 150},
            font_colors=font_colors_by_state)
cg.done(show=True)

And a traditional choropleth map.



In [12]:

    
cg = Chorogrid('chorogrid/databases/usa_states.csv', states, colors_by_state)
cg.draw_map(spacing_dict={'legend_offset': [-150,-25]})
cg.done(show=True)

# To Do: Add state names. The required data is there in the database, but hasn't been implemented in code.

And a fancy one where states are represented by four hexes each.



In [13]:

    
cg.draw_multihex()
cg.done(show=True)
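
The outline of each four-hex shape comes from the `fourhex_contour` column, a string of letters a to f that each mean one step along a hexagon edge. The sketch below walks Alaska's contour and checks that it returns to its starting point. The letter meanings are taken from the help text; the geometry (hexagons with vertical sides, unit edge length, arbitrary start at the origin) and the names `STEPS` and `trace_contour` are assumptions for illustration, not the library's code.

```python
import math

# Unit steps for each contour letter, in SVG coordinates (y grows downward).
DX = math.sqrt(3) / 2
STEPS = {
    'a': (DX, -0.5),   # up and to the right
    'b': (DX, 0.5),    # down and to the right
    'c': (0.0, 1.0),   # down
    'd': (-DX, 0.5),   # down and to the left
    'e': (-DX, -0.5),  # up and to the left
    'f': (0.0, -1.0),  # up
}

def trace_contour(contour, start=(0.0, 0.0)):
    """Return the list of vertices visited while walking the contour."""
    x, y = start
    points = [(x, y)]
    for letter in contour:
        dx, dy = STEPS[letter]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

points = trace_contour('ababcdcdedefaf')   # Alaska's fourhex_contour
end_x, end_y = points[-1]
is_closed = (math.isclose(end_x, 0.0, abs_tol=1e-9)
             and math.isclose(end_y, 0.0, abs_tol=1e-9))
```

The 14 letters produce 14 edges, which is the perimeter of four hexagons that share five edges.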

There are currently three databases in Chorogrid: USA by state, as seen above; USA by county, which we'll look at here; and Europe by country, which will be shown below. FIPS (Federal Information Processing Standards) codes are used to identify counties.



In [14]:

    
df = pd.read_csv('chorogrid/sample_data/sample_county_data.csv', encoding='latin-1')

Let's have a look at the data. There are 3143 counties (the index is 0-based, so the last row is 3142). Counties are redefined every few years; this count was current as of June 2015. If your data is older, or if borders change after that date, the mapping between data and map may not be perfect.



In [15]:

    
df.tail()









    Out[15]:

      County and state name   fips  Median_value_of_owner-occupied_housing_units_2009-2013
3138        Yuma County, AZ   4027                                                  118000
3139        Yuma County, CO   8125                                                  136600
3140      Zapata County, TX  48505                                                   55700
3141      Zavala County, TX  48507                                                   39900
3142     Ziebach County, SD  46137                                                   70100

Use Colorbin to divide the values into six bins.



In [16]:

    
mybin = Colorbin(df['Median_value_of_owner-occupied_housing_units_2009-2013'], mycolors, proportional=False, decimals=None)
mybin.fenceposts









    Out[16]:





[0, 76400, 90500, 108700, 136200, 176800, 929700]

Reset the fenceposts to round numbers.



In [17]:

    
mybin.fenceposts = [0, 50000, 100000, 150000, 250000, 500000, 1000000]
mybin.recalc(fenceposts=False)  # keep the fenceposts set above instead of recomputing them
mybin.labels









    Out[17]:





['0-50000',
 '50000-100000',
 '100000-150000',
 '150000-250000',
 '250000-500000',
 '500000-1000000']
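
The relationship between fenceposts, labels and bins can be sketched in a few lines. This is an illustration only, with hypothetical function names; in particular, which bin a value lying exactly on a fencepost falls into is an assumption here and may differ in Colorbin.

```python
from bisect import bisect_right

fenceposts = [0, 50000, 100000, 150000, 250000, 500000, 1000000]

def labels_from_fenceposts(posts):
    """Join each pair of neighboring fenceposts into a 'low-high' label."""
    return ['{}-{}'.format(lo, hi) for lo, hi in zip(posts, posts[1:])]

def bin_index(value, posts):
    """Index of the bin holding value; a value on a fencepost goes to the upper bin."""
    index = bisect_right(posts, value) - 1
    return min(max(index, 0), len(posts) - 2)

labels = labels_from_fenceposts(fenceposts)
yuma_az_bin = bin_index(118000, fenceposts)   # Yuma County, AZ from df.tail()
```

Seven fenceposts yield six labels, one per color, and Yuma County, AZ lands in the '100000-150000' bin.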

Here are the columns for the counties database. Note that it only has maps, not squares or hexes or multihexes (for now, anyway).



In [18]:

    
with open('chorogrid/databases/usa_counties_column_descriptions.txt', 'r', encoding='utf-8') as f:
    print(f.read())









    



name          County name, e.g. "Santa Barbara"
state         2-letter state abbreviation, e.g. CA
map_path      SVG path for county outline
fips          Federal Information Processing Standards code for county in string format, e.g. "06083"
fips_integer  Federal Information Processing Standards code for county in integer format, e.g. 6083
middle_x      horizontal coordinate of center of county; not used for anything in this script, but provided just in case it's useful
middle_y      vertical coordinate of center of county; not used for anything in this script, but provided just in case it's useful
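
The two FIPS columns hold the same codes in different forms. A county FIPS code has five digits, so the string form keeps leading zeros that the integer form drops. The sample data stores integers (e.g. 4027), which is why the next cell matches on `fips_integer`. If your own data has one form and you need the other, a conversion along these lines should work; the function names are hypothetical.

```python
def fips_to_string(fips_integer):
    """Zero-pad an integer county FIPS code to the 5-character string form."""
    return '{:05d}'.format(fips_integer)

def fips_to_integer(fips_string):
    """Drop leading zeros by converting the string form to an integer."""
    return int(fips_string)

santa_barbara = fips_to_string(6083)   # example from the column descriptions
yuma_az = fips_to_string(4027)         # Yuma County, AZ from df.tail()
```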

And here's the map.



In [19]:

    
cg = Chorogrid('chorogrid/databases/usa_counties.csv', df.fips, mybin.colors_out, 'fips_integer')
cg.set_title('Median value of owner-occupied housing units, 2009-2013', font_dict={'font-size': 19})
cg.set_legend(mybin.colors_in, mybin.labels, title='US dollars')
cg.draw_map(spacing_dict={'legend_offset':[-300,-200], 'margin_top': 50}) # move the legend onto the map so it isn't cut off
cg.done(show=True, save_filename='eraseme')

Note that there are no state borders in the map above. We can add them by passing the contents of the usa_counties_statelines.txt file, which contains the appropriate SVG path descriptions, to add_svg.



In [20]:

    
with open('chorogrid/databases/usa_counties_statelines.txt', 'r') as f:
    statelines = f.read()
cg.add_svg(statelines)
cg.done(show=True, save_filename='eraseme')

### Known bug: if you run this cell again, the borders will be offset.
### For now, you'll have to restart the kernel to get the state borders in the right place.

And now Europe. Our source data has two- and three-letter country abbreviations; for now, the database only has two-letter abbreviations.



In [21]:

    
with open('chorogrid/databases/europe_countries_column_descriptions.txt', 'r', encoding='utf-8') as f:
    print(f.read())









    



abbrev                       Two-letter ISO country code
full_name                    Full name

map_path                     SVG path for geographic map
map_fill_default             Number, 1-4, so that no countries sharing a border will have same fill
map_label_x                  X-coordinate for map label, e.g. country name
map_label_y                  Y-coordinate for map label
map_label_text_anchor        Text anchor (start, middle, end) for label
map_label_line_path          Path for line connecting country and label, if applicable

hex_x                        Horizontal position of hex grid
hex_y                        Vertical position of hex grid



In [22]:

    
df = pd.read_csv('chorogrid/sample_data/sample_europe_data.csv', encoding='latin-1')
df.tail()









    Out[22]:

               Country  abbrev2  abbrev3  Pct Internet users
48         Switzerland       CH      CHE                89.1
49              Turkey       TR      TUR                56.7
50             Ukraine       UA      UKR                41.8
51      United Kingdom       GB      GBR                89.8
52  Vatican City State       VU      VUT                57.0

And here's the choropleth.



In [23]:

    
mybin = Colorbin(df['Pct Internet users'], mycolors, proportional=False, decimals=None)
mybin.fenceposts = [0, 40, 60, 70, 80, 90, 100]
mybin.recalc(fenceposts=False)  # keep the fenceposts set above instead of recomputing them
font_colors_europe = mybin.complements

cg = Chorogrid('chorogrid/databases/europe_countries.csv', df.abbrev2, mybin.colors_out, 'abbrev')
cg.set_title('% Internet users', font_dict={'font-size': 19})
cg.set_legend(mybin.colors_in, mybin.labels, title='% of population')
cg.draw_map(spacing_dict={'legend_offset':[-200,-100], 'margin_top': 50}) # move the legend onto the map so it isn't cut off
cg.done(show=True)









    



WARNING: The following are not recognized ids: {'SS', 'KR', 'GG', 'JE', 'VU', 'MO', 'FO', 'GI', 'FM'}
WARNING: The following ids in the csv are not included: {'KS', 'AM', 'MD', 'GE', 'AZ', 'MK', 'VA', 'RU'}

Note that in this case the country abbreviations in the source data and in the database do not map perfectly onto each other. Some small 'countries' like the Faroe Islands [FO] (a self-governing part of the Danish Realm) are not in the database, while several countries, including Russia [RU], are not in the data. At least one mismatch is a coding error in the sample data: Vatican City State is listed as VU, whereas its ISO code, and the id in the database, is VA.
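
The two warnings amount to a pair of set differences between the ids you pass in and the ids in the database. A minimal sketch of that check, useful for inspecting your own data before drawing, might look like this. The function name `compare_ids` and the short id lists are made up for illustration; this is not the library's own code.

```python
def compare_ids(data_ids, database_ids):
    """Return (ids only in the data, ids only in the database)."""
    data_ids, database_ids = set(data_ids), set(database_ids)
    return data_ids - database_ids, database_ids - data_ids

# Tiny made-up example; the real lists would come from df.abbrev2 and the
# abbrev column of europe_countries.csv.
data_ids = ['CH', 'TR', 'UA', 'GB', 'VU', 'FO']
database_ids = ['CH', 'TR', 'UA', 'GB', 'VA', 'RU']
unrecognized, unused = compare_ids(data_ids, database_ids)
```

Here `unrecognized` corresponds to the first warning (ids in the data that the database does not know) and `unused` to the second (database ids with no data).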

Finally, here's the hex grid:



In [24]:

    
cg.draw_hex(spacing_dict={'legend_offset':[0, -100], 'margin_top': 50, 'margin_right': 200})
cg.done(show=True)

# Known issue: font_colors does not work for this Europe hex map.


