make category map

This tutorial makes a map from a grid-shaped polygon shapefile that has a column named "com", which contains the category of each cell.

The grid is made of 500 m x 500 m cells that cover the 23 special wards of Tokyo.

This tutorial covers the following functions:

  1. mpoly.prepare_map: prepares the matplotlib figure and ax for drawing a map (equal aspect, map context, background...)
  2. mpoly.map_shape: draws the shapes of the polygons in the shapefile (what you get when you add a layer in QGIS/ArcMap)
  3. mpoly.map_category: colours the polygons by the category stored in a column
  4. mpoly.map_colour: colours the polygons using a column that contains colour codes (e.g. hex codes)
  5. mpoly.add_border: adds polygons with no fill colour, that is, just their borders
  6. mpoly.add_label: adds labels (taken from a column) to the polygons

Let's start mapping!

First, import everything needed for the following steps.


In [1]:
import geopandas as gpd # for reading and manipulating shapefiles
import matplotlib.pyplot as plt # for making figure

import colouringmap.mapping_polygon as mpoly # for making maps

# magic line so that matplotlib figures are shown inline in the Jupyter cell
%matplotlib inline

In [2]:
from palettable.colorbrewer.qualitative import Dark2_7 # to get the colormap for more custom manipulation in a later step

Second, read the file and take a look at the attribute table of the shapefile.


In [3]:
grid_res = gpd.read_file('data/community_results.shp')
grid_res.head()


Out[3]:
   com                                           geometry  node  tweets  usercount        xcor       ycor
0   14  POLYGON ((175239.9457184017 3947195.841823581,...     0       1          1  139.939807  35.640542
1   56  POLYGON ((175239.9457767347 3947695.841815081,...     1       0          0  139.939919  35.645048
2    1  POLYGON ((142239.9457464929 3956695.841823446,...    10      35         21  139.576848  35.731640
3   18  POLYGON ((144239.9457266586 3959695.841818351,...   100      40         32  139.599535  35.758373
4    4  POLYGON ((154239.9457194024 3947195.841822605,...  1000    1898        660  139.707733  35.644166

Then, start playing with the shapefile. The following shows how to prepare the figure before drawing the map on it.

The first line in the following cell is the standard way to create a matplotlib figure along with an "ax".
The second line prepares the "ax" for mapping. Setting map_context to the shapefile makes sure the axis limits cover the shapefile, so it is shown within the figure.

The figure is just a matplotlib figure and ax, so you can also set the map context manually with something like this:

ax.set_xlim([minx, maxx])
ax.set_ylim([miny, maxy])
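
The limits can be derived from the bounding box of the layer. The sketch below is a minimal, library-independent helper; `padded_limits` and its `pad_ratio` parameter are hypothetical names introduced here for illustration and are not part of colouringmap. It takes a `(minx, miny, maxx, maxy)` tuple, the order geopandas uses for `GeoDataFrame.total_bounds`, and returns x and y limits with a small margin.

```python
def padded_limits(bounds, pad_ratio=0.02):
    """Return ((xmin, xmax), (ymin, ymax)) with a margin added on every side.

    bounds is (minx, miny, maxx, maxy); pad_ratio is the margin as a
    fraction of the width/height (hypothetical helper, not in colouringmap).
    """
    minx, miny, maxx, maxy = bounds
    pad_x = (maxx - minx) * pad_ratio
    pad_y = (maxy - miny) * pad_ratio
    return (minx - pad_x, maxx + pad_x), (miny - pad_y, maxy + pad_y)


# example with made-up projected coordinates in metres
xlim, ylim = padded_limits((0.0, 0.0, 100.0, 200.0), pad_ratio=0.25)
```

With a GeoDataFrame this could be used as `xlim, ylim = padded_limits(grid_res.total_bounds)`, followed by `ax.set_xlim(xlim)` and `ax.set_ylim(ylim)`.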

In [4]:
fig,ax = plt.subplots(figsize=(7,7))
ax = mpoly.prepare_map(ax, map_context=grid_res)


The following shows how to draw just the shapes of the polygons, all in the same colour.


In [5]:
fig,ax = plt.subplots(figsize=(7,7))
ax = mpoly.prepare_map(ax, map_context=grid_res)
ax = mpoly.map_shape(grid_res, ax, lw=.1, alpha=.7)


map the categories

So, let's try to map the 'com' column to colours.

Use the mpoly.map_category function and pass in the geopandas GeoDataFrame, the column name, and the ax.


In [6]:
fig,ax = plt.subplots(figsize=(7,7))
ax = mpoly.prepare_map(ax, map_context=grid_res)
ax = mpoly.map_category(grid_res, 'com', ax)


!!!
number of colour is less then number of category
colours will be repeating
!!!
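
What "colours will be repeating" means can be sketched in plain Python. This is an illustration under the assumption that colours are reused in order once the palette runs out; it is not colouringmap's actual implementation, and `repeat_palette` is a hypothetical name.

```python
from itertools import cycle


def repeat_palette(categories, palette):
    """Map each category to a palette colour, wrapping around when the palette runs out."""
    return dict(zip(categories, cycle(palette)))


# 10 made-up category ids but only 7 colour names
categories = list(range(10))
palette = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
mapping = repeat_palette(categories, palette)
# categories 0 and 7 end up sharing a colour, so they cannot be told apart
```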

The resulting map is not so good.

The warning says that there are more categories than colours.
Normally, there are two ways to cope with this problem:

  1. add more colours to the map, which can be done with the colour_group and colour_name parameters;
  2. reduce the number of categories to no more than about 7, a common rule of thumb for the number of colours that can be told apart on a map.

In this case, the number of categories is far too high, as the legend shows. No colormap can support that many categories.

So, in the following, I reduce the categories to the 6 major ones, plus one category for "other".

The first part of the following cell finds the "major" categories, determined by how often each "com" value appears.

The second part creates a new column in the attribute table to store the major/other category of each cell.


In [7]:
## find major categories
coms = grid_res.com.tolist() # get the category column from the attribute table
comset = list(set(coms)) # get the unique category ids
comcount = [ coms.count(c) for c in comset ] # count the occurrences of each category
comset2 = [ (c,n) for n,c in sorted(zip(comcount,comset), reverse=True) ] # sort the categories by frequency, most frequent first
major = [c[0] for c in comset2[:6]] # the 6 major categories
print(major)

## create new column of categories with major/other
collist = []
for c in coms:
    if c in major:
        collist.append(c) # keep the category id if it is a major category
    else:
        collist.append(-1) # use -1 for all other categories
print(len(collist)==len(grid_res)) # just a check
grid_res['com2'] = collist # put the new column into the attribute table


[0, 11, 18, 3, 1, 14]
True
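
The frequency count above can also be sketched with `collections.Counter` from the standard library. This is an alternative, not what the cell above runs; `top_categories` and `relabel_minor` are hypothetical helper names, and the sample list is made up. One difference to keep in mind: `most_common` breaks ties by first appearance, while the `sorted(zip(...), reverse=True)` approach breaks ties by the larger category id.

```python
from collections import Counter


def top_categories(values, n=6):
    """Return the n most frequent values, most frequent first."""
    return [value for value, _ in Counter(values).most_common(n)]


def relabel_minor(values, major, other=-1):
    """Keep values found in major; replace everything else with other."""
    major_set = set(major)  # set lookup is faster than scanning a list
    return [v if v in major_set else other for v in values]


# made-up category ids standing in for grid_res.com.tolist()
sample = [0, 0, 0, 11, 11, 18, 56, 4]
major_sample = top_categories(sample, n=2)
relabelled = relabel_minor(sample, major_sample)
```

In the tutorial this would correspond to `major = top_categories(grid_res.com.tolist())` and `grid_res['com2'] = relabel_minor(grid_res.com.tolist(), major)`.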

Then, map the major categories, with some styling (lw, ec, alpha) to make the polygon edges look better.


In [8]:
fig,ax = plt.subplots(figsize=(7,7))
ax = mpoly.prepare_map(ax, map_context=grid_res)
ax = mpoly.map_category(grid_res, 'com2', ax, lw=.1, ec='k', alpha=.7)


This can also be done by setting the cat_order parameter to the desired category list, like this. (Experimental)


In [9]:
fig,ax = plt.subplots(figsize=(7,7))
ax = mpoly.prepare_map(ax, map_context=grid_res)
ax = mpoly.map_category(grid_res, 'com2', ax, lw=.1, ec='k', alpha=.7, cat_order=[0,1,3,11,14,18])


customize the categories manually

The result is not perfect yet. The colour of the "other" category cannot be controlled, and it looks just as important as the major categories.

So, let's manually assign colours to the categories and give the "other" category a less eye-catching colour.


In [10]:
colors = Dark2_7.mpl_colors # get the colours of this colormap from palettable

collist2 = []
for c in grid_res.com.tolist():
    if c in major:
        collist2.append(colors[major.index(c)]) # a major category gets its colour from the colormap
    else:
        collist2.append('lightgrey') # light grey for the "other" categories
assert len(collist2)==len(grid_res) # just a check
grid_res['colour'] = collist2 # assign the new list of colours to the attribute table
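
The same lookup can be sketched with a dictionary, which avoids calling `major.index(c)` for every cell. The names `colour_lookup` and `assign_colours` are hypothetical, and the RGB tuples below are placeholders rather than the actual Dark2_7 values.

```python
def colour_lookup(major, palette):
    """Pair each major category with a palette colour, in order."""
    if len(major) > len(palette):
        raise ValueError("palette has fewer colours than major categories")
    return dict(zip(major, palette))


def assign_colours(values, lookup, other="lightgrey"):
    """Return one colour per value; unknown categories get the other colour."""
    return [lookup.get(v, other) for v in values]


# placeholder RGB tuples (not the real Dark2_7 colours)
palette = [(0.1, 0.6, 0.5), (0.9, 0.4, 0.0), (0.5, 0.4, 0.7)]
lookup = colour_lookup([0, 11, 18], palette)
cell_colours = assign_colours([0, 56, 18, 4], lookup)
```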

In [11]:
grid_res.head() # let's take a look


Out[11]:
   com                                           geometry  node  tweets  usercount        xcor       ycor  com2  colour
0   14  POLYGON ((175239.9457184017 3947195.841823581,...     0       1          1  139.939807  35.640542    14  (0.901960784314, 0.670588235294, 0.0078431372549)
1   56  POLYGON ((175239.9457767347 3947695.841815081,...     1       0          0  139.939919  35.645048    -1  lightgrey
2    1  POLYGON ((142239.9457464929 3956695.841823446,...    10      35         21  139.576848  35.731640     1  (0.4, 0.650980392157, 0.117647058824)
3   18  POLYGON ((144239.9457266586 3959695.841818351,...   100      40         32  139.599535  35.758373    18  (0.458823529412, 0.439215686275, 0.701960784314)
4    4  POLYGON ((154239.9457194024 3947195.841822605,...  1000    1898        660  139.707733  35.644166    -1  lightgrey

This time, map the polygons directly with the colour column. Note that the function is "map_colour()".


In [12]:
fig,ax = plt.subplots(figsize=(7,7))
ax = mpoly.prepare_map(ax, map_context=grid_res)
ax = mpoly.map_colour(grid_res, 'colour', ax, lw=.1, ec='k', alpha=.7)


The map looks better now.

adding meaningful boundaries

However, it would be better to also draw the administrative boundaries on the map, to show which area each colour belongs to. So let's add the boundary shapefile.

The following reads the administrative boundaries of the 23 special wards of Tokyo.


In [13]:
borders = gpd.read_file('data/tokyo_special_ward.shp')
borders.head()


Out[13]:
CC_1 CC_2 ENGTYPE4 ENGTYPE_1 ENGTYPE_2 ENGTYPE_3 ENGTYPE_4 ENGTYPE_5 HASC_1 HASC_2 ... VALIDFR_4 VALIDTO_1 VALIDTO_2 VALIDTO_3 VALIDTO_4 VARNAME_1 VARNAME_2 VARNAME_3 VARNAME_4 geometry
0 None None None Metropolis Special Ward None None None JP.TK None ... None Unknown Present None Unknown Edo|Yedo|Tokio|T┼uio None None None (POLYGON ((139.7594604492192 35.61920547485357...
1 None None None Metropolis Special Ward None None None JP.TK None ... None Unknown Present None Unknown Edo|Yedo|Tokio|T┼uio None None None (POLYGON ((139.756988525391 35.61753082275391,...
2 None None None Metropolis Special Ward None None None JP.TK None ... None Unknown Present None Unknown Edo|Yedo|Tokio|T┼uio None None None POLYGON ((139.6250152587891 35.76376342773449,...
3 None None None Metropolis Special Ward None None None JP.TK None ... None Unknown Present None Unknown Edo|Yedo|Tokio|T┼uio None None None POLYGON ((139.6917114257814 35.68527603149425,...
4 None None None Metropolis Special Ward None None None JP.TK None ... None Unknown Present None Unknown Edo|Yedo|Tokio|T┼uio None None None (POLYGON ((139.7405700683598 35.5415992736817,...

5 rows × 53 columns

Before we add the boundaries to the map, let's check whether the two shapefiles have the same projection.


In [14]:
print(borders.crs==grid_res.crs) # check if the two shapefiles have the same projection
print(borders.crs) # print the two projections
print(grid_res.crs)


False
{'init': u'epsg:4326'}
{u'lon_0': 138, u'ellps': u'WGS84', u'y_0': 0, u'no_defs': True, u'proj': u'eqdc', u'x_0': 0, u'units': u'm', u'lat_2': 40, u'lat_1': 34, u'lat_0': 0}

It turns out they are not the same, so let's reproject.


In [15]:
borders = borders.to_crs(grid_res.crs) # convert the borders to the projection of grid_res
print(borders.crs==grid_res.crs) # now check again if the two shapefiles have the same projection


True

So they are now in the same projection.
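
The check-then-reproject pattern can be wrapped in a small helper. `match_crs` is a hypothetical name; it only relies on the `.crs` attribute and the `.to_crs()` method that geopandas GeoDataFrames provide, as used in the cells above.

```python
def match_crs(layer, reference):
    """Return layer in the CRS of reference, reprojecting only if needed.

    Both arguments are expected to behave like geopandas GeoDataFrames,
    i.e. expose a .crs attribute and a .to_crs(crs) method.
    """
    if layer.crs is None:
        # to_crs cannot reproject data whose source CRS is unknown
        raise ValueError("layer has no CRS; set it before reprojecting")
    if layer.crs == reference.crs:
        return layer
    return layer.to_crs(reference.crs)
```

In this tutorial it would be called as `borders = match_crs(borders, grid_res)`.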

Now, let's add the administrative boundaries to the map.
Because the boundaries are drawn in black (ec='k'), the grid edge colour is changed to white (ec='w').


In [16]:
fig,ax = plt.subplots(figsize=(7,7))
ax = mpoly.prepare_map(ax, map_context=grid_res)
ax = mpoly.map_colour(grid_res, 'colour', ax, lw=.1, ec='w', alpha=.7)
ax = mpoly.add_border(borders, ax, ec='k', alpha=.4)


The map is better now.

Sometimes we may need to label the areas with their names. So, let's find the name of each administrative area.


In [17]:
print(borders['NAME_2'])


0         Minato
1      Shinagawa
2         Nerima
3        Shibuya
4            Ota
5       Setagaya
6        Edogawa
7        Chiyoda
8       Shinjuku
9         Bunkyo
10        Adachi
11         Taito
12        Nakano
13      Suginami
14      Itabashi
15          Kita
16          Chuo
17          Koto
18       Arakawa
19    Katsushika
20        Meguro
21       Toshima
22        Sumida
Name: NAME_2, dtype: object

The names seem to be recorded in the "NAME_2" column.

So, let's add them to the previous map.


In [23]:
fig,ax = plt.subplots(figsize=(7,7))
ax = mpoly.prepare_map(ax, map_context=grid_res)
ax = mpoly.map_colour(grid_res, 'colour', ax, lw=.1, ec='w', alpha=.7)
ax = mpoly.add_border(borders, ax, ec='k', alpha=.4)
ax = mpoly.add_label(borders, ax, 'NAME_2', font_colour='k', font_size=10)


NAME_2

To be honest, this map can be helpful for observation, but it is not very nice looking.
So, maybe use the previous map for publication, and this one for discussion. XD

That is all for this tutorial.

