matta - view and scaffold d3.js visualizations in IPython notebooks

Let's Make a Map Too

Inspired by Mike Bostock's Let's Make a Map, we want to make a map too, using matta. We will display the communes (comunas) of Santiago, Chile. To do that, we will perform the following steps:

Download the administrative borders of Chile as a shapefile from the Library of the National Congress of Chile (Biblioteca del Congreso Nacional, BCN).
Use ogr2ogr to filter and clip the communes of the city of Santiago, and to convert the result to GeoJSON.
Convert the GeoJSON file to TopoJSON.
Display the TopoJSON file using matta in the IPython notebook.
Download the Human Development Index (HDI) of each commune from Wikipedia and make a choropleth map and a symbol map using matta.



In [1]:

    
import matta
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs/')









    Out[1]:






matta Javascript code added.

Download Shapefiles

Note: we delete the data directory first, so that the notebook always starts from scratch.



In [2]:

    
!rm -fr data
!mkdir data
!wget http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip -O data/division_comunal.zip









    



--2015-01-08 13:34:16--  http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip
Resolving siit2.bcn.cl (siit2.bcn.cl)... 200.0.66.71
Connecting to siit2.bcn.cl (siit2.bcn.cl)[200.0.66.71]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29000232 (28M) [application/zip]
Saving to: “data/division_comunal.zip”

100%[======================================>] 29,000,232  1.26MB/s   in 22s    

2015-01-08 13:34:39 (1.26 MB/s) - “data/division_comunal.zip” saved [29000232/29000232]



In [3]:

    
!unzip data/division_comunal.zip -d data/









    



Archive:  data/division_comunal.zip
  inflating: data/Disclaimer.txt     
  inflating: data/division_comunal.dbf  
  inflating: data/division_comunal.prj  
  inflating: data/division_comunal.sbn  
  inflating: data/division_comunal.sbx  
  inflating: data/division_comunal.shp  
  inflating: data/division_comunal.shp.xml  
  inflating: data/division_comunal.shx
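If you prefer to stay in Python rather than shelling out to rm, mkdir, wget and unzip, the same download-and-extract step can be sketched with the standard library. This is a minimal sketch assuming Python 3; the helper name fetch_and_extract is hypothetical, and the BCN download URL may have changed since this notebook was written.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, data_dir='data', filename='division_comunal.zip'):
    """Hypothetical helper: wipe data_dir, download url into it and unzip the archive there."""
    data_dir = Path(data_dir)
    shutil.rmtree(str(data_dir), ignore_errors=True)  # like `rm -fr data`
    data_dir.mkdir(parents=True)                      # like `mkdir data`
    archive = data_dir / filename
    urllib.request.urlretrieve(url, str(archive))     # like `wget ... -O data/division_comunal.zip`
    with zipfile.ZipFile(str(archive)) as zf:         # like `unzip ... -d data/`
        zf.extractall(str(data_dir))
        return sorted(zf.namelist())                  # names of the extracted members


# Usage (performs a real download):
# fetch_and_extract('http://siit2.bcn.cl/obtienearchivo?id=repositorio/10221/10396/1/division_comunal.zip')
```

The return value lists the extracted files, which should correspond to the unzip output above.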

Convert to GeoJSON

You can use ogrinfo to see the structure of the source shapefile.



In [4]:

    
!ogrinfo data/division_comunal.shp 'division_comunal' | head -n 30









    



INFO: Open of `data/division_comunal.shp'
      using driver `ESRI Shapefile' successful.

Layer name: division_comunal
Geometry: Polygon
Feature Count: 346
Extent: (-3701712.293900, 3794823.357600) - (704690.560200, 8065196.816300)
Layer SRS WKT:
PROJCS["WGS_1984_UTM_Zone_19S",
    GEOGCS["GCS_WGS_1984",
        DATUM["WGS_1984",
            SPHEROID["WGS_84",6378137.0,298.257223563]],
        PRIMEM["Greenwich",0.0],
        UNIT["Degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["False_Easting",500000.0],
    PARAMETER["False_Northing",10000000.0],
    PARAMETER["Central_Meridian",-69.0],
    PARAMETER["Scale_Factor",0.9996],
    PARAMETER["Latitude_Of_Origin",0.0],
    UNIT["Meter",1.0]]
NOM_REG: String (50.0)
NOM_PROV: String (20.0)
NOM_COM: String (30.0)
SHAPE_LENG: Real (19.11)
DIS_ELEC: Integer (4.0)
CIR_SENA: Integer (4.0)
COD_COMUNA: Integer (4.0)
SHAPE_Le_1: Real (19.11)
SHAPE_Area: Real (19.11)

Now we use ogr2ogr to convert the shapefile into GeoJSON.

Notes:

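After some manual inspection, we know that NOM_PROV contains the name of the province (the parent administrative division) of each commune. We use a -where clause to keep only the communes of the Santiago, Maipo and Cordillera provinces.
We delete santiago-comunas.json in case it already exists (that is, when we re-run the notebook :) ).
We use the -clipdst option to clip the result to a bounding box obtained from this site. Since -clipdst is applied after reprojection, the box is given in longitude and latitude.
We also use the -t_srs EPSG:4326 option to reproject the data from UTM zone 19S (in meters, see the ogrinfo output above) to (longitude, latitude) pairs on WGS 84.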



In [5]:

    
!rm data/santiago-comunas.json
!ogr2ogr -where "NOM_PROV IN ('Santiago', 'Maipo', 'Cordillera')" -f GeoJSON \
    -clipdst -70.828155 -33.635036 -70.452573 -33.302953 -t_srs EPSG:4326 \
    data/santiago-comunas.json data/division_comunal.shp









    



rm: cannot remove «data/santiago-comunas.json»: No such file or directory
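The same conversion can also be driven from Python, which avoids shell quoting problems with the -where clause and makes the argument order explicit (ogr2ogr takes its options first, then the destination, then the source). This is a minimal sketch assuming ogr2ogr is on the PATH and Python 3.5+ for subprocess.run; ogr2ogr_command and convert are hypothetical helper names.

```python
import os
import subprocess


def ogr2ogr_command(src, dst, provinces, bbox, t_srs='EPSG:4326'):
    """Build the argument list for the ogr2ogr call in cell 5 (no shell involved)."""
    quoted = ', '.join("'{}'".format(p) for p in provinces)
    xmin, ymin, xmax, ymax = bbox
    return ['ogr2ogr',
            '-where', 'NOM_PROV IN ({})'.format(quoted),
            '-f', 'GeoJSON',
            '-clipdst', str(xmin), str(ymin), str(xmax), str(ymax),
            '-t_srs', t_srs,
            dst, src]  # destination first, then source


def convert(src, dst, provinces, bbox):
    """Remove any previous output (as the notebook does with rm), then run ogr2ogr."""
    if os.path.exists(dst):
        os.remove(dst)
    subprocess.run(ogr2ogr_command(src, dst, provinces, bbox), check=True)


# Usage:
# convert('data/division_comunal.shp', 'data/santiago-comunas.json',
#         ['Santiago', 'Maipo', 'Cordillera'],
#         (-70.828155, -33.635036, -70.452573, -33.302953))
```

Checking for the file before removing it also avoids the harmless rm error on the first run.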

From GeoJSON to TopoJSON



In [6]:

    
!topojson -p --id-property NOM_COM -s 0 -o data/topojson-santiago-comunas.json data/santiago-comunas.json









    



bounds: -70.828155 -33.635036 -70.452573 -33.302953 (spherical)
pre-quantization: 0.0418m (3.76e-7°) 0.0369m (3.32e-7°)
topology: 253 arcs, 5154 points
post-quantization: 4.18m (0.0000376°) 3.69m (0.0000332°)
prune: retained 253 / 253 arcs (100%)
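The post-quantization line can be checked by hand. Assuming topojson's default quantization of 10,000 steps per axis (we did not pass -q), each step is the bounding-box extent divided by 9,999, and the meter figure converts degrees along a great circle of a spherical Earth with a radius of 6,371 km (both are assumptions, not taken from topojson's documentation). A worked example:

```python
import math

EARTH_RADIUS_M = 6371000  # mean radius of a spherical Earth (assumption)


def quantization_step(bounds, q=10000):
    """Grid spacing when each axis of `bounds` is quantized into q integer steps.

    Returns ((dx_deg, dy_deg), (dx_m, dy_m)); meters are measured along a great
    circle, i.e. without shrinking longitude by cos(latitude)."""
    x0, y0, x1, y1 = bounds
    dx = (x1 - x0) / (q - 1)
    dy = (y1 - y0) / (q - 1)
    meters_per_degree = math.pi / 180 * EARTH_RADIUS_M
    return (dx, dy), (dx * meters_per_degree, dy * meters_per_degree)


bounds = (-70.828155, -33.635036, -70.452573, -33.302953)  # from the log above
(dx, dy), (mx, my) = quantization_step(bounds)
print('{:.2f}m ({:.7f}°) {:.2f}m ({:.7f}°)'.format(mx, dx, my, dy))
# prints: 4.18m (0.0000376°) 3.69m (0.0000332°)
```

This simple formula ignores that a degree of longitude shrinks with latitude (by roughly cos 33.5° ≈ 0.83 in Santiago), yet it reproduces the logged figures, which suggests the real east-west spacing on the ground is somewhat finer than 4.18 m.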



In [7]:

    
import json
import unicodedata

def strip_accents(s):
    # NFD splits accented letters into base letter + combining mark (category Mn); drop the marks
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

with open('data/topojson-santiago-comunas.json', 'r') as f:
    stgo = json.load(f)
    
for g in stgo['objects']['santiago-comunas']['geometries']:
    g['id'] = strip_accents(g['id'].upper())
    g['properties']['id'] = g['id']



In [8]:

    
stgo['objects']['santiago-comunas']['geometries'][7]









    Out[8]:





{u'arcs': [[61, 62, 63, -15, 64, 65, 66, 67, 68, 69, 70, 71]],
 u'id': u'SAN JOAQUIN',
 u'properties': {u'CIR_SENA': 8,
  u'COD_COMUNA': 1312,
  u'DIS_ELEC': 25,
  u'NOM_COM': u'San Joaqu\xedn',
  u'NOM_PROV': u'Santiago',
  u'NOM_REG': u'Regi\xf3n Metropolitana de Santiago',
  u'SHAPE_Area': 9876876.69845,
  u'SHAPE_LENG': 13987.3267808,
  u'SHAPE_Le_1': 13986.8273946,
  'id': u'SAN JOAQUIN'},
 u'type': u'Polygon'}
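In the geometry above, arcs lists indices into the topology's shared arcs array. Per the TopoJSON specification, a negative index such as -15 refers to arc ~(-15) = 14 traversed in reverse, and when a topology carries a transform, each arc is stored as quantized, delta-encoded integers. The sketch below decodes one ring into (longitude, latitude) pairs; it is a minimal illustration of the format, not matta's or topojson's own code, and decode_arc and ring_coordinates are hypothetical names.

```python
def decode_arc(arc, transform=None):
    """Turn one TopoJSON arc into absolute positions.

    With a transform, positions are quantized and delta-encoded: accumulate the
    deltas, then apply position = quantized * scale + translate."""
    if transform is None:
        return [tuple(p) for p in arc]
    (sx, sy), (tx, ty) = transform['scale'], transform['translate']
    x = y = 0
    points = []
    for dx, dy in arc:
        x += dx
        y += dy
        points.append((x * sx + tx, y * sy + ty))
    return points


def ring_coordinates(topology, ring):
    """Concatenate the arcs referenced by one ring, e.g. [61, 62, 63, -15, ...]."""
    coords = []
    for index in ring:
        arc = decode_arc(topology['arcs'][~index if index < 0 else index],
                         topology.get('transform'))
        if index < 0:
            arc.reverse()  # negative index: arc ~index, traversed backwards
        if coords:
            arc = arc[1:]  # consecutive arcs share an endpoint; keep it once
        coords.extend(arc)
    return coords


# Usage: outline of the geometry shown in Out[8]
# ring_coordinates(stgo, stgo['objects']['santiago-comunas']['geometries'][7]['arcs'][0])
```

Sharing arcs this way is what lets neighboring communes reuse a single copy of their common border.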

Display TopoJSON using matta



In [9]:

    
from matta import topojson

topojson(geometry=stgo)

Choropleth and Symbol Map



In [10]:

    
import requests
wikipage = requests.get('https://es.wikipedia.org/wiki/Anexo:Comunas_de_Santiago_de_Chile')
wikipage









    Out[10]:





<Response [200]>



In [11]:

    
%load_ext autoreload
%autoreload 2



In [12]:

    
import pandas as pd
df = pd.read_html(wikipage.text, attrs={'class': 'sortable'}, header=0)[0]
df.head()









    Out[12]:






  
    
      
             Comuna   Ubicación?  Población?  Viviendas?  Densidad poblacional?  Crecimiento demográfico?         IDH?  Pobreza?
0         Cerrillos  Surponiente      71.906      19.811                4.32908                       -10   0,743 (54)        83
1       Cerro Navia  Norponiente     148.312      35.277               13.48291                       -48  0,683 (165)       175
2          Conchalí        Norte     133.256      32.609               12.07029                      -129  0,707 (118)        80
3         El Bosque          Sur     175.594      42.808               12.27072                        16  0,711 (106)       158
4  Estación Central  Surponiente     130.394      32.357                9.03631                       -75   0,735 (60)        73

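The data is not clean: the column names carry a trailing "?", and the IDH cells hold strings such as 0,743 (54), with a decimal comma followed by a number in parentheses. Fortunately, we just want the IDH column (Índice de Desarrollo Humano, the Human Development Index), which is easy to convert to a meaningful float. For reference, the other columns are location (Ubicación), population (Población), dwellings (Viviendas), population density (Densidad poblacional), population growth (Crecimiento demográfico) and poverty (Pobreza).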
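The conversion in the next cell assumes every IDH cell starts with a number. A slightly more defensive parser, sketched below, extracts only the leading number and returns None for cells that do not match, so one malformed row does not abort the whole list comprehension. The helper name parse_idh and the None convention are assumptions, not part of matta or pandas.

```python
import re

# Leading number with an optional decimal comma, e.g. "0,743" in "0,743 (54)"
_IDH_PATTERN = re.compile(r'\s*(\d+(?:,\d+)?)')


def parse_idh(cell):
    """Hypothetical helper: '0,743 (54)' -> 0.743; cells without a leading number -> None."""
    match = _IDH_PATTERN.match(str(cell))  # str() also copes with NaN floats
    if match is None:
        return None
    return float(match.group(1).replace(',', '.'))


print([parse_idh(c) for c in ['0,743 (54)', '0,683 (165)', 'n/d']])
# prints: [0.743, 0.683, None]
```

With pandas, df['IDH'] = df['IDH?'].map(parse_idh) followed by df['IDH'].isnull().sum() would report how many rows failed to parse.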



In [13]:

    
df['Comuna'] = [strip_accents(c).replace('?', '').upper() for c in df['Comuna']]
df['IDH'] = [float(c.split()[0].replace(',', '.')) for c in df['IDH?']]
del df['IDH?']
df.head()









    Out[13]:






  
    
      
             Comuna   Ubicación?  Población?  Viviendas?  Densidad poblacional?  Crecimiento demográfico?  Pobreza?    IDH
0         CERRILLOS  Surponiente      71.906      19.811                4.32908                       -10        83  0.743
1       CERRO NAVIA  Norponiente     148.312      35.277               13.48291                       -48       175  0.683
2          CONCHALI        Norte     133.256      32.609               12.07029                      -129        80  0.707
3         EL BOSQUE          Sur     175.594      42.808               12.27072                        16       158  0.711
4  ESTACION CENTRAL  Surponiente     130.394      32.357                9.03631                       -75        73  0.735
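Both sides of the join are now normalized the same way (uppercase, no accents), but a commune can still be missing on one side, for example when the -clipdst bounding box leaves it out of the map or when Wikipedia spells it differently. Before plotting, a quick set comparison shows which names will not be matched. This is a minimal sketch; normalize_name and match_report are hypothetical helpers, with normalize_name mirroring the strip_accents and upper() steps used above.

```python
import unicodedata


def normalize_name(name):
    """Uppercase, drop combining accents (NFD + category Mn) and stray '?' characters."""
    decomposed = unicodedata.normalize('NFD', name.upper())
    stripped = ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')
    return stripped.replace('?', '').strip()


def match_report(geometry_ids, table_names):
    """Hypothetical check: which names appear on only one side of the map/table join."""
    ids = {normalize_name(i) for i in geometry_ids}
    names = {normalize_name(n) for n in table_names}
    return {'only_in_map': sorted(ids - names),
            'only_in_table': sorted(names - ids)}


# Usage with the notebook's objects (assumed to be in scope):
# ids = [g['id'] for g in stgo['objects']['santiago-comunas']['geometries']]
# match_report(ids, df['Comuna'])
```

Names listed under only_in_map will be drawn without a color; names under only_in_table have data but no shape.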



In [14]:

    
df.IDH.describe()









    Out[14]:





count    37.000000
mean      0.762865
std       0.076155
min       0.657000
25%       0.709000
50%       0.737000
75%       0.804000
max       0.949000
Name: IDH, dtype: float64

We use seaborn to create a color palette.



In [15]:

    
%matplotlib inline
import seaborn as sns
palette = sns.color_palette("GnBu_d", 5)
sns.palplot(palette)



In [16]:

    
from matta.scales import threshold_scale

scale = threshold_scale(df.IDH, palette, extend_by=0.05)
scale









    Out[16]:





{'domain': [0.65700000000000003,
  0.7543333333333333,
  0.85166666666666668,
  0.94899999999999995],
 'extent': [0.64240000000000008, 0.9635999999999999],
 'range': [u'#385965', u'#3d8099', u'#43a6cc', u'#68bac6', u'#8fcec0']}
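The numbers in Out[16] are consistent with a simple rule: len(palette) - 1 thresholds spaced evenly between the minimum (0.657) and the maximum (0.949) IDH, and an extent widened by extend_by times the data range on each side (0.05 × 0.292 = 0.0146, giving 0.6424 and 0.9636). The sketch below re-derives them under that assumption; it is not matta's implementation, and threshold_scale_sketch and threshold_color are hypothetical names. The lookup follows d3 threshold-scale semantics (bisect right), which we assume is how matta uses these values.

```python
from bisect import bisect_right


def threshold_scale_sketch(values, colors, extend_by=0.05):
    """Hypothetical re-derivation of Out[16]: len(colors) - 1 evenly spaced thresholds
    from min(values) to max(values), plus an extent padded by extend_by * (max - min)."""
    lo, hi = min(values), max(values)
    n = len(colors) - 1  # a threshold scale uses one threshold fewer than colors
    if n < 2:
        raise ValueError('need at least three colors')
    domain = [lo + (hi - lo) * i / (n - 1) for i in range(n - 1)] + [hi]
    pad = extend_by * (hi - lo)
    return {'domain': domain, 'extent': [lo - pad, hi + pad], 'range': list(colors)}


def threshold_color(scale, value):
    """d3-style threshold lookup: value < domain[0] -> range[0],
    domain[0] <= value < domain[1] -> range[1], ..., value >= domain[-1] -> range[-1]."""
    return scale['range'][bisect_right(scale['domain'], value)]


colors = ['#385965', '#3d8099', '#43a6cc', '#68bac6', '#8fcec0']  # from Out[16]
sketch = threshold_scale_sketch([0.657, 0.743, 0.949], colors)
print([threshold_color(sketch, v) for v in (0.657, 0.743, 0.949)])
# prints: ['#3d8099', '#3d8099', '#8fcec0']
```

Under that assumption, the first color only applies to values below the minimum and the last one only to the maximum itself, so most communes fall into the three middle classes.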



In [17]:

    
topojson(geometry=stgo, area_dataframe=df, area_feature_name='Comuna', area_value='IDH', area_color_scale_domain=scale['domain'],
         area_color_scale_range=scale['range'], area_color_scale_extent=scale['extent'], leaflet=False)



In [18]:

    
topojson(geometry=stgo, mark_dataframe=df, mark_feature_name='Comuna', mark_value='Pobreza?', 
         mark_color='indigo', mark_scale=0.5)



In [19]:

    
topojson(geometry=stgo, area_dataframe=df, area_feature_name='Comuna', area_value='IDH', area_color_scale_domain=scale['domain'],
         area_color_scale_range=scale['range'], area_color_scale_extent=scale['extent'],
         mark_dataframe=df, mark_feature_name='Comuna', mark_value='Pobreza?', mark_color='indigo', mark_scale=0.5,
         mark_max_ratio=15, mark_min_ratio=0, mark_opacity=0.5, leaflet=True)

Combining a choropleth and a symbol map does not make much sense in our case, but surely you have a more interesting use case!

You can see an example of scaffolded visualizations using matta.topojson here.
