Python GeoPandas appetizer

Introduction to GIS scripting
May, 2017

© 2017, Stijn Van Hoey (mailto:stijnvanhoey@gmail.com). Licensed under CC BY 4.0 Creative Commons



In [4]:

    
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In this notebook, we will explore some of the functionalities of GeoPandas as a first introduction to how we can build on top of the Python standard library and the scientific Python ecosystem.

Some new functionalities are required for this showcase (but will be discussed later in more detail):



In [5]:

    
import mplleaflet
import geopandas as gpd

Similar to reading in csv files with Pandas, geoPandas provides reading methods/functions to read GIS-formats from file:



In [6]:

    
deelbekkens = gpd.read_file("../data/deelbekkens/Deelbekken.shp")



In [7]:

    
deelbekkens.head()









    Out[7]:






  
    
      
      BEKNAAM
      BEKNR
      DEELBEKKEN
      LENGTE
      OIDN
      OPPERVL
      STRMGEB
      UIDN
      geometry
    
  
  
    
      0
      Boven-Scheldebekken
      6
      06-02
      35788.31
      41
      3.424316e+07
      Schelde
      142
      POLYGON ((98710.32800000161 162573.7030000016,...
    
    
      1
      Demerbekken
      9
      09-02
      73615.82
      67
      1.412802e+08
      Schelde
      812
      POLYGON ((201200.3610000014 183565.3020000011,...
    
    
      2
      Leiebekken
      5
      05-04
      65791.53
      37
      8.875261e+07
      Schelde
      936
      POLYGON ((74660.39793749899 190219.0009999983,...
    
    
      3
      Boven-Scheldebekken
      6
      06-03
      71432.94
      42
      1.782935e+08
      Schelde
      939
      POLYGON ((101010.7037499994 175254.0949999988,...
    
    
      4
      Boven-Scheldebekken
      6
      06-04
      71988.34
      43
      1.584879e+08
      Schelde
      940
      POLYGON ((105339.2030000016 188480.7000624985,...

REMEMBER:

It is like a DataFrame, but with a Geometry column attached to it!

We can check the geometry information in more detail, to see if we are dealing with POINT, LINE or POLYGON data:



In [8]:

    
deelbekkens.geometry.head()









    Out[8]:





0    POLYGON ((98710.32800000161 162573.7030000016,...
1    POLYGON ((201200.3610000014 183565.3020000011,...
2    POLYGON ((74660.39793749899 190219.0009999983,...
3    POLYGON ((101010.7037499994 175254.0949999988,...
4    POLYGON ((105339.2030000016 188480.7000624985,...
Name: geometry, dtype: object



In [9]:

    
deelbekkens.plot()









    Out[9]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fa81670d9e8>

Selecting the Nete from this shapefile (any BEKNAAM that contains Nete as name), just as we would do in Pandas:



In [10]:

    
nete = deelbekkens[deelbekkens["BEKNAAM"].str.lower().str.contains('nete')].copy()
# Remark: you can off course also use: deelbekkens[deelbekkens["BEKNAAM"] == "Netebekken"]
nete.plot()









    Out[10]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fa8072cea58>

REMEMBER:

The usage of boolean indexing is just the same as we did it with Numpy/Pandas...

Getting the current projection information



In [11]:

    
nete.crs









    Out[11]:





{'ellps': 'intl',
 'lat_0': 90,
 'lat_1': 49.8333339,
 'lat_2': 51.16666723333333,
 'lon_0': 4.367486666666666,
 'no_defs': True,
 'proj': 'lcc',
 'units': 'm',
 'x_0': 150000.01256,
 'y_0': 5400088.4378}

ATTENTION:

As we are dealing with spatial information, the **projection information** is crucial!

Converting to WGS84, which is represented by the EPSG code 4326 enables us easy combination with open street map backgrounds:



In [12]:

    
nete_wgs84 = nete.to_crs({'init': 'epsg:4326'}) # method that converts the projection to the given EPSG code



In [13]:

    
nete_wgs84.crs









    Out[13]:





{'init': 'epsg:4326'}

We want to plot the data on an interactive map. For small datasets, a convenient Package, called mplleaflet exists:



In [14]:

    
fig, ax = plt.subplots()
nete_wgs84.plot(ax=ax)
nete_wgs84.centroid.plot(ax=ax, markersize=10)  # Add the centroids of the individual polygons to the plot
mplleaflet.display()









    Out[14]:

Some derived spatial attributes are directly available to the object, such as the area of each polygon:



In [15]:

    
nete_wgs84.area









    Out[15]:





10     0.015730
12     0.015746
14     0.015385
41     0.011933
69     0.017026
70     0.022380
80     0.011189
89     0.016462
90     0.008961
91     0.012096
92     0.018783
93     0.029109
101    0.020618
dtype: float64

Adding a buffer to our polygons is a method available to the geometry of the geoDataFrame:

Add a buffered version of the geometry as an additional column to the data:



In [16]:

    
nete["buffered_1000"] = nete.geometry.buffer(1000)



In [17]:

    
nete.head()









    Out[17]:






  
    
      
      BEKNAAM
      BEKNR
      DEELBEKKEN
      LENGTE
      OIDN
      OPPERVL
      STRMGEB
      UIDN
      geometry
      buffered_1000
    
  
  
    
      10
      Netebekken
      10
      10-03
      59209.30
      82
      1.227172e+08
      Schelde
      982
      POLYGON ((188940.7047500014 198339.7019375004,...
      POLYGON ((177054.486954743 187150.9925692019, ...
    
    
      12
      Netebekken
      10
      10-09
      64826.31
      88
      1.227165e+08
      Schelde
      1020
      POLYGON ((204226.0362500027 203324.5498124994,...
      POLYGON ((189444.9653970161 198035.0961898574,...
    
    
      14
      Netebekken
      10
      10-07
      66432.43
      86
      1.196548e+08
      Schelde
      1041
      POLYGON ((194785.0951875001 214766.2020624988,...
      POLYGON ((176974.5782636238 208243.674105004, ...
    
    
      41
      Netebekken
      10
      10-08
      49095.42
      87
      9.294065e+07
      Schelde
      1169
      POLYGON ((183516.3622500002 205858.0518124998,...
      POLYGON ((175694.5721767615 199957.1560016522,...
    
    
      69
      Netebekken
      10
      10-01
      58932.83
      80
      1.326798e+08
      Schelde
      1227
      POLYGON ((160260.2029374987 205907.5017499998,...
      POLYGON ((152818.0146732014 196691.2176390541,...



In [18]:

    
# Plot both of the polygons together:
fig, ax = plt.subplots()
nete.plot(ax=ax)
nete.set_geometry("buffered_1000").plot(ax=ax)
ax.set_aspect('equal', 'datalim')

Let's introduce a second vector file, saved as a Geojson file (filename.geojson) with the EU defined 10K grid cells:



In [19]:

    
eu_10k_grids = gpd.read_file("../data/EUgrid10.geojson")
eu_10k_grids.crs # the projection is contained in the geojson file









    Out[19]:





{'init': 'epsg:4326'}



In [20]:

    
eu_10k_grids.plot()









    Out[20]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fa805754a20>



In [21]:

    
eu_10k_grids.head(5)









    Out[21]:






  
    
      
      CellCode
      EofOrigin
      NofOrigin
      geometry
    
  
  
    
      0
      10kmE379N312
      3790000.0
      3120000.0
      POLYGON ((2.429361848365367 50.9437587892485, ...
    
    
      1
      10kmE379N313
      3790000.0
      3130000.0
      POLYGON ((2.414703469265138 51.03309331423895,...
    
    
      2
      10kmE379N314
      3790000.0
      3140000.0
      POLYGON ((2.39997479452166 51.1224234251688, 2...
    
    
      3
      10kmE380N309
      3800000.0
      3090000.0
      POLYGON ((2.613788188448232 50.68487195072306,...
    
    
      4
      10kmE380N310
      3800000.0
      3100000.0
      POLYGON ((2.599603514850633 50.77424004455327,...

We can make selections (cfr. queries) to our data on some kind of condition:



In [22]:

    
eu_10k_grids[eu_10k_grids["CellCode"].str.contains('N311')].plot()
mplleaflet.display()









    Out[22]:

Using a spatial join, extract those 10K gridcells that intersect with the polygon of the Grote Nete study area (we use the original nete variable):



In [23]:

    
nete_10k = gpd.sjoin(eu_10k_grids, nete_wgs84, how="inner", op='intersects')

A plot to control the outcome:



In [24]:

    
fig, ax = plt.subplots()
nete_10k.plot(ax=ax)
nete_wgs84.plot(ax=ax)
mplleaflet.display()









    Out[24]:



In [25]:

    
nete_10k.crs









    Out[25]:





{'init': 'epsg:4326'}

Save the result of our analysis as a shapefile:



In [55]:

    
nete_10k.to_file("../nete_10k_grid.shp")  # But maybe a geojson is easier to manage (as it is one file...)

Nice, but.... how?

	BEKNAAM	BEKNR	DEELBEKKEN	LENGTE	OIDN	OPPERVL	STRMGEB	UIDN	geometry
0	Boven-Scheldebekken	6	06-02	35788.31	41	3.424316e+07	Schelde	142	POLYGON ((98710.32800000161 162573.7030000016,...
1	Demerbekken	9	09-02	73615.82	67	1.412802e+08	Schelde	812	POLYGON ((201200.3610000014 183565.3020000011,...
2	Leiebekken	5	05-04	65791.53	37	8.875261e+07	Schelde	936	POLYGON ((74660.39793749899 190219.0009999983,...
3	Boven-Scheldebekken	6	06-03	71432.94	42	1.782935e+08	Schelde	939	POLYGON ((101010.7037499994 175254.0949999988,...
4	Boven-Scheldebekken	6	06-04	71988.34	43	1.584879e+08	Schelde	940	POLYGON ((105339.2030000016 188480.7000624985,...

	BEKNAAM	BEKNR	DEELBEKKEN	LENGTE	OIDN	OPPERVL	STRMGEB	UIDN	geometry	buffered_1000
10	Netebekken	10	10-03	59209.30	82	1.227172e+08	Schelde	982	POLYGON ((188940.7047500014 198339.7019375004,...	POLYGON ((177054.486954743 187150.9925692019, ...
12	Netebekken	10	10-09	64826.31	88	1.227165e+08	Schelde	1020	POLYGON ((204226.0362500027 203324.5498124994,...	POLYGON ((189444.9653970161 198035.0961898574,...
14	Netebekken	10	10-07	66432.43	86	1.196548e+08	Schelde	1041	POLYGON ((194785.0951875001 214766.2020624988,...	POLYGON ((176974.5782636238 208243.674105004, ...
41	Netebekken	10	10-08	49095.42	87	9.294065e+07	Schelde	1169	POLYGON ((183516.3622500002 205858.0518124998,...	POLYGON ((175694.5721767615 199957.1560016522,...
69	Netebekken	10	10-01	58932.83	80	1.326798e+08	Schelde	1227	POLYGON ((160260.2029374987 205907.5017499998,...	POLYGON ((152818.0146732014 196691.2176390541,...

	CellCode	EofOrigin	NofOrigin	geometry
0	10kmE379N312	3790000.0	3120000.0	POLYGON ((2.429361848365367 50.9437587892485, ...
1	10kmE379N313	3790000.0	3130000.0	POLYGON ((2.414703469265138 51.03309331423895,...
2	10kmE379N314	3790000.0	3140000.0	POLYGON ((2.39997479452166 51.1224234251688, 2...
3	10kmE380N309	3800000.0	3090000.0	POLYGON ((2.613788188448232 50.68487195072306,...
4	10kmE380N310	3800000.0	3100000.0	POLYGON ((2.599603514850633 50.77424004455327,...