Examples: GeoPandas Spatial Joins

A spatial join uses binary predicates such as intersects and crosses to combine two GeoDataFrames based on the spatial relationship between their geometries.

A common use case might be a spatial join between a point layer and a polygon layer where you want to retain the point geometries and grab the attributes of the intersecting polygons.


From https://github.com/geopandas/geopandas/blob/master/examples/spatial_joins.ipynb. Re-ran and tweaked by Emilio Mayorga, 12/10/2015, on my geopandas conda environment.


In [1]:
from IPython.core.display import Image 
Image(url="https://dl.dropboxusercontent.com/u/6769420/sjoin_test.png")


Out[1]:

Types of spatial joins

We currently support the following methods of spatial joins. We refer to the left_df and right_df which are the correspond to the two dataframes passed in as args.

Left outer join

In a LEFT OUTER JOIN (how='left'), we keep all rows from the left and duplicate them if necessary to represent multiple hits between the two dataframes. We retain attributes of the right if they intersect and lose right rows that don't intersect. A left outer join implies that we are interested in retaining the geometries of the left.

This is equivalent to the PostGIS query:

SELECT pts.geom, pts.id as ptid, polys.id as polyid  
FROM pts
LEFT OUTER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

                    geom                    | ptid | polyid 
--------------------------------------------+------+--------
 010100000040A9FBF2D88AD03F349CD47D796CE9BF |    4 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     20
 0101000000F0D88AA0E1A4EEBF7052F7E5B115E9BF |    2 |     20
 0101000000818693BA2F8FF7BF4ADD97C75604E9BF |    1 |       
(5 rows)

Right outer join

In a RIGHT OUTER JOIN (how='right'), we keep all rows from the right and duplicate them if necessary to represent multiple hits between the two dataframes. We retain attributes of the left if they intersect and lose left rows that don't intersect. A right outer join implies that we are interested in retaining the geometries of the right.

This is equivalent to the PostGIS query:

SELECT polys.geom, pts.id as ptid, polys.id as polyid  
FROM pts
RIGHT OUTER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

  geom    | ptid | polyid 
----------+------+--------
 01...9BF |    4 |     10
 01...9BF |    3 |     10
 02...7BF |    3 |     20
 02...7BF |    2 |     20
 00...5BF |      |     30
(5 rows)

Inner join

In an INNER JOIN (how='inner'), we keep rows from the right and left only where their binary predicate is True. We duplicate them if necessary to represent multiple hits between the two dataframes. We retain attributes of the right and left only if they intersect and lose all rows that do not. An inner join implies that we are interested in retaining the geometries of the left.

This is equivalent to the PostGIS query:

SELECT pts.geom, pts.id as ptid, polys.id as polyid  
FROM pts
INNER JOIN polys
ON ST_Intersects(pts.geom, polys.geom);

                    geom                    | ptid | polyid 
--------------------------------------------+------+--------
 010100000040A9FBF2D88AD03F349CD47D796CE9BF |    4 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     10
 010100000048EABE3CB622D8BFA8FBF2D88AA0E9BF |    3 |     20
 0101000000F0D88AA0E1A4EEBF7052F7E5B115E9BF |    2 |     20
(4 rows)

Spatial Joins between two GeoDataFrames

Let's take a look at how we'd implement these using GeoPandas. First, load up the NYC test data into GeoDataFrames:

Downloaded sample NYC Boroughs shape file from http://www.nyc.gov/html/dcp/download/bytes/nybb_14aav.zip


In [2]:
%matplotlib inline
import os
from shapely.geometry import Point
from geopandas import GeoDataFrame, read_file
from geopandas.tools import overlay, sjoin

# NYC Boros
#zippath = os.path.abspath('../examples/nybb_14aav.zip')
zippath = os.path.abspath('/home/mayorga/Desktop/MySciPythonApps/nybb_14aav.zip')
polydf = read_file('/nybb_14a_av/nybb.shp', vfs='zip://' + zippath)

# Generate some points
b = [int(x) for x in polydf.total_bounds]
N = 8
pointdf = GeoDataFrame([
    {'geometry' : Point(x, y), 'value1': x + y, 'value2': x - y}
    for x, y in zip(range(b[0], b[2], int((b[2]-b[0])/N)),
                    range(b[1], b[3], int((b[3]-b[1])/N)))])

In [3]:
pointdf


Out[3]:
geometry value1 value2
0 POINT (913175 120121) 1033296 793054
1 POINT (932450 139211) 1071661 793239
2 POINT (951725 158301) 1110026 793424
3 POINT (971000 177391) 1148391 793609
4 POINT (990275 196481) 1186756 793794
5 POINT (1009550 215571) 1225121 793979
6 POINT (1028825 234661) 1263486 794164
7 POINT (1048100 253751) 1301851 794349
8 POINT (1067375 272841) 1340216 794534

In [4]:
pointdf.plot();



In [5]:
polydf


Out[5]:
BoroCode BoroName Shape_Area Shape_Leng geometry
0 5 Staten Island 1.623847e+09 330454.175933 (POLYGON ((970217.0223999023 145643.3322143555...
1 3 Brooklyn 1.937810e+09 741227.337073 (POLYGON ((1021176.479003906 151374.7969970703...
2 4 Queens 3.045079e+09 896875.396449 (POLYGON ((1029606.076599121 156073.8142089844...
3 1 Manhattan 6.364308e+08 358400.912836 (POLYGON ((981219.0557861328 188655.3157958984...
4 2 Bronx 1.186822e+09 464475.145651 (POLYGON ((1012821.805786133 229228.2645874023...

In [6]:
polydf.plot();


Joins


In [7]:
join_left_df = sjoin(pointdf, polydf, how="left")
join_left_df
# Note the NaNs where the point did not intersect a boro


Warning: CRS does not match!
Out[7]:
geometry value1 value2 index_right BoroCode BoroName Shape_Area Shape_Leng
0 POINT (913175 120121) 1033296 793054 NaN NaN NaN NaN NaN
1 POINT (932450 139211) 1071661 793239 0 5 Staten Island 1.623847e+09 330454.175933
2 POINT (951725 158301) 1110026 793424 0 5 Staten Island 1.623847e+09 330454.175933
3 POINT (971000 177391) 1148391 793609 NaN NaN NaN NaN NaN
4 POINT (990275 196481) 1186756 793794 NaN NaN NaN NaN NaN
5 POINT (1009550 215571) 1225121 793979 2 4 Queens 3.045079e+09 896875.396449
6 POINT (1028825 234661) 1263486 794164 4 2 Bronx 1.186822e+09 464475.145651
7 POINT (1048100 253751) 1301851 794349 NaN NaN NaN NaN NaN
8 POINT (1067375 272841) 1340216 794534 NaN NaN NaN NaN NaN

In [8]:
join_left_df.plot();



In [9]:
join_right_df = sjoin(pointdf, polydf, how="right")
join_right_df
# Note Staten Island is repeated


Warning: CRS does not match!
Out[9]:
value1 value2 index_left BoroCode BoroName Shape_Area Shape_Leng geometry
index_right
0 1071661 793239 1 5 Staten Island 1.623847e+09 330454.175933 (POLYGON ((970217.0223999023 145643.3322143555...
0 1110026 793424 2 5 Staten Island 1.623847e+09 330454.175933 (POLYGON ((970217.0223999023 145643.3322143555...
2 1225121 793979 5 4 Queens 3.045079e+09 896875.396449 (POLYGON ((1029606.076599121 156073.8142089844...
4 1263486 794164 6 2 Bronx 1.186822e+09 464475.145651 (POLYGON ((1012821.805786133 229228.2645874023...
1 NaN NaN NaN 3 Brooklyn 1.937810e+09 741227.337073 (POLYGON ((1021176.479003906 151374.7969970703...
3 NaN NaN NaN 1 Manhattan 6.364308e+08 358400.912836 (POLYGON ((981219.0557861328 188655.3157958984...

In [10]:
join_right_df.plot();



In [11]:
join_inner_df = sjoin(pointdf, polydf, how="inner")
join_inner_df
# Note the lack of NaNs; dropped anything that didn't intersect


Warning: CRS does not match!
Out[11]:
geometry value1 value2 index_right BoroCode BoroName Shape_Area Shape_Leng
1 POINT (932450 139211) 1071661 793239 0 5 Staten Island 1.623847e+09 330454.175933
2 POINT (951725 158301) 1110026 793424 0 5 Staten Island 1.623847e+09 330454.175933
5 POINT (1009550 215571) 1225121 793979 2 4 Queens 3.045079e+09 896875.396449
6 POINT (1028825 234661) 1263486 794164 4 2 Bronx 1.186822e+09 464475.145651

In [12]:
join_inner_df.plot();


We're not limited to using the intersection binary predicate. Any of the Shapely geometry methods that return a Boolean can be used by specifying the op kwarg.


In [13]:
sjoin(pointdf, polydf, how="left", op="within")


Warning: CRS does not match!
Out[13]:
geometry value1 value2 index_right BoroCode BoroName Shape_Area Shape_Leng
0 POINT (913175 120121) 1033296 793054 NaN NaN NaN NaN NaN
1 POINT (932450 139211) 1071661 793239 0 5 Staten Island 1.623847e+09 330454.175933
2 POINT (951725 158301) 1110026 793424 0 5 Staten Island 1.623847e+09 330454.175933
3 POINT (971000 177391) 1148391 793609 NaN NaN NaN NaN NaN
4 POINT (990275 196481) 1186756 793794 NaN NaN NaN NaN NaN
5 POINT (1009550 215571) 1225121 793979 2 4 Queens 3.045079e+09 896875.396449
6 POINT (1028825 234661) 1263486 794164 4 2 Bronx 1.186822e+09 464475.145651
7 POINT (1048100 253751) 1301851 794349 NaN NaN NaN NaN NaN
8 POINT (1067375 272841) 1340216 794534 NaN NaN NaN NaN NaN

In [ ]: