rasterstats has mainly operated on file-based vector and raster data sources. As of version 0.4, we've added the ability to work with in-memory representations of vector and raster data:
GeoDataFrame for vector datandarrays which can be used as a raster data source (with some additional metadata for georeferencing)In this example you'll learn how to load vectors into GeoDataFrames and rasters into ndarrays and inspect the results. Then we'll run zonal statistics on the two datasets, bring it back into a GeoDataFrames and write the results to a GeoJSON file.
In [9]:
import geopandas as gpd
geodf = gpd.GeoDataFrame.from_file('../tests/data/polygons.shp')
geodf.plot()
Out[9]:
... or view it as a table
In [10]:
geodf
Out[10]:
In [12]:
import rasterio
with rasterio.open("../tests/data/slope.tif") as src:
transform = src.meta['transform']
array = src.read_band(1)
print (transform)
array
Out[12]:
Because it's an ndarray, we have the entire numpy and scipy world available to us to work on the data.
In [13]:
array_chop = array[5:-5,5:-5] # chop off the outer 5 pixels
With a few lines of matplotlib glue, we can plot a map of the data.
In [14]:
import matplotlib.pyplot as plt
plt.imshow(array_chop, interpolation='nearest')
plt.colorbar()
plt.title('Slope Map')
plt.show()
In [11]:
from rasterstats import zonal_stats
zonal_stats(geodf, array)
What happened? Because the ndarrays are not geo-aware (they are just in array coordinates) we need to specify the transform tuple to specify exactly how the cells align with our spatial reference system.
The tuple contains 6 elements which correspond to:
You can construct this by hand if necessary but it's handy to use the coordinates from the source dataset, provided the dimensions of the array have not changed.
In [15]:
transform
Out[15]:
With the array and the transform information, we can run zonal_stats without complaint.
In [19]:
stats = zonal_stats(geodf, array, transform=transform)
stats
Out[19]:
The "list of dicts" representation of zonal statistics is helpful but we will often want to join these attributes back to the original GeoDataFrame in order to continue working with them. For example, we may want to do some additional calculations and write it out to a GeoJSON file.
Luckily (geo)pandas has some very intuitive constructors and methods that make this simple:
In [27]:
import pandas as pd
new_geodf = geodf.join(pd.DataFrame(stats))
new_geodf
Out[27]:
And to write it out to a vector file like GeoJSON
This currently does not work because the geodf.join method returns as standard DataFrame, not a GeoDataFrame! This bug should be fixed shortly...
In [33]:
## Hack around it for now
new_geodf.__class__ = gpd.GeoDataFrame
new_geodf.crs = {}
new_geodf.set_geometry('geometry')
## /hack
! rm /tmp/polygons_with_slope.geojson
new_geodf.to_file('/tmp/polygons_with_slope.geojson', driver="GeoJSON")
# Confirm that everything works according to OGR
! ogrinfo -ro -al /tmp/polygons_with_slope.geojson