In [2]:
import ogr
import subprocess
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
GDAL is a really powerful library for handling GIS data. It provides a number of functionalities to interact with spatial data. As a typical example, take the reprojection of a shapefile to another CRS:
In [3]:
!ogr2ogr ../scratch/deelbekkens_wgs84 -t_srs "EPSG:4326" ../data/deelbekkens/Deelbekken.shp
What is this combination of commands?
!
this is a jupyter notebook-thing, telling it we're running something on the command line instead of in Python../scratch/deelbekkens_wgs84
the output location of the created file-t_srs "EPSG:4326"
the CRS information for to which the data should be projected../data/deelbekkens/Deelbekken.shp
the source file locationThe documentation is a bit overwhelming:
In [4]:
!ogr2ogr --help
but there are great online resources with good examples you can easily copy paste for your own applications...
A lot of expensive terminology...
Let's illustrate this with an example: The information about municipalities is available as open data on geopunt (coming from informatie Vlaanderen). The publication is provided as a WFS service...
Take home message -> GDAL can handle WFS web services ;-)
Downloading the province boundaries from the WFS service provided by informatie Vlaanderen/Geopunt to a geojson file is as follows:
In [5]:
!ogr2ogr -f 'Geojson' ../scratch/provinces.geojson WFS:"https://geoservices.informatievlaanderen.be/overdrachtdiensten/VRBG/wfs" Refprv
We can start working with this date...
In [6]:
provinces = gpd.read_file("../scratch/provinces.geojson")
provinces.plot()
Out[6]:
Actually, GDAL can directly query the WFS data:
Let's say I only need the province of Antwerp
:
In [7]:
!ogr2ogr -f 'Geojson' ../scratch/antwerp_prov.geojson WFS:"https://geoservices.informatievlaanderen.be/overdrachtdiensten/VRBG/wfs" Refprv -where "NAAM = 'Antwerpen'"
In [8]:
antwerp = gpd.read_file("../scratch/antwerp_prov.geojson")
antwerp.plot()
Out[8]:
I do know that the Meetplaatsen Oppervlaktewaterkwaliteit are also available as a WFS web service. However, I'm only interested in the locations for fytoplankton:
In [9]:
!ogr2ogr -f 'Geojson' ../scratch/metingen_fytoplankton.geojson WFS:"https://geoservices.informatievlaanderen.be/overdrachtdiensten/MeetplOppervlwaterkwal/wfs" Mtploppw -where "FYTOPLANKT = '1'"
In [10]:
import mplleaflet
fyto = gpd.read_file("../scratch/metingen_fytoplankton.geojson")
fyto.head()
fyto.to_crs('+init=epsg:4326').plot(markersize=5)
mplleaflet.display()
Out[10]:
In [11]:
fyto.head()
Out[11]:
Actually, the same type of subselection are also possible on shapefiles,...
Extracting a specific DEELBEKKEN from the deelbekken shapefile:
In [12]:
!ogr2ogr ../scratch/subcat.shp ../data/deelbekkens/Deelbekken.shp -where "DEELBEKKEN = '10-10'"
(If you're wondering how I know how to setup these commands and arguments, check the (draft) introduction_webservices.ipynb
in the scratch
folder.
I use the python interafce of GDAL/OGR and the package owslib
to find out how to setup the arguments.)
No problem if this is still unclear... an example application!
The example we will use is to clip raster data using a shapefile. We use a data set from natural earth, which we will unzip to start working on it (off course using Python itself):
In [13]:
import zipfile
zip_ref = zipfile.ZipFile("../data/NE1_50m_SR.zip", 'r')
zip_ref.extractall("../scratch")
zip_ref.close()
The GDAL function that support the clipping of a raster file, is called gdalwarp
. Again, the documentation looks rather overwhelming... Let's start with an example execution:
In [15]:
!gdalwarp ../scratch/NE1_50M_SR/NE1_50M_SR.tif ../scratch/cliptest.tif -cutline "../scratch/subcat.shp" -crop_to_cutline -overwrite
!
this is a jupyter notebook-thing, telling it we're running something on the command line instead of in Pythongdalwarp
is the GDAL command to use../scratch/NE1_50M_SR/NE1_50M_SR.tif
the source file location../scratch/cliptest.tif
the output location of the created file-cutline "../scratch/subcat.shp"
the shape file to cut the raster with-crop_to_cutline
an additional argument to GDAL to make the clipping-overwrite
overwrite eventual existing file with the same name
In [16]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img=mpimg.imread('../scratch/cliptest.tif')
plt.imshow(img)
Out[16]:
This is off course a dummy example (to keep runtime low), but it illustrates the concept.
Doing the same using pure Python code and from osgeo import gdal
is actually not so beneficial, as the command above is reather straighforward... However, the dependency of the command line provides a switch of environment in any data analysis pipeline. I actually do want to have the best of both worlds:
Using Python code, but running the command line version of GDAL...
...therefore we need subprocess
!
In [17]:
import subprocess
Doing the same as above, but actually using Python code to run the command with given variables as input:
In [18]:
inraster = '../scratch/NE1_50M_SR/NE1_50M_SR.tif'
outraster = inraster.replace('.tif', '{}.tif'.format("_out")) # same location, but adding _out to the output
inshape = "../scratch/subcat.shp"
subprocess.call(['gdalwarp', inraster, outraster, '-cutline', inshape,
'-crop_to_cutline', '-overwrite'])
Out[18]:
Remark when GDAL provides a zero as return statement, this is a GOOD sign!
In [19]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img=mpimg.imread('../scratch/NE1_50M_SR/NE1_50M_SR_out.tif')
plt.imshow(img)
Out[19]:
Hence, the result is the same, but calling the command from Python. By writing a Python function for this routine, I do have a reusable functionality in my toolbox that I can load in any other Python script:
In [20]:
def clip_raster(inraster, outraster, invector):
"""clip a raster image with a vector file
Parameters
----------
inraster : GDAL compatible raster format
outraster : GDAL compatible raster format
invector : GDAL compatible vector format
"""
response = subprocess.call(['gdalwarp', inraster, outraster, '-cutline',
invector, '-crop_to_cutline', '-overwrite'])
return(response)
In [21]:
inraster = '../scratch/NE1_50M_SR/NE1_50M_SR.tif'
outraster = inraster.replace('.tif', '{}.tif'.format("_out")) # same location, but adding _out to the output
inshape = "../scratch/subcat.shp"
clip_raster(inraster, outraster, inshape)
Out[21]:
Consider the data set of the provinces we called from the WFS server earlier:
In [22]:
provinces
Out[22]:
We can actually use a selection of the provinces data set to execute the clipping:
In [23]:
inraster = '../scratch/NE1_50M_SR/NE1_50M_SR.tif'
outraster = inraster.replace('.tif', '{}.tif'.format("_OostVlaanderen"))
invector = "../scratch/provinces.geojson"
subprocess.call(['gdalwarp', inraster, outraster, '-cutline', invector,
'-cwhere', "NAAM='OOST-VLAANDEREN'",
'-crop_to_cutline',
'-overwrite'])
Out[23]:
In [24]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img=mpimg.imread('../scratch/NE1_50M_SR/NE1_50M_SR_OostVlaanderen.tif')
plt.imshow(img)
Out[24]:
By having it as a Python call, we can do the same action for each of the individual provinces in the dataset and create for each of the provinces a clipped raster data set:
In [25]:
import ogr
inraster = '../scratch/NE1_50M_SR/NE1_50M_SR.tif'
invector = "../scratch/provinces.geojson"
# GDAL magic...
ds = ogr.Open(invector)
lyr = ds.GetLayer(0)
lyr.ResetReading()
ft = lyr.GetNextFeature()
# clipping for each of the features (provincesin this case)
while ft:
province_name = ft.GetFieldAsString('NAAM')
print(province_name)
outraster = inraster.replace('.tif', '_%s.tif' % province_name.replace('-', '_'))
subprocess.call(['gdalwarp', inraster, outraster, '-cutline', invector,
'-crop_to_cutline', '-cwhere', "NAAM='%s'" %province_name])
ft = lyr.GetNextFeature()
ds = None
In [26]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img=mpimg.imread('../scratch/NE1_50M_SR/NE1_50M_SR_West_Vlaanderen.tif') # check also Antwerpen,...
plt.imshow(img)
Out[26]:
In [ ]: