In [ ]:
import ogr
import subprocess

import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline

GDAL command line

Example: reprojection

GDAL is a really powerful library for handling GIS data. It provides a number of functionalities to interact with spatial data. As a typical example, take the reprojection of a shapefile to another CRS:


In [ ]:
!ogr2ogr ../scratch/deelbekkens_wgs84 -t_srs "EPSG:4326" ../data/deelbekkens/Deelbekken.shp

What is this combination of commands?

  • ! this is a jupyter notebook-thing, telling it we're running something on the command line instead of in Python
  • ../scratch/deelbekkens_wgs84 the output location of the created file
  • -t_srs "EPSG:4326" the CRS information for to which the data should be projected
  • ../data/deelbekkens/Deelbekken.shp the source file location

The documentation is a bit overwhelming:


In [ ]:
!ogr2ogr --help

but there are great online resources with good examples you can easily copy paste for your own applications...

Example 2: Accessing online webservice data (Web Feature Service - WFS)

A lot of expensive terminology...

Let's illustrate this with an example: The information about municipalities is available as open data on geopunt (coming from informatie Vlaanderen). The publication is provided as a WFS service...

Take home message -> GDAL can handle WFS web services ;-)

Downloading the province boundaries from the WFS service provided by informatie Vlaanderen/Geopunt to a geojson file is as follows:


In [ ]:
!ogr2ogr -f 'Geojson' ../scratch/provinces.geojson WFS:"https://geoservices.informatievlaanderen.be/overdrachtdiensten/VRBG/wfs" Refprv

We can start working with this date...


In [ ]:
provinces = gpd.read_file("../scratch/provinces.geojson")
provinces.plot()

Actually, GDAL can directly query the WFS data:

Let's say I only need the province of Antwerp:


In [ ]:
!ogr2ogr -f 'Geojson' ../scratch/antwerp_prov.geojson WFS:"https://geoservices.informatievlaanderen.be/overdrachtdiensten/VRBG/wfs" Refprv -where "NAAM = 'Antwerpen'"

In [ ]:
antwerp = gpd.read_file("../scratch/antwerp_prov.geojson")
antwerp.plot()

I do know that the Meetplaatsen Oppervlaktewaterkwaliteit are also available as a WFS web service. However, I'm only interested in the locations for fytoplankton:


In [ ]:
!ogr2ogr -f 'Geojson' ../scratch/metingen_fytoplankton.geojson WFS:"https://geoservices.informatievlaanderen.be/overdrachtdiensten/MeetplOppervlwaterkwal/wfs" Mtploppw -where "FYTOPLANKT = '1'"

In [ ]:
import mplleaflet
fyto = gpd.read_file("../scratch/metingen_fytoplankton.geojson")
fyto.head()
fyto.to_crs('+init=epsg:4326').plot(markersize=5)
mplleaflet.display()

In [ ]:
fyto.head()

Actually, the same type of subselection are also possible on shapefiles,...

Extracting a specific DEELBEKKEN from the deelbekken shapefile:


In [ ]:
!ogr2ogr ../scratch/subcat.shp ../data/deelbekkens/Deelbekken.shp -where "DEELBEKKEN = '10-10'"

(If you're wondering how I know how to setup these commands and arguments, check the (draft) introduction_webservices.ipynb in the scratch folder.
I use the python interafce of GDAL/OGR and the package owslib to find out how to setup the arguments.)

GDAL command line, but inside Python...

No problem if this is still unclear... an example application!

Clipping example

The example we will use is to clip raster data using a shapefile. We use a data set from natural earth, which we will unzip to start working on it (off course using Python itself):


In [ ]:
import zipfile
zip_ref = zipfile.ZipFile("../data/NE1_50m_SR.zip", 'r')
zip_ref.extractall("../scratch")
zip_ref.close()

The GDAL function that support the clipping of a raster file, is called gdalwarp. Again, the documentation looks rather overwhelming... Let's start with an example execution:


In [ ]:
!gdalwarp ../scratch/NE1_50M_SR/NE1_50M_SR.tif ../scratch/cliptest.tif -cutline "../scratch/subcat.shp" -crop_to_cutline
  • ! this is a jupyter notebook-thing, telling it we're running something on the command line instead of in Python
  • gdalwarp is the GDAL command to use
  • ../scratch/NE1_50M_SR/NE1_50M_SR.tif the source file location
  • ../scratch/cliptest.tif the output location of the created file
  • -cutline "../scratch/subcat.shp" the shape file to cut the raster with
  • -crop_to_cutline an additional argument to GDAL to make the clipping

In [ ]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img=mpimg.imread('../scratch/cliptest.tif')
plt.imshow(img)

This is off course a dummy example (to keep runtime low), but it illustrates the concept.

the subprocess trick...

Doing the same using pure Python code and from osgeo import gdal is actually not so beneficial, as the command above is reather straighforward... However, the dependency of the command line provides a switch of environment in any data analysis pipeline. I actually do want to have the best of both worlds: Using Python code, but running the command line version of GDAL...

...therefore we need subprocess!


In [ ]:
import subprocess

Doing the same as above, but actually using Python code to run the command with given variables as input:


In [ ]:
inraster = '../scratch/NE1_50M_SR/NE1_50M_SR.tif'
outraster = inraster.replace('.tif', '{}.tif'.format("_out")) # same location, but adding _out to the output 
inshape = "../scratch/subcat.shp"
subprocess.call(['gdalwarp', inraster, outraster, '-cutline', inshape, 
                     '-crop_to_cutline', '-overwrite'])

Remark when GDAL provides a zero as return statement, this is a GOOD sign!


In [ ]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img=mpimg.imread('../scratch/NE1_50M_SR/NE1_50M_SR_out.tif')
plt.imshow(img)

Hence, the result is the same, but calling the command from Python. By writing a Python function for this routine, I do have a reusable functionality in my toolbox that I can load in any other Python script:


In [ ]:
def clip_raster(inraster, outraster, invector):
    """clip a raster image with a vector file
    
    Parameters
    ----------
    inraster : GDAL compatible raster format
    outraster : GDAL compatible raster format
    invector : GDAL compatible vector format
    """
    response = subprocess.call(['gdalwarp', inraster, outraster, '-cutline', 
                                invector, '-crop_to_cutline', '-overwrite'])
    return(response)

In [ ]:
inraster = '../scratch/NE1_50M_SR/NE1_50M_SR.tif'
outraster = inraster.replace('.tif', '{}.tif'.format("_out")) # same location, but adding _out to the output 
inshape = "../scratch/subcat.shp"
clip_raster(inraster, outraster, inshape)

More advanced clipping

Consider the data set of the provinces we called from the WFS server earlier:


In [ ]:
provinces

We can actually use a selection of the provinces data set to execute the clipping:


In [ ]:
inraster = '../scratch/NE1_50M_SR/NE1_50M_SR.tif'
outraster = inraster.replace('.tif', '{}.tif'.format("_OostVlaanderen"))    
invector = "../scratch/provinces.geojson"
subprocess.call(['gdalwarp', inraster, outraster, '-cutline', invector, 
                 '-cwhere', "NAAM='OOST-VLAANDEREN'", 
                 '-crop_to_cutline', 
                 '-overwrite'])

In [ ]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img=mpimg.imread('../scratch/NE1_50M_SR/NE1_50M_SR_OostVlaanderen.tif')
plt.imshow(img)

By having it as a Python call, we can do the same action for each of the individual provinces in the dataset and create for each of the provinces a clipped raster data set:


In [ ]:
import ogr

inraster = '../scratch/NE1_50M_SR/NE1_50M_SR.tif'
invector = "../scratch/provinces.geojson"

# GDAL magic...
ds = ogr.Open(inshape)
lyr = ds.GetLayer(0)

lyr.ResetReading()
ft = lyr.GetNextFeature()

# clipping for each of the features (provincesin this case)
while ft:

    province_name = ft.GetFieldAsString('NAAM')
    print(province_name)

    outraster = inraster.replace('.tif', '_%s.tif' % province_name.replace('-', '_'))    
    subprocess.call(['gdalwarp', inraster, outraster, '-cutline', inshape, 
                     '-crop_to_cutline', '-cwhere', "NAAM='%s'" %province_name])

    ft = lyr.GetNextFeature()

ds = None

In [ ]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
img=mpimg.imread('../scratch/NE1_50M_SR/NE1_50M_SR_West_Vlaanderen.tif') # check also Antwerpen,...
plt.imshow(img)

In [ ]: