In [ ]:
import fiona
import geopandas as gpd
from osgeo import gdal
from osgeo import ogr

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Fiona: reading vector data

Fiona is a minimalist python package for reading (and writing) vector data in python. Fiona provides python objects (e.g. a dictionary for each record) to geospatial data in various formats.

Reading in vector files with Fiona is - good practice - done using the with environment:


In [ ]:
with fiona.open('../data/deelbekkens/Deelbekken.shp') as deelbekkens:
    feature = next(iter(deelbekkens)) # Just one checking the first
    print("Bekken: ", feature['properties']['BEKNAAM'])
    print("Vectortype: ", feature['geometry']['type'])
    print(feature['geometry']['coordinates'][0][0])

This is actually equivalent to:


In [ ]:
deelbekkens = fiona.open('../data/deelbekkens/Deelbekken.shp') 
feature = next(iter(deelbekkens))
print("Bekken: ", feature['properties']['BEKNAAM'])
print("Stroomgebied: ", feature['properties']['STRMGEB'])
print("Vectortype: ", feature['geometry']['type'])
deelbekkens.close()

But people tend to forget to use the close(!)

REMEMBER:
  • Instead of `open`-ing a file, do something and `close` again, use `with` (general Python advice)

The projection information is interpreted by Fiona as well:


In [ ]:
with fiona.open('../data/deelbekkens/Deelbekken.shp') as deelbekkens:
    #print(deelbekkens.crs_wkt)
    print(deelbekkens.crs)
    print(deelbekkens.crs_wkt)

Actually, **GeoPandas relies on Fiona** and translates this information into a tabular information, compatible to Pandas:


In [ ]:
df_deelbekkens = gpd.read_file('../data/deelbekkens/Deelbekken.shp')
df_deelbekkens.head()

The capabilities to read file with Fiona is dependent on the available drivers on your computer. You can get an overview by asking fiona for the supported drivers:


In [ ]:
fiona.supported_drivers

It provides also information about what you can do with the different types of data:

  • r: reading
  • a: appending (adding elements)
  • w: writing
EXERCISE:
  • Open the `EUgrid10.geojson` file in the data folder using the Fiona package and check the `type` of `geometry` and the `cellcode` in the `properties` of the first feature in the dataset

In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors7.py
EXERCISE:
  • Read again the EUgrid10.geojson file in the data folder, but use the GeoPandas library and show only the first feature

In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors8.py

So, what about GDAL?


In [ ]:
driver = ogr.GetDriverByName('ESRI Shapefile')
deelbekken_gdal = driver.Open('../data/deelbekkens/Deelbekken.shp')
deelbekken_gdal.GetLayerCount()

In [ ]:
deelbekken_layer = deelbekken_gdal.GetLayer()
print("The number of features of the deelbekken shapefile is: ", deelbekken_layer.GetFeatureCount())

That's why they call it not pythonic...

However, GDAL is powerfull for other activities, we'll come back to that later. Moreover, it supports mostly more file formats as fiona (GeoPandas) provide:


In [ ]:
!ogr2ogr --formats

For example, the interaction with an Esri File GeoDataBase, is provided in this example...

REMEMBER:
  • try `GeoPandas` first!
  • If that does not work, check if your available GDAL drivers support the format.

Shapely: working with GEOMetries

Shapely is a Python library for geometric operations using the **GEOS** library.

Shapely can perform:

  • geometry validation
  • geometry creation (e.g. collections)
  • geometry operations

In [ ]:
from shapely.geometry import LineString, Polygon

In [ ]:
line = LineString([(0, 0), (1, 1), (0, 2), (2, 2), (3, 1), (1, 0)])
line

In [ ]:
dilated = line.buffer(0.5, cap_style=3)
dilated

Getting back the coorindates as x, y arrays:


In [ ]:
polygon = Polygon(LineString([(0.2, 0.2), (0.2, 1), (1, 1), (1, 0.2)]))
polygon

In [ ]:
line.coords.xy

Binary predicates to check if the line is contained by the polygon


In [ ]:
line.within(polygon)
EXERCISE:
  • Check if the polygon intersects with the line object

In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors29.py

Actually, **GeoPandas relies on shapely** and uses these geometric properties and operations


In [ ]:
deelbekkens = gpd.read_file("../data/deelbekkens/Deelbekken.shp")

In [ ]:
demer = deelbekkens[deelbekkens["BEKNAAM"] == "Demerbekken"]

In [ ]:
demer.plot()
demer.convex_hull.plot()

pyproj: handling projections

Pyproj handles the Spatial Reference Systems (SRS) transformations. Actually, the only items to remember are:

  • pyproj.Proj: define a SRS
  • pyproj.transform: execute a transformation between two defined SRS

In [ ]:
from pyproj import Proj, transform

Define SRS

The easiest way is mostly just using the EPSG code:

Lambert 72:


In [ ]:
srs_lambert = Proj(init='epsg:31370')

In [ ]:
srs_lambert.srs

WGS84:


In [ ]:
srs_wgs84 = Proj(init='epsg:4326')

In [ ]:
srs_wgs84.is_latlong()

Other methods to define the SRS are also available:


In [ ]:
utm32 = Proj(proj="utm", zone="32")

Transform coordinates

The transform function takes the two defined SRS objects, together with an X/Y coordinate combination


In [ ]:
transform(srs_lambert, srs_wgs84, 98710.32800000161, 162573.7030000016)

Actually, **GeoPandas relies on pyproj** to perform SRS transformations using the EPSG code (but calls it `coordinate reference system (CRS)`...)


In [ ]:
deelbekkens = gpd.read_file("../data/deelbekkens/Deelbekken.shp")
deelbekkens.crs

In [ ]:
deelbekkens.to_crs(epsg="4326").head()

Actually, the function is useful in general to transform the X/Y combinations in any coordinate table. Consider the following example of species data with the coordinates in WGS84.


In [ ]:
inv_data = pd.read_excel("../data/invasive_extract.xlsx")

In [ ]:
inv_data.head()

Conversion to Lambert 72 is supported by the pyproj Package as follows. As I want to support future usage of this functionality as well, I'll write my own custom function:


In [ ]:
def transform_dfrow_projection(row, x_name, y_name, srs_from, srs_to):
    """
    Converts the x and y coordinates of a given DataFrame row into a Series of the
    longitude and latitude.
    """
    from pyproj import transform
    return pd.Series(transform(srs_from, srs_to, row[x_name], row[y_name]))

For a single row, this looks like:


In [ ]:
transform_dfrow_projection(inv_data.loc[0], "decimalLongitude", "decimalLatitude", 
                           srs_wgs84, srs_lambert)

Applying this to the entire table:


In [ ]:
inv_data[["x", "y"]] = inv_data.apply(transform_dfrow_projection, axis=1, 
                                      args=("decimalLongitude", "decimalLatitude", 
                                            srs_wgs84, srs_lambert))

This feels too complicated?!?Understandable ,but let's just once take the effort to compile this into a reusable function...


In [ ]:
def add_trf_coordinates(filename, x_name="decimalLongitude", y_name="decimalLatitude", 
                        srs_from='epsg:4326', srs_to='epsg:31370', 
                        new_x="X", new_y="Y"):
    """Add new X/Y columns with transformed SRS to a given Excel dataset with X/Y columns
    
    Parameters
    ----------
    filename : str
        Path to excel file to read dat from
    x_name : str
        Column containing the x-coordinates (default decimalLongitude)
    y_name : str
        Column containing the y-coordinates (default decimalLatitude)  
    srs_from: str
        epsg:n code for the current SRS of the coordinates
    srs_to: str
        epsg:n code for the SRS of the added coordinates
    new_x: str
        name of the added x-coordinate column
    new_y: str
        name of the added y-coordinate column    
    
    returns
    -------
    df with 2 additional columns
    """
    df = pd.read_excel(filename)
    srs_from = Proj(init=srs_from)
    srs_to = Proj(init=srs_to)
    df[[new_x, new_y]] = df.apply(transform_dfrow_projection, axis=1, 
                                  args=(x_name, y_name, srs_from, srs_to))
    return df

In [ ]:
df_enriched = add_trf_coordinates("../data/invasive_extract.xlsx")
df_enriched.head()

Remark that GDAL also has a utility gdalsrsinfo to get/check the SRS information of GIS file:


In [ ]:
! gdalsrsinfo -o proj4 ../data/EUgrid10.geojson  #try also wiht -o all

Or to execute the transformation of a dataset together with the transformation of the data type:


In [ ]:
!ogr2ogr -t_srs "EPSG:4326" -f "GeoJSON" "../scratch/deelbekken.geojson" "../data/deelbekkens/Deelbekken.shp"

(More on this in the next notebook)

Geopandas: Pandas + geometries

Actually, we got a whole sequence of data types (objects) to end up with GeoDataFrames:

  • lists/dictionaries -> standard Python library
  • Numpy arrays -> table of elements (usually numbers), all of the same type, indexed by positive integers
  • DataFrames -> Labeled Numpy arrays
  • GeoDataFrames -> DataFrames with geometry and SRS information attached to it

Our df_enriched is a Pandas DataFrame, so we can convert to a GeoDataFrame by converting the coordinates to a set of POINT geometries:


In [ ]:
from shapely.geometry import Point

geometry = [Point(xy) for xy in zip(df_enriched["X"], df_enriched["Y"])]
crs = {'init': 'epsg:31370'}
geo_df = gpd.GeoDataFrame(df_enriched, crs=deelbekkens.crs, geometry=geometry)

In [ ]:
import mplleaflet

In [ ]:
fig, ax = plt.subplots()
geo_df.to_crs(epsg="4326").plot(markersize=10, ax=ax) # on the fly conversion to WGS84
mplleaflet.display()
EXERCISE:
  • Add the UIDN and BEKNAAM names from the `deelbekkens` data to the occurrence data `df_enriched` with a **spatial JOIN**

In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors208.py
EXERCISE:
  • Plot the `geo_df` together with a buffered version of 500m of each of the individual points, use mplleaflet to make the plot interactive

In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors215.py

For those interested, also check the following example from Joris Vandenbossche, explaining how to derive the distance to the nearest green area for all adresses in the centre of Antwerp: