In [ ]:
import fiona
import geopandas as gpd
from osgeo import gdal
from osgeo import ogr
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Fiona is a minimalist python package for reading (and writing) vector data in python. Fiona provides python objects (e.g. a dictionary for each record) to geospatial data in various formats.
Reading in vector files with Fiona is - good practice - done using the with environment:
In [ ]:
with fiona.open('../data/deelbekkens/Deelbekken.shp') as deelbekkens:
feature = next(iter(deelbekkens)) # Just one checking the first
print("Bekken: ", feature['properties']['BEKNAAM'])
print("Vectortype: ", feature['geometry']['type'])
print(feature['geometry']['coordinates'][0][0])
This is actually equivalent to:
In [ ]:
deelbekkens = fiona.open('../data/deelbekkens/Deelbekken.shp')
feature = next(iter(deelbekkens))
print("Bekken: ", feature['properties']['BEKNAAM'])
print("Stroomgebied: ", feature['properties']['STRMGEB'])
print("Vectortype: ", feature['geometry']['type'])
deelbekkens.close()
But people tend to forget to use the close(!)
The projection information is interpreted by Fiona as well:
In [ ]:
with fiona.open('../data/deelbekkens/Deelbekken.shp') as deelbekkens:
#print(deelbekkens.crs_wkt)
print(deelbekkens.crs)
print(deelbekkens.crs_wkt)
Actually, **GeoPandas relies on Fiona** and translates this information into a tabular information, compatible to Pandas:
In [ ]:
df_deelbekkens = gpd.read_file('../data/deelbekkens/Deelbekken.shp')
df_deelbekkens.head()
The capabilities to read file with Fiona is dependent on the available drivers on your computer. You can get an overview by asking fiona for the supported drivers:
In [ ]:
fiona.supported_drivers
It provides also information about what you can do with the different types of data:
r: readinga: appending (adding elements)w: writing
In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors7.py
In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors8.py
So, what about GDAL?
In [ ]:
driver = ogr.GetDriverByName('ESRI Shapefile')
deelbekken_gdal = driver.Open('../data/deelbekkens/Deelbekken.shp')
deelbekken_gdal.GetLayerCount()
In [ ]:
deelbekken_layer = deelbekken_gdal.GetLayer()
print("The number of features of the deelbekken shapefile is: ", deelbekken_layer.GetFeatureCount())
That's why they call it not pythonic...
However, GDAL is powerfull for other activities, we'll come back to that later. Moreover, it supports mostly more file formats as fiona (GeoPandas) provide:
In [ ]:
!ogr2ogr --formats
For example, the interaction with an Esri File GeoDataBase, is provided in this example...
Shapely is a Python library for geometric operations using the **GEOS** library.
Shapely can perform:
In [ ]:
from shapely.geometry import LineString, Polygon
In [ ]:
line = LineString([(0, 0), (1, 1), (0, 2), (2, 2), (3, 1), (1, 0)])
line
In [ ]:
dilated = line.buffer(0.5, cap_style=3)
dilated
Getting back the coorindates as x, y arrays:
In [ ]:
polygon = Polygon(LineString([(0.2, 0.2), (0.2, 1), (1, 1), (1, 0.2)]))
polygon
In [ ]:
line.coords.xy
Binary predicates to check if the line is contained by the polygon
In [ ]:
line.within(polygon)
In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors29.py
Actually, **GeoPandas relies on shapely** and uses these geometric properties and operations
In [ ]:
deelbekkens = gpd.read_file("../data/deelbekkens/Deelbekken.shp")
In [ ]:
demer = deelbekkens[deelbekkens["BEKNAAM"] == "Demerbekken"]
In [ ]:
demer.plot()
demer.convex_hull.plot()
Pyproj handles the Spatial Reference Systems (SRS) transformations. Actually, the only items to remember are:
pyproj.Proj: define a SRSpyproj.transform: execute a transformation between two defined SRS
In [ ]:
from pyproj import Proj, transform
The easiest way is mostly just using the EPSG code:
Lambert 72:
In [ ]:
srs_lambert = Proj(init='epsg:31370')
In [ ]:
srs_lambert.srs
WGS84:
In [ ]:
srs_wgs84 = Proj(init='epsg:4326')
In [ ]:
srs_wgs84.is_latlong()
Other methods to define the SRS are also available:
In [ ]:
utm32 = Proj(proj="utm", zone="32")
The transform function takes the two defined SRS objects, together with an X/Y coordinate combination
In [ ]:
transform(srs_lambert, srs_wgs84, 98710.32800000161, 162573.7030000016)
Actually, **GeoPandas relies on pyproj** to perform SRS transformations using the EPSG code (but calls it `coordinate reference system (CRS)`...)
In [ ]:
deelbekkens = gpd.read_file("../data/deelbekkens/Deelbekken.shp")
deelbekkens.crs
In [ ]:
deelbekkens.to_crs(epsg="4326").head()
Actually, the function is useful in general to transform the X/Y combinations in any coordinate table. Consider the following example of species data with the coordinates in WGS84.
In [ ]:
inv_data = pd.read_excel("../data/invasive_extract.xlsx")
In [ ]:
inv_data.head()
Conversion to Lambert 72 is supported by the pyproj Package as follows. As I want to support future usage of this functionality as well, I'll write my own custom function:
In [ ]:
def transform_dfrow_projection(row, x_name, y_name, srs_from, srs_to):
"""
Converts the x and y coordinates of a given DataFrame row into a Series of the
longitude and latitude.
"""
from pyproj import transform
return pd.Series(transform(srs_from, srs_to, row[x_name], row[y_name]))
For a single row, this looks like:
In [ ]:
transform_dfrow_projection(inv_data.loc[0], "decimalLongitude", "decimalLatitude",
srs_wgs84, srs_lambert)
Applying this to the entire table:
In [ ]:
inv_data[["x", "y"]] = inv_data.apply(transform_dfrow_projection, axis=1,
args=("decimalLongitude", "decimalLatitude",
srs_wgs84, srs_lambert))
This feels too complicated?!?Understandable ,but let's just once take the effort to compile this into a reusable function...
In [ ]:
def add_trf_coordinates(filename, x_name="decimalLongitude", y_name="decimalLatitude",
srs_from='epsg:4326', srs_to='epsg:31370',
new_x="X", new_y="Y"):
"""Add new X/Y columns with transformed SRS to a given Excel dataset with X/Y columns
Parameters
----------
filename : str
Path to excel file to read dat from
x_name : str
Column containing the x-coordinates (default decimalLongitude)
y_name : str
Column containing the y-coordinates (default decimalLatitude)
srs_from: str
epsg:n code for the current SRS of the coordinates
srs_to: str
epsg:n code for the SRS of the added coordinates
new_x: str
name of the added x-coordinate column
new_y: str
name of the added y-coordinate column
returns
-------
df with 2 additional columns
"""
df = pd.read_excel(filename)
srs_from = Proj(init=srs_from)
srs_to = Proj(init=srs_to)
df[[new_x, new_y]] = df.apply(transform_dfrow_projection, axis=1,
args=(x_name, y_name, srs_from, srs_to))
return df
In [ ]:
df_enriched = add_trf_coordinates("../data/invasive_extract.xlsx")
df_enriched.head()
Remark that GDAL also has a utility gdalsrsinfo to get/check the SRS information of GIS file:
In [ ]:
! gdalsrsinfo -o proj4 ../data/EUgrid10.geojson #try also wiht -o all
Or to execute the transformation of a dataset together with the transformation of the data type:
In [ ]:
!ogr2ogr -t_srs "EPSG:4326" -f "GeoJSON" "../scratch/deelbekken.geojson" "../data/deelbekkens/Deelbekken.shp"
(More on this in the next notebook)
Actually, we got a whole sequence of data types (objects) to end up with GeoDataFrames:
Our df_enriched is a Pandas DataFrame, so we can convert to a GeoDataFrame by converting the coordinates to a set of POINT geometries:
In [ ]:
from shapely.geometry import Point
geometry = [Point(xy) for xy in zip(df_enriched["X"], df_enriched["Y"])]
crs = {'init': 'epsg:31370'}
geo_df = gpd.GeoDataFrame(df_enriched, crs=deelbekkens.crs, geometry=geometry)
In [ ]:
import mplleaflet
In [ ]:
fig, ax = plt.subplots()
geo_df.to_crs(epsg="4326").plot(markersize=10, ax=ax) # on the fly conversion to WGS84
mplleaflet.display()
In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors208.py
In [ ]:
# %load ../notebooks/_solutions/04-gis-python-vectors215.py
For those interested, also check the following example from Joris Vandenbossche, explaining how to derive the distance to the nearest green area for all adresses in the centre of Antwerp: