pydov provides machine access to the data that can be visualized with the DOV viewer.
All the pydov functionalities rely on the existing DOV webservices. An in-depth overview of the available services and endpoints is provided on the accessing DOV data page. To retrieve data, pydov uses a combination of the available WFS services and the XML representation of the core DOV data.
As pydov relies on the XML data returned by the existing DOV webservices, downloading DOV data with pydov is governed by the same disclaimer that applies to the other DOV services. Be sure to consult it when using pydov!
pydov interfaces data and services hosted by the Flemish government. Therefore, some syntax of the API as well as the descriptions provided by the backend are in Dutch.
In [1]:
%matplotlib inline
import inspect, sys
In [2]:
import pydov
import pandas as pd
To get started with pydov you should first determine which information you want to search for. DOV provides a lot of different datasets about soil, subsoil and groundwater of Flanders, some of which can be queried using pydov. Supported datasets are listed in the quickstart.
In this case, to build a hydrogeological model, we are interested in the hydrostratigraphic interpretations of borehole data and in groundwater levels. These datasets can be queried with the HydrogeologischeStratigrafieSearch and GrondwaterFilterSearch search objects, respectively.
Indeed, each of the datasets can be queried using a search object for the specific dataset. While the search objects are different, the workflow is the same for each dataset. Relevant classes can be imported from the pydov.search package, for example if we’d like to query the dataset with hydrostratigraphic interpretations of borehole data:
In [3]:
from pydov.search.interpretaties import HydrogeologischeStratigrafieSearch
hs = HydrogeologischeStratigrafieSearch()
If you would like some more information or metadata about the data you can retrieve, you can query the search object. Since pydov interfaces services and metadata from Flemish government agencies, the descriptions are in Dutch:
In [4]:
hs.get_description()
Out[4]:
The different fields that are available for objects of the 'Hydrogeologische Stratigrafie' datatype can be requested with the get_fields() method:
In [5]:
fields = hs.get_fields()
# print available fields
for f in fields.values():
    print(f['name'])
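To see which of these fields can be used in an attribute query and which are only filled in from the XML documents, you could split them on a per-field flag. The sketch below assumes that every field description carries a boolean `'query'` key; the dictionary `example_fields` is a hypothetical stand-in for the output of `hs.get_fields()`, not real DOV metadata.

```python
# Hypothetical stand-in for the dictionary returned by hs.get_fields();
# assumption: every field description has a boolean 'query' key.
example_fields = {
    'pkey_interpretatie': {'name': 'pkey_interpretatie', 'query': True},
    'diepte_tot_m': {'name': 'diepte_tot_m', 'query': True},
    'diepte_laag_van': {'name': 'diepte_laag_van', 'query': False},
    'diepte_laag_tot': {'name': 'diepte_laag_tot', 'query': False},
}

# Split the field names on the (assumed) 'query' flag.
queryable = [f['name'] for f in example_fields.values() if f.get('query')]
post_filter_only = [f['name'] for f in example_fields.values()
                    if not f.get('query')]

print('usable in an attribute query:', queryable)
print('filter afterwards in pandas:', post_filter_only)
```

Applied to the real `fields` dictionary, the same comprehension only works if the assumed key is present; check one entry first.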
You can get more information about a field by requesting it from the fields dictionary:
In [6]:
fields['pkey_interpretatie']
Out[6]:
The fields pkey_interpretatie and pkey_boring are important identifiers. In this case, pkey_interpretatie is the unique identifier of this interpretation and is also the permanent URL where the data can be consulted (~https://www.dov.vlaanderen.be/data/interpretatie/...). You can retrieve an XML representation by appending '.xml' to the URL, or a JSON equivalent by appending '.json'.
The pkey_boring is the identifier of the borehole from which this interpretation was made. As mentioned before, it is also the permanent URL (~https://www.dov.vlaanderen.be/data/boring/...).
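Because the permanent keys are plain URLs, deriving the XML and JSON representations is simple string manipulation. A minimal sketch, with a hypothetical helper `representation_urls` and a made-up placeholder key (`EXAMPLE-ID` is not a real DOV record):

```python
def representation_urls(pkey: str) -> dict:
    """Hypothetical helper: return the XML and JSON URLs for a DOV pkey,
    following the '.xml' / '.json' suffix convention described above."""
    base = pkey.rstrip('/')  # tolerate a trailing slash
    return {'xml': base + '.xml', 'json': base + '.json'}


# Placeholder key for illustration only.
pkey = 'https://www.dov.vlaanderen.be/data/interpretatie/EXAMPLE-ID'
urls = representation_urls(pkey)
print(urls['xml'])
print(urls['json'])
```

The resulting URLs could then be opened in a browser or fetched with any HTTP client, for example `urllib.request.urlopen` from the standard library.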
If the values of a field are restricted to a specific domain, the possible values are listed under the 'values' key:
In [7]:
fields['aquifer']['values']
Out[7]:
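A domain listing like this can be used to catch typos in a query literal before sending a request to the webservice. The sketch below assumes that `fields[...]['values']` behaves like a mapping whose keys are the allowed codes; `aquifer_values` and `check_literal` are hypothetical names and the codes are made up.

```python
# Hypothetical stand-in for fields['aquifer']['values'] (made-up codes).
aquifer_values = {
    'CODE_A': 'description of aquifer A',
    'CODE_B': 'description of aquifer B',
}


def check_literal(values: dict, literal: str) -> str:
    """Return literal unchanged, or raise ValueError if it is not an allowed code."""
    if literal not in values:
        allowed = ', '.join(sorted(values))
        raise ValueError(f'{literal!r} is not an allowed value; choose from: {allowed}')
    return literal


check_literal(aquifer_values, 'CODE_A')  # passes silently
```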
The data can be queried on attributes, location or both. To query on attributes, the OGC filter functions from OWSLib are used:
In [8]:
# list available query methods
# (import owslib.fes explicitly so the module is guaranteed to be loaded)
import owslib.fes

methods = [name for name, _ in inspect.getmembers(owslib.fes, inspect.isclass)
           if 'Property' in name]
print(*methods, sep="\n")
If you are for example interested in all the hydrostratigraphic interpretations in the city of Leuven, you compose the query like below (mind that the values are in Dutch):
In [9]:
from owslib.fes import PropertyIsEqualTo
query = PropertyIsEqualTo(
propertyname='gemeente',
literal='Leuven')
dfhs = hs.search(query=query)
dfhs.head()
Out[9]:
This query yielded 38 interpretations from at most 38 boreholes; there can be fewer boreholes because multiple interpretations can be made of a single borehole.
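Since the resulting DataFrame holds one row per layer, counting the distinct keys shows how many interpretations and boreholes are involved. A sketch on a small synthetic DataFrame (the keys are made up and stand in for `dfhs`):

```python
import pandas as pd

# Synthetic stand-in for dfhs: one row per layer, so keys repeat.
dfhs_example = pd.DataFrame({
    'pkey_interpretatie': ['int/1', 'int/1', 'int/2', 'int/3'],
    'pkey_boring': ['bor/A', 'bor/A', 'bor/A', 'bor/B'],
})

n_interpretations = dfhs_example['pkey_interpretatie'].nunique()
n_boreholes = dfhs_example['pkey_boring'].nunique()
print(n_interpretations, n_boreholes)  # 3 interpretations from 2 boreholes
```

On the real result, `dfhs.pkey_boring.nunique()` would give the actual number of boreholes behind the interpretations.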
To narrow the search down, for example to interpretations deeper than 200 meters, you can combine filters using the logical operators And and Or provided by OWSLib:
In [10]:
from owslib.fes import And
from owslib.fes import PropertyIsGreaterThan
query = And([
PropertyIsEqualTo(
propertyname='gemeente',
literal='Leuven'),
PropertyIsGreaterThan(
propertyname='diepte_tot_m',
literal='200')
])
dfhs = hs.search(query=query)
dfhs.head()
Out[10]:
Mind the difference between diepte_tot_m and the diepte_laag_... attributes. The former is defined in the WFS service and can be used as an attribute in the query. The latter are defined in the linked XML document, so their information is only available after it has been gathered from the DOV webservice. Such XML-only attributes cannot be used in the initial query; use them instead to filter the resulting Pandas DataFrame afterwards.
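Filtering on an XML-only attribute therefore happens in pandas, after the search has returned. A sketch on synthetic data standing in for a search result (keys and depths are made up; the 100 m threshold is arbitrary):

```python
import pandas as pd

# Synthetic stand-in for a search result; diepte_laag_van / diepte_laag_tot
# come from the XML documents, so they are filtered after the search.
df_example = pd.DataFrame({
    'pkey_interpretatie': ['int/1', 'int/1', 'int/1', 'int/2'],
    'diepte_laag_van': [0.0, 20.0, 150.0, 0.0],
    'diepte_laag_tot': [20.0, 150.0, 230.0, 210.0],
})

# Keep only layers whose top lies at or below 100 m depth.
deep_layers = df_example[df_example['diepte_laag_van'] >= 100]
print(deep_layers)
```

A WFS-level filter such as the diepte_tot_m condition above reduces what is downloaded; a pandas filter like this one only narrows what has already been retrieved.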
More information on querying attribute properties is given in the docs. Worth mentioning is querying with lists: pydov extends the default OGC filter expressions with a new expression, PropertyInList, that allows you to use lists (of strings) in search queries.
One last goodie is the possibility to join searches on common attributes, for example the pkey_boring field that denotes the borehole. This way, you can get the boreholes for which a hydrostratigraphic interpretation is available and also query the lithological descriptions of those boreholes:
In [11]:
from pydov.util.query import Join
from pydov.search.interpretaties import LithologischeBeschrijvingenSearch
ls = LithologischeBeschrijvingenSearch()
dfls = ls.search(query=Join(dfhs, 'pkey_boring'))
df_joined = pd.merge(
    dfhs,
    dfls.loc[:, ['pkey_boring', 'diepte_laag_van', 'diepte_laag_tot', 'beschrijving']],
    how='left',
    on=['pkey_boring', 'diepte_laag_van', 'diepte_laag_tot'])
df_joined.head()
Out[11]:
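Note that a left merge on exact layer boundaries only attaches a description where both datasets share identical depths; other rows keep a missing value. A sketch on synthetic data (one made-up borehole whose boundaries partly differ), using the `indicator` option of `pd.merge` to show which rows matched:

```python
import pandas as pd

# Synthetic hydrostratigraphic layers and lithological descriptions.
hs_layers = pd.DataFrame({
    'pkey_boring': ['bor/A', 'bor/A'],
    'diepte_laag_van': [0.0, 10.0],
    'diepte_laag_tot': [10.0, 30.0],
    'aquifer': ['unit 1', 'unit 2'],
})
lith_layers = pd.DataFrame({
    'pkey_boring': ['bor/A', 'bor/A', 'bor/A'],
    'diepte_laag_van': [0.0, 10.0, 18.0],
    'diepte_laag_tot': [10.0, 18.0, 30.0],
    'beschrijving': ['zand', 'klei', 'zand'],
})

# indicator=True adds a '_merge' column: 'both' or 'left_only'.
joined = pd.merge(hs_layers, lith_layers, how='left',
                  on=['pkey_boring', 'diepte_laag_van', 'diepte_laag_tot'],
                  indicator=True)
print(joined[['diepte_laag_van', 'diepte_laag_tot', 'beschrijving', '_merge']])
```

The 10 to 30 m layer finds no exact partner, so its description is NaN; matching overlapping intervals would need a different approach than an equality merge.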
One can also query on location, using the location objects and spatial filters from the pydov.util.location module. For example, to request all hydrostratigraphic interpretations in a given bounding box:
In [12]:
from pydov.util.location import Within, Box
location = Within(Box(170000, 171000, 172000, 173000))
df = hs.search(location=location)
df.head()
Out[12]:
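The four numbers passed to Box are the corners of the search rectangle, read here as (minx, miny, maxx, maxy) in Belgian Lambert 72 metres (EPSG:31370); that ordering and CRS are assumptions to check against the docs. The pure-Python sketch below, with a hypothetical `BBox` class, mirrors the containment test locally:

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Hypothetical bounding box in projected coordinates (metres)."""
    minx: float
    miny: float
    maxx: float
    maxy: float

    def contains(self, x: float, y: float) -> bool:
        """True if (x, y) lies inside or on the edge of the box
        (the OGC Within filter may treat the boundary differently)."""
        return self.minx <= x <= self.maxx and self.miny <= y <= self.maxy


bbox = BBox(170000, 171000, 172000, 173000)
print(bbox.contains(171500, 172500))  # True
print(bbox.contains(169000, 172500))  # False: west of minx
```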
Alternatively, you can define a Point or a GML document for the spatial query, as described in the docs. For example, to search around a site of interest, you can define a point with a search radius of, say, 500 meters:
In [13]:
from pydov.util.location import WithinDistance, Point
location = WithinDistance(
Point(171500, 172500),
500,
distance_unit='meter'
)
df = hs.search(location=location)
df.head()
Out[13]:
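Because Lambert 72 is a projected coordinate system in metres, a distance search around a point boils down to a plain Euclidean distance test. A minimal sketch with a hypothetical `within_distance` helper, assuming both the point and the candidates are in the same metric CRS:

```python
import math


def within_distance(x: float, y: float, cx: float, cy: float,
                    radius_m: float) -> bool:
    """Hypothetical helper: True if (x, y) lies within radius_m metres
    of the centre (cx, cy), using Euclidean distance."""
    return math.hypot(x - cx, y - cy) <= radius_m


centre = (171500, 172500)
print(within_distance(171800, 172800, *centre, 500))  # True (about 424 m)
print(within_distance(172100, 172500, *centre, 500))  # False (600 m)
```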
Querying groundwater head data follows the same workflow as for the borehole interpretations above: instantiate a search object, then query it on attribute and/or location properties.
In [14]:
from pydov.search.grondwaterfilter import GrondwaterFilterSearch
gws = GrondwaterFilterSearch()
fields = gws.get_fields()
# print available fields
for f in fields.values():
    print(f['name'])
For example, query all data in a bounding box from screens situated in the phreatic aquifer:
In [15]:
query = PropertyIsEqualTo(
propertyname='regime',
literal='freatisch')
location = Within(Box(170000, 171000, 173000, 174000))
df = gws.search(
query=query,
location=location)
df.head()
Out[15]:
One important difference is the presence of time-related data, more specifically the attributes datum and tijdstip. These can be combined into a datetime column that can be used in subsequent manipulation of the Pandas DataFrame. Make sure to remove the records without a valid datum and to fill the empty tijdstip fields with a default time:
In [16]:
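import pandas as pd
df.reset_index(inplace=True)
# drop records without a date; .copy() avoids pandas' SettingWithCopyWarning
df = df.loc[~df.datum.isna()].copy()
# fill missing times with midnight as a default
df['tijdstip'] = df.tijdstip.fillna('00:00:00')
df['tijd'] = pd.to_datetime(df.datum.astype(str) + ' ' + df.tijdstip.astype(str))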
df.tijd.head()
Out[16]:
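The same steps can be tried on a tiny synthetic DataFrame to see what happens to records with a missing date or time. Column names follow the example above; the values are made up:

```python
import pandas as pd

# Synthetic stand-in: one row without a date, one row without a time.
df_example = pd.DataFrame({
    'datum': ['2020-01-15', None, '2020-02-15'],
    'tijdstip': ['08:30:00', '10:00:00', None],
})

# Drop rows without a date, then default missing times to midnight.
df_example = df_example.loc[df_example['datum'].notna()].copy()
df_example['tijdstip'] = df_example['tijdstip'].fillna('00:00:00')
df_example['tijd'] = pd.to_datetime(df_example['datum'] + ' ' + df_example['tijdstip'])
print(df_example[['datum', 'tijdstip', 'tijd']])
```

Keep in mind that the filled-in midnight is an assumption, not a measured time; flag those rows if sub-daily timing matters for the analysis.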
More examples of time series processing and analysis are available in the pydov notebooks.
Notice the cc in the progress bar while loading the data? It means the data was loaded from your local cache instead of being downloaded, as it was already part of an earlier data request. See the caching documentation for more in-depth information about the default directory, how to change and/or clean it, and even how to create a custom cache format.
In [17]:
# imports
import pandas as pd
import pydov
from pydov.util.location import WithinDistance, Point
from pydov.util.query import Join
from pydov.search.interpretaties import LithologischeBeschrijvingenSearch
from pydov.search.interpretaties import HydrogeologischeStratigrafieSearch
from pydov.search.grondwaterfilter import GrondwaterFilterSearch
from owslib.fes import PropertyIsEqualTo
# define search objects
hs = HydrogeologischeStratigrafieSearch()
ls = LithologischeBeschrijvingenSearch()
gws = GrondwaterFilterSearch()
# search hydrostratigraphic interpretations based on location
location = WithinDistance(
Point(171500, 172500),
500,
distance_unit='meter'
)
dfhs = hs.search(location=location)
# join the lithological descriptions
dfls = ls.search(query=Join(dfhs, 'pkey_boring'))
df_joined = pd.merge(
    dfhs,
    dfls.loc[:, ['pkey_boring', 'diepte_laag_van', 'diepte_laag_tot', 'beschrijving']],
    how='left',
    on=['pkey_boring', 'diepte_laag_van', 'diepte_laag_tot'])
# search the groundwater head data of the phreatic aquifer in the neighbourhood
query = PropertyIsEqualTo(
propertyname='regime',
literal='freatisch')
dfgw = gws.search(query=query,
location=location)
# create datetime values for further processing
dfgw.reset_index(inplace=True)
dfgw = dfgw.loc[~dfgw.datum.isna()].copy()  # .copy() avoids SettingWithCopyWarning
dfgw['tijdstip'] = dfgw.tijdstip.fillna('00:00:00')
dfgw['tijd'] = pd.to_datetime(dfgw.datum.astype(str) + ' ' + dfgw.tijdstip.astype(str))
In [18]:
df_joined.head()
Out[18]:
In [19]:
dfgw.head()
Out[19]: