In [1]:
%matplotlib inline
import pandas as pd
import nivapy3 as nivapy
import matplotlib.pyplot as plt

plt.style.use('ggplot')

Explore "core" ICPW data

Prior to updating the "core" ICPW datasets in RESA, I need to get an overview of what's already in the database and what isn't.


In [2]:
# Connect to db
eng = nivapy.da.connect()


Connection successful.

1. Query ICPW projects

The are 18 projects (one for each country) currently in RESA. We also have data for some countries that do not yet have a project defined (e.g. the Netherlands).


In [3]:
# Query projects
prj_grid = nivapy.da.select_resa_projects(eng)
prj_grid


581 projects in the RESA database.

In [4]:
prj_df = prj_grid.get_selected_df()
print(len(prj_df))
prj_df


18
Out[4]:
project_id project_number project_name contact_person project_description
383 1679 None ICPWaters US None None
424 2120 None ICPWaters NO None None
425 2140 None ICPWaters CA None None
467 2923 None ICPWaters UK None None
468 2943 None ICPWaters FI None None
469 2944 None ICPWaters SE None None
473 2985 None ICPWaters BY None None
474 2986 None ICPWaters CZ None None
475 2987 None ICPWaters IT None None
476 2988 None ICPWaters PL None None
477 2989 None ICPWaters CH None None
478 2990 None ICPWaters LV None None
479 2991 None ICPWaters EE None None
517 3425 None ICPWaters ES None None
518 3445 None ICPWaters IE None None
536 3745 None ICPWaters ME None None
537 3746 None ICPWaters AM None None
380 1660 None ICPWaters DE None None

2. Get station list

There are 262 stations currently associated with the projects in RESA.


In [5]:
# Get stations
stn_df = nivapy.da.select_resa_project_stations(prj_df, eng)
print(len(stn_df))
stn_df.head()


262
Out[5]:
station_id station_code station_name latitude longitude altitude
0 221 BIE01 Birkenes 58.385426 8.241721 190.0
1 12080 STE01 Storgama v. dam 59.052333 8.653606 587.0
2 23505 DE01 Schwarzwald, Dürreychbach 48.750167 8.443167 730.0
3 23506 DE02 Fichtelgebirge, Eger 50.085000 11.824500 664.0
4 23507 DE03 Rothaargebirge, Elberndorfer Bach 50.987833 8.201333 540.0

In [6]:
# Map
nivapy.spatial.quickmap(stn_df, popup='station_code')


Out[6]:

3. Get parameters

Get a list of parameters available at these stations. I assume that all data submissions to ICPW will report pH, so extracting pH data should be a good way to get an indication of which stations actually have data.


In [7]:
# Select parameters
par_grid = nivapy.da.select_resa_station_parameters(stn_df,
                                                    '1970-01-01',
                                                    '2019-01-01',
                                                    eng)
par_grid


110 parameters available for the selected stations and dates.

In [9]:
# Get selected pars
par_df = par_grid.get_selected_df()
par_df


Out[9]:
parameter_id parameter_name unit
108 1 pH None

4. Get chemistry data


In [10]:
# Get data
wc_df, dup_df = nivapy.da.select_resa_water_chemistry(stn_df,
                                                      par_df,
                                                      '1970-01-01',
                                                      '2019-01-01',
                                                      eng,
                                                      lod_flags=False,
                                                      drop_dups=True)

wc_df.head()


Out[10]:
station_id station_code station_name sample_date depth1 depth2 pH_
0 108 LAE01 Langtjern, utløp 1972-09-01 00:00:00 0.0 0.0 5.04
1 108 LAE01 Langtjern, utløp 1972-12-11 00:00:00 0.0 0.0 4.92
2 108 LAE01 Langtjern, utløp 1973-04-04 15:15:00 0.0 0.0 4.80
3 108 LAE01 Langtjern, utløp 1973-04-05 17:40:00 0.0 0.0 4.69
4 108 LAE01 Langtjern, utløp 1973-04-06 12:00:00 0.0 0.0 4.67

In [11]:
# How many stations have pH data
len(wc_df['station_code'].unique())


Out[11]:
238

In [12]:
# Which stations do not have pH data?
all_stns = set(stn_df['station_code'].unique())
no_ph = list(all_stns - set(wc_df['station_code'].unique()))
no_ph_stns = stn_df.query('station_code in @no_ph').reset_index()
print(len(no_ph_stns))
no_ph_stns


24
Out[12]:
index station_id station_code station_name latitude longitude altitude
0 219 36451 ES08 CRAM (High Mountain Research Centre) NaN NaN NaN
1 220 36452 ES09 Lac Redon weather station NaN NaN NaN
2 221 36453 ES10 Aigüestortes weather station NaN NaN NaN
3 227 36459 SE20 Stormyrbäcken NaN NaN NaN
4 228 36460 SE21 Pipbäcken Nedre NaN NaN NaN
5 229 36461 SE22 Muddusälven NaN NaN NaN
6 230 36462 SE23 Lommabäcken Nedre NaN NaN NaN
7 231 36463 SE24 Lill-Fämtan NaN NaN NaN
8 232 36464 SE25 Laxtjärnsbäcken NaN NaN NaN
9 234 36465 ME 01 Black lake 43.143300 19.087500 NaN
10 235 36466 AM 1 Pambak v.Shirakamut 40.850000 44.233333 NaN
11 236 36467 AM 2 Pambak v.Vanadzor 40.800000 44.500000 NaN
12 237 36468 AM 3 Debed v.Airum 41.217000 44.883333 NaN
13 238 36469 AM 4 Sevjur v.Ranchpar 40.016667 44.366667 NaN
14 239 36470 AM 5 Hrazdan t. Masis 40.016667 44.433333 NaN
15 240 36471 AM 6 Arpa v. Areni 39.733333 45.200000 NaN
16 241 36472 AM 7 Voghchi c.Kapan 39.200000 46.400000 NaN
17 242 36473 AM 8 Meghriget c.Meghri 38.916667 46.233333 NaN
18 243 36474 AM 9 Gorisget c. Goris 39.516667 46.333333 NaN
19 244 36475 AM 10 Agstev c. Ijevan 40.866667 45.150000 NaN
20 245 36476 AM 11 Agstev v.Fioletovo 40.716667 44.733333 NaN
21 246 36477 AM 12 Marmarik v.Aghavnadzor 40.566667 44.683333 NaN
22 247 36478 AM 13 Vorotan v. Vorotan 39.466667 46.150000 NaN
23 248 37765 US127 Barnes Lake, Adirondacks 43.566700 -75.227800 393.0

In [13]:
# What data do these stations have?
par_grid2 = nivapy.da.select_resa_station_parameters(no_ph_stns,
                                                     '1970-01-01',
                                                     '2019-01-01',
                                                     eng)
par_grid2


0 parameters available for the selected stations and dates.

So, there are 262 stations within the "core" ICPW projects, but 24 of these have no data whatsoever associated with them (listed above).

5. Date for last sample by country

The code below gets the most recent pH sample in the database for each country.


In [14]:
# Most recent datab
for idx, row in prj_df.iterrows():
    # Get stations
    cnt_stns = nivapy.da.select_resa_project_stations([row['project_id'],], eng)
    
    # Get pH data
    wc, dups = nivapy.da.select_resa_water_chemistry(cnt_stns,
                                                     [1,],          # pH
                                                     '1970-01-01',
                                                     '2019-01-01',
                                                     eng,
                                                     lod_flags=False,
                                                     drop_dups=True)
    
    # Print results
    print(row['project_name'], '\t', len(cnt_stns), '\t', wc['sample_date'].max())


ICPWaters US 	 95 	 2012-12-31 00:00:00
ICPWaters NO 	 5 	 2018-11-12 00:00:00
ICPWaters CA 	 18 	 2012-12-18 00:00:00
ICPWaters UK 	 6 	 2012-03-07 00:00:00
ICPWaters FI 	 10 	 2012-12-12 11:00:00
ICPWaters SE 	 22 	 2013-10-23 00:00:00
ICPWaters BY 	 1 	 2010-12-09 00:00:00
ICPWaters CZ 	 8 	 2016-01-04 00:00:00
ICPWaters IT 	 6 	 2015-12-14 00:00:00
ICPWaters PL 	 4 	 2012-12-20 00:00:00
ICPWaters CH 	 9 	 2015-12-09 00:00:00
ICPWaters LV 	 8 	 2015-12-16 11:30:00
ICPWaters EE 	 1 	 2013-12-03 00:00:00
ICPWaters ES 	 10 	 2011-12-22 00:00:00
ICPWaters IE 	 10 	 2010-12-08 00:00:00
ICPWaters ME 	 1 	 nan
ICPWaters AM 	 13 	 nan
ICPWaters DE 	 35 	 2012-12-20 00:00:00

These results are largely as expected:

  • US. Complete up to 2012. John sent an entirely new dataset for the 2018 trends work, which can be used to replace the existing data series for the "core" stations as well

  • Norway. Transferred automatically

  • Canada. Partly processed for the trends work

  • UK. Partly processed for the trends work

  • Finland. Jussi sent an entirely new dataset (with changes suggested to site selections) in 2018

  • Sweden. Update using data collected via API. Also need to modify site selections

  • Belarus. Data up to 2010 are already in the database. Completed templates are available for 2012 to 2014 (2011 data are missing). No response to calls for data since 2014

  • Czech Republic. 2016 data needs to be uploaded

  • Italy. Data submitted during 2017 and 2018 need adding. Several new sites need creating

  • Poland. Recent data submissions seem very complex! Need to check through e-mails

  • Switzerland. Templates need combining and adding

  • Latvia. Data for 2016 and 2017 need combining and tidying

  • Estonia. No data for 2012 or 2014. Data from 2013, 2015, 2016 and 2017 needs merging and adding

  • Ireland. An entirely new set of stations and codes have been proposed. Some overlap with old stations, but need checking

  • Montenegro. Data from 2006 to 2009 are available in Excel. Need transferring to template and adding to database. Nothing since 2009

  • Armenia. Data from 2004 to 2008 are in Excel. Nothing since and no replies to annual Calls for Data

  • Germany. Excel templates cover 2015 to 2017. Nothing for 2013 or 2014? Some site codes have changes and some are no longer monitored. Also some old errors to fix - see e-mails?

  • Austria. Data up to 2012 are in Excel. No data since then

  • Moldova. Data from 2014 to 2017 in Excel. No project or stations in database

  • Netherlands. Excel data supplied in 2016. No idea what to make of these spreadsheets! No stations or project in RESA

  • Russia. Data provided from 2009 to 2014 in Excel. No stations or projects in database

  • Slovakia. Not currently in the "core" ICPW project, but data supplied by Jiri as part of the trends work. Fits well with Polish data