In [1]:
%matplotlib inline
import matplotlib.pyplot as plt, seaborn as sn, mpld3
import pandas as pd, imp, glob, os
from sqlalchemy import create_engine
sn.set_context('notebook')

Updated TOC trends analysis (part 2)

This notebook begins to explore trends in the broader water chemistry dataset collated by Heleen, Don and John for the updated TOC analysis. Note that the results presented here should be considered preliminary, as there are still some outstanding database "cleaning" tasks that I haven't found time for yet. I'm jumping ahead here because I suspect the best way to identify any further data issues is to attempt the trend analysis and see what problems crop up.

My original plan - as agreed with Heleen, Don and John in May - was to tidy up the data as much as possible and then simply re-run Tore's code for the trends calculations. Unfortunately, although I've managed to find most of the necessary code (either within RESA2 or as VBA projects within linked Access databases), I have been unable to locate some of the crucial subroutines, including the ones for the bulk of the statistical analysis. In addition, as far as I can tell, the code relies on an old, third-party Excel macro for the statistical calculations, and using this involves a lot of data shuffling (first from RESA2 to Excel, then into Access and finally back into RESA2), which is quite slow and difficult to keep track of.

What I have found in RESA2 is a table called ICPW_STATISTICS3, which appears to store the output of Tore's analysis. I think my best option is therefore to recode the whole thing from scratch, and then compare my results with those in Tore's table to make sure what I'm doing is more-or-less compatible with what's been done before. This will involve digging into the internals of RESA2, which will hopefully be a good learning experience for me as well.

1. Trend analysis code

This section provides an overview of the code I've written for the updated trends analysis. The code was initially developed in an earlier IPython notebook and later moved into a .py file to allow the trends functionality to be imported into other notebooks. This Python file is here:

C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015\Python\toc_trends_analysis.py

NB: The code in the Python file is slightly different from (and more sophisticated than) the code in the original notebook. Use the Python file (not the notebook) as the basis for any further developments.

To run an analysis, the user must provide a list of valid RESA2 project names, plus start and end years for the period of interest. Optionally, it is possible to specify a folder where trend plots for each parameter will be stored. The code then performs the following operations:

  1. Identifies all the monitoring stations associated with the specified projects.

  2. Generates a list of water samples and sampling dates for all these stations within the period of interest. If some stations have no samples within the specified period, a warning message is printed and the list of stations with no data is included in the output.

    Note that RESA2 sometimes includes measurements for samples collected at several different water depths. The code currently only selects near-surface water samples (depth $\leq 1 \; m$). Check this with Heleen.

  3. Extracts the data for the key trends parameters (currently 'SO4', 'Cl', 'Ca', 'Mg', 'NO3-N', 'TOC' and 'Al', but this needs amending - see below) for the selected water samples and converts the raw values from the units and methods originally specified into RESA2's "standard" units (as specified in the RESA2.PARAMETER_DEFINITIONS table).

  4. Checks for duplicate measurements of the same parameter at the same sampling point and date. If duplicates are found, the values are averaged and a warning message is printed. A list of duplicated records is then included in the output to facilitate data checking.

  5. Aggregates the sampled values from their native collection interval to annual frequency by taking medians. (The core calculations for this and the following steps are sketched in the code block after this list.)

  6. For the parameters 'SO4', 'Cl', 'Mg', 'Ca' and 'NO3-N', concentrations are recalculated as $\mu eq/l$ (denoted by the prefix 'E' e.g. ESO4).

    $$EPAR \; (\mu eq/l) = \frac{10^6 * valency}{molar \; mass \; (g/mol)} * PAR \; (g/l)$$

  7. Sea-salt corrections are then applied for the parameters 'ESO4', 'EMg' and 'ECa' (denoted by the suffix 'X', e.g. ESO4X).

    $$EPARX = EPAR_{sample} - \left[ \left( \frac{EPAR}{ECl} \right)_{ref} * ECl_{sample} \right]$$

    NB: I'm not sure what reference ratios were used in the original analysis. I've attempted to back-calculate them from RESA2, and it looks as though a single value has been assumed worldwide for each parameter? Check this with Heleen. Also, see section 4.1.2 of the code development notebook for more details.

  8. Calculates combined parameters, e.g. ESO4_ECl $= (ESO4 + ECl)$.

    At present, the code generates annual time series for the following parameters:

    • ESO4 ($\mu eq/l$)
    • ESO4X ($\mu eq/l$)
    • ECl ($\mu eq/l$)
    • ESO4_ECl ($\mu eq/l$)
    • ECa_EMg ($\mu eq/l$)
    • ECaX_EMgX ($\mu eq/l$)
    • ENO3 ($\mu eq/l$)
    • ESO4_ECl_ENO3 ($\mu eq/l$)
    • TOC ($mg C/l$)
    • Al ($mg/l$)

    NB: It looks as though Tore's code calculates a few additional parameters as well:

    • ANC
    • ALK
    • HPLUS
    • ENO3_DIV_ENO3_ESO4X

    These will be easy to add, but I'd like to check exactly which parameters are of interest and how they should be calculated before including them.

  9. Performs statistical analysis for each parameter at each site for the period specified. The output includes the following summary statistics:

    • Period over which data are available (i.e. start and end years, which are not always the same as the period originally specified)
    • Number of non-missing values
    • Median of data values
    • Mean of data values
    • Standard deviation of data values
    • Standard deviation expected under the null hypothesis of the Mann-Kendall (M-K) test
    • M-K statistic
    • Normalised M-K statistic $\left(= \frac{M-K \; statistic}{Expected \; standard \; deviation} \right)$
    • M-K p-value
    • Sen's slope (a.k.a. the Theil-Sen slope). Optionally, a plot of the trend line can be produced.

    Note that the algorithm uses the normal approximation to estimate the variance of the S parameter in the M-K test (and thereby the significance of any trends). This approximation is only robust when the number of non-null data points in the time series is $\geq 10$. If the number of non-missing values is fewer than 10, the code prints a warning message to indicate significance estimates may be unreliable.

  10. The output from the algorithm consists of (i) a dataframe of summary statistics for each parameter at each site over the period of interest; (ii) a list of stations with no data in the period specified and (iii) a list of duplicated values (where the database contains two measurements of the same parameter at the same location on the same day). If the plot option is set to True when the function is called, the code will also output (iv) plots of the Theil-Sen regression line for each parameter, saved to the specified output folder.
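
To make the steps above concrete, the sketch below shows the core calculations for steps 5 to 7 and 9 in stripped-down form. It is an illustration only, not the production code: toc_trends_analysis.py also handles missing values, duplicate averaging, combined parameters and plotting, and all the names here (to_ueq_per_l, mk_test, wc_df etc.) are invented for the example rather than taken from the module. The sea-salt reference ratio of 0.103 (the approximate equivalent ratio of SO4 to Cl in sea water) is also just an assumption for illustration.

import numpy as np
import pandas as pd
from scipy.stats import norm

def to_ueq_per_l(par_gpl, valency, molar_mass):
    """Step 6: convert a concentration in g/l to micro-equivalents per litre."""
    return 1e6 * valency * par_gpl / molar_mass

def sea_salt_correct(epar, ecl, ref_ratio):
    """Step 7: subtract the marine fraction, estimated from chloride and an
    assumed reference EPAR/ECl equivalent ratio."""
    return epar - ref_ratio * ecl

def mk_test(vals):
    """Step 9: Mann-Kendall test using the normal approximation (no tie
    correction). Returns (S, expected std. dev., normalised statistic,
    two-sided p-value). Only robust for >= 10 non-null values."""
    vals = np.asarray(vals, dtype=float)
    n = len(vals)

    # S is the sum of signs over all pairs (i < j)
    s = np.sum([np.sign(vals[j] - vals[i])
                for i in range(n - 1)
                for j in range(i + 1, n)])

    # Standard deviation of S under the null hypothesis of no trend
    sd = np.sqrt(n * (n - 1.) * (2. * n + 5.) / 18.)

    # Continuity-corrected normalised statistic
    if s > 0:
        z = (s - 1.) / sd
    elif s < 0:
        z = (s + 1.) / sd
    else:
        z = 0.

    p = 2. * (1. - norm.cdf(abs(z)))

    return s, sd, z, p

def sen_slope(years, vals):
    """Step 9: Sen's slope, i.e. the median of all pairwise slopes."""
    slopes = [(vals[j] - vals[i]) / float(years[j] - years[i])
              for i in range(len(vals) - 1)
              for j in range(i + 1, len(vals))]
    return np.median(slopes)

# Invented monthly data for one station: SO4 and Cl in mg/l over 15 years
wc_df = pd.DataFrame({'sample_date': pd.date_range('1990-01-01', periods=180,
                                                   freq='M'),
                      'SO4': np.linspace(5., 2., 180),  # declining trend
                      'Cl': np.full(180, 4.)})

# Step 5: aggregate to annual frequency by taking medians
wc_df['year'] = wc_df['sample_date'].dt.year
ann_df = wc_df.groupby('year')[['SO4', 'Cl']].median()

# Step 6: mg/l -> g/l -> ueq/l (SO4: valency 2, molar mass 96.06 g/mol;
# Cl: valency 1, molar mass 35.45 g/mol)
ann_df['ESO4'] = to_ueq_per_l(ann_df['SO4'] / 1000., 2, 96.06)
ann_df['ECl'] = to_ueq_per_l(ann_df['Cl'] / 1000., 1, 35.45)

# Step 7: sea-salt correction with the assumed reference ratio
ann_df['ESO4X'] = sea_salt_correct(ann_df['ESO4'], ann_df['ECl'], 0.103)

# Step 9: trend statistics for the corrected series
s, sd, z, p = mk_test(ann_df['ESO4X'].values)
slp = sen_slope(ann_df.index.values, ann_df['ESO4X'].values)

Note the continuity correction in mk_test (dividing $S - 1$ or $S + 1$, rather than $S$, by the expected standard deviation). This appears to be the main reason my normalised statistics differ slightly from Tore's in the comparison below: for Al at site 23499, $-83 / 20.21 = -4.107$ (Tore's value), whereas $-82 / 20.21 = -4.058$ (mine).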

2. Illustrative example and testing

This section illustrates how the code can be used to estimate trends. As a basic test, I'll start by reading some results from Tore's ICPW_STATISTICS3 table, which I believe stores the output from his statistical analyses. I'll then run my code for the same stations and time periods to make sure the results are comparable.

NB: My earlier notebook focussing on the code development includes some more rigorous testing.

2.1. Import modules and connect to database


In [2]:
# Import custom functions
resa2_basic_path = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Upload_Template'
                    r'\useful_resa2_code.py')

resa2_basic = imp.load_source('useful_resa2_code', resa2_basic_path)

# Connect to db
engine, conn = resa2_basic.connect_to_resa2()

# Import code for trends analysis
resa2_trends_path = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015'
                     r'\Python\toc_trends_analysis.py')

resa2_trends = imp.load_source('toc_trends_analysis', resa2_trends_path)
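
For reference, the connection made by connect_to_resa2() boils down to creating a SQLAlchemy engine for the Oracle instance underlying RESA2, roughly as sketched below. The user name, password, host and SID are placeholders here - the real connection details live in useful_resa2_code.py.

from sqlalchemy import create_engine

# Placeholder credentials and DSN for the Oracle instance behind RESA2
conn_str = 'oracle+cx_oracle://my_user:my_password@my_host:1521/my_sid'
engine = create_engine(conn_str)
conn = engine.connect()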

2.2. Read previous results from RESA2

As an example, we'll extract some of Tore's results for one of the Czech sites (station_id = 23499) for the period from 1990 to 2004.


In [3]:
# Get results for test sites from RESA2
sql = ("SELECT * FROM resa2.icpw_statistics3 "
       "WHERE station_id = 23499 "
       "AND period = '1990-2004'")

old_df = pd.read_sql(sql, engine)

# Get just the cols to compare to my output
old_df = old_df[['station_id', 'parameter', 'period', 'nonmiss',
                   'average', 'median', 'stdev', 'test_stat', 
                   'mk_stat', 'mkp', 'senslope']]

old_df.head(14).sort_values(by='parameter')


Out[3]:
station_id parameter period nonmiss average median stdev test_stat mk_stat mkp senslope
0 23499 AL 1990-2004 15 486.052433 503.500000 186.876929 -83 -4.107435 0.000040 -32.700000
1 23499 ALK 1990-2004 14 -15.896429 -15.850000 6.270918 20 1.096542 0.272842 0.500000
2 23499 ANC 1990-2004 12 -47.948231 -53.510865 14.813228 48 3.291482 0.000997 3.178531
3 23499 ECAEMG 1990-2004 15 84.136540 83.398845 9.581870 -75 -3.711537 0.000206 -1.968997
4 23499 ECAXEMGX 1990-2004 15 79.221717 78.338350 9.248877 -71 -3.513589 0.000442 -1.886138
5 23499 ECL 1990-2004 15 21.093655 21.718895 2.980660 -44 -2.180106 0.029250 -0.435917
6 23499 ENO3 1990-2004 12 65.111360 66.036608 7.299949 -40 -2.742902 0.006090 -1.234397
7 23499 ENO3DIVENO3ESO4X 1990-2004 12 0.428599 0.427500 0.028221 32 2.194322 0.028212 0.006281
8 23499 ESO4 1990-2004 15 95.643491 97.541580 18.305102 -97 -4.800255 0.000002 -4.089638
9 23499 ESO4CL 1990-2004 15 116.737145 118.934785 20.274570 -95 -4.701281 0.000003 -4.449103
10 23499 ESO4ECLENO3 1990-2004 12 175.508512 176.306256 23.280313 -52 -3.565772 0.000363 -5.589727
11 23499 ESO4X 1990-2004 15 93.470844 95.253690 18.117947 -97 -4.800255 0.000002 -4.033607
12 23499 HPLUSS 1990-2004 15 18.383278 18.506625 4.313829 -69 -3.414614 0.000639 -0.795074
13 23499 TOC_DOC 1990-2004 11 1.561364 1.385000 0.554342 4 0.312348 0.754776 0.009286

2.3. Run new code

The above table shows output from the previous analysis for the period from 1990 to 2004. The code below runs my new trend analysis for all of the Czech sites (project_name = 'ICPW_TOCTRENDS_2015_CZ') for the same period and then prints just the results for site 23499 for comparison.


In [4]:
# Specify projects of interest
proj_list = ['ICPW_TOCTRENDS_2015_CZ',]

# Run analysis for the period 1990 - 2004
new_df, dup_df, no_data_df = resa2_trends.run_trend_analysis(proj_list, engine,
                                                             st_yr=1990, end_yr=2004)

# Delete mk_std_dev col as not relevant here
del new_df['mk_std_dev']

new_df.head(14).sort_values(by='par_id')


Extracting data from RESA2...
    Done.

Converting units and applying sea-salt correction...
    Done.

Calculating statistics...
    Done.

Finished.
Out[4]:
station_id par_id period non_missing mean median std_dev mk_stat norm_mk_stat mk_p_val trend sen_slp
0 23499 Al 1990-2004 15 485.085767 503.500000 187.586374 -83.0 -4.057948 0.000050 decreasing -33.200000
5 23499 ECa 1990-2004 15 44.283333 44.000000 5.584470 -75.0 -3.662050 0.000250 decreasing -1.166667
9 23499 ECaX 1990-2004 15 43.489771 43.186000 5.524900 -75.0 -3.662050 0.000250 decreasing -1.179175
12 23499 ECaX_EMgX 1990-2004 15 79.786038 78.874000 9.366281 -71.0 -3.464102 0.000532 decreasing -1.899667
11 23499 ECa_EMg 1990-2004 15 84.783333 84.000000 9.708540 -73.0 -3.563076 0.000367 decreasing -1.981481
3 23499 ECl 1990-2004 15 21.447619 22.000000 3.017861 -44.0 -2.130559 0.033126 decreasing -0.441558
4 23499 EMg 1990-2004 15 40.500000 40.000000 4.170237 -75.0 -3.671052 0.000242 decreasing -0.833333
8 23499 EMgX 1990-2004 15 36.296267 35.930000 3.924888 -67.0 -3.266153 0.001090 decreasing -0.739152
6 23499 ENO3 1990-2004 12 64.866071 66.037500 7.380028 -40.0 -2.674329 0.007488 decreasing -1.278393
2 23499 ESO4 1990-2004 15 95.625000 98.125000 18.527533 -95.0 -4.651794 0.000003 decreasing -4.083333
7 23499 ESO4X 1990-2004 15 93.415895 95.947286 18.336954 -95.0 -4.651794 0.000003 decreasing -3.965619
10 23499 ESO4_ECl 1990-2004 15 117.072619 119.267857 20.531061 -95.0 -4.651794 0.000003 decreasing -4.457792
13 23499 ESO4_ECl_ENO3 1990-2004 12 175.649554 176.662202 23.824901 -50.0 -3.360055 0.000779 decreasing -5.648674
1 23499 TOC 1990-2004 11 1.561364 1.385000 0.554342 4.0 0.234261 0.814783 no trend 0.009286

The results from my code are not exactly the same as the output produced by Tore, but for all practical purposes I'd say the differences are negligible. This is actually pretty surprising given the amount of second-guessing and reverse-engineering that went into developing my code.

The next big question is whether my code will upscale effectively to the full TOC dataset. The RESA2 application usually crashes if a user tries to extract data for more than one large project at a time, but I'm hoping that by bypassing RESA2 and communicating directly with the underlying Oracle instance, my code will not be affected by this issue.

3. Run the trend analysis on the full dataset

This section attempts to run the trends analysis on the full TOC dataset. Initially I'll use the data for all years and I'll also generate plots for each series (which I'll return to later). The projects to be included have been previously agreed with Heleen - see section 3 of this notebook for details.


In [5]:
# Specify projects of interest
proj_list = ['ICPW_TOCTRENDS_2015_CA_ATL',
             'ICPW_TOCTRENDS_2015_CA_DO',
             'ICPW_TOCTRENDS_2015_CA_ICPW',
             'ICPW_TOCTRENDS_2015_CA_NF',
             'ICPW_TOCTRENDS_2015_CA_QU',
             'ICPW_TOCTRENDS_2015_CZ',
             'ICPW_TOCTRENDS_2015_Cz2',
             'ICPW_TOCTRENDS_2015_FI',
             'ICPW_TOCTRENDS_2015_NO',
             'ICPW_TOCTRENDS_2015_SE',
             'ICPW_TOCTRENDS_2015_UK',
             'ICPW_TOCTRENDS_2015_US_LTM',
             'ICPWaters Ca']

# Folder for saving PNG plots
plot_fold=r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015\Trends_Plots'

# Run analysis
res_df, dup_df, no_data_df = resa2_trends.run_trend_analysis(proj_list, engine,
                                                             st_yr=None, end_yr=None,
                                                             plot=True, fold=plot_fold)

# Delete mk_std_dev col as not relevant here
del res_df['mk_std_dev']


Extracting data from RESA2...
    The database contains duplicate values for some station-date-parameter combinations.
    These will be averaged, but you should check the repeated values are not errors.
    The duplicated entries are returned in a separate dataframe.

    Some stations have no relevant data in the period specified. Their IDs are returned in a separate dataframe.

    Done.

Converting units and applying sea-salt correction...
    Done.

Calculating statistics...
    Data series for Al at site 101 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 102 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 103 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 104 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 107 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 109 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 112 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 115 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 118 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 119 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 120 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 121 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 122 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 123 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 128 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 132 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 134 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 135 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 144 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 146 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 147 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 150 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 156 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 158 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 161 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 162 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 163 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 166 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 168 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 170 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 173 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 176 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 179 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 180 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 181 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 182 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 183 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 185 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 192 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 193 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 196 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 12081 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 23546 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 36547 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Data series for Al at site 36560 has fewer than 10 non-null values. Significance estimates may be unreliable.
    Done.

Finished.

3.1. Results

It's good to see the algorithm manages to process all the sites in one go. The table below shows the first 10 rows of the output, but it's the warning messages that I'm most interested in at the moment.


In [6]:
res_df.head(10)


Out[6]:
station_id par_id period non_missing mean median std_dev mk_stat norm_mk_stat mk_p_val trend sen_slp
0 100.0 Al 1986-2015 1 276.433655 276.433655 NaN NaN NaN NaN NaN NaN
1 100.0 TOC 1986-2015 28 6.053929 6.000000 1.277101 221.0 4.351249 1.353642e-05 increasing 0.100000
2 100.0 ESO4 1986-2015 28 37.526786 31.250000 18.794670 -327.0 -6.446919 1.141465e-10 decreasing -2.001984
3 100.0 ECl 1986-2015 28 18.061224 17.142857 3.572260 -120.0 -2.369597 1.780750e-02 decreasing -0.184694
4 100.0 EMg 1986-2015 28 10.711310 10.833333 2.028790 -134.0 -2.652231 7.996170e-03 decreasing -0.133025
5 100.0 ECa 1986-2015 28 18.607143 18.750000 3.090029 -126.0 -2.474722 1.333399e-02 decreasing -0.187500
6 100.0 ENO3 1986-2015 28 2.418367 2.000000 1.526747 41.0 0.791033 4.289247e-01 no trend 0.028571
7 100.0 ESO4X 1986-2015 28 35.666480 29.778571 18.611860 -329.0 -6.481403 9.087331e-11 decreasing -1.945556
8 100.0 EMgX 1986-2015 28 7.171310 6.860667 1.698371 -123.0 -2.410766 1.591906e-02 decreasing -0.102902
9 100.0 ECaX 1986-2015 28 17.938878 18.068143 3.040012 -122.0 -2.390539 1.682367e-02 decreasing -0.182602

3.2. Data issues

3.2.1. Limited Al data

Many of the sites have limited Al data (see the warnings above). This isn't a database error, but it could have implications for our ability to detect trends in this parameter.
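
To quantify the problem, something like the snippet below would list the affected sites from the res_df output above (a sketch; short_al is just an illustrative name):

# Sites where the Al series has fewer than 10 non-null values
short_al = res_df.query("(par_id == 'Al') and (non_missing < 10)")

print 'Number of sites with short Al series:', len(short_al)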

3.2.2. Sites with no data


In [7]:
# Get properties for sites with no data
# Basic station properties
sql = ('SELECT station_id, station_code, station_name '
       'FROM resa2.stations '
       'WHERE station_id in %s' 
       % str(tuple(no_data_df['station_id'].values)))

na_stns = pd.read_sql(sql, engine)

# Get country for each station
sql = ('SELECT station_id, value '
       'FROM resa2.stations_par_values '
       'WHERE station_id in %s '
       'AND var_id = 261'
       % str(tuple(no_data_df['station_id'].values)))

co_df = pd.read_sql(sql, engine)

# Decode special characters from `windows-1252` encoding to unicode
na_stns['station_name'] = na_stns['station_name'].str.decode('windows-1252')

# Join
na_stns = pd.merge(na_stns, co_df, how='left',
                   on='station_id')

na_stns.columns = ['station_id', 'station_code', 'station_name', 'country']

print 'Number of sites with no data:', len(no_data_df)
print 'Sites with no data come from the following countries:'
print list(na_stns.country.unique())

na_stns.head()


Number of sites with no data: 178
Sites with no data come from the following countries:
['Sweden']
Out[7]:
station_id station_code station_name country
0 36573 656612-164132 Årsjön Sweden
1 36574 655275-153234 Älgsjön Sweden
2 36577 656984-164254 Albysjön Sweden
3 36579 665768-164748 Aspdalssjön Sweden
4 36580 656832-161545 Aspen Sweden

So there are 178 Swedish sites (out of 261 in total) that have no data whatsoever for the parameters $SO_4$, $Cl$, $Ca$, $Mg$, $NO_3$, $TOC$ or $Al$. It seems strange to have these included in a project focussing on TOC trends! Perhaps they're included due to having data for other useful parameters? Let's check.


In [8]:
# Find ANY parameters associated with the 178 Swedish sites
sql = ('SELECT * '
       'FROM resa2.water_samples '
       'WHERE station_id in %s' 
       % str(tuple(no_data_df['station_id'].values)))

swed_df = pd.read_sql(sql, engine)

print 'Number of samples associated with the 178 Swedish sites (for ALL parameters):', len(swed_df)


Number of samples associated with the 178 Swedish sites (for ALL parameters): 0

So it appears that more than two-thirds of the Swedish sites in the database have no data at all. Is this expected, or has something gone wrong here?

3.2.3. Duplicated values

Some duplicate values are expected, because at some sites samples are taken at a variety of depths. My algorithm currently only selects samples from the upper 1 m of the water column, but this occasionally spans two depth measurements (usually 0 and 0.5 m). For the moment, it seems reasonable to average these values (and in most cases they are near-identical anyway), but check this with Heleen and modify the code if necessary.
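
For reference, the averaging itself is just a groupby-mean over station, date and parameter, as sketched below. The dataframe is a made-up example mirroring the structure of dup_df, not the actual code from toc_trends_analysis.py.

import pandas as pd

# Made-up example: one duplicated station-date-parameter combination
raw_df = pd.DataFrame({'station_id': [103, 103, 103],
                       'sample_date': ['1974-10-29'] * 3,
                       'name': ['Ca', 'Ca', 'Cl'],
                       'value': [1.20, 1.30, 4.40]})

# Duplicates are averaged; unique combinations pass through unchanged
avg_df = raw_df.groupby(['station_id', 'sample_date', 'name'],
                        as_index=False)['value'].mean()

Here the two Ca values are averaged to 1.25, while the single Cl value passes through unchanged.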


In [9]:
print 'Total number of duplicated records:', len(dup_df)

dup_df.head(10)


Total number of duplicated records: 2617
Out[9]:
station_id sample_date name value
1270 103.0 1974-10-29 Ca 1.25
57232 103.0 1974-10-29 Ca 1.25
1271 103.0 1974-10-29 Cl 4.40
57233 103.0 1974-10-29 Cl 4.40
1272 103.0 1974-10-29 Mg 0.74
57234 103.0 1974-10-29 Mg 0.74
1273 103.0 1974-10-29 NO3-N 90.00
57235 103.0 1974-10-29 NO3-N 90.00
6 104.0 1974-11-05 Ca 2.07
57242 104.0 1974-11-05 Ca 2.07

Not all the duplicates can be explained by measuring at multiple depths, though, and occasionally the repeated values are significantly different. As an example, consider records for site 28970 (which corresponds to site code NF02YH0013, in Newfoundland, Canada).


In [10]:
# Get the duplicated values for this site
dup_df.query('station_id == 28970').head(10)


Out[10]:
station_id sample_date name value
125050 28970.0 1990-06-29 NO3-N 25.0000
125053 28970.0 1990-06-29 NO3-N 110.0000
125052 28970.0 1990-06-29 TOC 0.3125
125054 28970.0 1990-06-29 TOC 0.6400
125059 28970.0 1990-10-18 NO3-N 23.0000
125062 28970.0 1990-10-18 NO3-N 100.0000
125061 28970.0 1990-10-18 TOC 0.8750
125063 28970.0 1990-10-18 TOC 0.8960
125068 28970.0 1991-06-19 NO3-N 25.0000
379302 28970.0 1991-06-19 NO3-N 110.0000

There are lots of duplicated records here, and the values are not the same. Let's have a look at the database results for the first water sample in this series (the one from 29/06/1990), where we apparently have repeated measurements for NO3-N and TOC.


In [11]:
# Get all methods applied to sample(s) from 29/06/1990 at station 28970
sql = ("SELECT value_id, sample_id, method_id, value, flag1 "
       "FROM resa2.water_chemistry_values2 "
       "WHERE sample_id IN (SELECT water_sample_id FROM resa2.water_samples "
                           "WHERE station_id = 28970 "
                           "AND sample_date = DATE '1990-06-29')")

df = pd.read_sql(sql, engine)
df


Out[11]:
value_id sample_id method_id value flag1
0 7440578 328339 10249 79.0000 None
1 3621153 328339 10251 0.2200 None
2 3621154 328339 10253 2.5000 None
3 7440581 328339 10256 15.0000 None
4 3621155 328339 10258 0.1400 None
5 7440583 328339 10260 17.1000 None
6 3621156 328339 10261 0.2200 None
7 3621157 328339 10263 1.5000 None
8 7440586 328339 10265 25.0000 None
9 3621158 328339 10268 5.1000 None
10 3621159 328339 10271 1.0000 None
11 3621160 328339 10273 0.3125 None
12 7440589 328339 10274 100.0000 None
13 7440591 328339 10298 -6.0000 None
14 3621161 328339 10308 0.1100 None
15 3621162 328339 10311 -0.3000 <
16 7440590 328339 10823 0.5000 None

Note that all these records have the same sample_id, which means the problem is not related to samples being entered into the database more than once. Instead, the problem seems to be due to entering multiple methods for the same parameter: method_id = 10265 corresponds to NO3-N concentrations measured in $\mu g/l$, whereas method_id = 10308 refers to NO3-N in $mg/l$. As the data is extracted for the trends analysis, my code automatically converts all NO3-N measurements into $\mu g/l$, which is why we end up with duplicate values of 25 and 110 in the trends results (see above). The puzzle is where these values come from in the first place: $110 \; \mu g/l$ is, after all, a lot more than $25 \; \mu g/l$.
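
To illustrate, harmonising the two NO3-N rows above amounts to a method-specific multiplication, along the lines of the sketch below. The factor lookup here is hypothetical (the real code derives the conversions from the method and parameter definitions in the database), but the two method IDs and their units are as described above.

# Hypothetical lookup: factors converting each method's native units
# into RESA2's standard units for NO3-N (ug/l)
unit_factors = {10265: 1.,     # method 10265 reports NO3-N in ug/l
                10308: 1000.}  # method 10308 reports NO3-N in mg/l

no3_df = df[df['method_id'].isin(unit_factors.keys())].copy()
no3_df['value_ugl'] = no3_df['value'] * no3_df['method_id'].map(unit_factors)

Applied to the dataframe above, this gives values of 25.0 and 110.0 $\mu g/l$ - exactly the pair of "duplicates" seen earlier for this sample.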

Looking at the database log, the value of $0.110 \; mg/l$ was uploaded on 16/02/2006, whereas the value of $25 \; \mu g/l$ was added on 16/11/2015. The only raw data I can find on the network for this location is here:

K:\Prosjekter\langtransporterte forurensninger\O-23300 - ICP-WATERS - HWI\Database\2015 DOC analysis\data delivery\CA\Couture\ICP Waters form for water chemistry_ATL_NF.xlsx

which gives the nitrate concentration as $25 \; \mu g/l$ (i.e. consistent with the most recent database upload). At present, I can't identify where the older values have come from, and I'm reluctant to delete them without further clarification. Ask Heleen about this, but for now I'll just stick with averaging duplicated values because I have no obvious way of choosing which one(s) are correct.

4. Data visualisation

I'd like to try producing a Google map to visualise the results of the statistical analysis. If I automate this as much as possible, it should be easy to re-run the code once I'm happy with the basic input data.

I'll start off by limiting the dataset so that we're only visualising the following parameters (and only in cases where there are more than 10 data values in the series):

  • ESO4 (μeq/l)
  • ESO4X (μeq/l)
  • ECl (μeq/l)
  • ESO4_ECl (μeq/l)
  • ECa_EMg (μeq/l)
  • ECaX_EMgX (μeq/l)
  • ENO3 (μeq/l)
  • ESO4_ECl_ENO3 (μeq/l)
  • TOC (mgC/l)
  • Al (mg/l)

In [19]:
# Extract subset of results to visualise
par_str = str(('ESO4', 'ESO4X', 'ECl', 
               'ESO4_ECl', 'ECa_EMg', 
               'ECaX_EMgX', 'ENO3', 'ESO4_ECl_ENO3', 
               'TOC', 'Al'))

vis_df = res_df.query("(par_id in %s) and (non_missing >= 10)" % par_str)

print 'Total number of stations to visualise:', len(vis_df['station_id'].unique())

vis_df.head()


Total number of stations to visualise: 428
Out[19]:
station_id par_id period non_missing mean median std_dev mk_stat norm_mk_stat mk_p_val trend sen_slp
1 100.0 TOC 1986-2015 28 6.053929 6.000000 1.277101 221.0 4.351249 1.353642e-05 increasing 0.100000
2 100.0 ESO4 1986-2015 28 37.526786 31.250000 18.794670 -327.0 -6.446919 1.141465e-10 decreasing -2.001984
3 100.0 ECl 1986-2015 28 18.061224 17.142857 3.572260 -120.0 -2.369597 1.780750e-02 decreasing -0.184694
6 100.0 ENO3 1986-2015 28 2.418367 2.000000 1.526747 41.0 0.791033 4.289247e-01 no trend 0.028571
7 100.0 ESO4X 1986-2015 28 35.666480 29.778571 18.611860 -329.0 -6.481403 9.087331e-11 decreasing -1.945556

In [20]:
# Join in station details

# Basic station properties
sql = ('SELECT station_id, station_code, station_name, latitude, longitude '
       'FROM resa2.stations '
       'WHERE station_id in %s' 
       % str(tuple(vis_df['station_id'].unique())))

vis_stns = pd.read_sql(sql, engine)

# Get country for each station
sql = ('SELECT station_id, value '
       'FROM resa2.stations_par_values '
       'WHERE station_id in %s '
       'AND var_id = 261'
       % str(tuple(vis_df['station_id'].unique())))

co_df = pd.read_sql(sql, engine)

# Decode special characters from `windows-1252` encoding to unicode
vis_stns['station_name'] = vis_stns['station_name'].str.decode('windows-1252')

# Join
vis_stns = pd.merge(vis_stns, co_df, how='left',
                    on='station_id')

vis_stns.columns = ['station_id', 'station_code', 'station_name', 
                    'latitude', 'longitude', 'country']

# Join to stats output
vis_df = pd.merge(vis_df, vis_stns, how='left',
                  on='station_id')

# Reorder
vis_df = vis_df[['station_id', 'station_code', 'station_name', 'country',
                 'latitude', 'longitude', 'par_id', 'period', 'non_missing',
                 'mean', 'median', 'std_dev', 'mk_stat', 'norm_mk_stat',
                 'mk_p_val', 'trend', 'sen_slp']]

vis_df.head()


Out[20]:
station_id station_code station_name country latitude longitude par_id period non_missing mean median std_dev mk_stat norm_mk_stat mk_p_val trend sen_slp
0 100.0 623-603 Breidlivatnet Norway 59.977669 10.152037 TOC 1986-2015 28 6.053929 6.000000 1.277101 221.0 4.351249 1.353642e-05 increasing 0.100000
1 100.0 623-603 Breidlivatnet Norway 59.977669 10.152037 ESO4 1986-2015 28 37.526786 31.250000 18.794670 -327.0 -6.446919 1.141465e-10 decreasing -2.001984
2 100.0 623-603 Breidlivatnet Norway 59.977669 10.152037 ECl 1986-2015 28 18.061224 17.142857 3.572260 -120.0 -2.369597 1.780750e-02 decreasing -0.184694
3 100.0 623-603 Breidlivatnet Norway 59.977669 10.152037 ENO3 1986-2015 28 2.418367 2.000000 1.526747 41.0 0.791033 4.289247e-01 no trend 0.028571
4 100.0 623-603 Breidlivatnet Norway 59.977669 10.152037 ESO4X 1986-2015 28 35.666480 29.778571 18.611860 -329.0 -6.481403 9.087331e-11 decreasing -1.945556

In section 3, I generated a very large number of plots. Only some of them are relevant, so to save storage space online I'll delete the ones that aren't relevant to the data in vis_df.


In [14]:
# Delete unnecessary plots
# Folder path
png_fold = r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015\Trends_Plots'

# Paths to keep
keep_list = []
for rec in vis_df.itertuples():
    stn_id = rec.station_id
    par_id = rec.par_id
    period = rec.period
    
    keep_path = os.path.join(png_fold,
                             '%s_%s_%s.png' % (int(stn_id), par_id, period))
    keep_list.append(keep_path)

# Get a list of files in folder
search_path = os.path.join(png_fold, '*.png')
file_list = glob.glob(search_path)
                             
# Loop over files and delete where necessary
for file_path in file_list:
    if file_path not in keep_list:
        os.remove(file_path)

I've uploaded the remaining plots to a location online, which means I can link to them from the map visualisation. To do this, I need to insert the link to each plot as a new column in vis_df. I also want to add a column specifying the correct Google Fusion Tables marker symbol according to the nature of the trend and, finally, I'd like to capitalise the trend column to improve the formatting on my finished map. The whole thing then needs to be saved as a CSV.


In [21]:
# Capitalise 'trend' column
vis_df['trend'] = vis_df['trend'].str.capitalize()

# Join in text for GFT markers
mark_df = pd.DataFrame({'trend':['Increasing', 'No trend', 'Decreasing'],
                        'symbol':['small_red', 'small_yellow', 'small_green']})

vis_df = pd.merge(vis_df, mark_df, how='left',
                  on='trend')

# Build the link path
link_stem = r'http://www.googledrive.com/host/0BximeC_RweaebnFoa2VWcGtRWms/'

vis_df['link'] = (link_stem + vis_df['station_id'].map(int).map(str) + '_' +
                  vis_df['par_id'] + '_' + vis_df['period'] + '.png')

# Save 
out_csv = r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015\Trends_Plots\vis_data.csv'
vis_df.to_csv(out_csv, index=False, encoding='windows-1252')

I've converted the output from all this into a Google Fusion Table, which can be found here.

5. Summary to date

This section summarises the work I've done so far with RESA2. The sub-headings can be compared to my RESA2 "to do" list, which can be found on the network here:

K:\Prosjekter\langtransporterte forurensninger\O-23300 - ICP-WATERS - HWI\Database\JamesS\RESA2_To_Do_HWI.docx

5.1. Review of data

I've worked my way through all the ICP-Waters data in the database and have cleaned things up where possible. The notebooks here and here describe the progress to date and the main outstanding issues. In particular, I've deleted and reloaded a large proportion of the Czech water samples, many of which were associated with the wrong site codes. I've also identified a number of sites (mostly Swedish and Canadian) for which we have incomplete site metadata and/or very limited water chemistry measurements. Where data are missing, I've contacted the relevant Focal Centres to determine whether more complete information exists and can be made available.

There are still a number of datasets on the network which have not been added to the database. Most of these are associated with locations where the main data contact is no longer very responsive (e.g. Hungary or Slovakia) and where there are issues regarding the raw data that has been supplied. In addition, datasets for Russia and the Netherlands have been supplied relatively recently, but in a format that will require significant work before they can be added to the database. I haven't got around to this yet.

5.2. Review US sites

I've tidied up the US sites in the database based on the information in John's e-mails - full details are in this notebook. There are still a few (fairly minor) outstanding issues, but I've contacted John about these.

5.3. Land cover proportions and mean catchment elevation

I've added this information where possible and have also included a request for improved site metadata in the 2016 "call for data". I've already had responses from e.g. Italy and Switzerland, which have allowed me to substantially improve the station descriptions for these countries. Hopefully others will respond similarly later in the year.

For the Norwegian sites, I've done a bit of watershed processing to refine the catchment properties (see section 2 of this notebook), but there is still a substantial amount of work required before we have a complete set of metadata for the Norwegian catchments. Heleen thinks this may be worth doing at some point, but it's on hold for now due to lack of time and resources.

5.4. Correct Newfoundland data

This has been done, as described in section 5 of this notebook.

5.5. Add "country" as a metadata variable

Done.

5.6. Calculate distance from coast

Done - see section 4 of this notebook and section 1 of this notebook for details.

5.7. Create a map of the project sites

Done. The map can be accessed here and includes a simple visualisation of the results of the trend analyses (see below).

5.8. Run the new trend analysis

As described here and in this notebook, I've attempted to recode the entire trends analysis. This work is not finished yet (I need to extend the code to include a few more parameters), but the basic framework is complete and seems to be working OK. The code is easy to re-run and so, as long as Heleen, Don and John are happy with the basic output, it should be fairly straightforward to repeat and adapt the trends analysis as necessary. It would be useful to have a discussion on how the output could be modified and extended to make it more useful.

5.9. Send out 2016 "call for data"

I've e-mailed the Focal Centre contacts and have had several replies already (e.g. from Italy and Switzerland). All the new data I've received has been added to the database.

To check the progress of the 2016 call for any particular country, see the contacts sheet of the Excel file here:

C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Call_for_Data_2016\icpw_all_sites.xlsx

which keeps track of all the messages I've sent and the replies I've received to date.