In [1]:
%matplotlib inline
import pandas as pd, imp
from sqlalchemy import create_engine
Tore has previously written code to calculate Mann-Kendall (M-K) trend statistics and Sen's slope estimates for data series in RESA2. According to my notes from a meeting with Tore on 13/05/2016, the workflow goes something like this:
1. Run code to extract and summarise time series from RESA2, insert this data into Mann-Kendall_Sen.xls, then read the results back into a new table in RESA2 called e.g. ICPW_STATISTICS.
2. Run the ICPStat query in Icp-waters2001_2000.accdb to summarise the data in ICPW_STATISTICS. This creates a new table currently called aaa, but Tore says he'll rename it to something more descriptive before he leaves.
3. Run the export() subroutine in the Export module of Icp-waters2001_2000.accdb to reformat the aaa table and write the results to an Excel file.
Mann-Kendall_Sen.xls is an early version of the popular Excel macro MULTIMK/CONDMK, which Tore has modified slightly for use in this analysis. (A more recent version of the same file is available here). This Excel macro permits some quite sophisticated multivariate and conditional analyses, but as far as I can tell the TOC trends code is only making use of the most basic functionality - performing repeated independent trend tests on annually summarised time series.
Unfortunately, although the workflow above makes sense, I've so far failed to find and run Tore's code for step 1 (I can find everything for steps 2 and 3, but not the code for interacting with the Excel workbook). It also seems a bit messy to be switching back and forth between RESA2, Excel and Access in this way, so the code here is a first step towards refactoring the whole analysis into Python.
The Mann-Kendall_Sen.xls file on the network already had some example ICPW data in it, which I can use to test my code. The raw input data and the results obtained from the Excel macro are saved as mk_sen_test_data.xlsx.
In [2]:
# Read data and results from the Excel macro
in_xlsx = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015'
r'\Data\mk_sen_test_data.xlsx')
raw_df = pd.read_excel(in_xlsx, sheetname='input')
res_df = pd.read_excel(in_xlsx, sheetname='results')
raw_df
Out[2]:
In [3]:
res_df
Out[3]:
Looking at the output in the ICPW_STATISTICS3 table of RESA2, we need to calculate the following statistics (only some of which are output by the Excel macro): the number of non-missing values, the median, mean and standard deviation, the period of record, the M-K statistic, the normalised M-K statistic and its standard deviation, the M-K p-value, the trend direction and the Sen's slope.
Most of these should be quite straightforward. We'll start off by defining a function to calculate the M-K statistic (note that Scipy already has a function for the Theil-Sen slope). We'll also define another function to bundle everything together and return a dataframe of the results.
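For reference, the quantities calculated by the mk_test function below are the usual M-K test statistic

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} sgn(x_j - x_k)$$

its tie-corrected variance

$$Var(S) = \frac{n(n-1)(2n+5) - \sum_{i=1}^{g} t_i(t_i-1)(2t_i+5)}{18}$$

(where $n$ is the number of values, $g$ is the number of distinct values and $t_i$ is the number of values tied at the $i$-th distinct value), and the normalised statistic $z = (S - 1)/\sqrt{Var(S)}$ for $S > 0$ (or $(S + 1)/\sqrt{Var(S)}$ for $S < 0$), which is compared to the standard normal distribution to get the p-value.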
In [4]:
def mk_test(x, stn_id, par, alpha=0.05):
""" Adapted from http://pydoc.net/Python/ambhas/0.4.0/ambhas.stats/
by Sat Kumar Tomer.
Perform the MK test for monotonic trends. Uses the "normal
approximation" to determine significance and therefore should
only be used if the number of values is >= 10.
Args:
x: 1D array of data
stn_id: Station ID for the series (used in the warning message)
par: Parameter name for the series (used in the warning message)
alpha: Significance level
Returns:
var_s: Variance of test statistic
s: M-K test statistic
z: Normalised test statistic
p: p-value of the significance test
trend: Whether to reject the null hypothesis (no trend) at
the specified significance level. One of:
'increasing', 'decreasing' or 'no trend'
"""
import numpy as np
from scipy.stats import norm
n = len(x)
if n < 10:
print (' Data series for %s at site %s has fewer than 10 non-null values. '
'Significance estimates may be unreliable.' % (par, int(stn_id)))
# calculate S
s = 0
for k in xrange(n-1):
for j in xrange(k+1,n):
s += np.sign(x[j] - x[k])
# calculate the unique data
unique_x = np.unique(x)
g = len(unique_x)
# calculate the var(s)
if n == g: # there is no tie
var_s = (n*(n-1)*(2*n+5))/18.
else: # there are some ties in data
tp = np.zeros(unique_x.shape)
for i in xrange(len(unique_x)):
tp[i] = sum(unique_x[i] == x)
# Sat Kumar's code has "+ np.sum", which is incorrect
var_s = (n*(n-1)*(2*n+5) - np.sum(tp*(tp-1)*(2*tp+5)))/18.
if s>0:
z = (s - 1)/np.sqrt(var_s)
elif s == 0:
z = 0
elif s<0:
z = (s + 1)/np.sqrt(var_s)
else:
z = np.nan
# calculate the p_value
p = 2*(1-norm.cdf(abs(z))) # two tail test
h = abs(z) > norm.ppf(1-alpha/2.)
if (z<0) and h:
trend = 'decreasing'
elif (z>0) and h:
trend = 'increasing'
elif np.isnan(z):
trend = np.nan
else:
trend = 'no trend'
return var_s, s, z, p, trend
def wc_stats(raw_df, st_yr=None, end_yr=None):
""" Calculate key statistics for the TOC trends analysis:
'station_id'
'par_id'
'non_missing'
'median'
'mean'
'std_dev'
'period'
'mk_std_dev'
'mk_stat'
'norm_mk_stat'
'mk_p_val'
'trend'
'sen_slp'
Args:
raw_df: Dataframe with annual data for a single station. Columns must
be: [station_id, year, par1, par2, ... parn]
st_yr: First year to include in analysis. Pass None to start
at the beginning of the series
end_yr: Last year to include in analysis. Pass None to continue
to the end of the series
Returns:
df of key statistics.
"""
import numpy as np, pandas as pd
from scipy.stats import theilslopes
# Checking
df = raw_df.copy()
assert list(df.columns[:2]) == ['STATION_ID', 'YEAR'], 'Columns must be: [STATION_ID, YEAR, par1, par2, ... parn]'
assert len(df['STATION_ID'].unique()) == 1, 'You can only process data for one site at a time'
# Get just the period of interest
if st_yr:
df = df.query('YEAR >= @st_yr')
if end_yr:
df = df.query('YEAR <= @end_yr')
# Get stn_id
stn_id = df['STATION_ID'].iloc[0]
# Tidy up df
df.index = df['YEAR']
df.sort_index(inplace=True)
del df['STATION_ID'], df['YEAR']
# Container for results
data_dict = {'station_id':[],
'par_id':[],
'non_missing':[],
'median':[],
'mean':[],
'std_dev':[],
'period':[],
'mk_std_dev':[],
'mk_stat':[],
'norm_mk_stat':[],
'mk_p_val':[],
'trend':[],
'sen_slp':[]}
# Loop over pars
for col in df.columns:
# 1. Station ID
data_dict['station_id'].append(stn_id)
# 2. Par ID
data_dict['par_id'].append(col)
# 3. Non-missing
data_dict['non_missing'].append(pd.notnull(df[col]).sum())
# 4. Median
data_dict['median'].append(df[col].median())
# 5. Mean
data_dict['mean'].append(df[col].mean())
# 6. Std dev
data_dict['std_dev'].append(df[col].std())
# 7. Period
st_yr = df.index.min()
end_yr = df.index.max()
per = '%s-%s' % (st_yr, end_yr)
data_dict['period'].append(per)
# 8. M-K test
# Drop missing values
mk_df = df[[col]].dropna(how='any')
# Only run stats if more than 1 valid value
if len(mk_df) > 1:
var_s, s, z, p, trend = mk_test(mk_df[col].values, stn_id, col)
data_dict['mk_std_dev'].append(np.sqrt(var_s))
data_dict['mk_stat'].append(s)
data_dict['norm_mk_stat'].append(z)
data_dict['mk_p_val'].append(p)
data_dict['trend'].append(trend)
# 9. Sen's slope
# First element of output gives median slope. Other results could
# also be useful - see docs
sslp = theilslopes(mk_df[col].values, mk_df.index, 0.95)[0]
data_dict['sen_slp'].append(sslp)
# Otherwise all NaN
else:
for par in ['mk_std_dev', 'mk_stat', 'norm_mk_stat',
'mk_p_val', 'trend', 'sen_slp']:
data_dict[par].append(np.nan)
# Build to df
res_df = pd.DataFrame(data_dict)
res_df = res_df[['station_id', 'par_id', 'period', 'non_missing',
'mean', 'median', 'std_dev', 'mk_stat', 'norm_mk_stat',
'mk_p_val', 'mk_std_dev', 'trend', 'sen_slp']]
return res_df
In [5]:
# Run analysis on test data and print results
out_df = wc_stats(raw_df)
del out_df['station_id']
out_df
Out[5]:
And below is the output from the Excel macro for comparison.
In [6]:
res_df
Out[6]:
My code gives near-identical results to those from the Excel macro, although there are a few edge cases that might be worth investigating further. For example, if there are fewer than 10 non-null values, my code currently prints a warning. I'm not sure exactly what the Excel macro does yet, but in general it seems that for fewer than 10 values it's necessary to use a lookup table (see e.g. the Instructions sheet of the file here).
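As a possible alternative to the lookup table, for very small samples the exact null distribution of S can be generated by brute force. The sketch below is just an illustration (not part of the workflow above) and is only practical for small n, since it enumerates all n! orderings of the series.

import itertools
import numpy as np

def mk_exact_p(x):
    """ Two-sided p-value for the M-K S statistic based on the exact
    permutation distribution. Illustrative only - very slow for
    n > 8 or so, because all n! orderings are enumerated.
    """
    x = np.asarray(x)
    n = len(x)

    def s_stat(arr):
        # M-K S statistic for a single ordering
        return sum(np.sign(arr[j] - arr[k])
                   for k in range(n - 1) for j in range(k + 1, n))

    s_obs = s_stat(x)
    s_null = [s_stat(np.asarray(perm)) for perm in itertools.permutations(x)]

    # Proportion of orderings with |S| at least as extreme as observed
    p = np.mean([abs(s) >= abs(s_obs) for s in s_null])

    return s_obs, p

# e.g. mk_exact_p([3.1, 2.7, 3.5, 4.0, 3.9, 4.2])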
The next step is to read the correct data directly from RESA2 and summarise it to look like raw_df, above. Start off by connecting to the database.
In [7]:
# Use custom RESA2 function to connect to db
r2_func_path = r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\Upload_Template\useful_resa2_code.py'
resa2 = imp.load_source('useful_resa2_code', r2_func_path)
engine, conn = resa2.connect_to_resa2()
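I haven't included the connection details here, but connect_to_resa2 presumably just builds an Oracle connection string and returns a SQLAlchemy engine plus an open connection, something like the sketch below (the user, password and DSN are placeholders, not the real values, which live in useful_resa2_code.py).

from sqlalchemy import create_engine

def connect_to_resa2_sketch(user='resa2_user', pw='********', dsn='dbhost:1521/resa2'):
    """ Hypothetical sketch of the custom connection function. """
    conn_str = 'oracle+cx_oracle://%s:%s@%s' % (user, pw, dsn)
    engine = create_engine(conn_str)
    conn = engine.connect()
    return engine, conn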
Looking at the ICPW_STATISTICS table in RESA2, it seems as though trends have been assessed for 14 parameters and several different time periods for each site of interest. The length and number of time periods vary from site to site, so I'll need to check with Heleen regarding how these variables should be chosen. The 14 parameters are as follows:
Many of these quantities are unfamiliar to me, but presumably the equations for calculating them can be found in Tore's code (which I can't find at present). Check with Heleen whether all of these are still required and find equations as necessary.
The other issue is how to aggregate the values in the database from their original temporal resolution to annual summaries. I assume the median annual value is probably appropriate in most cases, but it would be good to know what Tore did previously.
For now, I'll focus on:
1. Extracting the data from the database for a specified time period,
2. Calculating the required water chemistry parameters,
3. Taking annual medians and
4. Estimating the trend statistics.
It should then be fairly easy to modify this code later as necessary.
Some of the quantities listed above are straightforward to calculate.
The Es in the parameter names are just unit conversions to micro-equivalents per litre:
$$EPAR \; (\mu eq/l) = \frac{10^6 \times valency}{molar \; mass \; (g/mol)} \times PAR \; (g/l)$$

Molar masses and valencies for the key species listed above are given in the table below.
In [8]:
# Tabulate chemical properties
chem_dict = {'molar_mass':[96, 35, 40, 24, 14],
'valency':[2, 1, 2, 2, 1],
'resa2_ref_ratio':[0.103, 1., 0.037, 0.196, 'N/A']}
chem_df = pd.DataFrame(chem_dict, index=['SO4', 'Cl', 'Ca', 'Mg', 'NO3-N'])
chem_df = chem_df[['molar_mass', 'valency', 'resa2_ref_ratio']]
chem_df
Out[8]:
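As a quick worked example using the values in this table (and a made-up concentration), 5 mg/l of sulphate converts to roughly 104 µeq/l, which is the same as applying the factor valency * 1000 / molar_mass to the mg/l values, as done further below.

# Hypothetical example: convert 5 mg/l SO4 to ueq/l
so4_gl = 5. / 1000.                  # made-up concentration, converted from mg/l to g/l
eso4 = 1e6 * 2 / 96. * so4_gl        # valency 2, molar mass 96 g/mol
print(eso4)                          # ~104.2 ueq/l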
The Xs are sea-salt corrected values (also sometimes denoted with an asterisk e.g. Ca*). They are calculated by comparison to chloride concentrations, which are generally assumed to be conservative. The usual equation is:
$$PARX = PAR_{sample} - \left[ \left( \frac{PAR}{Cl} \right)_{ref} \times Cl_{sample} \right]$$

where $PAR_{sample}$ and $Cl_{sample}$ are the concentrations measured in the lake or river and $\left( \frac{PAR}{Cl} \right)_{ref}$ is (ideally) the long-term average ratio in incoming rainwater. In some cases the reference values are simply taken from sea water concentrations (ignoring effects such as evaporative fractionation etc.).
I'm not sure what values to assume, but by rearranging the above equation and applying it to data extracted from RESA2 I can back-calculate the reference values. For example, brief testing using data from Italy, Switzerland and the Czech Republic implies that RESA2 uses a standard reference value for sulphate of 0.103.
The reference ratios inferred from RESA2 for the key species listed are given in the table above.
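For illustration, the back-calculation just rearranges the correction equation to give $\left( \frac{PAR}{Cl} \right)_{ref} = (PAR_{sample} - PARX) / Cl_{sample}$. With made-up numbers:

# Made-up values in ueq/l, purely to illustrate the rearrangement
eso4, ecl, eso4x = 120., 250., 94.25
ref_ratio = (eso4 - eso4x) / ecl
print(ref_ratio)    # 0.103, i.e. the standard SO4 reference value mentioned above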
NB: In doing this I've identified some additional errors in the database, where this correction has not been performed correctly. For some reason, ESO4X values have been set to zero, despite valid ESO4 and ECl measurements being available. The problem only affects a handful of samples, but could be enough to generate false trends. Return to this later?
NB2: Leah's experiences with the RECOVER project suggest that assuming a single reference concentration for all countries in the world is a bad idea. For example, I believe that in the Czech Republic and Italy it is usual not to calculate sea-salt corrected concentrations at all, because most of the chloride input comes from industry rather than marine sources. Rainwater concentrations are also likely to vary dramatically from place to place, especially given the range of geographic and climatic conditions covered by this project. Check with Heleen.
I still need to work out how to calculate ANC, ALK, HPLUS and ENO3DIVENO3ESO4X.
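As a starting point for two of these (to be checked against Tore's code and with Heleen), I'd assume the textbook definitions: ANC as the charge balance of base cations minus strong acid anions, and H+ back-calculated from pH. A minimal sketch with made-up values:

# Assumed (textbook) definitions - these may not match what Tore did
# Hypothetical single-sample values in ueq/l, plus pH
eca, emg, ena, ek = 150., 40., 60., 10.
ecl, eso4, eno3 = 80., 95., 12.
ph = 5.6

anc = (eca + emg + ena + ek) - (ecl + eso4 + eno3)   # ueq/l
hplus = 1e6 * 10**(-ph)                              # ueq/l

print(anc, hplus)    # 73.0 and ~2.5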
In [9]:
# Get stations for a specified list of projects
proj_list = ['ICPW_TOCTRENDS_2015_CZ', 'ICPW_TOCTRENDS_2015_IT']
sql = ('SELECT station_id, station_code '
'FROM resa2.stations '
'WHERE station_id IN (SELECT UNIQUE(station_id) '
'FROM resa2.projects_stations '
'WHERE project_id IN (SELECT project_id '
'FROM resa2.projects '
'WHERE project_name IN %s))'
% str(tuple(proj_list)))
stn_df = pd.read_sql(sql, engine)
stn_df
Out[9]:
In [10]:
# Specify parameters of interest
par_list = ['SO4', 'Cl', 'Ca', 'Mg', 'NO3-N', 'TOC', 'Al']
if 'DOC' in par_list:
print ('The database treats DOC and TOC similarly.\n'
'You should probably enter "TOC" instead')
# Check pars are valid
if len(par_list)==1:
sql = ("SELECT * FROM resa2.parameter_definitions "
"WHERE name = '%s'" % par_list[0])
else:
sql = ('SELECT * FROM resa2.parameter_definitions '
'WHERE name in %s' % str(tuple(par_list)))
par_df = pd.read_sql_query(sql, engine)
assert len(par_df) == len(par_list), 'One or more parameters not valid.'
In [11]:
# Get results for ALL pars for sites and period of interest
if len(stn_df)==1:
sql = ("SELECT * FROM resa2.water_chemistry_values2 "
"WHERE sample_id IN (SELECT water_sample_id FROM resa2.water_samples "
"WHERE station_id = %s)"
% stn_df['station_id'].iloc[0])
else:
sql = ("SELECT * FROM resa2.water_chemistry_values2 "
"WHERE sample_id IN (SELECT water_sample_id FROM resa2.water_samples "
"WHERE station_id IN %s)"
% str(tuple(stn_df['station_id'].values)))
wc_df = pd.read_sql_query(sql, engine)
# Get all sample dates for sites and period of interest
if len(stn_df)==1:
sql = ("SELECT water_sample_id, station_id, sample_date "
"FROM resa2.water_samples "
"WHERE station_id = %s " % stn_df['station_id'].iloc[0])
else:
sql = ("SELECT water_sample_id, station_id, sample_date "
"FROM resa2.water_samples "
"WHERE station_id IN %s " % str(tuple(stn_df['station_id'].values)))
samp_df = pd.read_sql_query(sql, engine)
# Join in par IDs based on method IDs
sql = ('SELECT * FROM resa2.wc_parameters_methods')
meth_par_df = pd.read_sql_query(sql, engine)
wc_df = pd.merge(wc_df, meth_par_df, how='left',
left_on='method_id', right_on='wc_method_id')
# Get just the parameters of interest
wc_df = wc_df.query('wc_parameter_id in %s' % str(tuple(par_df['parameter_id'].values)))
# Join in sample dates
wc_df = pd.merge(wc_df, samp_df, how='left',
left_on='sample_id', right_on='water_sample_id')
# Join in parameter units
sql = ('SELECT * FROM resa2.parameter_definitions')
all_par_df = pd.read_sql_query(sql, engine)
wc_df = pd.merge(wc_df, all_par_df, how='left',
left_on='wc_parameter_id', right_on='parameter_id')
# Join in station codes
wc_df = pd.merge(wc_df, stn_df, how='left',
left_on='station_id', right_on='station_id')
# Convert units
wc_df['value'] = wc_df['value'] * wc_df['conversion_factor']
# Extract columns of interest
wc_df = wc_df[['station_id', 'sample_date', 'name', 'value']]
# Unstack
wc_df.set_index(['station_id', 'sample_date', 'name'], inplace=True)
wc_df = wc_df.unstack(level='name')
wc_df.columns = wc_df.columns.droplevel()
wc_df.reset_index(inplace=True)
wc_df.columns.name = None
wc_df.head()
Out[11]:
In [12]:
# Extract year from date column
wc_df['year'] = wc_df['sample_date'].map(lambda x: x.year)
del wc_df['sample_date']
# Groupby station_id and year
grpd = wc_df.groupby(['station_id', 'year'])
# Calculate median
wc_df = grpd.agg('median')
wc_df.head()
Out[12]:
In [13]:
# 1. Convert to ueq/l
for par in ['SO4', 'Cl', 'Mg', 'Ca', 'NO3-N']:
val = chem_df.ix[par, 'valency']
mm = chem_df.ix[par, 'molar_mass']
if par == 'NO3-N':
wc_df['ENO3'] = wc_df[par] * val / mm
else:
wc_df['E%s' % par] = wc_df[par] * val * 1000. / mm
# 2. Apply sea-salt correction
for par in ['ESO4', 'EMg', 'ECa']:
ref = chem_df.ix[par[1:], 'resa2_ref_ratio']
wc_df['%sX' % par] = wc_df[par] - (ref*wc_df['ECl'])
# 3. Calculate combinations
# 3.1. ESO4 + ECl
wc_df['ESO4_ECl'] = wc_df['ESO4'] + wc_df['ECl']
# 3.2. ECa + EMg
wc_df['ECa_EMg'] = wc_df['ECa'] + wc_df['EMg']
# 3.3. ECaX + EMgX
wc_df['ECaX_EMgX'] = wc_df['ECaX'] + wc_df['EMgX']
# 3.4. ESO4 + ECl + ENO3
wc_df['ESO4_ECl_ENO3'] = wc_df['ESO4'] + wc_df['ECl'] + wc_df['ENO3']
# 4. Delete unnecessary columns and tidy
for col in ['SO4', 'Cl', 'Mg', 'Ca', 'NO3-N']:
del wc_df[col]
wc_df.reset_index(inplace=True)
wc_df.head()
Out[13]:
In [14]:
def process_water_chem_df(stn_df, wc_df, st_yr=None, end_yr=None):
""" Calculate statistics for the stations, parameters and time
periods specified.
Args:
stn_df: Dataframe of station_ids
wc_df: Dataframe of water chemistry time series for stations
and parameters of interest
st_yr: First year to include in analysis. Pass None to start
at the beginning of the series
end_yr: Last year to include in analysis. Pass None to continue
to the end of the series
Returns:
Dataframe of statistics
"""
# Container for output
df_list = []
# Loop over sites
for stn_id in stn_df['station_id']:
# Extract data for this site
df = wc_df.query('station_id == @stn_id')
# Modify col names
names = list(df.columns)
names[:2] = ['STATION_ID', 'YEAR']
df.columns = names
# Run analysis
df_list.append(wc_stats(df, st_yr=st_yr, end_yr=end_yr))
res_df = pd.concat(df_list, axis=0)
return res_df
res_df = process_water_chem_df(stn_df, wc_df)
res_df.head()
Out[14]:
This seems to be working OK so far, but I need to do some more testing to see that my results more-or-less agree with those calculated previously by Tore. As a start, let's compare the results above with those in the ICPW_STATISTICS3 table of RESA2, which is where (I think) Tore has saved his previous output.
In [15]:
# Get results for test sites from RESA2
sql = ('SELECT * FROM resa2.icpw_statistics3 '
'WHERE station_id IN %s'
% str(tuple(stn_df['station_id'].values)))
stat_df = pd.read_sql(sql, engine)
# Get just the cols to compare to my output
stat_df = stat_df[['station_id', 'parameter', 'period', 'nonmiss',
'average', 'median', 'stdev', 'test_stat',
'mk_stat', 'mkp', 'senslope']]
stat_df.head(14).sort_values(by='parameter')
Out[15]:
For e.g. site 23499, I can now re-run my code for the period from 1990 to 2004 and compare my results to those above.
In [16]:
# Re-run python analysis for the period 1990 - 2004
res_df = process_water_chem_df(stn_df, wc_df, st_yr=1990, end_yr=2004)
# Delete mk_std_dev as not relevant here
del res_df['mk_std_dev']
res_df.head(14).sort_values(by='par_id')
Out[16]:
The numbers in the above two tables are almost identical, which is actually pretty remarkable given that I'm second-guessing a lot of the decisions in Tore's analysis (having never seen his code) and also recoding everything from scratch. It certainly looks as though this will be a viable alternative for re-running the trends analysis.
There are still some loose ends to tie up. In particular, I need to add a few more parameters to the trends calculation, but that shouldn't be difficult once I've spoken to Heleen to find out what they are and how to calculate them. In the meantime, I'm sufficiently happy with this output to move the code into a separate module and then continue to explore the data in a new notebook.
NB: A nice way to visualise these results would be to create a google map, where each point is coloured according to 'increasing', 'decreasing' or 'no trend'. A pop-up on each point could then give the main summary statistics and a time series plot with the Sen's slope line overlaid. This would result in lots of points on top of each other, but users could filter the map to just show one parameter at a time to avoid things becoming too cluttered.
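If I go down this route, something like folium (Leaflet maps rather than Google Maps, but the same idea) would make it straightforward. The sketch below is purely illustrative: the dataframe and its column names are made up, and it just colours each station by trend direction.

import folium
import pandas as pd

# Hypothetical results table (column names are assumptions)
map_df = pd.DataFrame({'lat':    [60.1, 61.5],
                       'lon':    [10.8, 8.9],
                       'par_id': ['ESO4X', 'ESO4X'],
                       'trend':  ['decreasing', 'no trend']})

colours = {'increasing':'red', 'decreasing':'blue', 'no trend':'grey'}

m = folium.Map(location=[61., 10.], zoom_start=5)

for idx, row in map_df.iterrows():
    folium.CircleMarker(location=[row['lat'], row['lon']],
                        radius=5,
                        color=colours[row['trend']],
                        fill_color=colours[row['trend']],
                        popup='%s: %s' % (row['par_id'], row['trend'])).add_to(m)

m.save('trend_map.html')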