In [11]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sn
import pandas as pd
import numpy as np
import imp
from scipy.stats import theilslopes

sn.set_context('talk')

In [12]:
# Import custom functions
func_path = (r'C:\Data\James_Work\Staff\Heleen_d_W\ICP_Waters\TOC_Trends_Analysis_2015'
             r'\Python\icpw\toc_trends_analysis.py')

icpw = imp.load_source('toc_trends_analysis', func_path)

Rolling Sen's slope

Basic proof of concept for rolling Sen's slope idea.

Key questions:

  1. Is TOC increasing (and, if so, is the trend significant)?

  2. Is the rate of increase slowing (and, if so, is the slowing trend significant)?

1. Generate fake data

First create some "toy" TOC data to experiment with. The data is noisy, but it initially increases rapidly and then levels off.


In [13]:
# Fake data
# Rising TOC, but *rise rate* decreasing through time
x = np.arange(1990, 2013)
y = np.log(1.5*(x-1989)) + np.random.uniform(low=0, high=1, size=len(x))
df = pd.DataFrame({'TOC':y}, index=x)
df.plot()


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fac1cf8>

2. Is TOC increasing?

We've already tested for this. Use Sen's slope to estimate the overall rate of change and then a Mann-Kendall test for significance.


In [14]:
# M-K test
res = icpw.mk_test(df['TOC'].values, 'test_stn', 'TOC')

print 'M-K test results: %s (p=%.4f)' % (res[4], res[3])

# Sen's slope
slp, intcp, lo, hi = theilslopes(y, x)

# Plot
y2 = slp*x + intcp
plt.plot(x, y)
plt.plot(x, y2)


M-K test results: increasing (p=0.0000)
Out[14]:
[<matplotlib.lines.Line2D at 0x1fba3a20>]

So there is an increasing overall trend (as we often find in our real data).

3. Is the rate of increase changing through time?

Idea:

  • Calculate Sen's slope using a 12-year moving window to give the rate of change through time

  • Use an M-K to test to see if there is a significant trend in this rate of change

NB: A 12 year moving window works well for a 22 year period: the minimum number of data points for reasonably robust M-K test is ~10, so using a 12 year window means the slope estimate for each segment is fairly robust. In addition, there are 12 twelve-year windows in a 22 year period, so the output is also large enough for a reliable trend test.


In [15]:
# Function for moving window
def sens_slp(df):
    """ Calculate Sen's slope estimate.
    """
    from scipy.stats import theilslopes
    
    res = theilslopes(df)
    
    return res[0]

In [16]:
# Calc rolling slopes
df2 = df.rolling(center=False, window=12).apply(func=sens_slp)
df2.plot()
plt.title('12 year windowed slopes (indexed by last year in each window)')
plt.ylabel('Slope')


Out[16]:
<matplotlib.text.Text at 0x213e32e8>

In general, the slopes seem to decrease through time, but is this change significant?


In [17]:
# Drop Nan
df2.dropna(inplace=True)

# M-K test
res = icpw.mk_test(df2['TOC'].values, 'test_stn', 'TOC')

print 'M-K test results: %s (p=%.4f)' % (res[4], res[3])

# Sen's slope
slp, intcp, lo, hi = theilslopes(df2['TOC'], df2.index)

# Plot
y3 = slp*df2.index + intcp
plt.plot(df2.index, df2['TOC'])
plt.plot(df2.index, y3)


M-K test results: decreasing (p=0.0001)
Out[17]:
[<matplotlib.lines.Line2D at 0x1f2c2d68>]

So there is evidience for a significant decrease in the rate of increase of TOC through time.

4. Quick visual check that the moving window is working sensibly

Plot the slope line for each 12-year segment.


In [18]:
def sens_intcp(df):
    """ Get intercept to go with slope estimate.
    """
    from scipy.stats import theilslopes
        
    res = theilslopes(df)
    
    return res[1]

In [19]:
# Get intercepts
incpts = df.rolling(center=False,window=10).apply(func=sens_intcp)
incpts.columns = ['c']

df2 = df2.join(incpts)
df2


Out[19]:
TOC c
2001 0.225395 2.022003
2002 0.205547 2.058495
2003 0.157749 2.360189
2004 0.148808 2.482201
2005 0.119367 2.872052
2006 0.113325 3.155391
2007 0.100009 3.256546
2008 0.068186 3.597205
2009 0.070283 3.626793
2010 0.032969 3.668021
2011 0.024983 3.572512
2012 0.037908 3.530902

In [20]:
fig = plt.figure(figsize=(15, 15))

# Plot slopes for each 10 year window
for yr in range(2001, 2013):
    xi = np.arange(yr-11, yr+1)
    slp = df2.ix[yr, 'TOC']
    inc = df2.ix[yr, 'c']
    yi = slp*(xi-yr+11) + inc
    plt.plot(xi, yi, c='k', alpha=0.5)

# Original data
plt.plot(x, y, lw=5)


Out[20]:
[<matplotlib.lines.Line2D at 0x1fc15748>]

This looks reasonable at first glance.