Sea-level monitor

This document monitors the current sea level along the Dutch coast. The sea level is measured by a number of tide gauges; six long-running tide gauges are designated "main stations". The mean of these stations is used to estimate the current rate of sea-level rise. Only measurements from 1890 onwards are taken into account: earlier measurements are considered less reliable because the Amsterdam Ordnance Datum (NAP) had not yet been normalized.


In [30]:
# this is a list of packages that are used in this notebook
# these come with python
import io
import zipfile
import functools

# you can install these packages using pip or anaconda
# (requests numpy pandas bokeh pyproj statsmodels)

# for downloading
import requests

# computation libraries
import numpy as np
import pandas

# coordinate systems
import pyproj 

# statistics
import statsmodels.api as sm

# plotting
import bokeh.charts
import bokeh.io
import bokeh.plotting
import bokeh.tile_providers
import bokeh.palettes

# displaying things
from ipywidgets import Image
import IPython.display

In [31]:
# Some coordinate systems
WEBMERCATOR = pyproj.Proj(init='epsg:3857')
WGS84 = pyproj.Proj(init='epsg:4326')

# If this notebook is not showing up with figures, you can use the following url:
# https://nbviewer.ipython.org/github/openearth/notebooks/blob/master/sealevelmonitor.ipynb
bokeh.io.output_notebook()



The data is accessed through the global collection of tide gauge records at the PSMSL (the alternative is to request the data from the Rijkswaterstaat service desk). There are two types of datasets: "Revised Local Reference" (RLR) and "Metric". For the Netherlands the difference is that the "Revised Local Reference" dataset undoes the 2014 NAP correction, yielding a consistent dataset.


In [32]:
urls = {
    'metric_monthly': 'http://www.psmsl.org/data/obtaining/met.monthly.data/met_monthly.zip',
    'rlr_monthly': 'http://www.psmsl.org/data/obtaining/rlr.monthly.data/rlr_monthly.zip',
    'rlr_annual': 'http://www.psmsl.org/data/obtaining/rlr.annual.data/rlr_annual.zip'
}
dataset_name = 'rlr_annual'

In [33]:
# these functions convert the RLR values back to NAP (ignoring the undoing of the NAP correction)
main_stations = {
    20: {
        'name': 'Vlissingen', 
        'rlr2nap': lambda x: x - (6976-46)
    },
    22: {
        'name': 'Hoek van Holland', 
        'rlr2nap': lambda x: x - (6994 - 121)
    },
    23: {
        'name': 'Den Helder', 
        'rlr2nap': lambda x: x - (6988-42)
    },
    24: {
        'name': 'Delfzijl', 
        'rlr2nap': lambda x: x - (6978-155)
    },
    25: {
        'name': 'Harlingen', 
        'rlr2nap': lambda x: x - (7036-122)
    },
    32: {
        'name': 'IJmuiden', 
        'rlr2nap': lambda x: x - (7033-83)
    }
}

In [34]:
# the main stations are defined by their ids
main_stations_idx = list(main_stations.keys())
main_stations_idx


Out[34]:
[32, 20, 22, 23, 24, 25]

In [35]:
# download the zipfile
resp = requests.get(urls[dataset_name])

# we can read the zipfile
stream = io.BytesIO(resp.content)
zf = zipfile.ZipFile(stream)

# the file filelist.txt contains a table of 
# station ID, latitude, longitude, station name, coastline code, station code, and quality flag
csvtext = zf.read('{}/filelist.txt'.format(dataset_name))

stations = pandas.read_csv(
    io.BytesIO(csvtext), 
    sep=';',
    names=('id', 'lat', 'lon', 'name', 'coastline_code', 'station_code', 'quality'),
    converters={
        'name': str.strip,
        'quality': str.strip
    }
)
stations = stations.set_index('id')

# the Dutch stations in the PSMSL database, make a copy
# or use stations.coastline_code == 150 for all Dutch stations
selected_stations = stations.loc[main_stations_idx].copy()
# set the main stations, this should be a list of 6 stations
selected_stations


Out[35]:
    lat        lon       name              coastline_code  station_code  quality
id
32  52.462222  4.554722  IJMUIDEN          150             41            N
20  51.442222  3.596111  VLISSINGEN        150             101           N
22  51.977500  4.120000  HOEK VAN HOLLAND  150             51            N
23  52.964444  4.745000  DEN HELDER        150             31            N
24  53.326389  6.933056  DELFZIJL          150             1             N
25  53.175556  5.409444  HARLINGEN         150             21            N


In [36]:
# show all the stations on a map

# compute the bounds of the plot
sw = (50, -5)
ne = (55, 10)
# transform to web mercator
sw_wm = pyproj.transform(WGS84, WEBMERCATOR, sw[1], sw[0])
ne_wm = pyproj.transform(WGS84, WEBMERCATOR, ne[1], ne[0])
# create a plot
fig = bokeh.plotting.figure(tools='pan, wheel_zoom', plot_width=600, plot_height=200, x_range=(sw_wm[0], ne_wm[0]), y_range=(sw_wm[1], ne_wm[1]))
fig.axis.visible = False
# add some background tiles
fig.add_tile(bokeh.tile_providers.STAMEN_TERRAIN)
# add the stations
x, y = pyproj.transform(WGS84, WEBMERCATOR, np.array(stations.lon), np.array(stations.lat))
fig.circle(x, y)
x, y = pyproj.transform(WGS84, WEBMERCATOR, np.array(selected_stations.lon), np.array(selected_stations.lat))
_ = fig.circle(x, y, color='red')

In [37]:
# show the plot
bokeh.io.show(fig)


Now that we have defined which tide gauges we are monitoring, we can start downloading the relevant data.


In [38]:
# each station has a number of files that you can look at.
# here we define a template for each filename

# stations that we are using for our computation
# define the name formats for the relevant files
names = {
    'datum': '{dataset}/RLR_info/{id}.txt',
    'diagram': '{dataset}/RLR_info/{id}.png',
    'url': 'http://www.psmsl.org/data/obtaining/rlr.diagrams/{id}.php',
    'data': '{dataset}/data/{id}.rlrdata',
    'doc': '{dataset}/docu/{id}.txt',
    'contact': '{dataset}/docu/{id}_auth.txt'
}
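
The rlr2nap offsets hardcoded earlier can be checked against the RLR datum information that PSMSL ships inside the zipfile. The following is a minimal sketch (an added check, not part of the original analysis), assuming the RLR_info text files follow the layout of the names['datum'] template:

In [ ]:
# sketch: print the RLR datum information for Vlissingen (id 20),
# so the offset hardcoded in main_stations can be verified by hand
datum_txt = zf.read(names['datum'].format(dataset=dataset_name, id=20))
print(datum_txt.decode('utf-8', errors='replace'))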

In [39]:
def get_url(station, dataset):
    """return the url of the station information (diagram and datum)"""
    info = dict(
        dataset=dataset,
        id=station.name
    )
    url = names['url'].format(**info)
    return url
# fill in the dataset parameter using the global dataset_name
f = functools.partial(get_url, dataset=dataset_name)
# compute the url for each station
selected_stations['url'] = selected_stations.apply(f, axis=1)
selected_stations


Out[39]:
    lat        lon       name              coastline_code  station_code  quality  url
id
32  52.462222  4.554722  IJMUIDEN          150  41   N  http://www.psmsl.org/data/obtaining/rlr.diagra...
20  51.442222  3.596111  VLISSINGEN        150  101  N  http://www.psmsl.org/data/obtaining/rlr.diagra...
22  51.977500  4.120000  HOEK VAN HOLLAND  150  51   N  http://www.psmsl.org/data/obtaining/rlr.diagra...
23  52.964444  4.745000  DEN HELDER        150  31   N  http://www.psmsl.org/data/obtaining/rlr.diagra...
24  53.326389  6.933056  DELFZIJL          150  1    N  http://www.psmsl.org/data/obtaining/rlr.diagra...
25  53.175556  5.409444  HARLINGEN         150  21   N  http://www.psmsl.org/data/obtaining/rlr.diagra...

In [40]:
def missing2nan(value, missing=-99999):
    """convert the value to nan if the float of value equals the missing value"""
    value = float(value)
    if value == missing:
        return np.nan
    return value

def get_data(station, dataset):
    """get data for the station (pandas record) from the dataset (url)"""
    info = dict(
        dataset=dataset,
        id=station.name
    )
    raw = zf.read(names['data'].format(**info))
    df = pandas.read_csv(
        io.BytesIO(raw), 
        sep=';', 
        names=('year', 'height', 'interpolated', 'flags'),
        converters={
            "height": lambda x: main_stations[station.name]['rlr2nap'](missing2nan(x)),
            "interpolated": str.strip,
        }
    )
    df['station'] = station.name
    return df
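
Before applying get_data to every station, it can be spot-checked on a single one. A minimal sketch, assuming Vlissingen (id 20) is present in selected_stations:

In [ ]:
# sketch: spot-check the parser on a single station (Vlissingen, id 20);
# station.name is the DataFrame index label, which get_data uses as the id
get_data(selected_stations.loc[20], dataset_name).tail()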

In [41]:
# get data for all stations
f = functools.partial(get_data, dataset=dataset_name)
# look up the data for each station
selected_stations['data'] = [f(station) for _, station in selected_stations.iterrows()]

In [42]:
# we now have data for each station
selected_stations[['name', 'data']]


Out[42]:
name data
id
32 IJMUIDEN year height interpolated flags station...
20 VLISSINGEN year height interpolated flags station...
22 HOEK VAN HOLLAND year height interpolated flags station...
23 DEN HELDER year height interpolated flags station...
24 DELFZIJL year height interpolated flags station...
25 HARLINGEN year height interpolated flags station...

Now that we have downloaded all the data, we can compute the mean.


In [43]:
# compute the mean
grouped = pandas.concat(selected_stations['data'].tolist())[['year', 'height']].groupby('year')
mean_df = grouped.mean().reset_index()
# filter out non-trusted part (before NAP)
mean_df = mean_df[mean_df['year'] >= 1890].copy()
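
The unweighted mean above implicitly assumes that all six stations report a height in every year of the trusted period. A quick sketch (an added check, not part of the original notebook) to verify that assumption:

In [ ]:
# sketch: count valid station records per year; any year printed below
# contributes fewer than 6 stations to the mean
all_data = pandas.concat(selected_stations['data'].tolist())
counts = all_data.dropna(subset=['height']).groupby('year').size()
counts[(counts.index >= 1890) & (counts < 6)]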

In [44]:
# these are the mean waterlevels 
mean_df.tail()


Out[44]:
year height
149 2011 60.166667
150 2012 50.666667
151 2013 25.666667
152 2014 66.666667
153 2015 92.500000

In [45]:
# show all the stations, including the mean
title = 'Sea-surface height for Dutch tide gauges [{year_min} - {year_max}]'.format(
    year_min=mean_df.year.min(),
    year_max=mean_df.year.max() 
)
fig = bokeh.plotting.figure(title=title, x_range=(1860, 2020), plot_width=900, plot_height=400)
colors = bokeh.palettes.Accent6
for color, (id_, station) in zip(colors, selected_stations.iterrows()):
    data = station['data']
    fig.circle(data.year, data.height, color=color, legend=station['name'], alpha=0.5)
fig.line(mean_df.year, mean_df.height, line_width=3, alpha=0.7, color='black', legend='Mean')
fig.legend.location = "bottom_right"
fig.yaxis.axis_label = 'waterlevel [mm] above NAP'
fig.xaxis.axis_label = 'year'

In [46]:
bokeh.io.show(fig)



Methods

Now we can define the statistical model. The "current sea-level rise" is defined by the following formula; note that the choice of 1970 as the epoch is arbitrary. The 18.613-year period corresponds to the lunar nodal cycle, which modulates the tides.

$ H(t) = a + b_{trend}(t-1970) + b_u\cos(2\pi\frac{t - 1970}{18.613}) + b_v\sin(2\pi\frac{t - 1970}{18.613}) $

The terms are referred to as Constant ($a$), Trend ($b_{trend}$), Nodal U ($b_u$) and Nodal V ($b_v$).

Alternative models are used to detect whether sea-level rise is increasing. One is the broken linear model, which allows a change in trend starting in 1993. That year marks the start of the "satellite era" (the start of the TOPEX/Poseidon measurements) and is often taken as the start of acceleration, because the satellite measurements tend to show a higher rate of sea-level rise than the "tide-gauge era" (1900-2000). If this model fits better than the linear model, one could say that there is an "increase in sea-level rise".

$ H(t) = a + b_{trend}(t-1970) + b_{broken}(t > 1993)\cdot(t-1993) + b_{u}\cos(2\pi\frac{t - 1970}{18.613}) + b_{v}\sin(2\pi\frac{t - 1970}{18.613}) $

Another way to look at increased sea-level rise is to look at sea-level acceleration. To detect sea-level acceleration one can use a quadratic model.

$ H(t) = a + b_{trend}(t-1970) + b_{quadratic}(t-1970)^2 + b_{u}\cos(2\pi\frac{t - 1970}{18.613}) + b_{v}\sin(2\pi\frac{t - 1970}{18.613}) $
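
The three model cells below differ only in one column of the design matrix. As a sketch, the shared structure could be factored into a helper like the following (the helper and its names are illustrative, not part of the original notebook):

In [ ]:
# sketch: build the design matrix for any of the three models;
# 'linear' has no extra column, 'broken' adds the post-1993 trend,
# 'quadratic' adds the squared term
def design_matrix(year, kind='linear', epoch=1970, break_year=1993):
    t = year - epoch
    columns = [t]
    if kind == 'broken':
        columns.append((year > break_year) * (year - break_year))
    elif kind == 'quadratic':
        columns.append(t * t)
    columns.append(np.cos(2 * np.pi * t / 18.613))
    columns.append(np.sin(2 * np.pi * t / 18.613))
    return sm.add_constant(np.c_[tuple(columns)])

# usage: sm.OLS(mean_df['height'], design_matrix(mean_df['year'], 'broken')).fit()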


In [47]:
# define the statistical model
y = mean_df['height']
X = np.c_[
    mean_df['year']-1970, 
    np.cos(2*np.pi*(mean_df['year']-1970)/18.613),
    np.sin(2*np.pi*(mean_df['year']-1970)/18.613)
]
X = sm.add_constant(X)
model = sm.OLS(y, X)
fit = model.fit()

In [48]:
fit.summary(yname='Sea-surface height', xname=['Constant', 'Trend', 'Nodal U', 'Nodal V'])
# things to check:
# Durbin-Watson should be close to 2; values well below 1 point to autocorrelated residuals
# Jarque-Bera (JB) should be non-significant for normal residuals
# abs(t) of Nodal U plus abs(t) of Nodal V should be > 3, otherwise adding the nodal terms is not useful


Out[48]:
OLS Regression Results
Dep. Variable: Sea-surface height R-squared: 0.853
Model: OLS Adj. R-squared: 0.849
Method: Least Squares F-statistic: 236.1
Date: Tue, 25 Apr 2017 Prob (F-statistic): 1.27e-50
Time: 13:27:07 Log-Likelihood: -601.28
No. Observations: 126 AIC: 1211.
Df Residuals: 122 BIC: 1222.
Df Model: 3
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Constant -25.0320 2.877 -8.700 0.000 -30.728 -19.336
Trend 1.8991 0.071 26.578 0.000 1.758 2.041
Nodal U 6.2529 3.699 1.691 0.093 -1.069 13.575
Nodal V -12.1237 3.643 -3.328 0.001 -19.335 -4.912
Omnibus: 3.045 Durbin-Watson: 1.706
Prob(Omnibus): 0.218 Jarque-Bera (JB): 2.768
Skew: -0.363 Prob(JB): 0.251
Kurtosis: 3.035 Cond. No. 57.9
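
The checks listed in the comments above can also be run programmatically. A minimal sketch, assuming the fit object from the cell above and the default statsmodels parameter names x2 and x3 for the nodal terms:

In [ ]:
# sketch: programmatic version of the model checks
import statsmodels.stats.stattools as stattools
dw = stattools.durbin_watson(fit.resid)
print('Durbin-Watson: {:.2f} (values near 2 indicate little autocorrelation)'.format(dw))
jb, jb_p, skew, kurtosis = stattools.jarque_bera(fit.resid)
print('Jarque-Bera p-value: {:.2f} (non-significant suggests normal residuals)'.format(jb_p))
t_nodal = abs(fit.tvalues['x2']) + abs(fit.tvalues['x3'])
print('Summed |t| of nodal terms: {:.1f} (> 3 justifies the nodal terms)'.format(t_nodal))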

In [49]:
fig = bokeh.plotting.figure(x_range=(1860, 2020), plot_width=900, plot_height=400)
for color, (id_, station) in zip(colors, selected_stations.iterrows()):
    data = station['data']
    fig.circle(data.year, data.height, color=color, legend=station['name'], alpha=0.8)
fig.circle(mean_df.year, mean_df.height, line_width=3, legend='Mean', color='black', alpha=0.5)
fig.line(mean_df.year, fit.predict(), line_width=3, legend='Current')
fig.legend.location = "bottom_right"
fig.yaxis.axis_label = 'waterlevel [mm] above N.A.P.'
fig.xaxis.axis_label = 'year'
bokeh.io.show(fig)


Is there a sea-level acceleration?

The following section computes two common models used to detect sea-level acceleration. The broken linear model assumes that sea level has been rising faster since 1993. The quadratic model assumes that sea level is accelerating continuously. Both models are compared to the linear model; the extra terms are tested for significance and the AIC is computed to see which model is "better".


In [50]:
# define the statistical model
y = mean_df['height']
X = np.c_[
    mean_df['year']-1970, 
    (mean_df['year'] > 1993) * (mean_df['year'] - 1993),
    np.cos(2*np.pi*(mean_df['year']-1970)/18.613),
    np.sin(2*np.pi*(mean_df['year']-1970)/18.613)
]
X = sm.add_constant(X)
model_broken_linear = sm.OLS(y, X)
fit_broken_linear = model_broken_linear.fit()

In [51]:
# define the statistical model
y = mean_df['height']
X = np.c_[
    mean_df['year']-1970, 
    (mean_df['year'] - 1970) * (mean_df['year'] - 1970),
    np.cos(2*np.pi*(mean_df['year']-1970)/18.613),
    np.sin(2*np.pi*(mean_df['year']-1970)/18.613)
]
X = sm.add_constant(X)
model_quadratic = sm.OLS(y, X)
fit_quadratic = model_quadratic.fit()
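
With all three models fitted, their AIC values can already be compared side by side (the conclusions below repeat this comparison pairwise). A minimal sketch:

In [ ]:
# sketch: side-by-side AIC comparison; lower is better
for name, res in [('linear', fit), ('broken linear', fit_broken_linear), ('quadratic', fit_quadratic)]:
    print('{:15s} AIC: {:.1f}'.format(name, res.aic))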

In [52]:
fit_broken_linear.summary(yname='Sea-surface height', xname=['Constant', 'Trend', 'Trend(year > 1993)', 'Nodal U', 'Nodal V'])


Out[52]:
OLS Regression Results
Dep. Variable: Sea-surface height R-squared: 0.855
Model: OLS Adj. R-squared: 0.851
Method: Least Squares F-statistic: 178.9
Date: Tue, 25 Apr 2017 Prob (F-statistic): 8.20e-50
Time: 13:27:07 Log-Likelihood: -600.28
No. Observations: 126 AIC: 1211.
Df Residuals: 121 BIC: 1225.
Df Model: 4
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Constant -28.0783 3.608 -7.781 0.000 -35.222 -20.935
Trend 1.8255 0.089 20.569 0.000 1.650 2.001
Trend(year > 1993) 0.8810 0.634 1.390 0.167 -0.374 2.136
Nodal U 6.0248 3.688 1.634 0.105 -1.277 13.327
Nodal V -12.5599 3.643 -3.448 0.001 -19.771 -5.348
Omnibus: 1.828 Durbin-Watson: 1.733
Prob(Omnibus): 0.401 Jarque-Bera (JB): 1.688
Skew: -0.283 Prob(JB): 0.430
Kurtosis: 2.952 Cond. No. 58.2

In [53]:
fit_quadratic.summary(yname='Sea-surface height', xname=['Constant', 'Trend', 'Trend**2', 'Nodal U', 'Nodal V'])


Out[53]:
OLS Regression Results
Dep. Variable: Sea-surface height R-squared: 0.853
Model: OLS Adj. R-squared: 0.849
Method: Least Squares F-statistic: 176.2
Date: Tue, 25 Apr 2017 Prob (F-statistic): 1.82e-49
Time: 13:27:07 Log-Likelihood: -601.11
No. Observations: 126 AIC: 1212.
Df Residuals: 121 BIC: 1226.
Df Model: 4
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Constant -26.2982 3.646 -7.214 0.000 -33.516 -19.081
Trend 1.9429 0.105 18.464 0.000 1.735 2.151
Trend**2 0.0013 0.002 0.568 0.571 -0.003 0.006
Nodal U 6.0876 3.720 1.636 0.104 -1.278 13.453
Nodal V -12.1889 3.655 -3.335 0.001 -19.425 -4.953
Omnibus: 2.826 Durbin-Watson: 1.711
Prob(Omnibus): 0.243 Jarque-Bera (JB): 2.578
Skew: -0.350 Prob(JB): 0.275
Kurtosis: 3.015 Cond. No. 3.44e+03

In [54]:
fig = bokeh.plotting.figure(x_range=(1860, 2020), plot_width=900, plot_height=400)
for color, (id_, station) in zip(colors, selected_stations.iterrows()):
    data = station['data']
    fig.circle(data.year, data.height, color=color, legend=station['name'], alpha=0.8)
fig.circle(mean_df.year, mean_df.height, line_width=3, legend='Mean', color='black', alpha=0.5)
fig.line(mean_df.year, fit.predict(), line_width=3, legend='Current')
fig.line(mean_df.year, fit_broken_linear.predict(), line_width=3, color='#33bb33', legend='Broken')
fig.line(mean_df.year, fit_quadratic.predict(), line_width=3, color='#3333bb', legend='Quadratic')

fig.legend.location = "top_left"
fig.yaxis.axis_label = 'waterlevel [mm] above N.A.P.'
fig.xaxis.axis_label = 'year'
bokeh.io.show(fig)


Conclusions

Below are some statements that depend on the output calculated above.


In [55]:
msg = '''The current average waterlevel above NAP, 
based on the 6 main tide gauges for the year {year} is {height:.1f} cm.
The current sea-level rise is {rate:.0f} cm/century'''
print(msg.format(year=mean_df['year'].iloc[-1], height=fit.predict()[-1]/10.0, rate=fit.params.x1*100.0/10))


The current average waterlevel above NAP, 
based on the 6 main tide gauges for the year 2015 is 4.9 cm.
The current sea-level rise is 19 cm/century
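
The rate above is a point estimate. Its uncertainty can be read from the confidence interval of the trend parameter; a minimal sketch, assuming the default statsmodels parameter name x1 for the trend:

In [ ]:
# sketch: 95% confidence interval of the trend, converted from
# mm/year to cm/century (x 100 years, / 10 mm per cm)
ci = fit.conf_int().loc['x1'] * 100.0 / 10.0
print('95% confidence interval: {:.0f} to {:.0f} cm/century'.format(ci[0], ci[1]))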

In [56]:
if (fit.aic < fit_broken_linear.aic):
    print('The linear model is a higher quality model (smaller AIC) than the broken linear model.')
else:
    print('The broken linear model is a higher quality model (smaller AIC) than the linear model.')
if (fit_broken_linear.pvalues['x2'] < 0.05):
    print('The trend break is bigger than we would have expected under the assumption that there is no trend break.')
else:
    print('Under the assumption that there is no trend break, a trend break as big as the one we see would not be surprising.')


The linear model is a higher quality model (smaller AIC) than the broken linear model.
Under the assumption that there is no trend break, a trend break as big as the one we see would not be surprising.

In [57]:
if (fit.aic < fit_quadratic.aic):
    print('The linear model is a higher quality model (smaller AIC) than the quadratic model.')
else:
    print('The quadratic model is a higher quality model (smaller AIC) than the linear model.')
if (fit_quadratic.pvalues['x2'] < 0.05):
    print('The quadratic term is bigger than we would have expected under the assumption that there is no acceleration.')
else:
    print('Under the assumption that there is no acceleration, a quadratic term as big as the one we see would not be surprising.')


The linear model is a higher quality model (smaller AIC) than the quadratic model.
Under the assumption that there is no acceleration, a quadratic term as big as the one we see would not be surprising.
