In [1]:

    
%run talktools
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_style('whitegrid')
from IPython.display import Image, display

Open Source can improve current scientific practice

IPython notebooks are a great tool to support this

Sophie Balemans and Stijn Van Hoey

EGU 2015 - PICO session on Open Source Computing in Hydrology

The ideals of (hydrological) science

Provide verifiable answers about water and solutions to water-related problems.
The validation of these results by reproduction.
An altruistic, collective enterprise for humanity's benefit

F Perez

The ideals vs. the reality of (hydrological) science

Provide verifiable answers about water and solutions to water-related problems.
- The pursuit of highly cited papers for your CV.
The validation of our results by reproduction.
- Validation by convincing journal referees who didn't see your code or data.
An altruistic, collective enterprise for humanity's benefit.
- A deadly race to outrun your colleagues in front of the bear of funding.

F Perez

Free and Open Source Software (FOSS) in this context

Open, collaborative by definition.
- Industrial competition can coexist...
Continuous, public process.
- Distributed credit.
- Open peer review.
Reproducible by necessity.
Public bug tracking.
The use of licenses is essential (CC, BSD, GPL,...)

F Perez

FOSS $\neq$ free work

All waiting for the developer...

...or all developers?

Graveyard of good intentions

Towards continuous and collaborative

What do we need:

Training of students, PhDs,...
- Creating a future generation of scientists with reproducibility as default
- Providing version control, script-based development, database management,... in the curricula
Continuous funding of open source development
- Being paid to maintain and develop open source projects
Tools that facilitate a reproducible workflow
- knitr, IPython Notebook, git, RunMyCode, VisTrails, Authorea,...

IPython Notebook

Reproducible science

Reproducibility at publication time? That is TOO late!

We need to embed the entire lifecycle of a scientific idea:

1. exploratory stuff
2. (collaborative) development
3. production (simulations on HPC, data visualisation,...)
4. publication (with reproducible results)
5. teach about it
6. Go back to 1.

The IPython (Jupyter!) Notebook can support each of these levels

IPython (Jupyter!) Notebook... (this is a notebook!)

Minimize effort between analysis and sharing

Interactive shell for data-analysis and exploration
Interaction between languages (R, Julia,...)
Parallel computing
ipynb to LaTeX, PDF, HTML, slides, publications, books,...
Loading of images, websites, widgets,...
...

Check it out on https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks

Scripts, so it can be version controlled!

Recap of step 5: teach about it

The same file can be used to do analysis, create course notes and retrieve slides using nbconvert
Students can interactively work on their notebook
Various useful features, e.g.
interactive widgets

Conceptual rainfall-runoff model



In [2]:

    
# %load PDM_HPC.py



In [3]:

    
# Sampled parameter sets, measured discharge and model outputs (one column per run after .T)
pars = pd.read_csv('data/example2_PDM_parameters.txt', header=0, sep=',', index_col=0)
measured = pd.read_csv('data/example_PDM_measured.txt', header=0, sep='\t', decimal='.', index_col=0)
modelled = pd.read_csv('data/example2_PDM_outputs.txt', header=0, sep=',', index_col=0).T



In [4]:

    
modeloutput1 = pd.DataFrame(modelled.iloc[:,0].values, index=measured.index)

Measured vs modelled discharge:



In [5]:

    
fig, ax = plt.subplots(figsize=(10, 6))
p1 = measured.plot(ax=ax, label='measured')
p2 = modeloutput1.plot(ax=ax, label='modelled')
t = ax.set_ylabel(r'Q m$^3$s$^{-1}$')
plt.legend(['measured', 'modelled'])












In [6]:

    
from scatter_hist2 import create_scatterhist, create_seasonOF

names = pars.columns
time = np.array(measured.index)
modelled.index = time

# Map each parameter name to its column position
pars_name = {name: i for i, name in enumerate(names)}

Exploring the parameter space

Simulating 20000 Monte-Carlo runs
Sampling from uniform distribution
Parallel calculation within IPython notebook
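
The sampling step listed here can be illustrated with a minimal sketch. It is not the code used for the presented runs: the parameter names and bounds below are hypothetical placeholders, and the helper `sample_uniform` is introduced only for this example. It draws every parameter independently from a uniform distribution between its lower and upper bound, one row per Monte-Carlo run.

```python
import numpy as np
import pandas as pd

# Hypothetical parameter names and (lower, upper) bounds, for illustration only
example_bounds = {
    "par_a": (0.1, 2.0),
    "par_b": (10.0, 500.0),
    "par_c": (0.0, 1.0),
}


def sample_uniform(bounds, n_runs, seed=0):
    """Draw n_runs parameter sets, each parameter uniform within its bounds."""
    rng = np.random.default_rng(seed)
    lower = np.array([low for low, high in bounds.values()])
    upper = np.array([high for low, high in bounds.values()])
    # lower and upper broadcast over the rows: one row per run, one column per parameter
    samples = rng.uniform(lower, upper, size=(n_runs, len(bounds)))
    return pd.DataFrame(samples, columns=list(bounds))


pars_sketch = sample_uniform(example_bounds, n_runs=20000)
```

Each row of `pars_sketch` could then be handed to a model run; because the runs are independent, they are straightforward to distribute over parallel workers.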



In [7]:

    
objective_functions = create_seasonOF(modelled, measured)

Visualisation in 2D scatter plot



In [8]:

    
from scatter_hist_season import create_scatterhist



In [9]:

    
scatter = create_scatterhist(pars, 2, 1, objective_functions, names, 
                                            objective_function='SSE', 
                                            threshold=0.4,  
                                            season = 'Winter')









    



Current threshold = 674.854206156
Number of behavioural parametersets = 7910 out of 20000



In [10]:

    
scatter = create_scatterhist(pars, 2, 1, objective_functions, names, 
                                            objective_function='SSE', 
                                            threshold=0.4,
                                            season = 'Spring')









    



Current threshold = 105.663890284
Number of behavioural parametersets = 9592 out of 20000

What about the model?

Parameter boundaries correct?
Optimal parameter sets change periodically... Correct model structure?
NOT an optimization tool!

The function

Select parameter sets based on:
1. Objective function (SSE, RMSE, RRMSE)
2. Time period of interest (whole year or specific season)
3. Relative threshold (scaled between 0 and 1)
Visualisation of a 2D parameter response surface of the selected parameter sets, together with histograms
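
The three selection criteria can be sketched in a few lines of pandas. This is a sketch under stated assumptions, not the implementation behind `create_scatterhist`: the season-to-month mapping (meteorological seasons), the RRMSE definition (RMSE divided by the mean measurement) and the way the relative threshold is rescaled (linearly between the best and worst objective value) are all assumptions, and the function names and synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumption: meteorological seasons of the Northern Hemisphere
SEASON_MONTHS = {"Winter": [12, 1, 2], "Spring": [3, 4, 5],
                 "Summer": [6, 7, 8], "Autumn": [9, 10, 11]}


def sse(modelled, measured):
    """Sum of squared errors, one value per model run (column)."""
    return (modelled.sub(measured, axis=0) ** 2).sum(axis=0)


def rmse(modelled, measured):
    """Root mean squared error, one value per model run."""
    return np.sqrt((modelled.sub(measured, axis=0) ** 2).mean(axis=0))


def rrmse(modelled, measured):
    """Assumed definition: RMSE relative to the mean measured value."""
    return rmse(modelled, measured) / measured.mean()


def season_mask(index, season):
    """Boolean mask selecting the time steps of one season (or all of them)."""
    if season == "All season":
        return np.ones(len(index), dtype=bool)
    return index.month.isin(SEASON_MONTHS[season])


def select_behavioural(of_values, threshold):
    """Keep runs whose objective value lies below a relative threshold in [0, 1].

    Assumption: 0 maps to the best (lowest) value, 1 to the worst.
    """
    absolute = of_values.min() + threshold * (of_values.max() - of_values.min())
    return of_values <= absolute, absolute


# Synthetic demo data: one year of daily values and three model runs
demo_time = pd.date_range("2010-01-01", periods=365, freq="D")
demo_measured = pd.Series(np.sin(np.arange(365) / 58.0) + 2.0, index=demo_time)
demo_modelled = pd.DataFrame({0: demo_measured,          # perfect run
                              1: demo_measured + 0.1,    # small bias
                              2: demo_measured + 1.0})   # large bias

mask = season_mask(demo_time, "Winter")
demo_of = sse(demo_modelled[mask], demo_measured[mask])
demo_keep, demo_absolute = select_behavioural(demo_of, threshold=0.4)
```

In this synthetic case the run with the large bias is rejected while the other two are kept as behavioural.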

More interactive?

command:

interact(...)



In [12]:

    
# Loading interact functionality
# (from IPython 4 / Jupyter onwards: from ipywidgets import interact, fixed)
from IPython.html.widgets import interact, fixed

interact(create_scatterhist, *args, **kwargs)

input list => dropdown

  objective_function = ['SSE', 'RMSE', 'RRMSE']
  season = ['Winter','Summer', 
                  'Spring', 'Autumn','All season']

input tuple (min, max, step) => slider
```
  threshold=(0,1,0.005)
```
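
Put together, the two widget abbreviations could be used as in the sketch below. The function `summarise` is a hypothetical stand-in for `create_scatterhist`, so that the example does not depend on the data files, and the import uses `ipywidgets`, the package that took over from `IPython.html.widgets` in later releases. `launch_explorer` is only defined here, not called, since the widgets are meant to be displayed from a notebook cell.

```python
def summarise(objective_function="SSE", season="Winter", threshold=0.4):
    """Hypothetical stand-in for create_scatterhist: report the chosen settings."""
    return "{} / {} / threshold {:.3f}".format(objective_function, season, threshold)


def launch_explorer():
    """Build the widgets; call this from a notebook cell (requires ipywidgets)."""
    from ipywidgets import interact

    return interact(
        summarise,
        # a list of options is rendered as a dropdown
        objective_function=["SSE", "RMSE", "RRMSE"],
        season=["Winter", "Summer", "Spring", "Autumn", "All season"],
        # a (min, max, step) tuple is rendered as a slider
        threshold=(0, 1, 0.005),
    )
```

Arguments that should not get a widget, such as the parameter table, can be wrapped in `fixed(...)` when they are passed as keyword arguments.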

IPython notebook...

Github Stijn Van Hoey stvhoey.vanhoey@ugent.be, @SVanHoey
Github Sophie sophie.balemans@ugent.be
Github university


