MELODIST example

In this notebook the usage of MELODIST is demonstrated. For this purpose, we use the station data for Rosenthal-Willershausen, which is located in the file examples/testdata.csv.gz.


In [1]:
import matplotlib
import matplotlib.pyplot as plt
import melodist
import numpy as np
import pandas as pd
import scipy.stats

%matplotlib inline

In [2]:
def plot(obs, sim):
    plt.figure()
    ax = plt.gca()
    obs.loc[plot_period].plot(ax=ax, color='black', label='obs', lw=2)
    sim.loc[plot_period].plot(ax=ax)
    plt.legend()
    plt.show()

def calc_stats(obs, sim):
    df = pd.DataFrame(columns=['mean', 'std', 'r', 'rmse', 'nse'])

    obs = obs.loc[validation_period]
    sim = sim.loc[validation_period]
    
    df.loc['obs'] = obs.mean(), obs.std(), 1, 0, 1
    
    for c in sim.columns:
        osdf = pd.DataFrame(data=dict(obs=obs, sim=sim[c])).dropna(how='any')
        o = osdf.obs
        s = osdf.sim
    
        r = scipy.stats.pearsonr(o, s)[0]
        rmse = np.mean((o - s)**2)
        nse = 1 - np.sum((o - s)**2) / np.sum((o - o.mean())**2)
        df.loc[c] = s.mean(), s.std(), r, rmse, nse
    
    return df

def print_stats(obs, sim):
    df = calc_stats(obs, sim)
    html = df.round(2).style
    return html

First, we define the station data location, the station coordinates and time zone, and the periods for calibration, validation and plotting:


In [3]:
path_inp = 'testdata.csv.gz'
longitude = 8.86
latitude = 51.00
timezone = 1

calibration_period = slice('2014-01-01', '2015-12-31')
validation_period = slice('2016-01-01', '2016-12-31')
plot_period = slice('2016-09-03', '2016-09-05')

Then, we read in the hourly station data, convert the temperature values from °C to K, and truncate the data frame in order to obtain full days:


In [4]:
data_obs_hourly = pd.read_csv(path_inp, index_col=0, parse_dates=True)
data_obs_hourly.temp += 273.15
data_obs_hourly = melodist.util.drop_incomplete_days(data_obs_hourly)

In this example, we do not use actual observed daily data but aggregate the hourly values to daily data using the daily_from_hourly method:


In [5]:
data_obs_daily = melodist.util.daily_from_hourly(data_obs_hourly)

Next, we create a Station object:


In [6]:
station = melodist.Station(lon=longitude, lat=latitude,
                          timezone=timezone,
                          data_daily=data_obs_daily)

If hourly observations are available, they can be used to derive various statistics used in the disaggregation methods. For this purpose, we create a StationStatistics object and fill it with the hourly values (in this case only a subset thereof, as we want separate calibration and validation periods). Afterwards, the statistics for the individual variables are calculated.


In [7]:
station.statistics = melodist.StationStatistics(data_obs_hourly.loc[calibration_period])

stats = station.statistics
stats.calc_wind_stats()
stats.calc_humidity_stats()
stats.calc_temperature_stats()
stats.calc_radiation_stats()
stats.calc_precipitation_stats()

Disaggregation

Now we are ready to perform the disaggregation of the daily values. In the examples below, we use different methods for the disaggregation of each of the individual variables, plot the results for a short period (specified with plot_period above) and calculate several statistics for the validation period.

Temperature


In [8]:
tempdf = pd.DataFrame()
for method in ('sine_min_max', 'sine_mean', 'mean_course_min_max', 'mean_course_mean'):
    station.disaggregate_temperature(method=method, min_max_time='sun_loc_shift')
    tempdf[method] = station.data_disagg.temp

plot(data_obs_hourly.temp, tempdf)
print_stats(data_obs_hourly.temp, tempdf)


Out[8]:
mean std r rmse nse
obs 281.97 7.45 1 0 1
sine_min_max 282.12 7.6 0.98 2.28 0.96
sine_mean 281.98 7.5 0.98 2.12 0.96
mean_course_min_max 281.8 7.73 0.98 1.85 0.97
mean_course_mean 281.98 7.48 0.99 1.47 0.97

Humidity

For some of the humidity disaggregation methods already disaggregated temperature values need to be available. Hence, we first run the temperature disaggregation again (in this case assuming we have decided for the 'sine_min_max' method) and then perform the humidity disaggregation.


In [9]:
# ok, let's say we have decided to use sine_min_max. next we want to disaggregate humidity.
# as some of hum disagg functions rely on disagg'd temperature values we disagg temp again
# with our chosen method
station.disaggregate_temperature(method='sine_min_max', min_max_time='sun_loc_shift')

humdf = pd.DataFrame()
for method in ('equal', 'minimal', 'dewpoint_regression',
               'linear_dewpoint_variation', 'min_max',
               'month_hour_precip_mean'):
    station.disaggregate_humidity(method=method)
    humdf[method] = station.data_disagg.hum

plot(data_obs_hourly.hum, humdf)
print_stats(data_obs_hourly.hum, humdf)


Out[9]:
mean std r rmse nse
obs 86.6 13.4 1 0 1
equal 86.6 9.5 0.71 89.46 0.5
minimal 80.72 16.01 0.69 175.1 0.03
dewpoint_regression 85.76 15.98 0.74 117.82 0.34
linear_dewpoint_variation 86.13 14.99 0.75 103.29 0.42
min_max 84.13 14.17 0.89 47.69 0.73
month_hour_precip_mean 86.08 10.32 0.76 75.92 0.58

Here, 'min_max' yields the best results, however minimum and maximum daily humidity values are rarely available in practice. If mean daily humidity is available, setting preserve_daily_mean=True further improves the results (of all methods).

Radiation


In [10]:
globdf = pd.DataFrame()
for method in ('pot_rad',
               # 'pot_rad_via_ssd',  # not possible here as we do not have sunshine duration data
               'pot_rad_via_bc',
               'mean_course'):
    station.disaggregate_radiation(method=method)
    globdf[method] = station.data_disagg.glob

plot(data_obs_hourly.glob, globdf)
print_stats(data_obs_hourly.glob, globdf)


Out[10]:
mean std r rmse nse
obs 114.07 196.42 1 0 1
pot_rad 114.34 178.86 0.95 4125.61 0.89
pot_rad_via_bc 101.26 156.97 0.91 7315.06 0.81
mean_course 114.34 186 0.96 3257.96 0.92

Wind speed


In [11]:
winddf = pd.DataFrame()
for method in ('equal', 'cosine', 'random'):
    station.disaggregate_wind(method=method)
    winddf[method] = station.data_disagg.wind

plot(data_obs_hourly.wind, winddf)
print_stats(data_obs_hourly.wind, winddf)


Out[11]:
mean std r rmse nse
obs 0.73 0.8 1 0 1
equal 0.73 0.53 0.66 0.36 0.44
cosine 0.73 0.62 0.76 0.26 0.58
random 0.66 0.54 0.57 0.44 0.31

Precipitation


In [12]:
precipdf = pd.DataFrame()
for method in ('equal', 'cascade'):
    station.disaggregate_precipitation(method=method)
    precipdf[method] = station.data_disagg.precip
    
plot(data_obs_hourly.precip, precipdf)



In [ ]: