In the first notebook we calculated the clearsky irradiance for seven SURFRAD stations. Then we compared different models and atmospheric data sets at each station. In this notebook we combine all seven SURFRAD stations to compare models across all regions. Finally we calculate the grand total MBE and RMSE for all SURFRAD stations for each model. The calculated clearsky data is shared in a public OneDrive folder as HDF5, one file per site, each about 100MB. You can use ViTables or Java HDFView to visualize the data. This notebook assumes the files have all been downloaded into the same directory as the notebook.
As described in the first notebook, this is a Jupyter notebook. To use it you may need to make sure that Python and the required Python packages are installed.
While processing the SURFRAD station data, we found an issue with the Sioux Falls, SD data set from 2006: it contains some data from October 8-21, 2007. That data appears to be similar to the data in the 2007 data set, so it was removed in favor of the latter.
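One possible way to drop such stray rows is sketched below. The helper name `keep_only_year` and the four-row frame are hypothetical stand-ins for illustration; this is not necessarily how the Sioux Falls data was actually cleaned.

```python
import pandas as pd


def keep_only_year(df, year):
    """Return only the rows of `df` whose timestamp falls in `year`."""
    return df[df.index.year == year]


# hypothetical stand-in for a 2006 file that contains stray October 2007 rows
index = pd.to_datetime([
    '2006-10-07 12:00', '2006-10-08 12:00',
    '2007-10-08 12:00', '2007-10-21 12:00',
])
raw_2006 = pd.DataFrame({'ghi': [610.0, 595.0, 600.0, 580.0]}, index=index)

# keep the 2006 rows and discard the rows that belong to 2007
clean_2006 = keep_only_year(raw_2006, 2006)
print(clean_2006)
```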
In [54]:
# imports and settings
import os
import h5py
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import pvlib
import seaborn as sns
import statsmodels.api as sm
from pvsc44_clearsky_aod import ecmwf_macc_tools
%matplotlib inline
sns.set_context('notebook', rc={'figure.figsize': (8, 6)})
sns.set(font_scale=1.5)
In [55]:
# get the "metadata" that contains the station id codes for the SURFRAD data that was analyzed
METADATA = pd.read_csv('metadata.csv', index_col=0)
In [56]:
# load calculations for each station
atm_params_3min_clear = {}
for station_id in METADATA.index:
    with h5py.File('%s_3min_clear_atm_params.h5' % station_id, 'r') as f:
        np_atm_params_3min_clear = pd.DataFrame(np.array(f['data']))
    np_atm_params_3min_clear['index'] = pd.DatetimeIndex(np_atm_params_3min_clear['index'])
    np_atm_params_3min_clear.set_index('index', inplace=True)
    np_atm_params_3min_clear.index.rename('timestamps', inplace=True)
    atm_params_3min_clear[station_id] = np_atm_params_3min_clear
In [57]:
# filter out low light
# CONSTANTS
MODELS = {'solis': 'SOLIS', 'lt': 'Linke', 'macc': 'ECMWF-MACC', 'bird': 'Bird'}
CS = ['dni', 'dhi', 'ghi']
LOW_LIGHT = 200 # threshold for low light in W/m^2
is_bright = {}
# `.items()` works in both Python 2 and Python 3
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    is_bright[station_id] = station_atm_params_3min_clear['ghi'] > LOW_LIGHT
For a given site, let's compare the annual mean bias error (MBE) and root-mean-square error (RMSE) of all models. Then let's compare the MBE and RMSE of all models by month, across all years, to see if there are any seasonal biases. We can also pivot the table by month to plot all years by month on the same plot. This should be similar to the combination of the MBE and RMSE on the same plot, because the spread in monthly MBE across years is representative of the RMSE for all years at that month. Finally we should make some boxplots to compare models against each other to see if there is a statistical difference.
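To illustrate the pivot described above, here is a minimal sketch on a small synthetic table. The values are made up for illustration and are not SURFRAD results; the column names `year`, `month` and `rmbe` are assumed to mirror those used later in this notebook.

```python
import pandas as pd

# synthetic monthly relative MBE for two years (illustrative values only)
monthly = pd.DataFrame({
    'year': [2015, 2015, 2015, 2016, 2016, 2016],
    'month': [1, 2, 3, 1, 2, 3],
    'rmbe': [0.010, -0.005, 0.002, 0.012, -0.001, 0.004],
})

# one row per month, one column per year, so each year plots as its own line
by_month = monthly.pivot(index='month', columns='year', values='rmbe')
print(by_month)

# the spread across years at each month
spread = by_month.std(axis=1)
print(spread)
```

Calling `by_month.plot()` would then draw one line per year against month, which is the layout used for the "all years by month" charts.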
Absolute mean bias error is given as: $$MBE = \frac{\sum^{N}_{n=1} \left( predicted - measured \right)}{N}$$ Relative MBE is given as $$rMBE = N \frac{MBE}{\sum^{N}_{n=1} measured }$$
Absolute root mean square error is given as: $$RMSE = \sqrt{ \frac{\sum^{N}_{n=1} \left( predicted - measured \right)^2}{N} }$$ Relative RMSE is given as $$rRMSE = N \frac{RMSE}{\sum^{N}_{n=1} measured }$$
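As a minimal sketch of the four definitions above, the functions below compute MBE, rMBE, RMSE and rRMSE with NumPy. The function names and the sample values are hypothetical and are not part of the original analysis.

```python
import numpy as np


def mbe(predicted, measured):
    """Absolute mean bias error: mean of (predicted - measured)."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return np.mean(predicted - measured)


def rmse(predicted, measured):
    """Absolute root mean square error."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return np.sqrt(np.mean((predicted - measured) ** 2))


def rmbe(predicted, measured):
    """Relative MBE: N * MBE / sum(measured)."""
    measured = np.asarray(measured, dtype=float)
    return measured.size * mbe(predicted, measured) / np.sum(measured)


def rrmse(predicted, measured):
    """Relative RMSE: N * RMSE / sum(measured)."""
    measured = np.asarray(measured, dtype=float)
    return measured.size * rmse(predicted, measured) / np.sum(measured)


# illustrative values only (W/m^2), not SURFRAD data
measured = [800.0, 600.0, 400.0, 200.0]
predicted = [790.0, 610.0, 380.0, 220.0]
print(mbe(predicted, measured), rmbe(predicted, measured))
print(rmse(predicted, measured), rrmse(predicted, measured))
```

In this sample the positive and negative errors cancel, so the bias is zero while the RMSE is not, which is why both metrics are reported side by side.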
When measured and predicted numbers are very small, relative differences can be exaggerated compared to a characteristic value. For example, a characteristic GHI might be 1000 W/m2. An error of 100 W/m2 at, say, 800 W/m2 is only 12.5%, but at 200 W/m2 the same error would be 50%. There are at least two ways to avoid the curse of small numbers:
1. Use a characteristic value in the denominator. For example, if you use the characteristic value of 1000 W/m2 for both errors you get 10% for 100 W/m2 at both 800 W/m2 and 200 W/m2, so the curse is avoided.
2. Roll up the time series to periods that include both small and big numbers. This minimizes the exaggeration of relative differences in small numbers because the denominator is dominated by the larger numbers. For example, a typical day might total 6 kWh/m2, so an error of 0.6 kWh/m2 is only 10%, even if most of those errors occurred during low light conditions.
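The arithmetic in the examples above can be checked with a few lines of Python. The variable names are chosen here for illustration only.

```python
ERROR = 100.0            # absolute error in W/m^2
CHARACTERISTIC = 1000.0  # characteristic GHI in W/m^2

relative_to_measured = {}
relative_to_characteristic = {}
for measured in (800.0, 200.0):
    # dividing by the measured value inflates the error when light is low
    relative_to_measured[measured] = ERROR / measured
    # dividing by a fixed characteristic value gives the same answer for both
    relative_to_characteristic[measured] = ERROR / CHARACTERISTIC
    print('at %g W/m^2: %.1f%% of measured, %.1f%% of characteristic' % (
        measured,
        100 * relative_to_measured[measured],
        100 * relative_to_characteristic[measured]))

# roll-up: an error of 0.6 kWh/m^2 in a day that totals 6 kWh/m^2
daily_relative_error = 0.6 / 6.0
print('daily roll-up: %.1f%%' % (100 * daily_relative_error))
```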
Note that in the equation for rMBE, the number of samples N is in the numerator and the denominator so it cancels out allowing you to just sum up the timeseries over the desired period.
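The cancellation of N can be seen on a small synthetic series: dividing period sums gives the same rMBE as dividing period means. This is only a sketch with made-up values on a hypothetical two-day index, not SURFRAD data.

```python
import numpy as np
import pandas as pd

# synthetic measurements and predictions over two days (illustrative only)
index = pd.to_datetime([
    '2017-06-01 09:00', '2017-06-01 12:00', '2017-06-01 15:00',
    '2017-06-02 09:00', '2017-06-02 12:00', '2017-06-02 15:00',
])
measured = pd.Series([500.0, 900.0, 600.0, 450.0, 850.0, 550.0], index=index)
predicted = pd.Series([520.0, 890.0, 630.0, 440.0, 870.0, 560.0], index=index)
error = predicted - measured

# rMBE per day from sums: N never appears
daily_rmbe_from_sums = error.resample('D').sum() / measured.resample('D').sum()
# rMBE per day from means: N appears in numerator and denominator and cancels
daily_rmbe_from_means = error.resample('D').mean() / measured.resample('D').mean()

print(daily_rmbe_from_sums)
print(np.allclose(daily_rmbe_from_sums, daily_rmbe_from_means))
```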
In [5]:
# plot annual MBE and RMSE, combine all models on one chart for each station and clear sky component
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    f, ax = plt.subplots(2, 3, figsize=(24, 12), sharex=False)
    for n, cs in enumerate(CS):
        annual_avg = station_atm_params_3min_clear[cs][is_bright[station_id]].resample('A').mean()
        # MBE
        for model in MODELS:
            ((
                (station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            ).resample('A').mean() / annual_avg).plot(ax=ax[0][n])
        ax[0][n].legend(list(MODELS.values()))
        ax[0][n].set_title('Mean Bias Error (MBE) of %s at %s' % (cs.upper(), METADATA['station name'][station_id]))
        ax[0][n].set_ylabel('average relative difference (arb. units)')
        # RMSE
        for model in MODELS:
            (np.sqrt((
                ((station_atm_params_3min_clear['%s_%s' % (model, cs)]
                  - station_atm_params_3min_clear[cs])[is_bright[station_id]])**2
            ).resample('A').mean()) / annual_avg).plot(ax=ax[1][n])
        ax[1][n].legend(list(MODELS.values()))
        ax[1][n].set_title('Root Mean Square Error (RMSE) of %s at %s' % (cs.upper(), METADATA['station name'][station_id]))
        ax[1][n].set_ylabel('relative RMSE (arb. units)')
    f.tight_layout()
    f.savefig('%s_annual_mbe-rmse.png' % station_id)
In [6]:
# plot MBE and RMSE by month, combine all models on one chart for each station and clear sky component
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    f, ax = plt.subplots(2, 3, figsize=(24, 12), sharex=False)
    for n, cs in enumerate(CS):
        monthly_avg = station_atm_params_3min_clear[cs][is_bright[station_id]].groupby(lambda x: x.month).mean()
        # rMBE
        for model in MODELS:
            ((
                (station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            ).groupby(lambda x: x.month).mean() / monthly_avg).plot(ax=ax[0][n])
        ax[0][n].legend(list(MODELS.values()))
        ax[0][n].set_title('rMBE of %s by month at %s' % (cs.upper(), METADATA['station name'][station_id]))
        ax[0][n].set_ylabel('average relative difference (arb. units)')
        # rRMSE
        for model in MODELS:
            (np.sqrt((
                ((station_atm_params_3min_clear['%s_%s' % (model, cs)]
                  - station_atm_params_3min_clear[cs])[is_bright[station_id]])**2
            ).groupby(lambda x: x.month).mean()) / monthly_avg).plot(ax=ax[1][n])
        ax[1][n].legend(list(MODELS.values()))
        ax[1][n].set_title('rRMSE of %s by month at %s' % (cs.upper(), METADATA['station name'][station_id]))
        ax[1][n].set_ylabel('relative RMSE (arb. units)')
    f.tight_layout()
    f.savefig('%s_mbe-rmse_by_month.png' % station_id)
In [7]:
# combine MBE all years on one chart for each station, model and clear sky component
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    f, ax = plt.subplots(4, 3, figsize=(24, 24), sharex=False)
    for n, cs in enumerate(CS):
        # each `model` is a (key, display name) tuple
        for m, model in enumerate(MODELS.items()):
            x = station_atm_params_3min_clear.loc[is_bright[station_id]]
            x.insert(1, 'mbe', (x['%s_%s' % (model[0], cs)] - x[cs]))
            y = x.resample('M').mean()
            y.insert(2, 'rmbe', y['mbe'] / y[cs])
            # one line per year, months along the x-axis
            y.pivot(index='month', columns='year', values='rmbe').plot(ax=ax[m][n])
            ax[m][n].set_title('%s %s rMBE by month for %s' % (model[1], cs.upper(), METADATA['station name'][station_id]))
            ax[m][n].set_ylabel('average relative difference (arb. units)')
    f.tight_layout()
    # NOTE: `model` and `cs` hold the values from the last loop iteration here
    f.savefig('%s_%s_%s_mbe_by_month_all_years.png' % (station_id, model[0], cs))
In [8]:
# plot annual average error, combine all stations on one chart for each model and clear sky component
for model, name in MODELS.items():
    f, ax = plt.subplots(2, 3, figsize=(24, 12), sharex=False)
    for n, cs in enumerate(CS):
        for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
            # MBE
            annual_avg = station_atm_params_3min_clear[cs][is_bright[station_id]].resample('A').mean()
            ((
                (station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            ).resample('A').mean() / annual_avg).plot(ax=ax[0][n])
            # RMSE
            (np.sqrt((
                ((station_atm_params_3min_clear['%s_%s' % (model, cs)]
                  - station_atm_params_3min_clear[cs])[is_bright[station_id]])**2
            ).resample('A').mean()) / annual_avg).plot(ax=ax[1][n])
        # label the lines in the same order the stations were plotted
        ax[0][n].legend(list(atm_params_3min_clear))
        ax[0][n].set_title('%s %s MBE' % (name, cs.upper()))
        ax[0][n].set_ylabel('average relative difference (arb. units)')
        ax[1][n].legend(list(atm_params_3min_clear))
        ax[1][n].set_title('%s %s RMSE' % (name, cs.upper()))
        ax[1][n].set_ylabel('relative RMSE (arb. units)')
    f.tight_layout()
    f.savefig('%s_annual_mbe-rms.png' % model)
In [9]:
# plot monthly average error, combine all stations on one chart for each model and clear sky component
for model, name in MODELS.items():
    f, ax = plt.subplots(2, 3, figsize=(24, 12), sharex=False)
    for n, cs in enumerate(CS):
        for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
            monthly_avg = station_atm_params_3min_clear[cs][is_bright[station_id]].groupby(lambda x: x.month).mean()
            # MBE
            ((
                (station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            ).groupby(lambda x: x.month).mean() / monthly_avg).plot(ax=ax[0][n])
            # RMSE
            (np.sqrt((
                ((station_atm_params_3min_clear['%s_%s' % (model, cs)]
                  - station_atm_params_3min_clear[cs])[is_bright[station_id]])**2
            ).groupby(lambda x: x.month).mean()) / monthly_avg).plot(ax=ax[1][n])
        # label the lines in the same order the stations were plotted
        ax[0][n].legend(list(atm_params_3min_clear))
        ax[0][n].set_title('%s %s MBE by month' % (name, cs.upper()))
        ax[0][n].set_ylabel('average relative difference (arb. units)')
        ax[1][n].legend(list(atm_params_3min_clear))
        ax[1][n].set_title('%s %s RMSE by month' % (name, cs.upper()))
        ax[1][n].set_ylabel('relative RMSE (arb. units)')
    f.tight_layout()
    # NOTE: `cs` holds the value from the last loop iteration here
    f.savefig('%s_%s_mbe-rmse_by_month.png' % (model, cs))
In [58]:
# CONSTANTS
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# placeholder for the combined monthly atmospheric parameters of all stations
MONTHLY_ATM_PARAMS_3MIN_CLEAR = None
# loop over stations
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    # filter out low light
    data = station_atm_params_3min_clear[is_bright[station_id]]
    # insert errors first
    for n, model in MODELS.items():
        for m, cs in enumerate(CS):
            data.insert(
                43 + m + 3*n,
                '%s-%s_err' % (model, cs.upper()),
                data['%s_%s' % (model, cs)] - data[cs]
            )
    # average by month
    avg_data = data.resample('M').mean()
    # divide by monthly average
    for n, cs in enumerate(CS):
        for m, model in MODELS.items():
            avg_data.insert(
                54 + m + 4*n,
                '%s-%s_rel' % (model, cs.upper()),
                avg_data['%s-%s_err' % (model, cs.upper())] / avg_data[cs]
            )
    # insert station id
    avg_data.insert(0, 'station', station_id)
    # append together
    if MONTHLY_ATM_PARAMS_3MIN_CLEAR is None:
        MONTHLY_ATM_PARAMS_3MIN_CLEAR = pd.DataFrame(avg_data)
    else:
        # pd.concat replaces DataFrame.append, which newer pandas no longer provides
        MONTHLY_ATM_PARAMS_3MIN_CLEAR = pd.concat([MONTHLY_ATM_PARAMS_3MIN_CLEAR, avg_data])
MONTHLY_ATM_PARAMS_3MIN_CLEAR[['station', 'year', 'month', 'ghi', 'solis-GHI_err', 'solis-GHI_rel']]
Out[58]:
In [60]:
# plot interannual variability: monthly relative error by model, grouped by year
f, ax = plt.subplots(1, 2, figsize=(16, 6), sharex=False, sharey=False)  # 24 x 6 for 3 subplots
for n, cs in enumerate(CS):
    # skip DHI, so DNI goes in the first subplot and GHI in the second
    if cs == 'dhi':
        continue
    if n == 2:
        n = 1
    data = pd.melt(
        MONTHLY_ATM_PARAMS_3MIN_CLEAR,
        id_vars=['station', 'year', 'month'],
        value_vars=['%s-%s_rel' % (model, cs.upper()) for model in MODELS.values()],
        var_name='model-cs',
        value_name='error'
    )
    meanlineprops = dict(linestyle='--', linewidth=1.5, color='red')
    sns.boxplot(x='model-cs', y='error', hue='year', data=data, ax=ax[n],
                whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
    ax[n].set_title('Interannual variability in %s by model' % cs.upper())
    ax[n].set_ylabel('average monthly relative error')
    ax[n].set_xlabel('model and clear sky component')
    if n == 1:
        ax[n].legend(bbox_to_anchor=(-0.1, -0.25), ncol=10, loc='lower center')  # x = 0.5 for all 3 components
    else:
        # hide the duplicate legend on the first subplot
        legend = ax[n].legend()
        legend.set_visible(False)
        legend.remove()
f.tight_layout()
f.subplots_adjust(bottom=0.2)
f.savefig('y2y-boxplot_by_model-cs_CORRECTED.png')
In [61]:
# plot seasonal variation: monthly relative error by model, grouped by month
f, ax = plt.subplots(1, 2, figsize=(16, 6), sharex=False, sharey=False)  # 24 x 6 for 3 subplots
for n, cs in enumerate(CS):
    # skip DHI, so DNI goes in the first subplot and GHI in the second
    if cs == 'dhi':
        continue
    if n == 2:
        n = 1
    data = pd.melt(
        MONTHLY_ATM_PARAMS_3MIN_CLEAR,
        id_vars=['station', 'year', 'month'],
        value_vars=['%s-%s_rel' % (model, cs.upper()) for model in MODELS.values()],
        var_name='model-cs',
        value_name='error'
    )
    meanlineprops = dict(linestyle='--', linewidth=1.5, color='red')
    sns.boxplot(x='model-cs', y='error', hue='month', data=data, ax=ax[n],
                whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
    ax[n].set_title('Seasonal variation in %s by model' % cs.upper())
    ax[n].set_ylabel('average monthly relative error')
    ax[n].set_xlabel('model and clear sky component')
    if n == 1:
        ax[n].legend(bbox_to_anchor=(-0.1, -0.25), ncol=12, loc='lower center')  # x = 0.5 for all 3 components
    else:
        # hide the duplicate legend on the first subplot
        legend = ax[n].legend()
        legend.set_visible(False)
        legend.remove()
f.tight_layout()
f.subplots_adjust(bottom=0.20)
f.savefig('seasonal-boxplot_by_model-cs_CORRECTED.png')
In [62]:
# plot regional variation: monthly relative error by model, grouped by station
f, ax = plt.subplots(1, 2, figsize=(16, 6), sharex=False, sharey=False)  # 24 x 6 for 3 subplots
for n, cs in enumerate(CS):
    # skip DHI, so DNI goes in the first subplot and GHI in the second
    if cs == 'dhi':
        continue
    if n == 2:
        n = 1
    data = pd.melt(
        MONTHLY_ATM_PARAMS_3MIN_CLEAR,
        id_vars=['station', 'year', 'month'],
        value_vars=['%s-%s_rel' % (model, cs.upper()) for model in MODELS.values()],
        var_name='model-cs',
        value_name='error'
    )
    meanlineprops = dict(linestyle='--', linewidth=1.5, color='red')
    sns.boxplot(x='model-cs', y='error', hue='station', data=data, ax=ax[n],
                whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
    ax[n].set_title('Regional variation in %s by model' % cs.upper())
    ax[n].set_ylabel('average monthly relative error')
    ax[n].set_xlabel('model and clear sky component')
    if n == 1:
        ax[n].legend(bbox_to_anchor=(-0.1, -0.25), ncol=12, loc='lower center')  # x = 0.5 for all 3 components
    else:
        # hide the duplicate legend on the first subplot
        legend = ax[n].legend()
        legend.set_visible(False)
        legend.remove()
f.tight_layout()
f.subplots_adjust(bottom=0.20)
f.savefig('boxplot_by_model-cs_and_station_CORRECTED.png')
In [17]:
# plot all: one row of subplots per station, one column per clear sky component
STATION_ORDER = {station_id: n for n, station_id in enumerate(METADATA.index)}
f, ax = plt.subplots(7, 3, figsize=(24, 42), sharex=False)
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    for n, cs in enumerate(CS):
        # monthly mean error of each model, one column per model
        data = pd.concat([
            (station_atm_params_3min_clear['%s_%s' % (model, cs)] - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            for model in MODELS.values()
        ], axis=1).resample('M').mean()
        data.rename(columns=MODELS, inplace=True)
        sns.boxplot(data=data, ax=ax[STATION_ORDER[station_id]][n])
        sns.swarmplot(data=data, color=".25", ax=ax[STATION_ORDER[station_id]][n])
        ax[STATION_ORDER[station_id]][n].set_title('%s by model at %s' % (cs.upper(), METADATA['station name'][station_id]))
f.tight_layout()
f.savefig('all_boxplot.png')
In [18]:
# boxplots of monthly mean error by model and year for each station
f, ax = plt.subplots(7, 3, figsize=(24, 42), sharex=False)
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    for n, cs in enumerate(CS):
        data = pd.concat([
            (station_atm_params_3min_clear['%s_%s' % (model, cs)]
             - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            for model in MODELS.values()
        ] + [station_atm_params_3min_clear['year']], axis=1).resample('M').mean()
        data.rename(columns=MODELS, inplace=True)
        data = pd.melt(
            data, id_vars=['year'], value_vars=list(MODELS.values()),
            var_name='model', value_name='mbe'
        )
        sns.boxplot(x='model', y='mbe', data=data, hue='year', ax=ax[STATION_ORDER[station_id]][n])
        #ax[STATION_ORDER[station_id]][n].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
        ax[STATION_ORDER[station_id]][n].set_title('%s by model and year at %s' % (cs.upper(), METADATA['station name'][station_id]))
f.tight_layout()
f.savefig('all_boxplots_by_year.png')
In [19]:
# boxplots of monthly mean error by model and month for each station
f, ax = plt.subplots(7, 3, figsize=(24, 42), sharex=False)
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    for n, cs in enumerate(CS):
        data = pd.concat([
            (station_atm_params_3min_clear['%s_%s' % (model, cs)] - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            for model in MODELS.values()
        ] + [station_atm_params_3min_clear['month']], axis=1).resample('M').mean()
        data.rename(columns=MODELS, inplace=True)
        data = pd.melt(
            data, id_vars=['month'], value_vars=list(MODELS.values()),
            var_name='model', value_name='mbe'
        )
        sns.boxplot(x='model', y='mbe', data=data, hue='month', ax=ax[STATION_ORDER[station_id]][n])
        #ax[STATION_ORDER[station_id]][n].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
        ax[STATION_ORDER[station_id]][n].set_title('%s by model and month at %s' % (cs.upper(), METADATA['station name'][station_id]))
f.tight_layout()
f.savefig('all_boxplot_by_months.png')
In [51]:
# plot all SURFRAD stations DHI by model
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# calculate station average DHI
avg_dhi = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['dhi'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
# NOTE: `mean(axis=0)` averages each column, so the result is a Series indexed by station
# calculate DHI errors for each station, concatenate stations horizontally (axis=1) and average each station
# calculate relative station DHI error by dividing average error by average station DHI
# then concatenate all models
data = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_dhi' % model] - station_atm_params_3min_clear['dhi'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_dhi for model in MODELS.values()
], axis=1)
data.rename(columns=MODELS, inplace=True)
meanlineprops = dict(linestyle='--', linewidth=1.5, color='black')
sns.boxplot(data=data, whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
sns.swarmplot(data=data, color=".25")
plt.title('All SURFRAD stations DHI by model')
plt.ylabel('average relative error by station')
plt.savefig('all_stations_all_years_by_model_DHI.png')
data
Out[51]:
In [52]:
# plot all SURFRAD stations DNI by model
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# calculate station average DNI
avg_dni = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['dni'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
# NOTE: `mean(axis=0)` averages each column, so the result is a Series indexed by station
# calculate DNI errors for each station, concatenate stations horizontally (axis=1) and average each station
# calculate relative station DNI error by dividing average error by average station DNI
# then concatenate all models
data = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_dni' % model] - station_atm_params_3min_clear['dni'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_dni for model in MODELS.values()
], axis=1)
data.rename(columns=MODELS, inplace=True)
meanlineprops = dict(linestyle='--', linewidth=1.5, color='black')
sns.boxplot(data=data, whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
sns.swarmplot(data=data, color=".25")
plt.title('All SURFRAD stations DNI by model')
plt.ylabel('average relative error by station')
plt.savefig('all_stations_all_years_by_model_DNI.png')
data
Out[52]:
In [50]:
# plot all SURFRAD stations GHI by model
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# calculate station average GHI
avg_ghi = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['ghi'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
# NOTE: `mean(axis=0)` averages each column, so the result is a Series indexed by station
# sxf 560.271702
# gwn 591.976444
# fpk 543.374845
# tbl 608.619889
# bon 584.017623
# dra 654.731447
# psu 586.754301
# calculate GHI errors for each station, concatenate stations horizontally (axis=1) and average each station
# NOTE: again `mean(axis=0)` returns a Series indexed by station
# sxf -10.020557
# gwn -2.605967
# fpk -7.293992
# tbl -24.014558
# bon -8.630023
# dra -29.584633
# psu -6.126202
# calculate relative station GHI error by dividing average error by average station GHI
# then concatenate all models
data = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_ghi' % model] - station_atm_params_3min_clear['ghi'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_ghi for model in MODELS.values()
], axis=1)
data.rename(columns=MODELS, inplace=True)
meanlineprops = dict(linestyle='--', linewidth=1.5, color='black')
sns.boxplot(data=data, whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
sns.swarmplot(data=data, color=".25")
plt.title('All SURFRAD stations GHI by model')
plt.ylabel('average relative error by station')
plt.savefig('all_stations_all_years_by_model_GHI.png')
data
Out[50]:
In [53]:
# plot all SURFRAD stations by model, grouped by clear sky component
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# DNI
avg_dni = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['dni'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
dni = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_dni' % model] - station_atm_params_3min_clear['dni'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_dni for model in MODELS.values()
], axis=1)
dni.rename(columns=MODELS, inplace=True)
dni.insert(0, 'cs', ['dni']*len(dni))
# DHI
avg_dhi = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['dhi'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
dhi = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_dhi' % model] - station_atm_params_3min_clear['dhi'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_dhi for model in MODELS.values()
], axis=1)
dhi.rename(columns=MODELS, inplace=True)
dhi.insert(0, 'cs', ['dhi']*len(dhi))
# GHI
avg_ghi = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['ghi'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
ghi = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_ghi' % model] - station_atm_params_3min_clear['ghi'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_ghi for model in MODELS.values()
], axis=1)
ghi.rename(columns=MODELS, inplace=True)
ghi.insert(0, 'cs', ['ghi']*len(ghi))
# stack the three components, then melt to long form for plotting
data = pd.concat([dni, dhi, ghi])
data = pd.melt(
    data, id_vars=['cs'], value_vars=list(MODELS.values()),
    var_name='model', value_name='mbe'
)
meanlineprops = dict(linestyle='--', linewidth=1.5, color='black')
sns.boxplot(x='model', y='mbe', hue='cs', data=data, whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
plt.title('All SURFRAD stations by model')
plt.savefig('all_stations_all_years_by_model_and_cs.png')
Use of Measured Aerosol Optical Depth and Precipitable Water to Model Clear Sky Irradiance by Mark A. Mikofski, Clifford W. Hansen, William F. Holmgren and Gregory M. Kimball is licensed under a Creative Commons Attribution 4.0 International License.