Analysis of Predicted SURFRAD Clearsky Irradiance

In the first notebook we calculated the clearsky irradiance for seven SURFRAD stations. Then we compared different models and atmospheric data sets at each station. In this notebook we combine all seven SURFRAD stations to compare models across all regions. Finally we calculate the grand total MBE and RMSE for all SURFRAD stations for each model. The calculated clearsky data is shared in a public OneDrive folder as HDF5, one file per site, each about 100 MB. You can use ViTables or Java HDFView to visualize the data. This notebook assumes the files have all been downloaded into the same directory as this notebook.
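
As a quick sanity check before running the cells below, the following sketch lists which of the expected HDF5 files are missing from a directory. The file name pattern and the station codes are taken from later cells and outputs in this notebook; the helper name `missing_files` is hypothetical and introduced here only for illustration.

```python
import os

# station codes as they appear in the outputs further down in this notebook
STATION_IDS = ['sxf', 'gwn', 'fpk', 'tbl', 'bon', 'dra', 'psu']


def missing_files(station_ids, directory='.'):
    """Return the expected HDF5 file names that are not found in `directory`."""
    pattern = '%s_3min_clear_atm_params.h5'  # same pattern as the loading cell
    expected = [pattern % station_id for station_id in station_ids]
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]


# an empty list means every station file is present in the current directory
print(missing_files(STATION_IDS))
```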

Usage

As described in the first notebook, this is a Jupyter notebook. To use it you may need to make sure that Python and the required Python packages are installed.

Issues

While processing the SURFRAD station data, we found an issue with the Sioux Falls, SD data set from 2006, which contains some data from 2007, from October 8-21. That data appears to be similar to the data in the 2007 data set, so it was removed in favor of the latter.
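
One way to drop overlapping records like these, sketched with made-up numbers rather than the actual SURFRAD files, is to concatenate the yearly frames and keep only the last occurrence of each timestamp. The names `df_2006` and `df_2007` and all values are hypothetical, and this is not necessarily how the data was cleaned for this study.

```python
import pandas as pd

# Hypothetical stand-ins for two yearly files: the "2006" frame wrongly
# contains two timestamps that belong to 2007.
df_2006 = pd.DataFrame(
    {'ghi': [500.0, 610.0, 620.0]},
    index=pd.to_datetime(
        ['2006-10-08 12:00', '2007-10-08 12:00', '2007-10-09 12:00']))
df_2007 = pd.DataFrame(
    {'ghi': [611.0, 621.0]},
    index=pd.to_datetime(['2007-10-08 12:00', '2007-10-09 12:00']))

combined = pd.concat([df_2006, df_2007])
# keep='last' retains the row from the later frame for duplicated timestamps
deduped = combined[~combined.index.duplicated(keep='last')].sort_index()
print(deduped)
```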


In [54]:
# imports and settings
import os

import h5py
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import pvlib
import seaborn as sns
import statsmodels.api as sm

from pvsc44_clearsky_aod import ecmwf_macc_tools

%matplotlib inline

sns.set_context('notebook', rc={'figure.figsize': (8, 6)})
sns.set(font_scale=1.5)

In [55]:
# get the "metadata" that contains the station id codes for the SURFRAD data that was analyzed
METADATA = pd.read_csv('metadata.csv', index_col=0)

In [56]:
# load calculations for each station
atm_params_3min_clear = {}
for station_id in METADATA.index:
    with h5py.File('%s_3min_clear_atm_params.h5' % station_id, 'r') as f:
        np_atm_params_3min_clear = pd.DataFrame(np.array(f['data']))
    np_atm_params_3min_clear['index'] = pd.DatetimeIndex(np_atm_params_3min_clear['index'])
    np_atm_params_3min_clear.set_index('index', inplace=True)
    np_atm_params_3min_clear.index.rename('timestamps', inplace=True)
    atm_params_3min_clear[station_id] = np_atm_params_3min_clear

In [57]:
# filter out low light

# CONSTANTS
MODELS = {'solis': 'SOLIS', 'lt': 'Linke', 'macc': 'ECMWF-MACC', 'bird': 'Bird'}
CS = ['dni', 'dhi', 'ghi']
LOW_LIGHT = 200  # threshold for low light in W/m^2

is_bright = {}
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    is_bright[station_id] = station_atm_params_3min_clear['ghi'] > LOW_LIGHT

Figures

For a given site, let's compare the annual mean bias error (MBE) and root-mean-square error (RMSE) of all models. Then let's compare the MBE and RMSE of all models by month, over all years, to see if there are any seasonal biases. We can also pivot the table by month to plot each year by month on the same plot. This should be similar to the combination of the MBE and RMSE on the same plot, because the spread in monthly MBE across years is representative of the RMSE for all years at that month. Finally we make some boxplots to compare models against each other to see if there is a statistical difference.
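
The pivot by month can be sketched as follows. This is a minimal illustration on a synthetic series with a constant, made-up bias, not the notebook's data; the column names `ghi`, `mbe` and `rmbe` mirror the ones used in the cells below, and everything else is an assumption.

```python
import pandas as pd

# hypothetical 3-year daily series standing in for measured and modeled GHI
index = pd.date_range('2010-01-01', '2012-12-31', freq='D')
measured = pd.Series(600.0, index=index)
predicted = measured - 12.0  # a constant, made-up bias of -12 W/m^2

df = pd.DataFrame({'ghi': measured, 'mbe': predicted - measured})
# average by (year, month), then divide the mean error by the mean measurement
monthly = df.groupby([df.index.year, df.index.month]).mean()
monthly.index.names = ['year', 'month']
monthly['rmbe'] = monthly['mbe'] / monthly['ghi']
# rows are months 1-12 and columns are years, so plotting draws one line per year
by_month = monthly['rmbe'].unstack('year')
print(by_month.shape)
```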

MBE

Absolute mean bias error is given as: $$MBE = \frac{\sum^{N}_{n=1} \left( predicted - measured \right)}{N}$$ Relative MBE is given as $$rMBE = N \frac{MBE}{\sum^{N}_{n=1} measured }$$
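
A minimal sketch of these two definitions in plain Python, using made-up GHI values; the function names `mbe` and `rmbe` are introduced here for illustration and are not part of the notebook.

```python
def mbe(predicted, measured):
    """Absolute mean bias error: mean of (predicted - measured)."""
    n = len(measured)
    return sum(p - m for p, m in zip(predicted, measured)) / n


def rmbe(predicted, measured):
    """Relative MBE: N * MBE / sum(measured)."""
    return len(measured) * mbe(predicted, measured) / sum(measured)


# made-up GHI values in W/m^2, for illustration only
measured = [800.0, 600.0, 400.0]
predicted = [780.0, 610.0, 380.0]
print(mbe(predicted, measured))
print(rmbe(predicted, measured))
```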

RMSE

Absolute root mean square error is given as: $$RMSE = \sqrt{ \frac{\sum^{N}_{n=1} \left( predicted - measured \right)^2}{N} }$$ Relative RMSE is given as $$rRMSE = N \frac{RMSE}{\sum^{N}_{n=1} measured }$$
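
The same definitions can be sketched in plain Python. The function names `rmse` and `rrmse` and the sample values are assumptions made for illustration.

```python
import math


def rmse(predicted, measured):
    """Absolute root-mean-square error of predicted versus measured."""
    n = len(measured)
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / n)


def rrmse(predicted, measured):
    """Relative RMSE: N * RMSE / sum(measured)."""
    return len(measured) * rmse(predicted, measured) / sum(measured)


# made-up GHI values in W/m^2, for illustration only
measured = [800.0, 600.0, 400.0]
predicted = [780.0, 610.0, 380.0]
print(rmse(predicted, measured))
print(rrmse(predicted, measured))
```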

The curse of small numbers in relative differences.

When measured and predicted numbers are very small, relative differences can be exaggerated relative to a characteristic value. For example, a characteristic GHI might be 1000 W/m2. An error of 100 W/m2 at, say, 800 W/m2 is only 12.5%, but at 200 W/m2 the same error would be 50%. There are at least two ways to avoid the curse of small numbers:

  • Use a characteristic value in the denominator. For example, if you use the characteristic value of 1000 W/m2 for both errors, you get 10% for an error of 100 W/m2 at both 800 W/m2 and 200 W/m2, so the curse is avoided.

  • Roll up the timeseries to periods that include both small and big numbers. This minimizes the exaggeration of relative differences in small numbers because the denominator is dominated by the larger numbers. For example, a typical day might have 6 kWh/m2, so an error of 0.6 kWh/m2 is only 10%, even if most of that error occurred during low light conditions.

Note that in the equation for rMBE, the number of samples N appears in both the numerator and the denominator, so it cancels out, allowing you to just sum up the timeseries over the desired period.
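
The arithmetic above can be reproduced with a short sketch. The 100 W/m2 error and the 1000 W/m2 characteristic value come from the text; the five-sample series used for the roll-up is made up for illustration.

```python
CHARACTERISTIC_GHI = 1000.0  # W/m^2, the characteristic value from the text


def relative_error(error, reference):
    """Error as a fraction of the reference value."""
    return error / reference


# the same 100 W/m^2 error relative to the measured value: 12.5% versus 50%
pointwise = [relative_error(100.0, ghi) for ghi in (800.0, 200.0)]
# relative to the characteristic value the error is 10% in both cases
fixed = [relative_error(100.0, CHARACTERISTIC_GHI) for _ in (800.0, 200.0)]

# roll-up: sum errors and measurements over the period before dividing,
# so N cancels and the large values dominate the denominator
measured = [200.0, 800.0, 1000.0, 800.0, 200.0]  # made-up samples, W/m^2
errors = [100.0, 20.0, 10.0, 20.0, 100.0]        # made-up errors, W/m^2
rollup = sum(errors) / sum(measured)

print(pointwise, fixed, rollup)
```

In this made-up series the low light samples each carry a 50% pointwise error, yet the rolled-up relative error stays below 10%.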


In [5]:
# plot annual MBE and RMSE, combine all models on one chart for each station and clear sky component
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    f, ax = plt.subplots(2, 3, figsize=(24, 12), sharex=False)
    for n, cs in enumerate(CS):
        annual_avg = station_atm_params_3min_clear[cs][is_bright[station_id]].resample('A').mean()
        # MBE
        for model in MODELS:
            ((
                (station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            ).resample('A').mean() / annual_avg).plot(ax=ax[0][n])
        ax[0][n].legend(MODELS.values())
        ax[0][n].set_title('Mean Bias Error (MBE) of %s at %s' % (cs.upper(), METADATA['station name'][station_id]))
        ax[0][n].set_ylabel('average relative difference (arb. units)')
        # RMSE
        for model in MODELS:
            (np.sqrt((
                ((station_atm_params_3min_clear['%s_%s' % (model, cs)]
                  - station_atm_params_3min_clear[cs])[is_bright[station_id]])**2
            ).resample('A').mean()) / annual_avg).plot(ax=ax[1][n])
        ax[1][n].legend(MODELS.values())
        ax[1][n].set_title('Root Mean Square Error (RMSE) of %s at %s' % (cs.upper(), METADATA['station name'][station_id]))
        ax[1][n].set_ylabel('relative RMSE (arb. units)')
    f.tight_layout()
    f.savefig('%s_annual_mbe-rmse.png' % station_id)



In [6]:
# plot MBE and RMSE by month, combine all models on one chart for each station and clear sky component
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    f, ax = plt.subplots(2, 3, figsize=(24, 12), sharex=False)
    for n, cs in enumerate(CS):
        monthly_avg = station_atm_params_3min_clear[cs][is_bright[station_id]].groupby(lambda x: x.month).mean()
        for model in MODELS:
            ((
                (station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            ).groupby(lambda x: x.month).mean() / monthly_avg).plot(ax=ax[0][n])
        ax[0][n].legend(MODELS.values())
        ax[0][n].set_title('rMBE of %s by month at %s' % (cs.upper(), METADATA['station name'][station_id]))
        ax[0][n].set_ylabel('average relative difference (arb. units)')
        for model in MODELS:
            (np.sqrt((
                ((station_atm_params_3min_clear['%s_%s' % (model, cs)]
                  - station_atm_params_3min_clear[cs])[is_bright[station_id]])**2
            ).groupby(lambda x: x.month).mean()) / monthly_avg).plot(ax=ax[1][n])
        ax[1][n].legend(MODELS.values())
        ax[1][n].set_title('rRMSE of %s by month at %s' % (cs.upper(), METADATA['station name'][station_id]))
        ax[1][n].set_ylabel('relative RMSE (arb. units)')
    f.tight_layout()
    f.savefig('%s_mbe-rmse_by_month.png' % station_id)



In [7]:
# combine MBE all years on one chart for each station, model and clear sky component
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    f, ax = plt.subplots(4, 3, figsize=(24, 24), sharex=False)
    for n, cs in enumerate(CS):
        for m, model in enumerate(MODELS.items()):
            x = station_atm_params_3min_clear.loc[is_bright[station_id]]
            x.insert(1, 'mbe', (x['%s_%s' % (model[0], cs)] - x[cs]))
            y = x.resample('M').mean()
            y.insert(2, 'rmbe', y['mbe'] / y[cs])
            y.pivot(index='month', columns='year', values='rmbe').plot(ax=ax[m][n])
            ax[m][n].set_title('%s %s rMBE by month for %s'% (model[1], cs.upper(), METADATA['station name'][station_id]))
            ax[m][n].set_ylabel('average relative difference (arb. units)')
    f.tight_layout()
    # one figure per station holds every model and component, so name it by station only
    f.savefig('%s_mbe_by_month_all_years.png' % station_id)



In [8]:
# plot annual average error, combine all stations on one chart for each model and clear sky component
for model, name in MODELS.items():
    f, ax = plt.subplots(2, 3, figsize=(24, 12), sharex=False)
    for n, cs in enumerate(CS):
        for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
            # MBE
            annual_avg = station_atm_params_3min_clear[cs][is_bright[station_id]].resample('A').mean()
            ((
                (station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            ).resample('A').mean() / annual_avg).plot(ax=ax[0][n])
            # RMSE
            (np.sqrt((
                ((station_atm_params_3min_clear['%s_%s' % (model, cs)]
                  - station_atm_params_3min_clear[cs])[is_bright[station_id]])**2
            ).resample('A').mean()) / annual_avg).plot(ax=ax[1][n])
        ax[0][n].legend(METADATA.index.tolist())
        ax[0][n].set_title('%s %s MBE' % (name, cs.upper()))
        ax[0][n].set_ylabel('average relative difference (arb. units)')
        ax[1][n].legend(METADATA.index.tolist())
        ax[1][n].set_title('%s %s RMSE' % (name, cs.upper()))
        ax[1][n].set_ylabel('relative RMSE (arb. units)')
    f.tight_layout()
    plt.savefig('%s_annual_mbe-rms.png' % model)


Seasonal Slice

Is there a seasonal bias in the station errors?


In [9]:
# plot monthly average error, combine all stations on one chart for each model and clear sky component
for model, name in MODELS.items():
    f, ax = plt.subplots(2, 3, figsize=(24, 12), sharex=False)
    for n, cs in enumerate(CS):
        for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
            monthly_avg = station_atm_params_3min_clear[cs][is_bright[station_id]].groupby(lambda x: x.month).mean()
            ((
                (station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            ).groupby(lambda x: x.month).mean() / monthly_avg).plot(ax=ax[0][n])
            (np.sqrt((
                ((station_atm_params_3min_clear['%s_%s' % (model, cs)]
                 - station_atm_params_3min_clear[cs])[is_bright[station_id]])**2
                ).groupby(lambda x: x.month).mean()) / monthly_avg).plot(ax=ax[1][n])
        ax[0][n].legend(METADATA.index.tolist())
        ax[0][n].set_title('%s %s MBE by month' % (name, cs.upper()))
        ax[0][n].set_ylabel('average relative difference (arb. units)')
        ax[1][n].legend(METADATA.index.tolist())
        ax[1][n].set_title('%s %s RMSE by month' % (name, cs.upper()))
        ax[1][n].set_ylabel('relative RMSE (arb. units)')
    f.tight_layout()
    # one figure per model holds all three components, so name it by model only
    f.savefig('%s_mbe-rmse_by_month.png' % model)


Boxplot figures

Boxplot figures can combine the MBE and RMSE into a single plot. They can be used to determine whether categories are statistically different.


In [58]:
# CONSTANTS
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# an empty data frame for monthly atmospheric parameters
MONTHLY_ATM_PARAMS_3MIN_CLEAR = None
# loop over stations
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    # filter for bright days
    data = station_atm_params_3min_clear[is_bright[station_id]]
    # insert errors first
    for n, model in MODELS.items():
        for m, cs in enumerate(CS):
            data.insert(
                43 + m + 3*n,
                '%s-%s_err' % (model, cs.upper()),
                data['%s_%s' % (model, cs)] - data[cs]
            )
    # average by month
    avg_data = data.resample('M').mean()
    # divide by monthly average
    for n, cs in enumerate(CS):
        for m, model in MODELS.items():
            avg_data.insert(
                54 + m + 4*n,
                '%s-%s_rel' % (model, cs.upper()),
                avg_data['%s-%s_err' % (model, cs.upper())] / avg_data[cs]
            )
    # insert station id
    avg_data.insert(0, 'station', station_id)
    # append together
    if MONTHLY_ATM_PARAMS_3MIN_CLEAR is None:
        MONTHLY_ATM_PARAMS_3MIN_CLEAR = pd.DataFrame(avg_data)
    else:
        MONTHLY_ATM_PARAMS_3MIN_CLEAR = pd.concat([MONTHLY_ATM_PARAMS_3MIN_CLEAR, avg_data])
MONTHLY_ATM_PARAMS_3MIN_CLEAR[['station', 'year', 'month', 'ghi', 'solis-GHI_err', 'solis-GHI_rel']]


Out[58]:
timestamps station year month ghi solis-GHI_err solis-GHI_rel
2003-06-30 sxf 2003 6 647.642149 -24.392103 -0.037663
2003-07-31 sxf 2003 7 649.790835 -24.479409 -0.037673
2003-08-31 sxf 2003 8 575.541791 -34.904010 -0.060645
2003-09-30 sxf 2003 9 586.377628 -27.931720 -0.047634
2003-10-31 sxf 2003 10 487.084688 -14.314451 -0.029388
2003-11-30 sxf 2003 11 384.517503 0.943385 0.002453
2003-12-31 sxf 2003 12 324.327881 5.724915 0.017652
2004-01-31 sxf 2004 1 346.236174 15.766718 0.045537
2004-02-29 sxf 2004 2 482.545119 -26.883906 -0.055713
2004-03-31 sxf 2004 3 579.837466 -20.905775 -0.036055
2004-04-30 sxf 2004 4 616.583061 -25.933384 -0.042060
2004-05-31 sxf 2004 5 649.233105 -44.186642 -0.068060
2004-06-30 sxf 2004 6 651.489323 -40.151664 -0.061631
2004-07-31 sxf 2004 7 654.002882 -46.645625 -0.071323
2004-08-31 sxf 2004 8 612.056619 -48.416369 -0.079104
2004-09-30 sxf 2004 9 595.230575 -26.060970 -0.043783
2004-10-31 sxf 2004 10 505.297445 -9.940252 -0.019672
2004-11-30 sxf 2004 11 390.545449 -0.350652 -0.000898
2004-12-31 sxf 2004 12 332.406391 12.810522 0.038539
2005-01-31 sxf 2005 1 379.367492 -9.473715 -0.024972
2005-02-28 sxf 2005 2 458.023169 -18.481479 -0.040351
2005-03-31 sxf 2005 3 570.572960 -29.145401 -0.051081
2005-04-30 sxf 2005 4 627.459395 -26.479252 -0.042201
2005-05-31 sxf 2005 5 661.615360 -39.800386 -0.060156
2005-06-30 sxf 2005 6 689.522835 -48.274088 -0.070011
2005-07-31 sxf 2005 7 625.527226 -52.889161 -0.084551
2005-08-31 sxf 2005 8 626.928762 -33.158349 -0.052890
2005-09-30 sxf 2005 9 577.342416 -33.763143 -0.058480
2005-10-31 sxf 2005 10 466.221416 -8.996574 -0.019297
2005-11-30 sxf 2005 11 379.441712 -0.540677 -0.001425
... ... ... ... ... ... ...
2010-07-31 psu 2010 7 654.742731 -27.142434 -0.041455
2010-08-31 psu 2010 8 624.978500 -26.492675 -0.042390
2010-09-30 psu 2010 9 610.027522 -20.292071 -0.033264
2010-10-31 psu 2010 10 518.006868 -19.849509 -0.038319
2010-11-30 psu 2010 11 437.616078 -4.196045 -0.009588
2010-12-31 psu 2010 12 372.658454 6.684387 0.017937
2011-01-31 psu 2011 1 403.407287 -21.399369 -0.053047
2011-02-28 psu 2011 2 525.756719 -24.771834 -0.047117
2011-03-31 psu 2011 3 603.370528 -24.283604 -0.040247
2011-04-30 psu 2011 4 687.189510 -43.967518 -0.063982
2011-05-31 psu 2011 5 634.299562 -39.050311 -0.061564
2011-06-30 psu 2011 6 654.399643 -21.598156 -0.033005
2011-07-31 psu 2011 7 655.026927 -21.475971 -0.032786
2011-08-31 psu 2011 8 623.910085 -24.811979 -0.039769
2011-09-30 psu 2011 9 561.232385 -44.797081 -0.079819
2011-10-31 psu 2011 10 527.480068 -11.835994 -0.022439
2011-11-30 psu 2011 11 440.235709 -3.584816 -0.008143
2011-12-31 psu 2011 12 371.426891 -10.641463 -0.028650
2012-01-31 psu 2012 1 401.581893 -10.671282 -0.026573
2012-02-29 psu 2012 2 495.061224 -11.659225 -0.023551
2012-03-31 psu 2012 3 605.770495 -24.212891 -0.039970
2012-04-30 psu 2012 4 676.389847 -19.718676 -0.029153
2012-05-31 psu 2012 5 651.496331 -21.949765 -0.033691
2012-06-30 psu 2012 6 612.250085 -23.171483 -0.037846
2012-07-31 psu 2012 7 627.384781 -35.329511 -0.056312
2012-08-31 psu 2012 8 638.491022 -16.976400 -0.026588
2012-09-30 psu 2012 9 559.978088 -5.757441 -0.010282
2012-10-31 psu 2012 10 502.965362 -13.630498 -0.027100
2012-11-30 psu 2012 11 406.199442 0.485734 0.001196
2012-12-31 psu 2012 12 365.389686 5.656716 0.015481

835 rows × 6 columns


In [60]:
# plot
f, ax = plt.subplots(1, 2, figsize=(16, 6), sharex=False, sharey=False)  # 24 x 6 for 3 subplots
for n, cs in enumerate(CS):
    if cs == 'dhi': continue
    if n == 2: n = 1
    data = pd.melt(
        MONTHLY_ATM_PARAMS_3MIN_CLEAR,
        id_vars=['station', 'year', 'month'],
        value_vars=['%s-%s_rel' % (model, cs.upper()) for model in MODELS.values()],
        var_name='model-cs',
        value_name='error'
    )
    meanlineprops = dict(linestyle='--', linewidth=1.5, color='red')
    sns.boxplot(x='model-cs', y='error', hue='year', data=data, ax=ax[n],
                whis=[5,95], showmeans=True, meanline=True, meanprops=meanlineprops)
    ax[n].set_title('Interannual variability in %s by model' % cs.upper())
    ax[n].set_ylabel('average monthly relative error')
    ax[n].set_xlabel('model and clear sky component')
    if n == 1:
        ax[n].legend(bbox_to_anchor=(-0.1, -0.25), ncol=10, loc='lower center')  # x = 0.5 for all 3 components
    else:
        legend = ax[n].legend()
        legend.set_visible(False)
        legend.remove()
f.tight_layout()
f.subplots_adjust(bottom=0.2)
f.savefig('y2y-boxplot_by_model-cs_CORRECTED.png')



In [61]:
# plot
f, ax = plt.subplots(1, 2, figsize=(16, 6), sharex=False, sharey=False)  # 24 x 6 for 3 subplots
for n, cs in enumerate(CS):
    if cs == 'dhi': continue
    if n == 2: n = 1
    data = pd.melt(
        MONTHLY_ATM_PARAMS_3MIN_CLEAR,
        id_vars=['station', 'year', 'month'],
        value_vars=['%s-%s_rel' % (model, cs.upper()) for model in MODELS.values()],
        var_name='model-cs',
        value_name='error'
    )
    meanlineprops = dict(linestyle='--', linewidth=1.5, color='red')
    sns.boxplot(x='model-cs', y='error', hue='month', data=data, ax=ax[n],
                whis=[5,95], showmeans=True, meanline=True, meanprops=meanlineprops)
    ax[n].set_title('Seasonal variation in %s by model' % cs.upper())
    ax[n].set_ylabel('average monthly relative error')
    ax[n].set_xlabel('model and clear sky component')
    if n == 1:
        ax[n].legend(bbox_to_anchor=(-0.1, -0.25), ncol=12, loc='lower center')  # x = 0.5 for all 3 components
    else:
        legend = ax[n].legend()
        legend.set_visible(False)
        legend.remove()
f.tight_layout()
f.subplots_adjust(bottom=0.20)
f.savefig('seasonal-boxplot_by_model-cs_CORRECTED.png')



In [62]:
# plot
f, ax = plt.subplots(1, 2, figsize=(16, 6), sharex=False, sharey=False)  # 24 x 6 for 3 subplots
for n, cs in enumerate(CS):
    if cs == 'dhi': continue
    if n == 2: n = 1
    data = pd.melt(
        MONTHLY_ATM_PARAMS_3MIN_CLEAR,
        id_vars=['station', 'year', 'month'],
        value_vars=['%s-%s_rel' % (model, cs.upper()) for model in MODELS.values()],
        var_name='model-cs',
        value_name='error'
    )
    meanlineprops = dict(linestyle='--', linewidth=1.5, color='red')
    sns.boxplot(x='model-cs', y='error', hue='station', data=data, ax=ax[n],
                whis=[5,95], showmeans=True, meanline=True, meanprops=meanlineprops)
    ax[n].set_title('Regional variation in %s by model' % cs.upper())
    ax[n].set_ylabel('average monthly relative error')
    ax[n].set_xlabel('model and clear sky component')
    if n == 1:
        ax[n].legend(bbox_to_anchor=(-0.1, -0.25), ncol=12, loc='lower center')  # x = 0.5 for all 3 components
    else:
        legend = ax[n].legend()
        legend.set_visible(False)
        legend.remove()
f.tight_layout()
f.subplots_adjust(bottom=0.20)
f.savefig('boxplot_by_model-cs_and_station_CORRECTED.png')



In [17]:
# plot all
STATION_ORDER = dict([(station_id, n) for n, station_id in enumerate(METADATA.index)])
f, ax = plt.subplots(7, 3, figsize=(24, 42), sharex=False)
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    for n, cs in enumerate(CS):
        data = pd.concat([
            (station_atm_params_3min_clear['%s_%s' % (model, cs)] - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            for model in MODELS.values()
        ], axis=1).resample('M').mean()
        data.rename(columns=MODELS, inplace=True)
        sns.boxplot(data=data, ax=ax[STATION_ORDER[station_id]][n])
        sns.swarmplot(data=data, color=".25", ax=ax[STATION_ORDER[station_id]][n])
        ax[STATION_ORDER[station_id]][n].set_title('%s by model at %s' % (cs.upper(), METADATA['station name'][station_id]))
f.tight_layout()
f.savefig('all_boxplot.png')



In [18]:
f, ax = plt.subplots(7, 3, figsize=(24, 42), sharex=False)
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    for n, cs in enumerate(CS):
        data = pd.concat([
            (station_atm_params_3min_clear['%s_%s' % (model, cs)]
             - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            for model in MODELS.values()
        ] + [station_atm_params_3min_clear['year']], axis=1).resample('M').mean()
        data.rename(columns=MODELS, inplace=True)
        data = pd.melt(
            data, id_vars=['year'], value_vars=MODELS.values(),
            var_name='model', value_name='mbe'
        )
        sns.boxplot(x='model', y='mbe', data=data, hue='year', ax=ax[STATION_ORDER[station_id]][n])
        #ax[STATION_ORDER[station_id]][n].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
        ax[STATION_ORDER[station_id]][n].set_title('%s by model and year at %s' % (cs.upper(), METADATA['station name'][station_id]))
f.tight_layout()
f.savefig('all_boxplots_by_year.png')



In [19]:
f, ax = plt.subplots(7, 3, figsize=(24, 42), sharex=False)
for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items():
    for n, cs in enumerate(CS):
        data = pd.concat([
            (station_atm_params_3min_clear['%s_%s' % (model, cs)] - station_atm_params_3min_clear[cs])[is_bright[station_id]]
            for model in MODELS.values()
        ] + [station_atm_params_3min_clear['month']], axis=1).resample('M').mean()
        data.rename(columns=MODELS, inplace=True)
        data = pd.melt(
            data, id_vars=['month'], value_vars=MODELS.values(),
            var_name='model', value_name='mbe'
        )
        sns.boxplot(x='model', y='mbe', data=data, hue='month', ax=ax[STATION_ORDER[station_id]][n])
        #ax[STATION_ORDER[station_id]][n].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
        ax[STATION_ORDER[station_id]][n].set_title('%s by model and month at %s' % (cs.upper(), METADATA['station name'][station_id]))
f.tight_layout()
f.savefig('all_boxplot_by_months.png')



In [51]:
# plot all SURFRAD stations DHI by model
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# calculate station average DHI
avg_dhi = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['dhi'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
# NOTE: averaging columns using `mean(axis=0)` reduces the dataframe to a series indexed by station
# calculate DHI errors for each station, concatenate stations horizontally (axis=1) and average each station
# calculate relative station DHI error by dividing average error by average station DHI
# then concatenate all models
data = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_dhi' % model] - station_atm_params_3min_clear['dhi'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_dhi for model in MODELS.values()
], axis=1)
data.rename(columns=MODELS, inplace=True)
meanlineprops = dict(linestyle='--', linewidth=1.5, color='black')
sns.boxplot(data=data, whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
sns.swarmplot(data=data, color=".25")
plt.title('All SURFRAD stations DHI by model')
plt.ylabel('average relative error by station')
plt.savefig('all_stations_all_years_by_model_DHI.png')
data


Out[51]:
station solis lt macc bird
sxf 0.042742 0.028913 0.077326 0.320740
gwn -0.031365 0.056529 0.073070 0.249971
fpk 0.067008 -0.083157 0.090016 0.360026
tbl 0.086766 0.423988 0.293303 0.382935
bon -0.074281 0.015100 -0.004631 0.192466
dra 0.207783 0.112152 0.301512 0.557257
psu -0.071437 -0.037308 -0.034717 0.200173

In [52]:
# plot all SURFRAD stations DNI by model
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# calculate station average DNI
avg_dni = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['dni'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
# NOTE: averaging columns using `mean(axis=0)` reduces the dataframe to a series indexed by station
# calculate DNI errors for each station, concatenate stations horizontally (axis=1) and average each station
# calculate relative station DNI error by dividing average error by average station DNI
# then concatenate all models
data = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_dni' % model] - station_atm_params_3min_clear['dni'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_dni for model in MODELS.values()
], axis=1)
data.rename(columns=MODELS, inplace=True)
meanlineprops = dict(linestyle='--', linewidth=1.5, color='black')
sns.boxplot(data=data, whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
sns.swarmplot(data=data, color=".25")
plt.title('All SURFRAD stations DNI by model')
plt.ylabel('average relative error by station')
plt.savefig('all_stations_all_years_by_model_DNI.png')
data


Out[52]:
station solis lt macc bird
sxf -0.066710 -0.054675 -0.072437 -0.104244
gwn -0.024883 -0.051284 -0.052644 -0.069595
fpk -0.059665 0.006956 -0.056658 -0.102769
tbl -0.069535 -0.075223 -0.029994 -0.120169
bon -0.027777 -0.060414 -0.049281 -0.075036
dra -0.093453 -0.035037 -0.089773 -0.146804
psu -0.009624 -0.025569 -0.025173 -0.064427

In [50]:
# plot all SURFRAD stations GHI by model
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# calculate station average GHI
avg_ghi = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['ghi'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
# NOTE: averaging columns using `mean(axis=0)` transposes the dataframe so that stations are now the index
# sxf    560.271702
# gwn    591.976444
# fpk    543.374845
# tbl    608.619889
# bon    584.017623
# dra    654.731447
# psu    586.754301
# calculate GHI errors for each station and concatenate stations horizontally (axis=1) and average each station
# NOTE: averaging columns using `mean(axis=0)` transposes the dataframe so that stations are now the index
# sxf   -10.020557
# gwn    -2.605967
# fpk    -7.293992
# tbl   -24.014558
# bon    -8.630023
# dra   -29.584633
# psu    -6.126202
# calculate relative station GHI error by dividing average error by average station GHI
# then concatenate all models
data = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_ghi' % model] - station_atm_params_3min_clear['ghi'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_ghi for model in MODELS.values()
], axis=1)
data.rename(columns=MODELS, inplace=True)
meanlineprops = dict(linestyle='--', linewidth=1.5, color='black')
sns.boxplot(data=data, whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
sns.swarmplot(data=data, color=".25")
plt.title('All SURFRAD stations GHI by model')
plt.ylabel('average relative error by station')
plt.savefig('all_stations_all_years_by_model_GHI.png')
data


Out[50]:
station solis lt macc bird
sxf -0.050019 -0.027960 -0.040513 -0.017885
gwn -0.032766 -0.022025 -0.027301 -0.004402
fpk -0.039539 0.006957 -0.026902 -0.013423
tbl -0.045974 0.007681 0.017695 -0.039457
bon -0.039753 -0.039505 -0.036385 -0.014777
dra -0.052828 -0.013125 -0.035005 -0.045186
psu -0.027228 -0.025283 -0.026643 -0.010441

In [53]:
MODELS = {0: 'solis', 1: 'lt', 2: 'macc', 3: 'bird'}
# DNI
avg_dni = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['dni'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
dni = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_dni' % model] - station_atm_params_3min_clear['dni'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_dni for model in MODELS.values()
], axis=1)
dni.rename(columns=MODELS, inplace=True)
dni.insert(0, 'cs', ['dni']*len(dni))
# DHI
avg_dhi = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['dhi'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
dhi = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_dhi' % model] - station_atm_params_3min_clear['dhi'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_dhi for model in MODELS.values()
], axis=1)
dhi.rename(columns=MODELS, inplace=True)
dhi.insert(0, 'cs', ['dhi']*len(dhi))
# GHI
avg_ghi = pd.concat([
    pd.Series(data=station_atm_params_3min_clear['ghi'][is_bright[station_id]], name=station_id)
    for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
], axis=1).mean(axis=0)
ghi = pd.concat([
    pd.concat([
        pd.Series(
            data=(station_atm_params_3min_clear['%s_ghi' % model] - station_atm_params_3min_clear['ghi'])[is_bright[station_id]],
            name=station_id
        ) for station_id, station_atm_params_3min_clear in atm_params_3min_clear.items()
    ], axis=1).mean(axis=0) / avg_ghi for model in MODELS.values()
], axis=1)
ghi.rename(columns=MODELS, inplace=True)
ghi.insert(0, 'cs', ['ghi']*len(ghi))
# melt
data = pd.concat([dni, dhi, ghi])
data = pd.melt(
    data, id_vars=['cs'], value_vars=MODELS.values(),
    var_name='model', value_name='mbe'
)
meanlineprops = dict(linestyle='--', linewidth=1.5, color='black')
sns.boxplot(x='model', y='mbe', hue='cs', data=data, whis=[5, 95], showmeans=True, meanline=True, meanprops=meanlineprops)
plt.title('All SURFRAD stations by model')
plt.savefig('all_stations_all_years_by_model_and_cs.png')


Use of Measured Aerosol Optical Depth and Precipitable Water to Model Clear Sky Irradiance by Mark A. Mikofski, Clifford W. Hansen, William F. Holmgren and Gregory M. Kimball is licensed under a Creative Commons Attribution 4.0 International License.