Model calibration compares a (data) model revision against a benchmark (expert) model. It is a validation process that provides external quality assurance. The key question asked during calibration is: "Does the proposed new model meet stakeholder needs?"
The calibration process poses three questions that determine the extent to which a proposed model is fit for purpose. The analyses that follow, covering the uncertainty index, model similarity and maximum demand spread, address these in turn.
In [1]:
import features.feature_ts as ts
import evaluation.calibration as ec
import evaluation.evalplot as ep
import benchmark.bm0 as bm0
import pandas as pd
import numpy as np
import plotly.offline as offline
import plotly.graph_objs as go
import plotly as py
offline.init_notebook_mode(connected=True) #set for plotly offline plotting
In [2]:
#Select data and model
year = 2011
experiment_dir = 'exp1_BNve_1' # sub-directory in which inferred customer classes are saved
In [5]:
#Load the demand summary and hourly profiles for the selected experiment and year
ods = pd.read_csv(f'data/experimental_model/{experiment_dir}/demand_summary_{year}.csv')
ohp = pd.read_csv(f'data/experimental_model/{experiment_dir}/hourly_profiles_{year}.csv')
ohp.head()
Out[5]:
The purpose of an uncertainty index is to assess data integrity. The key question it answers is whether the sample size is sufficient to draw a conclusion about a characteristic feature of the model. In this system the index is derived by selecting a valid submodel based on two minimum thresholds: an AnswerID count and a valid observation ratio.
The uncertainty index is the ratio of rows in the valid submodel to the total rows in the submodel input. It is calculated as follows:
valid_submodel = submodel_input[AnswerID_count >= min_answerid and valid_obs_ratio >= min_obsratio]
uix = rows(valid_submodel) / rows(submodel_input)
Moreover, for a model to be valid it must share the same baseline as the benchmark model (e.g. same year, same region).
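The calculation can be sketched in a few lines of pandas. This is an illustrative assumption rather than the actual implementation in evaluation.calibration; the uncertainty_index helper and the toy DataFrame below are hypothetical:
import pandas as pd

def uncertainty_index(submodel_input, min_answerid, min_obsratio):
    #Keep only rows that meet both minimum thresholds
    valid_submodel = submodel_input[(submodel_input['AnswerID_count'] >= min_answerid) &
                                    (submodel_input['valid_obs_ratio'] >= min_obsratio)]
    #Ratio of valid rows to total rows
    return len(valid_submodel) / len(submodel_input)

#Toy example: 2 of 4 rows pass both thresholds, so uix = 0.5
toy = pd.DataFrame({'AnswerID_count': [10, 3, 8, 2],
                    'valid_obs_ratio': [0.90, 0.95, 0.88, 0.50]})
uncertainty_index(toy, min_answerid=4, min_obsratio=0.85)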
In [3]:
min_answerid = 4    #minimum AnswerID count required for a row to be valid
min_obsratio = 0.85 #minimum ratio of valid observations required for a row to be valid
In [6]:
ohp.loc[ohp.YearsElectrified==6].describe()
Out[6]:
In [7]:
#Uncertainty statistics for the demand summary, indexed by customer class
stats = ec.uncertaintyStats(ods)
stats.loc['informal_settlement']
Out[7]:
In [8]:
ep.plotAnswerIDCount(ohp)
In [9]:
ep.plotValidObsRatio(ohp, 'Weekday')
ep.plotValidObsRatio(ohp, 'Saturday')
ep.plotValidObsRatio(ohp, 'Sunday')
In [36]:
ods.name = 'demand_summary'
ohp.name = 'hourly_profiles'
#Identify submodels that meet the minimum data integrity requirements
validmodels = ec.dataIntegrity([ods, ohp], min_answerid, min_obsratio)
validmodels
Out[36]:
In [17]:
customer_class = 'informal_settlement'
daytype = 'Weekday'
years_electrified = 8
In [100]:
ep.plotHourlyProfiles(customer_class, 'expert', daytype, years_electrified)
In [101]:
ep.plotHourlyProfiles(customer_class, 'data', daytype, years_electrified,
                      model_dir=experiment_dir, data=ohp)
In [37]:
#Get new valid submodels
valid_new_ds = validmodels.at['demand_summary','valid_data']
valid_new_hp = validmodels.at['hourly_profiles','valid_data']
new_dsts = 'M_kw_mean'
new_hpts = 'kw_mean'
#Get expert model
ex_ds, ex_hp, ex_dsts, ex_hpts = bm0.benchmarkModel()
#Calculate model similarity
euclid_ds, count_ds, merged_ds = ec.modelSimilarity(ex_ds, ex_dsts, valid_new_ds, new_dsts, 'ds')
euclid_hp, count_hp, merged_hp = ec.modelSimilarity(ex_hp, ex_hpts, valid_new_hp, new_hpts, 'hp')
#Visualise model similarity
ep.plotProfileSimilarity(merged_hp, customer_class, daytype)
ep.plotDemandSimilarity(merged_ds)
ep.multiplotDemandSimilarity(merged_ds)
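For intuition, the similarity computation can be thought of as an inner merge of the expert and data models on their shared identifying columns, followed by a Euclidean distance between the two value columns over the merged rows. A minimal sketch is given below; the similarity_sketch helper and the key columns are assumptions, and the actual ec.modelSimilarity implementation may differ:
import numpy as np

def similarity_sketch(expert_df, expert_col, new_df, new_col, keys):
    #Keep only rows present in both models (shared baseline)
    merged = expert_df.merge(new_df, on=keys, how='inner')
    #Euclidean distance between expert and new model values
    euclid = np.linalg.norm(merged[expert_col] - merged[new_col])
    return euclid, len(merged), merged

#e.g. similarity_sketch(ex_ds, ex_dsts, valid_new_ds, new_dsts,
#                       keys=['class', 'YearsElectrified']) #assumed key columns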
In [34]:
valid_new_hp.loc[(valid_new_hp['class'] == 'informal_settlement')
                 & (valid_new_hp.YearsElectrified == 15)].describe()
Out[34]:
In [72]:
ep.plotProfileSimilarity(merged_hp, 'informal_settlement', 'Weekday')
In [17]:
pd.DataFrame(data=[[euclid_ds, count_ds], [euclid_hp, count_hp]],
             index=['demand_summary', 'hourly_profiles'],
             columns=['euclidean distance', 'data point count'])
Out[17]:
In [10]:
ep.plotMaxDemandSpread(md) #md holds the maximum demand data (defined outside this excerpt)