In [409]:
import os, glob
import pandas as pd
import numpy as np
from support import data_dir
fileimport = glob.glob(os.path.join(data_dir, 'BN','*.txt'))
data = {}
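# Use each file's base name (without extension) as its key in `data`; os.path keeps this portable across OS path separators.
names = [os.path.splitext(os.path.basename(f))[0] for f in fileimport]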
for n, f in zip(names, fileimport):
    print(n)
    data[n] = pd.read_table(f, index_col=0)
The probability that an electrified household belongs to a certain customer class is calculated by multiplying the probability that a household in an LSM category belongs to that class by the probability that an electrified household falls in that LSM category, and summing the products over all LSM categories. This can be represented as the formula:
P(class|electrified household) = sum over LSM of [ P(class|LSM) x P(LSM|electrified household) ]
The AMPS2013bErvenElectrificationOffset table has been obtained from the Domestic Load Research Process Review 2015 and is derived from data in the AMPS 2013b Living Standard Measure survey.
The Electrification Offset quantifies the likelihood that a household in an LSM category has been electrified. It ranges from 0 to 1, where 0 means no household in the category is electrified and 1 means every household is electrified.
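As a minimal sketch of the calculation described above, the following uses made-up numbers for three hypothetical LSM categories and two hypothetical customer classes (`class_a`, `class_b`); none of these values come from the AMPS data. It weights the dwelling counts by the electrification offset, normalises the result to get P(LSM|electrified household), and then sums P(class|LSM) x P(LSM|electrified household) over the LSM categories.

```python
import pandas as pd

# Hypothetical inputs: dwelling counts and electrification offsets per LSM category.
lsm = pd.DataFrame(
    {"EstErven": [1000, 2000, 1000], "ElectrificationOffset": [0.5, 0.75, 1.0]},
    index=[1, 2, 3],
)

# Electrified dwellings per LSM category: 500, 1500, 1000.
lsm["ElectrifiedDwellings"] = lsm["EstErven"] * lsm["ElectrificationOffset"]

# P(LSM|electrified household): share of all electrified dwellings in each category.
p_lsm_given_elec = lsm["ElectrifiedDwellings"] / lsm["ElectrifiedDwellings"].sum()

# Hypothetical P(class|LSM): rows are classes, columns are LSM categories,
# and each column sums to 1.
p_class_given_lsm = pd.DataFrame(
    [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]],
    index=["class_a", "class_b"],
    columns=[1, 2, 3],
)

# Sum over LSM categories; the column labels must match the Series index.
p_class_given_elec = p_class_given_lsm.dot(p_lsm_given_elec)
print(p_class_given_elec)  # class_a = 5/12, class_b = 7/12
```

Because each column of the hypothetical P(class|LSM) table sums to 1, the resulting class probabilities also sum to 1, which is a quick sanity check worth running on the real tables.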
In [410]:
tbl1 = data['AMPS2013bErvenElectrificationOffset']
tbl1['ElectrifiedDwellings'] = tbl1['EstErven']*tbl1['ElectrificationOffset']
tbl1['P_LSM|electrified'] = tbl1['ElectrifiedDwellings']/sum(tbl1['ElectrifiedDwellings'])
tbl1
Out[410]:
In [425]:
tbl2 = data['LSMmakeupAssumptions']
t2 = tbl2.iloc[:, 0:10]
t2.columns = range(1,11)
tbl2['P_class|electrified'] = t2.dot(tbl1['P_LSM|electrified']).values
tbl2
Out[425]:
The probability that a household in a certain class has a specific income is calculated by multiplying the probability that a household in an LSM category has that income by the probability that a household in the class falls in that LSM category, and summing the products over all LSM categories. This can be represented as the formula:
P(income|class) = sum over LSM of [ P(income|LSM) x P(LSM|class) ]
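The sum over LSM categories is a matrix product, which the following minimal sketch illustrates with made-up probabilities. The income bins (`low`, `high`), LSM labels (`lsm1`, `lsm2`) and class names are hypothetical and chosen only to keep the arithmetic easy to follow. Note that the formula implicitly assumes that, within an LSM category, income does not depend on class.

```python
import pandas as pd

# Hypothetical P(income|LSM): rows are income bins, columns are LSM categories,
# and each column sums to 1.
p_income_given_lsm = pd.DataFrame(
    [[0.75, 0.25], [0.25, 0.75]],
    index=["low", "high"],
    columns=["lsm1", "lsm2"],
)

# Hypothetical P(LSM|class): rows are classes, columns are LSM categories,
# and each row sums to 1.
p_lsm_given_class = pd.DataFrame(
    [[1.0, 0.0], [0.5, 0.5]],
    index=["class_a", "class_b"],
    columns=["lsm1", "lsm2"],
)

# Transpose P(income|LSM) so that LSM categories line up with the columns of
# P(LSM|class); the result has classes as rows and income bins as columns.
p_income_given_class = p_lsm_given_class.dot(p_income_given_lsm.T)
print(p_income_given_class)
```

Each row of the result is a probability distribution over income bins for one class, so every row should sum to 1.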
In [412]:
data['HHtoIncomeByLSM'].head()
Out[412]:
In [413]:
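# Per-LSM values by income bracket; the index is treated as each bracket's lower income bound.
tbl3 = data['HHtoIncomeByLSM'].iloc[:,1:9]
# Scale factor per bracket: 100 / bracket width, which spreads each bracket's value evenly over 100-unit income steps.
# The last bracket is taken to run from 50000 to 240700.
count = [100/(tbl3.index[i+1]-tbl3.index[i]) for i in range(0, len(tbl3)-1)]+[100/(240700 - 50000)]
t3 = tbl3.multiply(count, axis = 0)
# Expand onto a regular 100-unit income grid (forward-filling within each bracket), then sum into the target income bins.
ix = np.arange(0, 240800, 100)
bins = [0, 1800, 3200, 7800, 11600, 19116, 24500, 65500, 240700]
tbl3x = t3.reindex(ix, method = 'ffill')
tbl3_binned = tbl3x.groupby(pd.cut(tbl3x.index,bins)).sum()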
tbl3_binned
Out[413]:
In [414]:
tbl3_totals = tbl3_binned.sum(axis=0)
Pincome_lsm = tbl3_binned/tbl3_totals
Pincome_lsm
Out[414]:
In [415]:
tbl4 = data['LSMmakeupHighDetail']
Plsm_class = tbl4.divide(tbl4.sum(axis=1), axis=0)
Plsm_class
Out[415]:
In [416]:
Pincome_class = Plsm_class.dot(Pincome_lsm.T)
Pincome_class
Out[416]: