Construction of Conditional Probability Tables for the BN Model

This notebook constructs the conditional probability tables (CPTs) for the Bayesian Network (BN) class inference model.


In [409]:
import os, glob
import pandas as pd
import numpy as np
from support import data_dir

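# Load every tab-separated table in <data_dir>/BN into a dict keyed by file name (without extension)
fileimport = glob.glob(os.path.join(data_dir, 'BN', '*.txt'))
data = {}
# os.path handles both '/' and '\\' separators, unlike splitting on '/'
names = [os.path.splitext(os.path.basename(f))[0] for f in fileimport]
for n, f in zip(names, fileimport):
    print(n)
    data[n] = pd.read_table(f, index_col=0)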


AMPS2013bErvenElectrificationOffset
LSMmakeupHighDetail
LSMmakeupAssumptions
HHtoIncomeByLSM

Customer class marginal probability distribution

The probability that an electrified customer belongs to a given customer class is calculated by multiplying the probability that a household in an LSM category belongs to that class by the probability that an electrified household falls in that LSM category, and summing over all LSM categories. This assumes that, within an LSM category, class membership does not depend on electrification status. As a formula:

P(class|electrified household) = sum over LSM of P(class|LSM) x P(LSM|electrified household)

P ( LSM | electrified household )

The AMPS2013bErvenElectrificationOffset table has been obtained from the Domestic Load Research Process Review 2015 and is derived from data in the AMPS 2013b Living Standard Measure survey.

The Electrification Offset quantifies the likelihood that a household in an LSM category has been electrified. It takes values between 0 and 1, where 0 means no household is electrified and 1 means every household is electrified.


In [410]:
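tbl1 = data['AMPS2013bErvenElectrificationOffset']
# Electrified dwellings per LSM = estimated erven x electrification offset
tbl1['ElectrifiedDwellings'] = tbl1['EstErven']*tbl1['ElectrificationOffset']
# Normalise to get each LSM's share of all electrified dwellings
tbl1['P_LSM|electrified'] = tbl1['ElectrifiedDwellings']/tbl1['ElectrifiedDwellings'].sum()
tbl1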


Out[410]:
EstErven ElectrificationOffset ElectrifiedDwellings P_LSM|electrified
LSM
1 92741 0.30 27822.30 0.004554
2 195786 0.42 82230.12 0.013460
3 374441 0.73 273341.93 0.044744
4 1106303 0.93 1028861.79 0.168417
5 821354 0.98 804926.92 0.131761
6 765524 0.99 757868.76 0.124058
7 1361603 1.00 1361603.00 0.222884
8 798807 1.00 798807.00 0.130759
9 673994 1.00 673994.00 0.110328
10 299553 1.00 299553.00 0.049035
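
As a quick check of the table above, the following sketch recomputes the P_LSM|electrified column in plain Python from the EstErven and ElectrificationOffset values shown in Out[410]. The variable names are illustrative and not part of the notebook.

```python
# EstErven and ElectrificationOffset per LSM 1..10, copied from Out[410]
est_erven = [92741, 195786, 374441, 1106303, 821354,
             765524, 1361603, 798807, 673994, 299553]
offset = [0.30, 0.42, 0.73, 0.93, 0.98, 0.99, 1.00, 1.00, 1.00, 1.00]

# Electrified dwellings per LSM, then each LSM's share of the total
electrified = [n * o for n, o in zip(est_erven, offset)]
total = sum(electrified)
p_lsm_electrified = [e / total for e in electrified]

for lsm, p in enumerate(p_lsm_electrified, start=1):
    print(lsm, round(p, 6))
```

The shares sum to 1 by construction, which is what allows the column to be used as a probability distribution over LSM categories in the next step.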

P ( class | electrified household )

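The distribution of LSM categories over the DLR customer classes, P(class | LSM), has been approximated from customer class definitions in the Geo-based Load Forecast Appendix A. Multiplying it by P(LSM | electrified household) from the previous step and summing over the LSM categories gives P(class | electrified household).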


In [425]:
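tbl2 = data['LSMmakeupAssumptions']
# First 10 columns hold P(class | LSM); relabel them 1..10 to align with tbl1's LSM index
t2 = tbl2.iloc[:, 0:10].copy()
t2.columns = range(1,11)
# Sum over LSM of P(class | LSM) x P(LSM | electrified)
tbl2['P_class|electrified'] = t2.dot(tbl1['P_LSM|electrified']).values
tbl2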


Out[425]:
LSM 1 LSM 2 LSM 3 LSM 4 LSM 5 LSM 6 LSM 7 LSM 8 LSM 9 LSM 10 Assumption P_class|electrified
class
rural 0.6 0.4 0 0 0 0 0.0 0.0 0.0 0.0 assume 40% of LSM 1&2 living in rural scattered 0.008117
village 0.4 0.6 0 0 0 0 0.0 0.0 0.0 0.0 assume 60% of LSM 1&2 living in rural scattered 0.009898
informal settlement 0.0 0.0 1 1 0 0 0.0 0.0 0.0 0.0 NaN 0.213161
township 0.0 0.0 0 0 1 1 0.0 0.0 0.0 0.0 NaN 0.255818
urban residential 7 0.0 0.0 0 0 0 0 0.6 0.0 0.0 0.0 assume 60% of LSM 7 0.133731
urban townhouse 7&8 0.0 0.0 0 0 0 0 0.4 0.5 0.0 0.0 assume 40% of LSM 7 & 50% of LSM 8 0.154533
urban residential 8&9 0.0 0.0 0 0 0 0 0.0 0.5 0.5 0.0 assume 50% of LSM 8&9 0.120543
urban townhouse 9&10 0.0 0.0 0 0 0 0 0.0 0.0 0.5 0.5 assume 50% of LSM 9&10 0.079681
urban estate 0.0 0.0 0 0 0 0 0.0 0.0 0.0 0.5 assume 50% LSM 10 0.024517
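
The rural and village rows of the table above can be verified by hand. The sketch below redoes the sum over LSM categories for those two classes, using only the rounded values printed in Out[410] and Out[425]; the dictionary names are illustrative.

```python
# P(LSM | electrified household) for LSM 1 and 2, copied from Out[410]
p_lsm_electrified = {1: 0.004554, 2: 0.013460}

# P(class | LSM) for the two classes that draw on LSM 1 and 2, from Out[425]
p_class_lsm = {
    "rural":   {1: 0.6, 2: 0.4},
    "village": {1: 0.4, 2: 0.6},
}

# Marginalise over LSM: sum of P(class | LSM) * P(LSM | electrified)
p_class_electrified = {
    cls: sum(w * p_lsm_electrified[lsm] for lsm, w in weights.items())
    for cls, weights in p_class_lsm.items()
}
print(p_class_electrified)
```

Because the inputs are rounded to six decimals, the results agree with the P_class|electrified column only to about the sixth decimal place.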

Derivation of monthly income by customer class conditional probability distribution

The probability that a customer in a given class has a specific income is obtained by multiplying the probability that a household in an LSM category has that income by the probability that a customer in the class belongs to that LSM category, and summing over all LSM categories. This assumes that, within an LSM category, income does not depend on class. As a formula:

P(income|class) = sum over LSM of P(income|LSM) x P(LSM|class)

P ( income | LSM )

Number of households per income range per LSM

The number of households per income range per LSM has been approximated from Table 3 in the Geo-based Load Forecast, which is based on data from the AMPS 2010b Living Standard Measure Survey.


In [412]:
data['HHtoIncomeByLSM'].head()


Out[412]:
max income lsm7low lsm7high lsm8low lsm8high lsm9low lsm9high lsm10low lsm10high
min income
0 499 1182 705 645 470 35 477 215 0
500 599 2307 858 0 197 0 0 0 0
600 699 1234 752 0 0 0 0 0 0
700 799 1039 293 611 344 150 352 0 0
800 899 444 482 266 1076 564 0 0 0

Number of households per DLR compatible income bin per LSM


In [413]:
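tbl3 = data['HHtoIncomeByLSM'].iloc[:,1:9]  # drop 'max income', keep the 8 LSM columns

# Fraction of each source income range covered by one 100-unit step;
# the width of the last range is hard-coded as 240700 - 50000
count = [100/(tbl3.index[i+1]-tbl3.index[i]) for i in range(0, len(tbl3)-1)]+[100/(240700 - 50000)]
t3 = tbl3.multiply(count, axis = 0)  # households per 100-unit step, assuming a uniform spread within each range

ix = np.arange(0, 240800, 100)
bins = [0, 1800, 3200, 7800, 11600, 19116, 24500, 65500, 240700]
tbl3x = t3.reindex(ix, method = 'ffill')  # repeat each per-step value across its source range
# Sum the steps into the DLR income bins (pd.cut intervals are right-closed, so the step at index 0 is not binned)
tbl3_binned = tbl3x.groupby(pd.cut(tbl3x.index,bins)).sum()
tbl3_binned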


Out[413]:
lsm7low lsm7high lsm8low lsm8high lsm9low lsm9high lsm10low lsm10high
(0, 1800] 38858.850000 30827.250000 13914.250000 8510.250000 4709.750000 5159.100000 436.000000 864.750000
(1800, 3200] 63866.550000 34037.750000 25986.950000 12697.150000 8988.350000 6771.500000 2264.000000 1686.050000
(3200, 7800] 238428.200000 225564.100000 129699.700000 107037.700000 87627.200000 47350.000000 20245.500000 6256.700000
(7800, 11600] 191884.300000 187032.100000 147215.300000 126917.300000 131591.500000 112744.900000 52059.600000 22336.200000
(11600, 19116] 115805.900000 133593.800000 126419.800000 141928.600000 160740.400000 146641.300000 80955.500000 50421.300000
(19116, 24500] 25403.480000 42954.040000 66228.000000 59934.760000 71996.760000 85446.720000 66943.560000 51820.000000
(24500, 65500] 15529.337829 32321.574578 31542.578920 74723.937955 114935.653529 143351.479056 146132.109009 187405.798112
(65500, 240700] 1190.661772 231.517567 1354.194022 5387.377032 13146.890404 19332.635553 37030.867331 57937.271106
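
The rebinning idea behind the cell above is to spread each source count evenly across its income range and then re-aggregate into the target bins. The sketch below illustrates the same idea with a different, overlap-based calculation in plain Python rather than the notebook's reindex and forward-fill approach. The function name `rebin_uniform` and the toy numbers are hypothetical and chosen only for illustration.

```python
def rebin_uniform(src_edges, counts, dst_edges):
    """Redistribute counts from source bins into destination bins,
    assuming values are spread uniformly within each source bin."""
    out = [0.0] * (len(dst_edges) - 1)
    src_bins = zip(src_edges[:-1], src_edges[1:])
    for (lo, hi), c in zip(src_bins, counts):
        for j, (dlo, dhi) in enumerate(zip(dst_edges[:-1], dst_edges[1:])):
            # Width of the overlap between the source and destination bin
            overlap = max(0, min(hi, dhi) - max(lo, dlo))
            out[j] += c * overlap / (hi - lo)
    return out

# Hypothetical toy data: three source income ranges, two target bins
src_edges = [0, 500, 1000, 2000]
counts = [100, 200, 300]
dst_edges = [0, 800, 2000]

rebinned = rebin_uniform(src_edges, counts, dst_edges)
print(rebinned)  # [220.0, 380.0]
```

In this toy case the middle source range (500 to 1000) straddles the target boundary at 800, so 60% of its 200 households go to the first bin and 40% to the second. The total number of households is preserved.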

In [414]:
# Normalise each LSM column so it sums to 1, giving P(income | LSM)
tbl3_totals = tbl3_binned.sum(axis=0)
Pincome_lsm = tbl3_binned/tbl3_totals
Pincome_lsm


Out[414]:
lsm7low lsm7high lsm8low lsm8high lsm9low lsm9high lsm10low lsm10high
(0, 1800] 0.056238 0.044901 0.025655 0.015844 0.007932 0.009102 0.001074 0.002283
(1800, 3200] 0.092431 0.049577 0.047915 0.023639 0.015139 0.011947 0.005575 0.004452
(3200, 7800] 0.345064 0.328541 0.239139 0.199274 0.147586 0.083540 0.049858 0.016520
(7800, 11600] 0.277704 0.272418 0.271434 0.236285 0.221633 0.198916 0.128204 0.058977
(11600, 19116] 0.167600 0.194584 0.233092 0.264232 0.270727 0.258719 0.199365 0.133133
(19116, 24500] 0.036765 0.062564 0.122111 0.111582 0.121260 0.150753 0.164858 0.136826
(24500, 65500] 0.022475 0.047077 0.058158 0.139115 0.193580 0.252915 0.359872 0.494829
(65500, 240700] 0.001723 0.000337 0.002497 0.010030 0.022143 0.034109 0.091194 0.152979

P ( LSM | class )


In [415]:
tbl4 = data['LSMmakeupHighDetail']
# Normalise each class row so it sums to 1; rows that sum to zero come out as NaN
Plsm_class = tbl4.divide(tbl4.sum(axis=1), axis=0)
Plsm_class


Out[415]:
lsm7low lsm7high lsm8low lsm8high lsm9low lsm9high lsm10low lsm10high
class
rural NaN NaN NaN NaN NaN NaN NaN NaN
village NaN NaN NaN NaN NaN NaN NaN NaN
informal settlement NaN NaN NaN NaN NaN NaN NaN NaN
township NaN NaN NaN NaN NaN NaN NaN NaN
urban residential 7 0.500000 0.500000 0.000000 0.000000 0.00 0.00 0.00 0.00
urban townhouse 7&8 0.222222 0.222222 0.277778 0.277778 0.00 0.00 0.00 0.00
urban residential 8&9 0.000000 0.000000 0.250000 0.250000 0.25 0.25 0.00 0.00
urban townhouse 9&10 0.000000 0.000000 0.000000 0.000000 0.25 0.25 0.25 0.25
urban estate 0.000000 0.000000 0.000000 0.000000 0.00 0.00 0.50 0.50

P ( income | class )


In [416]:
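# Rows: classes; columns: income bins.
# Each entry is the sum over LSM of P(LSM | class) x P(income | LSM)
Pincome_class = Plsm_class.dot(Pincome_lsm.T)
Pincome_class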


Out[416]:
(0, 1800] (1800, 3200] (3200, 7800] (7800, 11600] (11600, 19116] (19116, 24500] (24500, 65500] (65500, 240700]
class
rural NaN NaN NaN NaN NaN NaN NaN NaN
village NaN NaN NaN NaN NaN NaN NaN NaN
informal settlement NaN NaN NaN NaN NaN NaN NaN NaN
township NaN NaN NaN NaN NaN NaN NaN NaN
urban residential 7 0.050570 0.071004 0.336803 0.275061 0.181092 0.049665 0.034776 0.001030
urban townhouse 7&8 0.034003 0.051433 0.271472 0.263282 0.218631 0.086988 0.070254 0.003937
urban residential 8&9 0.014633 0.024660 0.167385 0.232067 0.256692 0.126427 0.160942 0.017194
urban townhouse 9&10 0.005098 0.009278 0.074376 0.151932 0.215486 0.143425 0.325299 0.075106
urban estate 0.001679 0.005014 0.033189 0.093591 0.166249 0.150842 0.427351 0.122086
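
As a worked check of the table above, the "urban residential 7" row can be reproduced from the rounded values in Out[414] and Out[415]. That class is split evenly between lsm7low and lsm7high, so its income distribution is the average of those two columns. The variable names below are illustrative.

```python
# P(income | LSM) columns for lsm7low and lsm7high, copied from Out[414]
p_income_lsm7low = [0.056238, 0.092431, 0.345064, 0.277704,
                    0.167600, 0.036765, 0.022475, 0.001723]
p_income_lsm7high = [0.044901, 0.049577, 0.328541, 0.272418,
                     0.194584, 0.062564, 0.047077, 0.000337]

# P(LSM | class) for "urban residential 7", from Out[415]
w_low, w_high = 0.5, 0.5

# Sum over LSM of P(LSM | class) * P(income | LSM), one value per income bin
p_income_class = [w_low * lo + w_high * hi
                  for lo, hi in zip(p_income_lsm7low, p_income_lsm7high)]
print([round(p, 6) for p in p_income_class])
print(sum(p_income_class))
```

The values agree with the "urban residential 7" row of Out[416] up to rounding of the inputs, and the row sums to approximately 1, as expected of a conditional distribution over income bins.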