Construction of Conditional Probability Tables for the BN Model

This notebook constructs conditional probability tables for the Bayesian Network class inference model.



In [409]:

    
import os, glob
import pandas as pd
import numpy as np
from support import data_dir

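# Load every tab-delimited table in <data_dir>/BN into a dict keyed by file name (without extension)
fileimport = glob.glob(os.path.join(data_dir, 'BN', '*.txt'))
data = {}
# os.path handles the path separator on any platform
names = [os.path.splitext(os.path.basename(f))[0] for f in fileimport]
for n, f in zip(names, fileimport):
    print(n)
    data[n] = pd.read_table(f, index_col=0)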









    



AMPS2013bErvenElectrificationOffset
LSMmakeupHighDetail
LSMmakeupAssumptions
HHtoIncomeByLSM

Customer class marginal probability distribution

The probability that an electrified customer belongs to a certain customer class is obtained by multiplying, for each LSM category, the probability that a household in that category belongs to the class by the probability that an electrified household falls in that category, and then summing over all LSM categories. This can be represented as the formula:

P(class|electrified household) = Σ_LSM P(class|LSM) x P(LSM|electrified household)

P(LSM | electrified household)

The AMPS2013bErvenElectrificationOffset table has been obtained from the Domestic Load Research Process Review 2015 and is derived from data in the AMPS 2013b Living Standard Measure survey.

The Electrification Offset quantifies the likelihood that a household in an LSM category has been electrified. It takes a value between 0 and 1, where 0 means that no household in the category is electrified and 1 means that every household is electrified.



In [410]:

    
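tbl1 = data['AMPS2013bErvenElectrificationOffset']
# Expected number of electrified dwellings per LSM category
tbl1['ElectrifiedDwellings'] = tbl1['EstErven'] * tbl1['ElectrificationOffset']
# Normalise over all LSM categories to obtain P(LSM | electrified household)
tbl1['P_LSM|electrified'] = tbl1['ElectrifiedDwellings'] / tbl1['ElectrifiedDwellings'].sum()
tbl1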









Out[410]:

LSM	EstErven	ElectrificationOffset	ElectrifiedDwellings	P_LSM|electrified
1	92741	0.30	27822.30	0.004554
2	195786	0.42	82230.12	0.013460
3	374441	0.73	273341.93	0.044744
4	1106303	0.93	1028861.79	0.168417
5	821354	0.98	804926.92	0.131761
6	765524	0.99	757868.76	0.124058
7	1361603	1.00	1361603.00	0.222884
8	798807	1.00	798807.00	0.130759
9	673994	1.00	673994.00	0.110328
10	299553	1.00	299553.00	0.049035
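
As a quick sanity check on the table above, the following sketch recomputes P(LSM | electrified household) from the EstErven and ElectrificationOffset columns and confirms that the result sums to 1. The numbers are copied from the output above; the variable names (`est_erven`, `offset`, `p_lsm`) are illustrative and not part of the original notebook.

```python
import numpy as np

# Values copied from the Out[410] table (LSM 1 to 10).
est_erven = np.array([92741, 195786, 374441, 1106303, 821354,
                      765524, 1361603, 798807, 673994, 299553])
offset = np.array([0.30, 0.42, 0.73, 0.93, 0.98, 0.99, 1.00, 1.00, 1.00, 1.00])

electrified = est_erven * offset          # expected electrified dwellings per LSM
p_lsm = electrified / electrified.sum()   # normalise so the column sums to 1

assert np.isclose(p_lsm.sum(), 1.0)
print(np.round(p_lsm, 6))
```

The printed values should agree with the P_LSM|electrified column to the displayed precision.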

P(class | electrified household)

The distribution of LSM categories over the DLR customer classes, P(class | LSM), has been approximated from customer class definitions in the Geo-based Load Forecast Appendix A.



In [425]:

    
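tbl2 = data['LSMmakeupAssumptions']
# First ten columns hold P(class | LSM) for LSM 1 to 10; relabel them to match tbl1's index
t2 = tbl2.iloc[:, 0:10].copy()
t2.columns = range(1, 11)
# Marginalise over LSM: sum over LSM of P(class | LSM) * P(LSM | electrified household)
tbl2['P_class|electrified'] = t2.dot(tbl1['P_LSM|electrified']).values
tbl2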









Out[425]:

class	LSM 1	LSM 2	LSM 3	LSM 4	LSM 5	LSM 6	LSM 7	LSM 8	LSM 9	LSM 10	Assumption	P_class|electrified
rural	0.6	0.4	0	0	0	0	0.0	0.0	0.0	0.0	assume 40% of LSM 1&2 living in rural scattered	0.008117
village	0.4	0.6	0	0	0	0	0.0	0.0	0.0	0.0	assume 60% of LSM 1&2 living in rural scattered	0.009898
informal settlement	0.0	0.0	1	1	0	0	0.0	0.0	0.0	0.0	NaN	0.213161
township	0.0	0.0	0	0	1	1	0.0	0.0	0.0	0.0	NaN	0.255818
urban residential 7	0.0	0.0	0	0	0	0	0.6	0.0	0.0	0.0	assume 60% of LSM 7	0.133731
urban townhouse 7&8	0.0	0.0	0	0	0	0	0.4	0.5	0.0	0.0	assume 40% of LSM 7 & 50% of LSM 8	0.154533
urban residential 8&9	0.0	0.0	0	0	0	0	0.0	0.5	0.5	0.0	assume 50% of LSM 8&9	0.120543
urban townhouse 9&10	0.0	0.0	0	0	0	0	0.0	0.0	0.5	0.5	assume 50% of LSM 9&10	0.079681
urban estate	0.0	0.0	0	0	0	0	0.0	0.0	0.0	0.5	assume 50% LSM 10	0.024517
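
To make the marginalisation concrete, the sketch below reproduces two entries of the P_class|electrified column by hand. It uses the rounded probabilities printed in the tables above, so the results match only to about six decimal places; the array names are illustrative and do not appear in the original notebook.

```python
import numpy as np

# P(LSM | electrified household) for LSM 1..10, copied from the earlier table.
p_lsm = np.array([0.004554, 0.013460, 0.044744, 0.168417, 0.131761,
                  0.124058, 0.222884, 0.130759, 0.110328, 0.049035])

# P(class | LSM) rows for two classes, copied from the table above.
p_rural_given_lsm = np.array([0.6, 0.4, 0, 0, 0, 0, 0, 0, 0, 0])
p_estate_given_lsm = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5])

# Marginalise over LSM: sum over l of P(class | l) * P(l | electrified).
p_rural = p_rural_given_lsm @ p_lsm
p_estate = p_estate_given_lsm @ p_lsm
print(p_rural, p_estate)
```

Because every LSM column of the P(class | LSM) table sums to 1 across the nine classes, the resulting class probabilities also sum to 1, up to rounding.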

Derivation of monthly income by customer class conditional probability distribution

The probability that a customer in a given class has a specific monthly income is obtained by multiplying, for each LSM category, the probability that a household in that category falls in the income bin by the probability that a household in the class belongs to that category, and then summing over all LSM categories. This can be represented as the formula:

P(income|class) = Σ_LSM P(income|LSM) x P(LSM|class)

P(income | LSM)

Number of households per income range per LSM

The number of households per income range per LSM has been approximated from Table 3 in the Geo-based Load Forecast, which is based on data from the AMPS 2010b Living Standard Measure Survey.



In [412]:

    
data['HHtoIncomeByLSM'].head()

Number of households per DLR compatible income bin per LSM



In [413]:

    
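# Keep the eight LSM household-count columns (drop 'max income')
tbl3 = data['HHtoIncomeByLSM'].iloc[:, 1:9]

# Share of each source income band that falls in one 100-unit step; the last band is given a width of 240700 - 50000
count = [100/(tbl3.index[i+1]-tbl3.index[i]) for i in range(0, len(tbl3)-1)]+[100/(240700 - 50000)]
t3 = tbl3.multiply(count, axis=0)

# Spread the per-step counts over a regular 100-unit grid, then sum them into the DLR income bins
ix = np.arange(0, 240800, 100)
bins = [0, 1800, 3200, 7800, 11600, 19116, 24500, 65500, 240700]
tbl3x = t3.reindex(ix, method='ffill')
# pd.cut uses right-closed intervals by default, so the grid point at income 0 falls outside the first bin
tbl3_binned = tbl3x.groupby(pd.cut(tbl3x.index, bins)).sum()
tbl3_binned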









Out[413]:

	lsm7low	lsm7high	lsm8low	lsm8high	lsm9low	lsm9high	lsm10low	lsm10high
(0, 1800]	38858.850000	30827.250000	13914.250000	8510.250000	4709.750000	5159.100000	436.000000	864.750000
(1800, 3200]	63866.550000	34037.750000	25986.950000	12697.150000	8988.350000	6771.500000	2264.000000	1686.050000
(3200, 7800]	238428.200000	225564.100000	129699.700000	107037.700000	87627.200000	47350.000000	20245.500000	6256.700000
(7800, 11600]	191884.300000	187032.100000	147215.300000	126917.300000	131591.500000	112744.900000	52059.600000	22336.200000
(11600, 19116]	115805.900000	133593.800000	126419.800000	141928.600000	160740.400000	146641.300000	80955.500000	50421.300000
(19116, 24500]	25403.480000	42954.040000	66228.000000	59934.760000	71996.760000	85446.720000	66943.560000	51820.000000
(24500, 65500]	15529.337829	32321.574578	31542.578920	74723.937955	114935.653529	143351.479056	146132.109009	187405.798112
(65500, 240700]	1190.661772	231.517567	1354.194022	5387.377032	13146.890404	19332.635553	37030.867331	57937.271106
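
The rebinning step above spreads each source income band evenly over 100-unit steps and then sums those steps into the target bins. The toy example below is a minimal sketch of that idea with made-up numbers: the column name `lsmX`, the band edges and the household counts are all hypothetical. Unlike the notebook cell, it uses left-closed bins (`right=False`) and a grid that stops before the upper edge, which keeps the household total unchanged in this small case.

```python
import numpy as np
import pandas as pd

# Hypothetical toy input: household counts indexed by the lower edge of each
# income band. The final band is assumed to end at 1000.
hh = pd.DataFrame({'lsmX': [500.0, 300.0, 200.0]}, index=[0, 500, 800])
upper = 1000
step = 100

# Fraction of each band's households that falls in one 100-unit step.
edges = list(hh.index) + [upper]
frac = [step / (edges[i + 1] - edges[i]) for i in range(len(hh))]
per_step = hh.multiply(frac, axis=0)

# Forward-fill onto a regular grid so every step carries its share.
grid = np.arange(0, upper, step)
fine = per_step.reindex(grid, method='ffill')

# Sum the steps into new, left-closed bins [0, 400) and [400, 1000).
new_bins = [0, 400, 1000]
groups = pd.cut(fine.index, new_bins, right=False)
rebinned = fine.groupby(groups, observed=False).sum()
print(rebinned)
```

Each 100-unit step carries 100 households here, so the first bin collects four steps and the second collects six, and the total of 1000 households is preserved.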



In [414]:

    
# Normalise each LSM column by its total to obtain P(income | LSM)
tbl3_totals = tbl3_binned.sum(axis=0)
Pincome_lsm = tbl3_binned/tbl3_totals
Pincome_lsm









Out[414]:

	lsm7low	lsm7high	lsm8low	lsm8high	lsm9low	lsm9high	lsm10low	lsm10high
(0, 1800]	0.056238	0.044901	0.025655	0.015844	0.007932	0.009102	0.001074	0.002283
(1800, 3200]	0.092431	0.049577	0.047915	0.023639	0.015139	0.011947	0.005575	0.004452
(3200, 7800]	0.345064	0.328541	0.239139	0.199274	0.147586	0.083540	0.049858	0.016520
(7800, 11600]	0.277704	0.272418	0.271434	0.236285	0.221633	0.198916	0.128204	0.058977
(11600, 19116]	0.167600	0.194584	0.233092	0.264232	0.270727	0.258719	0.199365	0.133133
(19116, 24500]	0.036765	0.062564	0.122111	0.111582	0.121260	0.150753	0.164858	0.136826
(24500, 65500]	0.022475	0.047077	0.058158	0.139115	0.193580	0.252915	0.359872	0.494829
(65500, 240700]	0.001723	0.000337	0.002497	0.010030	0.022143	0.034109	0.091194	0.152979

P(LSM | class)



In [415]:

    
tbl4 = data['LSMmakeupHighDetail']
# Normalise each class row by its total to obtain P(LSM | class)
Plsm_class = tbl4.divide(tbl4.sum(axis=1), axis=0)
Plsm_class









Out[415]:

class	lsm7low	lsm7high	lsm8low	lsm8high	lsm9low	lsm9high	lsm10low	lsm10high
rural	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
village	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
informal settlement	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
township	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
urban residential 7	0.500000	0.500000	0.000000	0.000000	0.00	0.00	0.00	0.00
urban townhouse 7&8	0.222222	0.222222	0.277778	0.277778	0.00	0.00	0.00	0.00
urban residential 8&9	0.000000	0.000000	0.250000	0.250000	0.25	0.25	0.00	0.00
urban townhouse 9&10	0.000000	0.000000	0.000000	0.000000	0.25	0.25	0.25	0.25
urban estate	0.000000	0.000000	0.000000	0.000000	0.00	0.00	0.50	0.50
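
The NaN rows in the table above belong to the classes that have no weight in any of the LSM 7 to 10 columns, so their row total is zero and the row-wise division is undefined. The sketch below illustrates this with a hypothetical two-class table; it assumes the empty entries are stored as zeros, which the notebook does not show (blank entries would lead to the same NaN result).

```python
import pandas as pd

# Hypothetical two-class makeup table: 'rural' has no weight in any of the
# high-detail LSM columns (assumed to be stored as zeros), 'urban estate' does.
makeup = pd.DataFrame({'lsm10low': [0.0, 0.5], 'lsm10high': [0.0, 0.5]},
                      index=['rural', 'urban estate'])

# Row-normalise: each row is divided by its own total.
p_lsm_class = makeup.divide(makeup.sum(axis=1), axis=0)

print(p_lsm_class)
```

The 'rural' row comes out as NaN because it is 0 divided by 0, while 'urban estate' normalises to 0.5 and 0.5.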

P(income | class)



In [416]:

    
# Sum over LSM of P(LSM | class) * P(income | LSM)
Pincome_class = Plsm_class.dot(Pincome_lsm.T)
Pincome_class









Out[416]:

class	(0, 1800]	(1800, 3200]	(3200, 7800]	(7800, 11600]	(11600, 19116]	(19116, 24500]	(24500, 65500]	(65500, 240700]
rural	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
village	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
informal settlement	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
township	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
urban residential 7	0.050570	0.071004	0.336803	0.275061	0.181092	0.049665	0.034776	0.001030
urban townhouse 7&8	0.034003	0.051433	0.271472	0.263282	0.218631	0.086988	0.070254	0.003937
urban residential 8&9	0.014633	0.024660	0.167385	0.232067	0.256692	0.126427	0.160942	0.017194
urban townhouse 9&10	0.005098	0.009278	0.074376	0.151932	0.215486	0.143425	0.325299	0.075106
urban estate	0.001679	0.005014	0.033189	0.093591	0.166249	0.150842	0.427351	0.122086
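
As a worked check of the final table, the sketch below reproduces the 'urban residential 7' row by hand. That class is split evenly between lsm7low and lsm7high in the P(LSM | class) table, so its income distribution is the average of those two P(income | LSM) columns. The inputs are the rounded values printed above and the array names are illustrative.

```python
import numpy as np

# P(income | LSM) columns for lsm7low and lsm7high, copied from Out[414]
# (rows are the eight income bins, lowest first).
p_income_lsm7low = np.array([0.056238, 0.092431, 0.345064, 0.277704,
                             0.167600, 0.036765, 0.022475, 0.001723])
p_income_lsm7high = np.array([0.044901, 0.049577, 0.328541, 0.272418,
                              0.194584, 0.062564, 0.047077, 0.000337])

# P(LSM | class) for 'urban residential 7': half lsm7low, half lsm7high.
p_income_class = 0.5 * p_income_lsm7low + 0.5 * p_income_lsm7high
print(np.round(p_income_class, 6))
```

The result agrees with the 'urban residential 7' row to the displayed precision, and it sums to 1 because each P(income | LSM) column does.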

First five rows of data['HHtoIncomeByLSM'] (output of In [412]):

min income	max income	lsm7low	lsm7high	lsm8low	lsm8high	lsm9low	lsm9high	lsm10low	lsm10high
0	499	1182	705	645	470	35	477	215	0
500	599	2307	858	0	197	0	0	0	0
600	699	1234	752	0	0	0	0	0	0
700	799	1039	293	611	344	150	352	0	0
800	899	444	482	266	1076	564	0	0	0