Disaggregation - Hart Active and Reactive data

Customary imports


In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
from os.path import join
from pylab import rcParams
import matplotlib.pyplot as plt
rcParams['figure.figsize'] = (13, 6)
plt.style.use('ggplot')
#import nilmtk
from nilmtk import DataSet, TimeFrame, MeterGroup, HDFDataStore
from nilmtk.disaggregate.hart_85 import Hart85
from nilmtk.disaggregate import CombinatorialOptimisation
from nilmtk.utils import print_dict, show_versions
from nilmtk.metrics import f1_score
#import seaborn as sns
#sns.set_palette("Set3", n_colors=12)

import warnings
warnings.filterwarnings("ignore") #suppress warnings, comment out if warnings required

Show versions for any diagnostics


In [2]:
#uncomment if required
#show_versions()

Load dataset


In [3]:
data_dir = '/Users/GJWood/nilm_gjw_data/HDF5/'
gjw = DataSet(join(data_dir, 'nilm_gjw_data.hdf5'))
print('loaded ' + str(len(gjw.buildings)) + ' buildings')
building_number=1


loaded 1 buildings

Period of interest 4 days during holiday

No human activity so all readings should be due to periodic automatic running of appliances such as fridge, freezer, central heating pump, shower pump (due to pressure loss)


In [4]:
gjw.set_window('2015-07-12 00:00:00', '2015-07-16 00:00:00')
elec = gjw.buildings[building_number].elec
mains = elec.mains()
house = elec['fridge'] #only one meter so any selection will do
df = house.load().next() #load the first chunk of data into a dataframe
df.info() #check that the data is what we want (optional)
#note the data has two columns and a time index


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 345600 entries, 2015-07-12 00:00:00+01:00 to 2015-07-15 23:59:59+01:00
Data columns (total 2 columns):
(power, reactive)    345600 non-null float32
(power, active)      345600 non-null float32
dtypes: float32(2)
memory usage: 5.3 MB

In [5]:
plotdata = df.ix['2015-07-12 00:00:00': '2015-07-16 00:00:00']
plotdata.plot()
plt.title("Raw Mains Usage")
plt.ylabel("Power (W)")
plt.xlabel("Time");



In [6]:
plt.scatter(plotdata[('power','active')],plotdata[('power','reactive')])
plt.title("Raw Mains Usage Signature Space")
plt.ylabel("Reactive Power (VAR)")
plt.xlabel("Active Power (W)");


Training

We'll now do the training from the aggregate data. The algorithm segments the time series data into steady and transient states. Thus, we'll first figure out the transient and the steady states. Next, we'll try and pair the on and the off transitions based on their proximity in time and value.


In [17]:
h = Hart85()
h.train(mains,cols=[('power','active'),('power','reactive')],min_tolerance=100,noise_level=70,buffer_size=20,state_threshold=15)


Finding Edges, please wait ...
Edge detection complete.
Creating transition frame ...
Transition frame created.
Creating states frame ...
States frame created.
Finished.

In [21]:
plt.scatter(h.steady_states[('active average')],h.steady_states[('reactive average')])
plt.scatter(h.centroids[('power','active')],h.centroids[('power','reactive')],marker='x',c=(1.0, 0.0, 0.0))
plt.legend(['Steady states','Centroids'],loc=4)
plt.title("Training steady states Signature space")
plt.ylabel("Reactive average (VAR)")
plt.xlabel("Active average (W)");



In [23]:
h.steady_states.head()


Out[23]:
active average reactive average
2015-07-12 00:24:28+01:00 157.000000 104.019802
2015-07-12 04:09:56+01:00 208.989744 128.000000
2015-07-12 06:05:30+01:00 242.000000 120.400000
2015-07-12 08:44:40+01:00 180.000000 102.000000
2015-07-12 12:26:03+01:00 201.000000 124.998945

In [24]:
h.steady_states.tail()


Out[24]:
active average reactive average
2015-07-15 15:43:29+01:00 170.993902 110.000000
2015-07-15 18:46:54+01:00 90.000000 125.000000
2015-07-15 19:53:21+01:00 191.000000 117.996146
2015-07-15 20:10:39+01:00 93.000000 92.004858
2015-07-15 22:00:31+01:00 282.000000 151.000000

In [25]:
h.centroids


Out[25]:
(power, active) (power, reactive)
0 87.060310 5.926049
1 240.932203 0.000000
2 39.000000 37.446809
3 446.002135 21.014965

In [26]:
h.model


Out[26]:
{}

In [27]:
ax = mains.plot()
h.steady_states['active average'].plot(style='o', ax = ax);
plt.ylabel("Power (W)")
plt.xlabel("Time");
#plt.show()


Out[27]:
<matplotlib.text.Text at 0x22249e48>

In [14]:
h.pair_df.head()


Out[14]:
T1 Time T1 Active T1 Reactive T2 Time T2 Active T2 Reactive
0 2015-07-12 11:26:03 105.000000 1.993097 2015-07-12 22:53:11 -74.967089 -35.964674
1 2015-07-12 17:51:32 73.666667 0.000000 2015-07-13 00:16:57 -80.000000 0.000000
2 2015-07-13 06:21:31 123.400000 -1.600000 2015-07-13 06:37:32 -95.167510 0.000000
3 2015-07-13 06:38:13 71.073171 0.000000 2015-07-13 06:38:31 -74.000000 0.000000
4 2015-07-13 02:58:28 86.944444 0.000000 2015-07-13 10:02:29 -71.838499 -23.829391

In [15]:
pair_shape_df = pd.DataFrame(columns=['Height','Duration'])
pair_shape_df['Height']= (h.pair_df['T1 Active'].abs()+h.pair_df['T2 Active'].abs())/2
pair_shape_df['Duration']= pd.to_timedelta(h.pair_df['T2 Time']-h.pair_df['T1 Time'],unit='s').dt.seconds
pair_shape_df.head()


Out[15]:
Height Duration
0 89.983544 41228
1 76.833333 23125
2 109.283755 961
3 72.536585 18
4 79.391472 25441

In [16]:
fig = plt.figure(figsize=(13,6))
ax = fig.add_subplot(1, 1, 1)
ax.set_yscale('log')
ax.scatter(pair_shape_df['Height'],pair_shape_df['Duration'])
#plt.plot((x1, x2), (y1, y2), 'k-')
ax.plot((h.centroids[('power','active')],
        h.centroids[('power','active')]),
        (h.centroids[('power','active')]*0,
         h.centroids[('power','active')]*0+10000)
         ,marker='x',c=(0.0, 0.0, 0.0))
#ax.axvline(h.centroids[('power','active')], color='k', linestyle='--')
plt.legend(['Transitions','Centroids'],loc=1)
plt.title("Paired event -  Signature Space")
plt.ylabel("Log Duration (sec)")
plt.xlabel("Transition (W)");


Set two days for Disaggregation period of interest

Inspect the data during a quiet period when we were on holiday, should only be autonomous appliances such as fidge, freeze and water heating + any standby devices not unplugged.


In [15]:
gjw.set_window('2015-07-13 00:00:00','2015-07-14 00:00:00')
elec = gjw.buildings[building_number].elec
mains = elec.mains()
mains.plot()


Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x212b9b38>

Disaggregate using Hart (Active data only)


In [16]:
ax = mains.plot()
h.steady_states['active average'].plot(style='o', ax = ax);
plt.ylabel("Power (W)")
plt.xlabel("Time");



In [17]:
disag_filename = join(data_dir, 'disag_gjw_hart.hdf5')
output = HDFDataStore(disag_filename, 'w')
h.disaggregate(mains,output,sample_period=1)
output.close()


Finding Edges, please wait ...
Edge detection complete.
Creating transition frame ...
Transition frame created.
Creating states frame ...
States frame created.
Finished.
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-17-15c16bb68109> in <module>()
      1 disag_filename = join(data_dir, 'disag_gjw_hart.hdf5')
      2 output = HDFDataStore(disag_filename, 'w')
----> 3 h.disaggregate(mains,output,sample_period=1)
      4 output.close()

c:\users\gjwood\nilmtk\nilmtk\disaggregate\hart_85.pyc in disaggregate(self, mains, output_datastore, **load_kwargs)
    430             measurement = chunk.name
    431             power_df = self.disaggregate_chunk(
--> 432                 chunk, prev, transients)
    433 
    434             cols = pd.MultiIndex.from_tuples([chunk.name])

c:\users\gjwood\nilmtk\nilmtk\disaggregate\hart_85.pyc in disaggregate_chunk(self, chunk, prev, transients)
    313         prev = states.iloc[-1].to_dict()
    314         power_chunk_dict = self.assign_power_from_states(states, prev)
--> 315         return pd.DataFrame(power_chunk_dict, index=chunk.index)
    316 
    317     def assign_power_from_states(self, states_chunk, prev):

c:\Users\GJWood\Anaconda\lib\site-packages\pandas\core\frame.pyc in __init__(self, data, index, columns, dtype, copy)
    211                                  dtype=dtype, copy=copy)
    212         elif isinstance(data, dict):
--> 213             mgr = self._init_dict(data, index, columns, dtype=dtype)
    214         elif isinstance(data, ma.MaskedArray):
    215             import numpy.ma.mrecords as mrecords

c:\Users\GJWood\Anaconda\lib\site-packages\pandas\core\frame.pyc in _init_dict(self, data, index, columns, dtype)
    338 
    339         return _arrays_to_mgr(arrays, data_names, index, columns,
--> 340                               dtype=dtype)
    341 
    342     def _init_ndarray(self, values, index, columns, dtype=None,

c:\Users\GJWood\Anaconda\lib\site-packages\pandas\core\frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   4788 
   4789     # don't force copy because getting jammed in an ndarray anyway
-> 4790     arrays = _homogenize(arrays, index, dtype)
   4791 
   4792     # from BlockManager perspective

c:\Users\GJWood\Anaconda\lib\site-packages\pandas\core\frame.pyc in _homogenize(data, index, dtype)
   5102 
   5103             v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 5104                                 raise_cast_failure=False)
   5105 
   5106         homogenized.append(v)

c:\Users\GJWood\Anaconda\lib\site-packages\pandas\core\series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   2711     elif subarr.ndim > 1:
   2712         if isinstance(data, np.ndarray):
-> 2713             raise Exception('Data must be 1-dimensional')
   2714         else:
   2715             subarr = _asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional

In [18]:
ax = mains.plot()
h.steady_states['active average'].plot(style='o', ax = ax);
plt.ylabel("Power (W)")
plt.xlabel("Time");



In [18]:
disag_hart = DataSet(disag_filename)
disag_hart


Out[18]:
<nilmtk.dataset.DataSet at 0x2318c908>

In [19]:
disag_hart_elec = disag_hart.buildings[building_number].elec
disag_hart_elec


Out[19]:
MeterGroup(meters=
  ElecMeter(instance=1, building=1, dataset='Hart85_2015-10-07T11:51:36', site_meter, appliances=[])
  ElecMeter(instance=2, building=1, dataset='Hart85_2015-10-07T11:51:36', appliances=[Appliance(type='unknown', instance=0)])
  ElecMeter(instance=3, building=1, dataset='Hart85_2015-10-07T11:51:36', appliances=[Appliance(type='unknown', instance=1)])
  ElecMeter(instance=4, building=1, dataset='Hart85_2015-10-07T11:51:36', appliances=[Appliance(type='unknown', instance=2)])
  ElecMeter(instance=5, building=1, dataset='Hart85_2015-10-07T11:51:36', appliances=[Appliance(type='unknown', instance=3)])
  ElecMeter(instance=6, building=1, dataset='Hart85_2015-10-07T11:51:36', appliances=[Appliance(type='unknown', instance=4)])
  ElecMeter(instance=7, building=1, dataset='Hart85_2015-10-07T11:51:36', appliances=[Appliance(type='unknown', instance=5)])
  ElecMeter(instance=8, building=1, dataset='Hart85_2015-10-07T11:51:36', appliances=[Appliance(type='unknown', instance=6)])
)

In [20]:
disag_hart_elec.mains()


Out[20]:
ElecMeter(instance=1, building=1, dataset='Hart85_2015-10-07T11:51:36', site_meter, appliances=[])

In [21]:
h.centroids


Out[21]:
(power, active)
0 164.517390
1 2084.002353
2 2712.309782
3 1276.028081
4 3335.258491
5 4755.483947
6 4078.086616

In [22]:
h.model


Out[22]:
{}

In [25]:
h.steady_states


Out[25]:
active average
2015-05-01 00:13:32+01:00 2739.000000
2015-05-01 00:15:24+01:00 1029.034091
2015-05-01 00:26:15+01:00 920.000000
2015-05-01 00:36:18+01:00 645.333333
2015-05-01 00:55:06+01:00 3875.000000
2015-05-01 01:00:30+01:00 703.000000
2015-05-01 01:00:59+01:00 2856.000000
2015-05-01 01:02:26+01:00 3067.318182
2015-05-01 01:06:50+01:00 1009.000000
2015-05-01 01:07:40+01:00 4503.000000
2015-05-01 01:08:10+01:00 5414.000000
2015-05-01 01:11:06+01:00 1014.000000
2015-05-01 01:11:45+01:00 3129.000000
2015-05-01 01:13:57+01:00 995.000000
2015-05-01 01:17:02+01:00 2760.796875
2015-05-01 01:19:30+01:00 695.000000
2015-05-01 01:21:27+01:00 2808.000000
2015-05-01 01:23:37+01:00 656.900000
2015-05-01 01:25:10+01:00 611.000000
2015-05-01 01:25:20+01:00 2695.000000
2015-05-01 01:25:27+01:00 2784.000000
2015-05-01 01:25:31+01:00 2700.000000
2015-05-01 01:26:56+01:00 551.000000
2015-05-01 02:43:35+01:00 477.600000
2015-05-01 03:02:11+01:00 513.000000
2015-05-01 04:02:04+01:00 406.000000
2015-05-01 04:02:13+01:00 453.277778
2015-05-01 04:02:31+01:00 378.333333
2015-05-01 04:15:14+01:00 510.117188
2015-05-01 04:20:32+01:00 426.000000
... ...
2015-08-30 19:27:17+01:00 227.000000
2015-08-30 19:31:13+01:00 2477.000000
2015-08-30 19:34:22+01:00 325.777778
2015-08-30 19:40:54+01:00 330.500000
2015-08-30 19:42:25+01:00 2437.000000
2015-08-30 19:43:00+01:00 257.000000
2015-08-30 19:56:53+01:00 267.979487
2015-08-30 20:24:43+01:00 457.538462
2015-08-30 20:25:56+01:00 512.666667
2015-08-30 20:26:00+01:00 434.000000
2015-08-30 20:27:34+01:00 526.000000
2015-08-30 20:40:35+01:00 452.000000
2015-08-30 20:42:51+01:00 527.312500
2015-08-30 20:43:07+01:00 453.936170
2015-08-30 20:47:44+01:00 332.976378
2015-08-30 21:05:11+01:00 418.000000
2015-08-30 21:25:49+01:00 398.000000
2015-08-30 21:49:17+01:00 450.000000
2015-08-30 21:50:47+01:00 3221.000000
2015-08-30 21:56:41+01:00 396.983806
2015-08-30 22:36:04+01:00 399.000000
2015-08-30 22:46:16+01:00 494.000000
2015-08-30 22:55:14+01:00 543.953782
2015-08-30 22:59:12+01:00 445.000000
2015-08-30 23:21:36+01:00 765.000000
2015-08-30 23:22:45+01:00 439.000000
2015-08-30 23:48:23+01:00 327.000000
2015-08-30 23:53:13+01:00 3131.500000
2015-08-30 23:53:28+01:00 4484.500000
2015-08-30 23:55:10+01:00 424.789474

55488 rows × 1 columns


In [17]:
from nilmtk.metrics import f1_score
f1_hart= f1_score(disag_hart_elec, test_elec)
f1_hart.index = disag_hart_elec.get_labels(f1_hart.index)
f1_hart.plot(kind='barh')
plt.ylabel('appliance');
plt.xlabel('f-score');
plt.title("Hart");


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-17-631f0a07efc5> in <module>()
      1 from nilmtk.metrics import f1_score
----> 2 f1_hart= f1_score(disag_hart_elec, test_elec)
      3 f1_hart.index = disag_hart_elec.get_labels(f1_hart.index)
      4 f1_hart.plot(kind='barh')
      5 plt.ylabel('appliance');

NameError: name 'test_elec' is not defined

In [ ]: