The field of NILM has seen some exciting developments in the recent past. However, with the release of nilmtk, we are aiming to see how well different approaches line up. So, we were interested in bringing one of the seminal work in NILM by George Hart to nilmtk and give it a spin. In this notebook, I'll present a very quick overview of the algorithm and test it out on a home from the WikiEnergy data set.

Customary imports



In [1]:

    
import numpy as np
import pandas as pd
from os.path import join

from pylab import rcParams
import matplotlib.pyplot as plt
%matplotlib inline

rcParams['figure.figsize'] = (12, 6)

import nilmtk
from nilmtk import DataSet, TimeFrame, MeterGroup, HDFDataStore
from nilmtk.disaggregate import CombinatorialOptimisation
from nilmtk.utils import print_dict
from nilmtk.metrics import f1_score
import seaborn as sns
sns.set_palette("Set3", n_colors=12)


import warnings
warnings.filterwarnings("ignore")









    



/home/nipun/git/nilmtk/nilmtk/elecmeter.py:4: DeprecationWarning: The compiler package is deprecated and removed in Python 3.x.
  from compiler.ast import flatten

Version information



In [5]:

Loading data



In [6]:

    
data_dir = '/home/nipun/Copy/Dataset/'
we = DataSet(join(data_dir, 'iawe.h5'))
print('loaded ' + str(len(we.buildings)) + ' buildings')









    



loaded 1 buildings

Examine metadata for a single house



In [7]:

    
building_number = 1
print_dict(we.buildings[building_number].metadata)









    




instance: 1
dataset: iAWE
original_name: house_1

Let us perform our analysis on the first 2 days



In [8]:

    
we.store.window = TimeFrame(start='2013-05-24 00:00:00+05:30', end='2013-05-26 00:00:00+05:30')
elec = we.buildings[building_number].elec
elec.appliances









    Out[8]:





[Appliance(type='fridge', instance=1),
 Appliance(type='unknown', instance=1),
 Appliance(type='washing machine', instance=1),
 Appliance(type='clothes iron', instance=1),
 Appliance(type='television', instance=1),
 Appliance(type='wet appliance', instance=1),
 Appliance(type='air conditioner', instance=1),
 Appliance(type='air conditioner', instance=2),
 Appliance(type='motor', instance=1),
 Appliance(type='computer', instance=1)]



In [9]:

    
mains = elec.mains()

Let us quickly have a look at what contribution different appliances have to the mains.



In [12]:

    
mains.plot()









    Out[12]:





<matplotlib.axes.AxesSubplot at 0x7f6caf88c810>



In [10]:

    
elec.plot()
plt.tight_layout()
plt.ylabel("Power (W)")
plt.xlabel("Time");

From the plot above, a human could easily see the signature of 3 appliances: Fridge, Air conditioner and Electric furnace

We'll hope that our unsupervised implementation would be able to figure out these 3 appliances.

Another way of understanding contribution of different appliances is via the following pie chart



In [13]:

    
fraction = elec.submeters().fraction_per_meter().dropna()
labels = elec.get_appliance_labels(fraction.index)
plt.figure(figsize=(8,8))
fraction.plot(kind='pie', labels=labels);









    



1/10 ElecMeter(instance=3, building=1, dataset='iAWE', appliances=[Appliance(type='fridge', instance=1)])





    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-683776b415f6> in <module>()
----> 1 fraction = elec.submeters().fraction_per_meter().dropna()
      2 labels = elec.get_appliance_labels(fraction.index)
      3 plt.figure(figsize=(8,8))
      4 fraction.plot(kind='pie', labels=labels);

/home/nipun/git/nilmtk/nilmtk/metergroup.pyc in fraction_per_meter(self, **load_kwargs)
   1003         Each value is a float in the range [0,1].
   1004         """
-> 1005         energy_per_meter = self.energy_per_meter(**load_kwargs).max()
   1006         total_energy = energy_per_meter.sum()
   1007         return energy_per_meter / total_energy

/home/nipun/git/nilmtk/nilmtk/metergroup.pyc in energy_per_meter(self, **load_kwargs)
    993             print('\r{:d}/{:d} {}'.format(i+1, n_meters, meter), end='')
    994             stdout.flush()
--> 995             meter_energy = meter.total_energy(**load_kwargs)
    996             energy_per_meter[meter.identifier] = meter_energy
    997         return energy_per_meter.dropna(how='all')

/home/nipun/git/nilmtk/nilmtk/elecmeter.pyc in total_energy(self, **loader_kwargs)
    439         nodes = [Clip, TotalEnergy]
    440         return self._get_stat_from_cache_or_compute(
--> 441             nodes, TotalEnergy.results_class(), loader_kwargs)
    442 
    443     def dropout_rate(self, **loader_kwargs):

/home/nipun/git/nilmtk/nilmtk/elecmeter.pyc in _get_stat_from_cache_or_compute(self, nodes, results_obj, loader_kwargs)
    519         if loader_kwargs.get('preprocessing') is None:
    520             cached_stat = self.get_cached_stat(key_for_cached_stat)
--> 521             results_obj.import_from_cache(cached_stat, sections)
    522 
    523             # Get sections_to_compute

/home/nipun/git/nilmtk/nilmtk/results.pyc in import_from_cache(self, cached_stat, sections)
    151         for section in sections:
    152             try:
--> 153                 rows_matching_start = cached_stat.loc[section.start]
    154             except KeyError:
    155                 pass

/home/nipun/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1192             return self._getitem_tuple(key)
   1193         else:
-> 1194             return self._getitem_axis(key, axis=0)
   1195 
   1196     def _getitem_axis(self, key, axis=0):

/home/nipun/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1335 
   1336         # fall thru to straight lookup
-> 1337         self._has_valid_type(key, axis)
   1338         return self._get_label(key, axis=axis)
   1339 

/home/nipun/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _has_valid_type(self, key, axis)
   1297                 raise
   1298             except:
-> 1299                 error()
   1300 
   1301         return True

/home/nipun/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in error()
   1282                 if isnull(key):
   1283                     raise ValueError(
-> 1284                         "cannot use label indexing with a null key")
   1285                 raise KeyError("the label [%s] is not in the [%s]" %
   1286                                (key, self.obj._get_axis_name(axis)))

ValueError: cannot use label indexing with a null key

The air conditioner, fridge and electric furnace together consume around 80% of total energy.

Training

We'll now do the training from the aggregate data. The algorithm segments the time series data into steady and transient states. Thus, we'll first figure out the transient and the steady states. Next, we'll try and pair the on and the off transitions based on their proximity in time and value.



In [14]:

    
from nilmtk.disaggregate.hart_85 import Hart85
h = Hart85()



In [15]:

    
h.train(mains)









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-75ddc06f7ce8> in <module>()
----> 1 h.train(mains)

/home/nipun/git/nilmtk/nilmtk/disaggregate/hart_85.py in train(self, metergroup, cluster_features, bsize, minTol)
    224 
    225         """
--> 226         [self.steady_states, self.transients] = find_steady_states_transients(metergroup)
    227 
    228         self.pair_df = self.pair(bsize, minTol)

/home/nipun/git/nilmtk/nilmtk/feature_detectors/steady_states.py in find_steady_states_transients(metergroup)
     14     transients_list = []
     15 
---> 16     for power_df in metergroup.power_series(load_all_power_columns=True):
     17         if len(power_df.columns) <= 2:
     18             # Use whatever is available

/home/nipun/git/nilmtk/nilmtk/electric.pyc in power_series(self, **kwargs)
    491         # Pull data through preprocessing pipeline
    492         generator = self.load(**kwargs)
--> 493         for chunk in generator:
    494             chunk_to_yield = chunk.icol(0).dropna()
    495             chunk_to_yield.timeframe = getattr(chunk, 'timeframe', None)

/home/nipun/git/nilmtk/nilmtk/metergroup.pyc in load(self, **kwargs)
    565         """
    566         # Get a list of generators
--> 567         generators = [meter.load(**kwargs) for meter in self.meters]
    568 
    569         # Load each generator and yield the sum

/home/nipun/git/nilmtk/nilmtk/elecmeter.pyc in load(self, **kwargs)
    400         # Get source node
    401         preprocessing = kwargs.pop('preprocessing', [])
--> 402         last_node = self.get_source_node(**kwargs)
    403         generator = last_node.generator
    404 

/home/nipun/git/nilmtk/nilmtk/elecmeter.pyc in get_source_node(self, **loader_kwargs)
    421             raise RuntimeError(
    422                 "Cannot get source node if meter.store is None!")
--> 423         generator = self.store.load(key=self.key, **loader_kwargs)
    424         self.metadata['device'] = self.device
    425         return Node(self, generator=generator)

/home/nipun/git/nilmtk/nilmtk/docinherit.pyc in f(*args, **kwargs)
     44         @wraps(self.mthd, assigned=('__name__','__module__'))
     45         def f(*args, **kwargs):
---> 46             return self.mthd(obj, *args, **kwargs)
     47 
     48         return self.use_parent_doc(f, overridden)

TypeError: load() got an unexpected keyword argument 'load_all_power_columns'

We obtain the set of steady states as follows (only top 10 entries shown for ease of view)



In [12]:

    
h.steady_states.head()









    Out[12]:






  
    
      
      active average
    
  
  
    
      2014-04-01 00:00:00-05:00
       755.500000
    
    
      2014-04-01 00:06:00-05:00
       607.500000
    
    
      2014-04-01 00:09:00-05:00
       505.250000
    
    
      2014-04-01 00:25:00-05:00
       358.666667
    
    
      2014-04-01 00:31:00-05:00
       502.090909

The table above shows the steady power states observed in the mains data. Another way of understanding the same is by looking at the start of steady states overlayed on top of mains power consumption, shown as follows.



In [13]:

    
sns.set_palette("Set1", n_colors=12)
ax = mains.plot()
h.steady_states['active average'].plot(style='o', ax = ax);
plt.ylabel("Power (W)")
plt.xlabel("Time");

The difference b/w successive steady states gives us the transitions b/w them. The following table shows the obtained transitions (again only top 10 entries shown).



In [14]:

    
h.transients.head()









    Out[14]:






  
    
      
      active transition
    
  
  
    
      2014-04-01 00:00:00-05:00
       755.500000
    
    
      2014-04-01 00:06:00-05:00
      -148.000000
    
    
      2014-04-01 00:09:00-05:00
      -102.250000
    
    
      2014-04-01 00:25:00-05:00
      -146.583333
    
    
      2014-04-01 00:31:00-05:00
       143.424242

These transitions when seen on the time series would appear as follows:



In [15]:

    
h.transients['active transition'].plot(style='o')
plt.ylabel("Power (W)")
plt.xlabel("Time");

Next, we pair the on (positive) and the off (negative) transitions as shown in the table below.



In [16]:

    
h.pair_df.head()









    Out[16]:






  
    
      
      T1 Time
      T1 Active
      T2 Time
      T2 Active
    
  
  
    
      0
      2014-04-01 05:31:00
       143.424242
      2014-04-01 05:43:00
      -145.690909
    
    
      1
      2014-04-01 05:49:00
       144.145455
      2014-04-01 06:01:00
      -138.212121
    
    
      2
      2014-04-01 06:08:00
       141.484848
      2014-04-01 06:19:00
      -145.984848
    
    
      3
      2014-04-01 06:26:00
       144.766667
      2014-04-01 06:37:00
      -145.600000
    
    
      4
      2014-04-01 06:44:00
       145.600000
      2014-04-01 06:55:00
      -143.100000

The following image borrowed from Hart's paper summarises the above discussion.

Once, we obtain these pairings, we consider a particular appliance usage as the average of the magnitude of the on and the off transition. Then, we use a non-parametric clustering method to obtain the number of appliances and the corresponding power draws. This step in our implementation is different from what was originally proposed by George Hart. Once done, we obtain the learnt appliances as follows:



In [17]:

    
h.centroids









    Out[17]:






  
    
      
      active
    
  
  
    
      0
        146.458708
    
    
      1
       3532.218934
    
    
      2
        911.500000

So, we obtain 3 appliances as per our unsupervised learning algorithm. Based on the power draw, I think I see air conditioner and refrigerator there. What is the third one? Not sure.

Disaggregation

Disaggregation follows much of the same method as training. During disaggregation, we also assign transients from the learnt model.



In [18]:

    
disag_filename = join(data_dir, 'wikienergy-disag.h5')
output = HDFDataStore(disag_filename, 'w')
h.disaggregate(mains, output)









    



Finding Edges, please wait ...
Edge detection complete.
Creating transition frame ...
Transition frame created.
Creating states frame ...
States frame created.
Finished.
States done
0
1
2



In [19]:

    
output.close()

Comparison with ground truth

Since we used an unsupervised approach, we don't have labels (as of yet) for the learnt appliances. Thus, we can't directly use accuracy metrics directly (as of yet, maybe in a newer version we can). I'll compare the power draw using our disaggregation with ground truth manually.



In [20]:

    
disag = DataSet(disag_filename)
disag_elec = disag.buildings[building_number].elec

Comparing for Fridge



In [21]:

    
ax1 = disag_elec['unknown', 0].plot()
ax2 = elec['fridge', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");

Comparing for air conditioner



In [22]:

    
ax1 = disag_elec['unknown', 1].plot()
ax2 = elec['air conditioner', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");

While the timing looks impeccable, there seems to be some ambiguity in the power measurement :(

Comparing for the unknown appliance



In [23]:

    
ax1 = disag_elec['unknown', 2].plot()
ax2 = elec['electric furnace', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");

So, this doesn't look to be a good fit. Maybe, the dish washer?



In [24]:

    
ax1 = disag_elec['unknown', 2].plot()
ax2 = elec['dish washer', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");

Probably, a decent fit. Not really sure though!

Broadly speaking, this 30 year algorithm seems to be doing a fairly decent job. The fact that this is unsupervised makes it all the more useful. There is some more work to be done on this algorithm. So, watch out for more posts and updates. Looking forward to your comments and feedback.

	active average
2014-04-01 00:00:00-05:00	755.500000
2014-04-01 00:06:00-05:00	607.500000
2014-04-01 00:09:00-05:00	505.250000
2014-04-01 00:25:00-05:00	358.666667
2014-04-01 00:31:00-05:00	502.090909

	T1 Time	T1 Active	T2 Time	T2 Active
0	2014-04-01 05:31:00	143.424242	2014-04-01 05:43:00	-145.690909
1	2014-04-01 05:49:00	144.145455	2014-04-01 06:01:00	-138.212121
2	2014-04-01 06:08:00	141.484848	2014-04-01 06:19:00	-145.984848
3	2014-04-01 06:26:00	144.766667	2014-04-01 06:37:00	-145.600000
4	2014-04-01 06:44:00	145.600000	2014-04-01 06:55:00	-143.100000