The field of NILM has seen some exciting developments in the recent past, and with the release of nilmtk we are aiming to see how well different approaches line up. So, we were interested in bringing one of the seminal works in NILM, by George Hart, into nilmtk and giving it a spin. In this notebook, I'll present a very quick overview of the algorithm and test it on a home from the WikiEnergy data set.

Customary imports


In [1]:
import numpy as np
import pandas as pd
from os.path import join

from pylab import rcParams
import matplotlib.pyplot as plt
%matplotlib inline

rcParams['figure.figsize'] = (12, 6)

import nilmtk
from nilmtk import DataSet, TimeFrame, MeterGroup, HDFDataStore
from nilmtk.disaggregate import CombinatorialOptimisation
from nilmtk.utils import print_dict
from nilmtk.metrics import f1_score
import seaborn as sns
sns.set_palette("Set3", n_colors=12)


import warnings
warnings.filterwarnings("ignore")

Version information


In [2]:
from nilmtk.utils import dependencies_diagnostics
pd.DataFrame(pd.Series(dependencies_diagnostics(), name="Value"))


Out[2]:
Value
Date 2014-11-28 20:28:14.025264
Platform Linux-3.13.0-40-generic-x86_64-with-debian-jes...
System version 2.7
nilmtk version 0.2.0.dev
nilm_metadata version 0.2.0.dev-2a1e9f7
numpy version 1.9.0
matplotlib version 1.3.1
pandas version 0.15.0
sklearn version 0.14.1

Loading data


In [3]:
data_dir = '/home/nipun/Downloads/'
we = DataSet(join(data_dir, 'wikienergy.h5'))
print('loaded ' + str(len(we.buildings)) + ' buildings')


loaded 239 buildings

Examine metadata for a single house


In [4]:
building_number = 11
print_dict(we.buildings[building_number].metadata)


  • instance: 11
  • dataset: WikiEnergy
  • original_name: 434

Let us perform our analysis on a single day of data


In [5]:
we.store.window = TimeFrame(start='2014-04-01 00:00:00-05:00', end='2014-04-02 00:00:00-05:00')
elec = we.buildings[building_number].elec
elec.appliances


Out[5]:
[Appliance(type='fridge', instance=1),
 Appliance(type='dish washer', instance=1),
 Appliance(type='electric water heating appliance', instance=1),
 Appliance(type='spin dryer', instance=1),
 Appliance(type='electric furnace', instance=1),
 Appliance(type='sockets', instance=1),
 Appliance(type='sockets', instance=2),
 Appliance(type='air conditioner', instance=1),
 Appliance(type='sockets', instance=3),
 Appliance(type='sockets', instance=4)]

In [6]:
mains = elec.mains()

Let us quickly have a look at the contribution of different appliances to the mains.


In [7]:
elec.plot()
plt.tight_layout()
plt.ylabel("Power (W)")
plt.xlabel("Time");


From the plot above, a human could easily spot the signatures of 3 appliances: the fridge, the air conditioner and the electric furnace.

We hope that our unsupervised implementation will be able to pick out these 3 appliances.

Another way of understanding the contribution of different appliances is via the following pie chart.

In [8]:
fraction = elec.submeters().fraction_per_meter().dropna()
labels = elec.get_appliance_labels(fraction.index)
plt.figure(figsize=(8,8))
fraction.plot(kind='pie', labels=labels);


1/10 ElecMeter(instance=2, building=11, dataset='WikiEnergy', appliances=[Appliance(type='air conditioner', instance=1)])Using cached result.
Calculating energy for column ('power', 'active')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-683776b415f6> in <module>()
----> 1 fraction = elec.submeters().fraction_per_meter().dropna()
...
/home/nipun/git/nilmtk/nilmtk/results.pyc in update(self, new_result)
--> 104         self._data = self._data.append(new_result._data, verify_integrity=True)
...
ValueError: Indexes have overlapping values: <class 'pandas.tseries.index.DatetimeIndex'>
[2014-04-01 05:00:00]
Length: 1, Freq: None, Timezone: None

The ValueError above occurs when a cached total-energy result overlaps with a newly computed one; from a successful run of this cell, the air conditioner, fridge and electric furnace together consume around 80% of the total energy.

Training

We'll now train on the aggregate (mains) data. The algorithm segments the time series into steady and transient states, so we'll first find the steady states and the transitions between them. Next, we'll try to pair the on and off transitions based on their proximity in time and magnitude.
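
To make the edge-detection idea concrete, here is a minimal sketch of extracting steady states from a power series by thresholding successive differences. This is only an illustration of the idea, not nilmtk's actual Hart85 code, and `state_threshold` is a made-up parameter:

def find_steady_states(power, state_threshold=15):
    # A new steady state starts whenever successive samples differ
    # by more than `state_threshold` watts
    edges = power.diff().abs() > state_threshold
    state_id = edges.cumsum()
    # Average power within each steady state, indexed by its start time
    grouped = power.groupby(state_id)
    starts = grouped.apply(lambda s: s.index[0])
    averages = grouped.mean()
    return pd.Series(averages.values, index=starts.values,
                     name='active average')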


In [9]:
from nilmtk.disaggregate.hart_85 import Hart85
h = Hart85()

In [10]:
h.train(mains)


Finding Edges, please wait ...
Edge detection complete.
Creating transition frame ...
Transition frame created.
Creating states frame ...
States frame created.
Finished.

We obtain the set of steady states as follows (only the first five entries shown for ease of viewing):


In [11]:
h.steady_states.head()


Out[11]:
active average
2014-04-01 00:00:00-05:00 755.500000
2014-04-01 00:06:00-05:00 607.500000
2014-04-01 00:09:00-05:00 505.250000
2014-04-01 00:25:00-05:00 358.666667
2014-04-01 00:31:00-05:00 502.090909

The table above shows the steady power states observed in the mains data. Another way of seeing the same thing is to overlay the start of each steady state on the mains power consumption, as follows.


In [12]:
sns.set_palette("Set1", n_colors=12)
ax = mains.plot()
h.steady_states['active average'].plot(style='o', ax = ax);
plt.ylabel("Power (W)")
plt.xlabel("Time");


The difference between successive steady states gives us the transitions between them. The following table shows the obtained transitions (again, only the first five entries shown).


In [13]:
h.transients.head()


Out[13]:
active transition
2014-04-01 00:00:00-05:00 755.500000
2014-04-01 00:06:00-05:00 -148.000000
2014-04-01 00:09:00-05:00 -102.250000
2014-04-01 00:25:00-05:00 -146.583333
2014-04-01 00:31:00-05:00 143.424242
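
Conceptually, these transients are just the first difference of the steady-state averages, with the first state treated as a step up from zero power. A sketch (illustrative only, relying on the imports above) that reproduces the values shown:

# First difference of steady-state averages gives the transitions
transients = h.steady_states['active average'].diff()
# Treat the first steady state as a transition from zero power
transients.iloc[0] = h.steady_states['active average'].iloc[0]
transients.name = 'active transition'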

Seen on the time series, these transitions appear as follows:


In [14]:
h.transients['active transition'].plot(style='o')
plt.ylabel("Power (W)")
plt.xlabel("Time");


Next, we pair the on (positive) and the off (negative) transitions as shown in the table below.


In [15]:
h.pair_df.head()


Out[15]:
T1 Time T1 Active T2 Time T2 Active
0 2014-04-01 05:31:00 143.424242 2014-04-01 05:43:00 -145.690909
1 2014-04-01 05:49:00 144.145455 2014-04-01 06:01:00 -138.212121
2 2014-04-01 06:08:00 141.484848 2014-04-01 06:19:00 -145.984848
3 2014-04-01 06:26:00 144.766667 2014-04-01 06:37:00 -145.600000
4 2014-04-01 06:44:00 145.600000 2014-04-01 06:55:00 -143.100000
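
One simple way to realise such pairing in code is a greedy pass that matches each off-transition with a pending on-transition of similar magnitude. The sketch below is only illustrative: the actual Hart85 pairing also weighs proximity in time, and `tolerance` (in watts) is a made-up parameter:

def pair_transitions(transients, tolerance=35):
    pairs = []
    pending_on = []  # on-transitions awaiting a matching off-transition
    for time, value in transients.iteritems():
        if value > 0:
            pending_on.append((time, value))
        else:
            # Find a pending on-event whose magnitude roughly cancels this one
            for i, (on_time, on_value) in enumerate(pending_on):
                if abs(on_value + value) < tolerance:
                    pairs.append((on_time, on_value, time, value))
                    pending_on.pop(i)
                    break
    return pd.DataFrame(pairs, columns=['T1 Time', 'T1 Active',
                                        'T2 Time', 'T2 Active'])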

The following image, borrowed from Hart's paper, summarises the above discussion.

Once we obtain these pairings, we take the power draw of an appliance usage to be the average of the magnitudes of its on and off transitions. Then, we use a non-parametric clustering method to obtain the number of appliances and their corresponding power draws. This step in our implementation differs from what George Hart originally proposed.
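
As an illustration of this clustering step, a non-parametric method such as scikit-learn's MeanShift, which discovers the number of clusters from the data, could be applied to the per-pair event magnitudes. This sketch names MeanShift only as an example; it is not necessarily the method Hart85 uses internally:

import numpy as np
from sklearn.cluster import MeanShift

# Per pair: average of the on magnitude and the absolute off magnitude
events = (h.pair_df['T1 Active'].abs() +
          h.pair_df['T2 Active'].abs()).values / 2.0

# The number of clusters (appliances) is discovered, not specified
ms = MeanShift().fit(events.reshape(-1, 1))
print(ms.cluster_centers_.ravel())

Once done, we obtain the learnt appliances as follows: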


In [16]:
h.centroids


Out[16]:
active
0 146.458708
1 3532.218934
2 911.500000

So, our unsupervised learning algorithm finds 3 appliances. Based on the power draws, the ~3.5 kW cluster looks like the air conditioner and the ~146 W cluster like the refrigerator. What is the third one (~911 W)? Not sure.

Disaggregation

Disaggregation follows much the same pipeline as training: we segment the test data into steady and transient states, and then assign each detected transient to one of the appliance models learnt during training.
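
As a sketch of what that assignment might look like, a nearest-centroid match on transient magnitude could be used. Again, this is an illustration under our own assumptions, not the exact logic inside Hart85:

import numpy as np

def assign_transients(transients, centroids):
    # Label each detected transient with the index of the learnt
    # appliance whose power draw is closest in magnitude
    magnitudes = transients.abs().values
    distances = np.abs(magnitudes[:, np.newaxis] - centroids[np.newaxis, :])
    return pd.Series(distances.argmin(axis=1), index=transients.index)

# e.g. assign_transients(h.transients['active transition'],
#                        h.centroids['active'].values)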


In [17]:
a = elec.power_series(load_all_power_columns=True).next()

In [18]:
a.index[0]


Out[18]:
Timestamp('2014-04-01 00:00:00-0500', tz='US/Central')

In [19]:
h.transients.index[0]


Out[19]:
Timestamp('2014-04-01 00:00:00-0500', tz='US/Central')

In [21]:
disag_filename = join(data_dir, 'wikienergy-disag.h5')
output = HDFDataStore(disag_filename, 'w')
h.disaggregate(mains, output)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-9275180041eb> in <module>()
      1 disag_filename = join(data_dir, 'wikienergy-disag.h5')
----> 2 output = HDFDataStore(disag_filename, 'w')
      3 h.disaggregate(mains, output)
...
ValueError: The file '/home/nipun/Downloads/wikienergy-disag.h5' is already opened.  Please close it before reopening in write mode.

The file was left open by an earlier attempt; after closing the stale handle, re-running this cell performs the disaggregation and writes the output used below. The next few cells are quick debugging peeks at the implementation's internal state and at timezone alignment.

In [22]:
h.schunk[0]


Out[22]:
localminute
2014-04-01 00:00:00-05:00   -1
2014-04-01 00:01:00-05:00   -1
2014-04-01 00:02:00-05:00   -1
2014-04-01 00:03:00-05:00   -1
2014-04-01 00:04:00-05:00   -1
2014-04-01 00:05:00-05:00   -1
2014-04-01 00:06:00-05:00    0
2014-04-01 00:07:00-05:00   -1
2014-04-01 00:08:00-05:00   -1
2014-04-01 00:09:00-05:00    0
2014-04-01 00:10:00-05:00   -1
2014-04-01 00:11:00-05:00   -1
2014-04-01 00:12:00-05:00   -1
2014-04-01 00:13:00-05:00   -1
2014-04-01 00:14:00-05:00   -1
...
2014-04-01 23:45:00-05:00   -1
2014-04-01 23:46:00-05:00   -1
2014-04-01 23:47:00-05:00   -1
2014-04-01 23:48:00-05:00   -1
2014-04-01 23:49:00-05:00   -1
2014-04-01 23:50:00-05:00   -1
2014-04-01 23:51:00-05:00   -1
2014-04-01 23:52:00-05:00   -1
2014-04-01 23:53:00-05:00   -1
2014-04-01 23:54:00-05:00   -1
2014-04-01 23:55:00-05:00   -1
2014-04-01 23:56:00-05:00   -1
2014-04-01 23:57:00-05:00   -1
2014-04-01 23:58:00-05:00   -1
2014-04-01 23:59:00-05:00   -1
Name: 0, Length: 1440

In [23]:
plt.plot(h.po[2])


Out[23]:
[<matplotlib.lines.Line2D at 0x7f51f6ac2dd0>]

In [24]:
a.index.values[0]


Out[24]:
numpy.datetime64('2014-04-01T01:00:00.000000000-0400')

In [26]:
h.transients.index.tz


Out[26]:
<DstTzInfo 'US/Central' LMT-1 day, 18:09:00 STD>

In [27]:
output.close()

Comparison with ground truth

Since we used an unsupervised approach, we don't (as of yet) have labels for the learnt appliances, so we can't directly apply accuracy metrics such as f1_score (perhaps a future version will support this). Instead, I'll manually compare the disaggregated power draws with the ground truth.


In [28]:
disag = DataSet(disag_filename)
disag_elec = disag.buildings[building_number].elec
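
As a rough quantitative complement to the visual comparisons below, one could correlate a predicted series with a candidate ground-truth series. A sketch, assuming the first chunk of each meter's power_series is comparable:

# Pearson correlation between predicted and candidate ground-truth series;
# pandas aligns the two series on their timestamps before correlating
pred = disag_elec['unknown', 0].power_series().next()
truth = elec['fridge', 1].power_series().next()
print(pred.corr(truth))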

Comparing for Fridge


In [29]:
ax1 = disag_elec['unknown', 0].plot()
ax2 = elec['fridge', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");

Comparing for air conditioner


In [30]:
ax1 = disag_elec['unknown', 1].plot()
ax2 = elec['air conditioner', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");

While the timing looks impeccable, the predicted power level doesn't quite match the ground truth :(

Comparing for the unknown appliance


In [31]:
ax1 = disag_elec['unknown', 2].plot()
ax2 = elec['electric furnace', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");

So, this doesn't look like a good fit. Maybe the dish washer?


In [24]:
ax1 = disag_elec['unknown', 2].plot()
ax2 = elec['dish washer', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");


Probably a decent fit. Not really sure though!

Broadly speaking, this 30-year-old algorithm seems to be doing a fairly decent job, and the fact that it is unsupervised makes it all the more useful. There is more work to be done on this algorithm, so watch out for more posts and updates. Looking forward to your comments and feedback.