The field of NILM has seen some exciting developments recently, and with the release of nilmtk we aim to see how well different approaches stack up. So, we were interested in bringing one of the seminal works in NILM, by George Hart, into nilmtk and giving it a spin. In this notebook, I'll present a very quick overview of the algorithm and test it out on a home from the WikiEnergy data set.
In [1]:
import numpy as np
import pandas as pd
from os.path import join
from pylab import rcParams
import matplotlib.pyplot as plt
%matplotlib inline
rcParams['figure.figsize'] = (12, 6)
import nilmtk
from nilmtk import DataSet, TimeFrame, MeterGroup, HDFDataStore
from nilmtk.disaggregate import CombinatorialOptimisation
from nilmtk.utils import print_dict
from nilmtk.metrics import f1_score
import seaborn as sns
sns.set_palette("Set3", n_colors=12)
import warnings
warnings.filterwarnings("ignore")
In [2]:
from nilmtk.utils import dependencies_diagnostics
pd.DataFrame(pd.Series(dependencies_diagnostics(), name="Value"))
Out[2]:
In [3]:
data_dir = '/home/nipun/Downloads/'
we = DataSet(join(data_dir, 'wikienergy.h5'))
print('loaded ' + str(len(we.buildings)) + ' buildings')
In [4]:
building_number = 11
print_dict(we.buildings[building_number].metadata)
In [5]:
we.store.window = TimeFrame(start='2014-04-01 00:00:00-05:00', end='2014-04-02 00:00:00-05:00')
elec = we.buildings[building_number].elec
elec.appliances
Out[5]:
In [6]:
mains = elec.mains()
Let us quickly have a look at how different appliances contribute to the mains.
In [7]:
elec.plot()
plt.tight_layout()
plt.ylabel("Power (W)")
plt.xlabel("Time");
From the plot above, a human could easily see the signatures of 3 appliances: the fridge, the air conditioner and the electric furnace. We hope that our unsupervised implementation will be able to figure out these 3 appliances.
In [8]:
fraction = elec.submeters().fraction_per_meter().dropna()
labels = elec.get_appliance_labels(fraction.index)
plt.figure(figsize=(8,8))
fraction.plot(kind='pie', labels=labels);
The air conditioner, fridge and electric furnace together consume around 80% of total energy.
We'll now train on the aggregate data. The algorithm segments the time series into steady and transient states, so we'll first identify the steady states and the transitions between them. Next, we'll try to pair the on and off transitions based on their proximity in time and magnitude.
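To make the first step concrete, here is a minimal sketch of steady-state segmentation, assuming a pandas Series of active power in watts: a run of samples that stays within a noise tolerance is treated as one steady state. The 70 W tolerance and 2-sample minimum are illustrative assumptions, and nilmtk's Hart85 performs this step internally (with a running average rather than this simplified check).

import pandas as pd

# Illustrative sketch only; not nilmtk's implementation.
def find_steady_states(power, noise_level=70, min_samples=2):
    """Return the start time and mean power of each detected steady state."""
    states, run, start = [], [power.iloc[0]], power.index[0]
    for ts, value in power.iloc[1:].items():
        if abs(value - run[-1]) <= noise_level:   # still in the same state
            run.append(value)
        else:                                     # an edge: close the run
            if len(run) >= min_samples:
                states.append((start, sum(run) / len(run)))
            run, start = [value], ts
    if len(run) >= min_samples:
        states.append((start, sum(run) / len(run)))
    return pd.DataFrame(states, columns=['start', 'active average']).set_index('start')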
In [9]:
from nilmtk.disaggregate.hart_85 import Hart85
h = Hart85()
In [10]:
h.train(mains)
We obtain the set of steady states as follows (only the first five entries shown for brevity):
In [11]:
h.steady_states.head()
Out[11]:
The table above shows the steady power states observed in the mains data. Another way of understanding the same is by looking at the start of the steady states overlaid on top of the mains power consumption, shown below.
In [12]:
sns.set_palette("Set1", n_colors=12)
ax = mains.plot()
h.steady_states['active average'].plot(style='o', ax = ax);
plt.ylabel("Power (W)")
plt.xlabel("Time");
The difference between successive steady states gives us the transitions between them. The following table shows the obtained transitions (again, only the first five entries shown).
In [13]:
h.transients.head()
Out[13]:
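Conceptually, these transitions are just the first difference of the steady-state power levels, so something like the following one-liner recovers them (illustration only; h.transients also tracks reactive power where available):

# Illustration: transitions as the first difference of steady-state levels.
approx_transitions = h.steady_states['active average'].diff().dropna()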
These transitions, when overlaid on the time series, appear as follows:
In [14]:
h.transients['active transition'].plot(style='o')
plt.ylabel("Power (W)")
plt.xlabel("Time");
Next, we pair the on (positive) and off (negative) transitions, as shown in the table below.
In [15]:
h.pair_df.head()
Out[15]:
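A rough sketch of the pairing idea, not nilmtk's exact code: walk through the transitions in time order and greedily match each on (positive) edge with a later off (negative) edge of similar magnitude. The 15% relative tolerance here is an assumption for illustration.

# Hypothetical helper sketching the pairing step; `transitions` is a
# Series such as h.transients['active transition'].
def pair_transitions(transitions, rel_tolerance=0.15):
    pairs, open_ons = [], []
    for ts, delta in transitions.items():
        if delta > 0:                              # an 'on' edge: remember it
            open_ons.append((ts, delta))
            continue
        for i, (on_ts, on_delta) in enumerate(open_ons):
            if abs(on_delta + delta) <= rel_tolerance * on_delta:
                pairs.append((on_ts, ts, on_delta, delta))
                open_ons.pop(i)                    # each edge pairs at most once
                break
    return pairs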
The following image, borrowed from Hart's paper, summarises the above discussion.
Once we obtain these pairings, we take the power draw of a particular appliance to be the average of the magnitudes of its on and off transitions. Then, we use a non-parametric clustering method to obtain both the number of appliances and their corresponding power draws; this step in our implementation differs from what George Hart originally proposed. Once done, we obtain the learnt appliances as follows:
In [16]:
h.centroids
Out[16]:
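As a sketch of that clustering step (nilmtk's exact choice may differ), a non-parametric method such as mean shift can recover the number of power levels without fixing it in advance. The 100 W bandwidth here is an assumption.

import numpy as np
from sklearn.cluster import MeanShift

# Illustrative: cluster 1-D event magnitudes (in watts) into power levels.
def cluster_power_levels(magnitudes, bandwidth=100):
    X = np.asarray(magnitudes, dtype=float).reshape(-1, 1)
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    return np.sort(ms.cluster_centers_.ravel())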
So, our unsupervised learning algorithm finds 3 appliances. Based on the power draws, I think I can see the air conditioner and the refrigerator there. What is the third one? Not sure.
Disaggregation follows much the same method as training. During disaggregation, we additionally assign the detected transients to the appliances learnt during training.
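The assignment idea, sketched (illustrative; not nilmtk's code): each detected transition is attributed to the learnt appliance whose power level is nearest in magnitude.

import numpy as np

# Hypothetical helper: index of the centroid nearest to |delta|.
def assign_transition(delta, centroids):
    centroids = np.asarray(centroids, dtype=float)
    return int(np.argmin(np.abs(centroids - abs(delta))))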
In [21]:
disag_filename = join(data_dir, 'wikienergy-disag.h5')
output = HDFDataStore(disag_filename, 'w')
h.disaggregate(mains, output)
In [27]:
output.close()
Since we used an unsupervised approach, we don't (as of yet) have labels for the learnt appliances. Thus, we can't directly use accuracy metrics (maybe in a newer version we will be able to). I'll compare the power draw from our disaggregation against the ground truth manually.
In [28]:
disag = DataSet(disag_filename)
disag_elec = disag.buildings[building_number].elec
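(As an aside: once the learnt appliances carry real labels, nilmtk's f1_score, imported at the top, could automate this comparison; with everything labelled 'unknown' it is not meaningful yet. A hypothetical sketch:)

# Hypothetical future usage, once predictions carry appliance labels:
f1 = f1_score(disag_elec, elec)
f1.plot(kind='barh')
plt.xlabel("F1 score");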
Comparing for Fridge
In [29]:
ax1 = disag_elec['unknown', 0].plot()
ax2 = elec['fridge', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");
Comparing for air conditioner
In [30]:
ax1 = disag_elec['unknown', 1].plot()
ax2 = elec['air conditioner', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");
While the timing looks impeccable, there seems to be a mismatch in the power magnitude :(
Comparing for the unknown appliance
In [31]:
ax1 = disag_elec['unknown', 2].plot()
ax2 = elec['electric furnace', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");
So, this doesn't look like a good fit. Maybe the dish washer?
In [24]:
ax1 = disag_elec['unknown', 2].plot()
ax2 = elec['dish washer', 1].plot()
ax1.legend(["Predicted", "Ground truth"]);
plt.ylabel("Power (W)")
plt.xlabel("Time");
Probably a decent fit. Not really sure, though!
Broadly speaking, this 30-year-old algorithm seems to do a fairly decent job. The fact that it is unsupervised makes it all the more useful. There is more work to be done on this algorithm, so watch out for more posts and updates. Looking forward to your comments and feedback.