In this notebook, we'll do some quick exploratory analysis of the COMBED data set.


In [1]:
from nilmtk import DataSet

Download the data set


In [94]:
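# Note: wget reads the trailing path as a second URL (hence "Scheme missing" in the log),
# so the file is saved as combed.h5 in the working directory. Use -O <path> to choose the output file.
!wget http://combed.github.io/downloads/combed.h5 /Users/nipunbatra/Desktop/combed.h5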


--2014-12-23 17:02:24--  http://combed.github.io/downloads/combed.h5
Resolving combed.github.io (combed.github.io)... 23.235.46.133
Connecting to combed.github.io (combed.github.io)|23.235.46.133|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21735549 (21M) [application/octet-stream]
Saving to: ‘combed.h5’

combed.h5           100%[=====================>]  20.73M  1.29MB/s   in 16s    

2014-12-23 17:02:40 (1.33 MB/s) - ‘combed.h5’ saved [21735549/21735549]

/Users/nipunbatra/Desktop/combed.h5: Scheme missing.
FINISHED --2014-12-23 17:02:40--
Total wall clock time: 16s
Downloaded: 1 files, 21M in 16s (1.33 MB/s)
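
The download can also be done from Python, which makes the destination explicit. This is a minimal sketch using only the standard library; the helper name `download` is hypothetical, and the example call is left commented out so nothing is fetched by accident.

```python
import os

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2


def download(url, dest):
    """Fetch `url` into `dest` unless `dest` already exists; return `dest`."""
    if not os.path.exists(dest):
        urlretrieve(url, dest)
    return dest


# Example call (not executed here; the path is the one used in this notebook):
# download("http://combed.github.io/downloads/combed.h5",
#          "/Users/nipunbatra/Desktop/combed.h5")
```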

In [2]:
ds = DataSet("/Users/nipunbatra/Desktop/combed.h5")

In [3]:
ds.metadata


Out[3]:
{'contact': 'nipunb@iiitd.ac.in',
 'creators': ['Batra, Nipun',
  'Parson, Oliver',
  'Berges, Mario',
  'Singh, Amarjeet',
  'Rogers, Alex'],
 'description': '30 days of electricity data from IIITD campus',
 'geo_location': {'country': 'IN',
  'latitude': 28.54,
  'locality': 'Delhi',
  'longitude': 77.27},
 'institution': 'Indraprastha Institute of Information Technology Delhi (IIITD)',
 'long_name': 'Commercial Building Energy Dataset',
 'meter_devices': {'EM6400': {'description': 'Multifunction meter for feeders',
   'manufacturer': 'Schneider Electric',
   'manufacturer_url': 'http://www.schneider-electric.com/',
   'max_sample_period': 300,
   'measurements': [{'lower_limit': 0,
     'physical_quantity': 'power',
     'type': 'active',
     'upper_limit': 1000000},
    {'lower_limit': 0,
     'physical_quantity': 'energy',
     'type': 'active',
     'upper_limit': 500000000000},
    {'lower_limit': 0,
     'physical_quantity': 'current',
     'type': None,
     'upper_limit': 1000}],
   'model': 'EM6400',
   'sample_period': 30}},
 'name': 'combed',
 'number_of_buildings': 2,
 'publication_date': 2014,
 'related_documents': ['http://combed.github.io',
  'A comparison of non-intrusive load monitoring methods for commercial and residential buildings\n'],
 'schema': 'https://github.com/nilmtk/nilm_metadata/tree/v0.2',
 'subject': 'First public data set from an educational campus',
 'timezone': 'Asia/Kolkata'}
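
The metadata reports a 30-second `sample_period` for the EM6400 meter, and the description mentions 30 days of data. As a rough sanity check, the sketch below estimates how many samples one meter would hold if it recorded without interruption. That is an assumption; gaps in the real data would lower the count. The dictionary is a hand-copied subset of the output above.

```python
# Subset of ds.metadata, copied from Out[3].
metadata = {
    'description': '30 days of electricity data from IIITD campus',
    'meter_devices': {
        'EM6400': {'sample_period': 30, 'max_sample_period': 300},
    },
}

SECONDS_PER_DAY = 24 * 60 * 60
days = 30  # taken from the description string

sample_period = metadata['meter_devices']['EM6400']['sample_period']
# Assumes uninterrupted recording for the whole period.
expected_samples = days * SECONDS_PER_DAY // sample_period
print(expected_samples)
```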

In [4]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import rcParams

sns.set_style('white')
rcParams['figure.figsize'] = (16, 8)

In [5]:
elec = ds.buildings[1].elec

In [6]:
df = elec.energy_per_meter()


14/14 ElecMeter(instance=14, building=1, dataset='combed', appliances=[Appliance(type='sockets', instance=2)])

In [7]:
df


Out[7]:
(1, 1, combed) (2, 1, combed) (3, 1, combed) (4, 1, combed) (5, 1, combed) (6, 1, combed) (7, 1, combed) (8, 1, combed) (9, 1, combed) (10, 1, combed) (11, 1, combed) (12, 1, combed) (13, 1, combed) (14, 1, combed)
active 1.408676e+13 3.090227e+11 2.956379e+12 1.690168e+12 1.316061e+12 1.197351e+12 1.892915e+12 2.153426e+11 1.831435e+11 2.251879e+11 1.032644e+12 1.060988e+11 2.343073e+10 9.585487e+10
apparent NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
reactive NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

In [8]:
# .loc selects the 'active' row by label (.ix is deprecated in newer pandas)
df.loc['active'].plot(kind='bar');


Okay, with mains up there, the chart is hard to read: it dwarfs every other meter. Let us compute the energy per meter on the submeters only.
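
To put a number on how much the first meter dominates, the sketch below rebuilds the 'active' row of Out[7] as a pandas Series and compares meter 1 with the largest of the others. It assumes meter 1 is the mains (site) meter, as the text suggests; the unit of the values is not stated in the output.

```python
import pandas as pd

# 'active' energy per meter instance, copied from Out[7].
active = pd.Series({
    1: 1.408676e+13, 2: 3.090227e+11, 3: 2.956379e+12, 4: 1.690168e+12,
    5: 1.316061e+12, 6: 1.197351e+12, 7: 1.892915e+12, 8: 2.153426e+11,
    9: 1.831435e+11, 10: 2.251879e+11, 11: 1.032644e+12, 12: 1.060988e+11,
    13: 2.343073e+10, 14: 9.585487e+10,
})

mains = active.loc[1]
submeters = active.drop(1)

# Ratio between mains and the largest submeter: the reason the bar chart is hard to read.
ratio = mains / submeters.max()
print("largest submeter: meter %d" % submeters.idxmax())
print("mains / largest submeter = %.2f" % ratio)
```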


In [9]:
df = elec.submeters().energy_per_meter()


13/13 ElecMeter(instance=14, building=1, dataset='combed', appliances=[Appliance(type='sockets', instance=2)])

In [10]:
df.loc['active'].plot(kind='barh')
labels = elec.get_labels(df.columns)
plt.yticks(range(len(df.columns)), labels);


Let us now zoom in on the energy consumption of the individual floors.


In [11]:
from nilmtk import MeterGroup
floor_mg = MeterGroup([meter for meter in elec.meters if "name" in meter.metadata])

In [12]:
floor_mg


Out[12]:
MeterGroup(meters=
  ElecMeter(instance=3, building=1, dataset='combed', appliances=[])
  ElecMeter(instance=4, building=1, dataset='combed', appliances=[])
  ElecMeter(instance=5, building=1, dataset='combed', appliances=[])
  ElecMeter(instance=6, building=1, dataset='combed', appliances=[])
  ElecMeter(instance=7, building=1, dataset='combed', appliances=[])
)
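
The list comprehension in In [11] keeps only the meters whose metadata contains a "name" key, which in this data set appears to select the five floor-level meters. The sketch below shows the same filter on hypothetical stand-in objects so it runs without nilmtk; the `FakeMeter` class and its metadata values are illustrative, not taken from the data set.

```python
class FakeMeter(object):
    """Hypothetical stand-in for an ElecMeter: an instance number plus a metadata dict."""

    def __init__(self, instance, metadata):
        self.instance = instance
        self.metadata = metadata


# Illustrative metadata: only some meters carry a "name" entry.
meters = [
    FakeMeter(1, {'site_meter': True}),
    FakeMeter(2, {}),
    FakeMeter(3, {'name': 'First floor total'}),
    FakeMeter(4, {'name': 'Second floor total'}),
    FakeMeter(8, {}),
]

# Same membership test as in the notebook cell.
named = [meter for meter in meters if "name" in meter.metadata]
print([meter.instance for meter in named])
```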

In [13]:
floor_mg.plot()


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x10c5f3390>

In [14]:
df_floor = floor_mg.energy_per_meter().loc['active']
df_floor.plot(kind='barh');
labels = elec.get_labels(df_floor.index)
plt.yticks(range(len(df_floor.index)), labels);


5/5 ElecMeter(instance=7, building=1, dataset='combed', appliances=[])

Let us now try and see if we can break down individual floor energy consumption once we have some training data.

Splitting Training and Testing data


In [15]:
from nilmtk.disaggregate import CombinatorialOptimisation, FHMM
from nilmtk.metrics import f1_score, error_in_assigned_energy, fraction_energy_assigned_correctly

In [16]:
split_point = elec.train_test_split(train_fraction=0.5)

In [17]:
split_point


Out[17]:
Timestamp('2014-06-16 23:28:38+0530', tz='Asia/Kolkata')
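
With `train_fraction=0.5`, the split point should fall roughly in the middle of the recording. How nilmtk picks the exact timestamp is not shown here (it may account for gaps in the data), so the sketch below only illustrates the simplest interpretation: a fraction of the elapsed time between two hypothetical start and end timestamps.

```python
import pandas as pd


def split_timestamp(start, end, train_fraction):
    """Return the timestamp that lies `train_fraction` of the way from start to end."""
    return start + (end - start) * train_fraction


# Hypothetical recording window; the real start and end of COMBED are not shown above.
start = pd.Timestamp('2014-06-01 00:00:00', tz='Asia/Kolkata')
end = pd.Timestamp('2014-07-01 00:00:00', tz='Asia/Kolkata')

split = split_timestamp(start, end, train_fraction=0.5)
print(split)
```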

Trying with deepcopy


In [18]:
from copy import deepcopy

train = deepcopy(ds)
test = deepcopy(train)


Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in 

In [19]:
train.set_window(end='2014-06-16 23:28:38')
test.set_window(start='2014-06-16 23:28:38')

Plotting the full data set (ds). This should have data for the entire time duration.


In [20]:
ds.buildings[1].elec.plot();



In [21]:
train.buildings[1].elec.plot()


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-c30ba582b1f7> in <module>()
----> 1 train.buildings[1].elec.plot()

/Users/nipunbatra/git/nilmtk/nilmtk/metergroup.pyc in plot(self, kind, **kwargs)
   1363         except KeyError:
   1364             raise ValueError("'{}' not a valid setting for 'kind' parameter."
-> 1365                              .format(kind))
   1366         return ax
   1367 

ValueError: 'separate lines' not a valid setting for 'kind' parameter.

Okay, this won't work. Let's try to get some data from train.


In [22]:
train.buildings[1].elec.meters[1].load(chunksize=100).next()


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-22-0576bc0f2c9f> in <module>()
----> 1 train.buildings[1].elec.meters[1].load(chunksize=100).next()

/Users/nipunbatra/git/nilmtk/nilmtk/datastore/hdfdatastore.pyc in load(self, key, cols, sections, n_look_ahead_rows, chunksize, verbose)
     86                     if str(e) == ("'NoneType' object has no attribute "
     87                                   "'read_coordinates'"):
---> 88                         raise KeyError("key '{}' not found".format(key))
     89                     else:
     90                         raise

KeyError: "key '/building1/elec/meter2' not found"

Maybe we need to import the metadata from ds?


In [23]:
train.import_metadata(ds.store)


Out[23]:
<nilmtk.dataset.DataSet at 0x10cc2a8d0>

In [24]:
train.set_window(end='2014-06-16 23:28:38')

In [25]:
train.buildings[1].elec.plot();


Great, train seems to have been divided at the right point now. What about ds again?


In [26]:
ds.buildings[1].elec.plot();


No! This ain't correct! ds should have retained the full window.

Let me try with test now.


In [27]:
test.buildings[1].elec.meters[1].load(chunksize=100).next()


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-27-012137d9c331> in <module>()
----> 1 test.buildings[1].elec.meters[1].load(chunksize=100).next()

/Users/nipunbatra/git/nilmtk/nilmtk/datastore/hdfdatastore.pyc in load(self, key, cols, sections, n_look_ahead_rows, chunksize, verbose)
     86                     if str(e) == ("'NoneType' object has no attribute "
     87                                   "'read_coordinates'"):
---> 88                         raise KeyError("key '{}' not found".format(key))
     89                     else:
     90                         raise

KeyError: "key '/building1/elec/meter2' not found"

In [28]:
test.import_metadata(ds.store)


Out[28]:
<nilmtk.dataset.DataSet at 0x10cc2a890>

In [29]:
test.buildings[1].elec.plot()


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x10adc4790>

This surely ain't right!


In [30]:
test.set_window(start='2014-06-16 23:28:38')

In [31]:
test.buildings[1].elec.plot();


What about train? Hope train doesn't change its window automatically now :(


In [32]:
train.buildings[1].elec.plot();


It does!
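
One plausible reading of this behaviour, not verified against the nilmtk source here, is that `import_metadata(ds.store)` leaves each copy holding a reference to the same store object, and that `set_window` records the window on that store. The sketch below reproduces the effect with hypothetical stand-in classes (`FakeStore`, `FakeDataSet`) and shows that giving each data set its own store keeps the windows independent.

```python
class FakeStore(object):
    """Hypothetical stand-in for a data store that remembers a time window."""

    def __init__(self):
        self.window = (None, None)


class FakeDataSet(object):
    """Hypothetical stand-in for a data set that delegates its window to its store."""

    def __init__(self, store):
        self.store = store

    def import_metadata(self, store):
        # Rebinding to the caller's store means both data sets now share it.
        self.store = store
        return self

    def set_window(self, start=None, end=None):
        self.store.window = (start, end)


# Shared store: changing the window through the copy also changes the original.
original = FakeDataSet(FakeStore())
train_copy = FakeDataSet(FakeStore()).import_metadata(original.store)
train_copy.set_window(end='2014-06-16 23:28:38')
shared_window = original.store.window

# Separate stores: each data set keeps its own window.
full = FakeDataSet(FakeStore())
train_own = FakeDataSet(FakeStore())
train_own.set_window(end='2014-06-16 23:28:38')
independent_window = full.store.window
```

If this reading is right, opening the HDF5 file separately for each split (one `DataSet(...)` call for train and another for test) would avoid the shared window. Treat that as something to try rather than a confirmed fix.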

Train-test testing ends

Let's try again.


In [33]:
ds_2 = DataSet("/Users/nipunbatra/Desktop/combed.h5")
tr_store = deepcopy(ds_2.store)


Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in 

In [36]:
tr = deepcopy(ds_2)
tr.import_metadata(tr_store)
te = deepcopy(ds_2)
te.import_metadata(deepcopy(ds_2.store))


Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in 
Out[36]:
<nilmtk.dataset.DataSet at 0x103855490>

In [37]:
tr.set_window(end='2014-06-16 23:28:38')
te.set_window(start='2014-06-16 23:28:38')

In [38]:
tr.buildings[1].elec.plot()


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-86adf469e76a> in <module>()
----> 1 tr.buildings[1].elec.plot()

/Users/nipunbatra/git/nilmtk/nilmtk/metergroup.pyc in plot(self, kind, **kwargs)
   1363         except KeyError:
   1364             raise ValueError("'{}' not a valid setting for 'kind' parameter."
-> 1365                              .format(kind))
   1366         return ax
   1367 

ValueError: 'separate lines' not a valid setting for 'kind' parameter.

In [32]:
co = CombinatorialOptimisation()
co.train(floor_mg)


Training model for submeter 'ElecMeter(instance=3, building=1, dataset='combed', appliances=[])'
Training model for submeter 'ElecMeter(instance=4, building=1, dataset='combed', appliances=[])'
Training model for submeter 'ElecMeter(instance=5, building=1, dataset='combed', appliances=[])'
Training model for submeter 'ElecMeter(instance=6, building=1, dataset='combed', appliances=[])'
Training model for submeter 'ElecMeter(instance=7, building=1, dataset='combed', appliances=[])'
Done training!
/Users/nipunbatra/git/nilmtk/nilmtk/electric.py:96: UserWarning: If you are using `preprocessing` to resample then please do not!  Instead, please use the `sample_period` parameter and set `resample=True`.
  warn("If you are using `preprocessing` to resample then please"

In [39]:
ds


Out[39]:
<nilmtk.dataset.DataSet at 0x109301b90>

In [41]:
b=ds.buildings[1]

In [42]:
elec = train.buildings[1].elec

In [43]:
elec.plot()


Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x10bd4cbd0>

In [15]:
co.model


Out[15]:
[{'states': array([   0, 2317, 7565], dtype=int32),
  'training_metadata': ElecMeter(instance=3, building=1, dataset='combed', appliances=[])},
 {'states': array([    0,  3190, 10190], dtype=int32),
  'training_metadata': ElecMeter(instance=4, building=1, dataset='combed', appliances=[])},
 {'states': array([   0, 1740, 3356], dtype=int32),
  'training_metadata': ElecMeter(instance=5, building=1, dataset='combed', appliances=[])},
 {'states': array([   0, 1456, 2843], dtype=int32),
  'training_metadata': ElecMeter(instance=6, building=1, dataset='combed', appliances=[])},
 {'states': array([    0,  1174, 11285], dtype=int32),
  'training_metadata': ElecMeter(instance=7, building=1, dataset='combed', appliances=[])}]
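
Each entry in `co.model` lists the power states learned for one floor meter. Combinatorial optimisation, in its textbook form, then picks one state per meter so that their sum is as close as possible to the mains reading. The brute-force sketch below illustrates that idea with the states printed above; nilmtk's own implementation may differ in detail, and the function name `best_combination` and the mains reading are hypothetical.

```python
from itertools import product

# States per meter instance, copied from Out[15].
states = {
    3: [0, 2317, 7565],
    4: [0, 3190, 10190],
    5: [0, 1740, 3356],
    6: [0, 1456, 2843],
    7: [0, 1174, 11285],
}


def best_combination(mains_power, states):
    """Try every combination of one state per meter and keep the closest sum."""
    instances = sorted(states)
    best = min(
        product(*(states[i] for i in instances)),
        key=lambda combo: abs(mains_power - sum(combo)),
    )
    return dict(zip(instances, best))


# Hypothetical mains reading of 5500 W; with 3 states for each of 5 meters
# there are 3**5 = 243 combinations to check.
assignment = best_combination(5500, states)
print(assignment)
```

Exhaustive search is fine at this size, but the number of combinations grows exponentially with the number of meters.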

In [71]:
from nilmtk import DataSet, TimeFrame, MeterGroup, HDFDataStore

disag_filename = "/Users/nipunbatra/Desktop/out_co.h5"
output = HDFDataStore(disag_filename, 'w')
co.disaggregate(elec.mains(), output)
output.close()


vampire_power = 0.0 watts
Estimating power demand for 'ElecMeter(instance=3, building=1, dataset='combed', appliances=[])'
Estimating power demand for 'ElecMeter(instance=4, building=1, dataset='combed', appliances=[])'
Estimating power demand for 'ElecMeter(instance=5, building=1, dataset='combed', appliances=[])'
Estimating power demand for 'ElecMeter(instance=6, building=1, dataset='combed', appliances=[])'
Estimating power demand for 'ElecMeter(instance=7, building=1, dataset='combed', appliances=[])'
ElecMeter(instance=3, building=1, dataset='combed', appliances=[])
ElecMeter(instance=4, building=1, dataset='combed', appliances=[])
ElecMeter(instance=5, building=1, dataset='combed', appliances=[])
ElecMeter(instance=6, building=1, dataset='combed', appliances=[])
ElecMeter(instance=7, building=1, dataset='combed', appliances=[])

In [72]:
disag = DataSet(disag_filename)
disag_elec = disag.buildings[1].elec
disag_elec[(7)].plot()
elec[7].plot()
plt.tight_layout()
disag.store.close()



In [30]:
disag = DataSet(disag_filename)
disag_elec = disag.buildings[1].elec
disag_elec.plot()
disag.store.close()


/Users/nipunbatra/git/nilmtk/nilmtk/metergroup.py:76: RuntimeWarning: Building 1 has an empty 'appliances' list.
  .format(building_id.instance), RuntimeWarning)

In [31]:
e = elec.meters[1]

In [32]:
e.dominant_appliance().identifier.type


Out[32]:
'elevator'

In [33]:
disag_elec


Out[33]:
MeterGroup(meters=
  ElecMeter(instance=1, building=1, dataset='NILMTK_CO_2014-12-22T14:52:03', site_meter, appliances=[])
  ElecMeter(instance=3, building=1, dataset='NILMTK_CO_2014-12-22T14:52:03', appliances=[])
  ElecMeter(instance=4, building=1, dataset='NILMTK_CO_2014-12-22T14:52:03', appliances=[])
  ElecMeter(instance=5, building=1, dataset='NILMTK_CO_2014-12-22T14:52:03', appliances=[])
  ElecMeter(instance=6, building=1, dataset='NILMTK_CO_2014-12-22T14:52:03', appliances=[])
  ElecMeter(instance=7, building=1, dataset='NILMTK_CO_2014-12-22T14:52:03', appliances=[])
)

In [18]:
disag = DataSet(disag_filename)
disag_elec = disag.buildings[1].elec

f1 = f1_score(disag_elec, elec)
f1.index = disag_elec.get_labels(f1.index)
f1.plot(kind='bar')
plt.xlabel('appliance');
plt.ylabel('f-score');

disag.store.close()



In [37]:
f1


Out[37]:
First floor total     0.615812
Second floor total    0.703973
Third floor total     0.952787
Fourth floor total    0.946971
Fifth floor total     0.689214
dtype: float64
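
The F-score summarises how well the predicted on/off state of each meter matches the ground truth. As a reminder of the arithmetic, the sketch below computes precision, recall and F1 from two boolean arrays. The arrays are made-up examples, and how nilmtk derives on/off states from power readings is not covered here.

```python
import numpy as np


def f1_from_states(truth_on, predicted_on):
    """F1 score for boolean on/off arrays of equal length."""
    truth_on = np.asarray(truth_on, dtype=bool)
    predicted_on = np.asarray(predicted_on, dtype=bool)

    true_pos = np.sum(truth_on & predicted_on)
    false_pos = np.sum(~truth_on & predicted_on)
    false_neg = np.sum(truth_on & ~predicted_on)

    if true_pos == 0:
        # No correct "on" predictions: report 0 rather than dividing by zero.
        return 0.0
    precision = true_pos / float(true_pos + false_pos)
    recall = true_pos / float(true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Made-up example: 2 true positives, 1 false positive, 1 false negative.
truth = [True, True, False, False, True]
pred = [True, False, False, True, True]
score = f1_from_states(truth, pred)
print(round(score, 4))
```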

In [31]:
from nilmtk.metrics import error_in_assigned_energy, mean_normalized_error_power

In [32]:
disag = DataSet(disag_filename)
disag_elec = disag.buildings[1].elec

e = mean_normalized_error_power(disag_elec, elec)

disag.store.close()

In [34]:
e


Out[34]:
3    0.394772
4    0.465725
5    0.396835
6    0.427563
7    0.423408
dtype: float64
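
These values read most naturally as the total absolute error divided by the total ground-truth power for each meter. That definition is assumed here rather than checked against nilmtk's code. A minimal sketch with made-up numbers:

```python
import numpy as np


def normalized_error_power(truth, predicted):
    """Sum of absolute errors divided by the sum of ground-truth power."""
    truth = np.asarray(truth, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(predicted - truth).sum() / truth.sum()


# Made-up readings in watts.
truth = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
error = normalized_error_power(truth, predicted)
print(round(error, 4))
```

Under this definition, a value of 0.39 would mean that the absolute errors add up to about 39% of the power actually drawn through that meter.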

In [22]:
disag = DataSet(disag_filename)

disag_elec = disag.buildings[1].elec

#%matplotlib inline
disag_elec[(3)].plot()
elec[3].plot()
plt.tight_layout()



In [23]:
disag_elec[(4)].plot()
elec[4].plot()
plt.tight_layout()



In [52]:
# next(...) fetches the first chunk and works on both Python 2 and 3
next(elec[3].load())['power','active'].plot()


Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ad5e5d0>

In [25]:
df = next(elec[3].load())

In [59]:
df['power','active']['13-06-2014'].plot()


Out[59]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e3558d0>

In [64]:
plt.plot(df['power','active']['16-06-2014'].values)
#df['power','active']['15-06-2014'].plot()
plt.plot(df['power','active']['15-06-2014'].values)
plt.plot(df['power','active']['14-06-2014'].values)
plt.plot(df['power','active']['13-06-2014'].values)


Out[64]:
[<matplotlib.lines.Line2D at 0x112058c10>]
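
Overlaying several days is easier when the series is first split by calendar date. The sketch below does this on a synthetic hourly series (the real `df` is not reproduced here) and puts each day in its own column, which can then be plotted in one call.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df['power', 'active']: four days of hourly readings.
index = pd.date_range('2014-06-13', periods=4 * 24,
                      freq=pd.Timedelta(hours=1), tz='Asia/Kolkata')
power = pd.Series(np.arange(len(index), dtype=float), index=index)

# One column per calendar day, rows aligned by position within the day.
by_day = pd.DataFrame({
    str(day): group.values for day, group in power.groupby(power.index.date)
})
print(by_day.shape)

# by_day.plot() would then draw one line per day on shared axes.
```

This only lines up cleanly when every day has the same number of samples; days with gaps would need resampling or reindexing first.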

In [65]:
plt.plot(df['power','active']['16-06-2014'].values)


Out[65]:
[<matplotlib.lines.Line2D at 0x1123c0110>]

In [28]:
#disag_elec[('AHU', 2)].plot()
elec[('AHU', 2)].plot()


Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x10a617a90>

In [29]:
#disag_elec[('AHU', 3)].plot()
elec[('AHU', 3)].plot()


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x10cd17610>

In [30]:
#disag_elec[('sockets', 1)].plot()
elec[('sockets', 1)].plot()


Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x10b1523d0>

In [30]:
df = next(elec[('sockets',1)].load())

In [32]:
df2 = next(disag_elec[('sockets',1)].load())

In [35]:
from nilmtk.disaggregate import fhmm_exact

In [36]:
fh = fhmm_exact.FHMM()

In [37]:
fh.train(floor_mg)


Training model for submeter 'ElecMeter(instance=3, building=1, dataset='combed', appliances=[])'
Training model for submeter 'ElecMeter(instance=4, building=1, dataset='combed', appliances=[])'
Training model for submeter 'ElecMeter(instance=5, building=1, dataset='combed', appliances=[])'
Training model for submeter 'ElecMeter(instance=6, building=1, dataset='combed', appliances=[])'
Training model for submeter 'ElecMeter(instance=7, building=1, dataset='combed', appliances=[])'

In [38]:
disag_filename = "/Users/nipunbatra/Desktop/out_fhmm.h5"
output = HDFDataStore(disag_filename, 'w')
fh.disaggregate(elec.mains(), output)
output.close()


Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group None',) in 

In [39]:
disag = DataSet(disag_filename)
disag_elec = disag.buildings[1].elec

e = mean_normalized_error_power(disag_elec, elec)

disag.store.close()

In [40]:
e


Out[40]:
3    0.333697
4    0.400300
5    0.277672
6    0.303830
7    0.288941
dtype: float64

In [42]:
e.plot(kind='bar')


Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x110b45150>
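
Placing the two sets of errors side by side makes the comparison easier. The sketch below copies the values from Out[34] (combinatorial optimisation) and Out[40] (FHMM) into one DataFrame; lower is better for this metric.

```python
import pandas as pd

# Mean normalized error in power per meter instance, copied from the outputs above.
errors = pd.DataFrame({
    'CO': [0.394772, 0.465725, 0.396835, 0.427563, 0.423408],
    'FHMM': [0.333697, 0.400300, 0.277672, 0.303830, 0.288941],
}, index=[3, 4, 5, 6, 7])

# Positive values mean FHMM had the lower error for that meter.
errors['CO - FHMM'] = errors['CO'] - errors['FHMM']
print(errors)

# errors[['CO', 'FHMM']].plot(kind='bar') would draw the grouped bar chart.
```

On this run, FHMM has the lower error on all five floor meters.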

In [67]:
disag = DataSet(disag_filename)
disag_elec = disag.buildings[1].elec
disag_elec[(7)].plot()
elec[7].plot()
plt.tight_layout()
disag.store.close()


