Loading data into memory

The loading API is central to many NILMTK operations and offers a great deal of flexibility. Let's look at the ways in which we can load data from a NILMTK DataStore into memory. To show the full range of possible queries, we'll use the iAWE data set (whose HDF5 file can be downloaded here).

The load function returns a generator of DataFrames loaded from the DataStore based on the conditions specified. If no conditions are specified, then all data from all the columns is loaded. (If you have not come across Python generators, it might be worth reading this quick guide to Python generators.)
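
As a rough illustration of why next() appears in the cells below, here is a minimal, pure-Python sketch of a generator that hands out data one chunk at a time. The function load_chunks and its chunk_size parameter are hypothetical stand-ins and are not part of NILMTK; NILMTK's generators yield pandas DataFrames rather than lists.

```python
def load_chunks(readings, chunk_size):
    """Hypothetical stand-in for a chunked loader: lazily yield lists of readings."""
    for start in range(0, len(readings), chunk_size):
        yield readings[start:start + chunk_size]


readings = [0.17, 0.17, 0.18, 0.18, 0.18, 0.19, 0.19]
chunks = load_chunks(readings, chunk_size=3)

first = next(chunks)       # only the first chunk has been produced at this point
remaining = list(chunks)   # iterating consumes whatever chunks are left

print(first)
print(remaining)
```

Calling next() once, as the cells below do, retrieves only the first chunk; a for loop over the generator would visit every chunk in turn.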

NOTE: If you are on Windows, remember to escape the backslashes, use forward slashes, or use raw strings when passing paths in Python. Any one of the following would work:

iawe = DataSet('c:\\data\\iawe.h5')
iawe = DataSet('c:/data/iawe.h5')
iawe = DataSet(r'c:\data\iawe.h5')
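
The three spellings above can be checked against each other using only the standard library. The sketch below does not touch NILMTK; the path c:\temp\new.h5 is made up purely to show what goes wrong when backslashes are left unescaped.

```python
from pathlib import PureWindowsPath

escaped = 'c:\\data\\iawe.h5'
forward = 'c:/data/iawe.h5'
raw = r'c:\data\iawe.h5'

# The escaped and raw literals are the very same string.
same_string = (escaped == raw)

# Interpreted as Windows paths, all three spellings refer to the same file.
same_path = (PureWindowsPath(escaped) == PureWindowsPath(forward)
             == PureWindowsPath(raw))

# Pitfall: in an ordinary string literal, \t and \n become tab and newline.
broken = 'c:\temp\new.h5'
has_control_chars = '\t' in broken and '\n' in broken

print(same_string, same_path, has_control_chars)
```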

In [1]:
from nilmtk import DataSet

iawe = DataSet('/data/iawe.h5')
elec = iawe.buildings[1].elec
elec


Out[1]:
MeterGroup(meters=
  ElecMeter(instance=1, building=1, dataset='iAWE', site_meter, appliances=[])
  ElecMeter(instance=2, building=1, dataset='iAWE', site_meter, appliances=[])
  ElecMeter(instance=3, building=1, dataset='iAWE', appliances=[Appliance(type='fridge', instance=1)])
  ElecMeter(instance=4, building=1, dataset='iAWE', appliances=[Appliance(type='air conditioner', instance=1)])
  ElecMeter(instance=5, building=1, dataset='iAWE', appliances=[Appliance(type='air conditioner', instance=2)])
  ElecMeter(instance=6, building=1, dataset='iAWE', appliances=[Appliance(type='washing machine', instance=1)])
  ElecMeter(instance=7, building=1, dataset='iAWE', appliances=[Appliance(type='computer', instance=1)])
  ElecMeter(instance=8, building=1, dataset='iAWE', appliances=[Appliance(type='clothes iron', instance=1)])
  ElecMeter(instance=9, building=1, dataset='iAWE', appliances=[Appliance(type='unknown', instance=1)])
  ElecMeter(instance=10, building=1, dataset='iAWE', appliances=[Appliance(type='television', instance=1)])
  ElecMeter(instance=11, building=1, dataset='iAWE', appliances=[Appliance(type='wet appliance', instance=1)])
)

Let us see what measurements we have for the fridge:


In [2]:
fridge = elec['fridge']
fridge.available_columns()


Out[2]:
[('current', None),
 ('power', 'active'),
 ('frequency', None),
 ('power factor', None),
 ('power', 'apparent'),
 ('power', 'reactive'),
 ('voltage', None)]
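
Each entry in this list is a (physical quantity, AC type) pair, with None where no AC type is given. As a small sketch in plain Python, using only the list shown above (the variable names are chosen for illustration), the pairs can be split into power columns and the rest:

```python
columns = [('current', None), ('power', 'active'), ('frequency', None),
           ('power factor', None), ('power', 'apparent'),
           ('power', 'reactive'), ('voltage', None)]

# AC types for which a power measurement is available.
power_ac_types = [ac_type for quantity, ac_type in columns
                  if quantity == 'power']

# Physical quantities that carry no AC type.
no_ac_type = [quantity for quantity, ac_type in columns if ac_type is None]

print(power_ac_types)
print(no_ac_type)
```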

Loading data

Load all columns (default)


In [3]:
df = next(fridge.load())
df.head()


Out[3]:
physical_quantity           current     power  frequency     power               voltage
type                                   active             apparent  reactive
2013-07-13 05:30:00+05:30  0.011000  0.166925  50.157169  2.660094  2.652679  241.494720
2013-07-13 05:31:00+05:30  0.010981  0.169385  50.148460  2.647615  2.640115  242.189423
2013-07-13 05:32:00+05:30  0.011000  0.177887  50.143394  2.672245  2.666358  243.750381
2013-07-13 05:33:00+05:30  0.010982  0.175929  50.095535  2.685518  2.677607  245.131790
2013-07-13 05:34:00+05:30  0.010978  0.177044  50.099998  2.694733  2.688200  246.001328

Load a single column of power data

Use fridge.power_series(), which returns a generator of one-dimensional pandas.Series objects, each containing power data using the most 'sensible' AC type:


In [4]:
series = next(fridge.power_series())
series.head()


Out[4]:
2013-07-13 05:30:00+05:30    0.166925
2013-07-13 05:31:00+05:30    0.169385
2013-07-13 05:32:00+05:30    0.177887
2013-07-13 05:33:00+05:30    0.175929
2013-07-13 05:34:00+05:30    0.177044
Name: (power, active), dtype: float32

or, to get reactive power:


In [5]:
series = next(fridge.power_series(ac_type='reactive'))
series.head()


Out[5]:
2013-07-13 05:30:00+05:30    2.652679
2013-07-13 05:31:00+05:30    2.640115
2013-07-13 05:32:00+05:30    2.666358
2013-07-13 05:33:00+05:30    2.677607
2013-07-13 05:34:00+05:30    2.688200
Name: (power, reactive), dtype: float32
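
The two outputs above show that active power is returned when no ac_type is given, but the text does not spell out how the 'sensible' AC type is chosen. The sketch below shows one plausible rule, assuming a fixed preference order of active, apparent, then reactive. Both the order and the function pick_ac_type are assumptions made for illustration and should not be read as NILMTK's actual implementation.

```python
# Assumed preference order, for illustration only.
PREFERENCE = ('active', 'apparent', 'reactive')


def pick_ac_type(available):
    """Return the first AC type in PREFERENCE that appears in `available`."""
    for ac_type in PREFERENCE:
        if ac_type in available:
            return ac_type
    raise KeyError('no power measurements available')


print(pick_ac_type(['active', 'apparent', 'reactive']))
print(pick_ac_type(['reactive', 'apparent']))
```

Under this assumed rule, a meter that records all three AC types would give active power, which is consistent with the fridge output above.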

Specify physical_quantity or AC type


In [6]:
df = next(fridge.load(physical_quantity='power', ac_type='reactive'))
df.head()


Out[6]:
physical_quantity power
type reactive
2013-07-13 05:30:00+05:30 2.652679
2013-07-13 05:31:00+05:30 2.640115
2013-07-13 05:32:00+05:30 2.666358
2013-07-13 05:33:00+05:30 2.677607
2013-07-13 05:34:00+05:30 2.688200

To load voltage data:


In [7]:
df = next(fridge.load(physical_quantity='voltage'))
df.head()


Out[7]:
physical_quantity voltage
type
2013-07-13 05:30:00+05:30 241.494720
2013-07-13 05:31:00+05:30 242.189423
2013-07-13 05:32:00+05:30 243.750381
2013-07-13 05:33:00+05:30 245.131790
2013-07-13 05:34:00+05:30 246.001328

To load all power measurements, regardless of AC type:


In [8]:
df = next(fridge.load(physical_quantity='power'))
df.head()


Out[8]:
physical_quantity power
type active apparent reactive
2013-07-13 05:30:00+05:30 0.166925 2.660094 2.652679
2013-07-13 05:31:00+05:30 0.169385 2.647615 2.640115
2013-07-13 05:32:00+05:30 0.177887 2.672245 2.666358
2013-07-13 05:33:00+05:30 0.175929 2.685518 2.677607
2013-07-13 05:34:00+05:30 0.177044 2.694733 2.688200

Loading by specifying AC type


In [9]:
df = next(fridge.load(ac_type='active'))
df.head()


Out[9]:
physical_quantity power
type active
2013-07-13 05:30:00+05:30 0.166925
2013-07-13 05:31:00+05:30 0.169385
2013-07-13 05:32:00+05:30 0.177887
2013-07-13 05:33:00+05:30 0.175929
2013-07-13 05:34:00+05:30 0.177044

Loading by resampling to a specified period


In [10]:
# Resample to one-minute intervals (sample_period is given in seconds)
df = next(fridge.load(ac_type='active', sample_period=60))
df.head()


Out[10]:
physical_quantity power
type active
2013-07-13 05:30:00+05:30 0.166925
2013-07-13 05:31:00+05:30 0.169385
2013-07-13 05:32:00+05:30 0.177887
2013-07-13 05:33:00+05:30 0.175929
2013-07-13 05:34:00+05:30 0.177044
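
To make the idea of a sample period concrete, here is a minimal, pure-Python sketch that groups (timestamp in seconds, value) pairs into 60-second buckets and averages each bucket. The function resample_mean and the sample values are hypothetical. Averaging is an assumption chosen for the sketch, and NILMTK's own resampling may aggregate values or fill gaps differently.

```python
from collections import defaultdict
from statistics import mean


def resample_mean(samples, sample_period):
    """Average (timestamp_seconds, value) pairs within buckets of sample_period seconds."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Start of the bucket this timestamp falls into.
        bucket_start = (timestamp // sample_period) * sample_period
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up readings at irregular offsets within two one-minute windows.
samples = [(0, 2.0), (20, 4.0), (40, 6.0), (60, 10.0), (100, 20.0)]
resampled = resample_mean(samples, sample_period=60)

print(resampled)
```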