Loading data into memory

The loading API is central to many nilmtk operations and provides a great deal of flexibility. Let's look at the ways in which we can load data from a NILMTK DataStore into memory. To see the full range of possible queries, we'll use the iAWE data set (whose HDF5 file can be downloaded here).

The load function returns a generator of DataFrames loaded from the DataStore based on the conditions specified. If no conditions are specified, then all data from all the columns is loaded. (If you have not come across Python generators, it might be worth reading this quick guide to Python generators.)
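For readers new to generators, here is a minimal standalone sketch of the pattern. It is not nilmtk's implementation: the function `load_in_chunks` and its `chunk_size` parameter are hypothetical names invented for illustration, and the readings are made-up values.

```python
def load_in_chunks(readings, chunk_size):
    """Yield successive lists of at most `chunk_size` readings.

    Hypothetical stand-in for a loader that returns a generator:
    no chunk is produced until the caller asks for it.
    """
    for start in range(0, len(readings), chunk_size):
        yield readings[start:start + chunk_size]


readings = [0.111, 0.200, 0.152, 0.159, 0.215]  # illustrative values only

chunks = load_in_chunks(readings, chunk_size=2)

# next() pulls just the first chunk; the rest has not been produced yet.
first_chunk = next(chunks)

# Iterating over the generator consumes whatever chunks remain.
remaining_chunks = list(chunks)

print(first_chunk)
print(remaining_chunks)
```

The same two idioms, `next(...)` for the first chunk and a loop for all of them, apply to any generator, including the one returned by the load function.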


In [1]:
from nilmtk import DataSet

iawe = DataSet('/data/iawe/iawe.h5')
elec = iawe.buildings[1].elec
elec


Out[1]:
MeterGroup(meters=
  ElecMeter(instance=1, building=1, dataset='iAWE', site_meter, appliances=[])
  ElecMeter(instance=2, building=1, dataset='iAWE', site_meter, appliances=[])
  ElecMeter(instance=3, building=1, dataset='iAWE', appliances=[Appliance(type='fridge', instance=1)])
  ElecMeter(instance=4, building=1, dataset='iAWE', appliances=[Appliance(type='air conditioner', instance=1)])
  ElecMeter(instance=5, building=1, dataset='iAWE', appliances=[Appliance(type='air conditioner', instance=2)])
  ElecMeter(instance=6, building=1, dataset='iAWE', appliances=[Appliance(type='washing machine', instance=1)])
  ElecMeter(instance=7, building=1, dataset='iAWE', appliances=[Appliance(type='computer', instance=1)])
  ElecMeter(instance=8, building=1, dataset='iAWE', appliances=[Appliance(type='clothes iron', instance=1)])
  ElecMeter(instance=9, building=1, dataset='iAWE', appliances=[Appliance(type='unknown', instance=1)])
  ElecMeter(instance=10, building=1, dataset='iAWE', appliances=[Appliance(type='television', instance=1)])
  ElecMeter(instance=11, building=1, dataset='iAWE', appliances=[Appliance(type='wet appliance', instance=1)])
  ElecMeter(instance=12, building=1, dataset='iAWE', appliances=[Appliance(type='motor', instance=1)])
)

Let us see what measurements we have for the fridge:


In [2]:
fridge = elec['fridge']
fridge.available_columns()


Out[2]:
[('power factor', None),
 ('current', None),
 ('power', 'apparent'),
 ('voltage', None),
 ('power', 'active'),
 ('frequency', None),
 ('power', 'reactive')]

Loading data

Load all columns (default)


In [3]:
# load() returns a generator; next() retrieves its first chunk
df = next(fridge.load())
df.head()


Out[3]:
physical_quantity          current     power     voltage   power  frequency     power
type                                apparent              active             reactive
2013-06-07 05:30:00+05:30    0.011     2.486  235.070007   0.111  50.070000     2.483
2013-06-07 05:30:01+05:30    0.011     2.555  235.020004   0.200  50.080002     2.547
2013-06-07 05:30:02+05:30    0.011     2.485  234.979996   0.152  50.080002     2.480
2013-06-07 05:30:03+05:30    0.010     2.449  235.000000   0.159  50.060001     2.444
2013-06-07 05:30:04+05:30    0.011     2.519  234.949997   0.215  50.060001     2.510

Load a single column of power data

Use fridge.power_series(), which returns a generator of one-dimensional pandas.Series objects, each containing power data for the most 'sensible' AC type:


In [4]:
series = next(fridge.power_series())
series.head()


Out[4]:
2013-06-07 05:30:00+05:30    0.111
2013-06-07 05:30:01+05:30    0.200
2013-06-07 05:30:02+05:30    0.152
2013-06-07 05:30:03+05:30    0.159
2013-06-07 05:30:04+05:30    0.215
Name: (power, active), dtype: float64

Or, to get reactive power:


In [8]:
series = next(fridge.power_series(ac_type='reactive'))
series.head()


Out[8]:
2013-06-07 05:30:00+05:30    2.483
2013-06-07 05:30:01+05:30    2.547
2013-06-07 05:30:02+05:30    2.480
2013-06-07 05:30:03+05:30    2.444
2013-06-07 05:30:04+05:30    2.510
Name: (power, reactive), dtype: float64
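Calling next() retrieves only the first chunk from the generator. The sketch below shows two ways of consuming every chunk. It does not call nilmtk: `fake_power_series` is a hypothetical stand-in for a generator of pandas.Series objects, and the chunk boundaries are invented for illustration.

```python
import pandas as pd


def fake_power_series():
    """Hypothetical stand-in for a generator of pandas.Series chunks."""
    first = pd.date_range("2013-06-07 05:30:00", periods=3, freq="s")
    second = pd.date_range("2013-06-07 05:30:03", periods=2, freq="s")
    yield pd.Series([2.483, 2.547, 2.480], index=first)
    yield pd.Series([2.444, 2.510], index=second)


# Option 1: process chunks one at a time, keeping only running totals.
n_samples = 0
peak = float("-inf")
for chunk in fake_power_series():
    n_samples += len(chunk)
    peak = max(peak, chunk.max())

# Option 2: if everything fits in memory, join the chunks into one Series.
whole = pd.concat(list(fake_power_series()))

print(n_samples, peak, len(whole))
```

The first option suits data sets too large to hold in memory at once; the second is more convenient when the data is small.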

Specify physical_quantity or AC type


In [5]:
df = next(fridge.load(physical_quantity='power', ac_type='reactive'))
df.head()


Out[5]:
physical_quantity power
type reactive
2013-06-07 05:30:00+05:30 2.483
2013-06-07 05:30:01+05:30 2.547
2013-06-07 05:30:02+05:30 2.480
2013-06-07 05:30:03+05:30 2.444
2013-06-07 05:30:04+05:30 2.510

To load voltage data:


In [7]:
df = next(fridge.load(physical_quantity='voltage'))
df.head()


Out[7]:
physical_quantity voltage
type
2013-06-07 05:30:00+05:30 235.070007
2013-06-07 05:30:01+05:30 235.020004
2013-06-07 05:30:02+05:30 234.979996
2013-06-07 05:30:03+05:30 235.000000
2013-06-07 05:30:04+05:30 234.949997

In [9]:
# load every AC type of power (apparent, active and reactive)
df = next(fridge.load(physical_quantity='power'))
df.head()


Out[9]:
physical_quantity power
type apparent active reactive
2013-06-07 05:30:00+05:30 2.486 0.111 2.483
2013-06-07 05:30:01+05:30 2.555 0.200 2.547
2013-06-07 05:30:02+05:30 2.485 0.152 2.480
2013-06-07 05:30:03+05:30 2.449 0.159 2.444
2013-06-07 05:30:04+05:30 2.519 0.215 2.510
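The two-row header in the output above is a pandas MultiIndex whose levels are named physical_quantity and type. The sketch below rebuilds the first two rows of that table by hand (no nilmtk or iAWE file needed) to show how such columns can be selected with plain pandas indexing.

```python
import pandas as pd

# Column labels are (physical_quantity, type) pairs, as in the header above.
columns = pd.MultiIndex.from_tuples(
    [("power", "apparent"), ("power", "active"), ("power", "reactive")],
    names=["physical_quantity", "type"],
)
index = pd.to_datetime(
    ["2013-06-07 05:30:00+05:30", "2013-06-07 05:30:01+05:30"]
)
df = pd.DataFrame(
    [[2.486, 0.111, 2.483], [2.555, 0.200, 2.547]],
    index=index,
    columns=columns,
)

# A full (physical_quantity, type) tuple selects one column as a Series.
active = df[("power", "active")]

# The outer level alone selects every column under it as a DataFrame.
all_power = df["power"]

print(active.tolist())
print(list(all_power.columns))
```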

Loading by specifying AC type


In [10]:
df = next(fridge.load(ac_type='active'))
df.head()


Out[10]:
physical_quantity power
type active
2013-06-07 05:30:00+05:30 0.111
2013-06-07 05:30:01+05:30 0.200
2013-06-07 05:30:02+05:30 0.152
2013-06-07 05:30:03+05:30 0.159
2013-06-07 05:30:04+05:30 0.215

Loading by resampling to a specified period


In [11]:
# resample to one-minute resolution (i.e. a sample period of 60 seconds)
df = next(fridge.load(ac_type='active', sample_period=60))
df.head()


Out[11]:
physical_quantity power
type active
2013-06-07 05:30:00+05:30 0.157583
2013-06-07 05:31:00+05:30 0.160567
2013-06-07 05:32:00+05:30 0.158170
2013-06-07 05:33:00+05:30 105.332802
2013-06-07 05:34:00+05:30 120.265068
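To make the effect of a 60-second sample period concrete, here is a rough pandas-only sketch. It assumes each period is summarised by the mean of its samples, which is consistent with the output above but is not stated in the text; nilmtk's own resampling may differ in its details. The readings are synthetic, not iAWE data.

```python
import pandas as pd

# Synthetic one-second readings covering two minutes:
# 0.5 W for the first minute, then 100 W for the second.
index = pd.date_range("2013-06-07 05:30:00", periods=120, freq="s")
power = pd.Series([0.5] * 60 + [100.0] * 60, index=index)

# Group the samples into 60-second bins and average each bin
# (mean aggregation is an assumption made for this sketch).
minutely = power.resample("60s").mean()

print(minutely)
```

The 120 one-second samples collapse to two rows, one per minute, each labelled with the start of its bin.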
