Note:

Most users of the EEmeter stack do not directly use the eemeter package for loading their data. Instead, they use the datastore, which uses the eemeter internally. To learn to use the datastore, head over to the datastore basic usage tutorial.

Data preparation

The basic container for project data is the eemeter.structures.Project object. This object contains all of the data necessary for running a meter.

There are three items it requires:

  1. An EnergyTraceSet, which is a collection of EnergyTraces
  2. A list of Interventions
  3. An eemeter.structures.ZIPCodeSite

Let's start by creating an EnergyTrace. Internally, EnergyTrace objects use numpy and pandas, which are nearly ubiquitous Python packages for efficient numerical computation and data analysis, respectively.

Since this data is not in a format eemeter recognizes, we first need to parse it into records, then load those records using an eemeter.io.serializers.ArbitraryStartSerializer.
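Concretely, each record is a plain dict with a timezone-aware "start", a numeric "value", and an "estimated" flag; here is a minimal sketch of the shape, using the standard library's timezone.utc in place of pytz.UTC and the first reading from the sample output shown later in this tutorial:

```python
from datetime import datetime, timezone

# A single record in the shape the serializer expects: a timezone-aware
# "start" timestamp, an energy "value", and an "estimated" flag indicating
# whether the reading was estimated rather than actually measured.
# (The standard library's timezone.utc stands in for pytz.UTC here.)
record = {
    "start": datetime(2011, 1, 1, tzinfo=timezone.utc),
    "value": 57.8,
    "estimated": False,
}
```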


In [1]:
# library imports
from eemeter.structures import (
    EnergyTrace,
    EnergyTraceSet,
    Intervention,
    ZIPCodeSite,
    Project
)
from eemeter.io.serializers import ArbitraryStartSerializer
from eemeter.ee.meter import EnergyEfficiencyMeter
import pandas as pd
import pytz

First, we import the energy data from the sample CSV and transform it into records.


In [2]:
energy_data = pd.read_csv('sample-energy-data_project-ABC_zipcode-50321.csv',
                          parse_dates=['date'], dtype={'zipcode': str})
records = [{
    "start": pytz.UTC.localize(row.date.to_datetime()),
    "value": row.value,
    "estimated": row.estimated,
} for _, row in energy_data.iterrows()]

The records we just created look like this:

>>> records
[
    {
        'estimated': False,
        'start': datetime.datetime(2011, 1, 1, 0, 0, tzinfo=<UTC>),
        'value': 57.8
    },
    {
        'estimated': False,
        'start': datetime.datetime(2011, 1, 2, 0, 0, tzinfo=<UTC>),
        'value': 64.8
    },
    {
        'estimated': False,
        'start': datetime.datetime(2011, 1, 3, 0, 0, tzinfo=<UTC>),
        'value': 49.5
    },
    ...
]

Next, we load our records into an EnergyTrace. We give it units "KWH" and interpretation "ELECTRICITY_CONSUMPTION_SUPPLIED", which means that this is electricity consumed by the building and supplied by a utility (rather than by solar panels or other on-site generation). We also pass in an instance of the record serializer ArbitraryStartSerializer to show it how to interpret the records.


In [3]:
energy_trace = EnergyTrace(
    records=records,
    unit="KWH",
    interpretation="ELECTRICITY_CONSUMPTION_SUPPLIED",
    serializer=ArbitraryStartSerializer())

The energy trace data looks like this:

>>> energy_trace.data[:3]
                           value estimated
2011-01-01 00:00:00+00:00   57.8     False
2011-01-02 00:00:00+00:00   64.8     False
2011-01-03 00:00:00+00:00   49.5     False

Though we only have one trace here, we will often have more than one trace. Because of that, projects expect an EnergyTraceSet, which is a labeled set of EnergyTraces. We give it the trace_id supplied in the CSV.


In [4]:
energy_trace_set = EnergyTraceSet([energy_trace], labels=["DEF"])

Now we load the rest of the project data from the sample project data CSV. This CSV includes the project_id (which we don't use in this tutorial), the ZIP code of the building, and the dates on which retrofit work for this project started and completed.
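The project CSV is assumed to contain at least the columns referenced in this tutorial; the following is a hypothetical stand-in for that file (the dates are illustrative, not the sample file's actual values), loaded the same way as in the next cell:

```python
import pandas as pd
from io import StringIO

# A minimal sketch of what sample-project-data.csv is assumed to contain,
# based on the columns this tutorial references: a project_id, a zipcode,
# and the retrofit start/end dates. The dates here are illustrative.
csv_text = (
    "project_id,zipcode,retrofit_start_date,retrofit_end_date\n"
    "ABC,50321,2013-06-01,2013-07-01\n"
)
project_data = pd.read_csv(
    StringIO(csv_text),
    parse_dates=["retrofit_start_date", "retrofit_end_date"],
    dtype={"zipcode": str},
).iloc[0]
```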


In [5]:
project_data = pd.read_csv('sample-project-data.csv',
                           parse_dates=['retrofit_start_date', 'retrofit_end_date']).iloc[0]

We create an Intervention from the retrofit start and end dates and wrap it in a list:


In [6]:
retrofit_start_date = pytz.UTC.localize(project_data.retrofit_start_date)
retrofit_end_date = pytz.UTC.localize(project_data.retrofit_end_date)

interventions = [Intervention(retrofit_start_date, retrofit_end_date)]

Then we create a ZIPCodeSite for the project by passing in the zipcode:


In [7]:
site = ZIPCodeSite(project_data.zipcode)

Now we can create a project using the data we've loaded:


In [8]:
project = Project(energy_trace_set=energy_trace_set, interventions=interventions, site=site)

Running meters

To run the EEmeter on the project, instantiate an EnergyEfficiencyMeter and call its .evaluate(project) method, passing in the project we just created:


In [9]:
meter = EnergyEfficiencyMeter()
results = meter.evaluate(project)


/Users/philngo/.virtualenvs/eemeter/lib/python3.4/site-packages/pandas/core/indexing.py:128: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
<matplotlib.figure.Figure at 0x10c383a20>
<matplotlib.figure.Figure at 0x10d045ba8>

That's it! Now we can inspect and use our results.

Inspecting results

Let's quickly look through the results object so that we can understand what it contains. The results are embedded in a nested Python dict:

>>> results
{
    'weather_normal_source': TMY3WeatherSource("725460"),
    'weather_source': ISDWeatherSource("725460"),
    'modeling_period_set': ModelingPeriodSet(),
    'modeled_energy_traces': {
        'DEF': SplitModeledEnergyTrace()
    },
    'modeled_energy_trace_derivatives': {
        'DEF': {
            ('baseline', 'reporting'): {
                'BASELINE': {
                    'annualized_weather_normal': (11051.6, 142.4, 156.4, 365),
                    'gross_predicted': (31806.3, 251.5, 276.1, 1138)
                },
                'REPORTING': {
                    'annualized_weather_normal': (8758.2, 121.9, 137.2, 365),
                     'gross_predicted': (25208.1, 215.2, 242.3, 1138)
                }
            }
        }
    },
    'project_derivatives': {
        ('baseline', 'reporting'): {
            'ALL_FUELS_CONSUMPTION_SUPPLIED': {
                'BASELINE': {
                    'annualized_weather_normal': (11051.6, 142.4, 156.4, 365),
                    'gross_predicted': (31806.3, 251.5, 276.1, 1138)
                },
                'REPORTING': {
                    'annualized_weather_normal': (8758.2, 121.9, 137.2, 365),
                     'gross_predicted': (25208.1, 215.2, 242.3, 1138)
                }
            },
            'ELECTRICITY_CONSUMPTION_SUPPLIED': {
                'BASELINE': {
                    'annualized_weather_normal': (11051.6, 142.4, 156.4, 365),
                    'gross_predicted': (31806.3, 251.5, 276.1, 1138)
                },
                'REPORTING': {
                    'annualized_weather_normal': (8758.2, 121.9, 137.2, 365),
                     'gross_predicted': (25208.1, 215.2, 242.3, 1138)
                }
            },
            'ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED': None,
            'NATURAL_GAS_CONSUMPTION_SUPPLIED': None
        }
    },
}

Note the contents of the dictionary:

  • 'weather_source': An instance of eemeter.weather.ISDWeatherSource. The weather source used to gather observed weather data. The station at which this weather was recorded (matched by ZIP code) can be found by inspecting weather_source.station.
  • 'weather_normal_source': An instance of eemeter.weather.TMY3WeatherSource. The weather normal source used to gather weather normal data. The station at which this weather normal data was recorded (matched by ZIP code) can be found by inspecting weather_normal_source.station.
  • 'modeling_period_set': An instance of eemeter.structures.ModelingPeriodSet. The modeling periods determined by the intervention start and end dates; includes groupings. The default grouping for a single intervention is into two modeling periods called "baseline" and "reporting".
  • 'modeled_energy_traces': SplitModeledEnergyTrace instances keyed by trace_id (as given in the EnergyTraceSet); includes models and fit statistics for each modeling period.
  • 'modeled_energy_trace_derivatives': energy results specific to each modeled energy trace, organized by trace_id and modeling period group.
  • 'project_derivatives': Project-level results which are aggregated up from the 'modeled_energy_trace_derivatives'.

The project derivatives are nested quite deeply. The nesting of key-value pairs is as follows:

  • 1st layer: modeling period set id: a tuple of one baseline period id and one reporting period id, usually ('baseline', 'reporting'); contains the results specific to this pair of modeling periods.
  • 2nd layer: trace interpretation: a string describing the trace interpretation; in our case, "ELECTRICITY_CONSUMPTION_SUPPLIED".
  • 3rd layer: 'BASELINE' and 'REPORTING': fixed labels that always appear at this level; they demarcate the baseline aggregations and the reporting aggregations.
  • 4th layer: 'annualized_weather_normal' and 'gross_predicted': fixed labels that always appear at this level, indicating the type of the savings values.
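These four layers can be illustrated with a pared-down mock of the results structure printed above; the values are copied from that output, but a real results object holds weather sources and model objects alongside these entries:

```python
# A minimal mock mirroring the 'project_derivatives' nesting shown above:
# modeling period set id -> trace interpretation -> BASELINE/REPORTING
# -> savings type -> 4-tuple.
results = {
    'project_derivatives': {
        ('baseline', 'reporting'): {
            'ELECTRICITY_CONSUMPTION_SUPPLIED': {
                'BASELINE': {
                    'annualized_weather_normal': (11051.6, 142.4, 156.4, 365),
                },
                'REPORTING': {
                    'annualized_weather_normal': (8758.2, 121.9, 137.2, 365),
                },
            },
        },
    },
}

# Walking the four layers, outermost to innermost:
mps_results = results['project_derivatives'][('baseline', 'reporting')]
interpretation_results = mps_results['ELECTRICITY_CONSUMPTION_SUPPLIED']
baseline_tuple = interpretation_results['BASELINE']['annualized_weather_normal']
```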

At the final layer is a 4-tuple of results (value, lower, upper, n): value is the estimated expected value of the selected result; lower is a number which can be subtracted from value to obtain the lower bound of the 95% confidence interval; upper is a number which can be added to value to obtain the upper bound of the 95% confidence interval; and n is the total number of records that went into the calculation of this value.

To obtain savings numbers, the reporting value should be subtracted from the baseline value as described in the methods overview.
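Plugged into concrete numbers, the unpacking and the savings calculation look like this; the tuples are the literal 'annualized_weather_normal' values shown in the results above:

```python
# The BASELINE and REPORTING 'annualized_weather_normal' 4-tuples from the
# results above, unpacked as (value, lower, upper, n).
baseline_value, b_lower, b_upper, b_n = (11051.6, 142.4, 156.4, 365)
reporting_value, r_lower, r_upper, r_n = (8758.2, 121.9, 137.2, 365)

# 95% confidence interval bounds for the baseline value:
baseline_ci = (baseline_value - b_lower, baseline_value + b_upper)

# Savings: the reporting value subtracted from the baseline value.
absolute_savings = baseline_value - reporting_value
fractional_savings = absolute_savings / baseline_value
```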

Let's select the most useful results from the eemeter: the project-level derivatives. Note the modeling period set selector at the first level: ('baseline', 'reporting').


In [10]:
project_derivatives = results['project_derivatives']

In [11]:
project_derivatives.keys()


Out[11]:
dict_keys([('baseline', 'reporting')])

In [12]:
modeling_period_set_results = project_derivatives[('baseline', 'reporting')]

Now we can select the desired interpretation; four are available.


In [13]:
modeling_period_set_results.keys()


Out[13]:
dict_keys(['ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED', 'NATURAL_GAS_CONSUMPTION_SUPPLIED', 'ALL_FUELS_CONSUMPTION_SUPPLIED', 'ELECTRICITY_CONSUMPTION_SUPPLIED'])

In [14]:
electricity_consumption_supplied_results = modeling_period_set_results['ELECTRICITY_CONSUMPTION_SUPPLIED']

The interpretation-level results are broken into "BASELINE" and "REPORTING" in all cases in which they are available; otherwise, the value is None.


In [15]:
electricity_consumption_supplied_results.keys()


Out[15]:
dict_keys(['BASELINE', 'REPORTING'])

In [16]:
baseline_results = electricity_consumption_supplied_results["BASELINE"]
reporting_results = electricity_consumption_supplied_results["REPORTING"]

These results are in turn keyed by the type of savings estimate.


In [17]:
baseline_results.keys()


Out[17]:
dict_keys(['gross_predicted', 'annualized_weather_normal'])

In [18]:
reporting_results.keys()


Out[18]:
dict_keys(['gross_predicted', 'annualized_weather_normal'])

We select the results for one of them:


In [19]:
baseline_normal = baseline_results['annualized_weather_normal']
reporting_normal = reporting_results['annualized_weather_normal']

As described above, each energy value comes with lower and upper confidence bounds, but the central value can also be used directly to estimate savings.


In [20]:
percent_savings = (baseline_normal[0] - reporting_normal[0]) / baseline_normal[0]

In [21]:
percent_savings


Out[21]:
0.20751319075256849

This percent savings value (~20%) is consistent with the savings created in the fake data.