Note:
Most users of the EEmeter stack do not use the eemeter package directly for loading their data. Instead, they use the datastore, which uses eemeter internally. To learn to use the datastore, head over to the datastore basic usage tutorial.
The basic container for project data is the eemeter.structures.Project
object. This object contains all of the data necessary for running a meter.
It requires three items:

- an EnergyTraceSet, which is a collection of EnergyTrace objects
- a list of Intervention objects
- an eemeter.structures.ZIPCodeSite
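Each of these is constructed step by step below; assembled, they become the Project that the meter evaluates. As a preview, the final call (repeated later in this tutorial) looks like this:

>>> project = Project(
...     energy_trace_set=energy_trace_set,
...     interventions=interventions,
...     site=site,
... )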
Let's start by creating an EnergyTrace. Internally, EnergyTrace objects use numpy and pandas, which are nearly ubiquitous Python packages for efficient numerical computation and data analysis, respectively.
Since the sample data is not in a format eemeter recognizes natively, we first need to parse it into records. We will then load those records using an eemeter.io.serializers.ArbitraryStartSerializer.
In [1]:
# library imports
from eemeter.structures import (
    EnergyTrace,
    EnergyTraceSet,
    Intervention,
    ZIPCodeSite,
    Project
)
from eemeter.io.serializers import ArbitraryStartSerializer
from eemeter.ee.meter import EnergyEfficiencyMeter
import pandas as pd
import pytz
First, we load the energy data from the sample CSV and transform it into records.
In [2]:
energy_data = pd.read_csv('sample-energy-data_project-ABC_zipcode-50321.csv',
                          parse_dates=['date'], dtype={'zipcode': str})

# Build one record per row; timestamps are localized to UTC.
records = [{
    "start": pytz.UTC.localize(row.date.to_pydatetime()),
    "value": row.value,
    "estimated": row.estimated,
} for _, row in energy_data.iterrows()]
The records we just created look like this:
>>> records
[
    {
        'estimated': False,
        'start': datetime.datetime(2011, 1, 1, 0, 0, tzinfo=<UTC>),
        'value': 57.8
    },
    {
        'estimated': False,
        'start': datetime.datetime(2011, 1, 2, 0, 0, tzinfo=<UTC>),
        'value': 64.8
    },
    {
        'estimated': False,
        'start': datetime.datetime(2011, 1, 3, 0, 0, tzinfo=<UTC>),
        'value': 49.5
    },
    ...
]
Next, we load our records into an EnergyTrace. We give it the unit "KWH" and the interpretation "ELECTRICITY_CONSUMPTION_SUPPLIED", which means that this is electricity consumed by the building and supplied by a utility (rather than by solar panels or other on-site generation). We also pass in an instance of the record serializer ArbitraryStartSerializer to show it how to interpret the records.
In [3]:
energy_trace = EnergyTrace(
    records=records,
    unit="KWH",
    interpretation="ELECTRICITY_CONSUMPTION_SUPPLIED",
    serializer=ArbitraryStartSerializer())
The energy trace data looks like this:
>>> energy_trace.data[:3]
value estimated
2011-01-01 00:00:00+00:00 57.8 False
2011-01-02 00:00:00+00:00 64.8 False
2011-01-03 00:00:00+00:00 49.5 False
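As noted above, EnergyTrace uses pandas internally; energy_trace.data is a pandas DataFrame indexed by timezone-aware timestamps:

>>> type(energy_trace.data)
<class 'pandas.core.frame.DataFrame'>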
Though we only have one trace here, projects will often have more than one. Because of that, projects expect an EnergyTraceSet, which is a labeled set of EnergyTrace objects. We give it the trace_id supplied in the CSV.
In [4]:
energy_trace_set = EnergyTraceSet([energy_trace], labels=["DEF"])
Now we load the rest of the project data from the sample project data CSV. This CSV includes the project_id (which we don't use in this tutorial), the ZIP code of the building, and the dates on which retrofit work for this project started and completed.
In [5]:
project_data = pd.read_csv('sample-project-data.csv',
                           parse_dates=['retrofit_start_date', 'retrofit_end_date']).iloc[0]
We create an Intervention
from the retrofit start and end dates and wrap it in a list:
In [6]:
retrofit_start_date = pytz.UTC.localize(project_data.retrofit_start_date)
retrofit_end_date = pytz.UTC.localize(project_data.retrofit_end_date)
interventions = [Intervention(retrofit_start_date, retrofit_end_date)]
Then we create a ZIPCodeSite for the project by passing in the ZIP code:
In [7]:
site = ZIPCodeSite(project_data.zipcode)
Now we can create a project using the data we've loaded:
In [8]:
project = Project(energy_trace_set=energy_trace_set, interventions=interventions, site=site)
Finally, we create an EnergyEfficiencyMeter and use it to evaluate the project:
In [9]:
meter = EnergyEfficiencyMeter()
results = meter.evaluate(project)
That's it! Now we can inspect and use our results.
Let's quickly look through the results object so that we can understand what it contains. The results are embedded in a nested Python dict:
>>> results
{
    'weather_normal_source': TMY3WeatherSource("725460"),
    'weather_source': ISDWeatherSource("725460"),
    'modeling_period_set': ModelingPeriodSet(),
    'modeled_energy_traces': {
        'DEF': SplitModeledEnergyTrace()
    },
    'modeled_energy_trace_derivatives': {
        'DEF': {
            ('baseline', 'reporting'): {
                'BASELINE': {
                    'annualized_weather_normal': (11051.6, 142.4, 156.4, 365),
                    'gross_predicted': (31806.3, 251.5, 276.1, 1138)
                },
                'REPORTING': {
                    'annualized_weather_normal': (8758.2, 121.9, 137.2, 365),
                    'gross_predicted': (25208.1, 215.2, 242.3, 1138)
                }
            }
        }
    },
    'project_derivatives': {
        ('baseline', 'reporting'): {
            'ALL_FUELS_CONSUMPTION_SUPPLIED': {
                'BASELINE': {
                    'annualized_weather_normal': (11051.6, 142.4, 156.4, 365),
                    'gross_predicted': (31806.3, 251.5, 276.1, 1138)
                },
                'REPORTING': {
                    'annualized_weather_normal': (8758.2, 121.9, 137.2, 365),
                    'gross_predicted': (25208.1, 215.2, 242.3, 1138)
                }
            },
            'ELECTRICITY_CONSUMPTION_SUPPLIED': {
                'BASELINE': {
                    'annualized_weather_normal': (11051.6, 142.4, 156.4, 365),
                    'gross_predicted': (31806.3, 251.5, 276.1, 1138)
                },
                'REPORTING': {
                    'annualized_weather_normal': (8758.2, 121.9, 137.2, 365),
                    'gross_predicted': (25208.1, 215.2, 242.3, 1138)
                }
            },
            'ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED': None,
            'NATURAL_GAS_CONSUMPTION_SUPPLIED': None
        }
    },
}
Note the contents of the dictionary:

- 'weather_source': an instance of eemeter.weather.ISDWeatherSource. The weather source used to gather observed weather data, matched by ZIP code. The station at which this weather was recorded can be found by inspecting weather_source.station.
- 'weather_normal_source': an instance of eemeter.weather.TMY3WeatherSource. The weather normal source used to gather weather normal data, also matched by ZIP code. The station at which this weather normal data was recorded can be found by inspecting weather_normal_source.station.
- 'modeling_period_set': an instance of eemeter.structures.ModelingPeriodSet. The modeling periods determined by the intervention start and end dates, including groupings. The default grouping for a single intervention is into two modeling periods called "baseline" and "reporting".
- 'modeled_energy_traces': SplitModeledEnergyTrace instances keyed by trace_id (as given in the EnergyTraceSet); includes models and fit statistics for each modeling period.
- 'modeled_energy_trace_derivatives': energy results specific to each modeled energy trace, organized by trace_id and modeling period group.
- 'project_derivatives': project-level results which are aggregated up from the 'modeled_energy_trace_derivatives'.
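For example, the weather stations matched to this project's ZIP code can be read directly off the weather sources:

>>> results['weather_source'].station
'725460'
>>> results['weather_normal_source'].station
'725460'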
The project derivatives are nested quite deeply. The nesting of key-value pairs is as follows:

- ('baseline', 'reporting'): contains the results specific to this pair of modeling periods.
- "ELECTRICITY_CONSUMPTION_SUPPLIED" (or another of the interpretations listed above): contains the results aggregated for that interpretation.
- 'BASELINE' and 'REPORTING': fixed labels that always appear at this level; they demarcate the baseline aggregations and the reporting aggregations.
- 'annualized_weather_normal' and 'gross_predicted': also fixed labels that always appear at this level; they indicate the type of the savings values.

At the final level is a 4-tuple of results (value, lower, upper, n):

- value: the estimated expected value of the selected result;
- lower: a number which can be subtracted from value to obtain the lower 95% confidence interval bound;
- upper: a number which can be added to value to obtain the upper 95% confidence interval bound;
- n: the total number of records that went into the calculation of this value.
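For instance, taking the baseline annualized_weather_normal tuple shown above, the 95% confidence interval bounds can be recovered like this:

>>> value, lower, upper, n = (11051.6, 142.4, 156.4, 365)
>>> ci_lower = value - lower   # lower 95% confidence interval bound (~10909.2)
>>> ci_upper = value + upper   # upper 95% confidence interval bound (~11208.0)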
To obtain savings numbers, the reporting value should be subtracted from the baseline value as described in the methods overview.
Let's select the most useful results from the meter output: the project-level derivatives. Note the modeling_period_set selector at the first level: ('baseline', 'reporting').
In [10]:
project_derivatives = results['project_derivatives']
In [11]:
project_derivatives.keys()
Out[11]:
dict_keys([('baseline', 'reporting')])
In [12]:
modeling_period_set_results = project_derivatives[('baseline', 'reporting')]
Now we can select the desired interpretation; four are available.
In [13]:
modeling_period_set_results.keys()
Out[13]:
dict_keys(['ALL_FUELS_CONSUMPTION_SUPPLIED', 'ELECTRICITY_CONSUMPTION_SUPPLIED', 'ELECTRICITY_ON_SITE_GENERATION_UNCONSUMED', 'NATURAL_GAS_CONSUMPTION_SUPPLIED'])
In [14]:
electricity_consumption_supplied_results = modeling_period_set_results['ELECTRICITY_CONSUMPTION_SUPPLIED']
The interpretation-level results are broken into "BASELINE" and "REPORTING" in all cases in which they are available; otherwise, the value is None.
In [15]:
electricity_consumption_supplied_results.keys()
Out[15]:
dict_keys(['BASELINE', 'REPORTING'])
In [16]:
baseline_results = electricity_consumption_supplied_results["BASELINE"]
reporting_results = electricity_consumption_supplied_results["REPORTING"]
These results are further broken down by the type of savings estimate, of which there are two.
In [17]:
baseline_results.keys()
Out[17]:
dict_keys(['annualized_weather_normal', 'gross_predicted'])
In [18]:
reporting_results.keys()
Out[18]:
dict_keys(['annualized_weather_normal', 'gross_predicted'])
We select the results for one of them:
In [19]:
baseline_normal = baseline_results['annualized_weather_normal']
reporting_normal = reporting_results['annualized_weather_normal']
As described above, each of these values carries lower and upper confidence interval bounds, but the central value can also be used directly to determine savings.
In [20]:
percent_savings = (baseline_normal[0] - reporting_normal[0]) / baseline_normal[0]
In [21]:
percent_savings
Out[21]:
This percent savings value (~20%) is consistent with the savings built into the sample data.
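If a confidence interval on the savings itself is needed, one simple approach (an assumption on our part, not a method prescribed by the eemeter output) is to combine the per-period bounds in quadrature, which is appropriate when the baseline and reporting errors are independent:

>>> from math import sqrt
>>> savings = baseline_normal[0] - reporting_normal[0]
>>> # combine per-period bounds in quadrature (assumes independent errors)
>>> savings_lower = sqrt(baseline_normal[1] ** 2 + reporting_normal[1] ** 2)
>>> savings_upper = sqrt(baseline_normal[2] ** 2 + reporting_normal[2] ** 2)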