This Jupyter notebook is an interactive tutorial. It walks through loading data, running the CalTRACK methods, and plotting results. You'll run all the code yourself. Cells can be executed with `<shift><enter>`. If you feel so inspired, make edits to the code in these cells and dig deeper.

This tutorial assumes the reader has properly installed python and the eemeter package (`pip install eemeter`) and has a basic working knowledge of python syntax and usage.

This tutorial is a self-paced walkthrough of how to use the eemeter package. We'll cover the following:

- Background - why this library
- Loading data
- Plotting and visualization
- Filtering data to baseline or reporting periods
- Creating design matrix datasets
- Fitting baseline (and reporting) models
- Using fitted models for prediction
- Computing CalTRACK metered savings

The tutorial is focused on demonstrating how to use the package to run the CalTRACK Hourly, Daily, and Billing methods on hourly, daily, and billing meter data.

At time of writing (Sept 2018), the OpenEEmeter, as implemented in the `eemeter` package, contains the most complete open source implementation of the CalTRACK methods, which specify a way of calculating avoided energy use at a single meter. However, using the OpenEEmeter to calculate avoided energy use does not in itself guarantee compliance with the CalTRACK method specification. Nor is using the OpenEEmeter a requirement of the CalTRACK methods. The eemeter package is a toolkit that may help with implementing a CalTRACK-compliant analysis, as it provides a particular implementation of the CalTRACK methods which consists of a set of functions, parameters, and classes which can be configured to run the CalTRACK methods and variants. Please keep in mind while using the package that the eemeter assumes certain data cleaning tasks that are specified in the CalTRACK methods have occurred *prior* to usage with the eemeter. The package will create warnings to expose errors of this nature where possible.

The eemeter package is built for flexibility and modularity. While this is generally helpful and makes it easier to use the package, one consequence for users is that, without carefully following both the eemeter documentation *and* the guidance provided in the CalTRACK methods, it is very possible to use the eemeter in a way that does not comply with the CalTRACK methods. For example, while the CalTRACK methods set specific hard limits for the purpose of standardization and consistency, the eemeter can be configured to edit or entirely ignore those limits. The main reason for this flexibility is that the eemeter package is used not only to comply with the CalTRACK methods, but also to develop, test, and propose potential changes to those methods.

Rather than providing a single method that directly calculates avoided energy use from the required inputs, the eemeter library provides a series of modular functions that can be strung together in a variety of ways. The tutorial below describes common usage and sequencing of these functions, especially when it might not otherwise be apparent from the API documentation.

Some new users have assumed that the eemeter package constitutes an entire application suitable for running metering analytics at scale. This is not necessarily the case. It is designed instead to be embedded within other applications or to be used in one-off analyses. The eemeter is a toolbox that leaves to the user decisions about when to use or how to embed the provided tools within other applications. This limitation is an important consequence of the decision to make the methods and implementation as open and accessible as possible.

As you dive in, remember that this is a work in progress and that we welcome feedback and contributions. To contribute, please open an issue or a pull request on github.

*Note: these Jupyter cell magics enable some useful special features but are unrelated to eemeter.*

```
In [1]:
```# inline plotting
%matplotlib inline
# allow live package editing
%load_ext autoreload
%autoreload 2

```
In [2]:
```import eemeter

```
In [3]:
```eemeter.get_version()

```
Out[3]:
```

The three essential inputs to eemeter library functions are the following:

- Meter data
- Temperature data from a nearby weather station
- Project or intervention dates

Users of the library are responsible for obtaining and formatting this data (to get weather data, see eeweather, which helps perform site to weather station matching and can pull and cache temperature data directly from public (US) data sources). Some samples come loaded with the library and we'll load these first to save you the trouble of loading in your own data. The simulated sample data additionally has the useful property that we can load the same underlying data in three different frequencies: hourly, daily, and billing data.

We directly use pandas `DataFrame` and `Series` objects to hold the input meter and temperature time series data, which allows us to easily take advantage of the powerful methods provided by the pandas package. Using pandas has the added advantage that usage is a bit more familiar to pythonistas who work frequently with data of this nature in python. These formats are discussed in more detail below. If working with your own data instead of these samples, please refer directly to the excellent pandas documentation for instructions for loading data (e.g., `pandas.read_csv`). For some common cases, eemeter does come packaged with loading methods, but these will only work for particular data formats.

Useful eemeter methods for loading and manipulating data:

- `eemeter.meter_data_from_csv`: Load meter data from CSV.
- `eemeter.temperature_data_from_csv`: Load temperature data from CSV.
- `eemeter.meter_data_from_json`: Load meter data from JSON.
- `eemeter.temperature_data_from_json`: Load temperature data from JSON.
- `eemeter.samples`: Return a list of sample data names.
- `eemeter.load_sample`: Load sample data by name.
- `eemeter.as_freq`: Coerce meter data into a different frequency.

*Remember: the sample data is simulated, not real!*

```
In [4]:
```meter_data_hourly, temperature_data_hourly, metadata_hourly = \
eemeter.load_sample('il-electricity-cdd-hdd-hourly')
meter_data_daily, temperature_data_daily, metadata_daily = \
eemeter.load_sample('il-electricity-cdd-hdd-daily')
meter_data_billing, temperature_data_billing, metadata_billing = \
eemeter.load_sample('il-electricity-cdd-hdd-billing_monthly')

```
In [5]:
```baseline_end_date = metadata_billing['blackout_start_date']
baseline_end_date

```
Out[5]:
```

The convention for formatting meter data is to create a pandas DataFrame with a DatetimeIndex called `start` and a single column of meter readings called `value`. The index datetime values represent the start dates of each metering period. The end of each period is the start of the next period, even for data with variable period lengths like billing data. The end date of the last period can be supplied by appending an extra period with the final end date and a NaN value. Missing data is represented by one or more periods of value NaN. Data should be sorted by time and deduplicated prior to use with eemeter. Timestamps must be timezone aware.

Data is formatted like this as a convenience to avoid the need to store a start and an end period for each data point. However, the convention that uses start dates as timestamps can be a bit confusing. If you are starting with billing data, which is sometimes defined primarily by period end dates, make sure that the transformation is done properly so that the meter data ends up with start dates as timestamps.
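As an illustration of that transformation, here is a minimal sketch using only the standard library. The function name `end_dated_to_start_dated` and the sample bills are hypothetical and are not part of eemeter. It assumes you know the start of the first billing period and that each bill is recorded as a `(period_end, usage)` pair.

```python
from datetime import datetime, timezone


def end_dated_to_start_dated(first_start, bills):
    """Convert (period_end, usage) bills into (period_start, usage) rows.

    `first_start` is the start of the first billing period. The returned
    list ends with a (final_end, NaN) row that closes the last period.
    """
    rows = []
    start = first_start
    for end, usage in sorted(bills):
        rows.append((start, usage))
        start = end  # the end of one period is the start of the next
    rows.append((start, float("nan")))
    return rows


utc = timezone.utc  # timestamps must be timezone aware
bills = [
    (datetime(2017, 2, 2, tzinfo=utc), 510.0),
    (datetime(2017, 3, 1, tzinfo=utc), 430.0),
]
rows = end_dated_to_start_dated(datetime(2017, 1, 3, tzinfo=utc), bills)
for start, value in rows:
    print(start.isoformat(), value)
```

The resulting rows could then be loaded into a DataFrame with `start` as the index and `value` as the single column.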

Take a look at the hourly, daily, and billing data we just loaded. It follows the conventions described above. Notice that the format is identical but the timestamps and values are different.

```
In [6]:
```meter_data_hourly.head() # pandas.DataFrame.head filters to just the first 5 rows

```
Out[6]:
```

```
In [7]:
```meter_data_daily.head()

```
Out[7]:
```

```
In [8]:
```meter_data_billing.tail() # last 5 rows

```
Out[8]:
```

*Always* start with hourly temperature data. This is necessary even for daily and billing analyses because we must be able to aggregate the temperatures in different ways over different time series. Meter data may use dates in many different timezones, and their local midnights don't always align with the UTC midnights used in preaggregated daily data.
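To see why the timezone matters, the following sketch averages the same synthetic hourly readings over UTC days and over local days. It uses only the standard library. The helper `daily_mean_temperature` and the fixed UTC-6 offset are assumptions made for illustration and do not reflect how eemeter aggregates temperatures internally.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_mean_temperature(hourly, tz):
    """Average (timestamp, temperature) readings over calendar days in `tz`."""
    by_day = defaultdict(list)
    for ts, temp in hourly:
        by_day[ts.astimezone(tz).date()].append(temp)
    return {day: sum(temps) / len(temps) for day, temps in sorted(by_day.items())}


start = datetime(2017, 7, 1, tzinfo=timezone.utc)
# Synthetic readings: 60 degrees for the first 24 UTC hours, 80 for the next 24.
hourly = [(start + timedelta(hours=h), 60.0 if h < 24 else 80.0) for h in range(48)]

local_tz = timezone(timedelta(hours=-6))  # fixed offset standing in for a local timezone
utc_days = daily_mean_temperature(hourly, timezone.utc)
local_days = daily_mean_temperature(hourly, local_tz)

print(utc_days[start.date()], local_days[start.date()])
```

The same calendar date gets a different mean temperature depending on where midnight falls, which is information a preaggregated UTC daily series cannot recover.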

```
In [9]:
```temperature_data_hourly.head()

```
Out[9]:
```

```
In [10]:
```temperature_data_daily.head()

```
Out[10]:
```

```
In [11]:
```temperature_data_billing.head()

```
Out[11]:
```

The eemeter plotting functions allow visual exploration of meter and temperature data.

Plotting in time series, we see the difference in the frequency of the data more clearly.

- `eemeter.plot_time_series`: Plot meter and temperature data in time series.

```
In [12]:
```eemeter.plot_time_series(meter_data_hourly, temperature_data_hourly, figsize=(16, 4))

```
Out[12]:
```

```
In [13]:
```eemeter.plot_time_series(meter_data_daily, temperature_data_daily, figsize=(16, 4))

```
Out[13]:
```

```
In [14]:
```eemeter.plot_time_series(meter_data_billing, temperature_data_billing, figsize=(16, 4))

```
Out[14]:
```

The following stacks the three versions of the data (hourly, daily, and billing) right on top of each other in energy signature form. This shows how usage depends on outdoor temperature. These plots convert the meter data to "usage per day", which normalizes the values and makes usage patterns appear roughly comparable at different sampling intervals.

- `eemeter.plot_energy_signature`: Plot meter and temperature data as an energy signature.

*Remember, this data is simulated. If these correlations look too good to be true, they are!*

```
In [15]:
```ax = eemeter.plot_energy_signature(meter_data_hourly, temperature_data_hourly, figsize=(14, 8))
eemeter.plot_energy_signature(meter_data_daily, temperature_data_daily, ax=ax)
eemeter.plot_energy_signature(meter_data_billing, temperature_data_billing, ax=ax)
ax.legend(labels=['hourly', 'daily', 'billing'])

```
Out[15]:
```

The CalTRACK methods require building a model of the usage during the baseline period and then projecting that forward into the reporting period. Before we can build the baseline model, we need to isolate up to 365 days of meter data ending as close as possible to the end of the baseline period. The following function performs this filtering for us and returns a new dataset with only baseline data.

- `eemeter.get_baseline_data`: Filter a dataset to baseline period data.
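The idea behind this filtering can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the eemeter implementation: the helper `baseline_window` is hypothetical, it works on `(start, value)` rows in the format described earlier, and it ignores the warnings that the real function also returns.

```python
from datetime import datetime, timedelta, timezone


def baseline_window(rows, end, max_days=365):
    """Keep (start, value) rows whose periods lie inside [end - max_days, end].

    A period runs from its own start to the next row's start, so the final
    row only marks an end date and is never returned as a period.
    """
    earliest = end - timedelta(days=max_days)
    kept = []
    for (start, value), (next_start, _) in zip(rows, rows[1:]):
        if start >= earliest and next_start <= end:
            kept.append((start, value))
    return kept


base = datetime(2016, 1, 1, tzinfo=timezone.utc)
# 20 synthetic 30-day billing periods plus a closing row with no value.
rows = [(base + timedelta(days=30 * i), 100.0) for i in range(20)]
rows.append((base + timedelta(days=600), float("nan")))

end = base + timedelta(days=460)  # falls in the middle of a billing period
baseline = baseline_window(rows, end)
covered_days = 30 * len(baseline)
print(len(baseline), covered_days)
```

Because periods that straddle either boundary are discarded, the periods kept here cover fewer than the 365 days requested.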

```
In [16]:
```baseline_meter_data_hourly, baseline_warnings_hourly = eemeter.get_baseline_data(
meter_data_hourly, end=baseline_end_date, max_days=365)
baseline_meter_data_daily, baseline_warnings_daily = eemeter.get_baseline_data(
meter_data_daily, end=baseline_end_date, max_days=365)
baseline_meter_data_billing, baseline_warnings_billing = eemeter.get_baseline_data(
meter_data_billing, end=baseline_end_date, max_days=365)

The filtered baseline data does not extend past the date given by the `end` argument. It's also no more than 365 days long, as we specified above with the `max_days` argument. Notice that the billing data is a bit shorter because of the unevenness of billing periods. Billing periods that fall across (rather than exactly at) the boundaries are removed in this method.

```
In [17]:
```baseline_meter_data_hourly.tail()

```
Out[17]:
```

```
In [18]:
```baseline_meter_data_daily.tail()

```
Out[18]:
```

```
In [19]:
```baseline_meter_data_billing.tail()

```
Out[19]:
```

```
In [20]:
```baseline_warnings_hourly, baseline_warnings_daily, baseline_warnings_billing

```
Out[20]:
```

CalTRACK defines certain changes to the meter data such as:

- CalTRACK 2.2.3.4: Off-cycle reads (spanning less than 25 days) should be dropped from analysis
- CalTRACK 2.2.3.5: For pseudo-monthly billing cycles, periods spanning more than 35 days should be dropped from analysis. For bi-monthly billing cycles, periods spanning more than 70 days should be dropped from the analysis.
- CalTRACK 2.2.2.1: If summing to daily usage from higher frequency interval data, no more than 50% of high-frequency values should be missing. Missing values should be filled in with average of non-missing values (e.g., for hourly data, 24 * average hourly usage).

The helper function `clean_caltrack_billing_daily_data` was created to handle these cases.
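The rules listed above can be sketched as two small standard-library functions. Both names are hypothetical and this is not the code behind `clean_caltrack_billing_daily_data`. The first covers only the pseudo-monthly billing limits (25 to 35 days), and the second covers the 50% missing-data rule for summing hourly values to a daily total.

```python
import math


def drop_irregular_billing_periods(periods, min_days=25, max_days=35):
    """Keep (n_days, usage) periods spanning 25 to 35 days (pseudo-monthly)."""
    return [(days, usage) for days, usage in periods if min_days <= days <= max_days]


def daily_total_from_hourly(hourly_values, max_missing_fraction=0.5):
    """Sum one day of hourly values, filling gaps with the mean of the rest.

    Returns NaN when more than half of the hourly values are missing.
    """
    present = [v for v in hourly_values if not math.isnan(v)]
    missing = len(hourly_values) - len(present)
    if not present or missing / len(hourly_values) > max_missing_fraction:
        return float("nan")
    # Equivalent to 24 * average hourly usage for a full day of hourly slots.
    return len(hourly_values) * (sum(present) / len(present))


nan = float("nan")
kept_periods = drop_irregular_billing_periods([(24, 300.0), (25, 310.0), (35, 420.0), (36, 450.0)])
mostly_complete = daily_total_from_hourly([1.0] * 18 + [nan] * 6)
mostly_missing = daily_total_from_hourly([1.0] * 11 + [nan] * 13)
print(kept_periods, mostly_complete, mostly_missing)
```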

```
In [24]:
```baseline_meter_data_billing = eemeter.clean_caltrack_billing_daily_data(baseline_meter_data_billing, 'billing')
baseline_meter_data_daily = eemeter.clean_caltrack_billing_daily_data(baseline_meter_data_daily, 'daily')
baseline_meter_data_daily_from_hourly = eemeter.clean_caltrack_billing_daily_data(baseline_meter_data_hourly, 'hourly')

The CalTRACK daily and billing methods specify a way of modeling the energy signature we plotted a few cells above. We need to select a model which fits the data as well as possible. The parameters in the model are heating and cooling balance points (i.e., the temperatures at which heating/cooling related energy use tends to kick in), and the heating and cooling beta parameters, which define the slope of the energy response to incremental differences between outdoor temperature and the balance point. We'll do a grid search over possible heating and cooling balance points and fit models to the heating and cooling degree days defined by the outdoor temperatures and each of those balance points. To do this, we precompute the heating and cooling degree days using the methods below before we feed them into the modeling routines.
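The grid search idea can be illustrated with a deliberately simplified, heating-only sketch. It uses only the standard library, selects the candidate by plain R-squared, and fits a one-variable least-squares line. The function names and the synthetic data are hypothetical, and the actual CalTRACK model selection in eemeter considers more candidate model types and criteria than this.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


def grid_search_heating_balance_point(temps, usage, candidates):
    """Fit one heating model per candidate balance point and keep the best fit."""
    best = None
    for balance_point in candidates:
        hdd = [max(balance_point - t, 0.0) for t in temps]
        if len(set(hdd)) < 2:
            continue  # no variation in degree days, nothing to fit
        intercept, slope, r_squared = fit_line(hdd, usage)
        if best is None or r_squared > best["r_squared"]:
            best = {
                "balance_point": balance_point,
                "intercept": intercept,  # base load per day
                "slope": slope,  # heating beta
                "r_squared": r_squared,
            }
    return best


# Synthetic usage: base load of 10 plus 2 per heating degree day below 60 degrees.
temps = [float(t) for t in range(20, 80, 2)]
usage = [10.0 + 2.0 * max(60.0 - t, 0.0) for t in temps]
best = grid_search_heating_balance_point(temps, usage, range(40, 81, 5))
print(best["balance_point"])
```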

To make the design matrix, we need to merge the meter data and temperature data into a single DataFrame. The `compute_usage_per_day_feature` function transforms the meter data into usage per day. The `compute_temperature_features` function lets us create a bunch of heating and cooling degree day values if we specify balance points to use. In this case, we'll use the wide balance point ranges recommended in the CalTRACK spec. Then we can combine the two using `merge_features`.

- `eemeter.create_caltrack_daily_design_matrix`: Create a design matrix for CalTRACK daily methods.
- `eemeter.create_caltrack_billing_design_matrix`: Create a design matrix for CalTRACK billing methods.
- `eemeter.compute_usage_per_day_feature`: Transform meter data into usage per day.
- `eemeter.compute_temperature_features`: Compute heating and cooling degree days and other useful temperature features.
- `eemeter.merge_features`: Combine a list of DataFrame or Series objects which share an index into a single DataFrame.
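For intuition about what the temperature features contain, here is a small standard-library sketch that computes heating and cooling degree days for several balance points from daily mean temperatures. The helper `degree_day_features` and the `temperature_mean` key are hypothetical names, while the `hdd_<bp>` and `cdd_<bp>` naming mirrors the `hdd_50` and `cdd_65` columns mentioned later in this tutorial.

```python
def degree_day_features(daily_mean_temps, heating_balance_points, cooling_balance_points):
    """Build one feature row per day with degree days for each balance point."""
    rows = []
    for temp in daily_mean_temps:
        row = {"temperature_mean": temp}
        for bp in heating_balance_points:
            # Heating degree days: how far the day was below the balance point.
            row[f"hdd_{bp}"] = max(bp - temp, 0.0)
        for bp in cooling_balance_points:
            # Cooling degree days: how far the day was above the balance point.
            row[f"cdd_{bp}"] = max(temp - bp, 0.0)
        rows.append(row)
    return rows


feature_rows = degree_day_features([40.0, 70.0], heating_balance_points=[50, 60], cooling_balance_points=[65])
for row in feature_rows:
    print(row)
```

Each candidate balance point gets its own column, so the modeling step can later try them one at a time.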

```
In [25]:
```design_matrix_daily = eemeter.create_caltrack_daily_design_matrix(
baseline_meter_data_daily, temperature_data_daily,
)

A preview of this dataset is shown below:

```
In [26]:
```design_matrix_daily.tail()

```
Out[26]:
```

```
In [27]:
```design_matrix_daily.index.min(), design_matrix_daily.index.max()

```
Out[27]:
```

```
In [28]:
```design_matrix_billing = eemeter.create_caltrack_billing_design_matrix(
baseline_meter_data_billing, temperature_data_billing,
)

Notice that the usage values are different after `compute_usage_per_day_feature`. That is because the values are returned as average usage per day, as specified by the CalTRACK methods, not as totals per period, as they are represented in the inputs. The heating/cooling degree days returned by `compute_temperature_features` are also average heating/cooling degree days per day, and not total heating/cooling degree days per period. This averaging behavior can be modified with the `use_mean_daily_values` parameter, which is set to `True` by default.
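The difference between per-day averages and per-period totals is easy to see on a single synthetic billing period. The sketch below is hypothetical and uses only the standard library. Its `use_mean_daily_values` argument imitates the behavior described above for degree days only and is not the eemeter function signature.

```python
def period_features(daily_mean_temps, total_usage, balance_point=60, use_mean_daily_values=True):
    """Summarize one billing period as usage per day plus heating degree days."""
    n_days = len(daily_mean_temps)
    hdd_total = sum(max(balance_point - t, 0.0) for t in daily_mean_temps)
    hdd = hdd_total / n_days if use_mean_daily_values else hdd_total
    return {
        "usage_per_day": total_usage / n_days,
        f"hdd_{balance_point}": hdd,
        "n_days": n_days,
    }


# A 30-day period at a constant 50 degrees with 600 units of total usage.
mean_row = period_features([50.0] * 30, 600.0)
total_row = period_features([50.0] * 30, 600.0, use_mean_daily_values=False)
print(mean_row)
print(total_row)
```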

```
In [29]:
```design_matrix_billing.tail()

```
Out[29]:
```

```
In [30]:
```design_matrix_billing.index.min(), design_matrix_billing.index.max()

```
Out[30]:
```

The hourly methods require a multi-stage dataset creation process which is a bit more involved than the daily/billing dataset creation process above. There are two primary reasons for this extra complexity. First, unlike the daily/billing methods, the hourly methods build separate models for each calendar month, which adds a few extra steps. Second, also unlike the billing and daily methods, there are two features of the dataset creation which must themselves be fitted to a preliminary dataset -- the occupancy feature and the temperature bin features.

The preliminary dataset has some simple time and temperature features. These features do not vary by segment and are precursors to other features (See below for a better explanation of segmentation). This step looks a lot like the daily/billing dataset creation. These features are used subsequently to fit the occupancy and temperature bin features.

- `eemeter.create_caltrack_hourly_preliminary_design_matrix`: Create a design matrix for the first stage of CalTRACK hourly.
- `eemeter.compute_time_features`: Create a time feature for the index (`time_of_week`).
- `eemeter.compute_temperature_features`: Compute heating and cooling degree days and other useful temperature features.
- `eemeter.merge_features`: Combine a list of DataFrame or Series objects which share an index into a single DataFrame.

```
In [31]:
```preliminary_design_matrix_hourly = eemeter.create_caltrack_hourly_preliminary_design_matrix(
baseline_meter_data_hourly, temperature_data_hourly,
)

The hour-of-week time feature is categorical, with one category per hour of the week (`7*24`).

```
In [32]:
```preliminary_design_matrix_hourly.tail()

```
Out[32]:
```

To handle creating multiple independent models on a shared dataset (as is required for CalTRACK hourly), we have introduced a concept which we are calling segmentation. Segmentation breaks a dataset into $n$ named and weighted subsets.

Before we can move on to the next steps of creating the CalTRACK hourly dataset, we need to create a segmentation for the hourly data. We will use this to create 12 independent hourly models - one for each month of the calendar year. The eemeter function for creating these weights is called `segment_time_series` and it takes a `DatetimeIndex` as input.

This segmentation matrix contains 1 column for each segment (12 in all), each of which contains the segmentation weights for that segment. The segmentation scheme we use here is to have one segment for each month which contains a single fully weighted calendar month and two half-weighted neighboring calendar months. The eemeter code name for this segmentation scheme is `'three_month_weighted'` (there's also `all`, `one_month`, and `three_month`, each of which behaves a bit differently).

We are creating this segmentation over the time index of the baseline period that is represented in the preliminary hourly design matrix.

- `eemeter.segment_time_series`: Create a segmentation using the specified scheme.
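To make the weighting scheme concrete, here is a standard-library sketch of how the twelve weighted segments described above could be defined. The helper names are hypothetical and the real `segment_time_series` returns a DataFrame of weights indexed by timestamp. Only the scheme itself (one full-weight month with two half-weight neighbors) is taken from the description above.

```python
from datetime import datetime, timezone

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def three_month_weighted_segments():
    """Map each segment name to its {month number: weight} lookup."""
    segments = {}
    for i, month in enumerate(MONTHS):
        prev_i, next_i = (i - 1) % 12, (i + 1) % 12  # wrap around the year
        name = f"{MONTHS[prev_i]}-{month}-{MONTHS[next_i]}-weighted"
        segments[name] = {prev_i + 1: 0.5, i + 1: 1.0, next_i + 1: 0.5}
    return segments


def segment_weight(segments, name, timestamp):
    """Weight of one timestamp within the named segment (0.0 if excluded)."""
    return segments[name].get(timestamp.month, 0.0)


segments = three_month_weighted_segments()
winter = "dec-jan-feb-weighted"
for month in (12, 1, 2, 3):
    ts = datetime(2017, month, 15, tzinfo=timezone.utc)
    print(month, segment_weight(segments, winter, ts))
```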

```
In [33]:
```segmentation_hourly = eemeter.segment_time_series(
preliminary_design_matrix_hourly.index,
'three_month_weighted'
)
segmentation_hourly.head()

```
Out[33]:
```

For example, the `dec-jan-feb-weighted` segment (which will eventually be used to estimate usage for January) includes a fully weighted January but also half-weighted December and February. These weights wrap around the calendar year, so both January and December of 2017 might end up in the same dataset.

```
In [34]:
```# example segmentation weights
segmentation_hourly[[
'dec-jan-feb-weighted',
'apr-may-jun-weighted',
'jun-jul-aug-weighted'
]].plot.area(stacked=False, alpha=0.3, figsize=(15, 2.5))

```
Out[34]:
```

Occupancy is estimated by building a simple model from the preliminary design matrix `hdd_50` and `cdd_65` columns. This is done for each segment independently, so results are returned as a dataframe with one segment of results per column. The `segmentation` argument indicates that the analysis should be done once per segment. Occupancy is determined by hour of week category. A value of 1 for a particular hour indicates an "occupied" mode, and a value of 0 indicates "unoccupied" mode. These modes are determined by the tendency of the `hdd_50`/`cdd_65` model to over- or under-predict usage for that hour, given a particular threshold between 0 and 1 (if the percent of underpredictions (by count) is lower than that threshold, then the mode is "unoccupied", otherwise the mode is "occupied").

- `eemeter.estimate_hour_of_week_occupancy`: Estimate occupancy by time of week for each segment.
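The thresholding rule described above can be sketched for a single segment as follows. This is a hypothetical standard-library illustration that takes observed and predicted values as given. It does not fit the `hdd_50`/`cdd_65` model itself and it is not the eemeter implementation.

```python
def estimate_occupancy(hour_of_week, observed, predicted, threshold=0.65):
    """Return {hour_of_week: 1 (occupied) or 0 (unoccupied)}.

    A reading counts as an underprediction when observed usage exceeds the
    model's prediction. An hour is "occupied" when its share of
    underpredictions is at least `threshold`.
    """
    counts = {}
    for how, obs, pred in zip(hour_of_week, observed, predicted):
        under, total = counts.get(how, (0, 0))
        counts[how] = (under + (obs > pred), total + 1)
    return {
        how: int(under / total >= threshold)
        for how, (under, total) in sorted(counts.items())
    }


hour_of_week = [0, 0, 0, 0, 1, 1, 1, 1]
observed = [5.0, 6.0, 7.0, 4.0, 3.0, 2.0, 2.0, 2.0]
predicted = [4.0, 5.0, 6.0, 5.0, 2.0, 3.0, 3.0, 3.0]
occupancy = estimate_occupancy(hour_of_week, observed, predicted)
print(occupancy)
```

Hour 0 is underpredicted three times out of four and is marked occupied, while hour 1 is underpredicted only once out of four and is marked unoccupied.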

```
In [35]:
```occupancy_lookup_hourly = eemeter.estimate_hour_of_week_occupancy(
preliminary_design_matrix_hourly,
segmentation=segmentation_hourly,
# threshold=0.65 # default
)

The occupancy lookup is organized by hour of week (rows) and model segment (columns).

```
In [36]:
```occupancy_lookup_hourly.head()

```
Out[36]:
```

Temperature bins are fit for each segment such that each bin has a sufficient number of temperature readings. Bins are defined by starting with a proposed set of bins (see the `default_bins` argument) and systematically dropping bin endpoints. Bins themselves are not dropped but are effectively combined with neighboring bins. Except for the fact that zero-weighted times are dropped, segment weights are not considered when fitting temperature bins.

- `eemeter.fit_temperature_bins`: Fit temperature bins to data, dropping bin endpoints for bins that do not meet the minimum temperature count such that remaining bins meet the minimum count.
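One way to picture endpoint dropping is the sketch below. It is an assumption-laden illustration using only the standard library: the merge order (always fix the lowest sparse bin first, merging it upward unless it is the top bin) is a choice made for this example and may differ from what `fit_temperature_bins` actually does. The default endpoints and minimum count are taken from the commented defaults in the cell below.

```python
from bisect import bisect_right


def bin_counts(temps, endpoints):
    """Count readings per bin; bin i lies between endpoints[i-1] and endpoints[i]."""
    counts = [0] * (len(endpoints) + 1)
    for t in temps:
        counts[bisect_right(endpoints, t)] += 1
    return counts


def fit_bins(temps, default_bins=(30, 45, 55, 65, 75, 90), min_count=20):
    """Drop endpoints until every remaining bin has at least `min_count` readings."""
    endpoints = list(default_bins)
    while endpoints:
        counts = bin_counts(temps, endpoints)
        sparse = [i for i, c in enumerate(counts) if c < min_count]
        if not sparse:
            break
        i = sparse[0]
        # Merge the sparse bin with the bin above it, or with the bin below
        # when it is already the topmost bin.
        del endpoints[i if i < len(endpoints) else i - 1]
    return endpoints


# Synthetic winter segment: plenty of cold readings, very few mild ones.
winter_temps = [20.0] * 100 + [40.0] * 100 + [50.0] * 100 + [60.0] * 5
kept_endpoints = fit_bins(winter_temps)
print(kept_endpoints)
```

With these cold synthetic readings, the sparse upper bins collapse into one, which matches the pattern of combined high temperature bins in winter segments.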

```
In [37]:
```temperature_bins_hourly = eemeter.fit_temperature_bins(
preliminary_design_matrix_hourly,
segmentation=segmentation_hourly,
# default_bins=[30, 45, 55, 65, 75, 90], # default
# min_temperature_count=20 # default
)

Bin endpoints that were dropped are marked `False`. You'll notice in this dataset that the winter months tend to have combined high temperature bins and the summer months tend to have combined low temperature bins.

```
In [38]:
``````
temperature_bins_hourly
```

```
Out[38]:
```

Creating the segmented design matrices relies on `iterate_segmented_dataset` and a prefabricated feature processor, `caltrack_hourly_fit_feature_processor`, which is provided to assist in creating the segmented dataset given a preliminary design matrix of the form created above. The feature processor transforms each segment of the dataset using the occupancy lookup and temperature bins created above. We are creating a python `dict` of pandas `DataFrame` objects, one for each time series segment encountered in the baseline data.

```
In [39]:
```segmented_design_matrices_hourly = eemeter.create_caltrack_hourly_segmented_design_matrices(
preliminary_design_matrix_hourly,
segmentation_hourly,
occupancy_lookup_hourly,
temperature_bins_hourly,
)

```
In [40]:
```print(segmented_design_matrices_hourly.keys())
segmented_design_matrices_hourly['dec-jan-feb-weighted'].head()

```
Out[40]:
```

```
In [41]:
```baseline_model_results_daily = eemeter.fit_caltrack_usage_per_day_model(
design_matrix_daily,
)

```
In [42]:
```baseline_model_results_billing = eemeter.fit_caltrack_usage_per_day_model(
design_matrix_billing,
use_billing_presets=True,
weights_col='n_days_kept',
)

```
In [43]:
```baseline_segmented_model_hourly = eemeter.fit_caltrack_hourly_model(
segmented_design_matrices_hourly,
occupancy_lookup_hourly,
temperature_bins_hourly,
)

```
In [44]:
```ax = eemeter.plot_energy_signature(meter_data_daily, temperature_data_daily)
baseline_model_results_daily.plot(ax=ax, temp_range=(-5, 88))

```
Out[44]:
```

```
In [45]:
```ax = eemeter.plot_energy_signature(meter_data_billing, temperature_data_billing)
baseline_model_results_billing.plot(ax=ax, temp_range=(18, 80))

```
Out[45]:
```

```
In [46]:
```ax = baseline_model_results_daily.model.plot(color='C0', best=True, label='daily')
ax = baseline_model_results_billing.model.plot(ax=ax, color='C1', best=True, label='billing')
ax.legend()

```
Out[46]:
```

```
In [47]:
```reporting_start_date = metadata_billing['blackout_start_date']

Now we get the first year of data for that period.

```
In [48]:
```reporting_meter_data_hourly, warnings = eemeter.get_reporting_data(
meter_data_hourly, start=reporting_start_date, max_days=365)
reporting_meter_data_daily, warnings = eemeter.get_reporting_data(
meter_data_daily, start=reporting_start_date, max_days=365)
reporting_meter_data_billing, warnings = eemeter.get_reporting_data(
meter_data_billing, start=reporting_start_date, max_days=365)

The `eemeter.metered_savings` method performs the logic of estimating counterfactual baseline reporting period usage. For this, it requires the fitted baseline model, the reporting period meter data (for its index, so that it can be properly joined later), and corresponding temperature data. Note that this method can return results disaggregated into base load, cooling load, and heating load, or as the aggregated usage. We do both here for demonstration purposes.
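The arithmetic behind metered savings is simple once a baseline model exists: predict what usage would have been under reporting period temperatures, then subtract what was actually observed. The sketch below uses only the standard library. The function `sketch_metered_savings` and the model dictionary keys are hypothetical, while the output keys mirror the `reporting_observed`, `counterfactual_usage`, and `metered_savings` columns used later in this tutorial.

```python
def sketch_metered_savings(model, reporting_observed, reporting_temps):
    """Compute per-day counterfactual usage and savings from a simple model."""
    rows = []
    for observed, temp in zip(reporting_observed, reporting_temps):
        hdd = max(model["heating_balance_point"] - temp, 0.0)
        cdd = max(temp - model["cooling_balance_point"], 0.0)
        # Baseline model applied to reporting period weather.
        counterfactual = model["intercept"] + model["beta_hdd"] * hdd + model["beta_cdd"] * cdd
        rows.append({
            "reporting_observed": observed,
            "counterfactual_usage": counterfactual,
            "metered_savings": counterfactual - observed,
        })
    return rows


# Hypothetical fitted baseline parameters (usage per day).
baseline_model = {
    "intercept": 10.0,
    "beta_hdd": 1.0,
    "beta_cdd": 2.0,
    "heating_balance_point": 60.0,
    "cooling_balance_point": 70.0,
}
savings_rows = sketch_metered_savings(baseline_model, [18.0, 9.0, 25.0], [50.0, 65.0, 80.0])
total_savings = sum(row["metered_savings"] for row in savings_rows)
print(total_savings)
```

Positive values mean the site used less energy than the baseline model predicts for the same weather.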

```
In [49]:
```metered_savings_hourly, error_bands = eemeter.metered_savings(
baseline_segmented_model_hourly, reporting_meter_data_hourly,
temperature_data_hourly
)
metered_savings_daily, error_bands = eemeter.metered_savings(
baseline_model_results_daily, reporting_meter_data_daily,
temperature_data_daily, with_disaggregated=True
)
metered_savings_billing, error_bands = eemeter.metered_savings(
baseline_model_results_billing, reporting_meter_data_billing,
temperature_data_billing, with_disaggregated=True
)

```
In [50]:
```metered_savings_hourly.head()

```
Out[50]:
```

```
In [51]:
```metered_savings_daily.head()

```
Out[51]:
```

```
In [52]:
```metered_savings_billing.head()

```
Out[52]:
```

```
In [53]:
```columns = ["reporting_observed", "counterfactual_usage", "metered_savings"]

```
In [54]:
```metered_savings_hourly[columns].resample('MS').sum().plot(figsize=(10, 6), drawstyle="steps-post")

```
Out[54]:
```

```
In [55]:
```metered_savings_daily[columns].resample('MS').sum().plot(figsize=(10, 6), drawstyle="steps-post")

```
Out[55]:
```

```
In [56]:
```metered_savings_billing[columns].plot(figsize=(10, 6), drawstyle="steps-post")

```
Out[56]:
```

These results can be easily aggregated into total and percent savings:

```
In [57]:
```total_savings_hourly = metered_savings_hourly.metered_savings.sum()
percent_savings_hourly = total_savings_hourly / metered_savings_hourly.counterfactual_usage.sum() * 100
print('Hourly: Saved {:.1f} kWh in first year ({:.1f}%)'.format(total_savings_hourly, percent_savings_hourly))
total_savings_daily = metered_savings_daily.metered_savings.sum()
percent_savings_daily = total_savings_daily / metered_savings_daily.counterfactual_usage.sum() * 100
print('Daily: Saved {:.1f} kWh in first year ({:.1f}%)'.format(total_savings_daily, percent_savings_daily))
total_savings_billing = metered_savings_billing.metered_savings.sum()
percent_savings_billing = total_savings_billing / metered_savings_billing.counterfactual_usage.sum() * 100
print('Billing: Saved {:.1f} kWh in first year ({:.1f}%)'.format(total_savings_billing, percent_savings_billing))

```
```

**NOTE**: These results differ somewhat due to the different lengths of the reporting periods. The billing version of the reporting period was a bit shorter because the billing periods over which we computed savings didn't exactly align with the 365-day period we requested, as they did for the daily reporting period data.

```
In [58]:
```reporting_preliminary_design_matrix_hourly = eemeter.create_caltrack_hourly_preliminary_design_matrix(
reporting_meter_data_hourly, temperature_data_hourly,
)
reporting_segmentation_hourly = eemeter.segment_time_series(
reporting_preliminary_design_matrix_hourly.index,
'three_month_weighted'
)
reporting_occupancy_lookup_hourly = eemeter.estimate_hour_of_week_occupancy(
reporting_preliminary_design_matrix_hourly,
segmentation=reporting_segmentation_hourly,
)
reporting_temperature_bins_hourly = eemeter.fit_temperature_bins(
reporting_preliminary_design_matrix_hourly,
segmentation=reporting_segmentation_hourly,
)
reporting_segmentation_design_matrices_hourly = eemeter.create_caltrack_hourly_segmented_design_matrices(
reporting_preliminary_design_matrix_hourly,
reporting_segmentation_hourly,
reporting_occupancy_lookup_hourly,
reporting_temperature_bins_hourly
)

```
In [59]:
```reporting_design_matrix_daily = eemeter.create_caltrack_daily_design_matrix(
reporting_meter_data_daily, temperature_data_daily,
)

```
In [60]:
```reporting_design_matrix_billing = eemeter.create_caltrack_billing_design_matrix(
reporting_meter_data_billing, temperature_data_billing,
)

```
In [61]:
```reporting_segmented_model_hourly = eemeter.fit_caltrack_hourly_model(
reporting_segmentation_design_matrices_hourly,
reporting_occupancy_lookup_hourly,
reporting_temperature_bins_hourly
)

```
In [62]:
```reporting_model_results_daily = eemeter.fit_caltrack_usage_per_day_model(
reporting_design_matrix_daily,
)

```
In [63]:
```reporting_model_results_billing = eemeter.fit_caltrack_usage_per_day_model(
reporting_design_matrix_billing,
use_billing_presets=True,
weights_col='n_days_kept',
)

```
In [64]:
```ax = eemeter.plot_energy_signature(meter_data_daily, temperature_data_daily)
ax = baseline_model_results_daily.model.plot(ax=ax, color='C1', best=True, label='baseline', temp_range=(-5, 88))
ax = reporting_model_results_daily.model.plot(ax=ax, color='C2', best=True, label='reporting', temp_range=(-5, 88))
ax.legend()

```
Out[64]:
```

```
In [65]:
```ax = eemeter.plot_energy_signature(meter_data_billing, temperature_data_billing)
ax = baseline_model_results_billing.model.plot(ax=ax, color='C1', best=True, label='baseline', temp_range=(18, 80))
ax = reporting_model_results_billing.model.plot(ax=ax, color='C2', best=True, label='reporting', temp_range=(18, 80))
ax.legend()

```
Out[65]:
```

```
In [66]:
```import pandas as pd
normal_year_temperatures = temperature_data_daily[temperature_data_daily.index.year == 2017]
result_index = pd.date_range('2017-01-01', periods=365, freq='D', tz='UTC')

Now we are ready to obtain our annualized savings.

```
In [69]:
```annualized_savings_hourly, annualized_savings_warnings_hourly = eemeter.modeled_savings(
baseline_segmented_model_hourly, reporting_segmented_model_hourly,
result_index, normal_year_temperatures, with_disaggregated=True
)
annualized_savings_daily, annualized_savings_warnings_daily = eemeter.modeled_savings(
baseline_model_results_daily, reporting_model_results_daily,
result_index, normal_year_temperatures, with_disaggregated=True
)
annualized_savings_billing, annualized_savings_warnings_billing = eemeter.modeled_savings(
baseline_model_results_billing, reporting_model_results_billing,
result_index, normal_year_temperatures, with_disaggregated=True
)

```
In [70]:
```annualized_savings_hourly.head()

```
Out[70]:
```

```
In [71]:
```annualized_savings_daily.head()

```
Out[71]:
```

```
In [72]:
```annualized_savings_billing.head()

```
Out[72]:
```

```
In [73]:
```import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 1, figsize=(10, 4))
annualized_savings_hourly[[
'modeled_baseline_usage',
'modeled_reporting_usage',
'modeled_savings',
]].plot(ax=axes)
axes.set_title('Total normalized/annualized savings')
plt.show()

```
```

```
In [74]:
```import matplotlib.pyplot as plt
fig, axes = plt.subplots(4, 1, figsize=(10, 16))
annualized_savings_daily[[
'modeled_baseline_usage',
'modeled_reporting_usage',
'modeled_savings',
]].plot(ax=axes[0])
axes[0].set_title('Total normalized/annualized savings')
annualized_savings_daily[[
'modeled_baseline_cooling_load',
'modeled_reporting_cooling_load',
'modeled_cooling_load_savings',
]].plot(ax=axes[1])
axes[1].set_title('Modeled cooling load savings')
annualized_savings_daily[[
'modeled_baseline_heating_load',
'modeled_reporting_heating_load',
'modeled_heating_load_savings',
]].plot(ax=axes[2])
axes[2].set_title('Modeled heating load savings')
ax = annualized_savings_daily[[
'modeled_baseline_base_load',
'modeled_reporting_base_load',
'modeled_base_load_savings',
]].plot(ax=axes[3])
axes[3].set_title('Modeled base load savings')
lim = axes[3].set_ylim((0, None))
plt.show()

```
```

```
In [75]:
```import matplotlib.pyplot as plt
fig, axes = plt.subplots(4, 1, figsize=(10, 16))
annualized_savings_billing[[
'modeled_baseline_usage',
'modeled_reporting_usage',
'modeled_savings',
]].plot(ax=axes[0])
axes[0].set_title('Total normalized/annualized savings')
annualized_savings_billing[[
'modeled_baseline_cooling_load',
'modeled_reporting_cooling_load',
'modeled_cooling_load_savings',
]].plot(ax=axes[1])
axes[1].set_title('Modeled cooling load savings')
annualized_savings_billing[[
'modeled_baseline_heating_load',
'modeled_reporting_heating_load',
'modeled_heating_load_savings',
]].plot(ax=axes[2])
axes[2].set_title('Modeled heating load savings')
ax = annualized_savings_billing[[
'modeled_baseline_base_load',
'modeled_reporting_base_load',
'modeled_base_load_savings',
]].plot(ax=axes[3])
axes[3].set_title('Modeled base load savings')
lim = axes[3].set_ylim((0, None))
plt.show()

```
```