Datastore basic usage

The datastore is a tool for using the eemeter that automates and helps to scale some of the most frequent eemeter tasks: data loading and storage, meter running, and result storage and inspection. It puts a REST API in front of the eemeter.

Note:

For larger datasets, the ETL toolkit exists to ease and speed up this process. That toolkit relies upon the same API described in this tutorial and in the datastore API documentation. We will not be using the ETL toolkit in this tutorial; for more information on it, see its API documentation.

Loading data

For this tutorial, we will use the python requests package to make requests to the datastore. We will use the same dataset used in the eemeter tutorial, available for download here:

  • project data CSV
  • energy data CSV

This tutorial is also available as a Jupyter notebook.


In [1]:
# library imports
import pandas as pd
import requests
import pytz

If you followed the datastore development setup instructions, you will already have run the command to create a superuser and access credentials.

python manage.py dev_seed

If you haven't already done so, do so now. The dev_seed command creates a demo admin user and a sample project.

  • username: demo
  • password: demo-password
  • API access token: tokstr
  • project owner: 1

Ensure that your development server is running locally on port 8000 before continuing.

python manage.py runserver

Each request will include an Authorization header:

Authorization: Bearer tokstr

In [2]:
base_url = "http://0.0.0.0:8000"
headers = {"Authorization": "Bearer tokstr"}

Let's test the API by requesting a list of projects in the datastore. Since the dev_seed command creates a sample project, this will return a response showing that project.


In [3]:
url = base_url + "/api/v1/projects/"
projects = requests.get(url, headers=headers).json()

In [4]:
projects


Out[4]:
[{'baseline_period_end': '2012-01-01T00:00:00Z',
  'baseline_period_start': None,
  'id': 1,
  'project_id': 'ABC',
  'project_owner': 1,
  'reporting_period_end': None,
  'reporting_period_start': '2012-02-01T00:00:00Z',
  'zipcode': '91104'}]

Although we'll delete this one in a moment, we can first explore a bit to get a feel for the API. Then we'll create a project of our own.

Energy trace data is associated with a project by foreign key and organized into time series by trace_id; the following request will show all traces associated with a particular project. Note the difference between the 'id' field and the 'project_id' field: the 'project_id' field is the unique label assigned to the project by an external source, while the 'id' field is the database table primary key.

There are two tables used to store energy data:

  1. Consumption metadata:
    • project_id: foreign key of the project this belongs to
    • trace_id: unique id of the trace
    • interpretation: the fuel (electricity/natural gas) and the direction and type of flow (net/total consumption, supply, generation)
    • unit: physical units of the values provided in records
  2. Consumption records:
    • metadata_id: foreign key to consumption metadata
    • start: the start date and time of record
    • value: the energy reading as reported
    • estimated: boolean indicating whether or not the reading was estimated

For consumption records, the end of each record is implicit in the start of the next record in the series. The value of the last record should be null, since that record exists only to mark the end of the previous one (if it is not null, it will be treated as such).
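For example, a trace of three monthly readings would be uploaded as four records: the start of each record closes the previous one, and the final record carries a null value purely to close the last interval. A minimal sketch of that payload shape (the metadata_id and values are illustrative):

records = [
    {"metadata_id": 1, "start": "2011-01-01T00:00:00+00:00", "value": 57.8, "estimated": False},
    {"metadata_id": 1, "start": "2011-02-01T00:00:00+00:00", "value": 64.8, "estimated": False},
    {"metadata_id": 1, "start": "2011-03-01T00:00:00+00:00", "value": 49.5, "estimated": False},
    # Null value: this record only marks the end of the March record.
    {"metadata_id": 1, "start": "2011-04-01T00:00:00+00:00", "value": None, "estimated": False},
]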

Let's inspect the traces associated with this project. We can do so using the project primary key 'id' as a filter (we use the summary flag so that we don't pull every record):


In [5]:
url = base_url + "/api/v1/consumption_metadatas/?summary=True&projects={}".format(projects[0]['id'])
consumption_metadatas = requests.get(url, headers=headers).json()

In [6]:
consumption_metadatas[0]


Out[6]:
{'id': 1,
 'interpretation': 'NG_C_S',
 'label': None,
 'project': {'baseline_period_end': '2012-01-01T00:00:00Z',
  'baseline_period_start': None,
  'id': 1,
  'project_id': 'ABC',
  'project_owner': 1,
  'reporting_period_end': None,
  'reporting_period_start': '2012-02-01T00:00:00Z',
  'zipcode': '91104'},
 'unit': 'THM'}

We can also query for consumption records by metadata primary key.


In [7]:
url = base_url + "/api/v1/consumption_records/?metadata={}".format(consumption_metadatas[0]['id'])
consumption_records = requests.get(url, headers=headers).json()

In [8]:
consumption_records[:3]


Out[8]:
[{'estimated': False,
  'id': 1,
  'metadata': 1,
  'start': '2010-01-01T00:00:00Z',
  'value': None},
 {'estimated': False,
  'id': 2,
  'metadata': 1,
  'start': '2010-02-01T00:00:00Z',
  'value': 1.0},
 {'estimated': False,
  'id': 3,
  'metadata': 1,
  'start': '2010-03-01T00:00:00Z',
  'value': 1.0}]

Now we'll delete the project that was created by the dev_seed command and make one of our own.


In [9]:
url = base_url + "/api/v1/projects/{}/".format(projects[0]['id'])
requests.delete(url, headers=headers)


Out[9]:
<Response [204]>

In [10]:
project_data = pd.read_csv('sample-project-data.csv',
                           parse_dates=['retrofit_start_date', 'retrofit_end_date']).iloc[0]

In [11]:
project_data


Out[11]:
project_id                             ABC
zipcode                              50321
retrofit_start_date    2013-06-01 00:00:00
retrofit_end_date      2013-07-01 00:00:00
Name: 0, dtype: object

In [12]:
data = {
    "project_id": project_data.project_id,
    "zipcode": str(project_data.zipcode),
    "baseline_period_end": pytz.UTC.localize(project_data.retrofit_start_date).isoformat(),
    "reporting_period_start": pytz.UTC.localize(project_data.retrofit_end_date).isoformat(),
    "project_owner": 1,
}
print(data)


{'project_owner': 1, 'baseline_period_end': '2013-06-01T00:00:00+00:00', 'zipcode': '50321', 'reporting_period_start': '2013-07-01T00:00:00+00:00', 'project_id': 'ABC'}

In [13]:
url = base_url + "/api/v1/projects/"
new_project = requests.post(url, json=data, headers=headers).json()
new_project


Out[13]:
{'baseline_period_end': '2013-06-01T00:00:00Z',
 'baseline_period_start': None,
 'id': 2,
 'project_id': 'ABC',
 'project_owner': 1,
 'reporting_period_end': None,
 'reporting_period_start': '2013-07-01T00:00:00Z',
 'zipcode': '50321'}

If you try to post another project with the same project_id, you'll get an error message.


In [14]:
url = base_url + "/api/v1/projects/"
requests.post(url, json=data, headers=headers).json()


Out[14]:
{'project_id': ['project with this project id already exists.']}
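When scripting against the API, you may prefer to detect failures from the HTTP status rather than by parsing the body for error keys. A minimal sketch using only the requests API (the exact status code used for validation errors is not shown above, so we check response.ok rather than a specific code):

response = requests.post(url, json=data, headers=headers)
if not response.ok:
    # e.g. the duplicate project_id validation error shown above
    print("Request failed ({}): {}".format(response.status_code, response.json()))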

However, there is another endpoint you can hit to sync the project: it updates the project if it exists and creates it if it doesn't. This endpoint works almost the same way, but expects a list of objects in a slightly different format:


In [15]:
data = [
    {
        "project_id": project_data.project_id,
        "zipcode": str(project_data.zipcode),
        "baseline_period_end": pytz.UTC.localize(project_data.retrofit_start_date).isoformat(),
        "reporting_period_start": pytz.UTC.localize(project_data.retrofit_end_date).isoformat(),
        "project_owner_id": 1,
    }
]
print(data)


[{'baseline_period_end': '2013-06-01T00:00:00+00:00', 'zipcode': '50321', 'project_owner_id': 1, 'reporting_period_start': '2013-07-01T00:00:00+00:00', 'project_id': 'ABC'}]

In [16]:
url = base_url + "/api/v1/projects/sync/"
requests.post(url, json=data, headers=headers).json()


Out[16]:
[{'baseline_period_end': '2013-06-01T00:00:00Z',
  'baseline_period_start': None,
  'id': 2,
  'project_id': 'ABC',
  'project_owner': 1,
  'reporting_period_end': None,
  'reporting_period_start': '2013-07-01T00:00:00Z',
  'status': 'unchanged - same record',
  'zipcode': '50321'}]

Now we can give this project some consumption data.


In [17]:
energy_data = pd.read_csv('sample-energy-data_project-ABC_zipcode-50321.csv',
                          parse_dates=['date'], dtype={'zipcode': str})
energy_data.head()


Out[17]:
  project_id trace_id       date  value unit         fuel estimated
0        ABC      DEF 2011-01-01   57.8  kWh  electricity     False
1        ABC      DEF 2011-01-02   64.8  kWh  electricity     False
2        ABC      DEF 2011-01-03   49.5  kWh  electricity     False
3        ABC      DEF 2011-01-04   80.0  kWh  electricity     False
4        ABC      DEF 2011-01-05   51.7  kWh  electricity     False

Then we'll use the sync endpoint for consumption metadata, which will create a new record or update an existing one. We have one trace here:


In [18]:
interpretation_mapping = {"electricity": "E_C_S"}
data = [
    {
        "project_project_id": energy_data.iloc[0]["project_id"],
        "interpretation": interpretation_mapping[energy_data.iloc[0]["fuel"]],
        "unit": energy_data.iloc[0]["unit"].upper(),
        "label": energy_data.iloc[0]["trace_id"].upper()
    }
]
data


Out[18]:
[{'interpretation': 'E_C_S',
  'label': 'DEF',
  'project_project_id': 'ABC',
  'unit': 'KWH'}]

In [19]:
url = base_url + "/api/v1/consumption_metadatas/sync/"
consumption_metadatas = requests.post(url, json=data, headers=headers).json()

In [20]:
consumption_metadatas


Out[20]:
[{'id': 7,
  'interpretation': 'E_C_S',
  'label': 'DEF',
  'project': {'baseline_period_end': '2013-06-01T00:00:00Z',
   'baseline_period_start': None,
   'id': 2,
   'project_id': 'ABC',
   'project_owner': 1,
   'reporting_period_end': None,
   'reporting_period_start': '2013-07-01T00:00:00Z',
   'zipcode': '50321'},
  'status': 'created',
  'unit': 'KWH'}]

Let's turn that CSV into records.


In [21]:
data = [{
    "metadata_id": consumption_metadatas[0]['id'],
    "start": pytz.UTC.localize(row.date.to_datetime()).isoformat(),
    "value": row.value,
    "estimated": row.estimated,
} for _, row in energy_data.iterrows()]
data[:3]


Out[21]:
[{'estimated': False,
  'metadata_id': 7,
  'start': '2011-01-01T00:00:00+00:00',
  'value': 57.8},
 {'estimated': False,
  'metadata_id': 7,
  'start': '2011-01-02T00:00:00+00:00',
  'value': 64.8},
 {'estimated': False,
  'metadata_id': 7,
  'start': '2011-01-03T00:00:00+00:00',
  'value': 49.5}]

In [22]:
url = base_url + "/api/v1/consumption_records/sync2/"
consumption_records = requests.post(url, json=data, headers=headers)

In [23]:
consumption_records.text


Out[23]:
'{"status":"success"}'

We can verify that these records were created by querying by consumption metadata id.


In [24]:
url = base_url + "/api/v1/consumption_records/?metadata={}".format(consumption_metadatas[0]['id'])
consumption_records = requests.get(url, headers=headers).json()

In [25]:
consumption_records[:3]


Out[25]:
[{'estimated': False,
  'id': 290027,
  'metadata': 7,
  'start': '2011-01-08T00:00:00Z',
  'value': 64.1},
 {'estimated': False,
  'id': 290028,
  'metadata': 7,
  'start': '2011-01-12T00:00:00Z',
  'value': 97.3},
 {'estimated': False,
  'id': 290029,
  'metadata': 7,
  'start': '2011-01-13T00:00:00Z',
  'value': 51.4}]

We now have a simple project with a single trace of data. Next, we'll run a meter on that project.

Running meters

To run a meter, make a request to create a "project run". This request will start a job that runs a meter and saves its results.

There are a few components to this request.

  1. "project": the primary key of the project.
  2. "meter_class": the name of the class of the eemeter meter to run.
  3. "meter_settings": any special settings to send to the meter class.

In [26]:
data = {
    "project": new_project['id'],
    "meter_class": "EnergyEfficiencyMeter",
    "meter_settings": {}
}
data


Out[26]:
{'meter_class': 'EnergyEfficiencyMeter', 'meter_settings': {}, 'project': 2}

In [27]:
url = base_url + "/api/v1/project_runs/"
project_run = requests.post(url, json=data, headers=headers).json()
project_run


Out[27]:
{'added': '2016-08-12T22:08:19.554549Z',
 'id': 1,
 'meter_class': 'EnergyEfficiencyMeter',
 'meter_settings': {},
 'project': 2,
 'status': 'PENDING',
 'traceback': None,
 'updated': '2016-08-12T22:08:19.554586Z'}

This creates a task to run the meter on the indicated project.

The project run can be viewed by requesting it by primary key. Before the job starts, its status will be PENDING; while it runs, RUNNING; when it completes, SUCCESS or FAILED. If the run FAILED, the traceback field will contain the error that occurred.


In [28]:
url = base_url + "/api/v1/project_runs/{}/".format(project_run['id'])
project_runs = requests.get(url, headers=headers).json()
project_runs


Out[28]:
{'added': '2016-08-12T22:08:19.554549Z',
 'id': 1,
 'meter_class': 'EnergyEfficiencyMeter',
 'meter_settings': {},
 'project': 2,
 'status': 'SUCCESS',
 'traceback': None,
 'updated': '2016-08-12T22:08:22.971937Z'}
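Because the meter runs asynchronously, a script will usually poll until the status leaves PENDING and RUNNING. A minimal sketch (the one-second interval and sixty-attempt limit are arbitrary choices):

import time

url = base_url + "/api/v1/project_runs/{}/".format(project_run['id'])
for _ in range(60):
    project_run = requests.get(url, headers=headers).json()
    if project_run['status'] in ('SUCCESS', 'FAILED'):
        break
    time.sleep(1)  # wait before polling again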

If this project run succeeded, we can inspect its results.

Inspecting results

Results all fall under the ProjectResult API:


In [29]:
url = base_url + "/api/v1/project_results/"
project_results = requests.get(url, headers=headers).json()
project_results


Out[29]:
[{'added': '2016-08-12T22:08:22.767276Z',
  'derivative_aggregations': [{'baseline_lower': 276.314732221288,
    'baseline_n': 1139,
    'baseline_upper': 251.679621341039,
    'baseline_value': 31806.3708558697,
    'id': 4,
    'interpretation': 'gross_predicted',
    'modeling_period_group': 1,
    'reporting_lower': 242.446604842159,
    'reporting_n': 1139,
    'reporting_upper': 215.374360422304,
    'reporting_value': 25208.1013739325,
    'trace_interpretation': 'ELECTRICITY_CONSUMPTION_SUPPLIED'},
   {'baseline_lower': 156.418677953027,
    'baseline_n': 365,
    'baseline_upper': 142.473017350216,
    'baseline_value': 11051.6386089923,
    'id': 3,
    'interpretation': 'annualized_weather_normal',
    'modeling_period_group': 1,
    'reporting_lower': 137.246310027507,
    'reporting_n': 365,
    'reporting_upper': 121.92101539941,
    'reporting_value': 8758.27781819607,
    'trace_interpretation': 'ELECTRICITY_CONSUMPTION_SUPPLIED'},
   {'baseline_lower': 276.314732221288,
    'baseline_n': 1139,
    'baseline_upper': 251.679621341039,
    'baseline_value': 31806.3708558697,
    'id': 2,
    'interpretation': 'gross_predicted',
    'modeling_period_group': 1,
    'reporting_lower': 242.446604842159,
    'reporting_n': 1139,
    'reporting_upper': 215.374360422304,
    'reporting_value': 25208.1013739325,
    'trace_interpretation': 'ALL_FUELS_CONSUMPTION_SUPPLIED'},
   {'baseline_lower': 156.418677953027,
    'baseline_n': 365,
    'baseline_upper': 142.473017350216,
    'baseline_value': 11051.6386089923,
    'id': 1,
    'interpretation': 'annualized_weather_normal',
    'modeling_period_group': 1,
    'reporting_lower': 137.246310027507,
    'reporting_n': 365,
    'reporting_upper': 121.92101539941,
    'reporting_value': 8758.27781819607,
    'trace_interpretation': 'ALL_FUELS_CONSUMPTION_SUPPLIED'}],
  'eemeter_version': '0.4.6',
  'energy_trace_model_results': [{'cvrmse': 0.264591493635319,
    'derivatives': [{'id': 3,
      'interpretation': 'annualized_weather_normal',
      'lower': 156.418677953027,
      'n': 365,
      'upper': 142.473017350216,
      'value': 11051.6386089923},
     {'id': 4,
      'interpretation': 'gross_predicted',
      'lower': 276.314732221288,
      'n': 1139,
      'upper': 251.679621341039,
      'value': 31806.3708558697}],
    'energy_trace': 7,
    'id': 2,
    'lower': 7.45737856047136,
    'model_serializiation': None,
    'modeling_period': 1,
    'n': 882.0,
    'project_result': 1,
    'r2': 0.880792249035995,
    'rmse': 7.80520906995742,
    'status': 'SUCCESS',
    'upper': 8.18732779805489},
   {'cvrmse': 0.283371252891545,
    'derivatives': [{'id': 1,
      'interpretation': 'annualized_weather_normal',
      'lower': 137.246310027507,
      'n': 365,
      'upper': 121.92101539941,
      'value': 8758.27781819607},
     {'id': 2,
      'interpretation': 'gross_predicted',
      'lower': 242.446604842159,
      'n': 1139,
      'upper': 215.374360422304,
      'value': 25208.1013739325}],
    'energy_trace': 7,
    'id': 1,
    'lower': 6.38163761265412,
    'model_serializiation': None,
    'modeling_period': 2,
    'n': 549.0,
    'project_result': 1,
    'r2': 0.896040924767889,
    'rmse': 6.75884311655434,
    'status': 'SUCCESS',
    'upper': 7.18380019556305}],
  'id': 1,
  'meter_class': 'EnergyEfficiencyMeter',
  'meter_settings': {},
  'modeling_period_groups': [{'baseline_period': 1,
    'id': 1,
    'reporting_period': 2}],
  'modeling_periods': [{'end_date': '2013-06-01T00:00:00Z',
    'id': 1,
    'interpretation': 'BASELINE',
    'start_date': None},
   {'end_date': None,
    'id': 2,
    'interpretation': 'REPORTING',
    'start_date': '2013-07-01T00:00:00Z'}],
  'project': 2,
  'updated': '2016-08-12T22:08:22.767313Z'}]

These results can be interpreted as follows:

  • 'id': the primary key of the project result
  • 'eemeter_version': the version of the eemeter package installed in the datastore
  • 'meter_class': the name of the class of the meter used to obtain the results
  • 'meter_settings': the settings used by the eemeter
  • 'project': the primary key of the project
  • 'modeling_period_groups': pairs of baseline/reporting modeling periods
  • 'modeling_periods': all modeling periods
  • 'energy_trace_model_results': the trace- and modeling-period-specific results
  • 'derivative_aggregations': the aggregated results for this project
  • 'added': when the project result was created
  • 'updated': when the project result was last updated (typically the same as 'added', since results are not updated after creation)

The most useful of these are the 'derivative_aggregations', which include baseline/reporting values for each of the possible trace interpretations.
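For a quick side-by-side view, the derivative aggregations flatten naturally into a table. A minimal sketch using the pandas import from earlier (computing savings as the baseline value minus the reporting value is shown for illustration only):

rows = [
    {
        "interpretation": agg["interpretation"],
        "trace_interpretation": agg["trace_interpretation"],
        "baseline_value": agg["baseline_value"],
        "reporting_value": agg["reporting_value"],
        "savings": agg["baseline_value"] - agg["reporting_value"],  # illustrative only
    }
    for agg in project_results[0]["derivative_aggregations"]
]
pd.DataFrame(rows)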