The datastore is a tool for using the eemeter which automates and helps to scale some of the most frequent tasks accomplished with the eemeter. These tasks include data loading and storage, meter running, and result storage and inspection. It puts a REST API in front of the eemeter.
Note:
For both small and large datasets, the ETL toolkit exists to ease and speed up this process. That toolkit relies upon the API described in this tutorial and in the datastore API documentation. We will not use the ETL toolkit in this tutorial; for more information on it, see its API documentation.
For this tutorial, we will use the Python requests package to make requests to the datastore. We will use the same dataset used in the eemeter tutorial, available for download here:
This tutorial is also available as a Jupyter notebook.
In [1]:
# library imports
import pandas as pd
import requests
import pytz
If you followed the datastore development setup instructions, you will already have run the command to create a superuser and access credentials.
python manage.py dev_seed
If you haven't already done so, do so now. The dev_seed command creates a demo admin user (username demo, password demo-password, API token tokstr) and a sample project.
Ensure that your development server is running locally on port 8000 before continuing.
python manage.py runserver
Each request will include an Authorization header:
Authorization: Bearer tokstr
In [2]:
base_url = "http://0.0.0.0:8000"
headers = {"Authorization": "Bearer tokstr"}
Let's test the API by requesting a list of projects in the datastore. Since the dev_seed command creates a sample project, this will return a response showing that project.
In [3]:
url = base_url + "/api/v1/projects/"
projects = requests.get(url, headers=headers).json()
In [4]:
projects
Out[4]:
Although we'll delete this one in a moment, we can first explore a bit to get a feel for the API. Then we'll create a project of our own.
Energy trace data will be associated with this project by foreign key. It is organized into time series by trace_id, and the following request will show all traces associated with a particular project. Note the difference between the 'id' field and the 'project_id' field: the 'project_id' field is the unique label associated with the project by an external source, while the 'id' field is the database table primary key.
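As a hypothetical illustration (values invented for this example), the two identifiers on a project object relate like this:

# hypothetical project object showing the two identifiers
example_project = {
    "id": 1,             # database primary key, assigned by the datastore
    "project_id": "ABC"  # external label, supplied by the source system
}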
There are two tables used to store energy data: one for consumption metadata and one for consumption records.

Consumption metadata fields:

project_id: foreign key of the project this trace belongs to
trace_id: unique id of the trace
interpretation: the fuel (electricity/natural gas) and the direction and type of flow (net/total consumption, supply, generation)
unit: physical units of the values provided in records

Consumption record fields:

metadata_id: foreign key to consumption metadata
start: the start date and time of the record
value: the energy reading as reported
estimated: boolean indicating whether or not the reading was estimated

For consumption records, the end of each period is implicit in the start of the next record. The value of the last record should be null (if it is not, it will be treated as such).
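To illustrate that convention with hypothetical records (not from the sample dataset), three records define two billing periods, and the final record carries a null value only to close the last period:

# hypothetical monthly records illustrating the implicit-end convention
records = [
    {"start": "2016-01-01T00:00:00+00:00", "value": 1000.0, "estimated": False},
    {"start": "2016-02-01T00:00:00+00:00", "value": 900.0, "estimated": False},
    {"start": "2016-03-01T00:00:00+00:00", "value": None, "estimated": False},
]
# the first record covers Jan 1 to Feb 1, the second Feb 1 to Mar 1;
# the third exists only to mark the end of the second period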
Let's inspect the traces associated with this project. We can do so using the project primary key 'id' as a filter (we use the summary flag so that we don't pull every record):
In [5]:
url = base_url + "/api/v1/consumption_metadatas/?summary=True&projects={}".format(projects[0]['id'])
consumption_metadatas = requests.get(url, headers=headers).json()
In [6]:
consumption_metadatas[0]
Out[6]:
We can also query for consumption records by metadata primary key.
In [7]:
url = base_url + "/api/v1/consumption_records/?metadata={}".format(consumption_metadatas[0]['id'])
consumption_records = requests.get(url, headers=headers).json()
In [8]:
consumption_records[:3]
Out[8]:
Now we'll delete the project that was created by the dev_seed command and make one of our own.
In [9]:
url = base_url + "/api/v1/projects/{}/".format(projects[0]['id'])
requests.delete(url, headers=headers)
Out[9]:
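As a quick sanity check (not part of the original flow), you can re-request the deleted project by its old primary key and expect a 404 status:

# re-request the deleted project; a 404 Not Found confirms the deletion
response = requests.get(url, headers=headers)
print(response.status_code)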
In [10]:
project_data = pd.read_csv('sample-project-data.csv',
parse_dates=['retrofit_start_date', 'retrofit_end_date']).iloc[0]
In [11]:
project_data
Out[11]:
In [12]:
data = {
"project_id": project_data.project_id,
"zipcode": str(project_data.zipcode),
"baseline_period_end": pytz.UTC.localize(project_data.retrofit_start_date).isoformat(),
"reporting_period_start": pytz.UTC.localize(project_data.retrofit_end_date).isoformat(),
"project_owner": 1,
}
print(data)
In [13]:
url = base_url + "/api/v1/projects/"
new_project = requests.post(url, json=data, headers=headers).json()
new_project
Out[13]:
If you try to post another project with the same project_id, you'll get an error message.
In [14]:
url = base_url + "/api/v1/projects/"
requests.post(url, json=data, headers=headers).json()
Out[14]:
However, there is another endpoint you can hit to sync the project - update it if it exists, create it if it doesn't. This endpoint works almost the same way, but expects a list of data in a slightly different format:
In [15]:
data = [
{
"project_id": project_data.project_id,
"zipcode": str(project_data.zipcode),
"baseline_period_end": pytz.UTC.localize(project_data.retrofit_start_date).isoformat(),
"reporting_period_start": pytz.UTC.localize(project_data.retrofit_end_date).isoformat(),
"project_owner_id": 1,
}
]
print(data)
In [16]:
url = base_url + "/api/v1/projects/sync/"
requests.post(url, json=data, headers=headers).json()
Out[16]:
Now we can give this project some consumption data.
In [17]:
energy_data = pd.read_csv('sample-energy-data_project-ABC_zipcode-50321.csv',
parse_dates=['date'], dtype={'zipcode': str})
energy_data.head()
Out[17]:
Then we'll use the sync endpoint for consumption metadata, which will create a new record or update an existing one. We have one trace here:
In [18]:
interpretation_mapping = {"electricity": "E_C_S"}
data = [
{
"project_project_id": energy_data.iloc[0]["project_id"],
"interpretation": interpretation_mapping[energy_data.iloc[0]["fuel"]],
"unit": energy_data.iloc[0]["unit"].upper(),
"label": energy_data.iloc[0]["trace_id"].upper()
}
]
data
Out[18]:
In [19]:
url = base_url + "/api/v1/consumption_metadatas/sync/"
consumption_metadatas = requests.post(url, json=data, headers=headers).json()
In [20]:
consumption_metadatas
Out[20]:
Let's turn that CSV into records.
In [21]:
data = [{
"metadata_id": consumption_metadatas[0]['id'],
"start": pytz.UTC.localize(row.date.to_datetime()).isoformat(),
"value": row.value,
"estimated": row.estimated,
} for _, row in energy_data.iterrows()]
data[:3]
Out[21]:
In [22]:
url = base_url + "/api/v1/consumption_records/sync2/"
consumption_records = requests.post(url, json=data, headers=headers)
In [23]:
consumption_records.text
Out[23]:
We can verify that these records were created by querying by consumption metadata id.
In [24]:
url = base_url + "/api/v1/consumption_records/?metadata={}".format(consumption_metadatas[0]['id'])
consumption_records = requests.get(url, headers=headers).json()
In [25]:
consumption_records[:3]
Out[25]:
We now have a simple project with a single trace of data. Now we will move on to running a meter on that project.
To run a meter, make a request to create a "project run". This request will start a job that runs a meter and saves its results. There are a few components to this request: the primary key of the project, the class of the eemeter meter to run, and any settings to pass to that meter.
In [26]:
data = {
"project": new_project['id'],
"meter_class": "EnergyEfficiencyMeter",
"meter_settings": {}
}
data
Out[26]:
In [27]:
url = base_url + "/api/v1/project_runs/"
project_run = requests.post(url, json=data, headers=headers).json()
project_run
Out[27]:
This creates a task to run the meter on the indicated project. The results can be viewed by requesting the project run by primary key. Before the task has started, its status will be PENDING; while it runs, its status will be RUNNING. As it completes, its status will change to SUCCESS or FAILED; if FAILED, it will include a traceback of the error that occurred.
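Because the meter runs asynchronously, one simple way to wait for a terminal status is to poll the project run endpoint, assuming the status is exposed as a 'status' field on the project run object (a minimal sketch; the polling interval is arbitrary):

import time

url = base_url + "/api/v1/project_runs/{}/".format(project_run['id'])
while True:
    status = requests.get(url, headers=headers).json()['status']
    if status in ('SUCCESS', 'FAILED'):
        break
    time.sleep(5)  # still PENDING or RUNNING; try again shortly
print(status)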
In [28]:
url = base_url + "/api/v1/project_runs/{}/".format(project_run['id'])
project_runs = requests.get(url, headers=headers).json()
project_runs
Out[28]:
In [29]:
url = base_url + "/api/v1/project_results/"
project_results = requests.get(url, headers=headers).json()
project_results
Out[29]:
These results can be interpreted as follows:

'id': the primary key of the project result
'eemeter_version': the version of the eemeter package installed in the datastore
'meter_class': the name of the meter class used to obtain the results
'meter_settings': the settings of the eemeter
'project': the primary key of the project
'modeling_period_groups': pairs of baseline/reporting modeling periods
'modeling_periods': all modeling periods
'energy_trace_model_results': the trace- and modeling-period-specific results
'derivative_aggregations': the aggregated results for this project
'added': when the project result was created
'updated': when the project result was created or updated (should not have been updated)

The most useful of these are the 'derivative_aggregations', which include baseline/reporting values for each of the possible trace interpretations.
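To inspect them, you can iterate over the aggregations of a project result; printing the raw dicts avoids assuming any particular key layout:

# print each aggregation for the first project result
for aggregation in project_results[0]['derivative_aggregations']:
    print(aggregation)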