Processing of ARIS data for Benchmarking Script

Looking at the Fairbanks script, these are the fields that are necessary to do the benchmarking:

'Site ID': This was a 4-letter code identifying the particular facility. For ARIS it doesn't have to be a 4-letter code, just some unique ID.
'From': The start date of the billing period. Any date format works.
'Thru': The end date of the billing period. This could be multiple months after the start date in the case of oil or propane.
'Service Name': This was ['Oil #1', 'Oil #2', 'Electricity', 'Steam', 'Natural Gas', 'Water', 'Sewer', 'Refuse'] for FNSB.
'Item Description': This was the description of the particular line item on the bill, like "Customer Charge", "Cost Adjustments", "Demand Charge" plus a very wide variety of other types of charges. In the script, I combined all the miscellaneous charges into "Other Charge". These were billing line items that had no Usage associated with them.
'Usage': For usage-based charges, like kWh, kW, gallons, etc., this contains the usage. For "Other Charges" (fixed charges), this is blank.
'Cost': $ cost of the line item.
'Units': For usage-based charges, the units. These are the units that appeared in the Fairbanks data: 'Gallons', 'kWh', 'klbs', 'lbs', 'MMBtu', 'kVAR', 'kW', 'CCF', 'kVARh', 'Loads', 'Cgallons', 'kGal', 'Tons'.

So if we can get the ARIS data into this format, keeping Service Name, Units, and some of the Item Descriptions the same, the script should run. As for Item Descriptions that need to stay the same, I think only peak electric demand needs to be labeled like Fairbanks; it can be any one of the following: 'KW Charge', 'On peak demand', 'Demand Charge'.
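As a concrete sketch of the target format, here are a few hypothetical rows. The Site ID, dates, and amounts are invented for illustration; only the column names and value conventions follow the description above:

```python
import pandas as pd

# Hypothetical example rows showing the minimum fields the benchmarking
# script needs.  Note the Oil #1 row spans multiple months (a fill-up),
# and the Demand Charge row uses the 'kW' units the script looks for.
records = [
    {'Site ID': 'LIB1', 'From': '2017-01-15', 'Thru': '2017-02-15',
     'Service Name': 'Electricity', 'Item Description': 'Energy charge',
     'Usage': 4120.0, 'Cost': 512.30, 'Units': 'kWh'},
    {'Site ID': 'LIB1', 'From': '2017-01-15', 'Thru': '2017-02-15',
     'Service Name': 'Electricity', 'Item Description': 'Demand Charge',
     'Usage': 21.5, 'Cost': 180.00, 'Units': 'kW'},
    {'Site ID': 'LIB1', 'From': '2016-11-02', 'Thru': '2017-02-10',
     'Service Name': 'Oil #1', 'Item Description': 'Energy',
     'Usage': 850.0, 'Cost': 2105.00, 'Units': 'Gallons'},
]
df = pd.DataFrame(records)
# Any parseable date format works for From/Thru
df[['From', 'Thru']] = df[['From', 'Thru']].apply(pd.to_datetime)
print(df)
```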

To Do

  • Standardize values in each field to match FNSB values.
    • To be safe, change the spelling of units to match (may not really be necessary, although there are some specific tests for CCF).
    • Map 'Service Type' to categories and then only do processing based on those categories.
  • Think more about how long of a billing period is too long.
  • Document the Zip-to-DD_site file. Tell people that Zip code is needed to determine degree-day site. Have them look in Buildings.xlsx file to see if a Degree-day site was assigned.
  • Find degree-day sites for the missing zip codes.
  • Add a README.md in the Data directory to explain what the files are.
  • Update the main README.md.
  • Add a section to the Settings File to accommodate ARIS passwords/URL and other ARIS-specific settings.
  • Incorporate new graphs into the benchmark.py script, including creation of new Template.
  • Labels in Existing Template Files
    • 'FNSB' appears in Energy Index Comparison.
    • No Propane or Wood present in headings.
    • Oil #2 and #1 probably should be combined and labeled "Fuel Oil", "Heating Oil", or "Oil".
    • Hot Water District Heat and Steam should probably be combined.
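A possible sketch for the first two To Do items (standardizing spellings and mapping service types). The ARIS-side labels are taken from the value_counts() output later in this notebook, and the target spellings from the FNSB lists above; folding 'Hot Wtr District Ht' into 'Steam' follows the note about combining district heat, but both that and the specific target labels are only assumptions here:

```python
# Assumed mapping from ARIS EnergyTypeName values to FNSB Service Names.
service_map = {
    '#1 Fuel Oil': 'Oil #1',
    '#2 Fuel Oil': 'Oil #2',
    'Electric': 'Electricity',
    'Natural Gas': 'Natural Gas',
    'Steam District Ht': 'Steam',
    'Hot Wtr District Ht': 'Steam',   # assumption: combine district heat
    'Propane': 'Propane',
    'Spruce Wood': 'Wood',
    'Coal': 'Coal',
}

# Assumed respelling of ARIS units to match the FNSB unit strings
# (e.g. the script has specific tests for some units).
unit_map = {
    'Gallon': 'Gallons',
    'Ton': 'Tons',
    'thousand pounds': 'klbs',
}

def standardize(service, unit):
    """Return (service, unit) spelled the way the FNSB script expects;
    values with no mapping pass through unchanged."""
    return service_map.get(service, service), unit_map.get(unit, unit)

print(standardize('#1 Fuel Oil', 'Gallon'))   # → ('Oil #1', 'Gallons')
```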

In [1]:
import time
from datetime import timedelta
import pandas as pd
import numpy as np
import requests

###################### Matplotlib #######################
# import matplotlib pyplot commands
from matplotlib.pyplot import *
 
# Show Plots in the Notebook
%matplotlib inline
 
# 'style' the plot like fivethirtyeight.com website
style.use('bmh')

In [2]:
rcParams['figure.figsize']= (10, 8)   # set Chart Size
rcParams['font.size'] = 14            # set Font size in Chart

Read Data from ARIS API


In [3]:
# URLs and Passwords
my_username = 'buildingenergyapp'
my_password = 'buildingenergyappTest1!'
base_url = 'http://arisapi.test.ahfc.us/api/buildingenergy'
building_list = '/GetBuildingList'
building_energy_detail = '/GetBuildingEnergyDetail'
my_params = {'username': my_username,
             'password':my_password}
building_list_url = base_url + building_list
building_energy_url = base_url + building_energy_detail

In [4]:
results = requests.post(building_list_url, params=my_params).json()
df_bldgs = pd.DataFrame(results)
df_bldgs.head()


Out[4]:
BuildingAddress BuildingCity BuildingEnergyDetailList BuildingId BuildingName BuildingOwnerId BuildingOwnerName BuildingState BuildingStreet BuildingUsageName BuildingZip OrganizationId OrganizationName SquareFeet YearBuilt
0 207 Spruce Dr PORT LIONS AK 99615 PORT LIONS None 44 Library 44 City of Port Lions AK 207 Spruce Dr Office 99615 2.0 Municipal Government/Subdivision 2000.0 1988.0
1 441 Bayview PORT LIONS AK 99550 PORT LIONS None 45 Public Safety Building 45 City of Port Lions AK 441 Bayview Office 99550 2.0 Municipal Government/Subdivision 1700.0 1980.0
2 443 Bayview Dr PORT LIONS AK 99550 PORT LIONS None 46 Water Building 46 City of Port Lions AK 443 Bayview Dr Other 99550 2.0 Municipal Government/Subdivision 1200.0 2008.0
3 177 N. Birch SOLDOTNA AK 99669 SOLDOTNA None 47 City Hall 47 City of Soldotna AK 177 N. Birch Office 99669 2.0 Municipal Government/Subdivision 11676.0 1986.0
4 235 N. Binkley SOLDOTNA AK 99669 SOLDOTNA None 48 Public Library 48 City of Soldotna AK 235 N. Binkley Other 99669 2.0 Municipal Government/Subdivision 10727.0 1972.0

In [5]:
# Add a Degree-Day Site column by looking up via zip code
df_zip_to_dd = pd.read_excel('../data/Zip_to_DD_Site.xlsx', skiprows=4)
df_zip_to_dd['zip_code'] = df_zip_to_dd.zip_code.astype(str)
print(df_zip_to_dd.head())
zip_to_dd = dict(zip(df_zip_to_dd.zip_code, df_zip_to_dd.dd_site))
zip_to_dd.get('99645')


  zip_code dd_site
0    99553    PADU
1    99571    PACD
2    99583    PACD
3    99612    PACD
4    99661    PASD
Out[5]:
'PAAQ'

In [6]:
df_bldgs['dd_site'] = df_bldgs.BuildingZip.map(zip_to_dd)
df_bldgs.head()


Out[6]:
BuildingAddress BuildingCity BuildingEnergyDetailList BuildingId BuildingName BuildingOwnerId BuildingOwnerName BuildingState BuildingStreet BuildingUsageName BuildingZip OrganizationId OrganizationName SquareFeet YearBuilt dd_site
0 207 Spruce Dr PORT LIONS AK 99615 PORT LIONS None 44 Library 44 City of Port Lions AK 207 Spruce Dr Office 99615 2.0 Municipal Government/Subdivision 2000.0 1988.0 PADQ
1 441 Bayview PORT LIONS AK 99550 PORT LIONS None 45 Public Safety Building 45 City of Port Lions AK 441 Bayview Office 99550 2.0 Municipal Government/Subdivision 1700.0 1980.0 PADQ
2 443 Bayview Dr PORT LIONS AK 99550 PORT LIONS None 46 Water Building 46 City of Port Lions AK 443 Bayview Dr Other 99550 2.0 Municipal Government/Subdivision 1200.0 2008.0 PADQ
3 177 N. Birch SOLDOTNA AK 99669 SOLDOTNA None 47 City Hall 47 City of Soldotna AK 177 N. Birch Office 99669 2.0 Municipal Government/Subdivision 11676.0 1986.0 PAEN
4 235 N. Binkley SOLDOTNA AK 99669 SOLDOTNA None 48 Public Library 48 City of Soldotna AK 235 N. Binkley Other 99669 2.0 Municipal Government/Subdivision 10727.0 1972.0 PAEN

In [7]:
# Check that all zip codes produced a degree-day site; NO
print(sum(df_bldgs.BuildingZip.isna()), sum(df_bldgs.dd_site.isna()))

# So need to find the zip codes that don't map to a Degree-Day site.
# 'dd_site != dd_site' is a hack for finding NaN values (NaN != NaN is True).
df_no_map = df_bldgs.query('(BuildingZip > "") and (dd_site != dd_site)')
df_no_map.BuildingZip.unique()


174 248
Out[7]:
array(['99577', '99506', '99567', '99587', '99505', '99642', '99779',
       '     ', '99783', '99707', '99109', '99725', '99373'], dtype=object)

In [8]:
col_map = [
    ('BuildingId', 'site_id'),
    ('BuildingName', 'site_name'),
    ('BuildingOwnerName', 'site_category'),
    ('BuildingStreet', 'address'),
    ('BuildingCity', 'city'),
    ('BuildingUsageName', 'primary_func'),
    ('YearBuilt', 'year_built'),
    ('SquareFeet', 'sq_ft'),
    ('dd_site', 'dd_site')
]
old_cols, new_cols = zip(*col_map)
df_bldgs2 = df_bldgs[list(old_cols)].copy()
df_bldgs2.columns = new_cols
df_bldgs2['onsite_gen'] = ''    # not used
df_bldgs2.to_excel('data/Buildings.xlsx', startrow=3, index=False)
df_bldgs2.head()


Out[8]:
site_id site_name site_category address city primary_func year_built sq_ft dd_site onsite_gen
0 44 Library City of Port Lions 207 Spruce Dr PORT LIONS Office 1988.0 2000.0 PADQ
1 45 Public Safety Building City of Port Lions 441 Bayview PORT LIONS Office 1980.0 1700.0 PADQ
2 46 Water Building City of Port Lions 443 Bayview Dr PORT LIONS Other 2008.0 1200.0 PADQ
3 47 City Hall City of Soldotna 177 N. Birch SOLDOTNA Office 1986.0 11676.0 PAEN
4 48 Public Library City of Soldotna 235 N. Binkley SOLDOTNA Other 1972.0 10727.0 PAEN

In [9]:
# Now work on the detailed records
print(len(df_bldgs2.site_id.unique()))
my_data = {'username': my_username,
           'password':my_password,
           'buildingId':44}

detail = requests.post(building_energy_url, data=my_data).json()
df_detail = pd.DataFrame(detail['BuildingEnergyDetailList'])
df_detail['UsageDate'] = pd.to_datetime(df_detail.UsageDate)
df_detail.head()


2767
Out[9]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeId EnergyTypeName EnergyUnitId EnergyUnitTypeName MeterReadDate UsageDate UsageYear
0 44 None None 81.0 412.0 1 Electric 1 kWh None 2009-01-01 2009
1 44 None None 99.0 538.0 1 Electric 1 kWh None 2009-02-01 2009
2 44 None None 81.0 432.0 1 Electric 1 kWh None 2009-03-01 2009
3 44 None None 78.0 406.0 1 Electric 1 kWh None 2009-04-01 2009
4 44 None None 116.0 771.0 1 Electric 1 kWh None 2009-05-01 2009

In [86]:
my_data = {'username': my_username,
           'password': my_password,
           'buildingId': None}
dfd = None
next_prn = time.time()
for bldg_id in df_bldgs2.site_id.unique():
    my_data['buildingId'] = bldg_id
    detail = requests.post(building_energy_url, data=my_data).json()
    if len(detail['BuildingEnergyDetailList']):
        df_detail = pd.DataFrame(detail['BuildingEnergyDetailList'])
        if dfd is not None:
            # DataFrame.append was removed in pandas 2.x; use pd.concat there.
            dfd = dfd.append(df_detail, ignore_index=True)
        else:
            dfd = df_detail.copy()
        if time.time() > next_prn:
            print('{:,} records fetched'.format(len(dfd)))
            next_prn += 10.0   # print progress at most every 10 seconds
dfd = dfd.apply(pd.to_numeric, errors='ignore')
dfd[['UsageDate', 'MeterReadDate']] = dfd[['UsageDate', 'MeterReadDate']].apply(pd.to_datetime)
dfd.to_pickle('dfd.pkl')


48 records fetched
2,520 records fetched
4,920 records fetched
6,886 records fetched
8,743 records fetched
[KeyboardInterrupt traceback through requests/urllib3/http.client omitted — the fetch loop was manually interrupted partway through.]

In [11]:
dfd.head()


Out[11]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeId EnergyTypeName EnergyUnitId EnergyUnitTypeName MeterReadDate UsageDate UsageYear
0 44 NaN NaN 81.0 412.0 1 Electric 1 kWh NaT 2009-01-01 2009
1 44 NaN NaN 99.0 538.0 1 Electric 1 kWh NaT 2009-02-01 2009
2 44 NaN NaN 81.0 432.0 1 Electric 1 kWh NaT 2009-03-01 2009
3 44 NaN NaN 78.0 406.0 1 Electric 1 kWh NaT 2009-04-01 2009
4 44 NaN NaN 116.0 771.0 1 Electric 1 kWh NaT 2009-05-01 2009

In [12]:
len(dfd)


Out[12]:
67787

In [40]:
dfd.to_pickle('dfd.pkl')
dfd.head()


Out[40]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeId EnergyTypeName EnergyUnitId EnergyUnitTypeName MeterReadDate UsageDate UsageYear
0 44 NaN NaN 81.0 412.0 1 Electric 1 kWh NaT 2009-01-01 2009
1 44 NaN NaN 99.0 538.0 1 Electric 1 kWh NaT 2009-02-01 2009
2 44 NaN NaN 81.0 432.0 1 Electric 1 kWh NaT 2009-03-01 2009
3 44 NaN NaN 78.0 406.0 1 Electric 1 kWh NaT 2009-04-01 2009
4 44 NaN NaN 116.0 771.0 1 Electric 1 kWh NaT 2009-05-01 2009

In [41]:
dfd.dtypes


Out[41]:
BuildingId                     int64
DemandCost                   float64
DemandUse                    float64
DollarCost                   float64
EnergyQuantity               float64
EnergyTypeId                   int64
EnergyTypeName                object
EnergyUnitId                   int64
EnergyUnitTypeName            object
MeterReadDate         datetime64[ns]
UsageDate             datetime64[ns]
UsageYear                      int64
dtype: object

Process ARIS API Data into a Form Usable by the FNSB Script


In [15]:
# All of the columns in the FNSB CSV file:
fnsb_cols = ["Site ID","Site Name","Vendor Code","Vendor Name","Account Number","Bill Date","Due Date","Entry Date","Invoice #","Voucher #","From","Thru","Service Name","Item Description","Meter Number","Usage","Cost","Units","Account Financial Code","Site Financial Code"]
fnsb_cols


Out[15]:
['Site ID',
 'Site Name',
 'Vendor Code',
 'Vendor Name',
 'Account Number',
 'Bill Date',
 'Due Date',
 'Entry Date',
 'Invoice #',
 'Voucher #',
 'From',
 'Thru',
 'Service Name',
 'Item Description',
 'Meter Number',
 'Usage',
 'Cost',
 'Units',
 'Account Financial Code',
 'Site Financial Code']

In [3]:
dfd = pd.read_pickle('dfd.pkl')
dfd = dfd.apply(pd.to_numeric, errors='ignore')
dfd[['UsageDate', 'MeterReadDate']] = dfd[['UsageDate', 'MeterReadDate']].apply(pd.to_datetime)

dfd.head()


Out[3]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeId EnergyTypeName EnergyUnitId EnergyUnitTypeName MeterReadDate UsageDate UsageYear
0 44 NaN NaN 81.0 412.0 1 Electric 1 kWh NaT 2009-01-01 2009
1 44 NaN NaN 99.0 538.0 1 Electric 1 kWh NaT 2009-02-01 2009
2 44 NaN NaN 81.0 432.0 1 Electric 1 kWh NaT 2009-03-01 2009
3 44 NaN NaN 78.0 406.0 1 Electric 1 kWh NaT 2009-04-01 2009
4 44 NaN NaN 116.0 771.0 1 Electric 1 kWh NaT 2009-05-01 2009

In [76]:
# Get rid of unneeded columns
dfd.drop(columns=['EnergyTypeId', 'EnergyUnitId', 'UsageYear'], inplace=True)
dfd.head()


Out[76]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName MeterReadDate UsageDate
0 44 NaN NaN 81.0 412.0 Electric kWh NaT 2009-01-01
1 44 NaN NaN 99.0 538.0 Electric kWh NaT 2009-02-01
2 44 NaN NaN 81.0 432.0 Electric kWh NaT 2009-03-01
3 44 NaN NaN 78.0 406.0 Electric kWh NaT 2009-04-01
4 44 NaN NaN 116.0 771.0 Electric kWh NaT 2009-05-01

In [5]:
dfd.query('EnergyTypeName=="Coal"').BuildingId.unique()


Out[5]:
array([ 1026,  1125,   423,     4,  1240,  1241, 13805], dtype=int64)

In [6]:
# The dictionary that renames the columns to names needed 
# by the benchmarking script
col_map = {
    'BuildingId': 'Site ID',
    'EnergyTypeName': 'Service Name',
    'EnergyUnitTypeName': 'Units',
    'EnergyQuantity': 'Usage',
    'DollarCost': 'Cost',
}
def add_to_final(df_to_add):
    """Rename the columns of df_to_add to the benchmarking-script names,
    drop the unneeded columns, and append the result to the global df_final."""
    global df_final
    df_add = df_to_add.copy()
    df_add.rename(columns=col_map, inplace=True)
    df_add.drop(columns=['DemandUse', 'DemandCost', 'UsageDate', 'MeterReadDate'], inplace=True)
    df_final = df_final.append(df_add, ignore_index=True)

In [7]:
# For the usage end date, 'Thru', use the MeterReadDate if available, otherwise
# use the middle of the UsageDate month.
def thru_date(row):
    if pd.isnull(row.MeterReadDate):
        return row.UsageDate.replace(day=15)
    else:
        return row.MeterReadDate
dfd['Thru'] = dfd.apply(thru_date, axis=1)
dfd.head()


Out[7]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName MeterReadDate UsageDate Thru
0 44 NaN NaN 81.0 412.0 Electric kWh NaT 2009-01-01 2009-01-15
1 44 NaN NaN 99.0 538.0 Electric kWh NaT 2009-02-01 2009-02-15
2 44 NaN NaN 81.0 432.0 Electric kWh NaT 2009-03-01 2009-03-15
3 44 NaN NaN 78.0 406.0 Electric kWh NaT 2009-04-01 2009-04-15
4 44 NaN NaN 116.0 771.0 Electric kWh NaT 2009-05-01 2009-05-15

In [8]:
dfd.query('MeterReadDate > "2000-01-01"').head()


Out[8]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName MeterReadDate UsageDate Thru
21255 1 0.00 0.00 3909.87 4085.0 Natural Gas CCF 2014-01-15 2014-01-01 2014-01-15
21256 1 3752.91 221.28 3959.78 82000.0 Electric kWh 2014-01-21 2014-01-01 2014-01-21
21257 1 0.00 0.00 2716.99 2712.0 Natural Gas CCF 2014-02-15 2014-02-01 2014-02-15
21258 1 3752.91 221.28 3921.15 81200.0 Electric kWh 2014-02-21 2014-02-01 2014-02-21
21259 1 0.00 0.00 3223.50 3295.0 Natural Gas CCF 2014-03-15 2014-03-01 2014-03-15

In [9]:
dfd.EnergyTypeName.value_counts()


Out[9]:
Electric               36813
#1 Fuel Oil            14547
Natural Gas            10815
Propane                 1560
#2 Fuel Oil             1553
Steam District Ht       1542
Hot Wtr District Ht      657
Coal                     168
Spruce Wood              120
Demand - Electric         12
Name: EnergyTypeName, dtype: int64

In [10]:
dfd.query('EnergyTypeName=="Demand - Electric"')


Out[10]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName MeterReadDate UsageDate Thru
45174 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-07-01 2016-07-15
45175 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-08-01 2016-08-15
45176 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-09-01 2016-09-15
45177 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-10-01 2016-10-15
45178 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-11-01 2016-11-15
45179 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-12-01 2016-12-15
45180 13650 0.0 0.0 440.79 39039.0 Demand - Electric kWh NaT 2016-01-01 2016-01-15
45181 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-02-01 2016-02-15
45182 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-03-01 2016-03-15
45183 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-04-01 2016-04-15
45184 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-05-01 2016-05-15
45185 13650 0.0 0.0 0.00 0.0 Demand - Electric kWh NaT 2016-06-01 2016-06-15

In [11]:
# Change these to 'Electric'
dfd.loc[dfd.EnergyTypeName == 'Demand - Electric', 'EnergyTypeName'] = 'Electric'
dfd.EnergyTypeName.value_counts()


Out[11]:
Electric               36825
#1 Fuel Oil            14547
Natural Gas            10815
Propane                 1560
#2 Fuel Oil             1553
Steam District Ht       1542
Hot Wtr District Ht      657
Coal                     168
Spruce Wood              120
Name: EnergyTypeName, dtype: int64

In [12]:
# There are a number of records where the EnergyQuantity is 0 or NaN,
# which probably occurs because someone doesn't have the bill for that 
# month or there was no fuel fill-up in that month.  We will eliminate
# those records, because they distort the period over which fuel usage
# occurred for sporadically bought fuels like oil and wood.  For 
# monthly-billed fuels, we will later in the code make sure that the 
# From - Thru billing period only covers 1 month.

# Start by converting 0s to NaN to make future tests easier.
dfd.loc[dfd.EnergyQuantity == 0.0, 'EnergyQuantity'] = np.NaN
dfd.loc[dfd.DemandUse == 0.0, 'DemandUse'] = np.NaN

# Also found that there were a bunch of -1.0 values for DemandUse that
# are very likely not valid.
dfd.loc[dfd.DemandUse == -1.0, 'DemandUse'] = np.NaN

dfd.query('(EnergyQuantity == 0.0) or (DemandUse == 0.0) or (DemandUse == -1.0)')   # should be no records


Out[12]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName MeterReadDate UsageDate Thru

In [13]:
# Now filter down to just the records where we have a number for
# either EnergyQuantity or DemandUse.  
print(len(dfd))
mask = ~(dfd.EnergyQuantity.isnull() & dfd.DemandUse.isnull())
dfd = dfd[mask].copy()
print(len(dfd))


67787
56873

In [14]:
# These have Demand but no energy
dfd.query('EnergyQuantity != EnergyQuantity').describe()


Out[14]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity
count 53.000000 53.000000 53.000000 53.000000 0.0
mean 813.698113 177.547925 4.778302 43.115094 NaN
std 2.197769 64.323284 2.938123 6.566051 NaN
min 798.000000 1.510000 0.240000 15.180000 NaN
25% 814.000000 182.510000 2.130000 38.380000 NaN
50% 814.000000 182.510000 6.800000 42.420000 NaN
75% 814.000000 237.360000 8.000000 48.730000 NaN
max 814.000000 237.360000 8.000000 52.760000 NaN

In [27]:
# Fill out the From date by using the Thru date from the prior bill 
# for the building and for the particular fuel type
df_final = None
for gp, recs in dfd.groupby(['BuildingId', 'EnergyTypeName']):
    recs = recs.sort_values(['Thru']).copy()
    # Start date comes from prior record
    recs['From'] = recs.Thru.shift(1)
    recs['Item Description'] =  'Energy'
    if df_final is None:
        df_final = recs.copy()
    else:
        df_final = df_final.append(recs, ignore_index=True)
len(df_final)


Out[27]:
56873

In [49]:
# For the services that are normally billed on a monthly basis, fill out
# any missing From dates (e.g. the first bill for a building) with a value
# 30 days prior to Thru.  Also, restrict the Thru - From difference to 25 to 35 days.
# If it is outside that range, set to Thru - 30 days.

# Fuel types that are normally billed on a monthly basis
mo_fuels = ['Electric', 'Natural Gas', 'Steam District Ht', 'Hot Wtr District Ht']
mask_mo = df_final.EnergyTypeName.isin(mo_fuels)

# Find records of that type that have NaT for From date and 
# set to 30 days prior to Thru
df_final.loc[mask_mo & df_final.From.isnull(), 'From'] = df_final.Thru - timedelta(days=30)

In [45]:
# Now find any records where Thru - From falls outside the 25 - 35 day window and fix those.
# Perhaps they are buildings where there are two separate electric bills.
bill_len = df_final.Thru - df_final.From
mask2 = mask_mo & ((bill_len < timedelta(days=25)) | (bill_len > timedelta(days=35)))
df_final.loc[mask2, 'From'] = df_final.Thru - timedelta(days=30)

In [47]:
print(len(df_final[mask2]))
df_final[mask2].head()


2243
Out[47]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName MeterReadDate UsageDate Thru From Item Description
24 1 NaN NaN 10561.95 91200.0 Electric kWh NaT 2011-03-01 2011-03-15 2011-02-13 Energy
34 1 2973.58 NaN 4814.98 133600.0 Electric kWh NaT 2012-03-01 2012-03-15 2012-02-14 Energy
56 1 3752.91 221.28 3959.78 82000.0 Electric kWh 2014-01-21 2014-01-01 2014-01-21 2013-12-22 Energy
128 1 NaN NaN 653.77 395.0 Natural Gas CCF NaT 2012-04-01 2012-04-15 2012-03-16 Energy
164 1 0.00 NaN 4043.89 4145.0 Natural Gas CCF 2015-04-17 2015-04-01 2015-04-17 2015-03-18 Energy

In [57]:
# Now work on the fuel types that are not billed monthly. Some of these records
# have NaT for the From date because they were the first record for the building
# and a particular fuel type.  We will ultimately delete these.  In this step
# find sporadically billed records that have a billing length of greater than 450
# days and put a NaT in for From, so that deleting all From==NaT records will catch
# them as well. A billing period of more than 450 days probably indicates that a fuel
# fill was missed, making the record invalid.
mask_sporadic = ~mask_mo
mask3 = mask_sporadic & (bill_len > timedelta(days=450))
df_final.loc[mask3, 'From'] = pd.NaT
len(df_final)


Out[57]:
56873

In [59]:
# Now eliminate all the sporadically billed records that have a From
# with a NaT
mask_elim = (mask_sporadic & df_final.From.isnull())
df_final = df_final[~mask_elim].copy()
len(df_final)


Out[59]:
56085

In [60]:
# Now add the Electric Demand Charge records.  The From-Thru dates on these
# have already been set. The demand quantity and cost
# appear in separate, dedicated columns, but we will move them to the 'EnergyQuantity'
# and 'DollarCost' columns.
df_demand = df_final.query('DemandUse > 0 and EnergyTypeName=="Electric"').copy()
df_demand['EnergyQuantity'] = df_demand.DemandUse
df_demand['DollarCost'] =  df_demand.DemandCost
df_demand['EnergyUnitTypeName'] = 'kW'
df_demand['Item Description'] = 'Demand Charge'

# add these to the final DataFrame
df_final = df_final.append(df_demand, ignore_index=True)
len(df_final)


Out[60]:
59965

In [61]:
df_final.columns


Out[61]:
Index(['BuildingId', 'DemandCost', 'DemandUse', 'DollarCost', 'EnergyQuantity',
       'EnergyTypeName', 'EnergyUnitTypeName', 'MeterReadDate', 'UsageDate',
       'Thru', 'From', 'Item Description'],
      dtype='object')

In [62]:
# Eliminate the columns that are not needed
df_final.drop(columns=['DemandCost', 'DemandUse', 'MeterReadDate', 'UsageDate'], inplace=True)
df_final.head()


Out[62]:
BuildingId DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName Thru From Item Description
0 1 10472.0 129800.0 Electric kWh 2009-01-15 2008-12-16 Energy
1 1 9768.0 117600.0 Electric kWh 2009-02-15 2009-01-15 Energy
2 1 11883.0 116000.0 Electric kWh 2009-03-15 2009-02-15 Energy
3 1 15421.0 158800.0 Electric kWh 2009-04-15 2009-03-15 Energy
4 1 13802.0 137400.0 Electric kWh 2009-05-15 2009-04-15 Energy

In [65]:
df_final.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59965 entries, 0 to 59964
Data columns (total 8 columns):
BuildingId            59965 non-null int64
DollarCost            58769 non-null float64
EnergyQuantity        59912 non-null float64
EnergyTypeName        59965 non-null object
EnergyUnitTypeName    59965 non-null object
Thru                  59965 non-null datetime64[ns]
From                  59965 non-null datetime64[ns]
Item Description      59965 non-null object
dtypes: datetime64[ns](2), float64(2), int64(1), object(3)
memory usage: 3.7+ MB

In [66]:
col_map = {
    'BuildingId': 'Site ID',
    'EnergyTypeName': 'Service Name',
    'EnergyUnitTypeName': 'Units',
    'EnergyQuantity': 'Usage',
    'DollarCost': 'Cost',
}
df_final.rename(col_map, axis=1, inplace=True)
df_final.head()


Out[66]:
Site ID Cost Usage Service Name Units Thru From Item Description
0 1 10472.0 129800.0 Electric kWh 2009-01-15 2008-12-16 Energy
1 1 9768.0 117600.0 Electric kWh 2009-02-15 2009-01-15 Energy
2 1 11883.0 116000.0 Electric kWh 2009-03-15 2009-02-15 Energy
3 1 15421.0 158800.0 Electric kWh 2009-04-15 2009-03-15 Energy
4 1 13802.0 137400.0 Electric kWh 2009-05-15 2009-04-15 Energy

In [67]:
df_final['Units'].value_counts()


Out[67]:
kWh                33976
Gallon             10275
CCF                 9660
kW                  3880
thousand pounds     1377
Btu                  609
Ton                   96
Cord                  92
Name: Units, dtype: int64

In [68]:
df_final['Service Name'].value_counts()


Out[68]:
Electric               37856
Natural Gas             9660
#1 Fuel Oil             8918
Steam District Ht       1377
#2 Fuel Oil              900
Hot Wtr District Ht      609
Propane                  457
Coal                      96
Spruce Wood               92
Name: Service Name, dtype: int64

In [69]:
df_final.dtypes


Out[69]:
Site ID                      int64
Cost                       float64
Usage                      float64
Service Name                object
Units                       object
Thru                datetime64[ns]
From                datetime64[ns]
Item Description            object
dtype: object

In [70]:
set(zip(df_final['Service Name'], df_final['Units']))


Out[70]:
{('#1 Fuel Oil', 'Gallon'),
 ('#2 Fuel Oil', 'Gallon'),
 ('Coal', 'Ton'),
 ('Electric', 'kW'),
 ('Electric', 'kWh'),
 ('Hot Wtr District Ht', 'Btu'),
 ('Natural Gas', 'CCF'),
 ('Propane', 'Gallon'),
 ('Spruce Wood', 'Cord'),
 ('Steam District Ht', 'thousand pounds')}

In [71]:
df_final['Item Description'].value_counts()


Out[71]:
Energy           56085
Demand Charge     3880
Name: Item Description, dtype: int64

In [72]:
# These fields are used in the report summarizing vendors.
df_final['Account Number'] = ''
df_final['Vendor Name'] = ''

In [73]:
df_final.groupby('Service Name').sum()


Out[73]:
Site ID Cost Usage
Service Name
#1 Fuel Oil 61030333 4.069266e+07 1.402547e+07
#2 Fuel Oil 5628457 3.815718e+06 1.701116e+06
Coal 381679 9.760000e+03 7.705000e+02
Electric 225508046 1.660733e+08 1.105300e+09
Hot Wtr District Ht 2494589 1.556914e+06 2.750714e+10
Natural Gas 28425111 4.985783e+07 5.107488e+07
Propane 1830040 4.876821e+05 2.641949e+05
Spruce Wood 376717 -2.300000e+01 6.200000e+02
Steam District Ht 13588686 6.975902e+06 8.634288e+07

In [74]:
df_final.to_pickle('df_final.pkl')
df_final.to_csv('df_final.csv')

In [78]:
# FNSB Processed Data
dfu3 = pd.read_pickle('../df_processed.pkl')
dfu3.head()


Out[78]:
site_id service_type cal_year cal_mo item_desc units cost days_served usage fiscal_year fiscal_mo mmbtu
0 03 Electricity 2005 12 Electricity charge kWh 1904.657880 49.5 14790.748577 2006 6 50.466034
1 03 Electricity 2006 1 Electricity charge kWh 5430.493797 93.0 42665.790911 2006 7 145.575679
2 03 Electricity 2006 2 Electricity charge kWh 5764.406730 84.0 45010.439348 2006 8 153.575619
3 03 Electricity 2006 3 Electricity charge kWh 6349.255299 93.0 46311.547557 2006 9 158.015000
4 03 Electricity 2006 4 Electricity charge kWh 5529.385224 90.0 40392.812893 2006 10 137.820278

In [79]:
dfu3.service_type.value_counts()


Out[79]:
Electricity    24817
Water          16480
Oil #1         10533
Sewer          10175
Refuse          7266
Natural Gas     3113
Steam           1882
Oil #2             2
Name: service_type, dtype: int64

In [82]:
dfu3.units.value_counts()


Out[82]:
-           36661
Gallons     16831
kWh         12369
Loads        3215
kW           2237
CCF          1146
klbs          768
MMBtu         499
kVAR          214
lbs           195
kGal           93
Tons           31
kVARh           6
Cgallons        3
Name: units, dtype: int64

In [91]:
df_raw = pd.read_pickle('../df_raw.pkl')
df_raw[df_raw['Service Name']=='Electricity'].head()


Out[91]:
Site ID Site Name Vendor Code Vendor Name Account Number Bill Date Due Date Entry Date Invoice # Voucher # From Thru Service Name Item Description Meter Number Usage Cost Units Account Financial Code Site Financial Code
22 CLXGP2 FNSB-CLRA River Park General Area VU797000 Army Corps of Engineers 01 River Park 08/01/2006 08/01/2006 11/14/2008 NaN NaN 2006-07-01 2006-07-31 Electricity Energy charge NaN 110.0 31.65 kWh 61831.0 NaN
23 CLXGP2 FNSB-CLRA River Park General Area VU797000 Army Corps of Engineers 01 River Park 08/31/2006 08/31/2006 11/14/2008 NaN NaN 2006-07-31 2006-08-31 Electricity Energy charge NaN 600.0 172.65 kWh 61831.0 NaN
24 CLXGP2 FNSB-CLRA River Park General Area VU797000 Army Corps of Engineers 01 River Park 09/30/2006 09/30/2006 11/14/2008 NaN NaN 2006-08-31 2006-09-30 Electricity Energy charge NaN 930.0 267.60 kWh 61831.0 NaN
25 CLXGP2 FNSB-CLRA River Park General Area VU797000 Army Corps of Engineers 01 River Park 10/31/2006 10/31/2006 11/14/2008 NaN NaN 2006-09-30 2006-10-31 Electricity Energy charge NaN 0.0 0.00 NaN 61831.0 NaN
26 CLXGP2 FNSB-CLRA River Park General Area VU797000 Army Corps of Engineers 01 River Park 11/30/2006 11/30/2006 11/14/2008 NaN NaN 2006-10-31 2006-11-30 Electricity Energy charge NaN 0.0 0.00 NaN 61831.0 NaN

In [2]:
df_mo = pd.read_pickle('../df_mo.pkl')
df_mo.head()


Out[2]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName MeterReadDate UsageDate Thru
0 44 NaN NaN 81.0 412.0 Electric kWh NaT 2009-01-01 2009-01-15
1 44 NaN NaN 99.0 538.0 Electric kWh NaT 2009-02-01 2009-02-15
2 44 NaN NaN 81.0 432.0 Electric kWh NaT 2009-03-01 2009-03-15
3 44 NaN NaN 78.0 406.0 Electric kWh NaT 2009-04-01 2009-04-15
4 44 NaN NaN 116.0 771.0 Electric kWh NaT 2009-05-01 2009-05-15

In [18]:
frames = []
for gp, recs in df_mo.groupby(['BuildingId', 'EnergyTypeName']):
    # Drop records with neither cost nor usage
    recs = recs.query('(DollarCost > 0) or (EnergyQuantity > 0)').copy()
    if len(recs) == 0:
        continue
    recs.sort_values(['Thru'], inplace=True)
    
    # Start date comes from prior record
    recs['From'] = recs.Thru.shift(1)
    recs['Item Description'] = 'Energy'
    recs.loc[recs.From.isnull(), 'From'] = recs.Thru - timedelta(days=30)

    # If any of the billing periods are longer than 45 days, then probably
    # a missing prior bill.  Just make those periods 30 days long.
    mask = ((recs.Thru - recs.From) > timedelta(days=45))
    recs.loc[mask, 'From'] = recs.Thru - timedelta(days=30)
    
    frames.append(recs)

# Accumulate the pieces in a list and concatenate once; growing a DataFrame
# with append() in a loop is quadratic and deprecated in newer pandas.
dfmf = pd.concat(frames, ignore_index=True)
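The same From-date logic can also be expressed without the Python-level loop, using a grouped `shift`. A sketch on a stand-in frame (column names from `df_mo`, data made up):

```python
import pandas as pd

# Stand-in for df_mo with one building/fuel; values are made up
df = pd.DataFrame({
    'BuildingId': [1, 1, 1],
    'EnergyTypeName': ['Electric'] * 3,
    'DollarCost': [100.0, 110.0, 95.0],
    'EnergyQuantity': [800.0, 900.0, 700.0],
    'Thru': pd.to_datetime(['2009-01-15', '2009-02-15', '2009-04-20']),
})

df = (df.query('(DollarCost > 0) or (EnergyQuantity > 0)')
        .sort_values(['BuildingId', 'EnergyTypeName', 'Thru']))

# Prior bill's Thru becomes this bill's From, per building and fuel
df['From'] = df.groupby(['BuildingId', 'EnergyTypeName'])['Thru'].shift(1)

# First bill in a group, or a gap over 45 days (likely a missing prior
# bill): fall back to a 30-day period
fix = df['From'].isnull() | ((df['Thru'] - df['From']) > pd.Timedelta(days=45))
df.loc[fix, 'From'] = df['Thru'] - pd.Timedelta(days=30)
df['Item Description'] = 'Energy'
```

Note the third row's gap (2009-02-15 to 2009-04-20, 64 days) trips the 45-day rule, so its From is reset to 30 days before Thru, matching the loop's behavior.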

In [19]:
(dfmf.Thru - dfmf.From).value_counts()


Out[19]:
31 days    23881
30 days    16182
28 days     3257
0 days      1872
29 days      379
32 days       35
35 days       19
33 days        9
25 days        6
27 days        6
2 days         4
22 days        3
38 days        2
37 days        2
34 days        1
26 days        1
36 days        1
5 days         1
45 days        1
3 days         1
dtype: int64
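The 1,872 zero-day periods above come from consecutive records in a group sharing the same Thru date; those probably deserve either merging or the same 30-day fallback as the over-45-day gaps. A sketch of flagging them (`dfmf` mocked up):

```python
import pandas as pd

# Stand-in for dfmf; the middle row is a zero-length period
dfmf = pd.DataFrame({
    'From': pd.to_datetime(['2009-01-15', '2009-02-15', '2009-02-15']),
    'Thru': pd.to_datetime(['2009-02-15', '2009-02-15', '2009-03-15']),
})

# Zero-length billing periods: From == Thru
zero_len = (dfmf['Thru'] - dfmf['From']) == pd.Timedelta(0)
n_zero = int(zero_len.sum())
```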

In [22]:
# Note: the query result is discarded since it isn't the last expression
# in the cell; only dfmf.head() below is displayed.
dfmf.query('EnergyUnitTypeName == "kW"')
dfmf.head()


Out[22]:
BuildingId DemandCost DemandUse DollarCost EnergyQuantity EnergyTypeName EnergyUnitTypeName MeterReadDate UsageDate Thru From Item Description
0 1 NaN 233.8 10472.0 129800.0 Electric kWh NaT 2009-01-01 2009-01-15 2008-12-16 Energy
1 1 NaN 277.8 9768.0 117600.0 Electric kWh NaT 2009-02-01 2009-02-15 2009-01-15 Energy
2 1 NaN 275.6 11883.0 116000.0 Electric kWh NaT 2009-03-01 2009-03-15 2009-02-15 Energy
3 1 NaN 313.6 15421.0 158800.0 Electric kWh NaT 2009-04-01 2009-04-15 2009-03-15 Energy
4 1 NaN 308.8 13802.0 137400.0 Electric kWh NaT 2009-05-01 2009-05-15 2009-04-15 Energy