Calculate generation capacity by month

This notebook uses the december 2017 EIA-860m file to determine operable generating capacity by fuel category in every month from 2001-2017.

Because this method uses EIA-860 data on individual plants and generators it does not include the capacity of small scale or distributed solar. EIA does provide an estimate of small scale solar as part of the Electric Power Monthly (Table 6.1.A), although I am not sure if it is supposed to represent all installed non-utility solar in the US.

Instructions

The most recent EIA-860m file should be downloaded to the EIA downloads folder, and the correct file name should be used for loading data. Otherwise the code below can be run straight through as-is.


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import os
import pathlib
from pathlib import Path
import sys
from os.path import join
import json
import calendar
sns.set(style='white')

idx = pd.IndexSlice

Type of capacity

The default is net summer capacity, which is what EIA uses for their capacity factor calculations. eGRID uses nameplate capacity. Valid parameter values are:

  • nameplate capacity (mw)
  • net summer capacity (mw)
  • net winter capacity (mw)

In [2]:
capacity_type = 'net summer capacity (mw)'
# capacity_type = 'nameplate capacity (mw)'

In [3]:
%load_ext watermark
%watermark -iv -v


seaborn     0.8.1
pandas      0.22.0
json        2.0.9
CPython 3.6.4
IPython 6.2.1

In [4]:
# Load the "autoreload" extension
%load_ext autoreload

# always reload modules marked with "%aimport"
%autoreload 1

In [5]:
# add the 'src' directory as one where we can import modules
cwd = os.getcwd()
src_dir = join(cwd, os.pardir, 'src')
sys.path.append(src_dir)

data_path = Path(cwd, '..', 'Data storage')

In [6]:
%aimport Analysis.index
from Analysis.index import group_fuel_cats

%aimport Analysis.capacity
from Analysis.capacity import monthly_capacity_all, monthly_capacity_year
from Analysis.capacity import monthly_ng_type_all, monthly_ng_type_year

Load monthly EIA-860 data


In [7]:
file_path = data_path / 'EIA downloads' / 'december_generator2017.xlsx'
op = pd.read_excel(file_path, sheet_name='Operating', skiprows=1, skip_footer=1,
                   parse_dates={'op datetime': [14, 15]},
                   na_values=' ')

In [8]:
# need to make some helper functions for the retired generators sheet
def bad_month_values(month):
    'Change value to 1 if outside 1-12'

    if month > 12 or month < 1:
        new_month = 1
    else:
        new_month = month
    return new_month

def make_dt_col(df, month_col, year_col):
    months = df[month_col].astype(str)
    years = df[year_col].astype(str)
    dt_string = years + '-' + months + '-' + '01'
    dt = pd.to_datetime(dt_string)
    return dt

ret = pd.read_excel(file_path, sheet_name='Retired', skiprows=1, skip_footer=1,
                    converters={'Operating Month': bad_month_values},
                    # parse_dates={'op datetime': [16, 17],
                    #              'ret datetime': [14, 15]},
                    na_values=' ')

ret['op datetime'] = make_dt_col(ret, 'Operating Month', 'Operating Year')
ret['ret datetime'] = make_dt_col(ret, 'Retirement Month', 'Retirement Year')

Clean up column names and only keep desired columns


In [9]:
op.columns = op.columns.str.strip()
ret.columns = ret.columns.str.strip()

In [10]:
op_cols = [
    'Plant ID', 'Nameplate Capacity (MW)', 'Net Summer Capacity (MW)',
    'Energy Source Code', 'Prime Mover Code', 'op datetime'
]

ret_cols = [
    'Plant ID', 'Nameplate Capacity (MW)', 'Net Summer Capacity (MW)',
    'Energy Source Code', 'Prime Mover Code', 'Retirement Month',
    'Retirement Year', 'Operating Month', 'Operating Year',
    'op datetime', 'ret datetime'
]

In [11]:
op = op.loc[:, op_cols]
ret = ret.loc[:, ret_cols]

op.columns = op.columns.str.lower()
ret.columns = ret.columns.str.lower()

In [12]:
op.head()


Out[12]:
plant id nameplate capacity (mw) net summer capacity (mw) energy source code prime mover code op datetime
0 2 53.9 56.0 WAT HY 1963-07-01
1 3 153.1 55.0 NG ST 1954-02-01
2 3 153.1 55.0 NG ST 1954-07-01
3 3 403.7 362.0 BIT ST 1969-12-01
4 3 788.8 738.5 BIT ST 1971-10-01

Read fuel category definitions and apply to the generators


In [13]:
state_cat_path = data_path / 'Fuel categories' / 'State_facility.json'
custom_cat_path = data_path / 'Fuel categories' / 'Custom_results.json'
with open(state_cat_path) as json_data:
    state_cats = json.load(json_data)
with open(custom_cat_path) as json_data:
    custom_cats = json.load(json_data)

In [14]:
def reverse_cats(cat_file):
    'Reverse a dict of lists so each item in the list is a key'
    cat_map = {}
    for key, vals in cat_file.items():
        for val in vals:
            cat_map[val] = key
    return cat_map

In [15]:
# Aggregate EIA fuel codes to my final definitions
op['fuel'] = op.loc[:, 'energy source code'].map(reverse_cats(state_cats))
op['fuel category'] = op.loc[:, 'fuel'].map(reverse_cats(custom_cats))

ret['fuel'] = ret.loc[:, 'energy source code'].map(reverse_cats(state_cats))
ret['fuel category'] = ret.loc[:, 'fuel'].map(reverse_cats(custom_cats))

In [16]:
op.head()


Out[16]:
plant id nameplate capacity (mw) net summer capacity (mw) energy source code prime mover code op datetime fuel fuel category
0 2 53.9 56.0 WAT HY 1963-07-01 HYC Hydro
1 3 153.1 55.0 NG ST 1954-02-01 NG Natural Gas
2 3 153.1 55.0 NG ST 1954-07-01 NG Natural Gas
3 3 403.7 362.0 BIT ST 1969-12-01 COW Coal
4 3 788.8 738.5 BIT ST 1971-10-01 COW Coal

Load the NERC region each power plant is in and add to dataframes


In [28]:
nercs_path = data_path / 'Facility labels' / 'Facility locations_RF.csv'
facility_nerc = pd.read_csv(nercs_path, index_col=['nerc', 'year'])
facility_nerc.sort_index(inplace=True)

Need to make this into a dictionary of dictionaries of lists (year, nerc, plant ids)


In [29]:
nerc_dict = {}
for year in facility_nerc.index.get_level_values('year').unique():
    nerc_dict[year] = {}
    
    for nerc in facility_nerc.index.get_level_values('nerc').unique():
        nerc_dict[year][nerc] = facility_nerc.loc[idx[nerc, year], 'plant id'].tolist()

In [30]:
# Make sure there aren't lots of plants in 2016 that disappear in 2017
set(nerc_dict[2016]['MRO']) - set(nerc_dict[2017]['MRO'])


Out[30]:
set()

Determine operable capacity in every month


In [31]:
# Define iterables to loop over
years = range(2001,2018)
months = range(1,13)

fuels = list(custom_cats.keys())

# capacity_type is defined at the top of this notebook
op_df_capacity = monthly_capacity_all(op=op, ret=ret, years=years,
                                      nerc_plant_list=nerc_dict, fuels=fuels,
                                      cap_type=capacity_type, n_jobs=-1,
                                      print_year=False)

In [32]:
op_df_capacity.tail()


Out[32]:
active capacity possible gen datetime
nerc fuel category year month
WECC Wind 2017 8 19611.4 1.45909e+07 2017-08-01
9 19611.4 1.41202e+07 2017-09-01
10 19611.4 1.45909e+07 2017-10-01
11 19611.4 1.41202e+07 2017-11-01
12 19611.4 1.45909e+07 2017-12-01

In [36]:
# Write data to file
out_path = data_path / 'Derived data' / 'Plant capacity' / 'monthly capacity by fuel.csv'
op_df_capacity.to_csv(out_path)

Determine natural gas capacity by prime mover type in each month


In [37]:
# Define iterables to loop over
years = range(2001,2018)
months = range(1,13)

op_ng_type = monthly_ng_type_all(op=op, ret=ret, years=years,
                                 nerc_plant_list=nerc_dict, fuels=fuels,
                                 cap_type=capacity_type, n_jobs=-1,
                                 print_year=False)

In [38]:
op_ng_type.head()


Out[38]:
ngcc turbine other total ngcc fraction turbine fraction other fraction datetime
nerc year month
ASCC 2001 1 214.4 592.4 3.0 809.8 0.264757 0.731539 0.003705 2001-01-01
2 214.4 592.4 3.0 809.8 0.264757 0.731539 0.003705 2001-02-01
3 214.4 592.4 3.0 809.8 0.264757 0.731539 0.003705 2001-03-01
4 214.4 592.4 3.0 809.8 0.264757 0.731539 0.003705 2001-04-01
5 214.4 592.4 3.0 809.8 0.264757 0.731539 0.003705 2001-05-01

In [39]:
out_path = data_path / 'Derived data' / 'Plant capacity' / 'monthly natural gas split.csv'
op_ng_type.to_csv(out_path)