National generation and fuel consumption

The data in this notebook is generation and consumption by fuel type for the entire US. These values are larger than what would be calculated by summing facility-level data. Note that the fuel types are somewhat aggregated (coal rather than BIT, SUB, LIG, etc). So when we multiply the fuel consumption by an emissions factor there will be some level of error.

The code assumes that you have already downloaded an ELEC.txt file from EIA's bulk download website.



In [2]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import io, time, json
import pandas as pd
import os
import numpy as np
import math

Read ELECT.txt file



In [3]:

    
path = os.path.join('Raw data', 'Electricity data', '2017-03-15 ELEC.txt')
with open(path, 'rb') as f:
    raw_txt = f.readlines()

Filter lines to only include total generation and fuel use

Only want monthly US data for all sectors

US-99.M
ELEC.GEN, ELEC.CONS_TOT_BTU, ELEC.CONS_EG_BTU
not ALL

Fuel codes:

WWW, wood and wood derived fuels
WND, wind
STH, solar thermal
WAS, other biomass
TSN, all solar
SUN, utility-scale solar
NUC, nuclear
NG, natural gas
PEL, petroleum liquids
SPV, utility-scale solar photovoltaic
PC, petroluem coke
OTH, other
COW, coal,
DPV, distributed photovoltaic
OOG, other gases
HPS, hydro pumped storage
HYC, conventional hydroelectric
GEO, geothermal
AOR, other renewables (total)



In [4]:

    
def line_to_df(line):
    """
    Takes in a line (dictionary), returns a dataframe
    """
    for key in ['latlon', 'source', 'copyright', 'description', 
                'geoset_id', 'iso3166', 'name', 'state']:
        line.pop(key, None)

    # Split the series_id up to extract information
    # Example: ELEC.PLANT.GEN.388-WAT-ALL.M
    series_id = line['series_id']
    series_id_list = series_id.split('.')
    # Use the second to last item in list rather than third
    plant_fuel_mover = series_id_list[-2].split('-')
    line['type'] = plant_fuel_mover[0]
#     line['state'] = plant_fuel_mover[1]
    line['sector'] = plant_fuel_mover[2]
    temp_df = pd.DataFrame(line)

    try:
        temp_df['year'] = temp_df.apply(lambda x: x['data'][0][:4], axis=1).astype(int)
        temp_df['month'] = temp_df.apply(lambda x: x['data'][0][-2:], axis=1).astype(int)
        temp_df['value'] = temp_df.apply(lambda x: x['data'][1], axis=1)
        temp_df.drop('data', axis=1, inplace=True)
        return temp_df
    except:
        exception_list.append(line)
        pass



In [5]:

    
exception_list = []
gen_rows = [row for row in raw_txt if 'ELEC.GEN' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]
total_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_TOT_BTU' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]
eg_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_EG_BTU' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]

All generation by fuel



In [6]:

    
gen_df = pd.concat([line_to_df(json.loads(row)) for row in gen_rows])



In [7]:

    
#drop
gen_df.head()









    Out[7]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      value
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      thousand megawatthours
      2016
      12
      32426.66031
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      thousand megawatthours
      2016
      11
      28579.08298
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      thousand megawatthours
      2016
      10
      29919.29294
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      thousand megawatthours
      2016
      9
      26695.98579
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      thousand megawatthours
      2016
      8
      24509.79210

Multiply generation values by 1000 and change the units to MWh



In [8]:

    
gen_df.loc[:,'value'] *= 1000
gen_df.loc[:,'units'] = 'megawatthours'



In [9]:

    
gen_df['datetime'] = pd.to_datetime(gen_df['year'].astype(str) + '-' + gen_df['month'].astype(str), format='%Y-%m')
gen_df['quarter'] = gen_df['datetime'].dt.quarter
gen_df.rename_axis({'value':'generation (MWh)'}, axis=1, inplace=True)



In [10]:

    
#drop
gen_df.head()









    Out[10]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      generation (MWh)
      datetime
      quarter
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      12
      32426660.31
      2016-12-01
      4
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      11
      28579082.98
      2016-11-01
      4
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      10
      29919292.94
      2016-10-01
      4
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      9
      26695985.79
      2016-09-01
      3
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      8
      24509792.10
      2016-08-01
      3



In [11]:

    
#drop
gen_df.loc[gen_df['type']=='OOG'].head()









    Out[11]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      generation (MWh)
      datetime
      quarter
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.OOG-US-99.M
      200101
      OOG
      megawatthours
      2016
      12
      1007405.42
      2016-12-01
      4
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.OOG-US-99.M
      200101
      OOG
      megawatthours
      2016
      11
      1001151.99
      2016-11-01
      4
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.OOG-US-99.M
      200101
      OOG
      megawatthours
      2016
      10
      891165.18
      2016-10-01
      4
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.OOG-US-99.M
      200101
      OOG
      megawatthours
      2016
      9
      1050049.21
      2016-09-01
      3
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.OOG-US-99.M
      200101
      OOG
      megawatthours
      2016
      8
      1101829.28
      2016-08-01
      3

Total fuel consumption



In [12]:

    
total_fuel_df = pd.concat([line_to_df(json.loads(row)) for row in total_fuel_rows])



In [13]:

    
#drop
total_fuel_df.head()









    Out[13]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      value
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      12
      1260.87651
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      11
      934.68259
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      10
      1057.00563
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      9
      1216.53859
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      8
      1450.03624

Multiply generation values by 1,000,000 and change the units to MMBtu



In [14]:

    
total_fuel_df.loc[:,'value'] *= 1E6
total_fuel_df.loc[:,'units'] = 'mmbtu'



In [15]:

    
total_fuel_df['datetime'] = pd.to_datetime(total_fuel_df['year'].astype(str) + '-' + total_fuel_df['month'].astype(str), format='%Y-%m')
total_fuel_df['quarter'] = total_fuel_df['datetime'].dt.quarter
total_fuel_df.rename_axis({'value':'total fuel (mmbtu)'}, axis=1, inplace=True)



In [16]:

    
#drop
total_fuel_df.head()









    Out[16]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      total fuel (mmbtu)
      datetime
      quarter
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      12
      1.260877e+09
      2016-12-01
      4
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      11
      9.346826e+08
      2016-11-01
      4
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      10
      1.057006e+09
      2016-10-01
      4
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      9
      1.216539e+09
      2016-09-01
      3
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      8
      1.450036e+09
      2016-08-01
      3



In [17]:

    
#drop
total_fuel_df.loc[total_fuel_df['type']=='OOG'].head()









    Out[17]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      total fuel (mmbtu)
      datetime
      quarter

Electric generation fuel consumption



In [18]:

    
eg_fuel_df = pd.concat([line_to_df(json.loads(row)) for row in eg_fuel_rows])



In [19]:

    
#drop
eg_fuel_df.head()









    Out[19]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      value
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      12
      1236.23483
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      11
      913.35719
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      10
      1036.24820
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      9
      1195.13663
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      million MMBtu
      2016
      8
      1426.01303

Multiply generation values by 1,000,000 and change the units to MMBtu



In [20]:

    
eg_fuel_df.loc[:,'value'] *= 1E6
eg_fuel_df.loc[:,'units'] = 'mmbtu'



In [21]:

    
eg_fuel_df['datetime'] = pd.to_datetime(eg_fuel_df['year'].astype(str) + '-' + eg_fuel_df['month'].astype(str), format='%Y-%m')
eg_fuel_df['quarter'] = eg_fuel_df['datetime'].dt.quarter
eg_fuel_df.rename_axis({'value':'elec fuel (mmbtu)'}, axis=1, inplace=True)



In [22]:

    
#drop
eg_fuel_df.head()









    Out[22]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      elec fuel (mmbtu)
      datetime
      quarter
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      12
      1.236235e+09
      2016-12-01
      4
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      11
      9.133572e+08
      2016-11-01
      4
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      10
      1.036248e+09
      2016-10-01
      4
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      9
      1.195137e+09
      2016-09-01
      3
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_EG_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      8
      1.426013e+09
      2016-08-01
      3

Combine three datasets

Need to estimate fuel use for OOG, because EIA doesn't include any (this is only ~2% of OOG fuel for electricity in 2015).



In [23]:

    
merge_cols = ['type', 'year', 'month']

fuel_df = total_fuel_df.merge(eg_fuel_df[merge_cols+['elec fuel (mmbtu)']], 
                              how='outer', on=merge_cols)
fuel_df.head()









    Out[23]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      total fuel (mmbtu)
      datetime
      quarter
      elec fuel (mmbtu)
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      12
      1.260877e+09
      2016-12-01
      4
      1.236235e+09
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      11
      9.346826e+08
      2016-11-01
      4
      9.133572e+08
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      10
      1.057006e+09
      2016-10-01
      4
      1.036248e+09
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      9
      1.216539e+09
      2016-09-01
      3
      1.195137e+09
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.CONS_TOT_BTU.COW-US-99.M
      200101
      COW
      mmbtu
      2016
      8
      1.450036e+09
      2016-08-01
      3
      1.426013e+09

Not seeing the issue that shows up with facilities, where some facilities have positive total fuel consumption but no elec fuel consumption



In [24]:

    
#drop
fuel_df.loc[~(fuel_df['elec fuel (mmbtu)']>=0)]









    Out[24]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      total fuel (mmbtu)
      datetime
      quarter
      elec fuel (mmbtu)



In [25]:

    
gen_fuel_df = gen_df.merge(fuel_df[merge_cols+['total fuel (mmbtu)', 'elec fuel (mmbtu)']], 
                           how='outer', on=merge_cols)
gen_fuel_df.head()









    Out[25]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      generation (MWh)
      datetime
      quarter
      total fuel (mmbtu)
      elec fuel (mmbtu)
    
  
  
    
      0
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      12
      32426660.31
      2016-12-01
      4
      NaN
      NaN
    
    
      1
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      11
      28579082.98
      2016-11-01
      4
      NaN
      NaN
    
    
      2
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      10
      29919292.94
      2016-10-01
      4
      NaN
      NaN
    
    
      3
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      9
      26695985.79
      2016-09-01
      3
      NaN
      NaN
    
    
      4
      201612
      M
      USA
      2017-03-03T09:16:27-05:00
      99
      ELEC.GEN.AOR-US-99.M
      200101
      AOR
      megawatthours
      2016
      8
      24509792.10
      2016-08-01
      3
      NaN
      NaN

No records with positive fuel use but no generation



In [26]:

    
#drop
gen_fuel_df.loc[gen_fuel_df['generation (MWh)'].isnull()]









    Out[26]:






  
    
      
      end
      f
      geography
      last_updated
      sector
      series_id
      start
      type
      units
      year
      month
      generation (MWh)
      datetime
      quarter
      total fuel (mmbtu)
      elec fuel (mmbtu)

Add CO₂ emissions

The difficulty here is that EIA combines all types of coal fuel consumption together in the bulk download and API. Fortunately the emission factors for different coal types aren't too far off on an energy basis (BIT is 93.3 kg/mmbtu, SUB is 97.2 kg/mmbtu). I'm going to average the BIT and SUB factors rather than trying to do something more complicated. In 2015 BIT represented 45% of coal energy for electricity and SUB represented 48%.

Same issue with petroleum liquids. Using the average of DFO and RFO, which were the two largest share of petroleum liquids.



In [27]:

    
path = os.path.join('Clean data', 'Final emission factors.csv')
ef = pd.read_csv(path, index_col=0)



In [28]:

    
#drop
ef.index









    Out[28]:





Index([u'BIT', u'DFO', u'GEO', u'JF', u'KER', u'LIG', u'MSW', u'NG', u'PC',
       u'PG', u'RC', u'RFO', u'SGC', u'SGP', u'SUB', u'TDF', u'WC', u'WO',
       u'BFG', u'MSN', u'SC', u'OG', u'AB', u'BLQ', u'LFG', u'MSB', u'NUC',
       u'OBG', u'OBL', u'OBS', u'OTH', u'PUR', u'SLW', u'SUN', u'WAT', u'WDL',
       u'WDS', u'WH', u'WND'],
      dtype='object', name=u'EIA Fuel Code')



In [29]:

    
#drop
gen_fuel_df['type'].unique()









    Out[29]:





array([u'AOR', u'DPV', u'GEO', u'HPS', u'HYC', u'PC', u'COW', u'OOG',
       u'OTH', u'NUC', u'NG', u'WAS', u'STH', u'SPV', u'PEL', u'TSN',
       u'SUN', u'WWW', u'WND'], dtype=object)

Match general types with specific fuel codes