The data in this notebook are generation and fuel consumption by fuel type for the entire US. These values are larger than what would be calculated by summing facility-level data. Note that the fuel types are somewhat aggregated (coal rather than BIT, SUB, LIG, etc.), so multiplying fuel consumption by a single emission factor introduces some error.
The code assumes that you have already downloaded an ELEC.txt file from EIA's bulk download website.
In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import io, time, json
import pandas as pd
import os
import numpy as np
import math
In [3]:
path = os.path.join('Raw data', 'Electricity data', '2017-03-15 ELEC.txt')
# Open in text mode so each row is a str and the substring filters below also work in Python 3
with open(path, 'r') as f:
    raw_txt = f.readlines()
Fuel codes:
In [4]:
def line_to_df(line):
    """
    Takes in a line (dictionary), returns a dataframe
    """
    # Drop metadata fields that are not needed
    for key in ['latlon', 'source', 'copyright', 'description',
                'geoset_id', 'iso3166', 'name', 'state']:
        line.pop(key, None)

    # Split the series_id up to extract information
    # Example: ELEC.PLANT.GEN.388-WAT-ALL.M
    series_id = line['series_id']
    series_id_list = series_id.split('.')
    # Use the second to last item in list rather than third
    plant_fuel_mover = series_id_list[-2].split('-')
    line['type'] = plant_fuel_mover[0]
    # line['state'] = plant_fuel_mover[1]
    line['sector'] = plant_fuel_mover[2]
    temp_df = pd.DataFrame(line)

    try:
        # Each 'data' entry is a [period, value] pair; the period starts
        # with the year and ends with the month
        temp_df['year'] = temp_df.apply(lambda x: x['data'][0][:4], axis=1).astype(int)
        temp_df['month'] = temp_df.apply(lambda x: x['data'][0][-2:], axis=1).astype(int)
        temp_df['value'] = temp_df.apply(lambda x: x['data'][1], axis=1)
        temp_df.drop('data', axis=1, inplace=True)
        return temp_df
    except Exception:
        # Keep lines that fail to parse for later inspection (returns None)
        exception_list.append(line)
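As a rough illustration of what the function above extracts, the sketch below parses a single row using only the standard library. The row itself (series_id, units, and values) is made up for this example and is not taken from ELEC.txt; it only assumes the same `series_id` layout and `[period, value]` data pairs that the function relies on.

```python
import json

# Illustrative row only: the series_id and values are made up for this sketch
row = ('{"series_id": "ELEC.GEN.COW-US-99.M", '
       '"units": "thousand megawatthours", '
       '"data": [["201612", 100.0], ["201611", 90.0]]}')
line = json.loads(row)

# The second-to-last dot-separated piece holds fuel type, region, and sector
parts = line['series_id'].split('.')[-2].split('-')
fuel_type, sector = parts[0], parts[2]

# Each data entry is [period, value]; the period starts with the year
# and ends with the month
records = [
    {'type': fuel_type, 'sector': sector,
     'year': int(period[:4]), 'month': int(period[-2:]), 'value': value}
    for period, value in line['data']
]
print(records[0])
```

The function in the notebook does the same splitting, but returns one dataframe row per data entry instead of a list of dictionaries.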
In [5]:
exception_list = []
gen_rows = [row for row in raw_txt if 'ELEC.GEN' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]
total_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_TOT_BTU' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]
eg_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_EG_BTU' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]
In [6]:
gen_df = pd.concat([line_to_df(json.loads(row)) for row in gen_rows])
In [7]:
#drop
gen_df.head()
Out[7]:
Multiply generation values by 1000 and change the units to MWh
In [8]:
gen_df.loc[:,'value'] *= 1000
gen_df.loc[:,'units'] = 'megawatthours'
In [9]:
gen_df['datetime'] = pd.to_datetime(gen_df['year'].astype(str) + '-' + gen_df['month'].astype(str), format='%Y-%m')
gen_df['quarter'] = gen_df['datetime'].dt.quarter
# rename_axis no longer relabels columns from a dict in current pandas; use rename
gen_df.rename(columns={'value':'generation (MWh)'}, inplace=True)
In [10]:
#drop
gen_df.head()
Out[10]:
In [11]:
#drop
gen_df.loc[gen_df['type']=='OOG'].head()
Out[11]:
In [12]:
total_fuel_df = pd.concat([line_to_df(json.loads(row)) for row in total_fuel_rows])
In [13]:
#drop
total_fuel_df.head()
Out[13]:
Multiply total fuel consumption values by 1,000,000 and change the units to MMBtu
In [14]:
total_fuel_df.loc[:,'value'] *= 1E6
total_fuel_df.loc[:,'units'] = 'mmbtu'
In [15]:
total_fuel_df['datetime'] = pd.to_datetime(total_fuel_df['year'].astype(str) + '-' + total_fuel_df['month'].astype(str), format='%Y-%m')
total_fuel_df['quarter'] = total_fuel_df['datetime'].dt.quarter
# rename_axis no longer relabels columns from a dict in current pandas; use rename
total_fuel_df.rename(columns={'value':'total fuel (mmbtu)'}, inplace=True)
In [16]:
#drop
total_fuel_df.head()
Out[16]:
In [17]:
#drop
total_fuel_df.loc[total_fuel_df['type']=='OOG'].head()
Out[17]:
In [18]:
eg_fuel_df = pd.concat([line_to_df(json.loads(row)) for row in eg_fuel_rows])
In [19]:
#drop
eg_fuel_df.head()
Out[19]:
Multiply electricity-generation fuel consumption values by 1,000,000 and change the units to MMBtu
In [20]:
eg_fuel_df.loc[:,'value'] *= 1E6
eg_fuel_df.loc[:,'units'] = 'mmbtu'
In [21]:
eg_fuel_df['datetime'] = pd.to_datetime(eg_fuel_df['year'].astype(str) + '-' + eg_fuel_df['month'].astype(str), format='%Y-%m')
eg_fuel_df['quarter'] = eg_fuel_df['datetime'].dt.quarter
# rename_axis no longer relabels columns from a dict in current pandas; use rename
eg_fuel_df.rename(columns={'value':'elec fuel (mmbtu)'}, inplace=True)
In [22]:
#drop
eg_fuel_df.head()
Out[22]:
In [23]:
merge_cols = ['type', 'year', 'month']
fuel_df = total_fuel_df.merge(eg_fuel_df[merge_cols+['elec fuel (mmbtu)']],
how='outer', on=merge_cols)
fuel_df.head()
Out[23]:
The country-level data does not show the issue seen with facility-level data, where some facilities have positive total fuel consumption but no elec fuel consumption.
In [24]:
#drop
fuel_df.loc[~(fuel_df['elec fuel (mmbtu)']>=0)]
Out[24]:
In [25]:
gen_fuel_df = gen_df.merge(fuel_df[merge_cols+['total fuel (mmbtu)', 'elec fuel (mmbtu)']],
how='outer', on=merge_cols)
gen_fuel_df.head()
Out[25]:
No records with positive fuel use but no generation
In [26]:
#drop
gen_fuel_df.loc[gen_fuel_df['generation (MWh)'].isnull()]
Out[26]:
The difficulty here is that EIA combines all types of coal fuel consumption together in the bulk download and API. Fortunately, the emission factors for different coal types are close on an energy basis (BIT is 93.3 kg/mmbtu, SUB is 97.2 kg/mmbtu), so I'm going to average the BIT and SUB factors rather than trying to do something more complicated. In 2015 BIT represented 45% of coal energy for electricity and SUB represented 48%.
The same issue applies to petroleum liquids. I'm using the average of DFO and RFO, which had the two largest shares of petroleum liquids.
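To get a feel for how much the simple average matters, the sketch below compares it with a share-weighted average built from the BIT and SUB factors and the 2015 energy shares quoted above. It is only a back-of-the-envelope check: the remaining coal types are ignored, and the variable names are chosen for this example.

```python
# Emission factors quoted above (kg/mmbtu)
bit_factor = 93.3
sub_factor = 97.2

# Simple average, as used in this notebook
simple_avg = (bit_factor + sub_factor) / 2

# Share-weighted average using the 2015 energy shares quoted above
# (45% BIT, 48% SUB); the remaining coal types are ignored here
bit_share, sub_share = 0.45, 0.48
weighted_avg = (bit_share * bit_factor + sub_share * sub_factor) / (bit_share + sub_share)

relative_gap = abs(weighted_avg - simple_avg) / weighted_avg
print(round(simple_avg, 2), round(weighted_avg, 2), relative_gap)
```

Under these assumptions the two averages differ by less than 0.1%, which is small next to the roughly 4% spread between the BIT and SUB factors themselves.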
In [27]:
path = os.path.join('Clean data', 'Final emission factors.csv')
ef = pd.read_csv(path, index_col=0)
In [28]:
#drop
ef.index
Out[28]:
In [29]:
#drop
gen_fuel_df['type'].unique()
Out[29]:
Fuel codes:
In [30]:
#drop
ef.loc['NG', 'Fossil Factor']
Out[30]:
In [31]:
fuel_factors = {'NG' : ef.loc['NG', 'Fossil Factor'],
                'PEL': ef.loc[['DFO', 'RFO'], 'Fossil Factor'].mean(),
                'PC' : ef.loc['PC', 'Fossil Factor'],
                'COW': ef.loc[['BIT', 'SUB'], 'Fossil Factor'].mean(),
                'OOG': ef.loc['OG', 'Fossil Factor']}
In [32]:
#drop
fuel_factors
Out[32]:
In [33]:
# Start with 0 emissions in all rows
# For fuels where we have an emission factor, replace the 0 with the calculated value
gen_fuel_df['all fuel CO2 (kg)'] = 0
gen_fuel_df['elec fuel CO2 (kg)'] = 0
for fuel in fuel_factors.keys():
    gen_fuel_df.loc[gen_fuel_df['type']==fuel,'all fuel CO2 (kg)'] = \
        gen_fuel_df.loc[gen_fuel_df['type']==fuel,'total fuel (mmbtu)'] * fuel_factors[fuel]
    gen_fuel_df.loc[gen_fuel_df['type']==fuel,'elec fuel CO2 (kg)'] = \
        gen_fuel_df.loc[gen_fuel_df['type']==fuel,'elec fuel (mmbtu)'] * fuel_factors[fuel]
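One possible alternative to the loop, sketched here on a toy dataframe, is to map each row's fuel type to its factor and multiply once. The factors and fuel values below are placeholders made up for this example (they are not the values in `Final emission factors.csv`), and `toy_factors` and `toy_df` are hypothetical names.

```python
import pandas as pd

# Placeholder factors and fuel values, made up for this sketch
toy_factors = {'NG': 53.0, 'COW': 95.0}
toy_df = pd.DataFrame({
    'type': ['NG', 'COW', 'NUC'],
    'total fuel (mmbtu)': [10.0, 20.0, 30.0],
})

# map() returns NaN for fuel types without a factor, so fill those with 0
factor_by_row = toy_df['type'].map(toy_factors)
toy_df['all fuel CO2 (kg)'] = (toy_df['total fuel (mmbtu)'] * factor_by_row).fillna(0)
print(toy_df)
```

One difference from the loop: `fillna(0)` also zeroes rows where the fuel value itself is missing, whereas the loop would leave those as NaN for fuels that have a factor.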
In [34]:
gen_fuel_df.loc[gen_fuel_df['type']=='COW',:].head()
Out[34]:
In [35]:
#drop
gen_fuel_df.loc[gen_fuel_df['type']=='OOG'].head()
Out[35]:
In [36]:
path = os.path.join('Clean data', 'EIA country-wide gen fuel CO2.csv')
gen_fuel_df.to_csv(path, index=False)