Predicting Marginal Generating Units in ERCOT

Project Goals and Model Choices

The goal of this project is to build a model that can predict the type of fossil marginal generating units (MGU) that will provide electricity for additional demand at any given set of grid conditions. This type of problem can be very difficult to solve, especially if the model is also trying to predict grid conditions like demand or wind generation. We are simplifying the model by treating these inputs as exogenous - the time of day or day of the year doesn't matter.

Predicting which individual power plant will provide marginal generation under a given set of grid conditions is also difficult, and prone to overfitting. We will group individual fossil power plants based on their fuel type, heat rate, and historical operating behavior using k-means clustering. The model will predict which group or groups is likely to provide marginal generation.

We use 9 years (2007-2015) of hourly generation and load data for the ERCOT power grid. Over the 9 years of data we see a large increase in the amount of wind generation and increased generation from natural gas power plants.

Model Inputs

As planned, the model will have the following inputs:

Current installed capacity for wind and each fossil group
Current unused capacity of each fossil group
- This serves to normalize inputs, and is done rather than using total load as an input
Current wind generation
- We only have hourly wind generation, but effects of intra-hour variability on the system should increase when wind generation is high
Wind generation in recent past (past 1, 2, ..., n hours - maybe up to n=5?)
- Combined with change in demand to create a change in "net load" to account for ramp rate constraints
Fuel price for electric customers in Texas

Data Sources

We are using data from ERCOT, EIA, and EPA as described below.

ERCOT
- Hourly wind generation
- Total hourly demand
Energy Information Agency (EIA)
- Average heat rate and primary fuel source for each power plant (Form 923)
- Maximum generating capacity of each power plant (Form 860)
- Historical annual and monthly capacity factors for each power plant (923 and 860)
- Historical natural gas/coal prices for electric customers in Texas (monthly/quarterly)
Environmental Protection Agency (EPA)
- Hourly gross load (MW) for every fossil power plant in ERCOT

Initial Data Exploration

Our work so far has primarily consisted of model design and data collection. We have collected all of the data described above, and started importing/cleaning/merging data.

Fossil Power Plants (EIA data)

The figure below shows all ERCOT fossil power plants active in 2007. Non-fossil power plants - which are not available in the hourly EPA data - have been removed from the dataset. Coal power plants tended to be larger - at least 500 megawatts (MW) - and were running at high capacity factors. Natural gas power plants covered a wide range of sizes from a few MW to over 2 GW, but the largest plants had low capacity factors. There are also a few small diesel and petroleum coke facilities.



In [1]:

    
from IPython.display import SVG
SVG('https://www.dropbox.com/s/k8ac0la03hkjo5f/ERCOT%20power%20plants%202007.svg?raw=1')









    Out[1]:

Hourly Fossil Power Plant Generation (EPA data)

This figure shows the hourly gross load from three sample plants over the last 6 months of 2015. These three sample plants happen to show a range of different sizes and behaviors.

The left and middle subplots represent coal plants with 1 and 2 units, all of which have minimum operating loads. We hope that aggregating facilities into groups will allow us to ignore shutoff below the minimum load - we care about the change in group generation rather than if an individual power plant goes from off to on.

The plant on the right consists of two natural gas combustion turbines, which can quickly turn on and ramp up. It never appears to hit its maximum generation of ~250MW.



In [3]:

    
SVG('https://www.dropbox.com/s/k79xmwfbu4dt16h/Sample%20hourly%20load.svg?raw=1')









    Out[3]:

Distribution of Wind Power Over Time (ERCOT data)

ERCOT provides hourly data of load, wind generation, and the percent of load that is served by wind generation. This figure shows the distribution of that last dataset by year. It is easy to see that more of the load is served by wind each year. The distribution also flattens out - fewer hours see a very small amount of the load covered by wind.

Each violin has a small boxplot in the middle of it. The white circle at the center shows the median value, which increases every year, and was over 10% in 2015.



In [4]:

    
SVG('https://www.dropbox.com/s/wjpa9sbj3sklpfc/Wind%20violin%20plot.svg?raw=1')









    Out[4]:

ERCOT Load and Wind Generation Over Time

These figures show monthly average load and wind generation over each year. Load (on the left) follows a predictable pattern with peak demand in the summer months. Wind generation (on the right) is a little messier. Over most years there is a dip in the summer, but we don't see the same dip in 2015.



In [5]:

    
SVG('https://www.dropbox.com/s/onet87a33pjvhym/Monthly%20ERCOT%20load%20and%20wind2.svg?raw=1')









    Out[5]:

ERCOT Installed Wind Capacity

This helps explain the figure above. There was lots of wind installed over the course of 2015, which could have helped bump up the output.



In [10]:

    
SVG('https://www.dropbox.com/s/9k7nyhjt78k1quy/Monthly%20ERCOT%20wind%20capacity.svg?raw=1')









    Out[10]:

Next Steps

Now that we have the required data, our next steps include:

Calculate historical ramp rates from the EPA hourly gross generation data
Cluster power plants using k-means
- Variables include:
  - Fuel type
  - 1-hr ramp rates
  - Plant capacity
  - Mean efficiency & variance
  - Mean CF & variance
- If unsupervised clustering doesn't give reasonable results, we will use expert judgment to group the facilities
Determine the marginal fossil power plants (and groups) for each hour
- This behavior is the y-vector that we hope to accurately predict
Assemble a matrix of model inputs (X-values, described above)
Split the 9 years of hourly data into training and testing sets
Design and train a model