The goal of this project is to build a model that predicts which type of fossil marginal generating unit (MGU) will supply additional electricity demand under any given set of grid conditions. This type of problem can be very difficult to solve, especially if the model also has to predict grid conditions such as demand or wind generation. We simplify the model by treating these inputs as exogenous: grid conditions are taken as given, so the time of day or day of the year doesn't matter.
Predicting which individual power plant will provide marginal generation under a given set of grid conditions is also difficult and prone to overfitting. Instead, we will group individual fossil power plants based on their fuel type, heat rate, and historical operating behavior using k-means clustering. The model will then predict which group or groups are likely to provide marginal generation.
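As a minimal sketch of the grouping step, the cell below runs plain k-means (Lloyd's algorithm) on a few plant-level features using only NumPy. The feature values, the choice of three clusters, and the `kmeans` helper are illustrative assumptions, not the project's data or final method; a library implementation such as scikit-learn's `KMeans` would be the usual choice in practice.

```python
import numpy as np

# Hypothetical plant-level features (illustrative values, not project data):
# columns are heat rate (MMBtu/MWh), capacity factor, and a 0/1 coal flag.
features = np.array([
    [10.2, 0.85, 1.0],
    [10.5, 0.80, 1.0],
    [7.1, 0.55, 0.0],
    [7.4, 0.60, 0.0],
    [11.8, 0.05, 0.0],
    [12.1, 0.08, 0.0],
])


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each row to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Standardize so that no single feature dominates the distance.
scaled = (features - features.mean(axis=0)) / features.std(axis=0)
labels, centroids = kmeans(scaled, k=3)
print(labels)
```

Standardizing first matters here because heat rate and capacity factor are on very different scales. K-means can also converge to different groupings from different starting centroids, so the labels should be checked against what is known about the plants.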
We use 9 years (2007-2015) of hourly generation and load data for the ERCOT power grid. Over the 9 years of data we see a large increase in the amount of wind generation and increased generation from natural gas power plants.
As planned, the model will have the following inputs:
We are using data from ERCOT, EIA, and EPA as described below.
The figure below shows all ERCOT fossil power plants active in 2007. Non-fossil power plants - which are not available in the hourly EPA data - have been removed from the dataset. Coal power plants tended to be larger - at least 500 megawatts (MW) - and were running at high capacity factors. Natural gas power plants covered a wide range of sizes from a few MW to over 2 GW, but the largest plants had low capacity factors. There are also a few small diesel and petroleum coke facilities.
In [1]:
from IPython.display import SVG
SVG('https://www.dropbox.com/s/k8ac0la03hkjo5f/ERCOT%20power%20plants%202007.svg?raw=1')
Out[1]:
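Capacity factor, as used in the description of the figure above, is a plant's actual generation divided by what it would have generated running at full capacity for the whole period. The cell below is a small sketch of that calculation; the plant names, capacities, and generation totals are made-up values for illustration, not figures from the EPA or EIA data.

```python
import pandas as pd

# Hypothetical plant data (illustrative values, not from the EPA or EIA files).
plants = pd.DataFrame({
    "plant": ["coal_a", "gas_b", "gas_c"],
    "fuel": ["coal", "natural gas", "natural gas"],
    "capacity_mw": [800.0, 2000.0, 50.0],
    "annual_generation_mwh": [5_606_400.0, 3_504_000.0, 219_000.0],
})

HOURS_IN_2007 = 365 * 24  # 8,760 hours; 2007 was not a leap year

# Capacity factor = actual generation / generation at full output all year.
plants["capacity_factor"] = plants["annual_generation_mwh"] / (
    plants["capacity_mw"] * HOURS_IN_2007
)
print(plants[["plant", "fuel", "capacity_mw", "capacity_factor"]])
```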
This figure shows the hourly gross load from three sample plants over the last 6 months of 2015. These three sample plants happen to show a range of different sizes and behaviors.
The left and middle subplots represent coal plants with 1 and 2 units, all of which have minimum operating loads. We hope that aggregating facilities into groups will allow us to ignore shutoffs below the minimum load - we care about the change in group generation rather than whether an individual power plant goes from off to on.
The plant on the right consists of two natural gas combustion turbines, which can quickly turn on and ramp up. It never appears to hit its maximum generation of ~250 MW.
In [3]:
SVG('https://www.dropbox.com/s/k79xmwfbu4dt16h/Sample%20hourly%20load.svg?raw=1')
Out[3]:
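The aggregation idea described above can be sketched as follows: sum hourly gross load across the plants in each group, then take the hour-to-hour change for the group. The plant names, load values, and the `groups` mapping are hypothetical; in the project the mapping would come from the clustering step.

```python
import pandas as pd

# Four consecutive hours; a Timedelta is used as the frequency.
hours = pd.date_range("2015-07-01 00:00", periods=4, freq=pd.Timedelta(hours=1))

# Hypothetical hourly gross load (MW) for three plants; values are illustrative.
gross_load = pd.DataFrame(
    {
        "coal_a": [0.0, 0.0, 300.0, 320.0],  # starts up at its minimum load
        "coal_b": [600.0, 620.0, 650.0, 650.0],
        "gas_c": [0.0, 40.0, 90.0, 60.0],
    },
    index=hours,
)

# Hypothetical plant-to-group assignment, e.g. cluster labels from k-means.
groups = {"coal_a": "coal", "coal_b": "coal", "gas_c": "gas_turbine"}

# Sum plant output within each group (plants are columns, so transpose first).
group_load = gross_load.T.groupby(groups).sum().T

# Hour-to-hour change in each group's generation; the first hour is NaN.
group_change = group_load.diff()
print(group_change)
```

In this toy data the group series is never zero even though `coal_a` is off for two hours, so the group change is defined at every step and the individual off-to-on transition does not need separate handling.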
ERCOT provides hourly data on load, wind generation, and the percent of load that is served by wind generation. This figure shows the distribution of the percent of load served by wind, by year. It is easy to see that more of the load is served by wind each year. The distribution also flattens out - fewer hours see only a very small share of the load covered by wind.
Each violin has a small boxplot in the middle of it. The white circle at the center shows the median value, which increases every year, and was over 10% in 2015.
In [4]:
SVG('https://www.dropbox.com/s/wjpa9sbj3sklpfc/Wind%20violin%20plot.svg?raw=1')
Out[4]:
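The quantity plotted above can be reproduced from hourly load and wind generation. The cell below is a sketch with a handful of invented hourly values (not ERCOT data) and assumed column names `load_mw` and `wind_mw`; it computes the percent of load served by wind and its median for each year.

```python
import pandas as pd

# Hypothetical hourly observations; values are illustrative, not ERCOT data.
ercot = pd.DataFrame(
    {
        "load_mw": [40000.0, 50000.0, 45000.0, 42000.0, 55000.0, 48000.0],
        "wind_mw": [1200.0, 2000.0, 900.0, 4200.0, 6600.0, 7200.0],
    },
    index=pd.to_datetime([
        "2007-01-15 03:00", "2007-07-15 15:00", "2007-10-15 20:00",
        "2015-01-15 03:00", "2015-07-15 15:00", "2015-10-15 20:00",
    ]),
)

# Percent of load served by wind in each hour.
ercot["wind_pct"] = 100 * ercot["wind_mw"] / ercot["load_mw"]

# Median wind share for each calendar year (the white circle in each violin).
median_by_year = ercot.groupby(ercot.index.year)["wind_pct"].median()
print(median_by_year)
```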
These figures show monthly average load and wind generation over each year. Load (on the left) follows a predictable pattern with peak demand in the summer months. Wind generation (on the right) is a little messier. Over most years there is a dip in the summer, but we don't see the same dip in 2015.
In [5]:
SVG('https://www.dropbox.com/s/onet87a33pjvhym/Monthly%20ERCOT%20load%20and%20wind2.svg?raw=1')
Out[5]:
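Monthly averages like those in the figures above can be computed by grouping the hourly series by year and month. This is a sketch on invented values with an assumed `load_mw` column, not the code used to produce the figures.

```python
import pandas as pd

# Hypothetical hourly load (MW); values are illustrative, not ERCOT data.
hourly = pd.DataFrame(
    {"load_mw": [30000.0, 34000.0, 58000.0, 62000.0,
                 33000.0, 35000.0, 60000.0, 66000.0]},
    index=pd.to_datetime([
        "2014-01-10 04:00", "2014-01-20 18:00",
        "2014-08-10 04:00", "2014-08-20 18:00",
        "2015-01-10 04:00", "2015-01-20 18:00",
        "2015-08-10 04:00", "2015-08-20 18:00",
    ]),
)

# Mean load for each (year, month), reshaped so that rows are months
# and columns are years, which is convenient for one line per year.
monthly = (
    hourly.groupby([hourly.index.year, hourly.index.month])["load_mw"]
    .mean()
    .unstack(level=0)
    .rename_axis(index="month", columns="year")
)
print(monthly)
```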
In [10]:
SVG('https://www.dropbox.com/s/9k7nyhjt78k1quy/Monthly%20ERCOT%20wind%20capacity.svg?raw=1')
Out[10]:
Now that we have the required data, our next steps include: