In [1]:
import powerplantmatching as pm
import pandas as pd
Load the open-source data published by the Global Energy Observatory (GEO). Note that this is not the original format of the database but the standardized format of powerplantmatching.
In [2]:
geo = pm.data.GEO()
geo.head()
Out[2]:
Load the data published by ENTSOE, which has the same format as the GEO data.
In [3]:
entsoe = pm.data.ENTSOE()
entsoe.head()
Out[3]:
The pandas package provides various options for inspection, and some more power-plant-specific methods are available via the accessor 'powerplant'. It gives you a convenient way to inspect and manipulate the data:
In [4]:
geo.powerplant.plot_map();
In [5]:
geo.powerplant.lookup().head(20).to_frame()
Out[5]:
In [6]:
geo.powerplant.fill_missing_commyears().head()
Out[6]:
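For readers unfamiliar with accessors, the following is a minimal sketch of how a custom DataFrame accessor can be registered with pandas. The accessor name `toyplant`, its method `capacity_by_fueltype` and the toy data are hypothetical and chosen for illustration only; this is not powerplantmatching's own implementation.

```python
import pandas as pd


# Hypothetical accessor for illustration; not the 'powerplant' accessor itself.
@pd.api.extensions.register_dataframe_accessor("toyplant")
class ToyPlantAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is called on.
        self._df = pandas_obj

    def capacity_by_fueltype(self):
        """Return the total capacity per fuel type, largest first."""
        return (
            self._df.groupby("Fueltype")["Capacity"]
            .sum()
            .sort_values(ascending=False)
        )


# Toy data using column names that also appear in this notebook.
plants = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Fueltype": ["Hydro", "Hard Coal", "Hydro"],
        "Capacity": [100.0, 500.0, 50.0],
    }
)

totals = plants.toyplant.capacity_by_fueltype()
print(totals)
```

Once registered, the methods are available on every DataFrame under the chosen namespace, which is why the calls above read like ordinary pandas methods.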
Of course, the pandas functions are also very convenient:
In [7]:
print('Total capacity of GEO is: \n {} MW \n'.format(geo.Capacity.sum()));
print('The technology types are: \n {} '.format(geo.Technology.unique()))
So far, none of the open databases is complete; each covers only a part of all European power plants. The capacity gaps become visible when comparing against the ENTSOE SO&AF statistics.
In [8]:
stats = pm.data.Capacity_stats()
In [9]:
pm.plot.fueltype_totals_bar([geo, entsoe, stats], keys=["GEO", "ENTSOE", 'Statistics']);
The gaps for both datasets are unmistakable. Simply stacking both datasets on top of each other would not be a solution, since the overlap between the two sources is too high and the resulting dataset would include many duplicates. A better approach is to merge the incomplete datasets, respecting the intersections and differences of each dataset.
Before comparing two lists of power plants, we need to make sure that the datasets are on the same level of aggregation. That is, we ensure that all power plant blocks are aggregated to power plant stations.
In [10]:
dfs = [geo.powerplant.aggregate_units(), entsoe.powerplant.aggregate_units()]
intersection = pm.matching.combine_multiple_datasets(dfs)
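The idea behind aggregating blocks to stations can be sketched with plain pandas. The example below assumes toy data in which every unit already carries a `Station` label, and it assumes simple rules (summing capacities, keeping the earliest commissioning year). Both the label and the rules are assumptions made for illustration; how `aggregate_units` actually identifies and combines units is not shown here.

```python
import pandas as pd

# Toy unit-level data; the "Station" column is a hypothetical grouping key.
units = pd.DataFrame(
    {
        "Station": ["Alpha", "Alpha", "Beta"],
        "Name": ["Alpha 1", "Alpha 2", "Beta"],
        "Fueltype": ["Hard Coal", "Hard Coal", "Hydro"],
        "Capacity": [300.0, 350.0, 80.0],
        "YearCommissioned": [1985, 1990, 1972],
    }
)

# Assumed aggregation rules: sum the capacity, keep the first fuel type
# and the earliest commissioning year of each station.
stations = units.groupby("Station", as_index=False).agg(
    Fueltype=("Fueltype", "first"),
    Capacity=("Capacity", "sum"),
    YearCommissioned=("YearCommissioned", "min"),
)
print(stations)
```

After this step each row describes one station, so two datasets that report the same plant at different granularity can be compared entry by entry.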
In [11]:
intersection.head()
Out[11]:
The result of the matching process is a multi-indexed dataframe. To bring the matched dataframe into a convenient format, we combine the information of the two sources.
In [12]:
intersection = intersection.powerplant.reduce_matched_dataframe()
intersection.head()
Out[12]:
As you can see in the very last column, we can track which original data entries went into each resulting entry.
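To illustrate what reducing a multi-indexed match table means, here is a minimal pandas sketch on toy data. The column layout (attribute on the first level, source on the second), the reduction rules and the `Sources` column are assumptions for illustration; they are not necessarily the rules applied by `reduce_matched_dataframe`.

```python
import numpy as np
import pandas as pd

# Toy match table: one row per matched plant, one column per (attribute, source).
columns = pd.MultiIndex.from_product([["Name", "Capacity"], ["GEO", "ENTSOE"]])
matched = pd.DataFrame(
    [
        ["Alpha", "Alpha Power Station", 640.0, 660.0],
        ["Beta", None, 80.0, np.nan],
    ],
    columns=columns,
)

capacities = matched["Capacity"]  # sub-frame with one column per source

reduced = pd.DataFrame(
    {
        # Assumed rule: prefer the GEO name, fall back to the ENTSOE name.
        "Name": matched[("Name", "GEO")].combine_first(matched[("Name", "ENTSOE")]),
        # Assumed rule: average the reported capacities, ignoring gaps.
        "Capacity": capacities.mean(axis=1),
        # Record which sources contributed a capacity value to each row.
        "Sources": capacities.notna().apply(
            lambda row: ", ".join(row.index[row.to_numpy()]), axis=1
        ),
    }
)
print(reduced)
```

The result has one flat row per plant, while the last column still records where the information came from.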
We can now have a look at the capacity statistics:
In [13]:
pm.plot.fueltype_totals_bar([intersection, stats], keys=["Intersection", 'Statistics']);
In [14]:
combined = intersection.powerplant.extend_by_non_matched(entsoe).powerplant.extend_by_non_matched(geo)
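The step above adds the entries of each source that did not find a match. A rough sketch of that idea in plain pandas follows, assuming a hypothetical `EntsoeID` column that records which source entry a matched row came from. It also shows why naively stacking the tables would double-count plants.

```python
import pandas as pd

# Toy data; "EntsoeID" is a hypothetical identifier column for illustration.
toy_matched = pd.DataFrame(
    {"Name": ["Alpha"], "Capacity": [650.0], "EntsoeID": ["e1"]}
)
toy_entsoe = pd.DataFrame(
    {
        "EntsoeID": ["e1", "e2"],
        "Name": ["Alpha Power Station", "Gamma"],
        "Capacity": [660.0, 120.0],
    }
)

# Naive stacking keeps the plant "Alpha" twice.
naive = pd.concat([toy_matched, toy_entsoe], ignore_index=True)

# Keep only source entries whose ID is not already part of a match.
unmatched = toy_entsoe[~toy_entsoe["EntsoeID"].isin(toy_matched["EntsoeID"])]
extended = pd.concat([toy_matched, unmatched], ignore_index=True)

print(naive["Capacity"].sum(), extended["Capacity"].sum())
```

In this toy case the naive total counts the capacity of "Alpha" twice, whereas the extended table contains each plant once.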
In [15]:
pm.plot.fueltype_totals_bar([combined, stats], keys=["Combined", 'Statistics']);
The aggregated capacities roughly match the SO&AF statistics for all conventional power plants.
powerplantmatching ships with already matched data, which includes data from GEO, ENTSOE, OPSD, CARMA, GPD and ESE (ESE only if you have followed the instructions).
In [16]:
m = pm.collection.matched_data()
In [17]:
m.powerplant.plot_map()
Out[17]:
In [18]:
pm.plot.fueltype_totals_bar([m, stats], keys=["Processed", 'Statistics']);
In [19]:
pm.plot.factor_comparison([m, stats], keys=['Processed', 'Statistics'])
Out[19]:
In [20]:
m.head()
Out[20]:
In [21]:
pd.concat([m[m.YearCommissioned.notnull()].groupby('Fueltype').YearCommissioned.count(),
m[m.YearCommissioned.isna()].fillna(1).groupby('Fueltype').YearCommissioned.count()],
keys=['YearCommissioned existent', 'YearCommissioned missing'], axis=1)
Out[21]:
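The cell above counts, per fuel type, how many entries have a commissioning year and how many lack one. A possibly more readable way to obtain the same kind of table is sketched below on toy data; the data frame `toy` is made up for illustration and only reuses the column names from this notebook.

```python
import numpy as np
import pandas as pd

# Toy data with some missing commissioning years.
toy = pd.DataFrame(
    {
        "Fueltype": ["Hydro", "Hydro", "Wind", "Wind", "Wind"],
        "YearCommissioned": [1970.0, np.nan, 2005.0, np.nan, np.nan],
    }
)

# Boolean mask of missing years; summing booleans per group counts the True values.
is_missing = toy["YearCommissioned"].isna()
summary = pd.DataFrame(
    {
        "YearCommissioned existent": (~is_missing).groupby(toy["Fueltype"]).sum(),
        "YearCommissioned missing": is_missing.groupby(toy["Fueltype"]).sum(),
    }
)
print(summary)
```

This avoids filling missing values with a placeholder just to make them countable.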