pyam_analysis package
The pyam-analysis package provides a range of diagnostic tools and functions for analyzing and working with IAMC-style timeseries data.
The package can be used with data that follows the data template convention of the Integrated Assessment Modeling Consortium (IAMC). An illustrative example is shown below; see data.ene.iiasa.ac.at/database for more information.
model | scenario | region | variable | unit | 2005 | 2010 | 2015 |
---|---|---|---|---|---|---|---|
MESSAGE V.4 | AMPERE3-Base | World | Primary Energy | EJ/y | 454.5 | 479.6 | ... |
... | ... | ... | ... | ... | ... | ... | ... |
This notebook illustrates some basic functionality of the pyam-analysis package and the IamDataFrame class.
The timeseries data used in this tutorial is a partial snapshot of the scenario database compiled for the IPCC's Fifth Assessment Report (AR5):
Krey V., O. Masera, G. Blanford, T. Bruckner, R. Cooke, K. Fisher-Vanden, H. Haberl, E. Hertwich, E. Kriegler, D. Mueller, S. Paltsev, L. Price, S. Schlömer, D. Ürge-Vorsatz, D. van Vuuren, and T. Zwickel, 2014: Annex II: Metrics & Methodology.
In: Climate Change 2014: Mitigation of Climate Change. Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Edenhofer, O., R. Pichs-Madruga, Y. Sokona, E. Farahani, S. Kadner, K. Seyboth, A. Adler, I. Baum, S. Brunner, P. Eickemeier, B. Kriemann, J. Savolainen, S. Schlömer, C. von Stechow, T. Zwickel and J.C. Minx (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
The complete database is publicly available at tntcat.iiasa.ac.at/AR5DB/.
The data snapshot used for this tutorial consists of selected data from two model intercomparison projects:
- Riahi, K., et al. (2015). "Locked into Copenhagen pledges — Implications of short-term emission targets for the cost and feasibility of long-term climate goals." Technological Forecasting and Social Change 90(Part A): 8-23.
  DOI: 10.1016/j.techfore.2013.09.016
- Kriegler, E., et al. (2015). "Making or breaking climate targets: The AMPERE study on staged accession scenarios for climate policy." Technological Forecasting and Social Change 90(Part A): 24-44.
  DOI: 10.1016/j.techfore.2013.09.021
In [1]:
import pyam_analysis as iam
In [2]:
# Path to the tutorial data snapshot; adjust it to the location of your local copy.
data = '/home/gidden/work/iiasa/message/pyam-analysis/tutorial/tutorial_AR5_data.csv'
df = iam.IamDataFrame(data=data)
In [3]:
df.models()
Out[3]:
In [4]:
df.scenarios()
Out[4]:
In [5]:
df.regions()
Out[5]:
In [6]:
df.variables(include_units=True)
Out[6]:
Most functions of the IamDataFrame class take an (optional) argument filters, i.e., a dictionary of filter criteria.
The feature for filtering by model, scenario or region is implemented using regular expressions (regex, re) and the re.match() function. This implies that matching starts at the beginning of the text string.
Applying the filter 'model': 'MESSAGE' to the function scenarios() will return all MESSAGE V.4 scenarios included in the snapshot.
Filtering for ESSAGE will return an empty set.
In [7]:
df.scenarios({'model': 'MESSAGE'})
Out[7]:
In [8]:
df.scenarios({'model': 'ESSAGE'})
Out[8]:
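The behaviour of the two filters above follows from how re.match() anchors a pattern at the start of the string. The following is a minimal sketch using only the Python standard library; the helper match_from_start and the second model name are hypothetical and serve only as an illustration.

```python
import re

# Illustrative model names; only "MESSAGE V.4" appears in the tutorial text,
# the second entry is a made-up placeholder.
models = ["MESSAGE V.4", "Hypothetical Model 1.0"]


def match_from_start(pattern, names):
    """Return the names accepted by re.match(), which anchors at position 0."""
    return [name for name in names if re.match(pattern, name)]


print(match_from_start("MESSAGE", models))  # ['MESSAGE V.4']
print(match_from_start("ESSAGE", models))   # []

# For comparison: re.search() scans the whole string, so it does find "ESSAGE".
print(bool(re.search("ESSAGE", "MESSAGE V.4")))  # True
```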
Filtering for variable strings using regex is problematic due to the frequent use of the "|" character in the IAMC template to specify a hierarchical structure.
Therefore, this package implements a pseudo-regex syntax, where | is escaped, * is used as a wildcard, and exact matching at the end of the string is enforced (in regex lingo, * is replaced by .* and $ is appended to the filter string).
Filtering for Primary Energy will return only exactly those data.
Filtering for Primary Energy|* will return all sub-categories of Primary Energy (and only the sub-categories).
In addition, IAM variables can be filtered by their level, i.e., the "depth" of the variable in a hierarchical reading of the string separated by "|".
That is, the variable Primary Energy has level 0, while Primary Energy|Coal has level 1.
Filtering by both variables and level will search for the hierarchical depth following the variable filter string.
To illustrate the functionality of the filters, we first show all sub-categories of the Emissions variable.
Then, we reduce variables to only two hierarchical levels below "Emissions|"; the list returned by the function call will not include Emissions|CO2|Fossil Fuels and Industry|Energy Supply|Electricity, because this variable is three hierarchical levels below "Emissions|".
The third example shows how to filter only by hierarchical level. The function returns all variables that are at the top hierarchical level (e.g., Primary Energy) and those at the first sub-category level.
Keep in mind that there are no variables Emissions or Price (no top level).
In [9]:
df.variables(filters={'variable': 'Emissions|*'})
Out[9]:
In [10]:
df.variables(filters={'variable': 'Emissions|*', 'level': 2})
Out[10]:
In [11]:
df.variables(filters={'level': 1})
Out[11]:
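The pseudo-regex translation and the notion of a variable level described above can be sketched in a few lines of plain Python. This is not the package's own code: the helpers pseudo_regex_to_pattern, variable_level and select are hypothetical, and the use of "level <= 1" is an assumption based on the third example returning both top-level variables and first sub-categories.

```python
import re


def pseudo_regex_to_pattern(filter_string):
    """Translate the pseudo-regex syntax into a compiled regular expression.

    '|' is escaped, '*' becomes '.*', and '$' is appended to force an exact
    match at the end of the string. Other regex metacharacters are left as-is.
    """
    regex = filter_string.replace("|", r"\|").replace("*", ".*") + "$"
    return re.compile(regex)


def variable_level(variable):
    """Depth of a variable in the '|'-separated hierarchy (top level is 0)."""
    return variable.count("|")


def select(filter_string, names):
    """Return the names matched from the start by the translated pattern."""
    pattern = pseudo_regex_to_pattern(filter_string)
    return [name for name in names if pattern.match(name)]


variables = [
    "Primary Energy",
    "Primary Energy|Coal",
    "Emissions|CO2",
    "Emissions|CO2|Fossil Fuels and Industry",
]

print(select("Primary Energy", variables))    # ['Primary Energy']
print(select("Primary Energy|*", variables))  # ['Primary Energy|Coal']

# Assumed reading of a level filter: keep everything up to the given depth.
print([v for v in variables if variable_level(v) <= 1])
```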
In [12]:
df.models?
As a next step, we want to view a selection of the data in the tutorial snapshot using the IAMC standard.
The filtered data can be exported as a csv file by appending .to_csv('selected_data.csv') to the next command.
For displaying data in a different format, the class IamDataFrame has a wrapper of the pandas.DataFrame.pivot_table() function. It allows flexible specification of the columns and rows.
The function automatically aggregates by summation or counting (specified by the parameter aggfunc) over all timeseries data identifiers ('model', 'scenario', 'variable', 'region', 'unit', 'year') which are not used as index or columns.
In the pivot-table example below, the filter selects the variable 'Primary Energy' in the region 'World'; the values are then summed for each year and scenario in the displayed table.
In [13]:
df.timeseries(filters={
    'scenario': 'AMPERE3-450',
    'variable': 'Primary Energy|Coal',
    'region': 'World'
}).head()
Out[13]:
In [14]:
df.pivot_table(
    index=['year'],
    columns=['scenario'],
    values='value',
    aggfunc='sum',
    filters={'variable': 'Primary Energy', 'region': 'World'}
).head()
Out[14]:
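The aggregation over identifiers that are not used as index or columns can be pictured with a small plain-Python sketch. It only mimics the idea of a sum-aggregated pivot and is not how pandas or the package implement it; the helper pivot_sum and all rows are made up for illustration.

```python
from collections import defaultdict

# Made-up long-format rows (model, scenario, year, value); not from the snapshot.
rows = [
    ("Model A", "Scenario 1", 2010, 1.0),
    ("Model B", "Scenario 1", 2010, 2.0),
    ("Model A", "Scenario 2", 2010, 4.0),
    ("Model A", "Scenario 1", 2020, 8.0),
]


def pivot_sum(records):
    """Sum values per (year, scenario) cell.

    'model' is neither a row key nor a column key, so it is aggregated away.
    """
    table = defaultdict(float)
    for _model, scenario, year, value in records:
        table[(year, scenario)] += value
    return dict(table)


table = pivot_sum(rows)
print(table[(2010, "Scenario 1")])  # 3.0 (Model A and Model B are summed)
```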
If you are familiar with the Python package pandas, you can access the underlying pd.DataFrame directly.
In [15]:
df.data.head()
Out[15]:
As a next step, we want to visualize timeseries data. In the plot below, we show CO2 emissions over time for all scenarios provided in the tutorial snapshot data.
In [16]:
df.plot_lines({'variable': 'Emissions|CO2', 'region': 'World'})
When analyzing scenario results, it is often useful to check whether certain timeseries exist or the values are within a specific range. For example, it may make sense to ensure that reported data for historical periods are close to established reference data.
The following section provides three illustrations:
- Check whether 'Primary Energy' exists in each scenario (in at least one year).
- Check whether 'Primary Energy' at the global level exceeds 515 EJ/y in the reference year 2010 (the value must satisfy an upper bound of 515 EJ/y in this notation).
- Check whether 'Primary Energy|Coal' exceeds 400 EJ/y in mid-century.
The validate() function takes a filters dictionary to perform the checks on a selection of models/scenarios, similar to the functions introduced above.
The criteria argument can specify a valid range by an upper and lower bound (up, lo) for a variable and a subset of years to which the validation is applied. All scenarios with a value in at least one year outside that range are considered to not satisfy the validation.
By setting the argument exclude=True, all scenarios failing the validation will be categorized as exclude.
These scenarios will not be shown by default in any subsequent data tables or plots.
In [17]:
df.validate?
In [18]:
df.validate('Primary Energy')
In [19]:
df.validate({'Primary Energy': {'up': 515, 'year': 2010}})
Out[19]:
In [20]:
df.validate(
    {'Primary Energy|Coal': {'up': 400, 'year': 2050}},
    filters={'region': 'World'},
    exclude=False
)
Out[20]:
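The range check behind the validation calls above can be sketched as follows. This is a simplified stand-in, not the package's validate() function: the helper failing_scenarios, its arguments and the numbers are hypothetical, and treating a value equal to a bound as valid is an assumption.

```python
def failing_scenarios(values, up=None, lo=None, years=None):
    """Return the scenarios with at least one value outside [lo, up].

    `values` maps scenario name -> {year: value}; `years` optionally
    restricts the check to a subset of years.
    """
    failed = []
    for scenario, series in values.items():
        for year, value in series.items():
            if years is not None and year not in years:
                continue
            too_high = up is not None and value > up
            too_low = lo is not None and value < lo
            if too_high or too_low:
                failed.append(scenario)
                break  # one violation is enough to fail the scenario
    return failed


# Made-up numbers for illustration only.
primary_energy = {
    "Scenario 1": {2010: 500.0, 2020: 560.0},
    "Scenario 2": {2010: 520.0, 2020: 540.0},
}

print(failing_scenarios(primary_energy, up=515, years=[2010]))  # ['Scenario 2']
```

Without the years restriction, both made-up scenarios would fail, because each has at least one value above the bound.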
It is often useful to apply categorization to classes of scenarios according to specific characteristics of the timeseries data.
In the following example, we use the temperature change assessment by MAGICC 6 to group scenarios by the median global warming by the end of the century (year 2100).
We proceed in the following steps:
- Use category() to apply a categorization (and colour code for later use) to all scenarios that satisfy a number of specific criteria.
In [21]:
df.plot_lines({'variable': 'Temperature*'})
We now use the categorization feature of the pyam-analysis package.
By default, each model/scenario is assigned as "uncategorized".
The next function resets all scenarios back to "uncategorized". This may be helpful in this tutorial if you are going back and forth between cells.
In [22]:
df.reset_category()
In [23]:
df.category(
    'Below 1.6C',
    {'Temperature|Global Mean|MAGICC6|MED': {'up': 1.6, 'year': 2100}},
    color='cornflowerblue',
    display='list'
)
Out[23]:
In [24]:
df.category(
    'Below 2.0C',
    {'Temperature|Global Mean|MAGICC6|MED': {'up': 2.0, 'year': 2100}},
    filters={'category': 'uncategorized'},
    color='forestgreen'
)
In [25]:
df.category(
    'Below 2.5C',
    {'Temperature|Global Mean|MAGICC6|MED': {'up': 2.5, 'year': 2100}},
    filters={'category': 'uncategorized'},
    color='gold'
)
In [26]:
df.category(
    'Below 3.5C',
    {'Temperature|Global Mean|MAGICC6|MED': {'up': 3.5, 'year': 2100}},
    filters={'category': 'uncategorized'},
    color='firebrick'
)
In [27]:
df.category(
    'Above 3.5C',
    {'Temperature|Global Mean|MAGICC6|MED': {}},
    filters={'category': 'uncategorized'},
    color='magenta'
)
Two models included in the snapshot have not been assessed by MAGICC6 regarding their long-term climate and warming impact. Therefore, the timeseries 'Temperature|Global Mean|MAGICC6|MED' does not exist, and they have not been categorized.
Below, we display all scenarios that are uncategorized at this point.
In [28]:
df.category('uncategorized', display='list')
Out[28]:
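The sequence of category() calls above assigns each scenario to the first temperature bin it satisfies, because every later call is restricted to scenarios that are still "uncategorized". A minimal plain-Python sketch of that logic follows; the helper categorize and the warming values are hypothetical, and treating a value equal to the bound as satisfying it is an assumption.

```python
def categorize(end_of_century_warming, bounds, fallback):
    """Assign each scenario the first category whose upper bound it satisfies.

    `end_of_century_warming` maps scenario -> warming in 2100, or None if no
    temperature timeseries exists. `bounds` is an ordered list of
    (category name, upper bound); `fallback` is used above the last bound.
    """
    categories = {}
    for scenario, warming in end_of_century_warming.items():
        if warming is None:
            # No temperature timeseries: the scenario stays uncategorized.
            categories[scenario] = "uncategorized"
            continue
        categories[scenario] = fallback
        for name, upper in bounds:
            if warming <= upper:
                categories[scenario] = name
                break  # later, wider bins only see uncategorized scenarios
    return categories


bounds = [
    ("Below 1.6C", 1.6),
    ("Below 2.0C", 2.0),
    ("Below 2.5C", 2.5),
    ("Below 3.5C", 3.5),
]

# Made-up warming values for illustration only.
warming_2100 = {
    "Scenario 1": 1.5,
    "Scenario 2": 1.9,
    "Scenario 3": 4.1,
    "Scenario 4": None,
}

result = categorize(warming_2100, bounds, fallback="Above 3.5C")
for scenario, category in result.items():
    print(scenario, "->", category)
```

The order of the bins matters: if the widest bin were applied first, every assessed scenario below 3.5C would land in it and the narrower bins would remain empty.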
Now, we again display the median global temperature increase for all scenarios, but we use the colouring by category to illustrate the common characteristics across scenarios.
In [29]:
df.plot_lines({'variable': 'Temperature*'}, color_by_cat=True)
As a last step, we display the aggregate CO2 emissions by category. This makes it possible to highlight alternative pathways within the same category.
In [30]:
df.plot_lines(
    {'variable': 'Emissions|CO2', 'region': 'World'},
    color_by_cat=True
)