This notebook demonstrates how to apply Pandas core functions (in this case groupby and aggregation) directly to a pymrio system.
Here we use the WIOD MRIO system (see the notebook "Automatic downloading of MRIO databases" for how to automatically retrieve this database) and aggregate the WIOD material stressors for used and unused materials. We assume that the WIOD system is available at:
In [1]:
wiod_folder = '/tmp/mrios/WIOD2013'
To get started, we import pymrio:
In [2]:
import pymrio
For the example here, we use the data from 2009:
In [3]:
wiod09 = pymrio.parse_wiod(path=wiod_folder, year=2009)
WIOD includes multiple material accounts, specified for the "Used" and "Unused" categories, as well as information on the total. We will use the latter to confirm our calculations:
In [4]:
wiod09.mat.F
Out[4]:
To aggregate these with the Pandas groupby function, we need to tell Pandas which rows belong to which group. Pymrio contains a helper function which builds such a matching dictionary. The matching patterns can include regular expressions to simplify the build:
In [5]:
groups = wiod09.mat.get_index(as_dict=True,
                              grouping_pattern={'.*_Used': 'Material Used',
                                                '.*_Unused': 'Material Unused'})
groups
Out[5]:
Note that the grouping also contains the rows which do not match any of the specified groups. This makes it easy to aggregate only parts of a specific stressor set. To actually omit these rows, include them in the matching pattern and provide None as the value.
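The idea behind such a regular expression grouping can be sketched with the standard library alone. This is only an illustration under assumptions, not pymrio's implementation: build_grouping is a hypothetical helper, the stressor names are made up, and both the use of full-string matching and the rule that unmatched names map to themselves are assumptions.

```python
import re


def build_grouping(names, patterns):
    """Hypothetical helper (not part of pymrio): map each name to a group label."""
    grouping = {}
    for name in names:
        # Assumption: a name that matches no pattern is kept under its own name.
        grouping[name] = name
        for pattern, label in patterns.items():
            if re.fullmatch(pattern, name):
                grouping[name] = label
                break  # the first matching pattern wins
    return grouping


# Made-up stressor names, for illustration only.
names = ["Biomass_Used", "Biomass_Unused", "Total"]
patterns = {".*_Used": "Material Used", ".*_Unused": "Material Unused"}

grouping = build_grouping(names, patterns)

# Mapping a pattern to None marks the matching rows for omission.
grouping_without_total = build_grouping(names, {**patterns, "Total": None})
```

In the first dict the unmatched name "Total" stays as its own group, while in the second it is mapped to None.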
To have the aggregated data alongside the original data, we first copy the detailed satellite account:
In [6]:
wiod09.mat_agg = wiod09.mat.copy(new_name='Aggregated material accounts')
Then we use the pymrio get_DataFrame iterator together with the pandas groupby and sum functions to aggregate the stressors. For the dataframe containing the unit information, we pass a custom function which concatenates non-unique unit strings.
In [7]:
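# Iterate over the names and the data of all DataFrames in the account
for df_name, df in zip(wiod09.mat_agg.get_DataFrame(data=False, with_unit=True, with_population=False),
                       wiod09.mat_agg.get_DataFrame(data=True, with_unit=True, with_population=False)):
    if df_name == 'unit':
        # Units cannot be summed: join the distinct unit strings of each group
        wiod09.mat_agg.__dict__[df_name] = df.groupby(groups).apply(lambda x: ' & '.join(x.unit.unique()))
    else:
        # Numerical data: sum all rows belonging to the same group
        wiod09.mat_agg.__dict__[df_name] = df.groupby(groups).sum()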
In [8]:
wiod09.mat_agg.F
Out[8]:
In [9]:
wiod09.mat_agg.unit
Out[9]:
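The two aggregation steps used above, summing the data and joining the unit strings, can be tried on a small stand-alone example. The sketch below uses plain pandas and made-up numbers, stressor names and units. It does not touch pymrio, and it only illustrates how groupby treats a dict that maps index labels to group names.

```python
import pandas as pd

# Toy stand-in for a satellite account: stressors as rows, sectors as columns.
# All names and values are invented for illustration.
F = pd.DataFrame(
    {"sector_a": [1.0, 2.0, 3.0, 4.0], "sector_b": [10.0, 20.0, 30.0, 40.0]},
    index=["Biomass_Used", "Fossil_Used", "Biomass_Unused", "Fossil_Unused"],
)
unit = pd.DataFrame({"unit": ["kt", "kt", "kt", "t"]}, index=F.index)

# Dict mapping each index label to the group it should be aggregated into.
groups = {
    "Biomass_Used": "Material Used",
    "Fossil_Used": "Material Used",
    "Biomass_Unused": "Material Unused",
    "Fossil_Unused": "Material Unused",
}

# Numerical data: sum all rows that share a group label.
F_agg = F.groupby(groups).sum()

# Units: join the distinct unit strings found within each group.
unit_agg = unit.groupby(groups)["unit"].agg(lambda s: " & ".join(s.unique()))
```

A group whose rows share one unit keeps that unit, whereas a group with mixed units ends up with a combined string such as "kt & t", which signals that the summed values should be treated with care.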
The same regular expression grouping can be used to aggregate stressor data which is given per compartment. To do so, the keys of the matching dict need to be tuples corresponding to valid index values in the DataFrames. Each position in the tuple is interpreted as a regular expression. Calling the get_index method gives a good indication of what a valid grouping dict should look like:
In [10]:
tt = pymrio.load_test()
tt.emissions.get_index(as_dict=True)
Out[10]:
With that information, we can now build our own grouping dict, e.g.:
In [11]:
agg_groups = {('emis.*', '.*'): 'all emissions'}
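How a tuple pattern of this kind could be matched position by position can be sketched with the standard library. This is an illustration under assumptions rather than pymrio's code: match_tuple is a hypothetical helper, the index entries are made up to resemble (stressor, compartment) pairs, and full-string matching is assumed.

```python
import re


def match_tuple(pattern, entry):
    """Hypothetical helper: True if each position of `entry` fully matches
    the regular expression at the same position of `pattern`."""
    return len(pattern) == len(entry) and all(
        re.fullmatch(p, e) for p, e in zip(pattern, entry)
    )


# Made-up index entries in the form (stressor, compartment).
index_entries = [("emission_type1", "air"), ("emission_type2", "water")]
agg_groups = {("emis.*", ".*"): "all emissions"}

# Map every entry to the label of the first matching pattern,
# falling back to the entry itself if nothing matches (an assumption).
group_dict = {
    entry: next(
        (label for pattern, label in agg_groups.items() if match_tuple(pattern, entry)),
        entry,
    )
    for entry in index_entries
}
```

Because the second position of the pattern is '.*', both compartments fall into the same group. A pattern such as ('emis.*', 'air') would select only the entries in the air compartment.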
In [12]:
group_dict = tt.emissions.get_index(as_dict=True,
                                    grouping_pattern=agg_groups)
group_dict
Out[12]:
This dict can then be used to aggregate the satellite account:
In [13]:
# Iterate over the names and the data of all DataFrames in the account
for df_name, df in zip(tt.emissions.get_DataFrame(data=False, with_unit=True, with_population=False),
                       tt.emissions.get_DataFrame(data=True, with_unit=True, with_population=False)):
    if df_name == 'unit':
        # Units cannot be summed: join the distinct unit strings of each group
        tt.emissions.__dict__[df_name] = df.groupby(group_dict).apply(lambda x: ' & '.join(x.unit.unique()))
    else:
        # Numerical data: sum all rows belonging to the same group
        tt.emissions.__dict__[df_name] = df.groupby(group_dict).sum()
In this case we lose the information on the compartment. To reset the index, do:
In [14]:
import pandas as pd
tt.emissions.set_index(pd.Index(tt.emissions.get_index(), name='stressor'))
In [15]:
tt.emissions.F
Out[15]: