Advanced functionality - pandas groupby with pymrio satellite accounts

This notebook examplifies how to directly apply Pandas core functions (in this case groupby and aggregation) to the pymrio system.

WIOD material extension aggregation - stressor w/o compartment info

Here we use the WIOD MRIO system (see the notebook "Automatic downloading of MRIO databases" for how to automatically retrieve this database) and will aggregate the WIOD material stressor for used and unused materials. We assume, that the WIOD system is available at



In [1]:

    
wiod_folder = '/tmp/mrios/WIOD2013'

To get started we import pymrio



In [2]:

    
import pymrio

For the example here, we use the data from 2009:



In [3]:

    
wiod09 = pymrio.parse_wiod(path=wiod_folder, year=2009)

WIOD includes multiple material accounts, specified for the "Used" and "Unused" category, as well as information on the total. We will use the latter to confirm our calculations:



In [4]:

    
wiod09.mat.F









    Out[4]:







  
    
      region
      AUS
      ...
      RoW
    
    
      sector
      AtB
      C
      15t16
      17t18
      19
      20
      21t22
      23
      24
      25
      ...
      63
      64
      J
      70
      71t74
      L
      M
      N
      O
      P
    
    
      stressor
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Biomass_animals_Used
      238.487190
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_feed_Used
      314501.775775
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_food_Used
      78736.348430
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_forestry_Used
      21443.712952
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_other_Used
      647.038563
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Fossil_coal_Used
      0.000000
      4.084490e+05
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Fossil_gas_Used
      0.000000
      3.671908e+04
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Fossil_oil_Used
      0.000000
      2.191849e+04
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Fossil_other_Used
      0.000000
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Minerals_construction_Used
      0.000000
      1.098489e+05
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Minerals_industrial_Used
      0.000000
      2.444270e+04
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Minerals_metals_Used
      0.000000
      7.019911e+05
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_animals_Unused
      38.094064
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_feed_Unused
      194.597667
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_food_Unused
      17925.841358
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_forestry_Unused
      3216.556943
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Biomass_other_Unused
      128.610253
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Fossil_coal_Unused
      0.000000
      6.430405e+06
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Fossil_gas_Unused
      0.000000
      4.759046e+03
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Fossil_oil_Unused
      0.000000
      4.822068e+03
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Fossil_other_Unused
      0.000000
      0.000000e+00
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Minerals_construction_Unused
      0.000000
      3.015773e+03
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Minerals_industrial_Unused
      0.000000
      3.389710e+04
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Minerals_metals_Unused
      0.000000
      6.919846e+05
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Total
      437071.063196
      8.472253e+06
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
  

25 rows × 1435 columns

To aggregate these with the Pandas groupby function, we need to specify the groups which should be grouped by Pandas. Pymrio contains a helper function which builds such a matching dictionary. The matching can also include regular expressions to simplify the build:



In [5]:

    
groups = wiod09.mat.get_index(as_dict=True, grouping_pattern = {'.*_Used': 'Material Used', 
                                                                '.*_Unused': 'Material Unused'})
groups









    Out[5]:





{'Biomass_animals_Used': 'Material Used',
 'Biomass_feed_Used': 'Material Used',
 'Biomass_food_Used': 'Material Used',
 'Biomass_forestry_Used': 'Material Used',
 'Biomass_other_Used': 'Material Used',
 'Fossil_coal_Used': 'Material Used',
 'Fossil_gas_Used': 'Material Used',
 'Fossil_oil_Used': 'Material Used',
 'Fossil_other_Used': 'Material Used',
 'Minerals_construction_Used': 'Material Used',
 'Minerals_industrial_Used': 'Material Used',
 'Minerals_metals_Used': 'Material Used',
 'Biomass_animals_Unused': 'Material Unused',
 'Biomass_feed_Unused': 'Material Unused',
 'Biomass_food_Unused': 'Material Unused',
 'Biomass_forestry_Unused': 'Material Unused',
 'Biomass_other_Unused': 'Material Unused',
 'Fossil_coal_Unused': 'Material Unused',
 'Fossil_gas_Unused': 'Material Unused',
 'Fossil_oil_Unused': 'Material Unused',
 'Fossil_other_Unused': 'Material Unused',
 'Minerals_construction_Unused': 'Material Unused',
 'Minerals_industrial_Unused': 'Material Unused',
 'Minerals_metals_Unused': 'Material Unused',
 'Total': 'Total'}

Note, that the grouping contains the rows which do not match any of the specified groups. This allows to easily aggregates only parts of a specific stressor set. To actually omit these groups include them in the matching pattern and provide None as value.

To have the aggregated data alongside the original data, we first copy the detailed satellite account:



In [6]:

    
wiod09.mat_agg = wiod09.mat.copy(new_name='Aggregated matrial accounts')

Then, we use the pymrio get_DataFrame iterator together with the pandas groupby and sum functions to aggregate the stressors. For the dataframe containing the unit information, we pass a custom function which concatenate non-unique unit strings.



In [7]:

    
for df_name, df in zip(wiod09.mat_agg.get_DataFrame(data=False, with_unit=True, with_population=False),
                       wiod09.mat_agg.get_DataFrame(data=True, with_unit=True, with_population=False)):
    if df_name == 'unit':
        wiod09.mat_agg.__dict__[df_name] = df.groupby(groups).apply(lambda x: ' & '.join(x.unit.unique()))
    else:
        wiod09.mat_agg.__dict__[df_name] = df.groupby(groups).sum()



In [8]:

    
wiod09.mat_agg.F









    Out[8]:







  
    
      region
      AUS
      ...
      RoW
    
    
      sector
      AtB
      C
      15t16
      17t18
      19
      20
      21t22
      23
      24
      25
      ...
      63
      64
      J
      70
      71t74
      L
      M
      N
      O
      P
    
  
  
    
      Material Unused
      21503.700285
      7.168884e+06
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Material Used
      415567.362910
      1.303369e+06
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
    
      Total
      437071.063196
      8.472253e+06
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      ...
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
      0.0
    
  

3 rows × 1435 columns



In [9]:

    
wiod09.mat_agg.unit









    Out[9]:





Material Unused    1000 tonnes
Material Used      1000 tonnes
Total              1000 tonnes
dtype: object

Use with stressors including compartment information:

The same regular expression grouping can be used to aggregate stressor data which is given per compartment. To do so, the matching dict needs to consist of tuples corresponding to a valid index value in the DataFrames. Each position in the tuple is interprested as a regular expression. Using the get_index method gives a good indication how a valid grouping dict should look like:



In [10]:

    
tt = pymrio.load_test()
tt.emissions.get_index(as_dict=True)









    Out[10]:





{('emission_type1', 'air'): ('emission_type1', 'air'),
 ('emission_type2', 'water'): ('emission_type2', 'water')}

With that information, we can now build our own grouping dict, e.g.:



In [11]:

    
agg_groups = {('emis.*', '.*'): 'all emissions'}



In [12]:

    
group_dict = tt.emissions.get_index(as_dict=True,
                                    grouping_pattern=agg_groups)
group_dict









    Out[12]:





{('emission_type1', 'air'): 'all emissions',
 ('emission_type2', 'water'): 'all emissions'}

Which can then be used to aggregate the satellite account:



In [13]:

    
for df_name, df in zip(tt.emissions.get_DataFrame(data=False, with_unit=True, with_population=False),
                       tt.emissions.get_DataFrame(data=True, with_unit=True, with_population=False)):
    if df_name == 'unit':
        tt.emissions.__dict__[df_name] = df.groupby(group_dict).apply(lambda x: ' & '.join(x.unit.unique()))
    else:
        tt.emissions.__dict__[df_name] = df.groupby(group_dict).sum()

In this case we loose the information on the compartment. To reset the index do:



In [14]:

    
import pandas as pd
tt.emissions.set_index(pd.Index(tt.emissions.get_index(), name='stressor'))



In [15]:

    
tt.emissions.F









    Out[15]:







  
    
      region
      reg1
      reg2
      ...
      reg5
      reg6
    
    
      sector
      food
      mining
      manufactoring
      electricity
      construction
      trade
      transport
      other
      food
      mining
      ...
      transport
      other
      food
      mining
      manufactoring
      electricity
      construction
      trade
      transport
      other
    
    
      stressor
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      all emissions
      1987315.27
      1008791.385
      24377356.18
      28413081.55
      2901538.31
      5387134.1
      22779986.1
      10291268.6
      1902772.74
      376842.094
      ...
      46499160
      17964832.3
      20604104.1
      8286580.6
      125872643.0
      56775747.7
      7561126.8
      32087934
      55812326.3
      38415421
    
  

1 rows × 48 columns

region	AUS										...	RoW
sector	AtB	C	15t16	17t18	19	20	21t22	23	24	25	...	63	64	J	70	71t74	L	M	N	O	P
stressor
Biomass_animals_Used	238.487190	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_feed_Used	314501.775775	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_food_Used	78736.348430	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_forestry_Used	21443.712952	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_other_Used	647.038563	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fossil_coal_Used	0.000000	4.084490e+05	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fossil_gas_Used	0.000000	3.671908e+04	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fossil_oil_Used	0.000000	2.191849e+04	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fossil_other_Used	0.000000	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Minerals_construction_Used	0.000000	1.098489e+05	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Minerals_industrial_Used	0.000000	2.444270e+04	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Minerals_metals_Used	0.000000	7.019911e+05	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_animals_Unused	38.094064	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_feed_Unused	194.597667	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_food_Unused	17925.841358	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_forestry_Unused	3216.556943	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Biomass_other_Unused	128.610253	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fossil_coal_Unused	0.000000	6.430405e+06	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fossil_gas_Unused	0.000000	4.759046e+03	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fossil_oil_Unused	0.000000	4.822068e+03	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Fossil_other_Unused	0.000000	0.000000e+00	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Minerals_construction_Unused	0.000000	3.015773e+03	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Minerals_industrial_Unused	0.000000	3.389710e+04	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Minerals_metals_Unused	0.000000	6.919846e+05	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
Total	437071.063196	8.472253e+06	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

region	reg1								reg2		...	reg5		reg6
sector	food	mining	manufactoring	electricity	construction	trade	transport	other	food	mining	...	transport	other	food	mining	manufactoring	electricity	construction	trade	transport	other
stressor
all emissions	1987315.27	1008791.385	24377356.18	28413081.55	2901538.31	5387134.1	22779986.1	10291268.6	1902772.74	376842.094	...	46499160	17964832.3	20604104.1	8286580.6	125872643.0	56775747.7	7561126.8	32087934	55812326.3	38415421