Ecospold2Matrix Demo

A typical use:

We want to recast Ecoinvent-cutoff 3.1 in a matrix representation. We specify the location of the ecospold unit-process ecospold files, give our extraction project a name, and point to a directory to save the logs and results.

Throughout the data extraction, we want to change the sign conventions for waste flows, representing the supply of waste treatment as a positive output. Also, as we have need to distinguish between situations where a value is of magnitude 0 and situations where we simply have no data, so we replace all Not-a-Number entries with 0.0. We therefore pass this choice as a defining parameters of our parser when we initialize it.

When we initialize the parser, it records in a log file all project-specific and default options (see in log in pink below).



In [4]:

    
import ecospold2matrix as e2m

ecospold_dir = '/home/bill/Version_3.1_cutoff_ecoSpold02/'
project_name = 'demo'
out_dir = '/home/bill/data/eco_matrices'

parser = e2m.Ecospold2Matrix(ecospold_dir, project_name, out_dir, positive_waste=True, nan2null=True)









    



INFO:demo:Ecospold2Matrix Processing
INFO:demo:Current git commit: 5146d4dacfbefba2123ba27ed8e4dd9c009b90a3
INFO:demo:Project name: demo
INFO:demo:Unit process and Master data directory: /home/bill/Version_3.1_cutoff_ecoSpold02/
INFO:demo:Data saved in: /home/bill/data/eco_matrices
INFO:demo:Sign conventions changed to make waste flows positive
INFO:demo:Replace Not-a-Number instances with 0.0 in all matrices
INFO:demo:Pickle intermediate results to files
INFO:demo:Order processes based on: ISIC, activityName
INFO:demo:Order elementary exchanges based on: compartment, subcompartment, name

We want to recast the Ecoinvent dataset as a Leontief technology coefficient matrix with environmental extensions. We therefore call parser.ecospold_to_Leontief()

In addition to the normalized coefficient tables, we also want a scaled-up flow tables, with absolute intermediate and elementary flows that match the production volumes recorded as meta-data in unit processes.

As no specific file format is specified, the parser will save the results in all known formats: pandas dataframes, pandas sparse dataframes, sparse MATLAB and scipy matrices, and CSV files. Again, the parser logs all relevant operations.



In [5]:

    
parser.ecospold_to_Leontief(with_absolute_flows=True)









    



INFO:demo:Products extracted from IntermediateExchanges.xml with SHA-1 of ca2c05c4dff035265fc44c53c7b534a3a711ff70
WARNING:demo:Removed 176 duplicate rows from activity_list, see duplicate_activity_list.csv.
INFO:demo:Activities extracted from ActivityIndex.xml with SHA-1 of c579d38fb6fa4a52ec4e09e5b04b873df77ce4c9
INFO:demo:Processing 11301 files in /home/bill/Version_3.1_cutoff_ecoSpold02/datasets
INFO:demo:Flows saved in /home/bill/Version_3.1_cutoff_ecoSpold02/flows.pickle with SHA-1 of d9c64122d1e866354bd5b5d6b410deb5fa915116
INFO:demo:Processing 11301 files - this may take a while ...
INFO:demo:Elementary flows extracted from ElementaryExchanges.xml with SHA-1 of 8a3a0a95e8a023950f42704eebc248014164166c
INFO:demo:Labels saved in /home/bill/Version_3.1_cutoff_ecoSpold02/rawlabels.pickle with SHA-1 of aac4e897f807a0b5740b83a02b2df18239560301
INFO:demo:OK.   No untraceable flows.
INFO:demo:OK. Source activities seem in order. Each product traceable to an activity that actually does produce or distribute this product.
INFO:demo:Final, symmetric, normalized matrices saved in /home/bill/data/eco_matrices/demoPandas_symmNorm.pickle with SHA-1 of f88cd7625c179a9b5b9fd723a9ea10270eb01661
INFO:demo:Final, symmetric, scaled-up flow matrices saved in /home/bill/data/eco_matrices/demoPandas_symmScale.pickle with SHA-1 of 59c10896a3f767486a502caee3bccbed6e1939a1
INFO:demo:Final, symmetric, normalized matrices saved in /home/bill/data/eco_matrices/demoSparsePandas_symmNorm.pickle with SHA-1 of 0b28d60b10f741a5003a502bc907a04b86b12cc8
INFO:demo:Final, symmetric, scaled-up flow matrices saved in /home/bill/data/eco_matrices/demoSparsePandas_symmScale.pickle with SHA-1 of cd5ac48bf293a5473d11a9aff29b5080170776ae
INFO:demo:Final, symmetric, normalized matrices saved in /home/bill/data/eco_matrices/demoSparseMatrix_symmNorm.pickle with SHA-1 of 3bb02d845e47f4df774defed750d81b42d616116
INFO:demo:Final, symmetric, normalized matrices saved in /home/bill/data/eco_matrices/demoSparseMatrix_symmNorm.mat with SHA-1 of 129f156563d3d33e764589e6a9833e38b5ae3e9b
INFO:demo:Final, symmetric, scaled-up flow matrices saved in /home/bill/data/eco_matrices/demoSparseMatrix_symmScale.pickle with SHA-1 of 4b8269d07aab24720790feb3fcfac5be9cb351b6
INFO:demo:Final, symmetric, scaled-up flow matrices saved in /home/bill/data/eco_matrices/demoSparseMatrix_symmScale.mat with SHA-1 of 329ac81318b6cc57c6d4df7b76dccbc482386074
INFO:demo:Final matrices saved as CSV files
INFO:demo:Done running ecospold2matrix.ecospold_to_Leontief

The recasting of intermediate flows as symmetric matrices was successful. The parser did not encounter any inconsistencies in the data that would have required "patching up". The log file records the hash of key files, which allows future users of the data to check that files have not been modified or corrupted.

If we look in the output directory, we see that all python formats are saved as "pickle files", whereas sparse matrices are also recorded as .mat files.



In [7]:

    
ls '/home/bill/data/eco_matrices'









    



csv/                           demoSparseMatrix_symmNorm.pickle
demo_log/                      demoSparseMatrix_symmScale.mat
demoPandas_symmNorm.pickle     demoSparseMatrix_symmScale.pickle
demoPandas_symmScale.pickle    demoSparsePandas_symmNorm.pickle
demoSparseMatrix_symmNorm.mat  demoSparsePandas_symmScale.pickle

CSV files are in their own directory, with one file for each variable: A.csv and Z.csv hold normalized and scaled-up intermediate exchanges, and PRO.csv holds the process descriptions, which serve as row/column labels for these matrices. Similarly, F.csv and G_pro.csv record normalized and scaled-up elementary flows by the different processes, with stressor descriptions (STR) serving as row labels.



In [8]:

    
ls '/home/bill/data/eco_matrices/csv'









    



A.csv  F.csv  G_pro.csv  PRO.csv  STR.csv  Z.csv

We can also access the matrices straight from the parser. The A-matrix has dimensions of 11301-by-11301 processes.



In [9]:

    
parser.A.shape









    Out[9]:





(11301, 11301)

Similarly, the F-matrix records normalized emission of 3955 elementary flow types, emitted by 11301 processes.



In [10]:

    
parser.F.shape









    Out[10]:





(3955, 11301)

The process labels contain the official Id's, classifications (ISIC, ecospoldCategory), names, geography, units etc.



In [11]:

    
parser.PRO.columns









    Out[11]:





Index(['activityId', 'productId', 'activityName', 'ISIC', 'EcoSpoldCategory', 'geography', 'technologyLevel', 'macroEconomicScenario', 'productName', 'unitName', 'activityType', 'startDate', 'endDate'], dtype='object')

We can have a quick look at rows 50 to 59...



In [12]:

    
parser.PRO.ix[50:59, ['activityName', 'productName', 'geography', 'unitName']]









    Out[12]:






  
    
      
      activityName
      productName
      geography
      unitName
    
    
      index
      
      
      
      
    
  
  
    
      017a00eb-e89a-4453-90f8-249d0d98f28f_c538baa8-11c9-4064-b3e2-9faba21c6a9b
                 market for maize seed, organic, at farm
                         maize seed, organic, at farm
       GLO
       kg
    
    
      f3b7e0a5-2cdf-4224-a29f-67e132c8e5d1_0dab73c6-b214-4e9c-8c38-ab49d608637b
                                  market for protein pea
                                          protein pea
       GLO
       kg
    
    
      fa7c1736-5313-4e39-8698-c3ec5d55abbb_510a8fef-7075-4da2-9984-8936ba08c89f
       market for protein pea, Swiss integrated produ...
             protein pea, Swiss integrated production
       GLO
       kg
    
    
      ea6ea016-5982-4e64-b3b5-4b57fd3360ef_06affe58-e750-4345-8725-8218d54352f7
       market for protein pea, feed, Swiss integrated...
       protein pea, feed, Swiss integrated production
       GLO
       kg
    
    
      3f13d0c2-10d7-400a-89d1-62bdc2a4e748_fa8fdaec-627a-4055-b2e9-49b238cf166f
                         market for protein pea, organic
                                 protein pea, organic
       GLO
       kg
    
    
      f1cdc1be-d757-42d6-ad82-1e28e7e74aa3_cb09bcae-b469-4f41-84a6-cdd1e958e027
                                    market for rape seed
                                            rape seed
       GLO
       kg
    
    
      013d2289-d655-430a-9fa2-9230297efae0_44519c79-bf77-4775-a69e-182d26b1f7d5
       market for rape seed, Swiss integrated production
               rape seed, Swiss integrated production
       GLO
       kg
    
    
      98f22fe6-1a57-4eaa-a4e0-d5f4dc3fe4a8_80df7587-0686-45b1-af7c-38ee267c2525
                           market for rape seed, organic
                                   rape seed, organic
       GLO
       kg
    
    
      6b20f9ee-95c9-424e-aeba-d1e1b15b7739_edb81938-8dd6-48fc-9f24-b567992f3ecb
                                    market for rye grain
                                            rye grain
       GLO
       kg

Working with unallocated data

Instead of pre-allocated data, we want to organized unallocated data as supply and use tables (SUT) (see pySUT), in line with typical IO methodology (see pyMRIO).

We create a new project, with a dedicated parser to hold and record its history and our methodological choices.

As this dataset had already been parsed in a previous project, we avoid re-reading ecospold files for no reason and choose to read pickled intermediate results if available, which greatly speeds up the process.



In [13]:

    
dataset_dir = '/home/bill/Version_3.0_unallocated_restricted'

sutparser = e2m.Ecospold2Matrix(dataset_dir, 'sutdemo', out_dir, prefer_pickles=True)









    



INFO:sutdemo:Ecospold2Matrix Processing
INFO:sutdemo:Current git commit: 5146d4dacfbefba2123ba27ed8e4dd9c009b90a3
INFO:sutdemo:Project name: sutdemo
INFO:sutdemo:Unit process and Master data directory: /home/bill/Version_3.0_unallocated_restricted
INFO:sutdemo:Data saved in: /home/bill/data/eco_matrices
INFO:sutdemo:When possible, loads pickled data instead of parsing ecospold files
INFO:sutdemo:Pickle intermediate results to files
INFO:sutdemo:Order processes based on: ISIC, activityName
INFO:sutdemo:Order elementary exchanges based on: compartment, subcompartment, name

We generate the SUT and chose to save it in a pandas dataframe format.



In [14]:

    
sutparser.ecospold_to_sut(fileformats=['Pandas'])









    



INFO:sutdemo:Products extracted from IntermediateExchanges.xml with SHA-1 of ca2c05c4dff035265fc44c53c7b534a3a711ff70
WARNING:sutdemo:Removed 175 duplicate rows from activity_list, see duplicate_activity_list.csv.
INFO:sutdemo:Activities extracted from ActivityIndex.xml with SHA-1 of 829a2696e66cc57a1f2a636d43e7e3264c6ee2b8
INFO:sutdemo:Flows loaded from /home/bill/Version_3.0_unallocated_restricted/flows.pickle with SHA-1 of da39a3ee5e6b4b0d3255bfef95601890afd80709
INFO:sutdemo:Labels loaded from /home/bill/Version_3.0_unallocated_restricted/rawlabels.pickle with SHA-1 of da39a3ee5e6b4b0d3255bfef95601890afd80709
INFO:sutdemo:Final SUT matrices saved in /home/bill/data/eco_matrices/sutdemoPandas_SUT.pickle with SHA-1 of 918f3d92a287e92f243e3084b6c1d669f7912f28
INFO:sutdemo:Done running ecospold2matrix.ecospold_to_sut

This generates a python pickle file named "sutdemoPandas_SUT.pickle".

We can also access the supply and use tables straight from the parser. Let's say we are interested in organic barley production in Switzerland...



In [15]:

    
sutparser.PRO.query("geography == 'CH' and activityName == 'barley production, organic'")[['activityId', 'activityName', 'ISIC', 'geography']]









    Out[15]:






  
    
      
      activityId
      activityName
      ISIC
      geography
    
    
      index
      
      
      
      
    
  
  
    
      0b639971-3ed2-469e-b33e-a152fe63f488_f467c4d0-ea1c-4ae3-8d69-712598a0478a
       0b639971-3ed2-469e-b33e-a152fe63f488
       barley production, organic
       0111:Growing of cereals (except rice), legumin...
       CH

We can check its column in the supply table (V), where we see its coproduction of two products (barley and straw)



In [16]:

    
sutparser.V.ix[:,'0b639971-3ed2-469e-b33e-a152fe63f488'].dropna()









    Out[16]:





productId
f467c4d0-ea1c-4ae3-8d69-712598a0478a    4152.7
692b4f7e-9e79-4f69-b22f-b66f68f2f9cc    2924.2
Name: 0b639971-3ed2-469e-b33e-a152fe63f488, dtype: float64

Stay tuned

This is a newborn project. More demos and features yet to come. Please download, play with it, and join!

	activityName	productName	geography	unitName
index
017a00eb-e89a-4453-90f8-249d0d98f28f_c538baa8-11c9-4064-b3e2-9faba21c6a9b	market for maize seed, organic, at farm	maize seed, organic, at farm	GLO	kg
f3b7e0a5-2cdf-4224-a29f-67e132c8e5d1_0dab73c6-b214-4e9c-8c38-ab49d608637b	market for protein pea	protein pea	GLO	kg
fa7c1736-5313-4e39-8698-c3ec5d55abbb_510a8fef-7075-4da2-9984-8936ba08c89f	market for protein pea, Swiss integrated produ...	protein pea, Swiss integrated production	GLO	kg
ea6ea016-5982-4e64-b3b5-4b57fd3360ef_06affe58-e750-4345-8725-8218d54352f7	market for protein pea, feed, Swiss integrated...	protein pea, feed, Swiss integrated production	GLO	kg
3f13d0c2-10d7-400a-89d1-62bdc2a4e748_fa8fdaec-627a-4055-b2e9-49b238cf166f	market for protein pea, organic	protein pea, organic	GLO	kg
f1cdc1be-d757-42d6-ad82-1e28e7e74aa3_cb09bcae-b469-4f41-84a6-cdd1e958e027	market for rape seed	rape seed	GLO	kg
013d2289-d655-430a-9fa2-9230297efae0_44519c79-bf77-4775-a69e-182d26b1f7d5	market for rape seed, Swiss integrated production	rape seed, Swiss integrated production	GLO	kg
98f22fe6-1a57-4eaa-a4e0-d5f4dc3fe4a8_80df7587-0686-45b1-af7c-38ee267c2525	market for rape seed, organic	rape seed, organic	GLO	kg
6b20f9ee-95c9-424e-aeba-d1e1b15b7739_edb81938-8dd6-48fc-9f24-b567992f3ecb	market for rye grain	rye grain	GLO	kg

	activityId	activityName	ISIC	geography
index
0b639971-3ed2-469e-b33e-a152fe63f488_f467c4d0-ea1c-4ae3-8d69-712598a0478a	0b639971-3ed2-469e-b33e-a152fe63f488	barley production, organic	0111:Growing of cereals (except rice), legumin...	CH