Ecospold2Matrix Demo

A typical use:

We want to recast Ecoinvent-cutoff 3.1 in a matrix representation. We specify the location of the ecospold unit-process files, give our extraction project a name, and point to a directory in which to save the logs and results.

Throughout the data extraction, we want to change the sign convention for waste flows, representing the supply of waste treatment as a positive output. Also, as we have no need to distinguish between situations where a value is of magnitude 0 and situations where we simply have no data, we replace all Not-a-Number entries with 0.0. We therefore pass these choices as defining parameters of our parser when we initialize it.

When we initialize the parser, it records all project-specific and default options in a log file (see the log output below).


In [4]:
import ecospold2matrix as e2m

ecospold_dir = '/home/bill/Version_3.1_cutoff_ecoSpold02/'
project_name = 'demo'
out_dir = '/home/bill/data/eco_matrices'

parser = e2m.Ecospold2Matrix(ecospold_dir, project_name, out_dir, positive_waste=True, nan2null=True)


INFO:demo:Ecospold2Matrix Processing
INFO:demo:Current git commit: 5146d4dacfbefba2123ba27ed8e4dd9c009b90a3
INFO:demo:Project name: demo
INFO:demo:Unit process and Master data directory: /home/bill/Version_3.1_cutoff_ecoSpold02/
INFO:demo:Data saved in: /home/bill/data/eco_matrices
INFO:demo:Sign conventions changed to make waste flows positive
INFO:demo:Replace Not-a-Number instances with 0.0 in all matrices
INFO:demo:Pickle intermediate results to files
INFO:demo:Order processes based on: ISIC, activityName
INFO:demo:Order elementary exchanges based on: compartment, subcompartment, name
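
To make the two options concrete, here is a toy illustration of what they mean conceptually. It is not the library's implementation: the 3-by-3 flow matrix, the is_waste mask and the two helper functions are invented for this sketch, and we assume that making waste flows positive amounts to flipping the sign of the rows that belong to waste products.

```python
import numpy as np

# Hypothetical toy flow matrix (rows: products, columns: activities).
# The last row is a waste product recorded with negative signs;
# NaN marks entries for which there is no data.
Z = np.array([[10.0, np.nan, 0.0],
              [2.0, 5.0, np.nan],
              [-1.0, -0.5, -4.0]])
is_waste = np.array([False, False, True])  # assumed mask of waste products


def make_waste_positive(flows, waste_mask):
    """Return a copy of `flows` with the sign of the waste rows flipped."""
    out = flows.copy()
    out[waste_mask, :] *= -1
    return out


def nan_to_zero(flows):
    """Return a copy of `flows` in which every NaN entry is replaced by 0.0."""
    return np.where(np.isnan(flows), 0.0, flows)


Z_clean = nan_to_zero(make_waste_positive(Z, is_waste))
print(Z_clean)
```

After the second step a missing value and a true zero can no longer be told apart, which is why this option only makes sense when that distinction is not needed.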

We want to recast the Ecoinvent dataset as a Leontief technology coefficient matrix with environmental extensions. We therefore call parser.ecospold_to_Leontief().

In addition to the normalized coefficient tables, we also want scaled-up flow tables, with absolute intermediate and elementary flows that match the production volumes recorded as metadata in the unit processes.
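
The relation between the two kinds of tables can be sketched with a toy example. We assume here that the scaled-up flows are simply the coefficients multiplied, column by column, by each process's production volume. The numbers and the vector x of production volumes are invented for illustration; the names A, Z, F and G_pro follow the CSV file names that the parser writes.

```python
import numpy as np

# Hypothetical normalized tables for two processes and one elementary flow.
A = np.array([[0.0, 0.2],
              [0.5, 0.0]])   # intermediate inputs per unit of output
F = np.array([[0.1, 0.3]])   # elementary flows per unit of output
x = np.array([100.0, 40.0])  # assumed production volume of each process

# Scaling up: broadcasting multiplies each process column by its volume.
Z = A * x       # absolute intermediate flows
G_pro = F * x   # absolute elementary flows

# Dividing each column by the production volume recovers the coefficients.
A_back = Z / x

print(Z)
print(G_pro)
```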

As no specific file format is specified, the parser will save the results in all known formats: pandas dataframes, pandas sparse dataframes, sparse MATLAB and scipy matrices, and CSV files. Again, the parser logs all relevant operations.


In [5]:
parser.ecospold_to_Leontief(with_absolute_flows=True)


INFO:demo:Products extracted from IntermediateExchanges.xml with SHA-1 of ca2c05c4dff035265fc44c53c7b534a3a711ff70
WARNING:demo:Removed 176 duplicate rows from activity_list, see duplicate_activity_list.csv.
INFO:demo:Activities extracted from ActivityIndex.xml with SHA-1 of c579d38fb6fa4a52ec4e09e5b04b873df77ce4c9
INFO:demo:Processing 11301 files in /home/bill/Version_3.1_cutoff_ecoSpold02/datasets
INFO:demo:Flows saved in /home/bill/Version_3.1_cutoff_ecoSpold02/flows.pickle with SHA-1 of d9c64122d1e866354bd5b5d6b410deb5fa915116
INFO:demo:Processing 11301 files - this may take a while ...
INFO:demo:Elementary flows extracted from ElementaryExchanges.xml with SHA-1 of 8a3a0a95e8a023950f42704eebc248014164166c
INFO:demo:Labels saved in /home/bill/Version_3.1_cutoff_ecoSpold02/rawlabels.pickle with SHA-1 of aac4e897f807a0b5740b83a02b2df18239560301
INFO:demo:OK.   No untraceable flows.
INFO:demo:OK. Source activities seem in order. Each product traceable to an activity that actually does produce or distribute this product.
INFO:demo:Final, symmetric, normalized matrices saved in /home/bill/data/eco_matrices/demoPandas_symmNorm.pickle with SHA-1 of f88cd7625c179a9b5b9fd723a9ea10270eb01661
INFO:demo:Final, symmetric, scaled-up flow matrices saved in /home/bill/data/eco_matrices/demoPandas_symmScale.pickle with SHA-1 of 59c10896a3f767486a502caee3bccbed6e1939a1
INFO:demo:Final, symmetric, normalized matrices saved in /home/bill/data/eco_matrices/demoSparsePandas_symmNorm.pickle with SHA-1 of 0b28d60b10f741a5003a502bc907a04b86b12cc8
INFO:demo:Final, symmetric, scaled-up flow matrices saved in /home/bill/data/eco_matrices/demoSparsePandas_symmScale.pickle with SHA-1 of cd5ac48bf293a5473d11a9aff29b5080170776ae
INFO:demo:Final, symmetric, normalized matrices saved in /home/bill/data/eco_matrices/demoSparseMatrix_symmNorm.pickle with SHA-1 of 3bb02d845e47f4df774defed750d81b42d616116
INFO:demo:Final, symmetric, normalized matrices saved in /home/bill/data/eco_matrices/demoSparseMatrix_symmNorm.mat with SHA-1 of 129f156563d3d33e764589e6a9833e38b5ae3e9b
INFO:demo:Final, symmetric, scaled-up flow matrices saved in /home/bill/data/eco_matrices/demoSparseMatrix_symmScale.pickle with SHA-1 of 4b8269d07aab24720790feb3fcfac5be9cb351b6
INFO:demo:Final, symmetric, scaled-up flow matrices saved in /home/bill/data/eco_matrices/demoSparseMatrix_symmScale.mat with SHA-1 of 329ac81318b6cc57c6d4df7b76dccbc482386074
INFO:demo:Final matrices saved as CSV files
INFO:demo:Done running ecospold2matrix.ecospold_to_Leontief

The recasting of intermediate flows as symmetric matrices was successful. The parser did not encounter any inconsistencies in the data that would have required "patching up". The log file records the hash of key files, which allows future users of the data to check that files have not been modified or corrupted.
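
Such a check can be done with the standard library alone. The sketch below assumes that the logged value is the SHA-1 of the raw file bytes; sha1_of_file and matches_log are hypothetical helper names, not part of ecospold2matrix.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_log(path, logged_hash):
    """Return True if the file's SHA-1 equals the hash copied from the log."""
    return sha1_of_file(path) == logged_hash.strip().lower()
```

For instance, matches_log('/home/bill/data/eco_matrices/demoPandas_symmNorm.pickle', 'f88cd7625c179a9b5b9fd723a9ea10270eb01661') would be expected to return True as long as the file is unchanged since the run logged above.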

If we look in the output directory, we see that all Python formats are saved as pickle files, whereas sparse matrices are also recorded as .mat files.


In [7]:
ls '/home/bill/data/eco_matrices'


csv/                           demoSparseMatrix_symmNorm.pickle
demo_log/                      demoSparseMatrix_symmScale.mat
demoPandas_symmNorm.pickle     demoSparseMatrix_symmScale.pickle
demoPandas_symmScale.pickle    demoSparsePandas_symmNorm.pickle
demoSparseMatrix_symmNorm.mat  demoSparsePandas_symmScale.pickle
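
A saved pickle file can later be read back with the standard pickle module. The sketch below makes no claim about what the parser stores in each file: load_pickle and describe are hypothetical helpers, and describe merely reports the keys if the stored object happens to be a dictionary. Pickled pandas objects generally need a compatible pandas version to load, and pickle files should only be loaded from trusted sources.

```python
import pickle


def load_pickle(path):
    """Load one pickled result file and return the stored object."""
    with open(path, 'rb') as f:
        return pickle.load(f)


def describe(obj):
    """List the keys of a dictionary, or fall back to the type name."""
    if isinstance(obj, dict):
        return sorted(obj.keys())
    return type(obj).__name__
```

A call such as describe(load_pickle('/home/bill/data/eco_matrices/demoPandas_symmNorm.pickle')) is then a quick way to see what a given file contains.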

CSV files are in their own directory, with one file for each variable: A.csv and Z.csv hold normalized and scaled-up intermediate exchanges, and PRO.csv holds the process descriptions, which serve as row/column labels for these matrices. Similarly, F.csv and G_pro.csv record normalized and scaled-up elementary flows by the different processes, with stressor descriptions (STR) serving as row labels.


In [8]:
ls '/home/bill/data/eco_matrices/csv'


A.csv  F.csv  G_pro.csv  PRO.csv  STR.csv  Z.csv

We can also access the matrices straight from the parser. The A-matrix has dimensions of 11301-by-11301 processes.


In [9]:
parser.A.shape


Out[9]:
(11301, 11301)

Similarly, the F-matrix records the normalized emissions of 3955 elementary flow types, emitted by 11301 processes.


In [10]:
parser.F.shape


Out[10]:
(3955, 11301)
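
With these two matrices in hand, the usual Leontief calculation gives the total output x needed to satisfy a final demand y, and the associated elementary flows. The sketch below uses an invented 2-process system rather than the real data, and assumes that A holds inputs per unit of output, so that x solves (I - A) x = y.

```python
import numpy as np

# Hypothetical coefficient tables: two processes, one elementary flow.
A = np.array([[0.0, 0.2],
              [0.5, 0.0]])
F = np.array([[0.1, 0.3]])

# Final demand: one unit of the first process's product.
y = np.array([1.0, 0.0])

# Total output required, direct and indirect: solve (I - A) x = y.
x = np.linalg.solve(np.eye(A.shape[0]) - A, y)

# Life-cycle elementary flows caused by this final demand.
e = F @ x

print(x)
print(e)
```

For the full 11301-by-11301 system, a sparse solver such as scipy.sparse.linalg.spsolve would probably be a better fit than a dense solve.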

The process labels contain the official IDs, classifications (ISIC, EcoSpoldCategory), names, geography, units, etc.


In [11]:
parser.PRO.columns


Out[11]:
Index(['activityId', 'productId', 'activityName', 'ISIC', 'EcoSpoldCategory', 'geography', 'technologyLevel', 'macroEconomicScenario', 'productName', 'unitName', 'activityType', 'startDate', 'endDate'], dtype='object')

We can have a quick look at the rows in positions 50 to 58...


In [12]:
# .ix has been removed from recent pandas versions; .iloc selects the same rows by position
parser.PRO.iloc[50:59][['activityName', 'productName', 'geography', 'unitName']]


Out[12]:
index                                                                      activityName                                       productName                                     geography  unitName
017a00eb-e89a-4453-90f8-249d0d98f28f_c538baa8-11c9-4064-b3e2-9faba21c6a9b  market for maize seed, organic, at farm            maize seed, organic, at farm                    GLO        kg
f3b7e0a5-2cdf-4224-a29f-67e132c8e5d1_0dab73c6-b214-4e9c-8c38-ab49d608637b  market for protein pea                             protein pea                                     GLO        kg
fa7c1736-5313-4e39-8698-c3ec5d55abbb_510a8fef-7075-4da2-9984-8936ba08c89f  market for protein pea, Swiss integrated produ...  protein pea, Swiss integrated production        GLO        kg
ea6ea016-5982-4e64-b3b5-4b57fd3360ef_06affe58-e750-4345-8725-8218d54352f7  market for protein pea, feed, Swiss integrated...  protein pea, feed, Swiss integrated production  GLO        kg
3f13d0c2-10d7-400a-89d1-62bdc2a4e748_fa8fdaec-627a-4055-b2e9-49b238cf166f  market for protein pea, organic                    protein pea, organic                            GLO        kg
f1cdc1be-d757-42d6-ad82-1e28e7e74aa3_cb09bcae-b469-4f41-84a6-cdd1e958e027  market for rape seed                               rape seed                                       GLO        kg
013d2289-d655-430a-9fa2-9230297efae0_44519c79-bf77-4775-a69e-182d26b1f7d5  market for rape seed, Swiss integrated production  rape seed, Swiss integrated production          GLO        kg
98f22fe6-1a57-4eaa-a4e0-d5f4dc3fe4a8_80df7587-0686-45b1-af7c-38ee267c2525  market for rape seed, organic                      rape seed, organic                              GLO        kg
6b20f9ee-95c9-424e-aeba-d1e1b15b7739_edb81938-8dd6-48fc-9f24-b567992f3ecb  market for rye grain                               rye grain                                       GLO        kg
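
These labels are what lets us move between a process description and its row or column in the matrices. The following sketch uses a small invented label table and coefficient matrix. The short index strings are placeholders for the long identifiers shown above, and we assume that the coefficient matrix is a pandas DataFrame carrying the same labels on both axes.

```python
import pandas as pd

# Hypothetical labels standing in for the long activityId_productId strings.
labels = ['act1_prodA', 'act2_prodB', 'act3_prodC']
PRO = pd.DataFrame({'activityName': ['market for rape seed',
                                     'market for rye grain',
                                     'rye production'],
                    'geography': ['GLO', 'GLO', 'CH']},
                   index=labels)
A = pd.DataFrame([[0.0, 0.0, 0.1],
                  [0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]],
                 index=labels, columns=labels)

# Select the labels of all global market activities.
mask = PRO['activityName'].str.startswith('market for') & (PRO['geography'] == 'GLO')
markets = PRO.index[mask]

# Integer position of one process, e.g. for use with a scipy sparse matrix.
pos = PRO.index.get_loc('act2_prodB')

# Non-zero inputs to that process, read from its column of A.
column = A.loc[:, 'act2_prodB']
inputs = column[column != 0]

print(list(markets), pos)
print(inputs)
```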

Working with unallocated data

Instead of pre-allocated data, we want to organize unallocated data as supply and use tables (SUT) (see pySUT), in line with typical IO methodology (see pyMRIO).

We create a new project, with a dedicated parser to hold and record its history and our methodological choices.

As this dataset had already been parsed in a previous project, we avoid re-reading ecospold files for no reason and choose to read pickled intermediate results if available, which greatly speeds up the process.


In [13]:
dataset_dir = '/home/bill/Version_3.0_unallocated_restricted'

sutparser = e2m.Ecospold2Matrix(dataset_dir, 'sutdemo', out_dir, prefer_pickles=True)


INFO:sutdemo:Ecospold2Matrix Processing
INFO:sutdemo:Current git commit: 5146d4dacfbefba2123ba27ed8e4dd9c009b90a3
INFO:sutdemo:Project name: sutdemo
INFO:sutdemo:Unit process and Master data directory: /home/bill/Version_3.0_unallocated_restricted
INFO:sutdemo:Data saved in: /home/bill/data/eco_matrices
INFO:sutdemo:When possible, loads pickled data instead of parsing ecospold files
INFO:sutdemo:Pickle intermediate results to files
INFO:sutdemo:Order processes based on: ISIC, activityName
INFO:sutdemo:Order elementary exchanges based on: compartment, subcompartment, name

We generate the SUT and choose to save it in a pandas dataframe format.


In [14]:
sutparser.ecospold_to_sut(fileformats=['Pandas'])


INFO:sutdemo:Products extracted from IntermediateExchanges.xml with SHA-1 of ca2c05c4dff035265fc44c53c7b534a3a711ff70
WARNING:sutdemo:Removed 175 duplicate rows from activity_list, see duplicate_activity_list.csv.
INFO:sutdemo:Activities extracted from ActivityIndex.xml with SHA-1 of 829a2696e66cc57a1f2a636d43e7e3264c6ee2b8
INFO:sutdemo:Flows loaded from /home/bill/Version_3.0_unallocated_restricted/flows.pickle with SHA-1 of da39a3ee5e6b4b0d3255bfef95601890afd80709
INFO:sutdemo:Labels loaded from /home/bill/Version_3.0_unallocated_restricted/rawlabels.pickle with SHA-1 of da39a3ee5e6b4b0d3255bfef95601890afd80709
INFO:sutdemo:Final SUT matrices saved in /home/bill/data/eco_matrices/sutdemoPandas_SUT.pickle with SHA-1 of 918f3d92a287e92f243e3084b6c1d669f7912f28
INFO:sutdemo:Done running ecospold2matrix.ecospold_to_sut

This generates a Python pickle file named "sutdemoPandas_SUT.pickle".

We can also access the supply and use tables straight from the parser. Let's say we are interested in organic barley production in Switzerland...


In [15]:
sutparser.PRO.query("geography == 'CH' and activityName == 'barley production, organic'")[['activityId', 'activityName', 'ISIC', 'geography']]


Out[15]:
index                                                                      activityId                            activityName                ISIC                                               geography
0b639971-3ed2-469e-b33e-a152fe63f488_f467c4d0-ea1c-4ae3-8d69-712598a0478a  0b639971-3ed2-469e-b33e-a152fe63f488  barley production, organic  0111:Growing of cereals (except rice), legumin...  CH

We can check its column in the supply table (V), where we see that it coproduces two products (barley and straw).


In [16]:
# .ix has been removed from recent pandas versions; .loc performs the same label-based selection
sutparser.V.loc[:, '0b639971-3ed2-469e-b33e-a152fe63f488'].dropna()


Out[16]:
productId
f467c4d0-ea1c-4ae3-8d69-712598a0478a    4152.7
692b4f7e-9e79-4f69-b22f-b66f68f2f9cc    2924.2
Name: 0b639971-3ed2-469e-b33e-a152fe63f488, dtype: float64
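
The same kind of selection can be tried on a toy supply table. In the sketch below the product and activity names are invented placeholders, the two barley figures are borrowed from the output above, and we assume, as that output suggests, that V is indexed by product with one column per activity and NaN where an activity does not supply a product.

```python
import numpy as np
import pandas as pd

# Hypothetical supply table: rows are products, columns are activities.
V = pd.DataFrame([[4152.7, np.nan],
                  [2924.2, np.nan],
                  [np.nan, 1.0]],
                 index=pd.Index(['barley', 'straw', 'rye'], name='productId'),
                 columns=['barley_activity', 'rye_activity'])

# Products supplied by one activity: its column, without the empty entries.
coproducts = V.loc[:, 'barley_activity'].dropna()

# Activities that supply more than one product (coproduction).
n_outputs = V.notna().sum(axis=0)
multi_output = n_outputs[n_outputs > 1].index.tolist()

print(coproducts)
print(multi_output)
```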

Stay tuned

This is a newborn project. More demos and features yet to come. Please download, play with it, and join!