We want to recast Ecoinvent-cutoff 3.1 in a matrix representation. We specify the location of the ecospold unit-process files, give our extraction project a name, and point to a directory in which to save the logs and results.
Throughout the data extraction, we want to change the sign convention for waste flows, representing the supply of waste treatment as a positive output. Also, as we have no need to distinguish between values of magnitude 0 and situations where we simply have no data, we replace all Not-a-Number entries with 0.0. We pass these choices as defining parameters of our parser when we initialize it.
When we initialize the parser, it records all project-specific and default options in a log file (see the log in pink below).
In [4]:
import ecospold2matrix as e2m
ecospold_dir = '/home/bill/Version_3.1_cutoff_ecoSpold02/'
project_name = 'demo'
out_dir = '/home/bill/data/eco_matrices'
parser = e2m.Ecospold2Matrix(ecospold_dir, project_name, out_dir, positive_waste=True, nan2null=True)
We want to recast the Ecoinvent dataset as a Leontief technology coefficient matrix with environmental extensions. We therefore call parser.ecospold_to_Leontief().
In addition to the normalized coefficient tables, we also want scaled-up flow tables, with absolute intermediate and elementary flows that match the production volumes recorded as metadata in the unit processes.
Since we do not specify a file format, the parser will save the results in all known formats: pandas dataframes, sparse pandas dataframes, sparse MATLAB and scipy matrices, and CSV files. Again, the parser logs all relevant operations.
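For reference, the fileformats argument can presumably restrict the export to a subset of formats; 'Pandas' is the only format label confirmed later in this demo (in the SUT example), so the call below is a sketch rather than a definitive usage:

# A sketch: restrict output to pandas dataframes only.
# 'Pandas' is the format label used in the SUT example further down;
# labels for the other formats would need to be checked in the e2m documentation.
parser.ecospold_to_Leontief(fileformats=['Pandas'], with_absolute_flows=True)

Here, though, we keep the default and let the parser export everything.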
In [5]:
parser.ecospold_to_Leontief(with_absolute_flows=True)
The recasting of intermediate flows as symmetric matrices was successful. The parser did not encounter any inconsistencies in the data that would have required "patching up". The log file records the hashes of key files, which allows future users of the data to check that the files have not been modified or corrupted.
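For example, a user could recompute a file's checksum with Python's standard library and compare it with the value recorded in the log; the hash algorithm and file name below are assumptions for illustration:

import hashlib

def file_hash(path, algo='sha1'):
    """Hash a file in chunks and return its hex digest."""
    h = hashlib.new(algo)
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical file name; compare the digest against the one in the e2m log
print(file_hash('/home/bill/data/eco_matrices/demoPandas_symmNorm.pickle'))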
If we look in the output directory, we see that all Python formats are saved as "pickle files", whereas sparse matrices are also recorded as .mat files.
In [7]:
ls '/home/bill/data/eco_matrices'
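These .mat files can be opened directly in MATLAB, or read back into Python with scipy; the file name below is hypothetical:

from scipy import io as sio

# Hypothetical file name; list the variables stored in the .mat export
mat = sio.loadmat('/home/bill/data/eco_matrices/demoSparseMatrix.mat')
print(sorted(k for k in mat if not k.startswith('__')))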
CSV files are in their own directory, with one file for each variable: A.csv and Z.csv hold normalized and scaled-up intermediate exchanges, and PRO.csv holds the process descriptions, which serve as row/column labels for these matrices. Similarly, F.csv and G_pro.csv record normalized and scaled-up elementary flows by the different processes, with stressor descriptions (STR) serving as row labels.
In [8]:
ls '/home/bill/data/eco_matrices/csv'
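To rebuild labeled matrices from these CSV files with pandas, we can use PRO.csv as the source of row/column labels; this sketch assumes the first column of each file holds the identifiers:

import pandas as pd

csv_dir = '/home/bill/data/eco_matrices/csv'
# Assumption: the first column of each CSV holds the row identifiers
PRO = pd.read_csv(csv_dir + '/PRO.csv', index_col=0)  # process descriptions
A = pd.read_csv(csv_dir + '/A.csv', index_col=0)      # normalized coefficients

# Human-readable label for the first process row of A
print(PRO.loc[A.index[0], 'activityName'])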
We can also access the matrices straight from the parser. The A-matrix is 11301-by-11301, with one row and one column per process.
In [9]:
parser.A.shape
Out[9]:
(11301, 11301)
Similarly, the F-matrix records the normalized emissions of 3955 elementary flow types by the 11301 processes.
In [10]:
parser.F.shape
Out[10]:
(3955, 11301)
The process labels contain the official IDs, classifications (ISIC, ecospoldCategory), names, geographies, units, etc.
In [11]:
parser.PRO.columns
Out[11]:
We can have a quick look at rows 50 to 59...
In [12]:
parser.PRO.iloc[50:60][['activityName', 'productName', 'geography', 'unitName']]
Out[12]:
Instead of pre-allocated data, we now want to organize unallocated data as supply and use tables (SUT) (see pySUT), in line with typical IO methodology (see pyMRIO).
We create a new project, with a dedicated parser to hold and record its history and our methodological choices.
As this dataset had already been parsed in a previous project, we avoid needlessly re-reading the ecospold files and choose to read pickled intermediate results when available, which greatly speeds up the process.
In [13]:
dataset_dir = '/home/bill/Version_3.0_unallocated_restricted'
sutparser = e2m.Ecospold2Matrix(dataset_dir, 'sutdemo', out_dir, prefer_pickles=True)
We generate the SUT and choose to save it in a pandas dataframe format.
In [14]:
sutparser.ecospold_to_sut(fileformats=['Pandas'])
This generates a Python pickle file named "sutdemoPandas_SUT.pickle".
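To reuse these tables in a later session, the pickle can be loaded back with the standard library; the file's location and the internal structure of the saved object (e.g., a dictionary of dataframes) are assumptions here:

import pickle

# Assuming the SUT pickle sits in the project's output directory
with open('/home/bill/data/eco_matrices/sutdemoPandas_SUT.pickle', 'rb') as f:
    sut = pickle.load(f)
print(type(sut))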
We can also access the supply and use tables straight from the parser. Let's say we are interested in organic barley production in Switzerland...
In [15]:
sutparser.PRO.query("geography == 'CH' and activityName == 'barley production, organic'")[['activityId', 'activityName', 'ISIC', 'geography']]
Out[15]:
We can check its column in the supply table (V), where we see its coproduction of two products (barley and straw).
In [16]:
sutparser.V.loc[:, '0b639971-3ed2-469e-b33e-a152fe63f488'].dropna()
Out[16]:
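The use table can be inspected in the same way to list the inputs consumed by this activity; this sketch assumes the parser exposes a use table U alongside V:

# Inputs to the same activity, read from the use table (assumed attribute U)
sutparser.U.loc[:, '0b639971-3ed2-469e-b33e-a152fe63f488'].dropna()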