Import Agribalyse and link to ecoinvent 3.2 cutoff

Agribalyse is a French LCI database of agricultural products. It builds on top of ecoinvent 3.2. It was exported from SimaPro, so the names of ecoinvent processes are mangled, and need to be fixed back to standard ecoinvent.

This notebook uses Agribalyse 1.3.


In [1]:
from brightway2 import *

Create a new project for this notebook


In [2]:
projects.set_current("Agribalyse 1.3")

Add biosphere flows

Biosphere flow names follow the standard in ecoinvent 3.3. We will need to match these names to those in Agribalyse.


In [3]:
bw2setup()


Biosphere database already present!!! No setup is needed

Import ecoinvent 3.2 cutoff as background database


In [4]:
path = '/d/documents/eaternity/data/3.3/datasets'
importer = SingleOutputEcospold2Importer(path, "ecoinvent 3.2 cutoff")
importer.apply_strategies()
importer.statistics()


Out[4]:
(13831, 501148, 0)
13831 datasets
501148 exchanges
0 unlinked exchanges
  
Applied 13 strategies in 5.75 seconds
Applying strategy: nuncertainty
Applying strategy: delete_ghost_exchanges
Applying strategy: delete_exchanges_missing_activity
Applying strategy: link_internal_technosphere_by_composite_code
Applying strategy: link_biosphere_by_flow_uuid
Applying strategy: drop_unspecified_subcategories
Applying strategy: assign_single_product_as_activity
Applying strategy: create_composite_code
Applying strategy: remove_unnamed_parameters
Applying strategy: es2_assign_only_product_with_amount_as_reference_product
Applying strategy: remove_zero_amount_inputs_with_no_activity
Applying strategy: remove_zero_amount_coproducts
Extracted 13831 datasets in 68.04 seconds
Applying strategy: normalize_units
Extracting XML data from 13831 datasets

In [6]:
importer.write_database()


Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA: 00:00:00
Total time elapsed: 00:02:42
Title: Writing activities to SQLite3 database:
  Started: 03/15/2017 17:06:53
  Finished: 03/15/2017 17:09:35
  Total time elapsed: 00:02:42
  CPU %: 82.20
  Memory %: 14.52
Created database: ecoinvent 3.2 cutoff
Out[6]:
Brightway2 SQLiteBackend: ecoinvent 3.2 cutoff

Load Agribalyse data

This notebook uses the SimaPro CSV version of Agribalyse.


In [8]:
fpath = '/d/documents/eaternity/data/agribalyse-csv-final_no-links_nov2016v3.csv'
# path = "/Users/cmutel/Documents/LCA Documents/Agribalyse/1.3/Agribalyse CSV FINAL_no links_Nov2016v3.CSV"
ag = SimaProCSVImporter(fpath, "Agribalyse 1.3")


Extracted 1189 unallocated datasets in 12.54 seconds

In [0]:
[type(o) for o in ag]

In [11]:
ag.apply_strategies()
ag.statistics()


---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-11-57e3e692a3d4> in <module>()
      1 
----> 2 ag.apply_strategies()
      3 ag.statistics()

/d/documents/eaternity/bw2-py/envs/bw2/lib/python3.5/site-packages/bw2io/importers/base.py in apply_strategies(self, strategies, verbose)
     70         func_list = self.strategies if strategies is None else strategies
     71         for func in func_list:
---> 72             self.apply_strategy(func, verbose)
     73         if verbose:
     74             print("Applied {} strategies in {:.2f} seconds".format(

/d/documents/eaternity/bw2-py/envs/bw2/lib/python3.5/site-packages/bw2io/importers/base.py in apply_strategy(self, strategy, verbose)
     50             print("Applying strategy: {}".format(func_name))
     51         try:
---> 52             self.data = strategy(self.data)
     53             self.applied_strategies.append(func_name)
     54         except StrategyError as err:

/d/documents/eaternity/bw2-py/envs/bw2/lib/python3.5/site-packages/bw2io/strategies/migrations.py in migrate_datasets(db, migration)
      9 def migrate_datasets(db, migration):
     10     assert migration in migrations, u"Can't find migration {}".format(migration)
---> 11     migration_data = Migration(migration).load()
     12 
     13     to_dict = lambda x: dict(zip(migration_data['fields'], x))

/d/documents/eaternity/bw2-py/envs/bw2/lib/python3.5/site-packages/bw2io/migrations.py in load(self)
     61             self.filename + ".json"
     62         )
---> 63         return JsonWrapper.load(filepath)
     64 
     65 

/d/documents/eaternity/src/bw2data/bw2data/serialization.py in load(self, file)
     45             return anyjson.deserialize(open(file, encoding='utf-8').read())
     46         else:
---> 47             return json.load(open(file, encoding='utf-8'))
     48 
     49     @classmethod

FileNotFoundError: [Errno 2] No such file or directory: '/home/oney/.local/share/Brightway3/Agribalyse-13.a68c51d5c40982bcd21e1aa4ec805302/migrations/default-units.cb1707af55f1f27602990d653c9c3142.json'
Applying strategy: strip_biosphere_exc_locations
Applying strategy: migrate_datasets
Applying strategy: sp_allocate_products
Applying strategy: fix_zero_allocation_products
Applying strategy: split_simapro_name_geo
Applying strategy: normalize_units
Applying strategy: assign_only_product_as_production
Applying strategy: drop_unspecified_subcategories

We start by changing the SimaPro names to what ecoinvent provides, as we are using the ecoinvent-provided version of 3.2. We can then match the unlinked inputs in Agribalyse against the outputs provided by ecoinvent.


In [4]:
ag.migrate('simapro-ecoinvent-3.2')


Applying strategy: migrate_datasets
Applying strategy: migrate_exchanges

In [5]:
ag.match_database("ecoinvent 3.2 cutoff", fields=('reference product', 'location', 'unit', 'name'))


Applying strategy: link_iterable_by_fields

In [3]:
ag.statistics()


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-2a79833a501a> in <module>()
----> 1 ag.statistics()

NameError: name 'ag' is not defined

Let's look at what we can't link. The easiest way is to export the unlinked exchanges to an Excel file.


In [2]:
ag.write_excel(TRUE)


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-41d2eceaab8d> in <module>()
----> 1 ag.write_excel()

NameError: name 'ag' is not defined

You can also download this Excel file.

Add unlinked biosphere flows

Brightway2-IO 0.5.7 includes a lot of fixes to import Agribalyse more cleanly: 2a5406a, 0154841, b2d5545, 99b07b9, 2129804, 5d6b936, and 5d0542a.

However, there are still a large number of biosphere flows not used in ecoinvent that we don't have in our biosphere3 database (and, consequently, in our LCIA methods). The best we can do is to import these flows into another biosphere database. If these flows are important to you, you should do further matching or modify the relevant LCIA methods to include them.


In [8]:
Database("Agribalyse 1.3 new biosphere").register()
ag.add_unlinked_flows_to_biosphere_database("Agribalyse 1.3 new biosphere")


Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA[sec]: 0.751  | ETA[sec]: 0.411  | ETA[sec]: 0.288  | ETA[sec]: 0.224  | ETA[sec]: 0.188  | ETA[sec]: 0.166  | ETA[sec]: 0.145  | ETA[sec]: 0.130  | ETA[sec]: 0.176  | ETA[sec]: 0.155  | ETA[sec]: 0.140  | ETA[sec]: 0.126  | ETA[sec]: 0.114  | ETA[sec]: 0.103  | ETA[sec]: 0.093  | ETA[sec]: 0.083  | ETA[sec]: 0.086  | ETA[sec]: 0.078  | ETA[sec]: 0.069  | ETA[sec]: 0.061  | ETA[sec]: 0.054  | ETA[sec]: 0.047  | ETA[sec]: 0.040  | ETA[sec]: 0.033  | ETA[sec]: 0.029  | ETA[sec]: 0.023  | ETA[sec]: 0.017  | ETA[sec]: 0.011  | ETA[sec]: 0.005  | ETA[sec]: 0.000  | ETA[sec]: 0.000 
Total time elapsed: 0.165 sec
Title: Writing activities to SQLite3 database:
  Started: 01/12/2017 11:19:35
  Finished: 01/12/2017 11:19:36
  Total time elapsed: 0.165 sec
  CPU %: 98.900000
  Memory %: 1.443982
Applying strategy: link_iterable_by_fields

We should now have no unlinked biosphere flows:

Fixing units for passenger cars

For some reason there are some links to market for transport, passenger car, medium size, diesel, EURO 4 and market for transport, passenger car, medium size, petrol, EURO 4 with units of meters (instead of person kilometers, as in ecoinvent). We can create a custom migration to handle these cases:


In [9]:
agribalyse_transport_data = {
    'fields': ['name', 'unit'],
    'data': [
        (
            ('market for transport, passenger car, medium size, diesel, EURO 4', 'meter'), 
            {
                'unit': 'kilometer', 
                'multiplier': 1e-3
            }
        ),
        (
            ('market for transport, passenger car, medium size, petrol, EURO 4', 'meter'), 
            {
                'unit': 'kilometer', 
                'multiplier': 1e-3
            }
        )
    ]
}

Migration("agribalyse-transport").write(
    agribalyse_transport_data,
    description="Fix some transport unit conversions"
)

In [10]:
ag.migrate("agribalyse-transport")


Applying strategy: migrate_datasets
Applying strategy: migrate_exchanges

Link to ecoinvent again, to fix our now linkable transport inputs.


In [11]:
ag.match_database("ecoinvent 3.2 cutoff", fields=('reference product', 'location', 'unit', 'name'))


Applying strategy: link_iterable_by_fields

In [12]:
ag.statistics()


1261 datasets
115259 exchanges
113 unlinked exchanges
  Type technosphere: 31 unique unlinked exchanges
Out[12]:
(1261, 115259, 113)

Add waste treatment activities

There are a number of unlinked waste flows. I am not sure on the best way to handle these, so we will try a very simple approach. We will just add these as new activities, without any other inputs or environmental effects.


In [13]:
ag.add_unlinked_activities()
ag.statistics()


Applying strategy: link_iterable_by_fields
1292 datasets
115259 exchanges
0 unlinked exchanges
  
Out[13]:
(1292, 115259, 0)

Write the modified, fully linked database

We are finished with the importing process.


In [14]:
ag.write_database()


Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA[sec]: 2.333  | ETA[sec]: 4.123  | ETA[sec]: 7.568  | ETA[sec]: 6.914  | ETA[sec]: 7.082  | ETA[sec]: 7.498  | ETA[sec]: 8.129  | ETA[sec]: 7.346  | ETA[sec]: 6.489  | ETA[sec]: 6.087  | ETA[sec]: 5.413  | ETA[sec]: 5.082  | ETA[sec]: 5.265  | ETA[sec]: 4.997  | ETA[sec]: 4.681  | ETA[sec]: 4.413  | ETA[sec]: 4.098  | ETA[sec]: 4.154  | ETA[sec]: 3.783  | ETA[sec]: 3.362  | ETA[sec]: 3.115  | ETA[sec]: 2.671  | ETA[sec]: 2.331  | ETA[sec]: 1.975  | ETA[sec]: 1.623  | ETA[sec]: 1.311  | ETA[sec]: 0.996  | ETA[sec]: 0.656  | ETA[sec]: 0.328  | ETA[sec]: 0.000  | ETA[sec]: 0.000 
Total time elapsed: 9.796 sec
Title: Writing activities to SQLite3 database:
  Started: 01/12/2017 11:19:51
  Finished: 01/12/2017 11:20:01
  Total time elapsed: 9.796 sec
  CPU %: 92.100000
  Memory %: 1.472557
Created database: Agribalyse 1.3
Out[14]:
Brightway2 SQLiteBackend: Agribalyse 1.3

Checking the imported datasets

We need to do some basic validation to make sure we have meaningful results. Here I just do some basic testing, but you should validate against known scores if you are frequently using this database. The following code is rahter simple and is not a real validation check.


In [15]:
gwp = [x for x in methods if "IPCC 2013" in str(x)][0]
gwp


Out[15]:
('IPCC 2013', 'climate change', 'GTP 100a')

In [16]:
db = Database("Agribalyse 1.3")

lca = LCA({db.random(): 1}, gwp)
lca.lci(factorize=True)
lca.lcia()
lca.score


Out[16]:
1.2294694637032393

Let's calculate the LCIA scores of all activities in Agribalyse


In [17]:
import pyprind

scores = []

for act in pyprind.prog_bar(db):
    lca.redo_lcia({act: 1})
    scores.append(lca.score)


0%                          100%
[##############################] | ETA[sec]: 27.829  | ETA[sec]: 26.342  | ETA[sec]: 24.972  | ETA[sec]: 23.653  | ETA[sec]: 22.599  | ETA[sec]: 21.636  | ETA[sec]: 20.578  | ETA[sec]: 19.662  | ETA[sec]: 18.675  | ETA[sec]: 17.669  | ETA[sec]: 16.759  | ETA[sec]: 15.755  | ETA[sec]: 14.785  | ETA[sec]: 13.861  | ETA[sec]: 12.932  | ETA[sec]: 11.970  | ETA[sec]: 11.019  | ETA[sec]: 10.141  | ETA[sec]: 9.261  | ETA[sec]: 8.397  | ETA[sec]: 7.543  | ETA[sec]: 6.673  | ETA[sec]: 5.809  | ETA[sec]: 4.965  | ETA[sec]: 4.141  | ETA[sec]: 3.305  | ETA[sec]: 2.482  | ETA[sec]: 1.653  | ETA[sec]: 0.823  | ETA[sec]: 0.000  | ETA[sec]: 0.000 
Total time elapsed: 24.694 sec

In [18]:
import numpy as np

scores = np.array(scores)

mask = scores == 0
print("Number of activities with zero GWP score: {} of {}".format(mask.sum(), len(db)))


Number of activities with zero GWP score: 63 of 1292

In [19]:
%matplotlib notebook

In [20]:
import seaborn as sns


Warning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.

In [21]:
sns.distplot(np.log(scores[scores > 0]))


Warning: using a non-integer number instead of an integer will result in an error in the future
Warning: using a non-integer number instead of an integer will result in an error in the future
Warning: using a non-integer number instead of an integer will result in an error in the future
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x116f378d0>

Conclusion

We have imported the Agribalyse database. In the process of importing, we found and resolved several problems:

  1. First, we had to fix the names and categories of biosphere flow names, to make them compatible with the names and categories used in ecoinvent version 3.
  2. We linked inputs to activities in ecoinvent 3.2.
  3. Next, we created a new database for the new biosphere flows that we couldn't match.
  4. We added unlinked activities to our database.
  5. We fixed allocated products with zero production values.

This was a bit of a pain, but this is the sad truth of LCA data compatibility - it currently isn't all that great.


In [0]: