Loading, saving and exporting data

Pymrio includes several functions for reading and storing data. This section presents the methods for saving and loading data that is already in a pymrio-compatible format. For parsing raw MRIO data, see the tutorials for working with the available MRIO databases.

Here, we use the included small test MRIO system to highlight the different functions. The same functions are available for any MRIO loaded into pymrio. Expect, however, significantly decreased performance due to the size of real MRIO systems.


In [1]:
import pymrio
import os
io = pymrio.load_test().calc_all()

Basic save and read

To save the full system, use:


In [2]:
save_folder_full = '/tmp/testmrio/full'
io.save_all(path=save_folder_full)


Out[2]:
<pymrio.core.mriosystem.IOSystem at 0x7fee1712dc10>

To read the system back from that folder, use:


In [3]:
io_read = pymrio.load_all(path=save_folder_full)

The file I/O activities are recorded in the history field of the included metadata:


In [4]:
io_read.meta


Out[4]:
Description: test mrio for pymrio
MRIO Name: testmrio
System: pxp
Version: v1
File: /tmp/testmrio/full/metadata.json
History:
20191007 13:24:05 - FILEIO -  Added satellite account from /tmp/testmrio/full/factor_inputs
20191007 13:24:05 - FILEIO -  Added satellite account from /tmp/testmrio/full/emissions
20191007 13:24:05 - FILEIO -  Loaded IO system from /tmp/testmrio/full
20191007 13:24:05 - FILEIO -  Saved testmrio to /tmp/testmrio/full
20191007 13:24:05 - MODIFICATION -  Calculating accounts for extension emissions
20191007 13:24:05 - MODIFICATION -  Calculating accounts for extension factor_inputs
20191007 13:24:05 - MODIFICATION -  Leontief matrix L calculated
20191007 13:24:05 - MODIFICATION -  Coefficient matrix A calculated
20191007 13:24:05 - MODIFICATION -  Industry output x calculated
20191007 13:24:05 - FILEIO -  Load test_mrio from /home/konstans/proj/pymrio/pymrio/mrio_models/test_mrio
 ... (more lines in history)

Storage format

Internally, pymrio stores data in csv format, with the 'economic core' data in the root and each satellite account in a subfolder. Metadata as well as a file describing the data format ('file_parameters.json') are included in each folder.


In [5]:
import os
os.listdir(save_folder_full)


Out[5]:
['emissions',
 'factor_inputs',
 'metadata.json',
 'file_parameters.json',
 'population.txt',
 'unit.txt',
 'L.txt',
 'A.txt',
 'x.txt',
 'Y.txt',
 'Z.txt']
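To see the root and the subfolders in one go, the folder can be walked recursively. The following is a minimal sketch using only the standard library; the helper name describe_layout is hypothetical, and the placeholder folder with empty files only imitates part of the listing above so that the example runs without pymrio.

```python
import os
import tempfile


def describe_layout(root):
    """Return a mapping of each folder (relative to root) to its sorted file names."""
    layout = {}
    for current, _subdirs, files in os.walk(root):
        layout[os.path.relpath(current, root)] = sorted(files)
    return layout


# Placeholder folder imitating the listing above; all files are empty stand-ins.
root = tempfile.mkdtemp()
for name in ["metadata.json", "file_parameters.json", "Z.txt", "Y.txt"]:
    open(os.path.join(root, name), "w").close()
for extension in ["emissions", "factor_inputs"]:
    os.makedirs(os.path.join(root, extension))
    open(os.path.join(root, extension, "file_parameters.json"), "w").close()

layout = describe_layout(root)
for folder, files in sorted(layout.items()):
    print(folder, files)
```

For a system saved as above, describe_layout(save_folder_full) would list the files of the economic core under '.' and those of each satellite account under its subfolder name.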

The file format for storing the MRIO data can be switched to a binary pickle format with:


In [6]:
save_folder_bin = '/tmp/testmrio/binary'
io.save_all(path=save_folder_bin, table_format='pkl')
os.listdir(save_folder_bin)


Out[6]:
['emissions',
 'factor_inputs',
 'metadata.json',
 'file_parameters.json',
 'population.pkl',
 'unit.pkl',
 'L.pkl',
 'A.pkl',
 'x.pkl',
 'Y.pkl',
 'Z.pkl']

This can be used to reduce the storage space required on the disk for large MRIO databases.
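Whether the binary format actually saves space for a given database can be checked by comparing the size of both folders. The sketch below sums the file sizes under a folder with the standard library; folder_size is a hypothetical helper, not part of pymrio, and the demo folder contains placeholder files only.

```python
import os
import tempfile


def folder_size(path):
    """Sum the size in bytes of all files below path, including subfolders."""
    total = 0
    for current, _subdirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(current, name))
    return total


# Small demo folder with one file in the root and one in a subfolder.
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "emissions"))
with open(os.path.join(demo, "Z.txt"), "w") as handle:
    handle.write("0123456789")  # 10 bytes
with open(os.path.join(demo, "emissions", "placeholder.txt"), "w") as handle:
    handle.write("01234")  # 5 bytes

demo_size = folder_size(demo)
print(demo_size)
```

With the folders from the cells above, comparing folder_size(save_folder_full) with folder_size(save_folder_bin) shows the difference for the test system.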

Archiving MRIO databases

To archive an MRIO system after saving, use pymrio.archive:


In [7]:
mrio_arc = '/tmp/testmrio/archive.zip'

# Remove a potentially existing archive from before
try:
    os.remove(mrio_arc)
except FileNotFoundError:
    pass
    
pymrio.archive(source=save_folder_full, archive=mrio_arc)
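The try/except block for removing an old archive recurs in the cells of this section. As a possible shortcut, the same effect can be had with pathlib; this is a sketch assuming Python 3.8 or newer (for the missing_ok argument), and remove_if_exists is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def remove_if_exists(path):
    """Delete a file and do nothing if it is already absent (Python 3.8+)."""
    Path(path).unlink(missing_ok=True)


# Demo with a throwaway file standing in for an old archive.
stale = os.path.join(tempfile.mkdtemp(), "archive.zip")
Path(stale).touch()
remove_if_exists(stale)  # removes the file
remove_if_exists(stale)  # second call is a no-op, no exception raised
still_there = os.path.exists(stale)
```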

Data can be read directly from such an archive by:


In [8]:
tt = pymrio.load_all(mrio_arc)

Currently, data cannot be saved directly into a zip archive. It is, however, possible to remove the source files after archiving:


In [9]:
tmp_save = '/tmp/testmrio/tmp'

# Remove a potentially existing archive from before
try:
    os.remove(mrio_arc)
except FileNotFoundError:
    pass

io.save_all(tmp_save)

print("Directories before archiving: {}".format(os.listdir('/tmp/testmrio')))
pymrio.archive(source=tmp_save, archive=mrio_arc, remove_source=True)
print("Directories after archiving: {}".format(os.listdir('/tmp/testmrio')))


Directories before archiving: ['tmp', 'emission_footprints.xlsx', 'factor_inputs', 'emissions', 'Z.txt', 'Y.txt', 'x.txt', 'A.txt', 'L.txt', 'unit.txt', 'population.txt', 'file_parameters.json', 'metadata.json', 'binary', 'full']
Directories after archiving: ['archive.zip', 'emission_footprints.xlsx', 'factor_inputs', 'emissions', 'Z.txt', 'Y.txt', 'x.txt', 'A.txt', 'L.txt', 'unit.txt', 'population.txt', 'file_parameters.json', 'metadata.json', 'binary', 'full']

Several MRIO databases can be stored in the same archive:


In [10]:
# Remove a potentially existing archive from before
try:
    os.remove(mrio_arc)
except FileNotFoundError:
    pass

tmp_save = '/tmp/testmrio/tmp'

io.save_all(tmp_save)
pymrio.archive(source=tmp_save, archive=mrio_arc, path_in_arc='version1/', remove_source=True)
io2 = io.copy()
del io2.emissions
io2.save_all(tmp_save)
pymrio.archive(source=tmp_save, archive=mrio_arc, path_in_arc='version2/', remove_source=True)

When loading from an archive which includes multiple MRIO databases, specify one with the parameter 'path_in_arc':


In [11]:
io1_load = pymrio.load_all(mrio_arc, path_in_arc='version1/')
io2_load = pymrio.load_all(mrio_arc, path_in_arc='version2/')

print("Extensions of the loaded io1 {ver1} and of io2: {ver2}".format(
     ver1=list(io1_load.get_extensions()),
     ver2=list(io2_load.get_extensions())))


Extensions of the loaded io1 ['factor_inputs', 'emissions'] and of io2: ['factor_inputs']
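If it is not known which databases an archive contains, the candidate values for 'path_in_arc' can be read from the archive itself. The following sketch uses the standard zipfile module; top_level_entries is a hypothetical helper, and the demo archive is filled with empty placeholder members that only imitate the 'version1/' and 'version2/' layout used above.

```python
import os
import tempfile
import zipfile


def top_level_entries(archive_path):
    """Return the sorted first path components of all members of a zip archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})


# Demo archive with placeholder members in two top-level folders.
demo_archive = os.path.join(tempfile.mkdtemp(), "demo.zip")
with zipfile.ZipFile(demo_archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("version1/metadata.json", "{}")
    zf.writestr("version1/emissions/file_parameters.json", "{}")
    zf.writestr("version2/metadata.json", "{}")

entries = top_level_entries(demo_archive)
print(entries)
```

Applied to the archive created above, top_level_entries(mrio_arc) would return the folder names that can be passed as 'path_in_arc'.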

The pymrio.load function can be used directly to load only a specific satellite account of an MRIO database from a zip archive:


In [12]:
emissions = pymrio.load(mrio_arc, path_in_arc='version1/emissions')
print(emissions)


Extension Emissions with parameters: name, F, F_Y, S, S_Y, M, D_cba, D_pba, D_imp, D_exp, unit, D_cba_reg, D_pba_reg, D_imp_reg, D_exp_reg, D_cba_cap, D_pba_cap, D_imp_cap, D_exp_cap

The archive function is a wrapper around Python's zipfile module. There are, however, some differences from the defaults chosen in the original:

  • In contrast to zipfile.write, pymrio.archive raises an error if an entry with the same path and filename already exists in the zip archive. Background: the zip standard allows files with the same name and path to be stored side by side in a zip file. This becomes an issue when unpacking these files, as they overwrite each other upon extraction.

  • The default for the parameter 'compression' is set to ZIP_DEFLATED. This differs from the zipfile default (ZIP_STORED), which does not compress the data. See the zipfile docs for further information. Depending on the value given for the parameter 'compression', additional modules might be necessary (e.g. zlib for ZIP_DEFLATED). Further information on this can also be found in the zipfile Python docs.
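Both points can be illustrated with the zipfile module alone. The sketch below first shows that plain zipfile accepts the same member name twice (it only emits a warning), then a guarded variant that refuses duplicates and compresses with ZIP_DEFLATED. The helper add_unique is hypothetical and only mimics the behaviour described above; it is not pymrio's actual implementation.

```python
import os
import tempfile
import warnings
import zipfile


def add_unique(zip_path, source_file, arcname):
    """Append source_file to zip_path, refusing duplicate member names."""
    with zipfile.ZipFile(zip_path, mode="a", compression=zipfile.ZIP_DEFLATED) as zf:
        if arcname in zf.namelist():
            raise FileExistsError(f"{arcname} already exists in {zip_path}")
        zf.write(source_file, arcname=arcname)


workdir = tempfile.mkdtemp()
source_file = os.path.join(workdir, "Z.txt")
with open(source_file, "w") as handle:
    handle.write("dummy table content\n")

# Plain zipfile: the same member name can be written twice (only a warning).
plain_zip = os.path.join(workdir, "plain.zip")
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    with zipfile.ZipFile(plain_zip, mode="w") as zf:
        zf.write(source_file, arcname="Z.txt")
        zf.write(source_file, arcname="Z.txt")
with zipfile.ZipFile(plain_zip) as zf:
    duplicate_count = zf.namelist().count("Z.txt")

# Guarded variant: the second write of the same name is rejected.
guarded_zip = os.path.join(workdir, "guarded.zip")
add_unique(guarded_zip, source_file, "Z.txt")
try:
    add_unique(guarded_zip, source_file, "Z.txt")
    second_write_rejected = False
except FileExistsError:
    second_write_rejected = True
```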

Storing or exporting a specific table or extension

Each extension of the MRIO system can be stored separately with:


In [13]:
save_folder_em = '/tmp/testmrio/emissions'

In [14]:
io.emissions.save(path=save_folder_em)


Out[14]:
<pymrio.core.mriosystem.Extension at 0x7fee1712d710>

This can then be loaded again as a separate satellite account:


In [15]:
emissions = pymrio.load(save_folder_em)

In [16]:
emissions


Out[16]:
<pymrio.core.mriosystem.Extension at 0x7fee0e421e90>

In [17]:
emissions.D_cba


Out[17]:
region reg1 reg2 ... reg5 reg6
sector food mining manufactoring electricity construction trade transport other food mining ... transport other food mining manufactoring electricity construction trade transport other
stressor compartment
emission_type1 air 2.056183e+06 179423.535893 9.749300e+07 1.188759e+07 3.342906e+06 3.885884e+06 1.075027e+07 1.582152e+07 1.793338e+06 19145.604911 ... 4.209505e+07 1.138661e+07 1.517235e+07 1.345318e+06 7.145075e+07 3.683167e+07 1.836696e+06 4.241568e+07 4.805409e+07 3.602298e+07
emission_type2 water 2.423103e+05 25278.192086 1.671240e+07 1.371303e+05 3.468292e+05 7.766205e+05 4.999628e+05 8.480505e+06 2.136528e+05 3733.601474 ... 4.243738e+06 7.307208e+06 4.420574e+06 5.372216e+05 1.068144e+07 5.728136e+05 9.069515e+05 5.449044e+07 8.836484e+06 4.634899e+07

2 rows × 48 columns

As all data in pymrio is stored as pandas DataFrames, the full pandas stack for exporting tables is available. For example, to export a table as an Excel sheet, use:


In [18]:
io.emissions.D_cba.to_excel('/tmp/testmrio/emission_footprints.xlsx')

For further information see the pandas documentation on import/export.
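Tables such as D_cba carry a MultiIndex on both rows and columns, which needs some care when a text export is read back. The sketch below writes a small stand-in table to tab-separated text and restores it with pandas.read_csv; the table, its values and the variable names are made up for illustration and are not taken from the test MRIO.

```python
import io

import pandas as pd

# Stand-in table with the same index layout as the footprint table above.
columns = pd.MultiIndex.from_product(
    [["reg1", "reg2"], ["food", "mining"]], names=["region", "sector"]
)
index = pd.MultiIndex.from_tuples(
    [("emission_type1", "air"), ("emission_type2", "water")],
    names=["stressor", "compartment"],
)
table = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], index=index, columns=columns
)

# Write to an in-memory buffer instead of a file to keep the example self-contained.
buffer = io.StringIO()
table.to_csv(buffer, sep="\t")
buffer.seek(0)

# Two header rows and two index columns must be declared when reading back.
restored = pd.read_csv(buffer, sep="\t", header=[0, 1], index_col=[0, 1])
print(restored)
```

The header and index_col arguments tell pandas how many levels to expect; without them the index levels would be read as ordinary data.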