Import data from Catalysishub.org for CatMAP

In this tutorial we will:

-   Download a set of formation energies from a publication and load them into a CatMAP EnergyLandscape object.
-   Create an ASE-db sqlite3 file containing the corresponding atomic structures.

In [1]:
# Import modules.
import os
import ase.db
from ase.visualize import view

from catmap.api.cathub import CatalysisHub

Download formation energies.

First we need to write a query for a publication.


In [2]:
# Instantiate cathub interface object.
cathub = CatalysisHub()

# GraphQL search string for the publications table.
publication = "WO3 surface doping for OER to be published"

Formation energies as used by CatMAP are simply reaction energies with respect to a fixed set of references. To download the relevant set of formation energies, you therefore only need to query for reactions that start from the relevant gas-phase references.
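As a rough illustration of this definition, the sketch below computes the formation energy of adsorbed OH relative to H2O and H2 gas. All total energies are made-up placeholder numbers, and the function name `oh_formation_energy` is hypothetical; it is not part of CatMAP or of the Catalysis Hub interface.

```python
# Hypothetical total energies in eV (placeholders, not real data).
e_slab = -500.0      # clean surface
e_slab_oh = -510.0   # surface with adsorbed OH
e_h2o_gas = -14.0    # H2O gas-phase reference
e_h2_gas = -7.0      # H2 gas-phase reference


def oh_formation_energy(e_slab_oh, e_slab, e_h2o, e_h2):
    """Reaction energy of  * + H2O(g) -> OH* + 1/2 H2(g)."""
    return e_slab_oh - e_slab - (e_h2o - 0.5 * e_h2)


e_form = oh_formation_energy(e_slab_oh, e_slab, e_h2o_gas, e_h2_gas)
print(round(e_form, 3))
```

With the references fixed in this way, every adsorbate energy is expressed on the same scale, which is why a query over reactions from H2gas and H2Ogas is sufficient.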


In [3]:
# Choose your references.
references = ['H2gas', 'H2Ogas']

# Fetch energies and create an EnergyLandscape object.
energy_landscape = cathub.publication_energy_landscape(publication, references, site_specific=True, limit=10)

We have now retrieved the reactions, and their reaction energies are attached to a catmap.api.energy_landscape object as formation energies.


In [4]:
# Take a peek.
energy_landscape.formation_energies


Out[4]:
{'1_OH_FeW15HO48_phase_surf_1_cell_B': 2.12481465,
 '1_OH_HfW15O47_phase_surf_1_cell_A': 1.29929107,
 '1_OH_MoW15O48_phase_surf_1_cell_B': 2.15319344,
 '1_OH_W15ZrO47_phase_surf_1_cell_B': 2.24930634,
 '1_OOH_TaW15O47_phase_surf_1_cell_A': 3.12919949,
 '1_O_CrW15O47_phase_surf_1_cell_A': 1.64600583,
 '1_O_CrW15O47_phase_surf_1_cell_B': 3.43933557,
 '1_O_NbW15O48_phase_surf_1_cell_B': 4.8276208,
 '1_O_VW15O48_phase_surf_1_cell_A': 2.98979267,
 '1_O_W15ZrO48_phase_surf_1_cell_A': 3.64217135}
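The keys in this dictionary appear to follow the pattern site, species, surface, followed by further labels, all joined by underscores. Assuming that pattern holds, a minimal sketch for grouping the energies by adsorbate could look like this. The dictionary below is a hand-copied subset of the output above; in the notebook one would use energy_landscape.formation_energies instead.

```python
from collections import defaultdict

# A few entries copied from the output above.
formation_energies = {
    '1_OH_FeW15HO48_phase_surf_1_cell_B': 2.12481465,
    '1_O_CrW15O47_phase_surf_1_cell_A': 1.64600583,
    '1_O_CrW15O47_phase_surf_1_cell_B': 3.43933557,
}

by_species = defaultdict(list)
for key, energy in formation_energies.items():
    # Assumed key layout: <site>_<species>_<surface>_<remaining labels>.
    site, species, surface, _rest = key.split('_', 3)
    by_species[species].append((surface, energy))

for species, entries in sorted(by_species.items()):
    print(species, entries)
```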

Finally, as usual, we export a CatMAP input file.


In [9]:
fname = 'my_energies.txt'
energy_landscape.make_input_file(fname)


Formation energies exported to my_energies.txt

In [11]:
# Take a peek at the file.
with open(fname) as fp:
    for line in fp.readlines()[:5]:
        print(line)


surface_name	phase	site_name	species_name	formation_energy	frequencies	reference	coverage	std

CrW15O47	phase	1	O	1.646	[]	UmVhY3Rpb246MzQ1Mzk=	0.0	nan

CrW15O47	phase	1	O	3.4393	[]	UmVhY3Rpb246MzQ1Mzg=	0.0	nan

FeW15HO48	phase	1	OH	2.1248	[]	UmVhY3Rpb246MzQ1NzA=	0.0	nan

HfW15O47	phase	1	OH	1.2993	[]	UmVhY3Rpb246MzQ2MjM=	0.0	nan

Notice that the reference column contains catalysis-hub ids, which link each energy to the corresponding atomic structure.
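The exported file looks tab-separated, and the ids in the reference column look like base64-encoded strings. Under those two assumptions, the sketch below parses two rows copied from the output above with the standard csv module and decodes their ids. The inline sample string stands in for my_energies.txt so that the example runs on its own.

```python
import base64
import csv
import io

# Header and two rows copied from the file preview above.
sample = (
    "surface_name\tphase\tsite_name\tspecies_name\tformation_energy\t"
    "frequencies\treference\tcoverage\tstd\n"
    "CrW15O47\tphase\t1\tO\t1.646\t[]\tUmVhY3Rpb246MzQ1Mzk=\t0.0\tnan\n"
    "HfW15O47\tphase\t1\tOH\t1.2993\t[]\tUmVhY3Rpb246MzQ2MjM=\t0.0\tnan\n"
)

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

# Decode each base64 id into readable text.
decoded_refs = [base64.b64decode(r["reference"]).decode("utf-8") for r in rows]

for row, ref in zip(rows, decoded_refs):
    print(row["surface_name"], row["species_name"], row["formation_energy"], ref)
```

To read the real file, replace io.StringIO(sample) with an open file handle on fname.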

Atomic structures.

Next, we will retrieve atomic structures from the publication.


In [12]:
# Return a list of atoms objects.
images = cathub.get_publication_atoms(publication, limit=10)

# This may take a while, because one GraphQL query is sent per atoms object.


100%|██████████| 10/10 [00:07<00:00,  1.34it/s]

Finally, we can save them to an ASE database, keeping the catalysis-hub ids so that the structures can be connected with the energy data file.


In [13]:
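# Save them to an ASE-db file.
# Remove any database left over from a previous run to avoid duplicate rows.
if os.path.exists('my_asedb.db'):
    os.remove('my_asedb.db')
c = ase.db.connect('my_asedb.db')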

for atoms in images:
    c.write(atoms, key_value_pairs=atoms.info['key_value_pairs'])

In [ ]:
# Alternative approach (unfinished).
# Note: the lines below only reference the method; they do not call it.

pubid = cathub.get_publication_uids
unique_ids = cathub.get_publication_uids