Han, Kehang (hkh12@mit.edu)
This notebook demonstrates how to access a centralized database hosted on the RMG server. The database currently contains several large tables. The most comprehensive one, sdata134k_table, contains all 134k molecules. You are welcome to access the other tables as well; they are mostly subsets of sdata134k_table. For example, small_cyclic_table contains all cyclic hydrocarbons with fewer than 3 rings.
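To see which tables are available and how large each one is, a short helper can query the server directly. This is a sketch that assumes pymongo 3.7 or later, which provides `Database.list_collection_names()` and `Collection.count_documents()`. The function name `summarize_tables` is hypothetical. It takes an already-connected client, so any object with the same interface works.

```python
def summarize_tables(client, db_name):
    """Return {collection_name: document_count} for every collection in db_name.

    `client` is assumed to behave like pymongo.MongoClient (pymongo >= 3.7):
    indexing it by name returns a database whose list_collection_names() and
    per-collection count_documents() are evaluated on the server.
    """
    db = client[db_name]
    summary = {}
    for name in sorted(db.list_collection_names()):
        # count_documents({}) counts on the server without downloading documents
        summary[name] = db[name].count_documents({})
    return summary
```

With the connection settings used later in this notebook, a call would look like `summarize_tables(MongoClient('mongodb://user:user@rmg.mit.edu/admin', 27018), 'sdata134k')`.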
In [1]:
from rmgpy.data.rmg import RMGDatabase
from rmgpy import settings
from rmgpy.species import Species
from rmgpy.rmg.main import RMG
from IPython.display import display
import numpy as np
import os
import pandas as pd
from pymongo import MongoClient
import logging
# suppress all logging output (e.g. from RMG) in this notebook
logging.disable(logging.CRITICAL)
In [2]:
def get_data(host, db_name, collection_name, port=27017):
    """Return every document in db_name.collection_name as a list of dicts."""
    # connect to db and query
    client = MongoClient(host, port)
    collection = client[db_name][collection_name]
    db_cursor = collection.find()
    # collect data
    print('reading data...')
    db_mols = list(db_cursor)
    print('done')
    # all documents are in memory now, so the connection can be released
    client.close()
    return db_mols
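`get_data` downloads every document in a collection. When only a few fields or a sample are needed, the filtering can be pushed to the server. You can do this with the `filter` and `projection` arguments of pymongo's `Collection.find()` and with `Cursor.limit()`. The helper below is a hypothetical variant, not part of the original notebook. It takes an already-opened collection such as `MongoClient(host, port)[db_name][collection_name]`.

```python
def get_data_subset(collection, query=None, fields=None, limit=0):
    """Fetch matching documents, optionally keeping only some fields.

    collection: a pymongo Collection (or any object with the same find() API)
    query:      MongoDB filter document; None matches every document
    fields:     iterable of field names to keep; None keeps all fields
    limit:      maximum number of documents; 0 means no limit
    """
    projection = None
    if fields is not None:
        # inclusion projection; _id is returned by default, so exclude it explicitly
        projection = {name: 1 for name in fields}
        projection['_id'] = 0
    cursor = collection.find(query if query is not None else {}, projection)
    if limit:
        cursor = cursor.limit(limit)
    return list(cursor)
```

For example, `get_data_subset(collection, fields=['Hf298', 'S298'], limit=10)` would return ten documents that contain only the two thermochemistry fields described further below.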
In [3]:
database = RMGDatabase()
In [4]:
# load the RMG database without thermo libraries or any kinetics data
database.load(settings['database.directory'], thermoLibraries=[],
              kineticsFamilies='none', kineticsDepositories='none',
              reactionLibraries=[])
thermoDatabase = database.thermo
In [5]:
# fetch testing dataset
db_name = 'sdata134k'
collection_name = 'small_cyclic_table'
host = 'mongodb://user:user@rmg.mit.edu/admin'
port = 27018
db_mols = get_data(host, db_name, collection_name, port)
len(db_mols)
Out[5]:
In [6]:
# Hf298: enthalpy of formation at 298 K, in kcal/mol
# S298: entropy at 298 K, in cal/(mol*K)
# G298: in hartrees and not a formation quantity, so don't use it
db_mols[0]
Out[6]:
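If these values need to be compared with data reported in kJ/mol and J/(mol*K), they can be converted using the thermochemical calorie (1 cal = 4.184 J exactly). The sketch below assumes that each document stores `Hf298` and `S298` as plain numbers under exactly those keys and in the units noted in the comments above. The function name `thermo_in_si` is hypothetical. It skips `G298`, as the comments advise.

```python
CAL_TO_J = 4.184  # thermochemical calorie in joules (exact by definition)


def thermo_in_si(db_mols):
    """Convert Hf298 and S298 of each document to kJ/mol and J/(mol*K).

    Assumes stored values are in kcal/mol (Hf298) and cal/(mol*K) (S298).
    Documents missing either key are skipped. G298 is ignored because it
    is in hartrees and is not a formation quantity.
    """
    rows = []
    for doc in db_mols:
        if 'Hf298' not in doc or 'S298' not in doc:
            continue
        hf_kj = doc['Hf298'] * CAL_TO_J  # kcal/mol -> kJ/mol
        s_j = doc['S298'] * CAL_TO_J     # cal/(mol*K) -> J/(mol*K)
        rows.append((hf_kj, s_j))
    return rows
```

Calling `thermo_in_si(db_mols)` on the fetched table yields one `(Hf298, S298)` tuple per complete document.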