Any questions from the reading?
The article shows the importance of data science in chemical engineering. My question is: how much effort should we put into data science versus traditional experiments? Data science combined with simulation seems able to do a great deal. Is it possible that, in the future, 90% of research will be computational, and the rest will be experiments that either test what the machine has already predicted or generate data for the models?
Another question, not closely related to the article: will we have enough data science skills by the end of this quarter?
Provide your brief summary here in markdown format. Please make every effort to keep it free of typos and grammatical mistakes. Excessive typos or basic grammatical mistakes (i.e., ones that interfere with readability) will be marked down (no pun) 15%. There are Jupyter plug-ins that can do spell-check, but if you are concerned, it might be faster to just copy-paste your summary and then format it correctly.
Metal-organic frameworks (MOFs) are nanoporous solids formed from metal ions or clusters and polydentate organic linkers. They are widely used in gas separation and storage, catalysis, nonlinear optics, sensing, controlled drug release, and light harvesting. Grand canonical Monte Carlo (GCMC) simulations can calculate the structural and functional properties of a MOF in close agreement with experimental results. In computational screening, however, the massive libraries of hypothetical nanoporous materials contain hundreds of thousands of hypothetical MOF structures, which makes it impractical to run GCMC simulations on every structure.
In this article, the authors used machine learning and cheminformatic models to preselect high-performing structures and discard low-performing ones. They developed accurate quantitative structure-property relationship (QSPR) models that predict CO2 uptake in MOFs from purely geometrical features of the material, such as pore size, surface area, and void fraction, combined with the atomic property–weighted radial distribution function (AP-RDF) descriptor. A database of 324,500 hypothetical MOF structures was generated. The authors randomly selected 10% of the database as the calibration set used to train the QSPR models; the remaining MOFs formed the test set used to validate the models. GCMC simulations were used to calculate the CO2 uptake of all MOFs. A MOF was classified as high-performing if its uptake was greater than 4 mmol/g at 1 bar CO2 and as low-performing if it was below 4 mmol/g. A cutoff parameter can then be applied at run time to decide which structures are worth more compute-intensive screening. Using this classifier reduces the number of GCMC simulations required.
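
The screening workflow described above (random 10% calibration split, labeling by the 4 mmol/g threshold, and a run-time cutoff) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the demo data are synthetic, and a simple nearest-centroid rule stands in for the article's QSPR classifier, which is not reproduced here.

```python
import random

UPTAKE_THRESHOLD = 4.0  # mmol/g at 1 bar CO2, the threshold quoted in the summary


def split_calibration_test(n_structures, fraction=0.10, seed=0):
    """Randomly assign `fraction` of the indices to calibration; the rest are the test set."""
    rng = random.Random(seed)
    indices = list(range(n_structures))
    rng.shuffle(indices)
    n_cal = int(round(fraction * n_structures))
    return indices[:n_cal], indices[n_cal:]


def label_high_performing(uptake):
    """True if the simulated uptake is greater than the threshold."""
    return uptake > UPTAKE_THRESHOLD


def fit_centroids(features, labels):
    """Mean feature vector per class (assumed stand-in for the QSPR model).

    Both classes must be present in `labels`.
    """
    centroids = {}
    for cls in (True, False):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def score(feature, centroids):
    """Positive when the structure lies closer to the high-performing centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return dist2(feature, centroids[False]) - dist2(feature, centroids[True])


def preselect(features, centroids, cutoff=0.0):
    """Indices of structures whose score reaches the run-time cutoff."""
    return [i for i, f in enumerate(features) if score(f, centroids) >= cutoff]


if __name__ == "__main__":
    # Synthetic stand-in data: two made-up geometric features per structure
    # and a made-up uptake that depends only on the first feature.
    rng = random.Random(1)
    features = [[rng.random(), rng.random()] for _ in range(1000)]
    uptakes = [2.0 + 4.0 * f[0] for f in features]

    cal_idx, test_idx = split_calibration_test(len(features))
    cal_features = [features[i] for i in cal_idx]
    cal_labels = [label_high_performing(uptakes[i]) for i in cal_idx]
    centroids = fit_centroids(cal_features, cal_labels)

    test_features = [features[i] for i in test_idx]
    kept = preselect(test_features, centroids, cutoff=0.0)
    print(f"{len(kept)} of {len(test_idx)} test structures kept for GCMC")
```

Lowering `cutoff` keeps more structures for the expensive simulations and so discards fewer true high performers; raising it saves more compute at the risk of missing some.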