This notebook provides a short demo of how to use SMACT to generate a list of element compositions that could later be used as input for machine learning or some other screening workflow.
We use the standard smact_filter function, as described in the docs and outlined more fully in this research paper.
In the example below, we generate ternary oxide compositions of the first row transition metals.
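For orientation, one of the conditions that smact_filter applies is charge neutrality: the oxidation states, weighted by their stoichiometric amounts, must sum to zero. The sketch below illustrates that condition only. It is not SMACT's implementation, and the function name `neutral_ratios`, the `max_amount` parameter and the example oxidation states are assumptions chosen for illustration.

```python
from itertools import product


def neutral_ratios(oxidation_states, max_amount=4):
    """Return stoichiometries (each amount in 1..max_amount) with zero net charge."""
    ranges = [range(1, max_amount + 1)] * len(oxidation_states)
    return [
        amounts
        for amounts in product(*ranges)
        if sum(q * n for q, n in zip(oxidation_states, amounts)) == 0
    ]


# Illustrative oxidation states for a Sc(3+)-Ti(4+)-O(2-) system
ratios = neutral_ratios((3, 4, -2), max_amount=8)
print(ratios)
```

A brute-force search like this scales poorly with the number of elements and the stoichiometry limit, which is one reason to rely on the library's own filter in practice.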
In [1]:
### Imports
from smact import Element, element_dictionary, ordered_elements
from smact.screening import smact_filter
from datetime import datetime
import itertools
import multiprocessing
We define the elements we are interested in:
In [2]:
all_el = element_dictionary() # A dictionary of all element objects
# Say we are just interested in first row transition metals
els = [all_el[symbol] for symbol in ordered_elements(21,30)]
# We can print the symbols
print([i.symbol for i in els])
We will investigate ternary M1-M2-O combinations exhaustively, where M1 and M2 are different transition metals.
In [3]:
# Generate all M1-M2 combinations
metal_pairs = itertools.combinations(els, 2)
# Add O to each pair
ternary_systems = [[*m, Element('O')] for m in metal_pairs]
# Prove to ourselves that we have all unique chemical systems
for i in ternary_systems:
    print(i[0].symbol, i[1].symbol, i[2].symbol)
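As a quick sanity check on the list above, the number of unique M1-M2 pairs drawn from ten metals should be C(10, 2) = 45. The sketch below uses plain symbol strings instead of SMACT Element objects (an assumption made so that it runs with the standard library alone) and takes the ten metals to be Sc through Zn. It confirms that no pair appears twice in a different order.

```python
import itertools
import math

# First-row transition metal symbols, Sc (Z=21) to Zn (Z=30), written out by hand
metals = ["Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn"]

# combinations() yields each unordered pair exactly once
pairs = list(itertools.combinations(metals, 2))
systems = [(*pair, "O") for pair in pairs]

n_expected = math.comb(len(metals), 2)
# A frozenset ignores ordering, so reversed duplicates would collapse here
unique_systems = {frozenset(s) for s in systems}

print(len(systems), n_expected, len(unique_systems))
```

Using itertools.combinations rather than itertools.permutations is what guarantees that Sc-Ti-O and Ti-Sc-O are not both generated.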
In [4]:
# Use multiprocessing and smact_filter to quickly generate our list of compositions
start = datetime.now()
if __name__ == '__main__':  # Always use pool protected in an if statement
    with multiprocessing.Pool(processes=4) as p:  # start 4 worker processes
        result = p.map(smact_filter, ternary_systems)
print('Time taken to generate list: {0}'.format(datetime.now()-start))
In [5]:
# Flatten the list of lists
flat_list = [item for sublist in result for item in sublist]
print('Number of compositions: --> {0} <--'.format(len(flat_list)))
print('Each list entry looks like this:\n elements, oxidation states, stoichiometries')
for i in flat_list[:5]:
    print(i)
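Because Pool.map returns one list per chemical system, the result is a list of lists that has to be flattened. The sketch below shows the same flattening step on hypothetical stand-in data (the entries in `mock_result` are invented for illustration, not SMACT output) and checks that itertools.chain.from_iterable is an equivalent alternative to the nested comprehension.

```python
from itertools import chain

# Hypothetical stand-in for `result`: one sub-list per chemical system,
# each entry shaped like (elements, oxidation states, stoichiometries)
mock_result = [
    [(("Sc", "Ti", "O"), (3, 4, -2), (2, 2, 7))],
    [],  # a system that yielded no allowed compositions
    [
        (("Mn", "Fe", "O"), (4, 2, -2), (1, 1, 3)),
        (("Mn", "Fe", "O"), (3, 3, -2), (1, 1, 3)),
    ],
]

# Nested comprehension, as used in the cell above
flat_comprehension = [item for sublist in mock_result for item in sublist]

# Equivalent standard-library alternative
flat_chain = list(chain.from_iterable(mock_result))

print(len(flat_chain))
```

Both forms skip empty sub-lists naturally, so systems with no allowed compositions need no special handling.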
In [6]:
from pymatgen.core import Composition  # top-level `from pymatgen import Composition` no longer works in recent pymatgen releases
def comp_maker(comp):
    form = []
    for el, ammt in zip(comp[0], comp[2]):
        form.append(el)
        form.append(ammt)
    form = ''.join(str(e) for e in form)
    pmg_form = Composition(form).reduced_formula
    return pmg_form

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as p:
        pretty_formulas = p.map(comp_maker, flat_list)

print('Each list entry now looks like this: ')
for i in pretty_formulas[:5]:
    print(i)
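The comp_maker function above interleaves element symbols with their stoichiometric amounts and joins them into a formula string before handing it to pymatgen. The sketch below reproduces only the string-building step and a simple reduction by the greatest common divisor, using the standard library. The helper names and the example entry are hypothetical, and pymatgen's reduced_formula does more than this (for example, it applies its own element ordering), so this is a stand-in and not a replacement.

```python
from functools import reduce
from math import gcd


def formula_string(symbols, amounts):
    """Interleave symbols and amounts, e.g. ('Sc', 'O'), (2, 3) -> 'Sc2O3'."""
    return "".join(f"{el}{amt}" for el, amt in zip(symbols, amounts))


def reduce_amounts(amounts):
    """Divide integer amounts by their greatest common divisor."""
    divisor = reduce(gcd, amounts)
    return tuple(a // divisor for a in amounts)


# Hypothetical, deliberately unreduced entry: (elements, oxidation states, amounts)
entry = (("Sc", "Ti", "O"), (3, 4, -2), (4, 4, 14))

raw = formula_string(entry[0], entry[2])
reduced = formula_string(entry[0], reduce_amounts(entry[2]))
print(raw)      # Sc4Ti4O14
print(reduced)  # Sc2Ti2O7
```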
In [7]:
import pandas as pd
new_data = pd.DataFrame({'pretty_formula': pretty_formulas})
# Drop any duplicate compositions
new_data = new_data.drop_duplicates(subset = 'pretty_formula')
new_data.describe()
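One likely source of the duplicates dropped above is that each entry carries oxidation states as well as stoichiometries, so two different oxidation-state assignments can lead to the same formula. The sketch below illustrates this with hypothetical entries and an order-preserving de-duplication from the standard library, standing in for the pandas drop_duplicates call.

```python
# Hypothetical entries: (elements, oxidation states, stoichiometries).
# The first two differ only in oxidation states.
entries = [
    (("Mn", "Fe", "O"), (4, 2, -2), (1, 1, 3)),
    (("Mn", "Fe", "O"), (3, 3, -2), (1, 1, 3)),
    (("Mn", "Fe", "O"), (2, 3, -2), (1, 2, 4)),
]


def formula(entry):
    """Build a formula string from elements and amounts, ignoring oxidation states."""
    symbols, _oxidation_states, amounts = entry
    return "".join(f"{el}{amt}" for el, amt in zip(symbols, amounts))


formulas = [formula(e) for e in entries]
# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_formulas = list(dict.fromkeys(formulas))

print(formulas)
print(unique_formulas)
```

If the oxidation states matter for a later step, they would need to be stored alongside the formula before de-duplicating, since this step discards them.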
The dataframe can then be featurized for input to a machine learning algorithm, for example in scikit-learn. Below is a code snippet from a publicly available example that demonstrates this using the matminer package:
from matminer.featurizers.conversions import StrToComposition
from matminer.featurizers.base import MultipleFeaturizer
from matminer.featurizers import composition as cf
# Use featurizers from matminer to featurize data
str_to_comp = StrToComposition(target_col_id='composition_obj')
# Assign the returned dataframe so this works whether or not featurization happens in place
new_data = str_to_comp.featurize_dataframe(new_data, col_id='pretty_formula')
feature_calculators = MultipleFeaturizer([cf.Stoichiometry(),
                                          cf.ElementProperty.from_preset("magpie"),
                                          cf.ValenceOrbital(props=['avg']),
                                          cf.IonProperty(fast=True),
                                          cf.BandCenter(), cf.AtomicOrbitals()])
feature_labels = feature_calculators.feature_labels()
new_data = feature_calculators.featurize_dataframe(new_data, col_id='composition_obj')
D. W. Davies - 20th Feb 2019