Generate a list of SMACT-allowed compositions

This notebook provides a short demo of how to use SMACT to generate a list of element compositions that could later be used as input for a machine learning or other screening workflow. We use the standard smact_filter, as described in the docs and outlined more fully in this research paper.

In the example below, we generate ternary oxide compositions of the first-row transition metals.



In [1]:

    
### Imports
from smact import Element, element_dictionary, ordered_elements
from smact.screening import smact_filter
from datetime import datetime
import itertools
import multiprocessing

We define the elements we are interested in:

List generation



In [2]:

    
all_el = element_dictionary()   # A dictionary of all element objects

# Say we are just interested in first row transition metals
els = [all_el[symbol] for symbol in ordered_elements(21,30)]

# We can print the symbols 
print([i.symbol for i in els])









    



['Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn']

We will investigate ternary M1-M2-O combinations exhaustively, where M1 and M2 are different transition metals.



In [3]:

    
# Generate all M1-M2 combinations
metal_pairs = itertools.combinations(els, 2)
# Add O to each pair 
ternary_systems = [[*m, Element('O')] for m in metal_pairs]
# Prove to ourselves that we have all unique chemical systems
for i in ternary_systems:
    print(i[0].symbol, i[1].symbol, i[2].symbol)









    



Sc Ti O
Sc V O
Sc Cr O
Sc Mn O
Sc Fe O
Sc Co O
Sc Ni O
Sc Cu O
Sc Zn O
Ti V O
Ti Cr O
Ti Mn O
Ti Fe O
Ti Co O
Ti Ni O
Ti Cu O
Ti Zn O
V Cr O
V Mn O
V Fe O
V Co O
V Ni O
V Cu O
V Zn O
Cr Mn O
Cr Fe O
Cr Co O
Cr Ni O
Cr Cu O
Cr Zn O
Mn Fe O
Mn Co O
Mn Ni O
Mn Cu O
Mn Zn O
Fe Co O
Fe Ni O
Fe Cu O
Fe Zn O
Co Ni O
Co Cu O
Co Zn O
Ni Cu O
Ni Zn O
Cu Zn O
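As a quick sanity check on the list above, the number of unordered metal pairs drawn from 10 elements should be C(10, 2) = 45. The following is a minimal sketch using only the standard library; the names `metals`, `pairs` and `systems` are illustrative and not part of SMACT.

```python
import itertools
import math

# Symbols of the first-row transition metals used in this notebook
metals = ['Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn']

# All unordered M1-M2 pairs, as generated by itertools.combinations
pairs = list(itertools.combinations(metals, 2))

# The binomial coefficient gives the expected number of pairs
expected = math.comb(len(metals), 2)

# Using frozensets confirms that no chemical system appears twice
systems = {frozenset(p) for p in pairs}

print(len(pairs), expected, len(systems))
```

Because `itertools.combinations` returns an iterator, it is converted to a list here so that it can be counted and reused.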



In [4]:

    
# Use multiprocessing and smact_filter to quickly generate our list of compositions
start = datetime.now()
if __name__ == '__main__':   # Always create the pool inside this guard so worker processes do not re-run it
    with multiprocessing.Pool(processes=4) as p:    # start 4 worker processes
        result = p.map(smact_filter, ternary_systems)
print('Time taken to generate list:  {0}'.format(datetime.now()-start))









    



Time taken to generate list:  0:00:00.448839



In [5]:

    
# Flatten the list of lists
flat_list = [item for sublist in result for item in sublist]
print('Number of compositions: --> {0} <--'.format(len(flat_list)))
print('Each list entry looks like this:\n  elements, oxidation states, stoichiometries')
for i in flat_list[:5]:
    print(i)









    



Number of compositions: --> 16615 <--
Each list entry looks like this:
  elements, oxidation states, stoichiometries
(('Sc', 'Ti', 'O'), (1, -1, -2), (3, 1, 1))
(('Sc', 'Ti', 'O'), (1, -1, -2), (4, 2, 1))
(('Sc', 'Ti', 'O'), (1, -1, -2), (5, 1, 2))
(('Sc', 'Ti', 'O'), (1, -1, -2), (5, 3, 1))
(('Sc', 'Ti', 'O'), (1, -1, -2), (6, 4, 1))
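To see how the three tuples in each entry relate to one another, we can check that the oxidation states weighted by the stoichiometries sum to zero. This is only a sketch of the charge-neutrality condition applied to the five entries printed above; `net_charge` is a hypothetical helper and not a SMACT function.

```python
def net_charge(entry):
    """Return the total charge of one (elements, oxidation states, stoichiometries) entry."""
    _, ox_states, stoichs = entry
    return sum(q * n for q, n in zip(ox_states, stoichs))

# The first five entries copied from the output above
sample = [
    (('Sc', 'Ti', 'O'), (1, -1, -2), (3, 1, 1)),
    (('Sc', 'Ti', 'O'), (1, -1, -2), (4, 2, 1)),
    (('Sc', 'Ti', 'O'), (1, -1, -2), (5, 1, 2)),
    (('Sc', 'Ti', 'O'), (1, -1, -2), (5, 3, 1)),
    (('Sc', 'Ti', 'O'), (1, -1, -2), (6, 4, 1)),
]

charges = [net_charge(e) for e in sample]
print(charges)
```

Every entry in this sample has a net charge of zero, for example 3(+1) + 1(-1) + 1(-2) = 0 for the first one.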

Next steps

Pymatgen reduced formulas

We could turn the compositions into reduced formulas using pymatgen. Note that the oxidation state information is lost in this example, because only the elements and stoichiometries are passed on.



In [6]:

    
from pymatgen.core import Composition   # Recent pymatgen releases no longer expose Composition at the top level
def comp_maker(comp):
    form = []
    for el, ammt in zip(comp[0], comp[2]):
        form.append(el)
        form.append(ammt)
    form = ''.join(str(e) for e in form)
    pmg_form = Composition(form).reduced_formula
    return pmg_form

if __name__ == '__main__':  
    with multiprocessing.Pool(processes=4) as p:
        pretty_formulas = p.map(comp_maker, flat_list)

print('Each list entry now looks like this: ')
for i in pretty_formulas[:5]:
    print(i)









    



Each list entry now looks like this: 
Sc3TiO
Sc4Ti2O
Sc5TiO2
Sc5Ti3O
Sc6Ti4O

Pandas

Finally, we could put this into a pandas DataFrame.



In [7]:

    
import pandas as pd
new_data = pd.DataFrame({'pretty_formula': pretty_formulas})
# Drop any duplicate compositions
new_data = new_data.drop_duplicates(subset = 'pretty_formula')
new_data.describe()









    Out[7]:







  
    
      
        pretty_formula
count   9353
unique  9353
top     Ti3V4O7
freq    1
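The de-duplication step can also be illustrated on a small list without pandas. The sketch below keeps the first occurrence of each formula, which is what `drop_duplicates` does by default; the `formulas` list is made-up example data, not output from this notebook.

```python
# A toy list of formula strings containing repeats (illustrative data only)
formulas = ['Sc3TiO', 'Sc4Ti2O', 'Sc3TiO', 'Sc5TiO2', 'Sc4Ti2O']

# dict keys are unique and preserve insertion order, so this keeps first occurrences
unique_formulas = list(dict.fromkeys(formulas))

print(unique_formulas)
print(len(formulas), '->', len(unique_formulas))
```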

Next steps

The DataFrame can then be featurized for input to a machine learning algorithm, for example in scikit-learn. Below is a code snippet from a publicly available example that demonstrates this using the matminer package:

from matminer.featurizers.conversions import StrToComposition
from matminer.featurizers.base import MultipleFeaturizer
from matminer.featurizers import composition as cf

# Use featurizers from matminer to featurize data
str_to_comp = StrToComposition(target_col_id='composition_obj')
# Assign the result: depending on the matminer version, featurize_dataframe
# may return a new DataFrame rather than modifying new_data in place
new_data = str_to_comp.featurize_dataframe(new_data, col_id='pretty_formula')

feature_calculators = MultipleFeaturizer([cf.Stoichiometry(), 
                         cf.ElementProperty.from_preset("magpie"),
                         cf.ValenceOrbital(props=['avg']), 
                         cf.IonProperty(fast=True),
                         cf.BandCenter(), cf.AtomicOrbitals()])

feature_labels = feature_calculators.feature_labels()
new_data = feature_calculators.featurize_dataframe(new_data, col_id='composition_obj')

D. W. Davies - 20th Feb 2019