The Combinatorial Explosion


"During the past century, science has developed a limited capability to design materials, but we are still too dependent on serendipity" - Eberhart and Clougherty, Looking for design in materials design (2004)

This practical explores how materials design can be approached by using the simplest of rules in order to narrow down the combinations to those that might be considered legitimate. It will demonstrate the scale of the problem, even after some chemical rules are applied.

TASKS

  • Section 1: 1 task
  • Section 2 i: 2 tasks
  • Section 2 ii: 2 tasks
  • Section 2 iii: 2 tasks
  • Section 2 iv: 3 tasks
  • Section 3 & 4: information only

NOTES ON USING THE NOTEBOOK

  • This notebook is divided into "cells" which either contain Markdown (text, equations and images) or Python code
  • A cell can be "run" by selecting it and either
    • pressing the Run button in the toolbar above (triangle/arrow symbol)
    • Using Cell > Run in the menu above
    • Holding the Ctrl key and pressing Enter
  • Running Markdown cells just displays them nicely (like this text!) Running Python code cells runs the code and displays any output below.
  • When you run a cell and it appears to not be doing anything, if there is no number in the square brackets and instead you see In [*] it is still running!
  • If the output produces a lot of lines, you can minimise the output box by clicking on the white space to the left of it.
  • You can clear the output of a cell or all cells by going to Cell > Current output/All output > Clear.

1. Back to basics: Forget your chemistry

(From the blog of Anubhav Jain: www.hackingmaterials.com)

  1. You have the first 50 elements of the periodic table
  2. You also have a 10 x 10 x 10 grid
  3. You are allowed to arrange 30 of the elements at a time in some combination in the grid to make a 'compound'
  4. How many different arrangements (different compounds) could you make?

The answer is about $10^{108}$, over a googol of compounds!

TASK: Use the cell below to arrive at the conclusion above. Hints for the formula required are below the cell.


In [ ]:
from math import factorial as factorial

grid_points = 1000.0
atoms = 30.0
elements = 50.0

##########

# A. Show that assigning each of the 30 atoms as one of 50 elements is ~ 9e50 (permutations)

element_assignment = 0

print(f'Number of possible element assignments is: {element_assignment}')

# B. Show that the number of possible arrangements of 30 atoms on a grid of 10x10x10 is ~2e57 (combinations)

atom_arrangements = 0

print(f'Number of atom arrangements is: {atom_arrangements}')

# C. Finally, show that the total number of potential "materials" is ~ 2e108

total_materials = 0

print(f'Total number of "materials" is: {total_materials}')

2. Counting combinations: Remember your chemistry

We will use well-known elemental properties along with the criterion that compounds must not have an overall charge in order to sequentially apply different levels of screening and count the possible combinations:

i. Setting up the search space - Defining which elements we want to include

ii. Element combination counting - Considering combinations of elements and ignore oxidation states

iii. Ion combination counting - Considering combinations of elements in their allowed oxidation states

iv. Charge neutrality - Discarding any combinations that would not make a charge neutral compound

v. Electronegativity - Discarding any combinations which exhibit a cation which is more electronegative than an anion

i. Setting up and choosing the search-space

The code below imports the element data that we need in order to do our counting. The main variable in the cell below for this practical is the max_atomic_number which dictates how many elements to consider.

For example, when max_atomic_number = 10 the elements from H to Ne are considered in the search.

  • TASK 1: Change the variable max_atomic_number so that it includes elements from H to Ar
  • TASK 2: Get the code to print out the actual list of elements that will be considered

In [ ]:
# Imports the SMACT toolkit for later on #
import smact

# Gets element data from file and puts into a list #
with open('Counting/element_data.txt','r') as f:
    data = f.readlines()

list_of_elements = []

# Specify the range of elements to include #

### EDIT BELOW ###
max_atomic_number = 10
##################

# Populates a list with the elements we are concerned with #
for line in data:
    if not line.startswith('#'):
        # Grab first three items from table row
        symbol, name, Z = line.split()[:3]
        if int(Z) > 0 and int(Z) < max_atomic_number + 1:
            list_of_elements.append(symbol)
            
print(f'--- Considering the {len(list_of_elements)} elements '
      f'from {list_of_elements[0]} to {list_of_elements[-1]} ---')

ii. Element combination counting

This first procedure simply counts how many binary combinations are possible for a given set of elements. This is a numerical (combinations) problem, as we are not considering element properties in any way for the time being.

  • TASK 1: Increase the number of elements to consider (max_atomic_number in the cell above) to see how this affects the number of combinations
  • TASK 2: If you can, add another for statement (e.g. for k, ele_c...) to make the cell count up ternary combinations. It is advisable to change the number of elements to include back to 10 first! Hint: The next exercise is set up for ternary counting so you could come back and do this after looking at that.

In [ ]:
# Counts up possibilities and prints the output #
element_count = 0
for i, ele_a in enumerate(list_of_elements):
    for j, ele_b in enumerate(list_of_elements[i+1:]):
        element_count += 1
        print(f'{ele_a} {ele_b}')

# Prints the total number of combinations found
print(f'Number of combinations = {element_count}')

iii. Ion combination counting

We now consider each known oxidation state of an element (so strictly speaking we are not dealing with 'ions'). The procedure incorporates a library of known oxidation states for each element and is this time already set up to search for ternary combinations. The code prints out the combination of elements including their oxidation states. There is also a timer so that you can see how long it takes to run the program.

  • TASK 1: Reset the search space to ~10 elements, read through (feel free to ask if you don't understand any parts!) and run the code below.
  • TASK 2: change max_atomic_number again in the cell above and see how this affects the number of combinations. Hint: It is advisable to increase the search-space gradually and see how long the calculation takes. Big numbers mean you could be waiting a while for the calculation to run....

In [ ]:
# Sets up the timer to see how long the program takes to run #
import time
start_time = time.time()

ion_count = 0
for i, ele_a in enumerate(list_of_elements):
    for ox_a in smact.Element(ele_a).oxidation_states:
        
        for j, ele_b in enumerate(list_of_elements[i+1:]):
            for ox_b in smact.Element(ele_b).oxidation_states:
                
                for k, ele_c in enumerate(list_of_elements[i+j+2:]):
                    for ox_c in smact.Element(ele_c).oxidation_states:
                        ion_count += 1
                    
                        print(f'{ele_a} {ox_a} \t {ele_b} {ox_b} \t {ele_c} {ox_c}')

# Prints the total number of combinations found and the time taken to run.
print(f'Number of combinations = {ion_count}')
print(f'--- {time.time() - start_time} seconds to run ---')

All we seem to have done is make matters worse!

We are introducing many more species by further splitting each element in our search-space into separate ions, one for each allowed oxidation state. When we get to max_atomic_number > 20, we are including the transition metals and their many oxidation states.

iv. Charge neutrality

The previous step is necessary to incorporate our filter that viable compounds must be charge neutral overall. Scrolling through the output from above, it is easy to see that the vast majority of the combinations are not charge neutral overall. We can discard these combinations to start narrowing our search down to more 'sensible' (or at least not totally unreasonable) ones. In this cell, we will use the neutral_ratios function in smact to do this.

  • TASK 1: Reset the search space to ~10 elements, read through (feel free to ask if you don't understand any parts!) and run the code below.
  • TASK 2: Edit the code so that it also prints out the oxidation state next to each element
  • TASK 3: Increase the number of elements to consider again (max_atomic_number in the cell above) and compare the output of i. and ii. with that of the below cell

In [ ]:
import time
from smact import neutral_ratios
start_time = time.time()

charge_neutral_count = 0
for i, ele_a in enumerate(list_of_elements):
    for ox_a in smact.Element(ele_a).oxidation_states:
        
        for j, ele_b in enumerate(list_of_elements[i+1:]):
            for ox_b in smact.Element(ele_b).oxidation_states:
                
                for k, ele_c in enumerate(list_of_elements[i+j+2:]):
                    for ox_c in smact.Element(ele_c).oxidation_states:
                        
                        # Checks if the combination is charge neutral before printing it out! #                        
                        cn_e, cn_r = neutral_ratios([ox_a, ox_b, ox_c], threshold=1)
                        if cn_e:
                            charge_neutral_count += 1
                            print(f'{ele_a} \t {ele_b} \t {ele_c}')

print(f'Number of combinations = {charge_neutral_count}')
print(f'--- {time.time() - start_time} seconds to run ---')

This drastically reduces the number of combinations we get out and we can even begin to see some compounds that we recognise and know exist.

v. Electronegativity

The last step is to incorporate the key chemical property of electronegativity, i.e. the propensity of an element to attract electron density to itself in a bond. This is a logical step as inspection of the output from above reveals that some combinations feature a species in a higher (more positive) oxidation state which is more elecronegative than other species present.

With this in mind, we now incorporate another filter which checks that the species with higher oxidation states have lower electronegativities. The library of values used is of the widely accepted electronegativity scale as developed by Linus Pauling. The scale is based on the dissociation energies of heteronuclear diatomic molecules and their corresponding homonuclear diatomic molecules:


In [ ]:
import time
from smact.screening import pauling_test
start_time = time.time()

pauling_count = 0
for i, ele_a in enumerate(list_of_elements):
    paul_a = smact.Element(ele_a).pauling_eneg
    for ox_a in smact.Element(ele_a).oxidation_states:
        
        for j, ele_b in enumerate(list_of_elements[i+1:]):
            paul_b = smact.Element(ele_b).pauling_eneg
            for ox_b in smact.Element(ele_b).oxidation_states:
                
                for k, ele_c in enumerate(list_of_elements[i+j+2:]):
                    paul_c = smact.Element(ele_c).pauling_eneg
                    for ox_c in smact.Element(ele_c).oxidation_states:
                        
                        # Puts elements, oxidation states and electronegativites into lists for convenience #
                        elements = [ele_a, ele_b, ele_c]
                        oxidation_states = [ox_a, ox_b, ox_c]
                        pauling_electro = [paul_a, paul_b, paul_c]
                        
                        # Checks if the electronegativity makes sense and if the combination is charge neutral #
                        electroneg_makes_sense = pauling_test(oxidation_states, pauling_electro, elements)
                        cn_e, cn_r = smact.neutral_ratios([ox_a, ox_b, ox_c], threshold=1)
                        if cn_e:
                            if electroneg_makes_sense:
                                pauling_count += 1
                                print(f'{ele_a}{ox_a} \t {ele_b}{ox_b} \t {ele_c}{ox_c}')
                                
print(f'Number of combinations = {pauling_count}')
print(f'--- {time.time() - start_time} seconds to run ---')

3. Speedy calculations

For a given search-space of elements, the number of combinations in the output has decreased each time we've applied a filter. However, the time taken to carry out the calculation has increased. This highlights a fundamental trade-off, however there are some clever coding techniqes that can be used to considerably speed things up.

By employing multi-threading and reworking the above code to reduce the number of times it has to look up element properties from other files, the time taken to carry out the ternary count including checks for charge neutrality and electronegativity for 103 elements was reduced from ~ 1 hour to just 26 seconds (carried out on the same workstation)!

Furthermore, a quaternary count for 103 atoms took only 2 hours. This was reducecd to 40 minutes using a 16-core workstation. The code used to carry out these calculations is available in the examples folder of the main smact repository.

N.B. For the numbers quoted above, the stoichiometry of each site was also allowed to vary between 1 and 8 which significantly increases the number of combinations (see below).

4. Some technicalities...

A note on multiplicity: In this exercise, we have not considered higher multiplicies (stoichiometries) in our combinations at all, i.e. we have considered only AB, ABC (and ABCD for quaternary) type combinations. When extended to AxByCz... where x,y,z > 1, the numbers involved get considerably larger still. This can be adjusted by setting threshold in the charge neutrality function to > 1 in the cells above. The threshold defines the maximum values of x,y,z... . If this is changed, the sum of the oxidation states printed will not always sum to zero, however some multiples (between 1 and threshold) of them always will.

Finally, some wisdom from our friend, Linus: