Organized high-throughput calculations: job-folders

Pylada provides tools to organize high-throughput calculations in a systematic manner. The whole high-throughput experience revolves around job-folders. These are a convenient way of organizing actual calculations. They can be thought of as folders on a file system, or directories in unix parlance, each one dedicated to running a single actual calculation (e.g. launching :ref:`VASP` once). The added benefits beyond creating the same file-structure with bash are:

  1. the ability to create a tree of folders/calculations using the power of the python programming language. No more copy-pasting files and unintelligible bash scripts!
  2. the ability to launch all folders simultaneously
  3. the ability to collect the results across all folders simultaneously, all within python, and with all of python's goodies. E.g. no more copy-pasting into Excel by hand. Just do the summing, multiplying, and graphing there and then.

Actually, there are a lot more benefits. Having everything - from input to output - within the same modern and efficient programming language means there is no limit to what can be achieved.

The following describes how job-folders are created. The fun bits (launching jobs, collecting results, and manipulating all job-folders simultaneously) can be found in the next section. Indeed, all of these are intrinsically linked to Pylada's IPython interface.

 Prep: creating a dummy functional

First off, we will need a functional. Rather than use something heavy, like VASP, we will use a dummy functional which does pretty much nothing... We will write it to a file, so that it can be imported later on.


In [1]:
%%writefile dummy.py
def functional(structure, outdir=None, value=False, **kwargs):
    """ A dummy functional """
    from copy import deepcopy
    from pickle import dump
    from random import random
    from py.path import local

    structure = deepcopy(structure)
    structure.value = value
    outdir = local(outdir)
    outdir.ensure(dir=True)
    with outdir.join('OUTCAR').open('wb') as file:
        # Dump (energy, structure, value, functional) so Extract can reload them.
        dump((random(), structure, value, functional), file)

    return Extract(outdir)


Writing dummy.py

This functional takes a few arguments, amongst which an output directory, and writes a file to disk. That's pretty much it.

However, you'll notice that it returns an object of class Extract. We'll create this class in a second. This class is capable of checking whether the functional did run correctly or not (Extract.success attribute is True or False). For VASP or Espresso, it is also capable of parsing output files to recover quantities, like the total energy or the eigenvalues.

This class is not strictly necessary to create the job-folder, but knowing when a job is successful and being able to easily process its output are really nice features to have.

The following is a dummy extraction function for the dummy functional. It knows to check for the existence of an OUTCAR file (a dummy OUTCAR, not a real one) and how to parse it.


In [2]:
%%writefile -a dummy.py

def Extract(outdir=None):
    """ An extraction function for a dummy functional """
    from collections import namedtuple
    from pickle import load
    from py.path import local

    if outdir is None:
        outdir = local()
    Extract = namedtuple('Extract', ['success', 'directory',
                                     'energy', 'structure', 'value', 'functional'])
    outdir = local(outdir)
    if not outdir.check():
        return Extract(False, str(outdir), None, None, None, None)
    if not outdir.join('OUTCAR').check(file=True):
        return Extract(False, str(outdir), None, None, None, None)
    with outdir.join('OUTCAR').open('rb') as file:
        # Unpickle in the same order the dummy functional dumped:
        # (energy, structure, value, functional)
        energy, structure, value, functional = load(file)
        return Extract(True, str(outdir), energy, structure, value, functional)
functional.Extract = Extract


Appending to dummy.py

 Creating and accessing job-folders

Job-folders can be created with two simple lines of codes:


In [3]:
from pylada.jobfolder import JobFolder
root = JobFolder()

To add further job-folders, one can do:


In [4]:
jobA = root / 'jobA'
jobB = root / 'another' / 'jobB'
jobBprime = root / 'another' / 'jobB' / 'prime'

As you can see, job-folders can be given any structure that on-disk directories can. What is more, a job-folder can access other job-folders with the same kind of syntax that one would use (on unices) to access other directories:


In [5]:
assert jobA['/'] is root
assert jobA['../another/jobB'] is jobB
assert jobB['prime'] is jobBprime
assert jobBprime['../../'] is not jobB
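To make the analogy concrete, here is a minimal, hypothetical sketch (not Pylada's actual implementation) of how a tree with `/`-style creation and unix-style `[...]` lookup could be built in plain Python:

```python
class Folder:
    """ Minimal sketch of a job-folder-like tree (NOT Pylada's implementation). """
    def __init__(self, name='/', parent=None):
        self.name = name
        self.parent = parent
        self.children = {}

    def __truediv__(self, name):
        # `folder / 'name'` fetches or creates a subfolder.
        if name not in self.children:
            self.children[name] = Folder(name, parent=self)
        return self.children[name]

    def __getitem__(self, path):
        # Unix-style lookup supporting '/', '.', '..' and 'a/b' paths.
        node = self
        if path.startswith('/'):          # absolute path: restart at the root
            while node.parent is not None:
                node = node.parent
        for part in path.split('/'):
            if part in ('', '.'):
                continue
            if part == '..':
                if node.parent is None:   # cannot go above the root
                    raise KeyError(path)
                node = node.parent
            else:
                node = node.children[part]
        return node

root = Folder()
jobA = root / 'jobA'
jobB = root / 'another' / 'jobB'
assert jobA['/'] is root
assert jobA['../another/jobB'] is jobB
```

The sketch mirrors the behavior shown above, including raising KeyError when stepping past the root.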

And trying to access non-existing folders will get you in trouble:


In [6]:
try:
    root['..']
except KeyError:
    pass
else:
    raise Exception("I expected an error")

Furthermore, job-folders know what they are:


In [7]:
jobA.name


Out[7]:
'/jobA/'

Who their parents are:


In [8]:
jobB.parent.name


Out[8]:
'/another/'

They know about their sub-folders:


In [9]:
assert 'prime' in jobB
assert '/jobA' in jobBprime

As well as their ancestral lineage all the way to the first matriarch:


In [10]:
assert jobB.root is root

 A Job-folder that executes code

The whole point of a job-folder is to create an architecture for calculations. Each job-folder can contain at most a single calculation. A calculation is setup by passing to the job-folder a function and the parameters for calling it.


In [11]:
from pylada.crystal.binary import zinc_blende
from dummy import functional

jobA.functional = functional
jobA.params['structure'] = zinc_blende()
jobA.params['value'] = 5

In the above, the function functional from the dummy module created previously is imported into the namespace. The special attribute jobA.functional is set to functional. Two arguments, structure and value, are specified by adding them to the dictionary jobA.params. Please note that the third line does not contain parentheses: this is not a function call, it merely saves a reference to the function with the object of calling it later. 'C' aficionados should think of it as saving a pointer to a function.
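The same deferred-call pattern can be illustrated in plain Python (the names below are made up for illustration): store a reference to the function together with its keyword arguments now, and call it later.

```python
# Plain-Python sketch of deferred execution: save a reference to the
# function and its keyword arguments, then call it later.
def add(a, b):
    return a + b

job = {'functional': add, 'params': {'a': 2, 'b': 3}}

# ... much later, possibly on a compute node ...
result = job['functional'](**job['params'])
assert result == 5
```

Note that `job['functional'] = add` has no parentheses either: it stores the function object itself, not the result of calling it.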

Warning: The reference to functional is deepcopied: the instance that is saved to the job-folder is not the one that was passed to it. On the other hand, the parameters (jobA.params) are held by reference rather than by value.

Tip: To force a job-folder to hold a functional by reference rather than by value, do:

jobA._functional = functional

The parameters in job.params should be pickleable so that the folder can be saved to disk later. Jobfolder.functional must be pickleable and callable; setting it to something that is not will fail immediately. In practice, this means it can be a function or a callable class, as long as that function or class is imported from a module. It cannot be defined in __main__, e.g. the script that you run to create the job-folders. And that's why the dummy functional in this example is written to its own dummy.py file.
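The pickling constraint can be checked with the standard library alone: a function imported from a module pickles fine (pickle stores its qualified name), while a lambda, which has no importable name, does not.

```python
import pickle

# A module-level function pickles: pickle records where to re-import it from.
from os.path import join
assert pickle.loads(pickle.dumps(join)) is join

# A lambda has no importable module path, so pickling it fails.
try:
    pickle.dumps(lambda x: x)
except (pickle.PicklingError, AttributeError, TypeError):
    failed = True
else:
    failed = False
assert failed
```

The same failure occurs for any function defined in __main__ rather than in an importable module, which is exactly the situation dummy.py avoids.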

That said, we can now execute jobA by calling its compute method:


In [12]:
directory = "tmp/" + jobA.name[1:]
result = jobA.compute(outdir=directory)
assert result.success

Assuming that you have the unix program tree installed, the following will show that an OUTCAR file was created in the right directory:


In [13]:
%%bash
! command -v tree > /dev/null || tree tmp/


tmp/
└── jobA
    └── OUTCAR

1 directory, 1 file

Running the job-folder jobA is exactly equivalent to calling the functional directly:

functional(structure=zinc_blende(), value=5, outdir='tmp/jobA')

In practice, what we have done is create an interface where any program can be called in the same way. This will be extremely useful when launching many jobs simultaneously.
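The value of a uniform signature is easy to see in plain Python: any number of different "functionals" can be driven by one generic loop, as long as each accepts the same keyword arguments (the names below are invented for illustration).

```python
# Two toy "functionals" sharing the same calling convention (illustrative only).
def functional_a(structure, outdir=None, **kwargs):
    return ('A', structure, outdir)

def functional_b(structure, outdir=None, **kwargs):
    return ('B', structure, outdir)

# One generic driver works for both, because the interface is identical.
results = [f(structure='fcc', outdir='tmp/' + name)
           for name, f in [('a', functional_a), ('b', functional_b)]]
assert results == [('A', 'fcc', 'tmp/a'), ('B', 'fcc', 'tmp/b')]
```

This is the same idea the job-folders exploit: because every folder's functional is called with the same keyword-argument convention, one launcher can run them all.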

 Creating multiple executable jobs

The crux of this setup is the ability to create jobs programmatically. Note also that executable job-folders (i.e. those for which jobfolder.functional is set) can easily be iterated over with jobfolder.keys(), jobfolder.values(), and jobfolder.items().


In [14]:
from pylada.jobfolder import JobFolder
from pylada.crystal.binary import zinc_blende

root = JobFolder()

structures = ['diamond', 'diamond/alloy', 'GaAs']
stuff = [0, 1, 2]
species = [('Si', 'Si'), ('Si', 'Ge'), ('Ga', 'As')]

for name, value, types in zip(structures, stuff, species):
    job = root / name
    job.functional = functional
    job.params['value'] = value
    job.params['structure'] = zinc_blende()

    for atom, specie in zip(job.structure, types):
        atom.type = specie

print(root)


Folders: 
  GaAs
  diamond
  diamond/alloy

We can now iterate over executable subfolders:


In [15]:
print(list(root.keys()))


['GaAs', 'diamond', 'diamond/alloy']

Or subsets of executable folders:


In [16]:
for jobname, job in root['diamond'].items():
    print("diamond/", jobname, " with ", len(job.params['structure']), " atoms")


diamond/   with  2  atoms
diamond/ alloy  with  2  atoms

 Saving to disk using the python API

Jobfolders can be saved to and loaded from disk using python functions:


In [17]:
from pylada.jobfolder import load, save
save(root, 'root.dict', overwrite=True) # saves to file
root = load('root.dict') # loads from file
print(root)


Folders: 
  GaAs
  diamond
  diamond/alloy

But Pylada also provides an IPython interface for dealing with job-folders. It is described elsewhere. The difference between the python and the IPython interfaces is a matter of convenience.