Pylada provides tools to organize high-throughput calculations in a systematic manner. The whole high-throughput experience revolves around job-folders. These are convenient ways of organizing actual calculations. They can be thought of as folders on a file system, or directories in unix parlance, each one dedicated to running a single actual calculation (e.g. launching VASP).
Actually, there are a lot more benefits. Having everything - from input to output - within the same modern and efficient programming language means there is no limit to what can be achieved.
The following describes how job-folders are created. The fun bits (launching jobs, collecting results, manipulating all job-folders simultaneously) are covered in the next section. Indeed, all of these are intrinsically linked to Pylada's IPython interface.
In [1]:
%%writefile dummy.py
def functional(structure, outdir=None, value=False, **kwargs):
    """ A dummy functional """
    from copy import deepcopy
    from pickle import dump
    from random import random
    from py.path import local

    structure = deepcopy(structure)
    structure.value = value
    outdir = local(outdir)
    outdir.ensure(dir=True)
    dump((random(), structure, value, functional), outdir.join('OUTCAR').open('wb'))
    return Extract(outdir)
This functional takes a few arguments, among them an output directory, and writes a file to disk. That's pretty much it.
However, you'll notice that it returns an object of class Extract. We'll create this class in a second. This class is capable of checking whether the functional ran correctly (its Extract.success attribute is True or False). For VASP or Espresso, it is also capable of parsing output files to recover quantities such as the total energy or the eigenvalues.
This class is not strictly necessary to create the job-folder, but knowing when a job is successful and being able to easily process its output are really nice features to have.
The following is a dummy extraction class for the dummy functional. It knows how to check for the existence of an OUTCAR file (a dummy OUTCAR, not a real one) and how to parse it.
In [2]:
%%writefile -a dummy.py
def Extract(outdir=None):
    """ An extraction function for a dummy functional """
    from os import getcwd
    from collections import namedtuple
    from pickle import load
    from py.path import local

    if outdir is None:
        outdir = getcwd()
    Extract = namedtuple('Extract', ['success', 'directory',
                                     'energy', 'structure', 'value', 'functional'])
    outdir = local(outdir)
    if not outdir.check():
        return Extract(False, str(outdir), None, None, None, None)
    if not outdir.join('OUTCAR').check(file=True):
        return Extract(False, str(outdir), None, None, None, None)
    with outdir.join('OUTCAR').open('rb') as file:
        energy, structure, value, functional = load(file)
    return Extract(True, str(outdir), energy, structure, value, functional)

functional.Extract = Extract
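As a quick sanity check (a sketch, assuming dummy.py sits next to the notebook so it can be imported), the extraction function simply reports failure when pointed at a directory that holds no OUTCAR:

from dummy import functional

# No calculation has been run here, so the dummy Extract reports failure.
assert not functional.Extract('this/directory/does/not/exist').success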
With the functional written out to dummy.py, the root job-folder is created with two lines of code:
In [3]:
from pylada.jobfolder import JobFolder
root = JobFolder()
To add further job-folders, one can do:
In [4]:
jobA = root / 'jobA'
jobB = root / 'another' / 'jobB'
jobBprime = root / 'another' / 'jobB' / 'prime'
As you can see, job-folders can be given any structure that on-disk directories can. What is more, a job-folder can access other job-folders with the same kind of syntax that one would use (on unices) to access other directories:
In [5]:
assert jobA['/'] is root
assert jobA['../another/jobB'] is jobB
assert jobB['prime'] is jobBprime
assert jobBprime['../../'] is not jobB
And trying to access non-existing folders will get you in trouble:
In [6]:
try:
    root['..']
except KeyError:
    pass
else:
    raise Exception("I expected an error")
Furthermore, job-folders know what they are:
In [7]:
jobA.name
Out[7]:
Who their parents are:
In [8]:
jobB.parent.name
Out[8]:
They know about their sub-folders:
In [9]:
assert 'prime' in jobB
assert '/jobA' in jobBprime
As well as their ancestral lineage all the way to the first matriarch:
In [10]:
assert jobB.root is root
So far these folders are empty. To make jobA executable, we attach a functional to it and give it some parameters:
In [11]:
from pylada.crystal.binary import zinc_blende
from dummy import functional
jobA.functional = functional
jobA.params['structure'] = zinc_blende()
jobA.params['value'] = 5
In the above, the function functional from the dummy module created previously is imported into the namespace. The special attribute jobA.functional is set to functional. Two arguments, structure and value, are specified by adding them to the dictionary jobA.params. Please note that the third line does not contain parentheses: this is not a function call, it merely saves a reference to the function with the intent of calling it later. 'C' aficionados should think of it as saving a pointer to a function.
Warning: The reference to functional is deepcopied: the instance that is saved to the job-folder is not the one that was passed to it. On the other hand, the parameters (jobA.params) are held by reference rather than by value.
Tip: To force a job-folder to hold a functional by reference rather than by value, do:
jobA._functional = functional
The parameters in jobA.params should be pickleable so that the folder can be saved to disk later. JobFolder.functional must be both pickleable and callable; setting it to something that is not will fail immediately. In practice, this means it can be a function or a callable class, as long as that function or class is imported from a module. It cannot be defined in __main__, e.g. in the script that you run to create the job-folders. And that's why the dummy functional in this example is written to its own dummy.py file.
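To make the last point concrete, here is a minimal sketch (assuming only the standard pickle module and the dummy.py file created above): a functional imported from a module pickles by reference, whereas one defined on the fly cannot be located again when the folder is reloaded.

import pickle
from dummy import functional

# Pickles fine: the function can be found again as dummy.functional.
pickle.dumps(functional)

# A lambda (or any callable pickle cannot locate by module and name) fails outright.
try:
    pickle.dumps(lambda structure, outdir=None, **kwargs: None)
except (pickle.PicklingError, AttributeError):
    print("cannot pickle a functional defined on the fly")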
That said, we can now execute jobA by calling its compute method:
In [12]:
directory = "tmp/" + jobA.name[1:]
result = jobA.compute(outdir=directory)
assert result.success
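Since compute returns the dummy Extract namedtuple defined earlier, its fields can be inspected directly (a small sketch continuing the session above):

print(result.directory)   # where the dummy OUTCAR was written
print(result.energy)      # the random number the dummy functional stored
print(result.value)       # the 'value' parameter passed through jobA.params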
Assuming that you have the unix program tree installed, the following will show that an OUTCAR file was created in the right directory:
In [13]:
%%bash
command -v tree > /dev/null && tree tmp/ || true
Running the job-folder jobA is exactly equivalent to calling the functional directly:
functional(structure=zinc_blende(), value=5, outdir='tmp/jobA')
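If you want to convince yourself of this, the direct call can be made and checked the same way (a sketch continuing the session above; it simply overwrites the dummy OUTCAR in tmp/jobA):

from dummy import functional
from pylada.crystal.binary import zinc_blende

direct = functional(structure=zinc_blende(), value=5, outdir='tmp/jobA')
assert direct.success   # same OUTCAR-based check as the result of jobA.compute(...)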
In practice, what we have done is create an interface through which any program can be called in the same way. This will be extremely useful when launching many jobs simultaneously.
A single folder does not make for high throughput, so let's now populate a job-folder with several calculations in one go:
In [14]:
from pylada.jobfolder import JobFolder
from pylada.crystal.binary import zinc_blende

root = JobFolder()
structures = ['diamond', 'diamond/alloy', 'GaAs']
stuff = [0, 1, 2]
species = [('Si', 'Si'), ('Si', 'Ge'), ('Ga', 'As')]

for name, value, species in zip(structures, stuff, species):
    job = root / name
    job.functional = functional
    job.params['value'] = value
    job.params['structure'] = zinc_blende()
    for atom, specie in zip(job.structure, species):
        atom.type = specie

print(root)
We can now iterate over executable subfolders:
In [15]:
print(list(root.keys()))
Or subsets of executable folders:
In [16]:
for jobname, job in root['diamond'].items():
    print("diamond/", jobname, " with ", len(job.params['structure']), " atoms")
Job-folders can also be saved to disk and reloaded later:
In [17]:
from pylada.jobfolder import load, save
save(root, 'root.dict', overwrite=True) # saves to file
root = load('root.dict') # loads from file
print(root)
But Pylada also provides an IPython interface for dealing with job-folders. It is described elsewhere. The difference between the Python and the IPython interfaces is a matter of convenience.
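For reference, the plain-Python route to running every executable folder is a simple loop over the tree, which is essentially what the IPython interface automates (a sketch continuing the session above; the tmp/ output layout is just the one used earlier):

for name, job in root.items():
    result = job.compute(outdir="tmp/" + job.name[1:])
    print(job.name, "success:", result.success)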