PyEmma Featurizer Support


In [1]:
from __future__ import print_function

import openpathsampling as paths
import numpy as np

In [2]:
#! lazy
import pyemma.coordinates as coor

In [3]:
#! lazy
ref_storage = paths.Storage('engine_store_test.nc', mode='r')


13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Open existing netCDF file 'engine_store_test.nc' for reading - reading from existing file
13-12-16 15:15:27 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['replica', 'trajectory', 'ensemble', 'bias', 'parent', 'mover'] and instatiated with ['replica', 'trajectory', 'ensemble', 'bias', 'parent', 'mover']
13-12-16 15:15:27 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['samples', 'movepath'] and instatiated with ['samples', 'movepath']
13-12-16 15:15:27 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['simulation', 'mccycle', 'previous', 'active', 'change'] and instatiated with ['simulation', 'mccycle', 'previous', 'active', 'change']
13-12-16 15:15:27 openpathsampling.netcdfplus.util INFO     Ran load_indices in time 0.002424

In [4]:
#! lazy
storage = paths.Storage('delete.nc', 'w')
storage.trajectories.save(ref_storage.trajectories[0])


13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Create new netCDF file 'delete.nc' for writing - deleting existing file
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Setup netCDF file and create variables
13-12-16 15:15:27 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['replica', 'trajectory', 'ensemble', 'bias', 'parent', 'mover'] and instatiated with ['replica', 'trajectory', 'ensemble', 'bias', 'parent', 'mover']
13-12-16 15:15:27 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['samples', 'movepath'] and instatiated with ['samples', 'movepath']
13-12-16 15:15:27 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['simulation', 'mccycle', 'previous', 'active', 'change'] and instatiated with ['simulation', 'mccycle', 'previous', 'active', 'change']
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'trajectories'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'topologies'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'cvs'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'snapshots'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'samples'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'samplesets'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'movechanges'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'steps'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'details'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'pathmovers'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'shootingpointselectors'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'engines'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'pathsimulators'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'transitions'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'networks'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'schemes'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'interfacesets'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'msouters'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'volumes'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'ensembles'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'tag'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Finished setting up netCDF file
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'snapshot0'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'snapshot0statics'
13-12-16 15:15:27 openpathsampling.netcdfplus.netcdfplus INFO     Initializing store 'snapshot0kinetics'
Out[4]:
221422574805173574613592029305145131028L

Import a PyEmma Coordinates Module

Using PyEMMA featurizers, or other complex code in general, requires a small trick to make them storable. Storing code only works if it does not depend on the surrounding context (scope), so we wrap the construction of our featurizer in a function that receives everything it would otherwise take from the global scope as a parameter.
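To illustrate why independence from the global scope matters, here is a minimal sketch using only the standard library. It is not how OpenPathSampling implements storage; it only assumes, based on the `_marshal` field visible in the stored attributes later in this notebook, that the function's compiled code is serialized in some form. The helper names `dump_function` and `load_function` are hypothetical.

```python
import base64
import marshal
import types


def dump_function(func):
    """Encode a function's compiled code as an ASCII string."""
    return base64.b64encode(marshal.dumps(func.__code__)).decode("ascii")


def load_function(payload, name):
    """Rebuild a function from its payload with an empty global namespace."""
    code = marshal.loads(base64.b64decode(payload))
    return types.FunctionType(code, {}, name)


def self_contained(f):
    # Everything this function needs arrives through the parameter f.
    f.append("inverse_distances")


SCALE = 2.0


def needs_global(x):
    # Depends on SCALE from the defining module, which is not serialized.
    return SCALE * x


# The self-contained function survives the round trip.
features = []
load_function(dump_function(self_contained), "self_contained")(features)
print(features)

# The function that relies on a global fails once the global is gone.
try:
    load_function(dump_function(needs_global), "needs_global")(1.0)
    global_lookup_failed = False
except NameError:
    global_lookup_failed = True
print(global_lookup_failed)
```

The restored `self_contained` works because its only input is its argument, while `needs_global` raises a `NameError` after the round trip. This is the reason for passing the featurizer into `pyemma_generator` as the parameter `f`.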


In [5]:
def pyemma_generator(f):
    f.add_inverse_distances(f.pairs(f.select_Backbone()))

In [6]:
cv = paths.collectivevariable.PyEMMAFeaturizerCV(
    'pyemma', 
    pyemma_generator, 
    topology=ref_storage.snapshots[0].topology
).with_diskcache()

We use this featurizer-generating function to build a collective variable. All we need is a name, as usual, the generating function, and its parameters: here only the topology and, ideally, a test snapshot to serve as a template.


In [7]:
cv(ref_storage.trajectories[0]);

Let's save it to the storage.


In [8]:
#! lazy
print(storage.save(cv))


(store.cvs[CollectiveVariable] : 1 object(s), 2, 200774533649002155781043300304947249328L)

Then apply the featurizer to a trajectory.


In [9]:
cv(storage.trajectories[0]);

Sync to make sure the cache is written to the netCDF file.


In [10]:
cv(storage.snapshots.all());

In [11]:
py_cv = storage.cvs['pyemma']

In [12]:
store = storage.stores['cv%d' % storage.idx(py_cv)]
nc_var = store.variables['value']

In [13]:
assert nc_var.shape[1] == 15
print(nc_var.shape[1])


15
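The 15 columns are consistent with one inverse distance per unordered pair among six selected atoms, since 6 * 5 / 2 = 15. The sketch below shows that count with the standard library only. The coordinates are made-up illustrative values, and it assumes every pair is included, which the notebook does not state explicitly.

```python
from itertools import combinations
import math

# Hypothetical coordinates for six atoms; illustrative values only.
coords = [
    (0.00, 0.00, 0.00),
    (0.15, 0.00, 0.00),
    (0.15, 0.15, 0.00),
    (0.30, 0.15, 0.00),
    (0.30, 0.30, 0.00),
    (0.45, 0.30, 0.00),
]


def inverse_distances(points):
    """Return 1/d for every unordered pair of points, in pair order."""
    return [1.0 / math.dist(a, b) for a, b in combinations(points, 2)]


features = inverse_distances(coords)
print(len(features))
```

Each frame then yields one row of 15 values, which matches the second dimension of `nc_var`.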

In [14]:
assert nc_var.var_type == 'numpy.float32'
print(nc_var.var_type)


numpy.float32

In [15]:
#! ignore
print(storage.variables['attributes_json'][:])


[ u'{"_cls":"PyEMMAFeaturizerCV","_dict":{"topology":{"_store":"topologies","_hex_uuid":"0xa6946fccc13c11e68416000000000002L"},"name":"pyemma","featurizer":{"_marshal":"YwEAAAABAAAAAwAAAEMAAABzIAAAAHwAAGoAAHwAAGoBAHwAAGoCAIMAAIMBAIMBAAFkAABTKAEAAABOKAMAAAB0FQAAAGFkZF9pbnZlcnNlX2Rpc3RhbmNlc3QFAAAAcGFpcnN0DwAAAHNlbGVjdF9CYWNrYm9uZSgBAAAAdAEAAABmKAAAAAAoAAAAAHMeAAAAPGlweXRob24taW5wdXQtNS00Nzc0ZDhlZGRkMDA+dBAAAABweWVtbWFfZ2VuZXJhdG9yAQAAAHMCAAAAAAE=","_module_vars":[],"_global_vars":[]},"kwargs":{}}}']
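The `_marshal` field in this JSON is a base64 string, and decoding it shows that it carries the compiled `pyemma_generator` function. The sketch below only inspects the raw bytes for the names used in the function. It does not unmarshal them, because the payload appears to come from Python 2 (note the `L` suffix on the UUIDs) and is not assumed to load under Python 3.

```python
import base64

# The "_marshal" string copied from the stored JSON above.
payload = (
    "YwEAAAABAAAAAwAAAEMAAABzIAAAAHwAAGoAAHwAAGoBAHwAAGoCAIMAAIMBAIMBAAFkAABT"
    "KAEAAABOKAMAAAB0FQAAAGFkZF9pbnZlcnNlX2Rpc3RhbmNlc3QFAAAAcGFpcnN0DwAAAHNl"
    "bGVjdF9CYWNrYm9uZSgBAAAAdAEAAABmKAAAAAAoAAAAAHMeAAAAPGlweXRob24taW5wdXQt"
    "NS00Nzc0ZDhlZGRkMDA+dBAAAABweWVtbWFfZ2VuZXJhdG9yAQAAAHMCAAAAAAE="
)

raw = base64.b64decode(payload)

# Names we expect to find, taken from the definition of pyemma_generator.
expected = [
    b"add_inverse_distances",
    b"pairs",
    b"select_Backbone",
    b"pyemma_generator",
]
found = [name.decode("ascii") for name in expected if name in raw]
print(found)
```

The empty `_module_vars` and `_global_vars` lists in the same JSON fit the earlier requirement that the generator function must not depend on anything outside its parameters.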

In [16]:
py_cv_idx = storage.idx(py_cv)
print(py_cv_idx)
py_emma_feat = storage.vars['attributes_json'][py_cv_idx]


0

In [17]:
erg = py_emma_feat(storage.snapshots);

In [18]:
#! lazy
print(erg[:,2:4])


[[ 2.68972969  2.06547379]
 [ 2.66780734  2.04628825]
 [ 2.61396885  2.00944018]
 [ 2.61284137  2.00203228]
 [ 2.6829567   2.04546595]
 [ 2.73087001  2.06900215]
 [ 2.79744959  2.09469533]
 [ 2.79462361  2.09850192]
 [ 2.74107337  2.08592916]
 [ 2.75102925  2.09090185]]

In [19]:
storage.close()
ref_storage.close()

In [20]:
#! lazy
storage = paths.Storage('delete.nc', 'r')


13-12-16 15:15:28 openpathsampling.netcdfplus.netcdfplus INFO     Open existing netCDF file 'delete.nc' for reading - reading from existing file
13-12-16 15:15:28 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['replica', 'trajectory', 'ensemble', 'bias', 'parent', 'mover'] and instatiated with ['replica', 'trajectory', 'ensemble', 'bias', 'parent', 'mover']
13-12-16 15:15:28 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['samples', 'movepath'] and instatiated with ['samples', 'movepath']
13-12-16 15:15:28 openpathsampling.netcdfplus.stores.variable INFO     Creates VariableStore with variables ['simulation', 'mccycle', 'previous', 'active', 'change'] and instatiated with ['simulation', 'mccycle', 'previous', 'active', 'change']
13-12-16 15:15:28 openpathsampling.netcdfplus.util INFO     Ran load_indices in time 0.001526

In [21]:
cv = storage.cvs[0]

Make sure that we get the same result after reloading the CV from the file.


In [22]:
assert np.allclose(erg, cv(storage.snapshots))

In [23]:
storage.close()
