Select Snapshots

This notebook takes an old (and large) TPS simulation file and selects some snapshots from it to use as input data.

Note: this first version is quick and dirty. There may be better ways to select snapshots, but the goal here is just to get initial data to our colleagues.


In [1]:
import openpathsampling as paths
import random

In [2]:
%%time
storage = paths.Storage("alanine_dipeptide_tps.nc", "r")


CPU times: user 20.7 s, sys: 6.48 s, total: 27.2 s
Wall time: 2min 10s

In [3]:
print(storage.file_size_str)


17.95GB

In [4]:
n_snapshots = len(storage.snapshots)
print(n_snapshots)


946650

In [5]:
stateA = storage.volumes['C_7eq']
stateB = storage.volumes['alpha_R']

Now we do the main calculation: each selected snapshot must lie outside both states, and no snapshot is used twice. (In other words, snapshots are chosen at random, without replacement.)

In addition, OPS snapshots are always listed in pairs, with velocities reversed. (The data is only stored once, but both can be accessed directly from the snapshot storage.) Because of this, we only take the even-numbered snapshots, so a snapshot and its velocity-reversed copy are never both selected.


In [6]:
%%time
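snapshots = []
while len(snapshots) < 1000:
    # pick a random pair, then take its even-numbered member
    random_choice = random.randint(0, (n_snapshots // 2) - 1)
    snap = storage.snapshots[random_choice * 2]
    # keep it only if it is outside both states and not already chosen
    if not stateA(snap) and not stateB(snap) and snap not in snapshots:
        snapshots.append(snap)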


CPU times: user 16.4 s, sys: 452 ms, total: 16.8 s
Wall time: 1min 1s
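
The rejection loop above keeps drawing until it has 1000 snapshots, so it would never finish if fewer than 1000 even-numbered snapshots lay outside both states. As a rough sketch of an alternative (not what this notebook ran), the same selection can be written as a shuffle over the even indices, which always terminates and never repeats an index. The names `pick_even_indices`, `accept` and `outside_states` are hypothetical. The toy predicate only stands in for the real check `not stateA(snap) and not stateB(snap)`, so the block runs without OPS or the storage file.

```python
import random


def pick_even_indices(n_total, n_wanted, accept, seed=None):
    """Return up to n_wanted distinct even indices in [0, n_total)
    for which accept(index) is true, in random order."""
    rng = random.Random(seed)  # local generator, so a seed makes the pick reproducible
    candidates = list(range(0, n_total, 2))  # even-numbered indices only
    rng.shuffle(candidates)  # random order, each index appears exactly once
    chosen = []
    for idx in candidates:
        if accept(idx):
            chosen.append(idx)
            if len(chosen) == n_wanted:
                break
    return chosen


# Toy stand-in (an assumption for illustration) for the real test
# "snapshot is in neither state A nor state B".
def outside_states(idx):
    return idx % 3 != 0


picked = pick_even_indices(1000, 50, outside_states, seed=0)
print(len(picked))
```

With the real data, `accept` would load `storage.snapshots[idx]` and apply the two state checks. If too few snapshots qualify, the function returns a shorter list instead of looping forever. The cost is one list of about `n_snapshots / 2` integers held in memory.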

In [7]:
new_store = paths.Storage("snapshots.nc", "w")

In [8]:
new_store.save(snapshots);

In [9]:
# save the old engine because we'll re-use its topology later
new_store.save(storage.engines[0]);

In [10]:
new_store.sync()
new_store.close()