Here, we give an example script, showing the effect of Delay Embedding on a Brownian motion on the Muller-Brown potential, projeted onto its y-axis. This script may take a long time to run, as considerable data is required to accurately reconstruct the hidden degrees of freedom.
In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pyedgar
from pyedgar.data_manipulation import tlist_to_flat, flat_to_tlist, delay_embed, lift_function
%matplotlib inline
We first load in the pre-sampled data. The data consists of 400 short trajectories, each with 30 datapoints. The precise sampling procedure is described in "Galerkin Approximation of Dynamical Quantities using Trajectory Data" by Thiede et al. Note that this is a smaller dataset than in the paper. We use a smallar dataset to ensure the diffusion map basis construction runs in a reasonably short time.
In [2]:
ntraj = 700
trajectory_length = 40
lag_values = np.arange(1, 37, 2)
embedding_values = lag_values[1:] - 1
In [3]:
trajs_2d = np.load('data/muller_brown_trajs.npy')[:ntraj, :trajectory_length] # Raw trajectory
trajs = trajs_2d[:, :, 1] # Only keep y coordinate
stateA = (trajs > 1.15).astype('float')
stateB = (trajs < 0.15).astype('float')
# Convert to list of trajectories format
trajs = [traj_i.reshape(-1, 1) for traj_i in trajs]
stateA = [A_i for A_i in stateA]
stateB = [B_i for B_i in stateB]
# Load the true results
true_mfpt = np.load('data/htAB_1_0_0_1.npy')
We also convert the data into the flattened format. This converts the data into a 2D array, which allows the data to be passed into many ML packages that require a two-dimensional dataset. In particular, this is the format accepted by the Diffusion Atlas object. Trajectory start/stop points are then stored in the traj_edges array.
In [4]:
flattened_trajs, traj_edges = tlist_to_flat(trajs)
flattened_stateA = np.hstack(stateA)
flattened_stateB = np.hstack(stateB)
print("Flattened Shapes are: ", flattened_trajs.shape, flattened_stateA.shape, flattened_stateB.shape,)
In [5]:
# Build the basis set
diff_atlas = pyedgar.basis.DiffusionAtlas.from_sklearn(alpha=0, k=500, bandwidth_type='-1/d', epsilon='bgh_generous')
diff_atlas.fit(flattened_trajs)
flat_basis = diff_atlas.make_dirichlet_basis(200, in_domain=(1. - flattened_stateA))
basis = flat_to_tlist(flat_basis, traj_edges)
flat_basis_no_boundaries = diff_atlas.make_dirichlet_basis(200)
basis_no_boundaries = flat_to_tlist(flat_basis_no_boundaries, traj_edges)
In [6]:
# Perform DGA calculation
mfpt_BA_lags = []
for lag in lag_values:
mfpt = pyedgar.galerkin.compute_mfpt(basis, stateA, lag=lag)
pi = pyedgar.galerkin.compute_change_of_measure(basis_no_boundaries, lag=lag)
flat_pi = np.array(pi).ravel()
flat_mfpt = np.array(mfpt).ravel()
mfpt_BA = np.mean(flat_mfpt * flat_pi * np.array(stateB).ravel()) / np.mean(flat_pi * np.array(stateB).ravel())
mfpt_BA_lags.append(mfpt_BA)
In [7]:
mfpt_BA_embeddings = []
for lag in embedding_values:
# Perform delay embedding
debbed_traj = delay_embed(trajs, n_embed=lag)
lifted_A = lift_function(stateA, n_embed=lag)
lifted_B = lift_function(stateB, n_embed=lag)
flat_debbed_traj, embed_edges = tlist_to_flat(debbed_traj)
flat_lifted_A = np.hstack(lifted_A)
# Build the basis
diff_atlas = pyedgar.basis.DiffusionAtlas.from_sklearn(alpha=0, k=500, bandwidth_type='-1/d',
epsilon='bgh_generous', neighbor_params={'algorithm':'brute'})
diff_atlas.fit(flat_debbed_traj)
flat_deb_basis = diff_atlas.make_dirichlet_basis(200, in_domain=(1. - flat_lifted_A))
deb_basis = flat_to_tlist(flat_deb_basis, embed_edges)
flat_pi_basis = diff_atlas.make_dirichlet_basis(200)
pi_basis = flat_to_tlist(flat_deb_basis, embed_edges)
# Construct the Estimate
deb_mfpt = pyedgar.galerkin.compute_mfpt(deb_basis, lifted_A, lag=1)
pi = pyedgar.galerkin.compute_change_of_measure(pi_basis)
flat_pi = np.array(pi).ravel()
flat_mfpt = np.array(deb_mfpt).ravel()
deb_mfpt_BA = np.mean(flat_mfpt * flat_pi * np.array(lifted_B).ravel()) / np.mean(flat_pi * np.array(lifted_B).ravel())
mfpt_BA_embeddings.append(deb_mfpt_BA)
We plot the results of our calculation, against the true value (black line, with the standard deviation in stateB given by the dotted lines). We see that increasing the lag time causes the mean-first-passage time to grow unboundedly. In contrast, with delay embedding the mean-first-passage time converges. We do, however, see one bad fluction at a delay length of 16, and that as the the delay length gets sufficiently long, the calculation blows up.
In [12]:
plt.plot(embedding_values, mfpt_BA_embeddings, label="Delay Embedding")
plt.plot(lag_values, mfpt_BA_lags, label="Lags")
plt.axhline(true_mfpt[0] * 10, color='k', label='True')
plt.axhline((true_mfpt[0] + true_mfpt[1]) * 10., color='k', linestyle=':')
plt.axhline((true_mfpt[0] - true_mfpt[1]) * 10., color='k', linestyle=':')
plt.legend()
plt.ylim(0, 100)
plt.xlabel("Lag / Delay Length")
plt.ylabel("Estimated MFPT")
Out[12]:
In [ ]: