In [1]:
import msmbuilder

Connections to surface reconstruction:

  • A problem that arises frequently in graphics is to estimate curves or surfaces given point clouds (e.g. in 3D scanning)
  • Crust algorithm (sketched in code after this list):
    • Let $S$ be the set of sample points
    • Compute the Voronoi diagram $\text{Vor}(S)$ and let $V$ be its set of Voronoi vertices.
    • Compute the Delaunay triangulation $\text{Del}(S \cup V)$.
    • The curve $P$ is composed of the edges of $\text{Del}(S \cup V)$ with endpoints in $S$.
  • Smooth curve shortening: $$ \frac{\partial C}{\partial t} = \frac{\partial^2 C}{\partial s^2}$$ where $C(s,t)$ is a curve at time $t$ parametrized by arclength $s$ (generalizing $C(s) = (x(s),y(s))$ to a family of curves over time), defining a geometric flow
  • Midpoint transformation: a discrete curve defined by the points $(v_1,\dots,v_n)$ can be smoothed to remove noise by replacing vertices $v_i$ and $v_{i+1}$ with their average (i.e. the midpoint of the segment $v_iv_{i+1}$); this is a discrete analogue of the heat equation $$ \frac{\partial v}{\partial t} = \frac{\partial^2 v}{\partial x^2}$$ (see the smoothing sketch after this list)
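
A minimal sketch of the Crust step above using scipy.spatial, assuming 2D sample points. The function name `crust_edges` and the toy circle data are mine; this is illustrative code, not a robust reconstruction (the classic Crust also has sampling-density assumptions that are ignored here).

In [ ]:
import numpy as np
from scipy.spatial import Voronoi, Delaunay

def crust_edges(samples):
    """Crust sketch: (1) Voronoi vertices V of the samples S,
    (2) Delaunay triangulation of S ∪ V,
    (3) keep the Delaunay edges whose endpoints are both in S."""
    S = np.asarray(samples, dtype=float)
    V = Voronoi(S).vertices                 # Voronoi vertices of the samples
    pts = np.vstack([S, V])                 # first len(S) rows are the samples
    tri = Delaunay(pts)
    n_s = len(S)
    edges = set()
    for simplex in tri.simplices:           # each triangle contributes 3 edges
        for a, b in [(0, 1), (1, 2), (0, 2)]:
            i, j = sorted((int(simplex[a]), int(simplex[b])))
            if j < n_s:                     # both endpoints are sample points
                edges.add((i, j))
    return sorted(edges)                    # index pairs into `samples`

# toy usage: noisy samples of a circle
theta = np.random.uniform(0, 2 * np.pi, 100)
noisy = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.01 * np.random.randn(100, 2)
print(len(crust_edges(noisy)), "curve edges")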
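
And a tiny numpy sketch of the midpoint smoothing itself; the step count is an arbitrary illustration value, and an open curve is assumed (each pass drops one vertex).

In [ ]:
import numpy as np

def midpoint_smooth(points, n_steps=10):
    """Repeatedly replace consecutive vertices by their midpoints.
    Each pass is a one-sided averaging filter, a discrete analogue of
    a heat-equation / curve-shortening smoothing step."""
    v = np.asarray(points, dtype=float)
    for _ in range(n_steps):
        v = 0.5 * (v[:-1] + v[1:])   # midpoints of consecutive segments
    return v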

Is anything here applicable to the string method?

  • Analogy between motion planning and minimum-energy path finding

Connection to motion planning:

  • Protein conformations are elements of a configuration space $\mathcal{C}$, the set of all possible configurations of its atoms (a.k.a. state space, moduli space)
  • In motion planning, we want to find a path through configuration space from one point to another subject to constraints (e.g. avoiding obstacles); alternatively, we can associate a cost $L(c)$ with each intermediate configuration $c$ and minimize the path cost $\sum_c L(c)$ (e.g. free-energy barriers)
  • Cell decomposition - partition configuration space into a finite number of cells and find paths between cells
  • Probabilistic roadmap algorithms for the protein folding problem, and sampling-based motion planning more generally: randomly generate samples, check whether they are feasible points of $\mathcal{C}$, and connect nearby samples into a graph structure (minimal sketch below)
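
A minimal probabilistic-roadmap sketch; the feasibility test, the unit-box sampling domain, and the connection radius are all placeholder assumptions, and the local planner only checks the segment midpoint.

In [ ]:
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import shortest_path

def build_prm(is_feasible, n_samples=500, radius=0.2, dim=2, rng=None):
    """PRM sketch: sample configurations, keep the feasible ones, and
    connect nearby pairs whose connecting segment looks feasible."""
    rng = np.random.default_rng(rng)
    samples = rng.uniform(0.0, 1.0, size=(n_samples, dim))
    nodes = np.array([q for q in samples if is_feasible(q)])
    n = len(nodes)
    graph = lil_matrix((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(nodes[i] - nodes[j])
            # crude local planner: accept if the midpoint is also feasible
            if d < radius and is_feasible(0.5 * (nodes[i] + nodes[j])):
                graph[i, j] = graph[j, i] = d   # weight = distance (or a cost L)
    return nodes, graph.tocsr()

# toy usage: planar configurations avoiding a disk-shaped obstacle
free = lambda q: np.linalg.norm(q - 0.5) > 0.2
nodes, roadmap = build_prm(free)
dist, pred = shortest_path(roadmap, return_predecessors=True)   # query any pair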

Coarse-graining / enhanced sampling notes:

For my sanity, I need to make an "MCMC Zoo" similar to Scott Aaronson's "Complexity Zoo"

Goose-chase notes:

MSM-learning notes:

  • Nonparametric estimation of a Markov kernel using kernel density estimation (see the KDE sketch after this list)
  • Simultaneous learning of state-decompositions and transition-matrices?
  • How would you sample over transition operators?
    • Simultaneously learn a low-dimensional representation and a transition operator that reproduces the observed dynamics?
  • "Efficient Bayesian estimation of Markov model transition matrices with given stationary distribution"
    • Before reading: what I might do differently:
      • Incorporate geometric information about states
        • Direct pairwise RMSD -- likely noisy, but a good baseline
        • Cost of the minimum-cost (possibly indirect) path between states -- estimating a "geodesic" in discretized conformation space, since smaller distances are more reliable (Dijkstra sketch after this list)
          • Efficient inference about Markov processes: http://projecteuclid.org/euclid.aoms/1177707039
          • Contains information on properties of stochastic matrices
          • READ CHAPTER 5 OF THE MSMs BOOK: contains 5 algorithms for Bayesian sampling of transition matrices (including efficient Markov-chain kernels for reversible transition matrices), plus some test problems (see the reversible-sampling sketch after this list)
          • Variational Bayes?
          • http://www.gatsby.ucl.ac.uk/vbayes/
  • Stochastic neighbor compression: http://jmlr.org/proceedings/papers/v32/kusner14.pdf
    • Huge improvement in speed and accuracy over KNN (useful for state decomposition?)
  • Nonlinear metric learning: http://www1.cse.wustl.edu/~kilian/papers/chilmnn.pdf
    • Also useful for state decomposition? Learn a better predictor of kinetic distance than geometry. Also compare with Robert McGibbon's work on this
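
On the nonparametric-kernel bullet above: a minimal Nadaraya-Watson-style sketch that estimates a 1D transition density $p(y \mid x)$ from the observed $(x_t, x_{t+1})$ pairs with Gaussian kernels. The bandwidth and the AR(1) toy trajectory are arbitrary illustration choices.

In [ ]:
import numpy as np

def kde_transition_density(traj, bandwidth=0.1):
    """Kernel-density estimate of a Markov transition density p(y | x)
    from a 1D trajectory: weight each observed transition (x_t, x_{t+1})
    by how close x_t is to the query point x."""
    x, y = np.asarray(traj[:-1]), np.asarray(traj[1:])

    def kernel(u):
        return np.exp(-0.5 * (u / bandwidth) ** 2)

    def p(y_query, x_query):
        w = kernel(x_query - x)              # weight of each observed transition
        return np.sum(w * kernel(y_query - y)) / (
            np.sum(w) * bandwidth * np.sqrt(2 * np.pi))

    return p

# toy usage: an AR(1) trajectory, then evaluate the estimated kernel
rng = np.random.default_rng(0)
traj = np.zeros(2000)
for t in range(1, len(traj)):
    traj[t] = 0.9 * traj[t - 1] + 0.1 * rng.normal()
p = kde_transition_density(traj)
print(p(0.0, 0.0))   # estimated density of landing near 0 when starting near 0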
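
On the "geodesic in discretized conformation space" idea: a minimal sketch that keeps only each state's k nearest neighbors (where the direct distance is most trustworthy) and sums distances along shortest paths. The pairwise matrix here is random placeholder data standing in for pairwise RMSDs.

In [ ]:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distances(pairwise_dist, k=10):
    """Approximate geodesics: build a k-nearest-neighbor graph weighted by
    the direct distances, then run Dijkstra for all-pairs path lengths."""
    D = np.asarray(pairwise_dist, dtype=float)
    n = len(D)
    graph = np.zeros_like(D)
    for i in range(n):
        nearest = np.argsort(D[i])[1:k + 1]      # skip self at index 0
        graph[i, nearest] = D[i, nearest]
    return dijkstra(csr_matrix(graph), directed=False)

# toy usage with a random symmetric "RMSD" matrix
rng = np.random.default_rng(0)
A = rng.random((50, 50))
D = (A + A.T) / 2
np.fill_diagonal(D, 0.0)
print(geodesic_distances(D)[0, :5])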
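
And on Bayesian sampling of transition matrices: a minimal Metropolis sketch over reversible transition matrices, parametrized by a symmetric positive matrix $X$ with $T_{ij} = X_{ij} / \sum_k X_{ik}$ so detailed balance holds by construction. This is not the fixed-stationary-distribution algorithm of the paper above and not one of the Chapter 5 samplers; the step size, pseudocounts, and Exp(1) prior on the free entries of $X$ are all placeholder choices.

In [ ]:
import numpy as np

def sample_reversible_T(C, n_samples=1000, step=0.1, seed=None):
    """Metropolis sampler over reversible transition matrices given a
    transition count matrix C. Proposes multiplicative perturbations of
    single symmetric elements of X and accepts with the Metropolis-Hastings
    ratio (including the lognormal-proposal correction term)."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    X = C + C.T + 1.0                            # symmetric, strictly positive start

    def log_post(X):
        T = X / X.sum(axis=1, keepdims=True)
        # multinomial likelihood of the counts + Exp(1) prior on the free X entries
        return np.sum(C * np.log(T)) - np.triu(X).sum()

    lp = log_post(X)
    samples = []
    for _ in range(n_samples):
        i, j = rng.integers(n, size=2)           # pick one symmetric element
        old = X[i, j]
        new = old * np.exp(step * rng.normal())  # multiplicative (lognormal) proposal
        X[i, j] = X[j, i] = new                  # keep X symmetric (reversibility)
        new_lp = log_post(X)
        # accept/reject; log(new/old) is the lognormal-proposal correction
        if np.log(rng.random()) < new_lp - lp + np.log(new / old):
            lp = new_lp
        else:
            X[i, j] = X[j, i] = old
        samples.append(X / X.sum(axis=1, keepdims=True))
    return samples

# toy usage: counts from a 3-state chain
C = np.array([[90., 10., 0.], [10., 80., 10.], [0., 10., 90.]])
Ts = sample_reversible_T(C, n_samples=2000, seed=0)
print(np.mean(Ts[500:], axis=0))                 # posterior-mean estimate after burn-in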

String Method

String method for the study of rare events (Weinan E, Ren, and Vanden-Eijnden, 2002, Phys. Rev. B); a minimal sketch follows the list below

  • plus two follow-up papers
  • plus Science paper on melting
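
A minimal zero-temperature string-method sketch: evolve a chain of images by steepest descent on the potential, then reparametrize to equal arclength each iteration. The gradient, endpoints, image count, step size, and iteration count are all placeholder choices, and the toy double-well potential is mine.

In [ ]:
import numpy as np

def string_method(grad, z_a, z_b, n_images=20, n_iter=500, dt=0.01):
    """String method sketch: images descend the potential gradient, then are
    redistributed to equal arclength spacing by linear interpolation."""
    alpha = np.linspace(0.0, 1.0, n_images)[:, None]
    images = (1 - alpha) * z_a + alpha * z_b          # initial string: straight line
    for _ in range(n_iter):
        # 1) steepest-descent step on every image (endpoints relax toward minima)
        images = images - dt * np.array([grad(z) for z in images])
        # 2) reparametrize to equal arclength
        seg = np.linalg.norm(np.diff(images, axis=0), axis=1)
        arc = np.concatenate([[0.0], np.cumsum(seg)])
        arc = arc / arc[-1]
        target = np.linspace(0.0, 1.0, n_images)
        images = np.column_stack([np.interp(target, arc, images[:, d])
                                  for d in range(images.shape[1])])
    return images

# toy usage: 2D double well V(x, y) = (x^2 - 1)^2 + y^2, minima at (±1, 0)
grad = lambda z: np.array([4 * z[0] * (z[0] ** 2 - 1), 2 * z[1]])
path = string_method(grad, np.array([-1.0, 0.0]), np.array([1.0, 0.0]))
print(path[10])   # middle image, near the saddle at (0, 0)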

Project proposal: Reduced-rank projected Markov models

  • Problem: discrete state decomposition
  • Drawing on three papers:

Project proposal: Information-Geometric Sampling Methods for Transition Matrices

  • When the number of states is large, posterior inference becomes intractable

State of the Art

  • Taken from Chapter 5 of Intro to MSMs



In [ ]: