Four state mapping onto bestmsm

The goal of this script is to map between the results obtained using the Fourstate class and those obtained with the BestMSM package. In particular we care about the estimation of fluxes and paths for the four state model. First we create the instances of the Fourstate and BestMSM.MSM classes.


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import fourstate
import itertools
import networkx as nx
import numpy as np
bhs = fourstate.FourState()


 count matrix:
[[468324    832    470      0]
 [   832 120096    350     52]
 [   470    350  92721   1831]
 [     0     52   1831 636804]]

 transition matrix:
[[  9.97227581e-01   6.85733125e-03   4.92807113e-03   0.00000000e+00]
 [  1.77162253e-03   9.89829391e-01   3.66984020e-03   8.14170321e-05]
 [  1.00079638e-03   2.88469463e-03   9.72203582e-01   2.86681896e-03]
 [  0.00000000e+00   4.28583203e-04   1.91985069e-02   9.97051764e-01]]

 rate:
[[ -2.77241890e-03   6.85733125e-03   4.92807113e-03   0.00000000e+00]
 [  1.77162253e-03  -1.01706091e-02   3.66984020e-03   8.14170321e-05]
 [  1.00079638e-03   2.88469463e-03  -2.77964182e-02   2.86681896e-03]
 [  0.00000000e+00   4.28583203e-04   1.91985069e-02  -2.94823599e-03]]

 equilibrium probabilities:
[ 0.3544307   0.09156877  0.07197805  0.48202247]

 eigenvalues:
[  6.57711856e-19  -1.89528180e-03  -1.13818135e-02  -3.04105869e-02]

This much we knew already. Now we just have to produce an instance of the BestMSM.MSM class that has the same count and transition matrices.


In [2]:
import bestmsm.msm as msm
bhsmsm = msm.MSM(keys = range(4))
bhsmsm.count = bhs.count
bhsmsm.keep_states = range(4)
bhsmsm.keep_keys = range(4)
bhsmsm.trans = bhs.trans
bhsmsm.rate = bhs.K
bhsmsm.tauK, bhsmsm.peqK = bhsmsm.calc_eigsK()

So now we already have a transition matrix, eigenvalues and eigenvectors which are the same for the Fourstate and BestMSM.MSM class. The important bits start next, with the flux and pfold estimations.


In [3]:
bhs.run_commit()
bhsmsm.do_pfold(UU=[0], FF=[3])
print " These are the flux matrices"
print bhs.J
print bhsmsm.J
print " And these are the total flux values"
print "    from Fourstate: %g"%bhs.sum_flux
print "    from BestMSM.MSM: %g"%bhs.sum_flux


 pfold values:
[ 0.          0.24729852  0.72333251  1.        ]

 flux :
[[ -0.00000000e+00  -1.55283051e-04  -2.56575420e-04  -0.00000000e+00]
 [  1.55283051e-04  -0.00000000e+00  -1.25743403e-04  -2.95396480e-05]
 [  2.56575420e-04   1.25743403e-04  -0.00000000e+00  -3.82318823e-04]
 [  0.00000000e+00   2.95396480e-05   3.82318823e-04  -0.00000000e+00]]

 reactive flux: 0.000411858
   definitely FF and UU states [0, 3]
 These are the flux matrices
[[ -0.00000000e+00  -1.55283051e-04  -2.56575420e-04  -0.00000000e+00]
 [  1.55283051e-04  -0.00000000e+00  -1.25743403e-04  -2.95396480e-05]
 [  2.56575420e-04   1.25743403e-04  -0.00000000e+00  -3.82318823e-04]
 [  0.00000000e+00   2.95396480e-05   3.82318823e-04  -0.00000000e+00]]
[[ -0.00000000e+00  -1.55283051e-04  -2.56575420e-04  -0.00000000e+00]
 [  1.55283051e-04  -0.00000000e+00  -1.25743403e-04  -2.95396480e-05]
 [  2.56575420e-04   1.25743403e-04  -0.00000000e+00  -3.82318823e-04]
 [  0.00000000e+00   2.95396480e-05   3.82318823e-04  -0.00000000e+00]]
 And these are the total flux values
    from Fourstate: 0.000411858
    from BestMSM.MSM: 0.000411858

It really looks like we have got the same exact thing, as intended. Next comes the generation of paths. This implies defining a function in the case of the Fourstate object.


In [4]:
def gen_path_lengths(keys, J, pfold, flux, FF, UU):
    nkeys = len(keys)
    I = [x for x in range(nkeys) if x not in FF+UU]
    Jnode = []
    # calculate flux going through nodes
    for i in range(nkeys):
        Jnode.append(np.sum([J[i,x] for x in range(nkeys) \
                             if pfold[x] < pfold[i]]))
    # define matrix with edge lengths
    Jpath = np.zeros((nkeys, nkeys), float)
    for i in UU:
        for j in I + FF:
            if J[j,i] > 0:
                Jpath[j,i] = np.log(flux/J[j,i]) + 1 # I add 1
    for i in I:
        for j in [x for x in FF+I if pfold[x] > pfold[i]]:
            if J[j,i] > 0:
                Jpath[j,i] = np.log(Jnode[j]/J[j,i]) + 1 # I add 1
    return Jnode, Jpath

Jnode, Jpath = gen_path_lengths(range(4), bhs.J, bhs.pfold, \
                                bhs.sum_flux, [3], [0])
JpathG = nx.DiGraph(Jpath.transpose())

tot_flux = 0
for path in nx.all_simple_paths(JpathG, 0, 3):
    print path
    f = bhs.J[path[1],path[0]]
    print "%2i -> %2i: %10.4e "%(path[0], path[1], \
                bhs.J[path[1],path[0]])
    for i in range(2, len(path)):
        print "%2i -> %2i: %10.4e %10.4e"%(path[i-1], path[i], \
                bhs.J[path[i],path[i-1]], Jnode[path[i-1]])
        f *= bhs.J[path[i],path[i-1]]/Jnode[path[i-1]]
    tot_flux +=f
    print "  J(path) = %10.4e"%f
    print
print " Commulative flux: %10.4e"%tot_flux


[0, 1, 2, 3]
 0 ->  1: 1.5528e-04 
 1 ->  2: 1.2574e-04 1.5528e-04
 2 ->  3: 3.8232e-04 3.8232e-04
  J(path) = 1.2574e-04

[0, 1, 3]
 0 ->  1: 1.5528e-04 
 1 ->  3: 2.9540e-05 1.5528e-04
  J(path) = 2.9540e-05

[0, 2, 3]
 0 ->  2: 2.5658e-04 
 2 ->  3: 3.8232e-04 3.8232e-04
  J(path) = 2.5658e-04

 Commulative flux: 4.1186e-04

For BestMSM.MSM everything should be built in. First we obtain the 4 highest flux paths.


In [5]:
bhsmsm.do_dijkstra(UU=[0], FF=[3], npath=3)


 Total flux 0.000411858470592
 path  [0, 2, 3]
   flux: 2.57e-04; ratio: 6.23e-01; leftover: 3.77e-01
 path  [0, 1, 2, 3]
   flux: 1.26e-04; ratio: 3.05e-01; leftover: 7.17e-02
 path  [0, 1, 3]
   flux: 2.95e-05; ratio: 7.17e-02; leftover: -8.80e-16
 No paths
 Flux from paths -8.80229805407e-16
printing

 Writing graph in dot format...
Out[5]:
[(((3, 0), [0, 2, 3], 1.5476820153502904), 0.00025657541958512564),
 (((3, 0), [0, 1, 2, 3], 1.2110062410582523), 0.00012574340295749901),
 (((3, 0), [0, 1, 3], 0.9999999999999835), 2.9539648049723218e-05)]

Then we use the alternative mechanism of giving a cutoff for the flux left. Here we want to account for 80% of the flux.


In [11]:
bhsmsm.do_pfold(UU=[0], FF=[3])
bhsmsm.do_dijkstra(UU=[0], FF=[3], cut=0.2)


   definitely FF and UU states [0, 3]
 Total flux 0.000411858470592
 path  [0, 2, 3]
   flux: 2.57e-04; ratio: 6.23e-01; leftover: 3.77e-01
 path  [0, 1, 2, 3]
   flux: 1.26e-04; ratio: 3.05e-01; leftover: 7.17e-02
 Flux from paths 0.0717228129538
printing

 Writing graph in dot format...
Out[11]:
[(((3, 0), [0, 2, 3], 1.5476820153502904), 0.00025657541958512564),
 (((3, 0), [0, 1, 2, 3], 1.2110062410582523), 0.00012574340295749901)]

So at least for this simple example, things are working perfectly fine.