---
title: "Continuous-Time Markov Model ctmc_fit"
date: 2018-09-02T19:00:00+02:00
draft: false
tags: ["ctmc_fit", "ctmc_fit2", "ctmc_datacheck", "overgang", "Continuous-Time Markov Model"]
author: "Ulf Hamster"
---

Load modules


In [1]:
import pprint
pp = pprint.PrettyPrinter(indent=4)

import pandas as pd
import numpy as np
from datetime import datetime
import oxyba as ox
import overgang as og

%load_ext autoreload
%autoreload 2
%reload_ext autoreload

Data Preprocessing

Import demo data


In [2]:
df = pd.read_csv("demo-data.csv", delimiter=',')
datatable = np.array(df)

Convert date strings to datetime objects


In [3]:
datatable[:,1] = [datetime.strptime(d,'%Y-%m-%d') for d in datatable[:,1]]

Map rating labels to integer states


In [4]:
mapping = [['AAA'], ['AA+', 'AA', 'AA-'], ['A+', 'A', 'A-'], 
          ['BBB+', 'BBB', 'BBB-'], ['BB+', 'BB', 'BB-'], 
          ['B+', 'B', 'B-'], ['CCC+', 'CCC', 'CCC-', 'CC', 'C'], 
          ['DDD', 'DD', 'D', 'RD']]

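# encode each rating label as the index of its group in `mapping`;
# nastate=True: labels outside `mapping` get one extra (NA) state
datatable[:,2] = ox.mapencode(datatable[:,2], mapping, nastate=True)

# 8 rating groups + 1 extra state = 9 states
numstates = len(mapping) + 1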
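To make the encoding step concrete, here is a minimal pure-Python sketch of what the mapping does. It assumes that each label is encoded as the index of the group containing it and that, with nastate=True, every label outside mapping goes to one extra state with index len(mapping). The function name map_to_state is hypothetical; this is not the oxyba implementation of mapencode.

```python
def map_to_state(labels, mapping, nastate=True):
    """Hypothetical sketch: encode labels as the index of their group in `mapping`.

    Assumption: with nastate=True, labels not found in `mapping` (including
    None or NaN) are encoded as the extra state len(mapping).
    """
    # flat lookup table: label -> group index
    lookup = {label: idx for idx, group in enumerate(mapping) for label in group}
    na_code = len(mapping)
    codes = []
    for label in labels:
        if label in lookup:
            codes.append(lookup[label])
        elif nastate:
            codes.append(na_code)  # unknown label -> extra NA state
        else:
            raise ValueError("unmapped label: {!r}".format(label))
    return codes
```

With the mapping above, map_to_state(['AAA', 'BBB-', 'NR', 'D'], mapping) gives [0, 3, 8, 7], where 'NR' stands for any label that is not listed in mapping.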

Transform tabular data to a list of (states, durations) tuples


In [5]:
#datalist = og.table_transform(datatable, lastdate=datetime(2018,1,1))
datalist = og.table_transform(datatable)

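The datalist object has the following structure: each element is a tuple of two lists, the sequence of encoded states and the time spent in each of these states.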


In [6]:
pp.pprint(datalist[20:22])


[   ([5, 6, 8], [2.208219178082192, 2.441095890410959, 5.112021857923497]),
    ([4, 5, 4], [1.0573770491803278, 3.3534246575342466, 0.6693989071038251])]
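The sketch below shows one plausible way to turn (id, date, state) rows into such (states, durations) tuples. It rests on assumptions that this post does not confirm: column 0 of datatable holds an entity id, repeated consecutive ratings are merged into one spell, the last spell is censored at a given lastdate (as the commented-out lastdate= argument above hints), and durations are measured in years of 365.25 days. og.table_transform may well use a different day-count convention. The name to_spells is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def to_spells(rows, lastdate):
    """Hypothetical sketch: group (id, date, state) rows into (states, durations).

    Assumptions: rows are sorted by date within each id, consecutive duplicate
    states are merged, the last spell ends at `lastdate`, and durations are
    measured in years of 365.25 days.
    """
    by_id = defaultdict(list)
    for exid, date, state in rows:
        by_id[exid].append((date, state))
    result = []
    for exid in sorted(by_id):
        states, starts = [], []
        for date, state in sorted(by_id[exid]):
            if not states or states[-1] != state:  # a new spell starts
                states.append(state)
                starts.append(date)
        ends = starts[1:] + [lastdate]  # each spell ends where the next begins
        durations = [(end - start).days / 365.25 for start, end in zip(starts, ends)]
        result.append((states, durations))
    return result


rows = [
    (1, datetime(2010, 1, 1), 2),
    (1, datetime(2011, 1, 1), 2),  # same rating again, merged into one spell
    (1, datetime(2012, 1, 1), 3),
    (2, datetime(2015, 1, 1), 5),
]
spells = to_spells(rows, lastdate=datetime(2016, 1, 1))
```

Here spells[0] is ([2, 3], [730 / 365.25, 4.0]) and spells[1] is ([5], [365 / 365.25]).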

Estimate Markov Model

With debug=True, ctmc_fit throws an exception if something is wrong with the data. With debug=False (the default), ctmc_fit is very fast but might crash at a later stage or generate bogus results.


In [7]:
try:
    transmat, genmat, transcount, statetime = og.ctmc_fit(
        datalist, numstates, 1.0, toltime=1e-8, debug=True)
except Exception as e:
    print(e)


The example id=40 has a state[2] that have not been active for longer than toltime

ctmc_fit2 only issues a warning, tries to autocorrect the problem, and proceeds. Obviously, this kind of implementation invites sloppy data preprocessing, but it is a painless way to get quick results for a small data set.
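Which correction ctmc_fit2 applies internally is not shown here. As a hypothetical illustration of the "warn, autocorrect, proceed" pattern, the sketch below drops every spell whose duration is at most toltime, merges neighbouring spells that end up in the same state, and emits a UserWarning for each example it had to change. The function drop_short_spells is our own name, not part of overgang.

```python
import warnings


def drop_short_spells(datalist, toltime=1e-8):
    """Hypothetical autocorrection: drop spells with duration <= toltime.

    Neighbouring spells in the same state are merged, and a UserWarning is
    emitted for every example from which at least one spell was dropped.
    """
    cleaned = []
    for exid, (states, durations) in enumerate(datalist):
        new_states, new_durations = [], []
        dropped = False
        for s, d in zip(states, durations):
            if d <= toltime:
                dropped = True
                continue  # too short to count as a real stay in state s
            if new_states and new_states[-1] == s:
                new_durations[-1] += d  # e.g. 4, 8, 4 with a zero-length 8 becomes one stay in 4
            else:
                new_states.append(s)
                new_durations.append(d)
        if dropped:
            warnings.warn(
                "example id={} had spells shorter than toltime; "
                "they were dropped".format(exid), UserWarning)
        cleaned.append((new_states, new_durations))
    return cleaned
```

Applied to an example shaped like ([3, 4, 8, 5], [1.8, 0.1, 0.0, 1.3]), it returns ([3, 4, 5], [1.8, 0.1, 1.3]) and warns once. Note that this silently turns the observed path 4, 8, 5 into a direct jump from 4 to 5, which is exactly the kind of decision that is better made explicitly during preprocessing.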


In [8]:
transmat, genmat, transcount, statetime = og.ctmc_fit2(datalist, numstates)


/Users/uh/kmedian/overgang/overgang/ctmc_fit2.py:25: UserWarning: The example id=40 has a state that have not been active for longer than toltime.
  "toltime.").format(exid))

At least we got a result with ctmc_fit2


In [9]:
print("Transition Matrix\n", transmat.round(2))


Transition Matrix
 [[0.96 0.04 0.   0.   0.   0.   0.   0.   0.  ]
 [0.04 0.9  0.06 0.   0.   0.   0.   0.   0.  ]
 [0.   0.03 0.89 0.08 0.   0.   0.   0.   0.  ]
 [0.   0.   0.06 0.86 0.06 0.01 0.   0.   0.  ]
 [0.   0.   0.   0.11 0.8  0.08 0.   0.   0.  ]
 [0.   0.   0.   0.01 0.1  0.79 0.05 0.01 0.04]
 [0.   0.   0.   0.   0.01 0.21 0.53 0.17 0.07]
 [0.   0.   0.   0.   0.03 0.38 0.06 0.52 0.01]
 [0.   0.   0.   0.   0.   0.03 0.   0.   0.97]]
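The four return values fit the textbook maximum likelihood estimator of a continuous-time Markov chain: count the observed jumps N_ij from state i to state j (transcount), sum the time R_i spent in each state (statetime), set q_ij = N_ij / R_i for i != j and q_ii = -sum_{j != i} q_ij (genmat), and obtain the transition matrix over a horizon t as P(t) = exp(Q t) (transmat). The numpy sketch below implements that estimator. That overgang computes exactly this, and that the third argument 1.0 of ctmc_fit is the horizon t, are our assumptions; fit_ctmc and expm_taylor are hypothetical names.

```python
import numpy as np


def expm_taylor(a, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series.

    Adequate for small, well-scaled generator matrices; scipy.linalg.expm is
    the more robust choice in practice.
    """
    norm = np.linalg.norm(a, ord=1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = a / 2.0 ** squarings
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k  # next Taylor term: scaled**k / k!
        result = result + term
    for _ in range(squarings):
        result = result @ result  # undo scaling: exp(A) = exp(A / 2**s) ** (2**s)
    return result


def fit_ctmc(datalist, numstates, t=1.0):
    """Textbook MLE of a CTMC generator from (states, durations) tuples."""
    transcount = np.zeros((numstates, numstates))
    statetime = np.zeros(numstates)
    for states, durations in datalist:
        for s, d in zip(states, durations):
            statetime[s] += d  # time at risk in state s, last (censored) spell included
        for a, b in zip(states[:-1], states[1:]):
            transcount[a, b] += 1  # one observed jump a -> b
    genmat = np.zeros((numstates, numstates))
    visited = statetime > 0
    genmat[visited] = transcount[visited] / statetime[visited][:, None]  # q_ij = N_ij / R_i
    np.fill_diagonal(genmat, 0.0)
    np.fill_diagonal(genmat, -genmat.sum(axis=1))  # rows of a generator sum to zero
    transmat = expm_taylor(genmat * t)  # P(t) = exp(Q t)
    return transmat, genmat, transcount, statetime
```

In practice scipy.linalg.expm would replace the hand-written expm_taylor; it is spelled out here only to keep the sketch dependent on numpy alone. States that never occur keep a zero row in genmat and therefore an identity row in transmat.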

Go Back to Pre-Processing

With debug=True, ctmc_fit internally calls ctmc_datacheck at the beginning. ctmc_datacheck iterates over datalist, checks for all kinds of error causes, and immediately throws an exception when it finds one.
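A strict check in the same spirit might look like the hypothetical check_datalist below. It walks over datalist, uses the list index as the example id (the id=40 in the message refers to datalist[40]), and raises a ValueError at the first problem. The three checks shown (matching lengths, states within range(numstates), durations above toltime) are our guess at typical error causes; the real ctmc_datacheck may check more or different things.

```python
def check_datalist(datalist, numstates, toltime=1e-8):
    """Hypothetical strict check in the spirit of ctmc_datacheck, not its source.

    Raises ValueError at the first problem found; returns None otherwise.
    """
    for exid, (states, durations) in enumerate(datalist):
        if len(states) != len(durations):
            raise ValueError("example id={}: {} states but {} durations".format(
                exid, len(states), len(durations)))
        for pos, (s, d) in enumerate(zip(states, durations)):
            if not 0 <= s < numstates:
                raise ValueError("example id={}: state[{}]={} is out of range".format(
                    exid, pos, s))
            if d <= toltime:
                raise ValueError("example id={}: state[{}] lasted {} <= toltime".format(
                    exid, pos, d))
```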


In [10]:
try:
    og.ctmc_datacheck(datalist, numstates, toltime=1e-8)
except Exception as e:
    print(e)


The example id=40 has a state[2] that have not been active for longer than toltime

In [11]:
datalist[40]


Out[11]:
([3, 4, 8, 5],
 [1.8164383561643835, 0.1178082191780822, 0.0, 1.3415300546448088])
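The third spell (state 8) has a duration of exactly 0.0, which is what the message about state[2] and toltime refers to. Because ctmc_datacheck stops at the first problem, it can help to list all offending spells in one pass before deciding how to fix them in preprocessing. find_short_spells below is a hypothetical helper for that; on the notebook's datalist its result should include (40, 2, 8, 0.0).

```python
def find_short_spells(datalist, toltime=1e-8):
    """Return (example id, position, state, duration) for every spell <= toltime."""
    return [
        (exid, pos, s, d)
        for exid, (states, durations) in enumerate(datalist)
        for pos, (s, d) in enumerate(zip(states, durations))
        if d <= toltime
    ]
```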