Dynamic Discrete Choice Models

In a dynamic discrete choice (DDC) model an agent chooses a series of actions according to its preferences, cognitive biases and beliefs about the future.

A common assumption in DDC models is the random utility assumption. Under the random utility assumption, an agent chooses the action that maximizes their expected present discounted utility (in other words, they have rational expectations about the future and their choices are free from cognitive bias). This means that the agent being analyzed by the researcher behaves according to a Markov Decision Process.

In a Markov Decision Process (MDP), an agent picks a policy (a function mapping states to actions) that maximizes their expected present discounted utility. An MDP is a tuple, $(S, A, T, U, \beta)$ (a minimal code sketch follows the list), where

  • $S$ is the state space
  • $A$ is the action space
  • $T$ is a transition distribution (a function that takes a state and an action and returns a probability distribution over states for the next period)
  • $U$ is a utility function that maps state-action pairs to utils
  • $\beta$ is the discount factor
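
To make the objects in this tuple concrete, here is a minimal sketch of an MDP as a data structure. It is written in Python with hypothetical field names, assumes finite state and action spaces, and is not a prescription for how the library should represent an MDP.

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np


@dataclass
class MDP:
    """A finite Markov Decision Process (S, A, T, U, beta)."""

    states: Sequence        # S: the state space
    actions: Sequence       # A: the action space
    transition: np.ndarray  # T: transition[s, a] is a probability distribution over next states
    utility: np.ndarray     # U: utility[s, a] is the per-period payoff in utils
    beta: float             # discount factor, 0 <= beta < 1


# A policy is then just a map from states to actions, e.g. a vector with
# policy[s] = a; the agent picks the policy that maximizes expected
# discounted utility.
```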

DDC models typically partition the state variables into two groups: those observed by the researcher and those that are not.

  • $x$: tuple of agent state variables observed by the researcher
  • $\epsilon$: tuple of agent state variables unobserved by the researcher

DDC models also make a number of assumptions about the structure of the MDP in order to keep estimation tractable. Some common assumptions are listed below, followed by a small numerical illustration:

  • Additive separability: The utility function is given by $\mu(s, a) = u(x, a) + \epsilon(a)$, where $\epsilon(a)$ is the component of $\epsilon$ associated with action $a$
  • Conditional independence: The transition function can be decomposed into $T(s' | s, a) = P_x(x' | x, a)P_\epsilon(\epsilon' | \epsilon, a)$
  • IID unobserved state variable: $\epsilon$ is IID across periods and across its components
  • Conditional logit: each component of $\epsilon$ is Gumbel (Type-I extreme value) distributed
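
To illustrate how these assumptions fit together, the following sketch (hypothetical Python, not part of the proposed interface) draws IID Gumbel shocks, adds them to flow utilities as in the additive separability assumption, and notes the multinomial logit choice probabilities they imply. The utility values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Flow utilities u(x, a) at one observed state x, for three actions.
u_x = np.array([1.0, 0.5, -0.2])

# IID unobserved state: one Gumbel (Type-I extreme value) draw per action.
eps = rng.gumbel(loc=0.0, scale=1.0, size=u_x.shape)

# Additive separability: total flow utility is u(x, a) + eps(a).
# (The continuation value is ignored here purely for illustration.)
chosen_action = int(np.argmax(u_x + eps))

# With Gumbel shocks, the implied choice probabilities are multinomial logit.
choice_probs = np.exp(u_x) / np.exp(u_x).sum()
```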

One of the first methods for estimating the parameters of the utility function in a DDC model from a series of state and action observations is the nested fixed point (NFXP) method.

NFXP

The NFXP method uses the additive separability (AS), conditional independence (CI), IID unobserved state variable, and conditional logit assumptions to greatly simplify estimation. An outline of how these assumptions simplify the value function is shown below.

$$
\begin{aligned}
V(x,\epsilon)&=\max_{d\in D}\mu(x,d,\epsilon)+\beta E_{x',\epsilon'}[V(x',\epsilon')\mid x,\epsilon,d]\\
&=\max_{d\in D}u(x,d)+\epsilon(d)+\beta E_{x',\epsilon'}[V(x',\epsilon')\mid x,\epsilon,d]\quad\text{AS}\\
&=\max_{d\in D}u(x,d)+\epsilon(d)+\beta\int_{x'}\int_{\epsilon'}V(x',\epsilon')\,p(x',\epsilon'\mid x,\epsilon,d)\,d\epsilon'\,dx'\\
&=\max_{d\in D}u(x,d)+\epsilon(d)+\beta\int_{x'}\int_{\epsilon'}V(x',\epsilon')\,p(\epsilon'\mid\epsilon,d)\,q(x'\mid x,d)\,d\epsilon'\,dx'\quad\text{CI}\\
&=\max_{d\in D}u(x,d)+\epsilon(d)+\beta\int_{x'}\left(\int_{\epsilon'}V(x',\epsilon')\,p(\epsilon'_{1})\cdots p(\epsilon'_{|\epsilon'|})\,d\epsilon'\right)q(x'\mid x,d)\,dx'\quad\text{IID}\\
&=\max_{d\in D}u(x,d)+\epsilon(d)+\beta\int_{x'}\overline{V(x')}\,q(x'\mid x,d)\,dx'
\end{aligned}
$$

where $\overline{V(x')}=E_{\epsilon'}[V(x',\epsilon')]$ is the value function integrated over the unobserved state variables.

When the observed state variables $x$ are discrete, the integral over $x'$ is replaced with a summation, and under the Gumbel assumption the expectation over $\epsilon$ has a closed log-sum-exp form:

$$ \begin{aligned} \overline{V(x)}&= \int_{\epsilon}\left(\max_{d\in D} u(x,d) + \epsilon(d) + \beta\sum_{x'}\overline{V(x')} q(x' | x, d)\right)dP(\epsilon) \notag\\ &=\log\left[\sum_{d=0}^{D}\exp\left\{u(x,d)+\beta\sum_{x'}\overline{V(x')}q(x'|x,d)\right\} \right]\quad\text{Gumbel Distribution} \end{aligned} $$
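
For a finite observed state space this log-sum-exp equation defines $\overline{V(x)}$ as the fixed point of a contraction, which can be computed by successive approximation. The sketch below is one way the NFXP inner loop could look in Python/NumPy; the array shapes and the function name `integrated_value` are assumptions for the example, not part of the proposed interface.

```python
import numpy as np
from scipy.special import logsumexp


def integrated_value(u, q, beta, tol=1e-10, max_iter=10_000):
    """Iterate V_bar(x) = log sum_d exp{u(x, d) + beta * sum_x' V_bar(x') q(x'|x, d)}.

    u : (n_states, n_actions) flow utilities u(x, d)
    q : (n_actions, n_states, n_states) transition probabilities q(x'|x, d)
    """
    n_states, n_actions = u.shape
    v_bar = np.zeros(n_states)
    for _ in range(max_iter):
        # Choice-specific values v(d, x) = u(x, d) + beta * E[V_bar(x') | x, d].
        v = u + beta * np.stack([q[d] @ v_bar for d in range(n_actions)], axis=1)
        v_bar_new = logsumexp(v, axis=1)  # closed form under the Gumbel assumption
        if np.max(np.abs(v_bar_new - v_bar)) < tol:
            return v_bar_new
        v_bar = v_bar_new
    return v_bar
```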

The probability of choosing an action given the states observed by the researcher then takes the familiar multinomial logit form, where $v(d,x)=u(x,d)+\beta\sum_{x'}\overline{V(x')}q(x'\mid x,d)$ is the choice-specific value function:

$$ P(d\mid x_{t},\theta)=\frac{\exp\{v(d,x_{t})\}}{\sum_{j=0}^{J}\exp\{v(j,x_{t})\}} $$

This bears a strong resemblance to the choice probability in multinomial logistic regression. The likelihood for agent $i$ is simply the probability of the first observed state times the product of the choice probabilities and transition probabilities over the whole series. Taking logs gives

$$ l_{i}(\theta)=\sum_{t=1}^{T}\log P(d=d_{it}|x_{it},\theta)+\left(\sum_{t=1}^{T-1}\log q(x_{it+1}|x_{it},d_{it},\theta)\right)+\log Pr(x_{i1}|\theta) $$
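
Given the integrated value function from the inner loop, one agent's log-likelihood contribution can be evaluated directly. The sketch below reuses the array shapes assumed above and omits the $\log Pr(x_{i1}\mid\theta)$ term for brevity; it is an illustration, not the proposed interface.

```python
import numpy as np
from scipy.special import logsumexp


def agent_loglik(x_obs, d_obs, u, q, beta, v_bar):
    """Log-likelihood of one agent's observed (x_t, d_t) series.

    x_obs, d_obs : integer arrays of length T (observed state / action indices)
    u            : (n_states, n_actions) flow utilities
    q            : (n_actions, n_states, n_states) transitions q(x'|x, d)
    v_bar        : (n_states,) integrated value function from the inner loop
    """
    n_states, n_actions = u.shape
    # Choice-specific values v(d, x) and logit choice probabilities P(d | x).
    v = u + beta * np.stack([q[d] @ v_bar for d in range(n_actions)], axis=1)
    log_p_choice = v - logsumexp(v, axis=1, keepdims=True)

    ll = log_p_choice[x_obs, d_obs].sum()
    # Transition term: sum_t log q(x_{t+1} | x_t, d_t).
    ll += np.log(q[d_obs[:-1], x_obs[:-1], x_obs[1:]]).sum()
    # The log Pr(x_1 | theta) term would be added here if the initial state is modeled.
    return ll
```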

Interfaces

Dynamic Discrete Choice

  • fit(ddcfamily, data, algorithm): Fit the dynamic discrete choice model to the data using the given algorithm.

Nested Fixed Point

  • stage1_problem(nfxpfamily, data): Get the problem description for stage 1. This description is used by a solver to find the optimal transition parameters (a sketch of the two stages follows this list).
  • stage2_problem(nfxpfamily, data): Get the problem description for stage 2. This description is used by a solver to find the optimal utility parameters.
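
As a rough illustration of the two stages: when the state space is discrete and the transition matrix is unrestricted, the stage 1 problem has a closed-form solution (observed transition frequencies), while stage 2 maximizes the choice part of the log-likelihood, re-solving the inner fixed point at each trial value of the utility parameters. The sketch below covers only the stage 1 closed form and assumes, hypothetically, a single pooled series of observations.

```python
import numpy as np


def stage1_transition_mle(x_obs, d_obs, n_states, n_actions):
    """Closed-form stage 1 MLE of q(x'|x, d) for an unrestricted discrete
    transition matrix: observed transition frequencies in each (x, d) cell."""
    counts = np.zeros((n_actions, n_states, n_states))
    for x, d, x_next in zip(x_obs[:-1], d_obs[:-1], x_obs[1:]):
        counts[d, x, x_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Cells with no observations are left as NaN rather than dividing by zero.
    return np.divide(counts, totals, out=np.full_like(counts, np.nan), where=totals > 0)
```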

MDP Result

  • policy(ddcresult, state): Return the action prescribed by the fitted policy at the given state.
  • sample(ddcresult, state, n): Generate n samples from the MDP, using state as the starting state (a usage sketch follows this list).
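
One way these two methods could behave, sketched with plain Python functions over a toy fitted result (the choice probabilities and transition array below are made-up stand-ins, not the output of a real fit call):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "fitted result": choice probabilities P(d|x) and transitions q(x'|x, d).
choice_probs = np.array([[0.7, 0.3],
                         [0.2, 0.8]])            # (n_states, n_actions)
q = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.3, 0.7], [0.1, 0.9]]])         # (n_actions, n_states, n_states)
result = (choice_probs, q)


def policy(ddcresult, state):
    """Most likely action at `state` under the fitted choice probabilities."""
    probs, _ = ddcresult
    return int(np.argmax(probs[state]))


def sample(ddcresult, state, n):
    """Simulate n (state, action) pairs from the MDP, starting at `state`."""
    probs, trans = ddcresult
    path = []
    for _ in range(n):
        action = rng.choice(probs.shape[1], p=probs[state])
        path.append((state, action))
        state = rng.choice(trans.shape[2], p=trans[action, state])
    return path


policy(result, 0)     # -> 0
sample(result, 0, 5)  # -> list of five (state, action) pairs
```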

Discrete Transition

  • condprob(transition, state_ind, action_ind): Return the conditional distribution over next states for a transition.
  • prob(transition, state_ind, action_ind, new_state_ind): Return the probability of a transition occurring.

condprob should return a vector of probabilities over the possible next states, while prob should return the probability of the single specified transition.
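
A minimal sketch of a concrete type satisfying this interface, using a NumPy array indexed as `probs[action, state, new_state]` (the layout and names are assumptions for the example; the document does not fix a language or a storage format):

```python
import numpy as np

# Toy transition: probs[action, state] is the distribution over next states.
probs = np.array([[[0.9, 0.1], [0.6, 0.4]],
                  [[0.2, 0.8], [0.0, 1.0]]])


def condprob(transition, state_ind, action_ind):
    """Conditional distribution over next states given (state, action)."""
    return transition[action_ind, state_ind]


def prob(transition, state_ind, action_ind, new_state_ind):
    """Probability of moving from state_ind to new_state_ind under action_ind."""
    return transition[action_ind, state_ind, new_state_ind]


condprob(probs, 0, 1)  # -> array([0.2, 0.8])
prob(probs, 0, 1, 1)   # -> 0.8
```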

Discrete VariableSpaces

  • levels(vs): Return an iterable of all possible levels.

Estimators

  • Nested Fixed Point (NFXP)
  • Nested Pseudo Likelihood
  • Mathematical Program with Equilibrium Constraints (MPEC)
  • Expectation Maximization

Confidence Interval

  • Parametric Bootstrap
  • Resampling Bootstrap
  • BHHH

Statistics

  • cross-validation

Bibliography