So we want to estimate $p(d,x)$, the success probability as a function of the treatment $d$ and the covariate $x$.
Experiment: randomly split the data (participants) into those getting treatment $A$ and those getting treatment $B$.
Odds: if $p$ is a probability, then $h = \displaystyle \frac{p}{1-p}$ is the odds, with $h \in (0,+\infty)$.
Define the log-odds as $\mu=\log h$, so that $\mu \in (-\infty, +\infty)$.
[Figure: graph of $\log h$]
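A quick numerical sketch of these two maps (the values of $p$ below are arbitrary):

```python
# The odds h = p/(1-p) map (0,1) onto (0, +inf);
# the log-odds mu = log h map (0,1) onto (-inf, +inf).
import numpy as np

p = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
h = p / (1 - p)   # odds, in (0, +inf)
mu = np.log(h)    # log-odds, in (-inf, +inf); symmetric about p = 0.5

for pi, hi, mi in zip(p, h, mu):
    print(f"p = {pi:.2f}  ->  odds h = {hi:.3f}  ->  log-odds mu = {mi:+.3f}")
```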
The data $Y_i$ are assumed to satisfy $Y_i \sim \mathrm{Bern}(p(d_i,x_i))$.
To estimate $p(d,x)$ we will look for an estimator of $\mu(d,x)$ first.
We are interested in the so-called "odds ratio". Under the logistic model $\mu(d,x) = \beta_0 + \beta_1 d_A + \beta_2 x$, where $d_A = \mathbf{1}\{d = A\}$ is the treatment indicator,
$$R = \frac{h \text{ for } A}{h \text{ for } B} = \frac{\displaystyle e^{\mu(A,x)}}{\displaystyle e^{\mu(B,x)}} = e^{\mu(A,x) - \mu(B,x)} = e^{\beta_1}$$
This shows why we want to estimate $\beta_1$.
It turns out that it is possible to define an MLE $\hat{\beta}=(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2)$ based on the data $(x_i, Y_i)$ such that $Y - \hat{P}$ is perpendicular to the data.
Note: $\hat{P}$ is the predicted value of $p$ according to $\hat{\beta}$.
$$\hat{\mu}(d,x) = \hat{\beta}_0 + \hat{\beta}_1d_A + \hat{\beta}_2x$$
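In practice the MLE has no closed form and is computed numerically. Below is a minimal sketch using simulated data and `statsmodels` (one of several libraries that fit this model; the sample size, true coefficients, and seed are made up for illustration):

```python
# Fit the MLE beta-hat = (beta0-hat, beta1-hat, beta2-hat) on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
d_A = rng.integers(0, 2, size=n)      # 1 if treatment A, 0 if treatment B
x = rng.normal(size=n)                # covariate

beta0, beta1, beta2 = -0.5, 1.0, 0.3  # hypothetical true values
mu = beta0 + beta1 * d_A + beta2 * x  # log-odds under the model
p = np.exp(mu) / (1 + np.exp(mu))
Y = rng.binomial(1, p)                # Y_i ~ Bern(p(d_i, x_i))

X = sm.add_constant(np.column_stack([d_A, x]))  # columns: 1, d_A, x
fit = sm.Logit(Y, X).fit(disp=0)                # maximum likelihood fit
print("beta-hat:", fit.params)
print("estimated odds ratio R-hat = exp(beta1-hat):", np.exp(fit.params[1]))
```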
Compute $p$ from $\mu$:
$\mu = \log h \displaystyle \Longrightarrow h = e^{\mu}$
$h = \displaystyle \frac{p}{1-p} \displaystyle \Longrightarrow p = \displaystyle \frac{h}{1+h}$
$$\Longrightarrow \hat{p} (d,x) = \frac{\displaystyle e^{\hat{\mu} (d,x)}}{1 + \displaystyle e^{\hat{\mu} (d,x)}}$$
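Numerically, this last transform is the standard logistic (sigmoid) function; `scipy.special.expit` computes it in a numerically stable way. A minimal sketch, with made-up values of $\hat{\mu}$:

```python
# The logistic transform mu -> p = e^mu / (1 + e^mu).
import numpy as np
from scipy.special import expit

mu_hat = np.array([-2.0, 0.0, 1.5])  # example fitted log-odds values
p_hat = expit(mu_hat)                # same as np.exp(mu_hat) / (1 + np.exp(mu_hat))
print(p_hat)                         # approx [0.1192, 0.5, 0.8176]
```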
Of course, as we said, our estimated odds ratio is $\hat{R} = e^{\hat{\beta}_1}$.
Question: why not use a model like $$\mu(d,x) = \beta_0 + \beta_1 d_A +\beta_2 d_B + \beta_3 x$$
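One way to see the issue: since every participant receives exactly one of the two treatments, $d_A + d_B = 1$, so the intercept column of the design matrix equals the sum of the two dummy columns, and $\beta$ is not identifiable. A minimal sketch with made-up data:

```python
# Why beta0 + beta1*d_A + beta2*d_B + beta3*x is not identifiable:
# d_A + d_B = 1 duplicates the intercept column.
import numpy as np

d_A = np.array([1, 1, 0, 0, 1])
d_B = 1 - d_A                                    # exactly one treatment each
x = np.array([0.2, -1.0, 0.5, 1.3, -0.7])

X = np.column_stack([np.ones(5), d_A, d_B, x])   # columns: 1, d_A, d_B, x
print(np.linalg.matrix_rank(X))                  # 3, not 4: columns are dependent
```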