Here is a collapsed Gibbs sampler for Chen et al.'s generalized relational topic model (gRTM) with data augmentation, building on the collapsed Gibbs sampler I wrote for binary logistic supervised latent Dirichlet allocation.
The generative model for RTMs is as follows:
$$\begin{align} \theta^{(d)} &\sim \dir(\alpha) &\text{(topic distribution for document $d \in \{1, \ldots, D\}$)} \\ \phi^{(k)} &\sim \dir(\beta) &\text{(term distribution for topic $k \in \{1, \ldots, K\}$)} \\ z_n^{(d)} \mid \theta^{(d)} &\sim \dis \left( \theta^{(d)} \right) &\text{(topic of $n$th token of document $d$, $n \in \{1, \ldots, N^{(d)}\}$)} \\ w_n^{(d)} \mid \phi^{(z_n^{(d)})} &\sim \dis \left( \phi^{(z_n^{(d)})} \right) &\text{(term of $n$th token of document $d$, $n \in \{1, \ldots, N^{(d)}\}$)} \\ \Eta_{k, k'} &\sim \normal \left( \mu, \nu^2 \right) &\text{(regression coefficients for topic pairs $k, k' \in \{1, \ldots, K\}$)} \\ y^{(d, d')} \mid \Eta, \etd{d}, \etd{d'} &\sim \ber \left( \frac{ \exp \left( \eHe \right) }{ 1 + \exp \left( \eHe \right) } \right) &\text{(link indicator for documents $d, d' \in \{1, \ldots, D\}$)} \end{align}$$where each token can be any one of $V$ terms in our vocabulary, $\etd{d}$ is the empirical topic distribution of document $d$, and $\circ$ is the Hadamard (element-wise) product.
Plate notation for relational topic models.
Note that the diagram's $\beta_k$ corresponds to our $\phi^{(k)}$, and each $\phi^{(k)}$ should depend on a single hyperparameter $\beta$.
Following Chen et al. 2013, the regularized pseudo-likelihood for the link variable $y^{(d, d')}$, with regularization parameter $b \ge 0$, can be written
$$\begin{align} \cpsi{y^{(d, d')}}{\Eta, \etd{d}, \etd{d'}, b} &= \cp{y^{(d, d')}}{\Eta, \etd{d}, \etd{d'}}^b \\ &= \left( \frac{\exp \left( \eHe \right)^{y^{(d, d')}}}{ 1 + \exp \left( \eHe \right)} \right)^b \\ &= \frac{\exp \left( b y^{(d, d')} \eHe \right)} { \left( \exp \left( -\frac{\eHe}{2} \right) + \exp \left( \frac{\eHe}{2} \right) \right)^b \exp \left( \frac{b}{2} \eHe \right) } \\ &= 2^{-b} \exp \left( b \left( y^{(d, d')} - \frac{1}{2} \right) \left( \eHe \right) \right) \cosh \left( \frac{ \eHe }{2} \right)^{-b} \\ &= 2^{-b} \exp \left( b \left( y^{(d, d')} - \frac{1}{2} \right) \left( \eHe \right) \right) \int_0^\infty \exp \left( -\frac{ \left( \eHe \right)^2 }{2} \omega^{(d, d')} \right) \cp{\omega^{(d, d')}}{b, 0} d\omega^{(d, d')} \end{align}$$where $\omega^{(d, d')}$ is a Polya-Gamma distributed variable with parameters $b = b$ and $c = 0$ (see Polson et al. 2012 for details). This means that, for each pair of documents $d$ and $d'$, the pseudo-likelihood of $y^{(d, d')}$ is actually a mixture of Gaussians with respect to the Polya-Gamma distribution $\PG(b, 0)$. Therefore, the joint pseudo-likelihood of $y^{(d, d')}$ and $\omega^{(d, d')}$ can be written
$$\cPsi{y^{(d, d')}, \omega^{(d, d')}}{\Eta, \etd{d}, \etd{d'}, b} = 2^{-b} \exp \left( \kappa^{(d, d')} \zeta^{(d, d')} - \frac{ \omega^{(d, d')} }{2} (\zeta^{(d, d')})^2 \right) \cp{\omega^{(d, d')}}{b, 0}.$$where $\kappa^{(d, d')} = b(y^{(d, d')} - 1/2)$ and $\zeta^{(d, d')} = \eHe$. The joint probability distribution can therefore be factored as follows:
$$\begin{align} \cp{\theta, \phi, z, w, \Eta, y, \omega}{\alpha, \beta, \mu, \nu^2, b} &= \prod_{k=1}^{K} \cp{\phi^{(k)}}{\beta} \prod_{d=1}^{D} \cp{\theta^{(d)}}{\alpha} \prod_{n=1}^{N^{(d)}} \cp{z_n^{(d)}}{\theta^{(d)}} \cp{w_n^{(d)}}{\phi^{(z_n^{(d)})}} \\ & \quad \times \prod_{k_1=1}^{K} \prod_{k_2=1}^{K} \cp{\Eta_{k_1, k_2}}{\mu, \nu^2} \prod_{d_1=1}^D \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^D \cPsi{y^{(d_1, d_2)}, \omega^{(d_1, d_2)}}{\Eta, \etd{d_1}, \etd{d_2}, b} \\ &= \prod_{k=1}^{K} \frac{\Betaf(b^{(k)} + \beta)}{\Betaf(\beta)} \cp{\phi^{(k)}}{b^{(k)} + \beta} \prod_{d=1}^{D} \frac{\Betaf(a^{(d)} + \alpha)}{\Betaf(\alpha)} \cp{\theta^{(d)}}{a^{(d)} + \alpha} \\ &\quad \times \prod_{k_1=1}^{K} \prod_{k_2=1}^{K} \cN{\Eta_{k_1, k_2}}{\mu, \nu^2} \prod_{d_1=1}^D \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^D 2^{-b} \exp \left( \kappa^{(d_1, d_2)} \zeta^{(d_1, d_2)} - \frac{ \omega^{(d_1, d_2)} }{2} (\zeta^{(d_1, d_2)})^2 \right) \cp{\omega^{(d_1, d_2)}}{b, 0} \end{align}$$where $a_k^{(d)}$ is the number of tokens in document $d$ assigned to topic $k$, $b_v^{(k)}$ is the number of tokens equal to term $v$ and assigned to topic $k$, and $\Betaf$ is the multivariate Beta function. Marginalizing out $\theta$ and $\phi$ by integrating with respect to each $\theta^{(d)}$ and $\phi^{(k)}$ over their respective sample spaces yields
$$\begin{align} \cp{z, w, \Eta, y, \omega}{\alpha, \beta, \mu, \nu^2, b} &= \prod_{k=1}^{K} \frac{\Betaf(b^{(k)} + \beta)}{\Betaf(\beta)} \prod_{d=1}^{D} \frac{\Betaf(a^{(d)} + \alpha)}{\Betaf(\alpha)} \\ &\quad\quad \times \prod_{k_1=1}^{K} \prod_{k_2=1}^{K} \cN{\Eta_{k_1, k_2}}{\mu, \nu^2} \prod_{d_1=1}^D \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^D 2^{-b} \exp \left( \kappa^{(d_1, d_2)} \zeta^{(d_1, d_2)} - \frac{ \omega^{(d_1, d_2)} }{2} (\zeta^{(d_1, d_2)})^2 \right) \cp{\omega^{(d_1, d_2)}}{b, 0} \\ &= \cp{w}{z, \beta} \cp{z}{\alpha} \cp{\Eta}{\mu, \nu^2} \cPsi{y, \omega}{\Eta, z, b}. \end{align}$$See my LDA notebook for step-by-step details of the previous two calculations.
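As a quick sanity check on the algebra above, the $b = 1$ case of the identity underlying the augmentation, $\frac{e^{y \zeta}}{1 + e^{\zeta}} = \frac{1}{2} e^{(y - 1/2) \zeta} \cosh(\zeta / 2)^{-1}$, is easy to verify numerically:

import numpy as np

# check exp(y * zeta) / (1 + exp(zeta)) == 0.5 * exp((y - 0.5) * zeta) / cosh(zeta / 2)
zeta = np.linspace(-5, 5, 101)
for y in (0, 1):
    lhs = np.exp(y * zeta) / (1 + np.exp(zeta))
    rhs = 0.5 * np.exp((y - 0.5) * zeta) / np.cosh(zeta / 2)
    assert np.allclose(lhs, rhs)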
Our goal is to calculate the posterior distribution
$$\cp{z, \Eta, \omega}{w, y, \alpha, \beta, \mu, \nu^2, b} = \frac{\cp{z, w, \Eta, y, \omega}{\alpha, \beta, \mu, \nu^2, b}} {\sum_{z'} \iint \cp{z', w, \Eta', y, \omega{'}}{\alpha, \beta, \mu, \nu^2, b} d\Eta' d\omega{'}}$$in order to infer the topic assignments $z$ and regression coefficients $\Eta$ from the given term assignments $w$ and link data $y$. Since calculating this directly is infeasible, we resort to collapsed Gibbs sampling. The sampler is "collapsed" because we marginalized out $\theta$ and $\phi$, and will estimate them from the topic assignments $z$:
$$\hat\theta_k^{(d)} = \frac{a_k^{(d)} + \alpha_k}{\sum_{k'=1}^K \left(a_{k'}^{(d)} + \alpha_{k'} \right)},\quad \hat\phi_v^{(k)} = \frac{b_v^{(k)} + \beta_v}{\sum_{v'=1}^V \left(b_{v'}^{(k)} + \beta_{v'} \right)}.$$
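In code, these estimates are cheap to form from the sampler's count matrices. A minimal sketch, assuming (hypothetically) that `a` is the $D \times K$ matrix of per-document topic counts and `b` is the $K \times V$ matrix of per-topic term counts defined above:

import numpy as np

def estimate_theta(a, alpha):
    # a: (D, K) counts of tokens in each document assigned to each topic
    smoothed = a + alpha
    return smoothed / smoothed.sum(axis=1, keepdims=True)

def estimate_phi(b, beta):
    # b: (K, V) counts of tokens equal to each term, by assigned topic
    smoothed = b + beta
    return smoothed / smoothed.sum(axis=1, keepdims=True)

Gibbs sampling requires us to compute the full conditionals for each $z_n^{(d)}$, $\omega^{(d, d')}$ and $\Eta_{k, k'}$. For example, we need to calculate, for all $n$, $d$ and $k$,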
$$\begin{align} \cp{z_n^{(d)} = k}{z \setminus z_n^{(d)}, w, H, y, \omega, \alpha, \beta, \mu, \nu^2, b} &\propto \cp{z_n^{(d)} = k, z \setminus z_n^{(d)}, w, H, y, \omega}{\alpha, \beta, \mu, \nu^2, b} \\ &\propto \frac{b_{w_n^{(d)}}^{(k)} \setminus z_n^{(d)} + \beta_{w_n^{(d)}}}{ \sum_{v=1}^V \left( b_v^{(k)} \setminus z_n^{(d)} + \beta_v\right)} \left( a_k^{(d)} \setminus z_n^{(d)} + \alpha_k \right) \prod_{d_1=1}^{D} \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^{D} \exp \left( \kappa^{(d_1, d_2)} \zeta^{(d_1, d_2)} - \frac{ \omega^{(d_1, d_2)} }{2} (\zeta^{(d_1, d_2)})^2 \right) \\ &\propto \frac{b_{w_n^{(d)}}^{(k)} \setminus z_n^{(d)} + \beta_{w_n^{(d)}}}{ \sum_{v=1}^V \left( b_v^{(k)} \setminus z_n^{(d)} + \beta_v\right)} \left( a_k^{(d)} \setminus z_n^{(d)} + \alpha_k \right) \\ &\quad\quad\times \exp \left( \sum_{\substack{d_1=1 \\ d_1 \neq d}}^{D} \left[ \left( \kappa^{(d_1, d)} - \omega^{(d_1, d)} ( \zeta^{(d_1, d)} \setminus z_n^{(d)}) \right) \frac{H_{:, k}^T \etd{d_1}}{N^{(d)}} - \frac{ \omega^{(d_1, d)} }{2} \left( \frac{H_{:, k}^T \etd{d_1}}{N^{(d)}} \right)^2 \right] \right. \\ &\quad\quad\quad\quad + \left. \sum_{\substack{d_2=1 \\ d_2 \neq d}}^{D} \left[ \left( \kappa^{(d, d_2)} - \omega^{(d, d_2)} ( \zeta^{(d, d_2)} \setminus z_n^{(d)}) \right) \frac{H_{k, :} \etd{d_2}}{N^{(d)}} - \frac{ \omega^{(d, d_2)} }{2} \left( \frac{H_{k, :} \etd{d_2}}{N^{(d)}} \right)^2 \right] \right) \end{align}$$where the "set-minus" notation $\cdot \setminus z_n^{(d)}$ denotes the variable the notation is applied to with the entry $z_n^{(d)}$ removed (again, see my LDA notebook for details). This final proportionality is true since
$$\begin{align} \prod_{d_1=1}^{D} \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^{D} \exp \left( \kappa^{(d_1, d_2)} \zeta^{(d_1, d_2)} - \frac{ \omega^{(d_1, d_2)} }{2} (\zeta^{(d_1, d_2)})^2 \right) &= \prod_{d_1=1}^{D} \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^{D} \exp \left( \kappa^{(d_1, d_2)} \left( \zeta^{(d_1, d_2)} \setminus z_n^{(d)} + \Delta_{d, d_1, d_2}^{(k)} \right) - \frac{ \omega^{(d_1, d_2)} }{2} \left( \zeta^{(d_1, d_2)} \setminus z_n^{(d)} + \Delta_{d, d_1, d_2}^{(k)} \right)^2 \right) \\ &\propto \prod_{\substack{d_1=1 \\ d_1 \neq d}}^{D} \exp \left( \kappa^{(d_1, d)} \left( \zeta^{(d_1, d)} \setminus z_n^{(d)} + \Delta_{d, d_1, d}^{(k)} \right) - \frac{ \omega^{(d_1, d)} }{2} \left( \zeta^{(d_1, d)} \setminus z_n^{(d)} + \Delta_{d, d_1, d}^{(k)} \right)^2 \right) \\ &\quad\quad\times \prod_{\substack{d_2=1 \\ d_2 \neq d}}^{D} \exp \left( \kappa^{(d, d_2)} \left( \zeta^{(d, d_2)} \setminus z_n^{(d)} + \Delta_{d, d, d_2}^{(k)} \right) - \frac{ \omega^{(d, d_2)} }{2} \left( \zeta^{(d, d_2)} \setminus z_n^{(d)} + \Delta_{d, d, d_2}^{(k)} \right)^2 \right) \\ &\propto \exp \left( \sum_{\substack{d_1=1 \\ d_1 \neq d}}^{D} \left[ \left( \kappa^{(d_1, d)} - \omega^{(d_1, d)} ( \zeta^{(d_1, d)} \setminus z_n^{(d)}) \right) \frac{H_{:, k}^T (\etd{d_1} \setminus z_n^{(d)})}{N^{(d)}} - \frac{ \omega^{(d_1, d)} }{2} \left( \frac{H_{:, k}^T (\etd{d_1} \setminus z_n^{(d)})}{N^{(d)}} \right)^2 \right] \right. \\ &\quad\quad + \left. \sum_{\substack{d_2=1 \\ d_2 \neq d}}^{D} \left[ \left( \kappa^{(d, d_2)} - \omega^{(d, d_2)} ( \zeta^{(d, d_2)} \setminus z_n^{(d)}) \right) \frac{H_{k, :} (\etd{d_2} \setminus z_n^{(d)})}{N^{(d)}} - \frac{ \omega^{(d, d_2)} }{2} \left( \frac{H_{k, :} (\etd{d_2} \setminus z_n^{(d)})}{N^{(d)}} \right)^2 \right] \right) \\ &= \exp \left( \sum_{\substack{d_1=1 \\ d_1 \neq d}}^{D} \left[ \left( \kappa^{(d_1, d)} - \omega^{(d_1, d)} ( \zeta^{(d_1, d)} \setminus z_n^{(d)}) \right) \frac{H_{:, k}^T \etd{d_1}}{N^{(d)}} - \frac{ \omega^{(d_1, d)} }{2} \left( \frac{H_{:, k}^T \etd{d_1}}{N^{(d)}} \right)^2 \right] \right. \\ &\quad\quad + \left. \sum_{\substack{d_2=1 \\ d_2 \neq d}}^{D} \left[ \left( \kappa^{(d, d_2)} - \omega^{(d, d_2)} ( \zeta^{(d, d_2)} \setminus z_n^{(d)}) \right) \frac{H_{k, :} \etd{d_2}}{N^{(d)}} - \frac{ \omega^{(d, d_2)} }{2} \left( \frac{H_{k, :} \etd{d_2}}{N^{(d)}} \right)^2 \right] \right) \end{align}$$where
$$\Delta_{d, d_1, d_2}^{(k)} = \delta_{d, d_1} \frac{H_{k, :} (\etd{d_2} \setminus z_n^{(d)})}{N^{(d)}} + \delta_{d, d_2} \frac{H_{:, k}^T (\etd{d_1} \setminus z_n^{(d)})}{N^{(d)}},$$$\delta_{d, d'}$ is the Kronecker delta, and $H_{k, :}$ and $H_{:, k}$ are the $k$th row and column of $H$, respectively. The first proportionality is a result of the fact that $\Delta_{d, d_1, d_2}^{(k)}$ is nonzero only when $d = d_1$ or $d = d_2$. The last equality follows from the fact that $d \neq d_1$ in the first summation and $d \neq d_2$ in the second.
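Putting this together, one collapsed Gibbs step for a single token can be sketched as follows. This is a minimal sketch, not the slda library's implementation: it assumes the caller has already decremented the counts for the current token (the set-minus quantities above) and precomputed `link_exponent`, a length-$K$ array holding, for each candidate topic $k$, the sum inside the exponential of the final proportionality; all names here are hypothetical.

import numpy as np

def sample_z_token(d, w_dn, a, b, alpha, beta, link_exponent, rng):
    # LDA part of the full conditional, with the current token's counts removed
    lda_part = (b[:, w_dn] + beta[w_dn]) / (b + beta).sum(axis=1) * (a[d] + alpha)
    # combine with the link part on the log scale for numerical stability
    log_p = np.log(lda_part) + link_exponent
    log_p -= log_p.max()
    p = np.exp(log_p)
    return rng.choice(len(p), p=p / p.sum())

After the draw, the counts $a^{(d)}$, $b^{(k)}$ and every $\zeta^{(d, \cdot)}$ and $\zeta^{(\cdot, d)}$ involving document $d$ are updated to reflect the new assignment.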
In order to calculate the full conditional for $H$, let $\eta = (H_{:,1}^T \cdots H_{:, K}^T)^T$ be the vector of concatenated columns of $H$, $Z = (\etd{1, 1} \cdots \etd{D, D})$ be the matrix whose columns are the vectors $\etd{d, d'} = \etd{d'} \otimes \etd{d}$, where $\otimes$ is the Kronecker product, $\Omega = \diag(\omega^{(1,1)}, \ldots, \omega^{(D,D)})$ be the diagonal matrix whose diagonal entries are $\omega^{(d, d')}$, $I$ be the identity matrix, and $\one$ be the vector of ones, and note that
$$\prod_{k_1=1}^{K} \prod_{k_2=1}^{K} \cN{H_{k_1, k_2}}{\mu, \nu^2} = \cN{\eta}{\mu \one, \nu^2 I}$$$$\prod_{d_1=1}^{D} \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^{D} \exp \left( \kappa^{(d_1, d_2)} \zeta^{(d_1, d_2)} - \frac{ \omega^{(d_1, d_2)} }{2} (\zeta^{(d_1, d_2)})^2 \right) = \exp \left( \eta^T Z \kappa - \frac{1}{2} \eta^T Z \Omega Z^T \eta \right) \propto \cN{\eta}{(Z \Omega Z^T)^{-1} Z \kappa, (Z \Omega Z^T)^{-1}}.$$Therefore
$$\begin{align} \cp{\eta}{z, w, y, \omega, \alpha, \beta, \mu, \nu^2, b} &\propto \cp{z, w, \eta, y, \omega}{\alpha, \beta, \mu, \nu^2, b} \\ &\propto \cN{\eta}{\mu \one, \nu^2 I} \cN{\eta}{(Z \Omega Z^T)^{-1} Z \kappa, (Z \Omega Z^T)^{-1}} \\ &\propto \cN{\eta}{\Sigma \left( \frac{\mu}{\nu^2} \one + Z \kappa \right), \Sigma} \end{align}$$where $\Sigma^{-1} = \nu^{-2} I + Z \Omega Z^T$ (see Section 8.1.8 of the Matrix Cookbook).
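Sampling $\eta$ is thus a standard draw from a Gaussian given in precision form. A minimal sketch, assuming `Z` is the $K^2 \times P$ matrix whose columns are the Kronecker features $\etd{d, d'}$ for the $P$ ordered document pairs and `omega` and `kappa` are the corresponding length-$P$ vectors; a Cholesky solve avoids forming $\Sigma$ explicitly:

import numpy as np

def sample_eta(Z, omega, kappa, mu, nu2, rng):
    K2 = Z.shape[0]
    prec = np.eye(K2) / nu2 + (Z * omega) @ Z.T  # Sigma^{-1} = nu^{-2} I + Z Omega Z^T
    mean = np.linalg.solve(prec, mu / nu2 * np.ones(K2) + Z @ kappa)
    # if prec = L L^T, then mean + L^{-T} eps has covariance (L L^T)^{-1} = Sigma
    L = np.linalg.cholesky(prec)
    return mean + np.linalg.solve(L.T, rng.standard_normal(K2))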
We also need to calculate the full conditional for $\omega$. We calculate
$$\begin{align} \cp{\omega}{z, w, H, y, \alpha, \beta, \mu, \nu^2, b} &\propto \cp{z, w, H, y, \omega}{\alpha, \beta, \mu, \nu^2, b} \\ &\propto \prod_{d_1=1}^{D} \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^{D} \exp \left( - \frac{ \omega^{(d_1, d_2)} }{2} (\zeta^{(d_1, d_2)})^2 \right) \cp{\omega^{(d_1, d_2)}}{b, 0} \\ &= \prod_{d_1=1}^{D} \prod_{\substack{d_2=1 \\ d_2 \neq d_1}}^{D} \cp{\omega^{(d_1, d_2)}}{b, \zeta^{(d_1, d_2)}} \end{align}$$that is, $\omega^{(d_1, d_2)} \sim \PG(b, \eHe)$ for each pair of documents $d_1$ and $d_2$. We sample from the Polya-Gamma distribution according to the method of Polson et al. 2012, implemented for Python 3 in this code repo.
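For reference, if the exact sampler is unavailable, a $\PG(b, c)$ draw can be approximated by truncating the sum-of-gammas representation of Polson et al., $\omega = \frac{1}{2 \pi^2} \sum_{m=1}^{\infty} \frac{g_m}{(m - 1/2)^2 + c^2 / (4 \pi^2)}$ with $g_m \sim \mathrm{Gamma}(b, 1)$. A rough sketch (a truncated approximation, not the exact method used by the repo):

import numpy as np

def sample_pg_approx(b, c, rng, trunc=200):
    # truncated sum-of-gammas representation of PG(b, c); larger trunc is more accurate
    m = np.arange(1, trunc + 1)
    g = rng.gamma(shape=b, scale=1.0, size=trunc)
    return (g / ((m - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)).sum() / (2 * np.pi ** 2)

In the sampler, each $\omega^{(d_1, d_2)}$ is refreshed with $c = \zeta^{(d_1, d_2)}$.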
In [22]:
%matplotlib inline
from modules.helpers import plot_images
from functools import partial
from sklearn.metrics import (roc_auc_score, roc_curve)
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
imshow = partial(plt.imshow, cmap='gray', interpolation='nearest', aspect='auto')
sns.set(style='white')
In [48]:
V = 25
K = 10
N = 100
D = 1000
In [49]:
topics = []
topic_base = np.concatenate((np.ones((1, 5)) * 0.2, np.zeros((4, 5))), axis=0).ravel()
for i in range(5):
    topics.append(np.roll(topic_base, i * 5))
topic_base = np.concatenate((np.ones((5, 1)) * 0.2, np.zeros((5, 4))), axis=1).ravel()
for i in range(5):
    topics.append(np.roll(topic_base, i))
topics = np.array(topics)
plt.figure(figsize=(10, 5))
plot_images(plt, topics, (5, 5), layout=(2, 5), figsize=(10, 5))
In [50]:
alpha = np.ones(K)
np.random.seed(42)
thetas = np.random.dirichlet(alpha, size=D)
topic_assignments = np.array([np.random.choice(range(K), size=N, p=theta)
                              for theta in thetas])
word_assignments = np.array([[np.random.choice(range(V), size=1, p=topics[topic_assignments[d, n]])[0]
                              for n in range(N)] for d in range(D)])
doc_term_matrix = np.array([np.histogram(word_assignments[d], bins=V, range=(0, V - 1))[0] for d in range(D)])
imshow(doc_term_matrix)
Out[50]:
In [51]:
from itertools import product
from sklearn.model_selection import StratifiedKFold
In [52]:
# choose parameter values
mu = 0.
nu2 = 1.
np.random.seed(14)
H = np.random.normal(loc=mu, scale=np.sqrt(nu2), size=(K, K))  # scale is a standard deviation, not a variance
zeta = pd.DataFrame([(i, j, np.dot(np.dot(thetas[i], H), thetas[j])) for i, j in product(range(D), repeat=2)],
                    columns=('tail', 'head', 'zeta'))
_ = zeta.zeta.hist(bins=50)
In [53]:
# threshold zeta to generate binary link indicators
zeta['y'] = (zeta.zeta >= 0).astype(int)
# plot histogram of responses
print('positive examples {} ({:.1f}%)'.format(zeta.y.sum(), zeta.y.sum() / D / D * 100))
_ = zeta.y.hist()
In [54]:
y = zeta[['tail', 'head', 'y']].values
In [55]:
skf = StratifiedKFold(n_splits=100)
# keep one stratified fold (1% of all document pairs) as the training set
_, train_idx = next(skf.split(y, y[:, 2]))
train_idx.shape
Out[55]:
In [56]:
from slda.topic_models import GRTM
In [58]:
_K = 10
_alpha = alpha[:_K]
_beta = np.repeat(0.01, V)
_mu = mu
_nu2 = nu2
_b = 1.
n_iter = 500
grtm = GRTM(_K, _alpha, _beta, _mu, _nu2, _b, n_iter, seed=42)
In [59]:
%%time
grtm.fit(doc_term_matrix, y[train_idx])
In [60]:
plot_images(plt, grtm.phi, (5, 5), (2, 5), figsize=(10, 5))
In [ ]:
topic_order = [4, 7, 3, 1, 0, 9, 5, 2, 8, 6]  # permutation aligning learned topics with the true topics
plot_images(plt, grtm.phi[topic_order], (5, 5), (2, 5), figsize=(10, 5))
In [24]:
burnin = -1
mean_final_lL = grtm.loglikelihoods[burnin:].mean()
print(mean_final_lL)
plt.plot(grtm.loglikelihoods, label='mean final LL {:.2f}'.format(mean_final_lL))
_ = plt.legend()
In [25]:
imshow(grtm.theta)
Out[25]:
In [26]:
H_pred = grtm.H[burnin:].mean(axis=0)
_ = plt.hist(H_pred.ravel(), bins=20)
In [27]:
_ = plt.hist(H.ravel(), bins=20)
Create 1,000 test documents using the same generative process as our training documents.
In [28]:
np.random.seed(42 ** 2)
thetas_test = np.random.dirichlet(alpha, size=D)
topic_assignments_test = np.array([np.random.choice(range(K), size=N, p=theta)
                                   for theta in thetas_test])
word_assignments_test = np.array([[np.random.choice(range(V), size=1, p=topics[topic_assignments_test[d, n]])[0]
                                   for n in range(N)] for d in range(D)])
doc_term_matrix_test = np.array([np.histogram(word_assignments_test[d], bins=V, range=(0, V - 1))[0] for d in range(D)])
imshow(doc_term_matrix_test)
Out[28]:
Learn their topic distributions using the model trained on the training documents, then calculate the actual and predicted link probabilities. For the predicted probabilities, estimate $\eta$ as the mean of our post-burn-in samples of $\eta$.
In [29]:
def bern_param(theta1, theta2, H):
    # logistic link probability exp(zeta) / (1 + exp(zeta)), computed in the
    # equivalent form 1 / (1 + exp(-zeta)) to avoid overflow for large zeta
    zeta = np.dot(np.dot(theta1, H), theta2)
    return 1 / (1 + np.exp(-zeta))
In [30]:
thetas_test_grtm = grtm.transform(doc_term_matrix_test)
p_test = np.zeros(D * D)
p_test_grtm = np.zeros(D * D)
for n, i in enumerate(product(range(D), range(D))):
    p_test[n] = bern_param(thetas_test[i[0]], thetas_test[i[1]], H)
    p_test_grtm[n] = bern_param(thetas_test_grtm[i[0]], thetas_test_grtm[i[1]], H_pred)
Measure the goodness of our prediction by the area under the associated ROC curve.
In [31]:
y_test = (p_test > 0.5).astype(int)
y_grtm = p_test_grtm
fpr, tpr, _ = roc_curve(y_test, y_grtm)
plt.plot(fpr, tpr, label=('AUC = {:.3f}'.format(roc_auc_score(y_test, y_grtm))))
_ = plt.legend(loc='best')
In [32]:
from slda.topic_models import LDA
In [33]:
lda = LDA(_K, _alpha, _beta, n_iter, seed=42)
In [34]:
%%time
lda.fit(doc_term_matrix)
In [35]:
plot_images(plt, lda.phi, (5, 5), (2, 5), figsize=(10, 5))
In [36]:
plt.plot(lda.loglikelihoods)
Out[36]:
In [37]:
imshow(lda.theta)
Out[37]:
In [38]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
Compute Kronecker products between learned topic distributions for pairs of training documents and pairs of test documents, to use as link features.
In [39]:
%%time
thetas_test_lda = lda.transform(doc_term_matrix_test)
In [40]:
lda_train = np.zeros((D * D, _K * _K))
lda_test = np.zeros((D * D, _K * _K))
for n, i in enumerate(product(range(D), range(D))):
    lda_train[n] = np.kron(lda.theta[i[0]], lda.theta[i[1]])
    lda_test[n] = np.kron(thetas_test_lda[i[0]], thetas_test_lda[i[1]])
In [41]:
_C_grid = np.arange(1, 202, 10)
roc_auc_scores = []
for _C in _C_grid:
    print('Training Logistic Regression with C = {}'.format(_C))
    _lr = LogisticRegression(fit_intercept=False, C=_C)
    _lr.fit(lda_train, zeta.y)
    _y_lr = _lr.predict_proba(lda_test)[:, 1]
    roc_auc_scores.append(roc_auc_score(y_test, _y_lr))
    print('    roc_auc_score = {}'.format(roc_auc_scores[-1]))
print(_C_grid[np.argmax(roc_auc_scores)], np.max(roc_auc_scores))
plt.plot(_C_grid, roc_auc_scores)
Out[41]:
In [42]:
lr = LogisticRegression(fit_intercept=False, C=11)
lr.fit(lda_train, zeta.y)
y_lr = lr.predict_proba(lda_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_lr)
plt.plot(fpr, tpr, label=('AUC = {:.3f}'.format(roc_auc_score(y_test, y_lr))))
_ = plt.legend(loc='best')
In [43]:
_depth_grid = np.arange(1, 4)
roc_auc_scores = []
for _depth in _depth_grid:
    print('Training Gradient Boosting Classifier with max_depth = {}'.format(_depth))
    _gbc = GradientBoostingClassifier(max_depth=_depth)
    _gbc.fit(lda_train, zeta.y)
    _y_gbc = _gbc.predict_proba(lda_test)[:, 1]
    roc_auc_scores.append(roc_auc_score(y_test, _y_gbc))
    print('    roc_auc_score = {}'.format(roc_auc_scores[-1]))
print(_depth_grid[np.argmax(roc_auc_scores)], np.max(roc_auc_scores))
plt.plot(_depth_grid, roc_auc_scores)
Out[43]:
In [44]:
gbc = GradientBoostingClassifier(max_depth=3)
gbc.fit(lda_train, zeta.y)
y_gbc = gbc.predict_proba(lda_test)[:, 1]
fpr_gbc, tpr_gbc, _ = roc_curve(y_test, y_gbc)
plt.plot(fpr_gbc, tpr_gbc, label=('AUC = {:.3f}'.format(roc_auc_score(y_test, y_gbc))))
plt.legend(loc='best')
Out[44]:
In [45]:
grtm_train = np.zeros((D * D, _K * _K))
grtm_test = np.zeros((D * D, _K * _K))
for n, i in enumerate(product(range(D), range(D))):
    grtm_train[n] = np.kron(grtm.theta[i[0]], grtm.theta[i[1]])
    grtm_test[n] = np.kron(thetas_test_grtm[i[0]], thetas_test_grtm[i[1]])
In [46]:
_C_grid = np.arange(1, 52, 10)
roc_auc_scores = []
for _C in _C_grid:
    print('Training Logistic Regression with C = {}'.format(_C))
    _lr = LogisticRegression(fit_intercept=False, C=_C)
    _lr.fit(grtm_train, zeta.y)
    _y_lr = _lr.predict_proba(grtm_test)[:, 1]
    roc_auc_scores.append(roc_auc_score(y_test, _y_lr))
    print('    roc_auc_score = {}'.format(roc_auc_scores[-1]))
print(_C_grid[np.argmax(roc_auc_scores)], np.max(roc_auc_scores))
plt.plot(_C_grid, roc_auc_scores)
Out[46]:
In [47]:
lr = LogisticRegression(fit_intercept=False, C=51)
lr.fit(grtm_train, zeta.y)
y_lr = lr.predict_proba(grtm_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_lr)
plt.plot(fpr, tpr, label=('AUC = {:.3f}'.format(roc_auc_score(y_test, y_lr))))
_ = plt.legend(loc='best')