Trajectory Recommendation with Active Learning

1. Experimental Protocol for Active Learning

1.1 Notations

  • $Traj$: a trajectory
  • $|Traj|$: the number of POIs in trajectory $Traj$
  • $P_s$: the start (first) POI of a trajectory
  • $P_e$: the end (last) POI of a trajectory
  • $u$: a specific user
  • $\textbf{x}$: an example
  • $\textbf{y}$: label of an example $\textbf{x}$

1.2 Learning without personalization

  • Example: $\textbf{x} = (P_s, P_e, |Traj|)$
  • Label of Example: $\textbf{y} = Traj$
  1. Split the dataset into a training set (~10%), an annotation set (i.e. simulated user annotation, ~50%) and an evaluation set (a.k.a. test set, ~40%).

  2. Learn the parameters (of a probabilistic model) using the above training set, then evaluate its performance (e.g. compute the F1 measure) on the evaluation set.

  3. Query an unlabelled example (in the annotation set) using a query strategy (e.g. Information Density or Least Confident, as recommended in Settles08), add the queried example and its label to the training set, re-estimate the parameters and evaluate performance on the evaluation set.

  4. Possibly use k-fold cross-validation over all data in the training set + evaluation set (~50%).

  5. Draw a learning curve: F1 vs. #queries (a sketch of this loop is given after the list).
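
A minimal sketch of this pool-based loop, assuming hypothetical helpers `fit_model`, `evaluate_f1` and `query_score` that wrap the probabilistic model and a query strategy:

```python
def active_learning_curve(train, annotation, evaluation, n_queries,
                          fit_model, evaluate_f1, query_score):
    """Pool-based active learning loop (sketch).

    fit_model(train)               -> model parameters theta
    evaluate_f1(theta, evaluation) -> mean F1 on the evaluation set
    query_score(theta, x)          -> informativeness of unlabelled example x
    Examples are (x, y) pairs; the annotation set simulates user labelling.
    """
    train, pool = list(train), list(annotation)
    theta = fit_model(train)
    curve = [(0, evaluate_f1(theta, evaluation))]
    for t in range(1, n_queries + 1):
        x, y = max(pool, key=lambda ex: query_score(theta, ex[0]))  # query step
        pool.remove((x, y))
        train.append((x, y))        # simulated user supplies the label
        theta = fit_model(train)    # re-estimate parameters
        curve.append((t, evaluate_f1(theta, evaluation)))
    return curve                    # learning-curve points: (#queries, F1)
```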

Probabilistic Model

  • A Markov Chain combining POI category transition probabilities, POI popularity transition probabilities and POI-pair distance transition probabilities, all estimated using Maximum Likelihood Estimation (MLE); see the sketch after this list.
  • A Markov Chain with only a POI-POI transition matrix, learned using MLE (which struggles with sparsity) or the factorization technique described in the FPMC paper (which can deal with sparsity).
    How to incorporate POI-pair distance transitions?
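
A minimal sketch of the MLE step for one of these factors, here the POI-category transitions; popularity-class and distance-class transitions would be estimated the same way after discretization (the helper names and the additive smoothing are illustrative assumptions):

```python
import numpy as np

def mle_transition_matrix(trajectories, poi_category, n_categories, smoothing=1.0):
    """Estimate POI-category transition probabilities by (smoothed) MLE.

    trajectories : list of POI-id sequences
    poi_category : dict mapping POI id -> category index in [0, n_categories)
    """
    counts = np.full((n_categories, n_categories), smoothing)  # additive smoothing
    for traj in trajectories:
        for p, q in zip(traj[:-1], traj[1:]):                  # consecutive POI pairs
            counts[poi_category[p], poi_category[q]] += 1
    return counts / counts.sum(axis=1, keepdims=True)          # row-normalise
```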

Baseline

  • The same probabilistic model, with a random example (of a random user) chosen in the query step (passive learning).

1.3 Learning personalized recommender

  • Example: $\textbf{x} = (P_s, P_e, |Traj|, u)$
  • Label of Example: $\textbf{y} = Traj$
  1. For each user, split all of their data (travel sequences) into a training set (~10%), an annotation set (i.e. simulated user annotation, ~50%) and an evaluation set (~40%).

  2. Learn a transition tensor using the factorization technique described in the FPMC paper with all data in the training set, then evaluate its performance (e.g. compute the F1 measure) on the evaluation set.

  3. Query an unlabelled example (in the annotation set) using a query strategy (e.g. Information Density or Least Confident, as recommended in Settles08), add the queried example and its label to the training set, re-estimate the parameters and evaluate performance on the evaluation set.

  4. Possibly use k-fold cross-validation (probably necessary due to the high sparsity of per-user data) over all data in the training set + evaluation set (~50%).

  5. Draw a learning curve: F1 vs. #queries

Probabilistic Model

  • A transition tensor estimated using the factorization technique described in the FPMC paper (a sketch of the factorized transition score follows).
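
A minimal sketch of a factorized transition score in the spirit of the FPMC paper, decomposing the user x previous-POI x next-POI tensor into pairwise interactions; the embedding names, dimensions and initialisation here are illustrative assumptions, and the training procedure (e.g. S-BPR) is omitted:

```python
import numpy as np

class FactorizedTransition:
    """FPMC-style score of user u moving from POI l to POI i, built from the
    pairwise interactions (u, i) and (i, l); the (u, l) term is independent of
    the candidate i and so is dropped when ranking next POIs."""

    def __init__(self, n_users, n_pois, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.V_ui = rng.normal(0.0, 0.1, (n_users, dim))  # user factors for (u, i)
        self.V_iu = rng.normal(0.0, 0.1, (n_pois, dim))   # next-POI factors for (u, i)
        self.V_il = rng.normal(0.0, 0.1, (n_pois, dim))   # next-POI factors for (i, l)
        self.V_li = rng.normal(0.0, 0.1, (n_pois, dim))   # previous-POI factors for (i, l)

    def score(self, u, l, i):
        return self.V_ui[u] @ self.V_iu[i] + self.V_il[i] @ self.V_li[l]
```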

Baseline

  • The same probabilistic model, with a random example (of a specific user) chosen in the query step (passive learning).

Possible Drawbacks

  • Re-estimating parameters after each query using the technique described in the FPMC paper is very expensive.
    Investigate whether incremental parameter updates are possible.
  • How to incorporate POI-pair distance transitions?

1.4 Query Strategies

The following query strategies are adapted from the sequence labeling tasks described in Settles08.

  • Sequence Entropy \begin{equation} \phi^{SE}(\textbf{x}) = - \sum_{\hat{\textbf{y}}} P(\hat{\textbf{y}} | \textbf{x}; \Theta) \log(P(\hat{\textbf{y}} | \textbf{x}; \Theta)) \end{equation} where $\hat{\textbf{y}}$ ranges over all possible labels for example $\textbf{x}$.
    Note that the number of possible labels grows exponentially with $|Traj|$ in $\textbf{x}$. To make the computation feasible, Kim06 used the $N$ best possible labels as an approximation; concretely, define the N-best Sequence Entropy as \begin{equation} \phi^{NSE}(\textbf{x}) = - \sum_{\hat{\textbf{y}} \in \mathcal{N}} P(\hat{\textbf{y}} | \textbf{x}; \Theta) \log(P(\hat{\textbf{y}} | \textbf{x}; \Theta)) \end{equation} where $\mathcal{N} = \{\textbf{y}_1^*, \dots, \textbf{y}_N^*\}$ is the set of the $N$ most likely labels of example $\textbf{x}$.
    This query strategy selects, from the pool of unlabelled examples, the example $\textbf{x}$ with maximum $\phi^{SE}$ or $\phi^{NSE}$.
  • Information Density \begin{equation} \phi^{ID}(\textbf{x}) = \phi^{SE}(\textbf{x}) \times \left( \frac{1}{U} \sum_{u=1}^U \text{sim}(\textbf{x}, \textbf{x}^u) \right)^\beta \end{equation} That is, the informativeness of example $\textbf{x}$ is weighted by its average similarity to all other unlabelled examples (denoted by $\mathcal{U}$) in the pool, subject to the parameter $\beta$, which was set to $1$ in Settles08 without justification. The sequence entropy $\phi^{SE}$ measures the "base" informativeness and could be replaced by $\phi^{NSE}$ defined above.
    This query strategy selects, from the pool of unlabelled examples, the example $\textbf{x}$ with maximum $\phi^{ID}$.
    TODO: define a function to compute the similarity between two trajectories; Settles08 uses cosine similarity after transforming a sequence into a fixed-length feature vector.
  • Least Confident \begin{equation} \phi^{LC}(\textbf{x}) = 1 - P(\textbf{y}^* | \textbf{x}; \Theta) \end{equation} where $\textbf{y}^*$ is the most likely label of example $\textbf{x}$ with respect to a probabilistic model of which the parameters are denoted by $\Theta$.
    This query strategy selects, from the pool of unlabelled examples, the example $\textbf{x}$ with maximum $\phi^{LC}$. A sketch of all three scores is given after this list.
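
A minimal sketch of the three scores, assuming a hypothetical `model.top_n_labels(x, N)` interface that returns the $N$ most likely trajectories for $\textbf{x}$ together with their probabilities, and a `similarity` function as in the TODO above:

```python
import numpy as np

def n_best_sequence_entropy(model, x, N=10):
    """phi^NSE: entropy over the N most likely labels of x."""
    _, probs = model.top_n_labels(x, N)            # assumed model interface
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs))

def information_density(model, x, pool, similarity, beta=1.0, N=10):
    """phi^ID: base informativeness weighted by average similarity to the pool."""
    base = n_best_sequence_entropy(model, x, N)
    avg_sim = np.mean([similarity(x, x_u) for x_u in pool])
    return base * avg_sim ** beta

def least_confident(model, x):
    """phi^LC: one minus the probability of the most likely label."""
    _, probs = model.top_n_labels(x, 1)
    return 1.0 - probs[0]

def select_query(pool, score):
    """Pick the unlabelled example with the maximum score from the pool."""
    return max(pool, key=score)
```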

1.5 Performance Evaluation

When evaluating the performance of the recommender, given

  • a probabilistic model
  • an example $\textbf{x}$: $(P_s, P_e, |Traj|)$ (with user $u$ in the personalized setting)

Predict the label of $\textbf{x}$, i.e. recommend a trajectory that

  • satisfies the constraints specified in $\textbf{x}$, i.e. starts at $P_s$, ends at $P_e$ and contains $|Traj|$ POIs (for user $u$ in the personalized setting)
  • has the highest posterior probability with respect to the probabilistic model

Precision, Recall, F1-score

Compute Precision, Recall and F1-score as defined here; a sketch of a set-based computation is given after the list below.

  • Estimate the POI visit duration as the average visit duration of POIs of the same category
    • (averaged over) all users (possibly inaccurate)
    • (averaged over) the specific user (possibly sparse)
  • Compute the mean squared error of the POI visit durations in the recommended trajectory
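
A minimal sketch, assuming the precision, recall and F1 referred to above treat the recommended and ground-truth trajectories as sets of POIs (as in the ijcai15 paper):

```python
def precision_recall_f1(recommended, ground_truth):
    """Set-based precision, recall and F1 over the POIs of two trajectories."""
    rec, truth = set(recommended), set(ground_truth)
    overlap = len(rec & truth)
    precision = overlap / len(rec) if rec else 0.0
    recall = overlap / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```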

Problem Formulation

Active Learning Formulation

Formulate trajectory recommendation as an active learning problem.

  • Example: a tuple (user, trajectory)
  • Label of example: binary (i.e. positive/negative)
    • Observed tuples (user, trajectory) (i.e. those we extracted from data) are labelled as positive
    • Unobserved tuples (e.g. trajectories generated on the fly, such as trajectories synthesized by enumeration) are unlabelled
    • Tuples chosen to query a user will be labelled as positive or negative according to the user's feedback

Query strategies: given (start, end) and trajectory length $l$ for a specific user $u$, we'll select an example for user $u$ to label as follows (a sketch is given after the list):

  • Compute/enumerate all trajectories of length $l$ with (start, end) as candidates
  • Use a uniform prior for all candidates
  • Compute the likelihood of candidates as described here
  • Sort candidates by their posterior probabilities (i.e. $\text{prior} \times \text{likelihood}$) in descending order
  • Select a trajectory from the top $K$ (e.g. 5) candidates with probability proportional to its posterior for user $u$ to label
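
A minimal sketch of this selection step, assuming a hypothetical `likelihood(traj)` function that scores a candidate as described above (with a uniform prior, the posterior is proportional to this likelihood):

```python
import itertools
import random

def select_query_trajectory(pois, start, end, length, likelihood, K=5, rng=random):
    """Enumerate candidate trajectories, rank by posterior, and sample one of
    the top K proportionally to its posterior for the user to label."""
    middle = [p for p in pois if p not in (start, end)]
    candidates = [(start,) + perm + (end,)
                  for perm in itertools.permutations(middle, length - 2)]
    scored = sorted(((likelihood(t), t) for t in candidates), reverse=True)[:K]
    total = sum(s for s, _ in scored)
    weights = [s / total for s, _ in scored]
    return rng.choices([t for _, t in scored], weights=weights, k=1)[0]
```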

TODO: utilise user-specific features when selecting a trajectory for user $u$ to label if there are trajectories of user $u$ in our dataset.

Bandit Formulation

Formulate trajectory recommendation as a contextual multi-armed bandit problem.

1) Arm: a transition from the current POI$_i$ to another POI$_j$ (allow sub-tours? encode the target user into each arm?)

2) Context: transition features including POI category transition probabilities, POI popularity transition probabilities, POI-pair distance transition probabilities and user-specific features (visit duration/frequency), available in this notebook.

3) Reward: NOT clear yet, but could be one of the following (a sketch of the first two appears after the list).

  • F1-score based metric, e.g. suppose the trajectory up to time $t$ (after choosing a certain arm) is [p1, p2, p3] and the ground-truth trajectories are [p1, p2, p3] and [p1, p2, p4]; the reward is computed as \begin{equation} \max\{F_1([p1, p2, p3], [p1, p2, p3]), F_1([p1, p2, p3], [p1, p2, p4])\} = \max\{1.0, 0.667\} = 1.0 \end{equation}
  • Visiting-order based metric, e.g. suppose the trajectory up to time $t$ (after choosing a certain arm) is [p1, p3, p2] and the ground-truth trajectories are [p1, p2, p3] and [p1, p2, p4]. With respect to trajectory [p1, p2, p3], the pair (p1, p3) in [p1, p3, p2] counts because p3 is visited after p1; similarly, the pair (p1, p2) counts, but the pair (p3, p2) does not, because p3 is not visited before p2 in [p1, p2, p3]. So 2 of the 3 pairs count and the metric is 2/3. Similarly, the metric with respect to trajectory [p1, p2, p4] is 1/3, because only one of the 3 pairs, namely (p1, p2), counts. The reward is computed as \begin{equation} \max\left\{\frac{2}{3}, \frac{1}{3}\right\} = \frac{2}{3} \end{equation}
  • Metrics incorporating POI category, POI popularity and the travel distance from the current POI, e.g. suppose the trajectory up to time $t$ (after choosing a certain arm) is [p1, p3] and the ground-truth trajectory is [p1, p4]; if p3 and p4 are of the same category, of similar popularity (i.e. in the same popularity class after discretizing POI popularity) and at a similar distance (i.e. in the same distance class after discretization) from the current POI, then the reward of arm (p1, p3) would be high.
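
A minimal sketch of the first two reward candidates; the `f1` argument can be the set-based F1 sketched earlier, and the pair-order score counts ordered POI pairs of the partial trajectory whose relative visiting order agrees with a ground truth:

```python
from itertools import combinations

def f1_reward(partial, ground_truths, f1):
    """Best F1 of the partial trajectory against any ground truth."""
    return max(f1(partial, gt) for gt in ground_truths)

def pair_order_score(partial, truth):
    """Fraction of ordered pairs in `partial` visited in the same order in `truth`."""
    pos = {p: i for i, p in enumerate(truth)}
    pairs = list(combinations(partial, 2))
    agree = sum(1 for a, b in pairs if a in pos and b in pos and pos[a] < pos[b])
    return agree / len(pairs) if pairs else 0.0

def pair_order_reward(partial, ground_truths):
    """Best pair-order score against any ground truth, e.g. max(2/3, 1/3) = 2/3."""
    return max(pair_order_score(partial, gt) for gt in ground_truths)
```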

4) Regret: regret after $T$ trials is defined as

\begin{equation} R(T) = \textbf{E}\left[\sum_{t=1}^{T}(\text{optimal reward})_t\right] - \textbf{E}\left[\sum_{t=1}^{T}(\text{actual reward})_t\right] \end{equation}

5) Sampling strategies: NOT clear yet, but could be one of the following (an $\epsilon$-greedy sketch follows the list).

  • $\epsilon$-greedy: choose the arm (i.e. transition) with the highest posterior with probability $1-\epsilon$, and choose a random arm with probability $\epsilon$ (using a uniform prior and computing the likelihood as described here).
  • Thompson sampling: design a prior over an arm's reward (uniform? Gaussian? etc.) and compute the likelihood as described here; see an interesting paper on Thompson sampling.
  • UCB-type strategies such as LinUCB introduced in this paper.
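
A minimal sketch of the $\epsilon$-greedy strategy over arms (transitions out of the current POI), assuming a `posterior(current, candidate)` score computed as described above:

```python
import random

def epsilon_greedy_arm(current_poi, candidate_pois, posterior, epsilon=0.1, rng=random):
    """Pick the next POI: exploit the highest-posterior transition with
    probability 1 - epsilon, otherwise explore a uniformly random transition."""
    if rng.random() < epsilon:
        return rng.choice(candidate_pois)                                # explore
    return max(candidate_pois, key=lambda p: posterior(current_poi, p))  # exploit
```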

Baseline

Prior knowledge only

Given (start, end) and trajectory length $l$ for a specific user $u$, we'll recommend a trajectory to user $u$ as follows:

  • Estimate transition probabilities using all data
  • Compute/enumerate all trajectories of length $l$ with (start, end) as candidates
  • Use a uniform prior for all candidates
  • Compute the likelihood of candidates as described here
  • Sort candidates by their posterior probabilities (the same as the likelihood since the prior is uniform) in descending order
  • Recommend a trajectory with probability proportional to its posterior

User specific knowledge

Given (start, end) and trajectory length $l$ for a specific user $u$, we'll recommend a trajectory to user $u$ as follows:

  • Estimate transition probabilities using per user data
  • Compute/enumerate all trajectories of length $l$ with (start, end) as candidates
  • Use a uniform prior for all candidates
  • Compute the likelihood of candidates as described here
  • Sort candidates by their posterior probabilities (the same as the likelihood since the prior is uniform) in descending order
  • Recommend a trajectory with probability proportional to its posterior

Experiment for choosing formulation

We use the same leave-one-out cross-validation approach described in the ijcai15 paper as the basis for measuring the performance of the different formulations.

Concretely, for each user $u$ in the dataset, choose one trajectory (of length >= 3) from all trajectories of $u$ uniformly at random; this trajectory is used as the ground truth for measuring the performance of the recommendation (i.e. computing the precision, recall or F1-score defined here), and all other trajectories are used to train/estimate parameters. The experiment iterates through all users and produces a list of precision, recall and F1-scores.
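
A minimal sketch of this leave-one-out loop, assuming hypothetical `fit` and `recommend` functions and a per-trajectory `metric` such as the set-based F1 sketched earlier:

```python
import random

def leave_one_out_eval(user_trajectories, fit, recommend, metric, rng=random):
    """user_trajectories: dict mapping user -> list of trajectories (POI sequences)."""
    scores = []
    for user, trajs in user_trajectories.items():
        eligible = [t for t in trajs if len(t) >= 3]
        if not eligible:
            continue
        held_out = rng.choice(eligible)               # ground truth for this user
        train = [t for t in trajs if t is not held_out]
        model = fit(train)
        rec = recommend(model, held_out[0], held_out[-1], len(held_out), user)
        scores.append(metric(rec, held_out))          # precision, recall or F1
    return scores
```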

Evaluating the performance of the Active Learning formulation

In the active learning setting, we'll get a metric (i.e. the mean precision, recall or F1-score, since we do cross-validation for each user) for each number of examples queried; thus, the result of the experiment is a learning curve: (mean) precision/recall/F1-score vs. #examples queried.

Evaluating the performance of the Bandit formulation

In the bandit setting, we'll generally get a regret vs. #trials curve, where the regret is the cumulative regret defined above and #trials is the number of trials performed (i.e. choosing an arm and receiving the corresponding reward).

Evaluating the performance of the Baseline

In the baseline setting, we'll get a metric (i.e. precision, recall or F1-score) for each validation trajectory; thus, the result is a list of numbers (i.e. a list of precision, recall or F1-scores) which can be seen as samples drawn from an unknown distribution that characterises the performance of the algorithm.

Performance Comparison

It is straightforward to compare the performance of the Active Learning formulation with that of the baseline, for example:

  • Simply use the mean of the results in the baseline setting: we get a single number (e.g. mean precision/recall/F1-score) measuring the performance of the baseline formulation.
  • Simply use the mean of the results for each number of examples queried in the active learning setting: we get a list of points (#examples queried, metric) measuring the performance of the active learning formulation.
  • Or use all the numbers instead of just their means, and measure the performance difference with the KL divergence.

However, in the bandit setting, we need to convert regret to precision/recall/F1-score to ease the comparison between the bandit setting and the other formulations. (How to convert?)