start: the (first) POI of a trajectory
end: the (last) POI of a trajectory
Split the dataset into a training set (~10%), an annotation set (i.e. to simulate user annotation) (~50%) and an evaluation set (a.k.a. test set) (~40%).
Learn the parameters (of a probabilistic model) using the above training set, then evaluate its performance (e.g. compute the F1 measure) on the evaluation set.
Query an unlabelled example (from the annotation set) using a query strategy (e.g. Information Density or Least Confident, as recommended in Settles08), add the label of the queried example to the training set, re-estimate the parameters and evaluate performance on the evaluation set.
Possibly use k-fold cross validation over all data in the training set + evaluation set (~50%).
Draw a learning curve: F1 vs. #query (this loop is sketched below).
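A minimal sketch of the query loop above, using a synthetic classification dataset and scikit-learn's LogisticRegression as stand-ins for the real trajectory data and probabilistic model (both are assumptions, not the actual setup); the Least Confident strategy queries the example whose most probable label has the lowest predicted probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Synthetic stand-in for the real (trajectory) dataset and probabilistic model.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Split: ~10% training, ~50% annotation (simulated user), ~40% evaluation.
rng = np.random.RandomState(0)
idx = rng.permutation(len(X))
n_train, n_annot = int(0.1 * len(X)), int(0.5 * len(X))
train_idx = list(idx[:n_train])
annot_idx = list(idx[n_train:n_train + n_annot])
eval_idx = idx[n_train + n_annot:]

learning_curve = []
for n_query in range(100):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    learning_curve.append(f1_score(y[eval_idx], model.predict(X[eval_idx]), average='macro'))

    # Least Confident: query the unlabelled example whose top predicted
    # class probability is smallest, then move it into the training set.
    proba = model.predict_proba(X[annot_idx])
    query = annot_idx[int(np.argmin(proba.max(axis=1)))]
    train_idx.append(query)
    annot_idx.remove(query)

# learning_curve[k] is the F1 on the evaluation set after k queries.
```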
For all data (travelling sequences) of each user, split it into a training set (~10%), an annotation set (i.e. to simulate user annotation) (~50%) and an evaluation set (~40%).
Learn a transition tensor using the factorization technique described in the FPMC paper with all data in the training set, then evaluate its performance (e.g. compute the F1 measure) on the evaluation set.
Query an unlabelled example (from the annotation set) using a query strategy (e.g. Information Density or Least Confident, as recommended in Settles08), add the label of the queried example to the training set, re-estimate the parameters and evaluate performance on the evaluation set.
Possibly use k-fold cross validation (probably necessary due to the high sparsity of each user's data) over all data in the training set + evaluation set (~50%).
Draw a learning curve: F1 vs. #query (a per-user split and a simple transition-matrix stand-in are sketched below).
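A minimal sketch of the per-user split, with a row-normalized count matrix standing in for the factorized FPMC transition tensor (the actual factorization is not implemented here); trajectories are assumed to be lists of integer POI ids keyed by user, and the helper names are illustrative.

```python
import numpy as np

def split_user_data(trajectories_by_user, rng, ratios=(0.1, 0.5, 0.4)):
    """Split each user's travelling sequences into training/annotation/evaluation sets."""
    splits = {}
    for user, trajs in trajectories_by_user.items():
        idx = rng.permutation(len(trajs))
        n_train = max(1, int(ratios[0] * len(trajs)))
        n_annot = int(ratios[1] * len(trajs))
        splits[user] = {
            'train': [trajs[i] for i in idx[:n_train]],
            'annotation': [trajs[i] for i in idx[n_train:n_train + n_annot]],
            'evaluation': [trajs[i] for i in idx[n_train + n_annot:]],
        }
    return splits

def transition_counts(trajectories, n_poi):
    """Plain MLE stand-in: row-normalized POI-to-POI transition counts (not the FPMC tensor)."""
    counts = np.ones((n_poi, n_poi))  # add-one smoothing for unseen transitions
    for traj in trajectories:
        for a, b in zip(traj[:-1], traj[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Toy usage: POIs are integers 0..4, two users with a few trajectories each.
rng = np.random.RandomState(0)
data = {'u1': [[0, 1, 2], [0, 2, 3], [1, 2, 4]], 'u2': [[3, 2, 1], [4, 2, 0]]}
splits = split_user_data(data, rng)
P = transition_counts(splits['u1']['train'], n_poi=5)
```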
The following query strategies are adapted from the sequence labeling tasks described in Settles08.
When evaluating the performance of the recommender, given an example $\textbf{x}$ (and its ground truth trajectory):
Predict the label of $\textbf{x}$, i.e. recommend a trajectory.
Compute Precision, Recall and F1-score as defined here (a minimal sketch is given below).
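A minimal sketch of these metrics, assuming (consistent with the worked F1 example in the bandit section below) that a recommended trajectory is scored against the ground truth via the set of POIs it visits.

```python
def precision_recall_f1(recommended, actual):
    """Set-based metrics over the POIs of a recommended vs. ground-truth trajectory."""
    rec, act = set(recommended), set(actual)
    overlap = len(rec & act)
    precision = overlap / len(rec) if rec else 0.0
    recall = overlap / len(act) if act else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f1

# Matches the worked example below: F1([p1, p2, p3], [p1, p2, p4]) ~= 0.667
print(precision_recall_f1(['p1', 'p2', 'p3'], ['p1', 'p2', 'p4']))
```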
Formulate trajectory recommendation as an active learning problem.
Query strategies: given (start, end) and trajectory length $l$ for a specific user $u$, we'll select an example for user $u$ to label.
TODO: utilise user-specific features when selecting a trajectory for user $u$ to label, if there are trajectories of user $u$ in our dataset.
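A hedged sketch of the Information Density strategy mentioned above, assuming each candidate example (e.g. a (start, end, length, user) query with its trajectory features) has already been mapped to a feature vector and that the current model outputs label probabilities for it; following Settles08, the score weights uncertainty by average similarity to the rest of the unlabelled pool.

```python
import numpy as np

def information_density_query(proba, features, beta=1.0):
    """Pick the unlabelled example with the highest Information Density score.

    proba    : (n, n_labels) predicted label probabilities for the unlabelled pool
    features : (n, d) feature vectors of the same pool
    Score = uncertainty (1 - max probability) x (mean cosine similarity to the pool)^beta,
    preferring examples that are both uncertain and representative.
    """
    uncertainty = 1.0 - proba.max(axis=1)                       # Least Confident uncertainty
    norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    density = (norm @ norm.T).mean(axis=1)                      # average cosine similarity
    return int(np.argmax(uncertainty * density ** beta))

# Toy usage: 4 candidate examples, 3 possible labels.
rng = np.random.RandomState(0)
proba = rng.dirichlet(np.ones(3), size=4)
features = rng.rand(4, 5)
print(information_density_query(proba, features))
```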
Formulate trajectory recommendation as a contextual multi-armed bandit problem.
1) Arm: a transition from the current POI$_i$ to another POI$_j$ (allow sub-tours? encode the target user into each arm?)
2) Context: transition features, including POI category transition probabilities, POI popularity transition probabilities, POI-pair distance transition probabilities and user-specific features (visit duration/frequency), available in this notebook.
3) Reward: NOT clear yet, but could be one of the following candidates.

F1 against the best-matching ground truth: given a recommended trajectory [p1, p2, p3] and ground truth trajectories [p1, p2, p3] and [p1, p2, p4], the reward is computed as
\begin{equation}
\max\{F1([p1, p2, p3], [p1, p2, p3]),\ F1([p1, p2, p3], [p1, p2, p4])\} = \max\{1.0, 0.667\} = 1.0
\end{equation}

Pairs-based metric against the best-matching ground truth (a code sketch is given after this list): given a recommended trajectory [p1, p3, p2] and ground truth trajectories [p1, p2, p3] and [p1, p2, p4]. With respect to trajectory [p1, p2, p3], pair (p1, p3) in [p1, p3, p2] counts because p3 is visited after p1; similarly, pair (p1, p2) counts, but pair (p3, p2) does not count because p3 is not visited before p2 in trajectory [p1, p2, p3]. The total number of pairs that count is thus 2 out of 3 pairs, so the metric here is 2/3. Similarly, the metric with respect to trajectory [p1, p2, p4] is 1/3, because only one pair, i.e. (p1, p2), among the 3 pairs counts. The reward is computed as
\begin{equation}
\max\left\{\frac{2}{3}, \frac{1}{3}\right\} = \frac{2}{3}
\end{equation}

Similarity-based reward: given a recommended (sub-)trajectory [p1, p3] and ground truth trajectory [p1, p4], if p3 and p4 are of the same category, of similar popularity (i.e. belong to the same popularity class after discretizing the POI popularity) and at a similar distance (i.e. the same distance class after discretization) from the current POI, then the reward of arm (p1, p3) would be high.

4) Regret: the regret after $T$ trials is defined as the cumulative regret
\begin{equation}
R(T) = \sum_{t=1}^{T} \left( r_t^{*} - r_t \right),
\end{equation}
where $r_t^{*}$ is the (expected) reward of the optimal arm at trial $t$ and $r_t$ is the reward received for the arm actually chosen.
5) Sampling strategies: NOT clear yet, but could be one of the following.
Given (start, end) and trajectory length $l$ for a specific user $u$, we'll recommend a trajectory to user $u$ by repeatedly choosing an arm (i.e. the next POI transition) according to the sampling strategy, starting from the given start POI, until a trajectory of length $l$ ending at the given end POI is obtained.
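As a concrete reference for the pairs-based reward candidate in 3) above, here is a minimal sketch that reproduces the worked values 2/3 and 1/3; it assumes each POI appears at most once per trajectory, and the function names are illustrative.

```python
from itertools import combinations

def ordered_pairs_metric(recommended, actual):
    """Fraction of ordered POI pairs in the recommended trajectory whose order is
    preserved in the ground-truth trajectory (pairs with a POI missing from the
    ground truth do not count)."""
    position = {p: i for i, p in enumerate(actual)}
    pairs = list(combinations(recommended, 2))  # (earlier, later) pairs in recommended order
    good = sum(1 for a, b in pairs
               if a in position and b in position and position[a] < position[b])
    return good / len(pairs) if pairs else 0.0

def pairs_reward(recommended, ground_truths):
    """Candidate bandit reward: best ordered-pairs metric over all ground truths."""
    return max(ordered_pairs_metric(recommended, gt) for gt in ground_truths)

# Reproduces the worked example above: max{2/3, 1/3} = 2/3
print(pairs_reward(['p1', 'p3', 'p2'], [['p1', 'p2', 'p3'], ['p1', 'p2', 'p4']]))
```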
We use the same leave-one-out cross validation approach described in the ijcai15 paper as the basis to measure the performance of different formulations.
Concretely, for each user $u$ in the dataset, choose one trajectory (length >= 3) from all trajectories of $u$ uniformly at random; this trajectory is used as the ground truth to measure the performance of the recommendation (i.e. to compute the precision, recall or F1-score defined here), and all other trajectories are used to train/estimate parameters. The experiment iterates through all users and produces a list of precision, recall and F1-scores (a sketch of this loop is given below).
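A minimal sketch of this leave-one-out loop; `train_and_recommend` and `metric` are hypothetical hooks standing in for whichever recommender and evaluation metric (precision/recall/F1) are being tested, and trajectories are assumed to be lists of POI ids keyed by user.

```python
import numpy as np

def leave_one_out_eval(trajectories_by_user, train_and_recommend, metric, rng):
    """ijcai15-style leave-one-out evaluation: per user, hold out one trajectory
    (length >= 3) uniformly at random as ground truth, train on all remaining
    trajectories and score the recommendation for the held-out (start, end, length)."""
    scores = []
    for user, trajs in trajectories_by_user.items():
        candidates = [i for i, t in enumerate(trajs) if len(t) >= 3]
        if not candidates:
            continue  # no trajectory of this user is long enough to hold out
        held = candidates[rng.randint(len(candidates))]
        truth = trajs[held]
        # Everything except the held-out trajectory (including other users' data) trains the model.
        training = [t for u, ts in trajectories_by_user.items()
                    for i, t in enumerate(ts) if not (u == user and i == held)]
        recommended = train_and_recommend(training, user, truth[0], truth[-1], len(truth))
        scores.append(metric(recommended, truth))
    return scores  # one metric value (e.g. F1) per user

# Toy usage with a trivial recommender that just returns the query's start and end POIs.
data = {'u1': [[0, 1, 2], [0, 2, 3, 4]], 'u2': [[3, 2, 1]]}
dummy = lambda training, user, start, end, length: [start, end]
f1 = lambda rec, truth: 2 * len(set(rec) & set(truth)) / (len(set(rec)) + len(set(truth)))
print(leave_one_out_eval(data, dummy, f1, np.random.RandomState(0)))
```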
In the active learning setting, we'll get a metric (i.e. the mean of precision, recall or F1-score, as we do cross validation for each user) for each number of examples queried; thus, the result of the experiment is a learning curve: (mean) precision/recall/F1-score vs. #examples queried.
In the bandit setting, we'll generally get a regret vs. #trials curve, where the regret is the cumulative regret defined above and #trials is the number of trials performed (i.e. choosing an arm and receiving the corresponding reward).
In the baseline setting, we'll get a metric (i.e. precision, recall or F1-score) for each validation trajectory; thus, the result is a list of numbers (a list of precision, recall or F1-scores), which can be seen as samples drawn from an unknown distribution that characterises the performance of the algorithm.
It is straightforward to compare the performance between the Active Learning and the baseline formulations, for example, by comparing the (mean) precision/recall/F1-score on the learning curve after a given number of queries against the corresponding (mean) metric of the baseline.
However, in the bandit setting, we need to convert regret into precision/recall/F1-score to ease the comparison between the bandit setting and the other formulations. (how to convert?)