Lab Book for Trajectory Recommendation Experiments


RankSVM: label of example with rank considered

Conjecture

Experiment

Results

Osaka:

RankSVM:     F1 (0.678, 0.037), pairsF1 (0.425, 0.058), Tau (0.646, 0.040), perfectF1 15/46, perfectPairsF1 13/46
RankSVM-New: F1 (0.656, 0.036), pairsF1 (0.389, 0.058), Tau (0.625, 0.039), perfectF1 13/46, perfectPairsF1 12/46

Conclusion

What is a query: user? length?

  • Including the user in the query helps personalised recommendation, but it faces data sparsity and the cold-start problem.
  • Ideally, for users with enough data we do personalised recommendation; for new users and users without enough data, we do non-personalised recommendation.
  • Restricting the number of POIs (in the query) is not a natural choice in tour recommendation; can we do recommendation with variable length (number of POIs), e.g., length 2 to 10?

SSVM performance with new features

Conjecture

Multi-label SSVM with top-5 recommendation and new features (SSVM-T5-NEW) performs better than multi-label SSVM with top-5 recommendation but without new features (SSVM-T5-OLD).

Experiment

Run experiments on each dataset using both methods.

Results

Glasgow:

SSVM-T5-OLD: F1 (0.856, 0.024), pairsF1 (0.700, 0.043), Tau (0.828, 0.029), perfectF1 39/64, perfectPairsF1 33/64
SSVM-T5-NEW: F1 (0.835, 0.025), pairsF1 (0.667, 0.044), Tau (0.809, 0.028), perfectF1 35/64, perfectPairsF1 30/64
SSVM-T5-NEW-NOLOG: F1 (0.841, 0.025), pairsF1 (0.688, 0.045), Tau (0.816, 0.029), perfectF1 37/64, perfectPairsF1 35/64

Osaka:

SSVM-T5-OLD: F1 (0.755, 0.037), pairsF1 (0.566, 0.059), Tau (0.728, 0.040), perfectF1 21/46, perfectPairsF1 20/46
SSVM-T5-NEW: F1 (0.729, 0.040), pairsF1 (0.543, 0.062), Tau (0.700, 0.044), perfectF1 21/46, perfectPairsF1 20/46
SSVM-T5-NEW-NOLOG: F1 (0.744, 0.038), pairsF1 (0.560, 0.060), Tau (0.714, 0.042), perfectF1 21/46, perfectPairsF1 20/46

Conclusion

The conjecture is probably FALSE.

Does taking the log of popularity, number of visits and visit duration help?

Conjecture

Taking the log of POI popularity, the total number of visits and the visit duration when constructing POI features does NOT help RankSVM.

Experiment

Run RankSVM with (+Log-RankSVM) and without (-Log-RankSVM) taking the log of POI popularity, the total number of visits and the visit duration when constructing POI features.
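
A minimal sketch of the +Log feature transform, assuming hypothetical count arrays (the log base is not stated in the lab book; the natural log is shown here):

```python
import numpy as np

# Hypothetical per-POI count features; the real values come from the dataset.
popularity = np.array([12, 340, 5, 78])            # distinct visitors per POI
n_visits = np.array([20, 900, 7, 150])             # total visits per POI
avg_duration = np.array([35.0, 60.0, 10.0, 90.0])  # average visit duration

# +Log-RankSVM: compress the heavy-tailed counts before training RankSVM.
log_features = np.log(np.column_stack([popularity, n_visits, avg_duration]))
```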

Results

Glasgow:

-Log-RankSVM: F1 (0.754, 0.027), pairsF1 (0.538, 0.047), Tau (0.726, 0.030), perfectF1 24/64, perfectPairsF1 24/64
+Log-RankSVM: F1 (0.767, 0.026), pairsF1 (0.553, 0.047), Tau (0.739, 0.030), perfectF1 25/64, perfectPairsF1 25/64

Osaka:

-Log-RankSVM: F1 (0.633, 0.033), pairsF1 (0.343, 0.055), Tau (0.603, 0.036), perfectF1 10/46, perfectPairsF1 10/46
+Log-RankSVM: F1 (0.678, 0.037), pairsF1 (0.433, 0.059), Tau (0.647, 0.040), perfectF1 15/46, perfectPairsF1 14/46

Conclusion

The conjecture is FALSE.

Is trajectory length an informative feature?

Conjecture

The performance of RankSVM does NOT change with/without trajectory length in features.

Experiment

Run RankSVM with (+TrajLen) and without (-TrajLen) trajectory length in the features.

Results

Glasgow:

-TrajLen: F1 (0.764, 0.027), pairsF1 (0.550, 0.047), Tau (0.736, 0.030), perfectF1: 25/64, perfectPairsF1: 25/64
+TrajLen: F1 (0.767, 0.026), pairsF1 (0.553, 0.047), Tau (0.739, 0.030), perfectF1: 25/64, perfectPairsF1: 25/64

Toronto:

-TrajLen: F1 (0.759, 0.021), pairsF1 (0.512, 0.036), Tau (0.721, 0.024), perfectF1: 32/99, perfectPairsF1: 29/99
+TrajLen: F1 (0.759, 0.021), pairsF1 (0.512, 0.036), Tau (0.721, 0.024), perfectF1: 32/99, perfectPairsF1: 29/99

Osaka:

-TrajLen: F1 (0.678, 0.037), pairsF1 (0.425, 0.058), Tau (0.646, 0.040), perfectF1: 15/46, perfectPairsF1: 13/46
+TrajLen: F1 (0.678, 0.037), pairsF1 (0.433, 0.059), Tau (0.647, 0.040), perfectF1: 15/46, perfectPairsF1: 14/46

Conclusion

Trajectory length as a feature helps a tiny bit on some datasets, so keep it as a feature.

Top-k prediction

Conjecture

The performance of multi-label SSVM will improve a lot if the top-k trajectories are recommended, given that performance is evaluated as $\max_{\bar{\mathbf{y}}_k} \max_{\mathbf{y}} \text{metric}(\bar{\mathbf{y}}_k, \mathbf{y})$, i.e., the best score over the $k$ recommended trajectories $\bar{\mathbf{y}}_k$ and all ground-truth trajectories $\mathbf{y}$ of the query.
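
A minimal sketch of this best-of-top-k evaluation rule; metric, preds and truths are hypothetical placeholders for one of F1/pairsF1/Tau, the $k$ recommended trajectories, and the ground-truth trajectories of the query:

```python
def best_of_topk(metric, preds, truths):
    """Best metric value over every (recommended, ground-truth) pair."""
    return max(metric(p, t) for p in preds for t in truths)
```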

Experiment

The multi-label SSVM is trained using the list Viterbi algorithm with sub-tours, and the top-k ($k=1$, SSVM-ML-T1; $k=5$, SSVM-ML-T5) predictions are computed by the list Viterbi algorithm without sub-tours.
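
A minimal sketch of a parallel k-best Viterbi for a chain model, keeping the top-k partial paths per state; the lab book's list Viterbi variants additionally fix the start POI and, for prediction, forbid repeated POIs (no sub-tours), which this sketch omits:

```python
import numpy as np

def kbest_viterbi(unary, pairwise, k):
    """Top-k highest-scoring state sequences of a chain model.

    unary:    (T, n) array, unary[t, s] = score of state s at position t
    pairwise: (n, n) array, pairwise[p, s] = transition score p -> s
    Returns up to k (score, sequence) pairs, best first.  Sequences may
    revisit states, i.e. this is the "allow sub-tours" flavour.
    """
    T, n = unary.shape
    # beams[t][s]: top-k (score, prev_state, prev_rank) triples, best first
    beams = [[[] for _ in range(n)] for _ in range(T)]
    for s in range(n):
        beams[0][s] = [(unary[0, s], -1, -1)]
    for t in range(1, T):
        for s in range(n):
            cand = [(score + pairwise[p, s] + unary[t, s], p, r)
                    for p in range(n)
                    for r, (score, _, _) in enumerate(beams[t - 1][p])]
            cand.sort(key=lambda c: -c[0])
            beams[t][s] = cand[:k]
    finals = [(entry[0], s, r)
              for s in range(n) for r, entry in enumerate(beams[T - 1][s])]
    finals.sort(key=lambda f: -f[0])
    results = []
    for score, s, r in finals[:k]:
        seq, t = [], T - 1
        while s != -1:  # follow back-pointers to position 0
            seq.append(s)
            _, s, r = beams[t][s][r]
            t -= 1
        results.append((score, seq[::-1]))
    return results

# e.g. top5 = kbest_viterbi(np.random.rand(3, 6), np.random.rand(6, 6), k=5)
```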

Results

Glasgow:

RankSVM:    F1 (0.764, 0.027), pairsF1 (0.550, 0.047), Tau (0.736, 0.030), perfectF1 25/64, perfectPairsF1 25/64
SSVM-SL-T5: F1 (0.739, 0.030), pairsF1 (0.516, 0.049), Tau (0.702, 0.034), perfectF1 26/64, perfectPairsF1 23/64
SSVM-ML-T1: F1 (0.770, 0.030), pairsF1 (0.576, 0.050), Tau (0.740, 0.034), perfectF1 30/64, perfectPairsF1 26/64
SSVM-ML-T5: F1 (0.856, 0.024), pairsF1 (0.700, 0.043), Tau (0.828, 0.029), perfectF1 39/64, perfectPairsF1 33/64

Toronto:

RankSVM:    F1 (0.759, 0.021), pairsF1 (0.512, 0.036), Tau (0.721, 0.024), perfectF1 32/99, perfectPairsF1 29/99
SSVM-SL-T5: F1 (0.720, 0.024), pairsF1 (0.499, 0.038), Tau (0.670, 0.028), perfectF1 37/99, perfectPairsF1 33/99
SSVM-ML-T1: F1 (0.742, 0.022), pairsF1 (0.507, 0.037), Tau (0.703, 0.026), perfectF1 34/99, perfectPairsF1 31/99
SSVM-ML-T5: F1 (0.794, 0.021), pairsF1 (0.583, 0.036), Tau (0.755, 0.024), perfectF1 42/99, perfectPairsF1 37/99

Osaka:

RankSVM:    F1 (0.678, 0.037), pairsF1 (0.425, 0.058), Tau (0.646, 0.040), perfectF1 15/46, perfectPairsF1 13/46
SSVM-SL-T5: F1 (0.746, 0.037), pairsF1 (0.543, 0.058), Tau (0.722, 0.040), perfectF1 20/46, perfectPairsF1 18/46
SSVM-ML-T1: F1 (0.669, 0.036), pairsF1 (0.414, 0.057), Tau (0.637, 0.040), perfectF1 13/46, perfectPairsF1 13/46
SSVM-ML-T5: F1 (0.755, 0.037), pairsF1 (0.566, 0.059), Tau (0.728, 0.040), perfectF1 21/46, perfectPairsF1 20/46

Edinburgh:

RankSVM:    F1 (0.725, 0.017), pairsF1 (0.451, 0.026), Tau (0.663, 0.020), perfectF1 34/147, perfectPairsF1 28/147
SSVM-SL-T5: F1 (0.653, 0.019), pairsF1 (0.391, 0.027), Tau (0.576, 0.023), perfectF1 34/147, perfectPairsF1 27/147
SSVM-ML-T5: F1 (0.689, 0.018), pairsF1 (0.426, 0.027), Tau (0.618, 0.022), perfectF1 39/147, perfectPairsF1 30/147

Conclusion

Multi-label SSVM: with vs without derived features

Conjecture

The performance of multi-label SSVM is similar with (+Derived) and without (-Derived) the derived (node) features.

Experiment

Evaluate multi-label SSVM (leave-one-out cross validation) with and without turning on the derived (node) features. Loss augmented inference is the list Viterbi algorithm (allowing sub-tours); prediction inference is the list Viterbi algorithm (NOT allowing sub-tours).

Results

Glasgow:

-Derived: F1 (0.665, 0.030), pairsF1 (0.413, 0.050), Tau (0.628, 0.034), perfectF1: 19/64, perfectPairsF1: 18/64
+Derived: F1 (0.770, 0.030), pairsF1 (0.576, 0.050), Tau (0.740, 0.034), perfectF1: 30/64, perfectPairsF1: 26/64

Osaka:

-Derived: F1 (0.562, 0.035), pairsF1 (0.230, 0.050), Tau (0.525, 0.037), perfectF1:  8/46, perfectPairsF1: 6/46
+Derived: F1 (0.670, 0.040), pairsF1 (0.433, 0.060), Tau (0.637, 0.044), perfectF1: 15/46, perfectPairsF1: 14/46

Conclusion

The conjecture is false.

RankSVM: with vs without derived features

Conjecture

The derived features help RankSVM a lot.

  • Unary features:

    1. category: one-hot encoding of POI category, encode True as 1 and False as -1
    2. neighbourhood: one-hot encoding of POI cluster, encode True as 1 and False as -1
    3. popularity: log of POI popularity, i.e., the number of distinct users that visited the POI
    4. nVisit: log of the total number of visits by all users
    5. avgDuration: log of average POI visit duration
  • Derived features:

    1. trajLen: trajectory length, i.e., the number of POIs (nPOI) in the trajectory, copied from the query
    2. sameCatStart: 1 if POI category is the same as that of startPOI, -1 otherwise
    3. distStart: distance (haversine formula) from startPOI
    4. diffPopStart: difference in POI popularity from startPOI (NO LOG as it could be negative)
    5. diffNVisitStart: difference in the total number of visits from startPOI
    6. diffDurationStart: difference in average POI visit duration from the actual duration spent at startPOI
    7. sameNeighbourhoodStart: 1 if POI resides in the same cluster as that of startPOI, -1 otherwise
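
A minimal sketch of constructing the derived features listed above, including the haversine distance for distStart; poi and start are hypothetical dicts of per-POI attributes:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km (the haversine formula for distStart)."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def derived_features(poi, start, traj_len):
    """poi/start are hypothetical dicts with keys
    cat, cluster, lon, lat, pop, nvisit, duration."""
    return np.array([
        traj_len,                                         # trajLen
        1 if poi['cat'] == start['cat'] else -1,          # sameCatStart
        haversine(poi['lon'], poi['lat'],
                  start['lon'], start['lat']),            # distStart
        poi['pop'] - start['pop'],                        # diffPopStart (no log)
        poi['nvisit'] - start['nvisit'],                  # diffNVisitStart
        poi['duration'] - start['duration'],              # diffDurationStart
        1 if poi['cluster'] == start['cluster'] else -1,  # sameNeighbourhoodStart
    ])
```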

Experiment

Evaluate RankSVM (leave-one-out cross validation) with and without turning on the derived features.

Results

Glasgow:

Unary+Derived: F1 (0.764, 0.027), pairsF1 (0.550, 0.047), Tau (0.736, 0.030)
Unary:         F1 (0.659, 0.030), pairsF1 (0.388, 0.049), Tau (0.624, 0.033)

Toronto:

Unary+Derived: F1 (0.759, 0.021), pairsF1 (0.512, 0.036), Tau (0.721, 0.024)
Unary:         F1 (0.676, 0.021), pairsF1 (0.370, 0.033), Tau (0.629, 0.023)

Osaka:

Unary+Derived: F1 (0.678, 0.037), pairsF1 (0.425, 0.058), Tau (0.646, 0.040)
Unary:         F1 (0.622, 0.033), pairsF1 (0.307, 0.053), Tau (0.590, 0.035)

Conclusion

The conjecture is true.

Multi-user multi-label SSVM

Formulation

We would like to formulate a multi-user multi-label SSVM as \begin{align} \min_{\mathbf{w},\xi\ge0} ~& \frac{1}{2} \sum_{i=1}^m \|\mathbf{w}_i - \bar{\mathbf{w}}\|^2 + C \sum_i \sum_j \sum_k \xi_{ijk} \\ s.t. ~& \langle \mathbf{w}_i, \Psi(\mathbf{x}_{ij}, \mathbf{y}_{ijk}) \rangle - \langle \mathbf{w}_i, \Psi(\mathbf{x}_{ij}, \bar{\mathbf{y}}) \rangle \ge \Delta(\mathbf{y}_{ijk}, \bar{\mathbf{y}}) - \xi_{ijk},~ \bar{\mathbf{y}} \in \mathcal{Y} \setminus \{\mathbf{y}_{ijk}\}_{k=1}^{m_{ij}},~ \forall k,~\forall j,~\forall i. \end{align} where $i$ indexes users, $j$ indexes queries of the $i$-th user and $k$ indexes the (multi-) labels of the $j$-th query of the $i$-th user.

Drawback

  • The standard form of a quadratic program (QP) is \begin{align} \min_{x} ~& \frac{1}{2}x^\top P x + q^\top x \\ s.t. ~& G x \le h \\ & A x = b \end{align}
  • It seems that the regularisation term $\sum_{i=1}^m \|\mathbf{w}_i - \bar{\mathbf{w}}\|^2$, where $\bar{\mathbf{w}} = \frac{1}{m} \sum_{i=1}^m \mathbf{w}_i$, cannot be rephrased into the standard form of the QP objective.
  • pystruct solves the dual problem (a QP) of the 1-slack SSVM (primal); however, in the Wolfe dual of the multi-user multi-label 1-slack SSVM (the former is shown in the appendix of Cutting-Plane Training of Structural SVMs), the objective contains both dual variables and primal variables, which is NOT a QP.
  • It seems that the regularisation term $\sum_{i=1}^m \|\mathbf{w}_i - \bar{\mathbf{w}}\|^2$ controls only the variance of the $\mathbf{w}_i$; the mean $\bar{\mathbf{w}}$ can still be arbitrarily large/small, e.g., $\mathbf{w}_1 = [0.9, 0.9], \mathbf{w}_2 = [1.0, 1.0], \mathbf{w}_3 = [1.1, 1.1]$ vs. $\mathbf{w}_1 = [100.9, 100.9], \mathbf{w}_2 = [101.0, 101.0], \mathbf{w}_3 = [101.1, 101.1]$
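
A one-line check of the last point (a worked identity, not from the original notes): shifting every user weight by the same constant vector $\mathbf{c}$ shifts the mean by $\mathbf{c}$ as well and leaves the regulariser unchanged, \begin{align} \sum_{i=1}^m \|(\mathbf{w}_i + \mathbf{c}) - (\bar{\mathbf{w}} + \mathbf{c})\|^2 = \sum_{i=1}^m \|\mathbf{w}_i - \bar{\mathbf{w}}\|^2, \end{align} so the magnitude of $\bar{\mathbf{w}}$ is indeed unpenalised.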

Multi-label SSVM training: list Viterbi vs greedy

Conjecture

When multi-label SSVM is trained using the greedy algorithm, it will perform better than when trained using the list Viterbi algorithm (allowing sub-tours); the prediction inference is the list Viterbi algorithm (NOT allowing sub-tours) in both cases.

Experiment

  • Do leave-one-out cross validation, in which multi-label SSVMs are trained using the list Viterbi algorithm (allow sub-tours) in the loss augmented inference, and use the list Viterbi algorithm (do NOT allow sub-tours) to do prediction inference (SSVM-LV-LV)
  • Do leave-one-out cross validation, in which multi-label SSVMs are trained using the greedy algorithm in the loss augmented inference, and use the list Viterbi algorithm (do NOT allow sub-tours) to do prediction inference (SSVM-GR-LV); a sketch of the greedy decoder follows this list
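
A minimal sketch of a greedy decoder for the loss augmented inference, assuming hypothetical score arrays and a Hamming-style loss; the actual implementation may differ:

```python
import numpy as np

def greedy_loss_augmented(unary, pairwise, y_true, start):
    """Grow the trajectory left to right, always taking the unvisited POI
    with the largest immediate augmented gain.

    unary: (T, n) scores; pairwise: (n, n) transition scores;
    y_true: ground-truth POI sequence of length T; start: fixed start POI.
    Assumes n >= T so an unvisited POI always exists.
    """
    T, n = unary.shape
    path = [start]
    for t in range(1, T):
        best_gain, best_s = -np.inf, None
        for s in range(n):
            if s in path:                 # forbid repeated POIs
                continue
            gain = unary[t, s] + pairwise[path[-1], s]
            if s != y_true[t]:            # Hamming-style loss augmentation
                gain += 1.0
            if gain > best_gain:
                best_gain, best_s = gain, s
        path.append(best_s)
    return path
```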

Results

Glasgow:

SSVM-GR-LV: F1 (0.728, 0.029), pairsF1 (0.496, 0.048), Tau (0.696, 0.032), perfectF1 22/64, perfectPairsF1 21/64
SSVM-LV-LV: F1 (0.770, 0.030), pairsF1 (0.576, 0.050), Tau (0.740, 0.034), perfectF1 30/64, perfectPairsF1 26/64
MEMM:       F1 (0.699, 0.028), pairsF1 (0.436, 0.047), Tau (0.668, 0.031), perfectF1 19/64, perfectPairsF1 18/64

Toronto:

SSVM-GR-LV: F1 (0.752, 0.023), pairsF1 (0.520, 0.037), Tau (0.712, 0.026), perfectF1 37/99, perfectPairsF1 32/99
SSVM-LV-LV: F1 (0.742, 0.022), pairsF1 (0.507, 0.037), Tau (0.703, 0.026), perfectF1 34/99, perfectPairsF1 31/99

Osaka:

SSVM-GR-LV: F1 (0.609, 0.041), pairsF1 (0.348, 0.061), Tau (0.576, 0.045), perfectF1 13/46, perfectPairsF1 12/46
SSVM-LV-LV: F1 (0.670, 0.040), pairsF1 (0.433, 0.060), Tau (0.637, 0.044), perfectF1 15/46, perfectPairsF1 14/46
MEMM:       F1 (0.652, 0.035), pairsF1 (0.387, 0.057), Tau (0.624, 0.038), perfectF1 12/46, perfectPairsF1 12/46

Conclusion

It is unclear whether the conjecture is true.

Multi-label SSVM training: list Viterbi with vs without sub-tours

Conjecture

Training the multi-label SSVM using the list Viterbi algorithm with sub-tours allowed in the loss augmented inference will make it perform better than training without allowing sub-tours.

Experiment

  • Do leave-one-out cross validation, in which multi-label SSVMs are trained using the list Viterbi algorithm, and allow sub-tours in the loss augmented inference (SSVM-ML-ST)
  • Do leave-one-out cross validation, in which multi-label SSVMs are trained using the list Viterbi algorithm, but do NOT allow sub-tours in the loss augmented inference (SSVM-ML-NST)

Results

Glasgow:

SSVM-ML-ST:  F1 (0.770, 0.030), pairsF1 (0.576, 0.050), Tau (0.740, 0.034), perfectF1 30/64, perfectPairsF1 26/64
SSVM-ML-NST: F1 (0.752, 0.028), pairsF1 (0.532, 0.048), Tau (0.721, 0.031), perfectF1 25/64, perfectPairsF1 22/64
RankSVM:     F1 (0.767, 0.026), pairsF1 (0.553, 0.047), Tau (0.739, 0.030), perfectF1 25/64, perfectPairsF1 25/64

Toronto:

SSVM-ML-ST:  F1 (0.742, 0.022), pairsF1 (0.507, 0.037), Tau (0.703, 0.026), perfectF1 34/99, perfectPairsF1 31/99
RankSVM:     F1 (0.759, 0.021), pairsF1 (0.512, 0.036), Tau (0.721, 0.024), perfectF1 32/99, perfectPairsF1 29/99

Edinburgh:

SSVM-ML-ST: F1 (0.629, 0.017), pairsF1 (0.337, 0.025), Tau (0.557, 0.020), perfectF1 22/147, perfectPairsF1 20/147
RankSVM:    F1 (0.727, 0.016), pairsF1 (0.454, 0.026), Tau (0.664, 0.020), perfectF1 34/147, perfectPairsF1 28/147

Osaka:

SSVM-ML-ST:  F1 (0.669, 0.036), pairsF1 (0.414, 0.057), Tau (0.637, 0.040), perfectF1 13/46, perfectPairsF1 13/46
SSVM-ML-NST: F1 (0.670, 0.040), pairsF1 (0.433, 0.060), Tau (0.637, 0.044), perfectF1 15/46, perfectPairsF1 14/46
RankSVM:     F1 (0.678, 0.037), pairsF1 (0.433, 0.059), Tau (0.647, 0.040), perfectF1 15/46, perfectPairsF1 14/46

Conclusion

It seems the conjecture is neither clearly true nor clearly false. Nevertheless, allowing sub-tours in the loss augmented inference makes training much more efficient.

Single label SSVM training: Viterbi vs List Viterbi

Conjecture

SSVM can be trained better if the list Viterbi algorithm (instead of the Viterbi algorithm) is used to do the loss augmented inference.

Experiment

  • Train a single label SSVM on Osaka dataset, use the list Viterbi algorithm to do the prediction inference.
  • Use the Viterbi algorithm and the list Viterbi algorithm to do the loss augmented inference, respectively.
  • The SSVM is trained without parameter sharing, and the log of transition features (factorised probabilities) is taken.
  • Evaluate the performance by leave-one-out cross validation, with the regularisation constant $C$ tuned using Monte-Carlo cross validation.

Results

  • SSVM-V-LV-SL: single label SSVM, loss augmented inference: Viterbi, prediction inference: list Viterbi
  • SSVM-LV-LV-SL: single label SSVM, loss augmented inference: list Viterbi, prediction inference: list Viterbi
  • Larger datasets (than Osaka) are computationally too expensive.

SSVM-V-LV-SL:
F1 (0.632, 0.041), pairsF1 (0.386, 0.061), Tau (0.603, 0.044), perfectF1: 13/46, perfectPairsF1: 13/46

SSVM-LV-LV-SL:
F1 (0.557, 0.036), pairsF1 (0.249, 0.055), Tau (0.523, 0.038), perfectF1:  8/46, perfectPairsF1: 8/46

Conclusion

The conjecture is false.

Multi-label SSVM on larger dataset

Conjecture

Multi-label SSVM can compete with RankSVM.

Experiment

Do leave-one-out cross validation of multi-label SSVM (SSVM-ML) and RankSVM (as well as single-label SSVM, SSVM-SL) on the Glasgow and Toronto datasets. Note that here the query is $(s, K)$ and we share parameters among POIs/transitions; besides, we do NOT take the log of transition features (factorised probabilities).

Results

Glasgow:
RankSVM: F1 (0.767, 0.026), pairsF1 (0.553, 0.047), Tau (0.739, 0.030), perfectF1: 25/64, perfectPairsF1: 25/64
SSVM-ML: F1 (0.752, 0.028), pairsF1 (0.532, 0.048), Tau (0.721, 0.031), perfectF1: 25/64, perfectPairsF1: 22/64
SSVM-SL: F1 (0.629, 0.031), pairsF1 (0.362, 0.047), Tau (0.589, 0.034), perfectF1: 15/64, perfectPairsF1: 15/64

Osaka:
RankSVM: F1 (0.678, 0.037), pairsF1 (0.433, 0.059), Tau (0.647, 0.040), perfectF1: 15/46, perfectPairsF1: 14/46
SSVM-ML: F1 (0.670, 0.040), pairsF1 (0.433, 0.060), Tau (0.637, 0.044), perfectF1: 15/46, perfectPairsF1: 14/46
SSVM-SL: F1 (0.599, 0.033), pairsF1 (0.292, 0.052), Tau (0.566, 0.036), perfectF1: 8/46,  perfectPairsF1: 8/46

Multi-label stats for queries

Conjecture

Only a small fraction of users have more than one label for a given query.

Experiment

Design a new query $(s, K, u)$ where

  • s: start POI of a trajectory
  • K: the number of desired POIs
  • u: user ID

Then check how many of these queries have multiple labels.
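
A minimal sketch of this counting, assuming a hypothetical data layout of (user, POI sequence) pairs:

```python
from collections import defaultdict

def multilabel_count(trajectories, use_length=True, use_user=True):
    """trajectories: hypothetical list of (user, poi_sequence) pairs.
    Returns (#queries with more than one distinct trajectory, #queries)."""
    labels = defaultdict(set)
    for user, seq in trajectories:
        query = (seq[0],)           # s: start POI
        if use_length:
            query += (len(seq),)    # K: number of POIs
        if use_user:
            query += (user,)        # u: user ID
        labels[query].add(tuple(seq))
    n_multi = sum(1 for ys in labels.values() if len(ys) > 1)
    return n_multi, len(labels)
```

Calling multilabel_count(trajs, use_user=False) gives the $(s, K)$ variant below, and use_length=False gives the $(s, u)$ variant.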

Results

When the query is $(s, K, u)$, the fraction (#queries with multiple labels) / (total #queries) is:

Osaka:     4/181   ~2%
Glasgow:   18/325  ~6%
Edinburgh: 78/1260 ~6%
Toronto:   68/813  ~8%
Melbourne: 39/965  ~4%

and if the query contains only $(s, K)$, the fraction of multi-label queries is:

Osaka:     30/47   ~64%
Glasgow:   41/64   ~64%
Edinburgh: 104/147 ~71%
Toronto:   69/99   ~70%
Melbourne: 139/280 ~50%

Lastly, if the query is $(s, u)$, the fraction of multi-label queries is:

Osaka:     8/177    ~5%
Glasgow:   29/309   ~9%
Edinburgh: 121/1165 ~10%
Toronto:   79/751   ~11%
Melbourne: 69/925   ~7%

Multi-label SSVM

Experiment

  • Implement multi-label SSVM, and compare the performance with vanilla SSVM
  • For multi-label SSVM, use the list Viterbi algorithm to do both loss_augmented_inference and inference for prediction
  • Leave-one-out evaluation on Osaka dataset
  • This experiment is performed in notebook ssvm_ml.ipynb

Conclusion

  • Sample results on Osaka dataset
    • Tie-Log: share parameters among POIs and transitions, and take the log of (factorised) transition probabilities
    • Tie-NoLog: share parameters among POIs and transitions, but do NOT take the log of (factorised) transition probabilities
    • NoTie-Log: do NOT share parameters among POIs and transitions, and take the log of (factorised) transition probabilities
    • NoTie-NoLog: do NOT share parameters among POIs and transitions, and do NOT take the log of (factorised) transition probabilities
      SSVM-MultiLabel-Tie-Log:     F1 (0.624, 0.038), pairsF1 (0.363, 0.059), Tau (0.591, 0.042)
      SSVM-MultiLabel-Tie-NoLog:   F1 (0.659, 0.040), pairsF1 (0.423, 0.060), Tau (0.626, 0.044)
      SSVM-MultiLabel-NoTie-NoLog: F1 (0.649, 0.035), pairsF1 (0.373, 0.057), Tau (0.619, 0.038)
      SSVM-MultiLabel-NoTie-Log:   F1 (0.655, 0.035), pairsF1 (0.377, 0.057), Tau (0.624, 0.037)
      SSVM-Tie-Log:                F1 (0.611, 0.037), pairsF1 (0.339, 0.058), Tau (0.580, 0.040)
      SSVM-Tie-NoLog:              F1 (0.613, 0.037), pairsF1 (0.330, 0.057), Tau (0.579, 0.041) 
      SSVM-NoTie-NoLog:            F1 (0.637, 0.039), pairsF1 (0.378, 0.061), Tau (0.605, 0.043)
      SSVM-NoTie-Log:              F1 (0.626, 0.039), pairsF1 (0.361, 0.060), Tau (0.593, 0.042)
      RankSVM:                     F1 (0.678, 0.037), pairsF1 (0.433, 0.059), Tau (0.647, 0.040)

Pruning real dataset to create a single label dataset

Experiment

  • Prune the Glasgow dataset $\mathcal{M}_0$, keeping only one trajectory for each query, chosen by some deterministic rule among the trajectories that conform to the query (e.g., the one with the maximum total POI popularity); let $\mathcal{M}_1$ be the new dataset. A sketch of the pruning follows this list.
  • Train SSVM and RankSVM on $\mathcal{M}_1$; features are computed from $\mathcal{M}_0$ or $\mathcal{M}_1$
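
A minimal sketch of the pruning rule, assuming a hypothetical data layout (a list of POI-id sequences and a popularity lookup):

```python
from collections import defaultdict

def prune_to_single_label(trajectories, poi_popularity):
    """trajectories: hypothetical list of POI-id sequences; poi_popularity:
    dict POI id -> popularity.  Keeps, per query (s, K), the trajectory
    with the maximum total POI popularity (one deterministic rule)."""
    by_query = defaultdict(list)
    for seq in trajectories:
        by_query[(seq[0], len(seq))].append(seq)
    return [max(seqs, key=lambda s: sum(poi_popularity[p] for p in s))
            for seqs in by_query.values()]
```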

Conclusion

  • Sample results:
    • Features are computed on the $\mathcal{M}_0$
      RankSVM:           F1 (0.536, 0.017), pairsF1 (0.145, 0.020), Tau (0.484, 0.018)
      SSVM:              F1 (0.612, 0.029), pairsF1 (0.317, 0.047), Tau (0.573, 0.032)
      SSVM-NoTransition: F1 (0.610, 0.029), pairsF1 (0.309, 0.044), Tau (0.571, 0.032)
    • Features are computed on the $\mathcal{M}_1$
      RankSVM:           F1 (0.705, 0.030), pairsF1 (0.414, 0.047), Tau (0.667, 0.033)
      SSVM:              F1 (0.629, 0.029), pairsF1 (0.334, 0.046), Tau (0.588, 0.032)
      SSVM-NoTransition: F1 (0.588, 0.026), pairsF1 (0.265, 0.040), Tau (0.546, 0.029)

Generating Trajectories using SSVM with random weights

Experiment

  • Features are computed from Glasgow dataset $\mathcal{M}_0$
  • SSVM weights are random samples from a standard univariate Gaussian distribution, numpy.random.randn()
  • Prediction inference of SSVM is the list Viterbi algorithm; the predictions form the generated dataset $\mathcal{M}_1$
  • Use leave-one-out evaluation to evaluate performance (features are computed from $\mathcal{M}_1$ and labels are from $\mathcal{M}_1$)
  • This experiment is performed in notebook generated_data.ipynb

Conclusion

  • The numbers of visits per POI are concentrated on a small subset of POIs
  • The transition matrix of the number of visits (discretised) computed from $\mathcal{M}_1$ is similar to that from $\mathcal{M}_0$
  • The transition features are correlated with POI features (as the transition matrix is factorised according to several POI features)
  • It could be that both the POI features and transition features are good, but the transition features don't provide new information (Cheng)
  • Sample results from $3$ runs (each run with different random weights, thus a different $\mathcal{M}_1$)
    SSVM: F1 (0.812, 0.016), pairsF1 (0.574, 0.028), Tau (0.757, 0.019)
    SSVM: F1 (0.745, 0.018), pairsF1 (0.500, 0.029), Tau (0.697, 0.020)
    SSVM: F1 (0.772, 0.019), pairsF1 (0.544, 0.030), Tau (0.726, 0.021)

Multi-label effects on SSVM and RankSVM

Some thoughts:

  • Multi-label hurts SSVM as we have multiple labels given an example (a feature vector).
  • On the other hand, as the label of an example (a POI and query specific feature vector) for RankSVM is computed by counting the number of occurrences of the POI in trajectories that conform to the query, there seems to be no multi-label problem in RankSVM training.

Experiment

  • Experiment on generated dataset (single label for an example/query)
  • Both POI and transition features are scaled linearly to $(-1,1)$ using MinMaxScaler; see the scaling sketch after this list
  • $C$ is tuned using Monte-Carlo CV at the beginning and kept the same for all leave-one-out CV
  • The $C$ for SSVM, for SSVM without transition features and for RankSVM are tuned separately
  • Prediction inference in SSVM is the list Viterbi algorithm
  • This experiment is performed in notebook generated_data.ipynb
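
A minimal sketch of the scaling step with scikit-learn's MinMaxScaler; X_train and X_test are hypothetical placeholders for the POI/transition feature matrices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrices; the real ones are the POI/transition features.
X_train = np.random.rand(100, 5)
X_test = np.random.rand(20, 5)

# Fit the linear map on the training features only, then reuse it on test.
scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```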

Conclusion

  • When features are computed from the original dataset (with duration related features), NOT the generated dataset, SSVM performs better than RankSVM; sample results:
    RankSVM:           F1 (0.597, 0.011), pairsF1 (0.248, 0.012), Tau (0.507, 0.011)
    SSVM:              F1 (0.883, 0.014), pairsF1 (0.667, 0.027), Tau (0.843, 0.017)
    SSVM-NoTransition: F1 (0.827, 0.015), pairsF1 (0.510, 0.021), Tau (0.775, 0.015)
  • When features are recomputed from the generated dataset (so NO duration related features), RankSVM performs better than SSVM; sample results:
    RankSVM:           F1 (0.903, 0.012), pairsF1 (0.612, 0.023), Tau (0.858, 0.013)
    SSVM:              F1 (0.865, 0.016), pairsF1 (0.607, 0.026), Tau (0.822, 0.017)
    SSVM-NoTransition: F1 (0.869, 0.017), pairsF1 (0.589, 0.026), Tau (0.819, 0.018)

Transition feature scaling

Experiment

  • Scale transition features, i.e. the log probabilities factorised according to a number of features, to range $(-1,1)$ linearly.
  • The Osaka dataset is used and the performance is evaluated using leave-one-out cross validation.

Conclusion

SSVM with RankSVM weights

Experiment

  • This experiment plugs the weights of a trained RankSVM, i.e., a vector $w$ where $|w|$ is the number of POI (and query) features, into SSVM and does prediction; this means we ignore transition features and tie weights among different POIs (a sketch follows this list).
  • A dedicated notebook ssvm_ranksvm_weights.ipynb is created for this experiment.
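
A minimal sketch of the scoring used when RankSVM weights are plugged into SSVM, assuming hypothetical array shapes; the weight vector is tied across POIs and all transition weights are zero:

```python
import numpy as np

def unary_score(traj, poi_features, w):
    """Score a trajectory with RankSVM weights plugged into SSVM:
    the same weight vector w (length = number of POI/query features) is
    tied across all POIs and all transition weights are zero.

    traj: sequence of POI ids; poi_features: (n_pois, d) array."""
    return sum(poi_features[p] @ w for p in traj)
```

With zero transition scores and no repeated POIs, every visiting order of the same POI set receives the same score, which is consistent with the observation below that the predictions contain the same POI set as RankSVM in a different order.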

Conclusion

  • We found that SSVM with RankSVM weights performs very similarly to RankSVM; in fact, when using the list Viterbi algorithm, the predicted trajectory for any query (in leave-one-out cross validation) contains exactly the same set of POIs as the prediction by RankSVM, except that the visiting order of these POIs differs.
  • Features are scaled to the range $(-1,1)$ linearly using MinMaxScaler (scikit-learn), which is the scaling method used in RankSVM (the libsvm implementation).
  • Sample results:
    As the POI features for RankSVM and SSVM are the same, we can use the inference procedure of SSVM with RankSVM weights (all transition features are set to zero).
    The weights of RankSVM are computed from the leave-one-out procedure, i.e., one weight vector per query in the training set.
    The dataset used here is Glasgow.
    The performance of RankSVM:
    F1 (0.767, 0.026), pairsF1 (0.553, 0.047), Tau (0.739, 0.030)
    The performance of SSVM using the weights of the above RankSVM:
    SSVM-Viterbi: F1 (0.713, 0.032), pairsF1 (0.575, 0.046), Tau (0.740, 0.031)
    SSVM-listViterbi: F1 (0.765, 0.026), pairsF1 (0.550, 0.046), Tau (0.739, 0.030)
  • When the MaxAbsScaler was used, SSVM performed much worse.

SSVM vs RankSVM on generated dataset (Check if multi-label hurts)

Experiment

  • POI features and transition features are computed on the Glasgow dataset; trajectories (i.e., labels) are from the generated dataset (#trajectories=125).
  • POI and transition features are computed from the generated data (NO duration related features)
  • Performance is evaluated using leave-one-out cross validation.
  • The regularisation parameter $C$ is tuned using Monte-Carlo cross validation at the beginning; then the same $C$ is used for all leave-one-out cross validations.
  • Prediction inference in SSVM is the list Viterbi algorithm
  • A dedicated notebook generated_data.ipynb is created for this experiment.

Conclusion

  • Neither SSVM nor RankSVM can achieve nearly perfect performance, which means the multi-label problem hurts both models.
  • RankSVM performs better than SSVM.
  • Sample results:
    SSVM:    F1 (0.699, 0.015), pairsF1 (0.416, 0.021), Tau (0.643, 0.016)
    RankSVM: F1 (0.881, 0.011), pairsF1 (0.572, 0.019), Tau (0.829, 0.012)