NOTE: Before running this notebook, please run the script src/ijcai15_setup.py
to set up the data properly.
The dataset used in this paper can be downloaded here, and a summary of the dataset is also available. Unfortunately, one critical piece of information is missing: the geo-coordinates of each POI, which are necessary for calculating the travel time from one POI to another. However, they can be approximated by averaging the (longitude, latitude) of all photos mapped to a specific POI, after retrieving the coordinates of those photos from the original YFCC100M dataset via their photoID.
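For illustration, here is a minimal sketch of this averaging step on toy data (the column names mirror those of the merged `traj` DataFrame built later in the notebook, where the same computation is performed in In [11]):

```python
import pandas as pd

# Toy example: three photos mapped to two POIs, with per-photo coordinates.
photos = pd.DataFrame({
    'photoID':  [101, 102, 103],
    'poiID':    [1, 1, 2],
    'photoLon': [-79.390, -79.388, -79.420],
    'photoLat': [43.642, 43.644, 43.670],
})

# Approximate each POI's location by the mean coordinates of its photos.
poi_coords = (photos.groupby('poiID')[['photoLon', 'photoLat']]
                    .mean()
                    .rename(columns={'photoLon': 'poiLon', 'photoLat': 'poiLat'}))
print(poi_coords)
```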
Simple statistics of this dataset
| City | #POIs | #Users | #POI_Visits | #Travel_Sequences |
|-----------|----|-------|--------|-------|
| Toronto   | 29 | 1,395 | 39,419 | 6,057 |
| Osaka     | 27 | 450   | 7,747  | 1,115 |
| Glasgow   | 27 | 601   | 11,434 | 2,227 |
| Edinburgh | 28 | 1,454 | 33,944 | 5,028 |
NOTE: the number of photos for each city described in the paper is NOT available in this dataset
To compute the mean coordinates of all photos mapped to a POI, we need to look up the coordinates of each photo among the 100 million records. To speed up the search, first extract the photo ID, longitude and latitude columns from the whole dataset
cut -d $'\t' -f1,11,12 yfcc100m_dataset >> dataset.yfcc
and then import them into a database created with the following SQL script
CREATE DATABASE yfcc100m;
CREATE TABLE yfcc100m.tdata(
pv_id BIGINT UNSIGNED NOT NULL UNIQUE PRIMARY KEY, /* Photo/video identifier */
longitude FLOAT, /* Longitude */
latitude FLOAT /* Latitude */
);
COMMIT;
The Python script to import the data into the database looks like this:
import mysql.connector as db
def import_data(fname):
dbconnection = db.connect(user='USERNAME', password='PASSWORD')
cursor = dbconnection.cursor()
with open(fname, 'r') as f:
for line in f:
items = line.split('\t')
assert(len(items) == 3)
pv_id = items[0].strip()
lon = items[1].strip()
lat = items[2].strip()
if len(lon) == 0 or len(lat) == 0:
continue
sqlstr = 'INSERT INTO yfcc100m.tdata VALUES (' + pv_id + ', ' + lon + ', ' + lat + ')'
try:
cursor.execute(sqlstr)
except db.Error as error:
print('ERROR: {}'.format(error))
dbconnection.commit()
dbconnection.close()
The Python script to look up the coordinates of the photos looks like this:
import mysql.connector as db
def search_coords(fin, fout):
dbconnection = db.connect(user='USERNAME', password='PASSWORD', database='yfcc100m')
cursor = dbconnection.cursor()
with open(fout, 'w') as fo:
with open(fin, 'r') as fi:
for line in fi:
items = line.split(';')
assert(len(items) == 7)
photoID = items[0].strip()
sqlstr = 'SELECT longitude, latitude FROM tdata WHERE pv_id = ' + photoID
cursor.execute(sqlstr)
for longitude, latitude in cursor:
fo.write(photoID + ';' + str(longitude) + ';' + str(latitude) + '\n')
dbconnection.commit()
dbconnection.close()
The retrieved results are available and will be downloaded automatically by running the script src/ijcai15_setup.py.
For user $u$ with travel history $S_u$, POI $p$, arrival time $t_p^a$ and departure time $t_p^d$, define the average POI visit duration as
\begin{equation*}
\bar{V}(p) = \frac{1}{n}\sum_{u \in U}\sum_{p_x \in S_u} \left(t_{p_x}^d - t_{p_x}^a\right)\delta(p_x = p), \quad \forall p \in P
\end{equation*}
where $n$ is the number of visits to POI $p$; the popularity $Pop(p)$ is the total number of visits to $p$.
Define the (time-based) interest of user $u$ in POI category $c$ as
\begin{equation*}
Int_u^{Time}(c) = \sum_{p_x \in S_u} \frac{t_{p_x}^d - t_{p_x}^a}{\bar{V}(p_x)}\,\delta(Cat_{p_x} = c), \quad \forall c \in C
\end{equation*}
(the frequency-based variant simply counts the user's visits to POIs of category $c$).
Evaluation metrics: let $P_r$ be the set of POIs of the recommended trajectory and $P_v$ the set of POIs visited in the real-life travel sequence; then
\begin{equation*}
\text{Recall} = \frac{|P_r \cap P_v|}{|P_v|}, \qquad \text{Precision} = \frac{|P_r \cap P_v|}{|P_r|}, \qquad \text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{equation*}
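As a minimal illustration on toy data of how $\bar{V}(p)$ and $Int_u^{Time}(c)$ can be computed (the `calc_poi_info` and `calc_user_interest` functions below do the same on the training sequences):

```python
import pandas as pd

# Toy visit records: one row per (user, POI) visit with its duration in seconds.
visits = pd.DataFrame({
    'userID':           ['u1', 'u1', 'u2', 'u2'],
    'poiID':            [1, 2, 1, 2],
    'poiTheme':         ['Park', 'Museum', 'Park', 'Museum'],
    'poiDuration(sec)': [600.0, 1800.0, 1200.0, 900.0],
})

# Average POI visit duration: mean visit duration over all visits to each POI.
avg_duration = visits.groupby('poiID')['poiDuration(sec)'].mean()

# Time-based interest: sum of duration / avg_duration over the user's visits to
# POIs of each category (the frequency-based variant uses the visit count).
visits['timeRatio'] = visits['poiDuration(sec)'] / visits['poiID'].map(avg_duration)
time_interest = visits.groupby(['userID', 'poiTheme'])['timeRatio'].sum()
print(avg_duration, time_interest, sep='\n\n')
```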
The paper formulates the itinerary recommendation problem as an integer linear program (ILP) as follows.
Given a set of POIs, a time budget $B$ and the starting/ending POIs $p_1$/$p_N$, recommend a trajectory $(p_1,\dots,p_N)$ to user $u$ that

\begin{equation*}
\text{Maximize} \quad \sum_{i=2}^{N-1} \sum_{j=2}^{N} x_{i,j} \left(\eta\, Int(u, Cat_{p_i}) + (1-\eta)\, Pop(p_i)\right)
\end{equation*}

Subject to

\begin{align*}
\sum_{j=2}^N x_{1,j} &= \sum_{i=1}^{N-1} x_{i,N} = 1 && \text{(starts at $p_1$, ends at $p_N$)} \\
\sum_{i=1}^{N-1} x_{i,k} &= \sum_{j=2}^{N} x_{k,j} \le 1, \quad \forall k = 2,\dots,N-1 && \text{(connected; enter/leave $p_k$ at most once)} \\
q_i - q_j + 1 &\le (N-1)(1-x_{i,j}), \quad \forall i,j = 2,\dots,N && \text{(sub-tour elimination)} \\
\sum_{i=1}^{N-1} \sum_{j=2}^N x_{i,j} \left(Time(p_i, p_j) + Int(u, Cat_{p_j}) \cdot \bar{V}(p_j)\right) &\le B && \text{(time budget)}
\end{align*}

where $x_{i,j} \in \{0, 1\}$ indicates that POI $p_j$ is visited immediately after POI $p_i$, and $q_i \in \{2,\dots,N\}$ are auxiliary ordering variables used for sub-tour elimination.
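As a sanity check on the sub-tour elimination constraints (these are the standard Miller–Tucker–Zemlin constraints, realised in the code below by the dummy variables named `u`): whenever $x_{i,j} = 1$ the constraint reduces to
\begin{equation*}
q_j \ge q_i + 1,
\end{equation*}
so the ordering variables strictly increase along every selected edge among $p_2,\dots,p_N$. If a solution contained a closed loop $p_{i_1} \to p_{i_2} \to \dots \to p_{i_k} \to p_{i_1}$ that avoids $p_1$, summing this inequality around the loop would give $0 \ge k$ (the left-hand side telescopes to zero), a contradiction; hence only a single connected path from $p_1$ to $p_N$ can be selected.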
In [1]:
%matplotlib inline
import os
import re
import sys
import math
import pulp
import random
import pickle
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
In [2]:
speed = 4 # 4km/h
random.seed(123456789)
In [3]:
data_dir = 'data/data-ijcai15'
#fvisit = os.path.join(data_dir, 'userVisits-Osak.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Osak.csv')
#fvisit = os.path.join(data_dir, 'userVisits-Glas.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Glas.csv')
#fvisit = os.path.join(data_dir, 'userVisits-Edin.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Edin.csv')
fvisit = os.path.join(data_dir, 'userVisits-Toro.csv')
fcoord = os.path.join(data_dir, 'photoCoords-Toro.csv')
In [4]:
suffix = fvisit.split('-')[-1].split('.')[0]
frecseq = os.path.join(data_dir, 'reccommendSeq-' + suffix + '.pkl')
In [5]:
visits = pd.read_csv(fvisit, sep=';')
visits.head()
Out[5]:
In [6]:
coords = pd.read_csv(fcoord, sep=';')
coords.head()
Out[6]:
In [7]:
# merge data frames according to column 'photoID'
assert(visits.shape[0] == coords.shape[0])
traj = pd.merge(visits, coords, on='photoID')
traj.head()
Out[7]:
In [8]:
pd.DataFrame([traj[['photoLon', 'photoLat']].min(), traj[['photoLon', 'photoLat']].max(), \
traj[['photoLon', 'photoLat']].max() - traj[['photoLon', 'photoLat']].min()], \
index = ['min', 'max', 'range'])
Out[8]:
In [9]:
plt.figure(figsize=[15, 5])
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.scatter(traj['photoLon'], traj['photoLat'], marker='+')
Out[9]:
In [10]:
num_photo = traj['photoID'].unique().shape[0]
num_user = traj['userID'].unique().shape[0]
num_seq = traj['seqID'].unique().shape[0]
num_poi = traj['poiID'].unique().shape[0]
pd.DataFrame([num_photo, num_user, num_seq, num_poi, num_photo/num_user, num_seq/num_user], \
index = ['#photo', '#user', '#seq', '#poi', '#photo/user', '#seq/user'], columns=[str(suffix)])
Out[10]:
Compute POI (Longitude, Latitude) as the average coordinates of the assigned photos.
In [11]:
poi_coords = traj[['poiID', 'photoLon', 'photoLat']].groupby('poiID').agg(np.mean)
poi_coords.reset_index(inplace=True)
poi_coords.rename(columns={'photoLon':'poiLon', 'photoLat':'poiLat'}, inplace=True)
poi_coords.head()
Out[11]:
Extract POI category and visiting frequency.
In [12]:
poi_catfreq = traj[['poiID', 'poiTheme', 'poiFreq']].groupby('poiID').first()
poi_catfreq.reset_index(inplace=True)
poi_catfreq.head()
Out[12]:
In [13]:
poi_all = pd.merge(poi_catfreq, poi_coords, on='poiID')
poi_all.set_index('poiID', inplace=True)
poi_all.head()
Out[13]:
In [14]:
seq_all = traj[['userID', 'seqID', 'poiID', 'dateTaken']].copy()\
.groupby(['userID', 'seqID', 'poiID']).agg([np.min, np.max])
seq_all.head()
Out[14]:
In [15]:
seq_all.columns = seq_all.columns.droplevel()
seq_all.head()
Out[15]:
In [16]:
seq_all.reset_index(inplace=True)
seq_all.head()
Out[16]:
In [17]:
seq_all.rename(columns={'amin':'arrivalTime', 'amax':'departureTime'}, inplace=True)
seq_all['poiDuration(sec)'] = seq_all['departureTime'] - seq_all['arrivalTime']
seq_all.head()
Out[17]:
In [18]:
#tseq = seq_all[['poiID', 'poiDuration(sec)']].copy().groupby('poiID').agg(np.mean)
#tseq
In [19]:
seq_user = seq_all[['seqID', 'userID']].copy()
seq_user = seq_user.groupby('seqID').first()
seq_user.head()
Out[19]:
In [20]:
seq_len = seq_all[['userID', 'seqID', 'poiID']].copy()
seq_len = seq_len.groupby(['userID', 'seqID']).agg(np.size)
seq_len.reset_index(inplace=True)
seq_len.rename(columns={'poiID':'seqLen'}, inplace=True)
#seq_len.head()
ax = seq_len['seqLen'].hist(bins=20)
ax.set_yscale('log')
In [21]:
users = seq_all['userID'].unique()
transmat_time = pd.DataFrame(np.zeros((len(users), poi_all.index.shape[0]), dtype=np.float64), \
index=users, columns=poi_all.index)
In [22]:
poi_time = seq_all[['userID', 'poiID', 'poiDuration(sec)']].copy().groupby(['userID', 'poiID']).agg(np.sum)
poi_time.head()
Out[22]:
In [23]:
for idx in poi_time.index:
transmat_time.loc[idx[0], idx[1]] += poi_time.loc[idx].iloc[0]
print(transmat_time.shape)
transmat_time.head()
Out[23]:
In [24]:
# add 1 (sec) to each cell as a smooth factor
log10_transmat_time = np.log10(transmat_time.copy() + 1)
print(log10_transmat_time.shape)
log10_transmat_time.head()
Out[24]:
In [25]:
poi_cats = traj['poiTheme'].unique().tolist()
poi_cats.sort()
poi_cats
Out[25]:
In [26]:
ncats = len(poi_cats)
transmat_cat = pd.DataFrame(data=np.zeros((ncats, ncats), dtype=np.float64), index=poi_cats, columns=poi_cats)
In [29]:
for seqid in seq_all['seqID'].unique().tolist():
seqi = seq_all[seq_all['seqID'] == seqid].copy()
seqi.sort(columns=['arrivalTime'], ascending=True, inplace=True)
for j in range(len(seqi.index)-1):
idx1 = seqi.index[j]
idx2 = seqi.index[j+1]
poi1 = seqi.loc[idx1, 'poiID']
poi2 = seqi.loc[idx2, 'poiID']
cat1 = poi_all.loc[poi1, 'poiTheme']
cat2 = poi_all.loc[poi2, 'poiTheme']
transmat_cat.loc[cat1, cat2] += 1
transmat_cat
Out[29]:
Normalise each row to get an estimate of transition probabilities (MLE).
In [30]:
for r in transmat_cat.index:
rowsum = transmat_cat.ix[r].sum()
if rowsum == 0: continue # deal with lack of data
transmat_cat.loc[r] /= rowsum
transmat_cat
Out[30]:
Compute the log of transition probabilities with smooth factor $\epsilon=10^{-12}$.
In [31]:
log10_transmat_cat = np.log10(transmat_cat.copy() + 1e-12)
log10_transmat_cat
Out[31]:
A different leave-one-out cross-validation approach:
In [32]:
cv_seqs = seq_all[['userID', 'seqID', 'poiID']].copy().groupby(['userID', 'seqID']).agg(np.size)
cv_seqs.rename(columns={'poiID':'seqLen'}, inplace=True)
cv_seqs = cv_seqs[cv_seqs['seqLen'] > 2]
cv_seqs.reset_index(inplace=True)
print(cv_seqs.shape)
cv_seqs.head()
Out[32]:
In [33]:
cv_seq_set = []
In [34]:
# choose one sequence for each user in cv_seqs uniformly at random
for user in cv_seqs['userID'].unique():
seqlist = cv_seqs[cv_seqs['userID'] == user]['seqID'].tolist()
seqid = random.choice(seqlist)
cv_seq_set.append(seqid)
In [35]:
len(cv_seq_set)
Out[35]:
In [37]:
def calc_poi_info(seqid_set, seq_all, poi_all):
poi_info = seq_all[seq_all['seqID'].isin(seqid_set)][['poiID', 'poiDuration(sec)']].copy()
poi_info = poi_info.groupby('poiID').agg([np.mean, np.size])
poi_info.columns = poi_info.columns.droplevel()
poi_info.reset_index(inplace=True)
poi_info.rename(columns={'mean':'avgDuration(sec)', 'size':'popularity'}, inplace=True)
poi_info.set_index('poiID', inplace=True)
poi_info['poiTheme'] = poi_all.loc[poi_info.index, 'poiTheme']
poi_info['poiLon'] = poi_all.loc[poi_info.index, 'poiLon']
poi_info['poiLat'] = poi_all.loc[poi_info.index, 'poiLat']
return poi_info.copy()
In [38]:
def calc_user_interest(seqid_set, seq_all, poi_all, poi_info):
user_interest = seq_all[seq_all['seqID'].isin(seqid_set)][['userID', 'poiID', 'poiDuration(sec)']].copy()
user_interest['timeRatio'] = [poi_info.loc[x, 'avgDuration(sec)'] for x in user_interest['poiID']]
user_interest['timeRatio'] = user_interest['poiDuration(sec)'] / user_interest['timeRatio']
user_interest['poiTheme'] = [poi_all.loc[x, 'poiTheme'] for x in user_interest['poiID']]
user_interest.drop(['poiID', 'poiDuration(sec)'], axis=1, inplace=True)
user_interest = user_interest.groupby(['userID', 'poiTheme']).agg([np.sum, np.size]) # the sum
user_interest.columns = user_interest.columns.droplevel()
user_interest.rename(columns={'sum':'timeBased', 'size':'freqBased'}, inplace=True)
user_interest.reset_index(inplace=True)
user_interest.set_index(['userID', 'poiTheme'], inplace=True)
return user_interest.copy()
In [39]:
def calc_dist(longitude1, latitude1, longitude2, latitude2):
"""Calculate the distance (unit: km) between two places on earth"""
# convert degrees to radians
lon1 = math.radians(longitude1)
lat1 = math.radians(latitude1)
lon2 = math.radians(longitude2)
lat2 = math.radians(latitude2)
radius = 6371.009 # mean earth radius is 6371.009km, en.wikipedia.org/wiki/Earth_radius#Mean_radius
# The haversine formula, en.wikipedia.org/wiki/Great-circle_distance
dlon = math.fabs(lon1 - lon2)
dlat = math.fabs(lat1 - lat2)
return 2 * radius * math.asin(math.sqrt(\
(math.sin(0.5*dlat))**2 + math.cos(lat1) * math.cos(lat2) * (math.sin(0.5*dlon))**2 ))
In [40]:
def calc_dist_mat(poi_info):
poi_dist_mat = pd.DataFrame(data=np.zeros((poi_info.shape[0], poi_info.shape[0]), dtype=np.float64), \
index=poi_info.index, columns=poi_info.index)
for i in range(poi_info.index.shape[0]):
for j in range(i+1, poi_info.index.shape[0]):
r = poi_info.index[i]
c = poi_info.index[j]
dist = calc_dist(poi_info.loc[r, 'poiLon'], poi_info.loc[r, 'poiLat'], \
poi_info.loc[c, 'poiLon'], poi_info.loc[c, 'poiLat'])
assert(dist > 0.)
poi_dist_mat.loc[r, c] = dist
poi_dist_mat.loc[c, r] = dist
return poi_dist_mat
In [41]:
def calc_seq_budget(user, seq, poi_info, poi_dist_mat, user_interest):
"""Calculate the travel budget for the given travelling sequence"""
assert(len(seq) > 1)
budget = 0. # travel budget
for i in range(len(seq)-1):
px = seq[i]
py = seq[i+1]
assert(px in poi_info.index)
assert(py in poi_info.index)
budget += 60 * 60 * poi_dist_mat.loc[px, py] / speed # travel time (seconds)
caty = poi_info.loc[py, 'poiTheme']
avgtime = poi_info.loc[py, 'avgDuration(sec)']
userint = 0
if (user, caty) in user_interest.index: userint = user_interest.loc[user, caty] # for testing set
budget += userint * avgtime # expected visit duration
return budget
In [42]:
def recommend_ILP(user, budget, startPoi, endPoi, poi_info, poi_dist_mat, eta, speed, user_interest):
assert(0 <= eta <= 1); assert(budget > 0)
p0 = str(startPoi); pN = str(endPoi); N = poi_info.index.shape[0]
# REF: pythonhosted.org/PuLP/index.html
pois = [str(p) for p in poi_info.index] # create a string list for each POI
prob = pulp.LpProblem('TourRecommendation', pulp.LpMaximize) # create problem
# visit_i_j = 1 means POI i and j are visited in sequence
visit_vars = pulp.LpVariable.dicts('visit', (pois, pois), 0, 1, pulp.LpInteger)
# a dictionary contains all dummy variables
dummy_vars = pulp.LpVariable.dicts('u', [x for x in pois if x != p0], 2, N, pulp.LpInteger)
# add objective
objlist = []
for pi in [x for x in pois if x not in {p0, pN}]:
for pj in [y for y in pois if y != p0]:
cati = poi_info.loc[int(pi), 'poiTheme']
userint = 0; poipop = 0
if (user, cati) in user_interest.index: userint = user_interest.loc[user, cati]
if int(pi) in poi_info.index: poipop = poi_info.loc[int(pi), 'popularity']
objlist.append(visit_vars[pi][pj] * (eta * userint + (1.-eta) * poipop))
prob += pulp.lpSum(objlist), 'Objective'
# add constraints, each constraint should be in ONE line
prob += pulp.lpSum([visit_vars[p0][pj] for pj in pois if pj != p0]) == 1, 'StartAtp0'
prob += pulp.lpSum([visit_vars[pi][pN] for pi in pois if pi != pN]) == 1, 'EndAtpN'
for pk in [x for x in pois if x not in {p0, pN}]:
prob += pulp.lpSum([visit_vars[pi][pk] for pi in pois if pi != pN]) == \
pulp.lpSum([visit_vars[pk][pj] for pj in pois if pj != p0]), 'Connected_' + pk
prob += pulp.lpSum([visit_vars[pi][pk] for pi in pois if pi != pN]) <= 1, 'LeaveAtMostOnce_' + pk
prob += pulp.lpSum([visit_vars[pk][pj] for pj in pois if pj != p0]) <= 1, 'EnterAtMostOnce_' + pk
costlist = []
for pi in [x for x in pois if x != pN]:
for pj in [y for y in pois if y != p0]:
catj = poi_info.loc[int(pj), 'poiTheme']
traveltime = 60 * 60 * poi_dist_mat.loc[int(pi), int(pj)] / speed # seconds
userint = 0; avgtime = 0
if (user, catj) in user_interest.index: userint = user_interest.loc[user, catj]
if int(pj) in poi_info.index: avgtime = poi_info.loc[int(pj), 'avgDuration(sec)']
costlist.append(visit_vars[pi][pj] * (traveltime + userint * avgtime))
prob += pulp.lpSum(costlist) <= budget, 'WithinBudget'
for pi in [x for x in pois if x != p0]:
for pj in [y for y in pois if y != p0]:
prob += dummy_vars[pi] - dummy_vars[pj] + 1 <= (N - 1) * (1 - visit_vars[pi][pj]), \
'SubTourElimination_' + str(pi) + '_' + str(pj)
# solve problem
#prob.solve() # using PuLP's default solver
#prob.solve(pulp.PULP_CBC_CMD(options=['-threads', '8', '-strategy', '1', '-maxIt', '2000000'])) # CBC
#prob.solve(pulp.GLPK_CMD()) # GLPK
gurobi_options = [('TimeLimit', '7200'), ('Threads', '18'), ('NodefileStart', '0.9'), ('Cuts', '2')]
prob.solve(pulp.GUROBI_CMD(options=gurobi_options)) # GUROBI
print('status:', pulp.LpStatus[prob.status]) # print the status of the solution
#print('obj:', pulp.value(prob.objective)) # print the optimised objective function value
#for v in prob.variables(): # print each variable with it's resolved optimum value
# print(v.name, '=', v.varValue)
# if v.varValue != 0: print(v.name, '=', v.varValue)
visit_mat = pd.DataFrame(data=np.zeros((len(pois), len(pois)), dtype=np.float), index=pois, columns=pois)
for pi in pois:
for pj in pois: visit_mat.loc[pi, pj] = visit_vars[pi][pj].varValue
# build the recommended trajectory
recseq = [p0]
while True:
pi = recseq[-1]
pj = visit_mat.loc[pi].idxmax()
assert(round(visit_mat.loc[pi, pj]) == 1)
recseq.append(pj);
#print(recseq); sys.stdout.flush()
if pj == pN: return [int(x) for x in recseq]
In [43]:
cv_seq_dict = dict()
rec_seq_dict = dict()
In [46]:
for seqid in cv_seq_set:
seqi = seq_all[seq_all['seqID'] == seqid].copy()
seqi.sort(columns=['arrivalTime'], ascending=True, inplace=True)
cv_seq_dict[seqid] = seqi['poiID'].tolist()
In [47]:
eta = 0.5
time_based = True
In [48]:
doCompute = True
In [49]:
if os.path.exists(frecseq):
seq_dict = pickle.load(open(frecseq, 'rb'))
if (np.array(sorted(cv_seq_dict.keys())) == np.array(sorted(seq_dict.keys()))).all():
rec_seq_dict = seq_dict
doCompute = False
In [50]:
if doCompute:
n = 1
print('#sequences', len(cv_seq_set))
for seqid, seq in cv_seq_dict.items():
train_set = [x for x in seq_all['seqID'].unique() if x != seqid]
poi_info = calc_poi_info(train_set, seq_all, poi_all)
user_interest = calc_user_interest(train_set, seq_all, poi_all, poi_info)
poi_dist_mat = calc_dist_mat(poi_info)
user = seq_user.loc[seqid].iloc[0]
the_user_interest = None
if time_based == True: the_user_interest = user_interest['timeBased'].copy()
else: the_user_interest = user_interest['freqBased'].copy()
budget = calc_seq_budget(user, seq, poi_info, poi_dist_mat, the_user_interest)
print(n, 'sequence', seq, ', user', user, ', budget', budget); sys.stdout.flush()
recseq = recommend_ILP(user, budget, seq[0], seq[-1], poi_info, poi_dist_mat, eta, speed, the_user_interest)
rec_seq_dict[seqid] = recseq
print('->', recseq, '\n'); sys.stdout.flush()
n += 1
pickle.dump(rec_seq_dict, open(frecseq, 'wb'))
Results from the paper (Toronto data, time-based user interest, eta=0.5):
In [51]:
def calc_recall_precision_F1score(seq_act, seq_rec):
assert(len(seq_act) > 0)
assert(len(seq_rec) > 0)
actset = set(seq_act)
recset = set(seq_rec)
intersect = actset & recset
recall = len(intersect) / len(seq_act)
precision = len(intersect) / len(seq_rec)
F1score = 2. * precision * recall / (precision + recall)
return recall, precision, F1score
In [52]:
recall = []
precision = []
F1score = []
In [53]:
for seqid in rec_seq_dict.keys():
assert(seqid in cv_seq_dict)
seq = cv_seq_dict[seqid]
recseq = rec_seq_dict[seqid]
r, p, F1 = calc_recall_precision_F1score(seq, recseq)
recall.append(r)
precision.append(p)
F1score.append(F1)
In [54]:
print('Recall:', np.mean(recall), np.std(recall))
print('Precision:', np.mean(precision), np.std(precision))
print('F1-score:', np.mean(F1score), np.std(F1score))
The paper states: "We evaluate PERSTOUR and the baselines using leave-one-out cross-validation [Kohavi, 1995] (i.e., when evaluating a specific travel sequence of a user, we use this user's other travel sequences for training our algorithms)."
It is not entirely clear whether, when evaluating a travel sequence of a user, only that user's other sequences are used for training, or all other sequences in the dataset.
Trajectories with at least three POIs are used in the paper.
In [558]:
seq_ge3 = seq_len[seq_len['seqLen'] >= 3]
seq_ge3['seqLen'].hist(bins=20)
Out[558]:
Split the travelling sequences into a training set and a testing set using leave-one-out for each user.
For testing purposes, users with fewer than two travelling sequences are not considered in this experiment.
In [559]:
train_set = []
test_set = []
In [560]:
user_seqs = seq_ge3[['userID', 'seqID']].groupby('userID')
In [561]:
for user, indices in user_seqs.groups.items():
if len(indices) < 2: continue
idx = random.choice(indices)
test_set.append(seq_ge3.loc[idx, 'seqID'])
train_set.extend([seq_ge3.loc[x, 'seqID'] for x in indices if x != idx])
In [562]:
print('#seq in trainset:', len(train_set))
print('#seq in testset:', len(test_set))
seq_ge3[seq_ge3['seqID'].isin(train_set)]['seqLen'].hist(bins=20)
#data = np.array(seqs1['seqLen'])
#hist, bins = np.histogram(data, bins=3)
#print(hist)
Out[562]:
In [563]:
seq_exp = seq_ge3[['userID', 'seqID']].copy()
seq_exp = seq_exp.groupby('userID').agg(np.size)
seq_exp.reset_index(inplace=True)
seq_exp.rename(columns={'seqID':'#seq'}, inplace=True)
seq_exp = seq_exp[seq_exp['#seq'] > 1] # user with more than 1 sequences
print('total #seq for experiment:', seq_exp['#seq'].sum())
#seq_exp.head()
Compute the average POI visit duration and POI popularity as defined at the top of the notebook.
In [564]:
poi_info = seq_all[seq_all['seqID'].isin(train_set)]
poi_info = poi_info[['poiID', 'poiDuration(sec)']].copy()
In [565]:
poi_info = poi_info.groupby('poiID').agg([np.mean, np.size])
poi_info.columns = poi_info.columns.droplevel()
poi_info.reset_index(inplace=True)
poi_info.rename(columns={'mean':'avgDuration(sec)', 'size':'popularity'}, inplace=True)
poi_info.set_index('poiID', inplace=True)
print('#poi:', poi_info.shape[0])
if poi_info.shape[0] < poi_all.shape[0]:
extra_index = list(set(poi_all.index) - set(poi_info.index))
extra_poi = pd.DataFrame(data=np.zeros((len(extra_index), 2), dtype=np.float64), \
index=extra_index, columns=['avgDuration(sec)', 'popularity'])
poi_info = poi_info.append(extra_poi)
print('#poi after extension:', poi_info.shape[0])
poi_info['poiTheme'] = poi_all.loc[poi_info.index, 'poiTheme']
poi_info['poiLon'] = poi_all.loc[poi_info.index, 'poiLon']
poi_info['poiLat'] = poi_all.loc[poi_info.index, 'poiLat']
poi_info.head()
Out[565]:
Compute the time/frequency-based user interest as defined at the top of the notebook.
In [567]:
user_interest = seq_all[seq_all['seqID'].isin(train_set)]
user_interest = user_interest[['userID', 'poiID', 'poiDuration(sec)']].copy()
In [568]:
user_interest['timeRatio'] = [poi_info.loc[x, 'avgDuration(sec)'] for x in user_interest['poiID']]
#user_interest[user_interest['poiID'].isin({9, 10, 12, 18, 20, 26})]
#user_interest[user_interest['timeRatio'] < 1]
user_interest.head()
Out[568]:
In [569]:
user_interest['timeRatio'] = user_interest['poiDuration(sec)'] / user_interest['timeRatio']
user_interest.head()
Out[569]:
In [570]:
user_interest['poiTheme'] = [poi_all.loc[x, 'poiTheme'] for x in user_interest['poiID']]
user_interest.drop(['poiID', 'poiDuration(sec)'], axis=1, inplace=True)
The paper defines the time-based user interest as a sum of time ratios, but the resulting expected visit duration $Int(u, Cat_p) \cdot \bar{V}(p)$ then becomes unrealistically large in some cases; switch between the sum and the mean below to compare their effects.
In [571]:
#user_interest = user_interest.groupby(['userID', 'poiTheme']).agg([np.sum, np.size]) # the sum
user_interest = user_interest.groupby(['userID', 'poiTheme']).agg([np.mean, np.size]) # try the mean value
In [572]:
user_interest.columns = user_interest.columns.droplevel()
#user_interest.rename(columns={'sum':'timeBased', 'size':'freqBased'}, inplace=True)
user_interest.rename(columns={'mean':'timeBased', 'size':'freqBased'}, inplace=True)
user_interest.reset_index(inplace=True)
user_interest.set_index(['userID', 'poiTheme'], inplace=True)
user_interest.head()
Out[572]:
In [573]:
#user_interest.columns.shape[0]
In [575]:
poi_dist_mat = pd.DataFrame(data=np.zeros((poi_info.shape[0], poi_info.shape[0]), dtype=np.float64), \
index=poi_info.index, columns=poi_info.index)
for i in range(poi_info.index.shape[0]):
for j in range(i+1, poi_info.index.shape[0]):
r = poi_info.index[i]
c = poi_info.index[j]
dist = calc_dist(poi_info.loc[r, 'poiLon'], poi_info.loc[r, 'poiLat'], \
poi_info.loc[c, 'poiLon'], poi_info.loc[c, 'poiLat'])
assert(dist > 0.)
poi_dist_mat.loc[r, c] = dist
poi_dist_mat.loc[c, r] = dist
In [577]:
def generate_ILP(lpFilename, user, budget, startPoi, endPoi, poi_info, poi_dist_mat, eta, speed, user_interest):
"""Recommend a trajectory given an existing travel sequence S_N,
the first/last POI and travel budget calculated based on S_N
"""
assert(0 <= eta <= 1)
assert(budget > 0)
p0 = str(startPoi)
pN = str(endPoi)
N = poi_info.index.shape[0]
# The MIP problem
# REF: pythonhosted.org/PuLP/index.html
# create a string list for each POI
pois = [str(p) for p in poi_info.index]
# create problem
prob = pulp.LpProblem('TourRecommendation', pulp.LpMaximize)
# visit_i_j = 1 means POI i and j are visited in sequence
visit_vars = pulp.LpVariable.dicts('visit', (pois, pois), 0, 1, pulp.LpInteger)
# a dictionary contains all dummy variables
dummy_vars = pulp.LpVariable.dicts('u', [x for x in pois if x != p0], 2, N, pulp.LpInteger)
# add objective
objlist = []
for pi in [x for x in pois if x not in {p0, pN}]:
for pj in [y for y in pois if y != p0]:
cati = poi_info.loc[int(pi), 'poiTheme']
userint = 0
if (user, cati) in user_interest.index: userint = user_interest.loc[user, cati]
objlist.append(visit_vars[pi][pj] * (eta * userint + (1.-eta) * poi_info.loc[int(pi), 'popularity']))
prob += pulp.lpSum(objlist), 'Objective'
# add constraints
# each constraint should be in ONE line
prob += pulp.lpSum([visit_vars[p0][pj] for pj in pois if pj != p0]) == 1, 'StartAtp0' # starts at the first POI
prob += pulp.lpSum([visit_vars[pi][pN] for pi in pois if pi != pN]) == 1, 'EndAtpN' # ends at the last POI
for pk in [x for x in pois if x not in {p0, pN}]:
prob += pulp.lpSum([visit_vars[pi][pk] for pi in pois if pi != pN]) == \
pulp.lpSum([visit_vars[pk][pj] for pj in pois if pj != p0]), \
'Connected_' + pk # the itinerary is connected
prob += pulp.lpSum([visit_vars[pi][pk] for pi in pois if pi != pN]) <= 1, \
'LeaveAtMostOnce_' + pk # LEAVE POIk at most once
prob += pulp.lpSum([visit_vars[pk][pj] for pj in pois if pj != p0]) <= 1, \
'EnterAtMostOnce_' + pk # ENTER POIk at most once
# travel cost within budget
costlist = []
for pi in [x for x in pois if x != pN]:
for pj in [y for y in pois if y != p0]:
catj = poi_info.loc[int(pj), 'poiTheme']
traveltime = 60 * 60 * poi_dist_mat.loc[int(pi), int(pj)] / speed # seconds
userint = 0
if (user, catj) in user_interest.index: userint = user_interest.loc[user, catj]
costlist.append(visit_vars[pi][pj] * (traveltime + userint * poi_info.loc[int(pj), 'avgDuration(sec)']))
prob += pulp.lpSum(costlist) <= budget, 'WithinBudget'
for pi in [x for x in pois if x != p0]:
for pj in [y for y in pois if y != p0]:
prob += dummy_vars[pi] - dummy_vars[pj] + 1 <= \
(N - 1) * (1 - visit_vars[pi][pj]), \
'SubTourElimination_' + str(pi) + '_' + str(pj) # TSP sub-tour elimination
# write problem data to an .lp file
prob.writeLP(lpFilename)
In [88]:
def extract_seq(seqid_set, seq_all):
"""Extract the actual sequences (i.e. a list of POI) from a set of sequence ID"""
seq_dict = dict()
for seqid in seqid_set:
seqi = seq_all[seq_all['seqID'] == seqid].copy()
seqi.sort(columns=['arrivalTime'], ascending=True, inplace=True)
seq_dict[seqid] = seqi['poiID'].tolist()
return seq_dict
In [579]:
train_seqs = extract_seq(train_set, seq_all)
In [580]:
lpDir = os.path.join(data_dir, 'lp_' + suffix)
if not os.path.exists(lpDir):
print('Please create directory "' + lpDir + '"')
In [581]:
eta = 0.5
#eta = 1
time_based = True
In [585]:
for seqid in sorted(train_seqs.keys()):
if not os.path.exists(lpDir):
print('Please create directory "' + lpDir + '"')
break
seq = train_seqs[seqid]
lpFile = os.path.join(lpDir, str(seqid) + '.lp')
user = seq_user.loc[seqid].iloc[0]
the_user_interest = None
if time_based == True:
the_user_interest = user_interest['timeBased'].copy()
else:
the_user_interest = user_interest['freqBased'].copy()
budget = calc_seq_budget(user, seq, poi_info, poi_dist_mat, the_user_interest)
print('generating ILP', lpFile, 'for user', user, 'sequence', seq, 'budget', round(budget, 2))
generate_ILP(lpFile, user, budget, seq[0], seq[-1], poi_info, poi_dist_mat, eta, speed, the_user_interest)
In [586]:
test_seqs = extract_seq(test_set, seq_all)
In [587]:
for seqid in sorted(test_seqs.keys()):
if not os.path.exists(lpDir):
print('Please create directory "' + lpDir + '"')
break
seq = test_seqs[seqid]
lpFile = os.path.join(lpDir, str(seqid) + '.lp')
user = seq_user.loc[seqid].iloc[0]
the_user_interest = None
if time_based == True:
the_user_interest = user_interest['timeBased'].copy()
else:
the_user_interest = user_interest['freqBased'].copy()
budget = calc_seq_budget(user, seq, poi_info, poi_dist_mat, the_user_interest)
print('generating ILP', lpFile, 'for user', user, 'sequence', seq, 'budget', round(budget, 2))
generate_ILP(lpFile, user, budget, seq[0], seq[-1], poi_info, poi_dist_mat, eta, speed, the_user_interest)
In [588]:
def load_solution_gurobi(fsol, startPoi, endPoi):
"""Load recommended itinerary from MIP solution file by GUROBI"""
seqterm = []
with open(fsol, 'r') as f:
for line in f:
if re.search('^visit_', line): # e.g. visit_0_7 1\n
item = line.strip().split(' ') # visit_21_16 1.56406801399038e-09\n
if round(float(item[1])) == 1:
fromto = item[0].split('_')
seqterm.append((int(fromto[1]), int(fromto[2])))
p0 = startPoi
pN = endPoi
recseq = [p0]
while True:
px = recseq[-1]
for term in seqterm:
if term[0] == px:
recseq.append(term[1])
if term[1] == pN:
return recseq
else:
seqterm.remove(term)
break
In [590]:
train_seqs_rec = dict()
In [591]:
solDir = os.path.join(data_dir, os.path.join('lp_' + suffix, 'eta05_time'))
#solDir = os.path.join(data_dir, os.path.join('lp_' + suffix, 'eta10_time'))
if not os.path.exists(solDir):
print('Directory for solution files', solDir, 'does not exist.')
In [592]:
for seqid in sorted(train_seqs.keys()):
if not os.path.exists(solDir):
print('Directory for solution files', solDir, 'does not exist.')
break
seq = train_seqs[seqid]
solFile = os.path.join(solDir, str(seqid) + '.lp.sol')
recseq = load_solution_gurobi(solFile, seq[0], seq[-1])
train_seqs_rec[seqid] = recseq
print('Sequence', seqid, 'Actual:', seq, ', Recommended:', recseq)
In [593]:
recall = []
precision = []
F1score = []
for seqid in train_seqs.keys():
r, p, F1 = calc_recall_precision_F1score(train_seqs[seqid], train_seqs_rec[seqid])
recall.append(r)
precision.append(p)
F1score.append(F1)
In [594]:
print('Recall:', round(np.mean(recall), 2), ',', round(np.std(recall), 2))
print('Precision:', round(np.mean(precision), 2), ',', round(np.std(recall), 2))
print('F1-score:', round(np.mean(F1score), 2), ',', round(np.std(recall), 2))
Results from the paper (Toronto data, time-based user interest, eta=0.5):
In [595]:
test_seqs_rec = dict()
In [596]:
solDirTest = os.path.join(data_dir, os.path.join('lp_' + suffix, 'eta05_time.test'))
if not os.path.exists(solDirTest):
print('Directory for solution files', solDirTest, 'does not exist.')
In [597]:
for seqid in sorted(test_seqs.keys()):
if not os.path.exists(solDirTest):
print('Directory for solution files', solDirTest, 'does not exist.')
break
seq = test_seqs[seqid]
solFile = os.path.join(solDirTest, str(seqid) + '.lp.sol')
recseq = load_solution_gurobi(solFile, seq[0], seq[-1])
test_seqs_rec[seqid] = recseq
print('Sequence', seqid, 'Actual:', seq, ', Recommended:', recseq)
In [598]:
recallT = []
precisionT = []
F1scoreT = []
for seqid in test_seqs.keys():
r, p, F1 = calc_recall_precision_F1score(test_seqs[seqid], test_seqs_rec[seqid])
recallT.append(r)
precisionT.append(p)
F1scoreT.append(F1)
In [599]:
print('Recall:', round(np.mean(recallT), 2), ',', round(np.std(recallT), 2))
print('Precision:', round(np.mean(precisionT), 2), ',', round(np.std(recallT), 2))
print('F1-score:', round(np.mean(F1scoreT), 2), ',', round(np.std(recallT), 2))
A large travel budget leads to unrealistic recommended trajectories.
Is it necessary to consider visiting a POI more than once? The paper ignores this setting.
Dealing with the edge case $\bar{V}(p) = 0$.
This occurs for POIs where every visiting user took just one photo (or uploaded two or more photos sharing the same timestamp), so the recorded visit duration is zero; such cases do appear in this dataset (see the sketch after the definitions below).
For all users $U$, POI $p$, arrival time $t_p^a$ and departure time $t_p^d$, the average POI visit duration is defined as $\bar{V}(p) = \frac{1}{n}\sum_{u \in U}\sum_{p_x \in S_u}(t_{p_x}^d - t_{p_x}^a)\,\delta(p_x = p), \forall p \in P$,
and the time-based user interest is defined as $Int_u^{Time}(c) = \sum_{p_x \in S_u} \frac{t_{p_x}^d - t_{p_x}^a}{\bar{V}(p_x)}\,\delta(Cat_{p_x} = c), \forall c \in C$.
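A minimal sketch of one possible guard (an assumption on my part, not something prescribed by the paper): clamp the average duration to a small positive floor before computing the time ratio, so the division in $Int_u^{Time}(c)$ never hits zero.

```python
import pandas as pd

# Toy data with a degenerate POI: every visit to POI 3 lasted 0 seconds,
# so its average visit duration is 0 and the time ratio would divide by zero.
poi_avg = pd.Series({2: 1200.0, 3: 0.0}, name='avgDuration(sec)')
visits = pd.DataFrame({'userID': ['u1', 'u1'],
                       'poiID': [2, 3],
                       'poiDuration(sec)': [600.0, 0.0]})

# Clamp the average duration to a 1-second floor (an arbitrary choice) before
# dividing, so timeRatio stays well defined for such POIs.
avg = visits['poiID'].map(poi_avg).clip(lower=1.0)
visits['timeRatio'] = visits['poiDuration(sec)'] / avg
print(visits)
```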
Up to now, two strategies have been tried for solving the ILP: solving it directly from the notebook via PuLP, and writing .lp files to be solved offline by Gurobi (see above).
CBC is too slow for large sequences (length >= 4).