Trajectory Simulation

NOTE: Before running this notebook, run the script src/ijcai15_setup.py to set up the data.

1. Experimental Setup

The states of the Markov chain (MC) correspond to the categories of POIs. There is also a special state "REST", which represents people resting after some travelling.

Trajectories are simulated using the transition matrix of the MC. When choosing a specific POI within a given category, one of the following rules is used:

The Nearest Neighbor of the current POI
The most Popular POI
A random POI chosen with probability proportional to the reciprocal of its distance to the current POI
A random POI chosen with probability proportional to its popularity
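
As a minimal sketch of the category-level simulation described above, the snippet below samples a path of states from a small Markov chain. The three states and all probabilities in `P` are made up for illustration and are not taken from the data; `simulate_categories` is a hypothetical helper, not part of this notebook.

```python
import numpy as np

# Toy 3-state chain; states and probabilities are illustrative assumptions.
states = ['Cultural', 'Sport', 'REST']
P = np.array([
    [0.1, 0.2, 0.7],   # transitions out of Cultural
    [0.3, 0.1, 0.6],   # transitions out of Sport
    [0.2, 0.2, 0.6],   # transitions out of REST
])

def simulate_categories(P, states, start, n_steps, rng):
    """Return a list of n_steps + 1 state names sampled from the chain."""
    path = [start]
    cur = states.index(start)
    for _ in range(n_steps):
        # draw the next state index from the row of the current state
        cur = int(rng.choice(len(states), p=P[cur]))
        path.append(states[cur])
    return path

rng = np.random.default_rng(0)
path = simulate_categories(P, states, 'Cultural', 10, rng)
print(' -> '.join(path))
```

Choosing a concrete POI within each sampled category is a separate step, handled by the rules listed above.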

1.1 Definitions

For user $u$ and POI $p$, define

Travel History: \begin{equation*} S_u = \{(p_1, t_{p_1}^a, t_{p_1}^d), \dots, (p_n, t_{p_n}^a, t_{p_n}^d)\} \end{equation*} where $t_{p_i}^a$ is the arrival time and $t_{p_i}^d$ is the departure time of user $u$ at POI $p_i$.
Travel Sequences: split $S_u$ between consecutive visits $p_i$ and $p_{i+1}$ if \begin{equation*} |t_{p_i}^d - t_{p_{i+1}}^a| > \tau ~(\text{e.g.}~ \tau = 8 ~\text{hours}) \end{equation*}
POI Popularity: \begin{equation*} Pop(p) = \sum_{u \in U} \sum_{p_i \in S_u} \delta(p_i = p) \end{equation*} where $\delta(\cdot)$ is 1 if its argument is true and 0 otherwise.
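
The definitions above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: visits are `(poi, arrival, departure)` tuples with times in seconds, each history is already sorted by arrival time, and `split_history` and `popularity` are hypothetical helper names that do not appear elsewhere in this notebook.

```python
from collections import Counter

TAU = 8 * 60 * 60  # tau = 8 hours, expressed in seconds

def split_history(history, tau=TAU):
    """Split a time-ordered list of (poi, arrival, departure) visits into
    sequences wherever the gap between one departure and the next arrival
    exceeds tau."""
    sequences, current = [], []
    for visit in history:
        if current and abs(visit[1] - current[-1][2]) > tau:
            sequences.append(current)
            current = []
        current.append(visit)
    if current:
        sequences.append(current)
    return sequences

def popularity(histories):
    """Count how often each POI appears over all users' travel histories."""
    counts = Counter()
    for history in histories:
        counts.update(poi for poi, _, _ in history)
    return counts

# Toy history: two visits close in time, then a third one 9 hours later.
history = [(6, 0, 600), (13, 1200, 3000), (30, 3000 + 9 * 3600, 3000 + 10 * 3600)]
seqs = split_history(history)
pop = popularity([history])
print([[poi for poi, _, _ in s] for s in seqs])  # [[6, 13], [30]]
```

In the notebook itself the sequences are not recomputed; the `seqID` column of the input data already encodes them.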

1.2 Load Trajectory Data



In [372]:

    
%matplotlib inline

import os
import math
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime



In [373]:

    
random.seed(123456789)
np.random.seed(123456789)  # np.random.multinomial is used below, so seed NumPy as well



In [374]:

    
data_dir = 'data/data-ijcai15'
#fvisit = os.path.join(data_dir, 'userVisits-Osak.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Osak.csv')
#fvisit = os.path.join(data_dir, 'userVisits-Glas.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Glas.csv')
#fvisit = os.path.join(data_dir, 'userVisits-Edin.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Edin.csv')
fvisit = os.path.join(data_dir, 'userVisits-Toro.csv')
fcoord = os.path.join(data_dir, 'photoCoords-Toro.csv')



In [375]:

    
suffix = fvisit.split('-')[-1].split('.')[0]



In [376]:

    
visits = pd.read_csv(fvisit, sep=';')
visits.head()









    Out[376]:

	photoID	userID	dateTaken	poiID	poiTheme	poiFreq	seqID
0	7941504100	10007579@N00	1346844688	30	Structure	1538	1
1	4886005532	10012675@N05	1142731848	6	Cultural	986	2
2	4886006468	10012675@N05	1142732248	6	Cultural	986	2
3	4885404441	10012675@N05	1142732373	6	Cultural	986	2
4	4886008334	10012675@N05	1142732445	6	Cultural	986	2



In [377]:

    
coords = pd.read_csv(fcoord, sep=';')
coords.head()



In [378]:

    
# merge data frames according to column 'photoID'
assert(visits.shape[0] == coords.shape[0])
traj = pd.merge(visits, coords, on='photoID')
traj.head()









    Out[378]:

	photoID	userID	dateTaken	poiID	poiTheme	poiFreq	seqID	photoLon	photoLat
0	7941504100	10007579@N00	1346844688	30	Structure	1538	1	-79.380844	43.645641
1	4886005532	10012675@N05	1142731848	6	Cultural	986	2	-79.391525	43.654335
2	4886006468	10012675@N05	1142732248	6	Cultural	986	2	-79.391525	43.654335
3	4885404441	10012675@N05	1142732373	6	Cultural	986	2	-79.391525	43.654335
4	4886008334	10012675@N05	1142732445	6	Cultural	986	2	-79.391525	43.654335



In [379]:

    
num_photo = traj['photoID'].unique().shape[0]
num_user = traj['userID'].unique().shape[0]
num_seq = traj['seqID'].unique().shape[0]
num_poi = traj['poiID'].unique().shape[0]
pd.DataFrame([num_photo, num_user, num_seq, num_poi, num_photo/num_user, num_seq/num_user], \
             index = ['#photo', '#user', '#seq', '#poi', '#photo/user', '#seq/user'], columns=[str(suffix)])









    Out[379]:

	Toro
#photo	39419.000000
#user	1395.000000
#seq	6057.000000
#poi	29.000000
#photo/user	28.257348
#seq/user	4.341935

1.3 Compute POI Info

Compute POI (Longitude, Latitude) as the average coordinates of the assigned photos.



In [380]:

    
poi_coords = traj[['poiID', 'photoLon', 'photoLat']].groupby('poiID').mean()
poi_coords.reset_index(inplace=True)
poi_coords.rename(columns={'photoLon':'poiLon', 'photoLat':'poiLat'}, inplace=True)
poi_coords.head()

Extract POI category and visiting frequency.



In [381]:

    
poi_catfreq = traj[['poiID', 'poiTheme', 'poiFreq']].groupby('poiID').first()
poi_catfreq.reset_index(inplace=True)
poi_catfreq.head()



In [382]:

    
poi_all = pd.merge(poi_catfreq, poi_coords, on='poiID')
poi_all.set_index('poiID', inplace=True)
poi_all.head()

1.4 Construct Travelling Sequences



In [383]:

    
seq_all = traj[['userID', 'seqID', 'poiID', 'dateTaken']].copy()\
          .groupby(['userID', 'seqID', 'poiID']).agg(['min', 'max'])
seq_all.columns = seq_all.columns.droplevel()
seq_all.reset_index(inplace=True)
# with string aggregators the resulting columns are named 'min' and 'max'
seq_all.rename(columns={'min':'arrivalTime', 'max':'departureTime'}, inplace=True)
seq_all['poiDuration(sec)'] = seq_all['departureTime'] - seq_all['arrivalTime']
seq_all.head()









    Out[383]:

	userID	seqID	poiID	arrivalTime	departureTime	poiDuration(sec)
0	10007579@N00	1	30	1346844688	1346844688	0
1	10012675@N05	2	6	1142731848	1142732445	597
2	10012675@N05	3	6	1142916492	1142916492	0
3	10012675@N05	4	13	1319327174	1319332848	5674
4	10014440@N06	5	24	1196128621	1196128878	257



In [384]:

    
seq_start = seq_all[['userID', 'seqID', 'arrivalTime']].copy().groupby(['userID', 'seqID']).min()
seq_start.rename(columns={'arrivalTime':'startTime'}, inplace=True)
seq_start.reset_index(inplace=True)
seq_start.head()









    Out[384]:

	userID	seqID	startTime
0	10007579@N00	1	1346844688
1	10012675@N05	2	1142731848
2	10012675@N05	3	1142916492
3	10012675@N05	4	1319327174
4	10014440@N06	5	1196128621



In [385]:

    
seq_end = seq_all[['userID', 'seqID', 'departureTime']].copy().groupby(['userID', 'seqID']).max()
seq_end.rename(columns={'departureTime':'endTime'}, inplace=True)
seq_end.reset_index(inplace=True)
seq_end.head()









    Out[385]:

	userID	seqID	endTime
0	10007579@N00	1	1346844688
1	10012675@N05	2	1142732445
2	10012675@N05	3	1142916492
3	10012675@N05	4	1319332848
4	10014440@N06	5	1196128878



In [386]:

    
assert(seq_start.shape[0] == seq_end.shape[0])
user_seqs = pd.merge(seq_start, seq_end, on=['userID', 'seqID'])
user_seqs.head()
#user_seqs.loc[0, 'seqID']
#user_seqs['userID'].iloc[-1]









    Out[386]:

	userID	seqID	startTime	endTime
0	10007579@N00	1	1346844688	1346844688
1	10012675@N05	2	1142731848	1142732445
2	10012675@N05	3	1142916492	1142916492
3	10012675@N05	4	1319327174	1319332848
4	10014440@N06	5	1196128621	1196128878

1.5 POI Category Transition Matrix

Generate the extended transition matrix of POI categories from the actual trajectories, with a special category REST.
For each user, a POI-to-REST transition is added after every sequence that is followed by a later sequence. If the time gap between the two sequences is less than 'timeGap' (e.g. 24 hours), a REST-to-POI transition into the first POI of the later sequence is added; otherwise, a REST-to-REST transition is added.
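
As a toy illustration of this counting rule, the sketch below applies it to one user's sequences given as plain Python tuples instead of data frames. The input layout `(start, end, [categories])` and the helper name `count_transitions` are assumptions made for this example only.

```python
from collections import defaultdict

DAY = 24 * 60 * 60  # 24 hours in seconds

def count_transitions(user_sequences, time_gap):
    """Count category transitions for ONE user.

    user_sequences: list of (start, end, [categories]) sorted by start time.
    """
    counts = defaultdict(int)
    prev_end, prev_last_cat = None, None
    for start, end, cats in user_sequences:
        # transitions inside the sequence
        for a, b in zip(cats, cats[1:]):
            counts[(a, b)] += 1
        if prev_end is not None:
            counts[(prev_last_cat, 'REST')] += 1      # POI --> REST
            if start - prev_end < time_gap:
                counts[('REST', cats[0])] += 1        # REST --> POI
            else:
                counts[('REST', 'REST')] += 1         # REST --> REST
        prev_end, prev_last_cat = end, cats[-1]
    return dict(counts)

# Three toy sequences: the second starts 12 hours after the first ends,
# the third starts 3 days after the second ends.
seqs = [
    (0, 3600, ['Cultural', 'Sport']),
    (46800, 50400, ['Beach']),
    (309600, 313200, ['Sport']),
]
counts = count_transitions(seqs, DAY)
print(counts)
```

Dividing each row of the resulting counts by its row sum gives the transition probabilities, which is what the function below does on the full data set.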



In [387]:

    
def generate_ext_transmat(poi_all, seq_all, user_seqs, timeGap):
    """Calculate the extended transition matrix of POI category for actual trajectories with a special category REST.
       For a specific user, if the time gap between the earlier sequence and the latter sequence is less than 'timeGap', 
       then add a REST state between the two sequences, otherwise, 
       add a REST to REST transition after the earlier sequence.
    """
    assert(timeGap > 0)
    states = poi_all['poiTheme'].unique().tolist()
    states.sort()
    states.append('REST')
    
    ext_transmat = pd.DataFrame(data=np.zeros((len(states), len(states)), dtype=np.float64), \
                                index=states, columns=states)
    
    for user in user_seqs['userID'].unique():
        sequ = user_seqs[user_seqs['userID'] == user].copy()
        sequ.sort_values(by='startTime', ascending=True, inplace=True)
        prev_seqEndTime = None 
        prev_endPOICat = None 
        # sequence with length 1 should be considered
        for i in range(len(sequ.index)):
            idx = sequ.index[i]
            seqid = sequ.loc[idx, 'seqID']
            seq = seq_all[seq_all['seqID'] == seqid].copy()
            seq.sort_values(by='arrivalTime', ascending=True, inplace=True)
            for j in range(len(seq.index)-1):
                poi1 = seq.loc[seq.index[j], 'poiID']
                poi2 = seq.loc[seq.index[j+1], 'poiID']
                cat1 = poi_all.loc[poi1, 'poiTheme']
                cat2 = poi_all.loc[poi2, 'poiTheme']
                ext_transmat.loc[cat1, cat2] += 1
            
            # REST state
            if i > 0: 
                startTime = sequ.loc[idx, 'startTime']
                assert(prev_seqEndTime is not None)
                assert(startTime >= prev_seqEndTime)
                ext_transmat.loc[prev_endPOICat, 'REST'] += 1  # POI-->REST
                if startTime - prev_seqEndTime < timeGap:      # REST-->POI
                    poi0 = seq.loc[seq.index[0], 'poiID']
                    startPOICat = poi_all.loc[poi0, 'poiTheme']
                    ext_transmat.loc['REST', startPOICat] += 1
                else:                                          # REST-->REST
                    ext_transmat.loc['REST', 'REST'] += 1
                    
            # memorise info of previous sequence       
            prev_seqEndTime = sequ.loc[idx, 'endTime']
            poiN = seq.loc[seq.index[-1], 'poiID']
            prev_endPOICat = poi_all.loc[poiN, 'poiTheme']

    # normalize each row to get the transition probability from cati to catj
    for r in ext_transmat.index:
        rowsum = ext_transmat.loc[r].sum()
        if rowsum == 0: continue  # deal with lack of data
        ext_transmat.loc[r] /= rowsum
    return ext_transmat



In [388]:

    
timeGap = 24 * 60 * 60  # 24 hours



In [389]:

    
trans_mat = generate_ext_transmat(poi_all, seq_all, user_seqs, timeGap)
trans_mat



In [390]:

    
#trans_mat.columns[-1]
#trans_mat.loc['Sport']
#np.array(trans_mat.loc['Sport'])
#np.array(trans_mat.loc['Sport']).sum()

1.6 POI Transition Rules

When choosing a specific POI within a certain POI category, consider two types of rules:

Rules based on the distance between the candidate POI and the current POI
Rules based on the popularity of the candidate POI



In [391]:

    
def calc_dist(longitude1, latitude1, longitude2, latitude2):
    """Calculate the distance (unit: km) between two places on earth"""
    # convert degrees to radians
    lon1 = math.radians(longitude1)
    lat1 = math.radians(latitude1)
    lon2 = math.radians(longitude2)
    lat2 = math.radians(latitude2)
    radius = 6371.009 # mean earth radius is 6371.009km, en.wikipedia.org/wiki/Earth_radius#Mean_radius
    # The haversine formula, en.wikipedia.org/wiki/Great-circle_distance
    dlon = math.fabs(lon1 - lon2)
    dlat = math.fabs(lat1 - lat2)
    return 2 * radius * math.asin( math.sqrt( \
               (math.sin(0.5*dlat))**2 + math.cos(lat1) * math.cos(lat2) * (math.sin(0.5*dlon))**2 ))

Distance based rules

The Nearest Neighbor of the current POI
A random POI chosen with probability proportional to the reciprocal of its distance to the current POI
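
A minimal sketch of the two distance-based rules, assuming the distances from the current POI to each candidate are already known. The candidate IDs and distances below are hypothetical values, and `inverse_distance_probs` is an illustrative helper rather than a function of this notebook.

```python
import numpy as np

def inverse_distance_probs(distances_km):
    """Return probabilities proportional to 1 / distance."""
    d = np.asarray(distances_km, dtype=float)
    if np.any(d <= 0):
        raise ValueError('distances must be strictly positive')
    w = 1.0 / d
    return w / w.sum()

# Hypothetical candidates in the target category and their distances (km).
candidates = [28, 29, 30]
dists = [0.5, 1.0, 2.0]

probs = inverse_distance_probs(dists)          # weights 2, 1, 0.5
nearest = candidates[int(np.argmin(dists))]    # deterministic rule

rng = np.random.default_rng(0)
sampled = candidates[int(rng.choice(len(candidates), p=probs))]  # randomized rule
print(probs, nearest, sampled)
```

The nearest neighbour is simply the candidate with the largest weight, so both rules can share the same weight vector, as the function below does.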



In [392]:

    
def rule_NN(current_poi, next_poi_cat, poi_all, randomized):
    """
    Choose a specific POI within a category.
    If randomized == True,
    return a random POI chosen with probability proportional to the reciprocal of its distance to the current POI;
    otherwise, return the Nearest Neighbor of the current POI.
    """
    assert(current_poi in poi_all.index)
    assert(next_poi_cat in poi_all['poiTheme'].unique())
    poi_index = None
    if poi_all.loc[current_poi, 'poiTheme'] == next_poi_cat:
        poi_index = [x for x in poi_all[poi_all['poiTheme'] == next_poi_cat].index if x != current_poi]
    else:
        poi_index = poi_all[poi_all['poiTheme'] == next_poi_cat].index
    
    probs = np.zeros(len(poi_index), dtype=np.float64)
    for i in range(len(poi_index)):
        dist = calc_dist(poi_all.loc[current_poi, 'poiLon'], poi_all.loc[current_poi, 'poiLat'], \
                         poi_all.loc[poi_index[i],'poiLon'], poi_all.loc[poi_index[i],'poiLat'])
        assert(dist > 0.)
        probs[i] = 1. / dist
    
    idx = None
    if randomized == True:
        probs /= np.sum(probs) # normalise
        sample = np.random.multinomial(1, probs) # categorical/multinoulli distribution, i.e. multinomial distribution with n=1
        for j in range(len(sample)):
            if sample[j] == 1: 
                idx = j
                break
    else:
        idx = probs.argmax()
    assert(idx is not None)
    return poi_index[idx]

POI Popularity based rules

The most Popular POI
A random POI chosen with probability proportional to its popularity
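
To illustrate the popularity-based rules, the sketch below uses the `poiFreq` values of the four Sport POIs that are visible in the `poi_all.head()` output of this notebook (whether these are all Sport POIs in the data set is an assumption). It draws many samples and compares the empirical frequencies with the target probabilities.

```python
import numpy as np

# poiID and poiFreq of the Sport POIs shown in the poi_all.head() output.
poi_ids = np.array([1, 2, 3, 4])
poi_freq = np.array([3506, 609, 688, 3056], dtype=float)

probs = poi_freq / poi_freq.sum()                  # proportional to popularity
most_popular = int(poi_ids[np.argmax(poi_freq)])   # deterministic rule

rng = np.random.default_rng(0)
draws = rng.choice(poi_ids, size=10000, p=probs)   # randomized rule
empirical = np.array([(draws == p).mean() for p in poi_ids])

print(most_popular)
print(np.round(probs, 3))
```

Unlike this sketch, the function below also excludes the current POI from the candidates when it belongs to the target category.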



In [393]:

    
def rule_Pop(current_poi, next_poi_cat, poi_all, randomized):
    """
    Choose a specific POI within a category.
    If randomized == True,
    return a random POI chosen with probability proportional to its popularity;
    otherwise, return the most Popular POI.
    """
    assert(current_poi in poi_all.index)
    assert(next_poi_cat in poi_all['poiTheme'].unique())
    poi_index = None
    if poi_all.loc[current_poi, 'poiTheme'] == next_poi_cat:
        poi_index = [x for x in poi_all[poi_all['poiTheme'] == next_poi_cat].index if x != current_poi]
    else:
        poi_index = poi_all[poi_all['poiTheme'] == next_poi_cat].index
    
    probs = np.zeros(len(poi_index), dtype=np.float64)
    for i in range(len(poi_index)):
        probs[i] = poi_all.loc[poi_index[i],'poiFreq']
    
    idx = None
    if randomized == True:
        probs /= np.sum(probs) # normalise
        sample = np.random.multinomial(1, probs) # categorical/multinoulli distribution, i.e. multinomial distribution with n=1
        for j in range(len(sample)):
            if sample[j] == 1: 
                idx = j
                break
    else:
        idx = probs.argmax()
    assert(idx is not None)
    return poi_index[idx]

1.7 Simulation



In [394]:

    
def extract_seq(seqid_set, seq_all):
    """Extract the actual sequences (i.e. lists of POIs ordered by arrival time) for a set of sequence IDs"""
    seq_dict = dict()
    for seqid in seqid_set:
        seqi = seq_all[seq_all['seqID'] == seqid].copy()
        seqi.sort_values(by='arrivalTime', ascending=True, inplace=True)
        seq_dict[seqid] = seqi['poiID'].tolist()
    return seq_dict



In [395]:

    
all_seqid = seq_all['seqID'].unique()



In [396]:

    
all_seq_dict = extract_seq(all_seqid, seq_all)



In [397]:

    
def choose_start_poi(all_seq_dict, seqLen):
    """Choose the first POI of a random actual sequence that contains more than seqLen POIs"""
    assert(seqLen > 0)
    while True:
        seqid = random.choice(sorted(all_seq_dict.keys()))
        if len(all_seq_dict[seqid]) > seqLen:
            return all_seq_dict[seqid][0]



In [398]:

    
obs_mat = trans_mat.copy() * 0
obs_mat



In [399]:

    
prefer_NN_over_Pop = True
randomized = True
N = 1000 # number of observations



In [400]:

    
prevpoi = choose_start_poi(all_seq_dict, 1)
prevcat = poi_all.loc[prevpoi, 'poiTheme']
nextpoi = None
nextcat = None
print('(%s, POI %d)->' % (prevcat, prevpoi))
n = 0
while n < N:
    # choose the next POI category
    # categorical/multinoulli distribution, special case of multinomial distribution (n=1)
    sample = np.random.multinomial(1, np.array(trans_mat.loc[prevcat]))
    nextcat = None
    for j in range(len(sample)):
        if sample[j] == 1: nextcat = trans_mat.columns[j]
    assert(nextcat is not None)
    
    obs_mat.loc[prevcat, nextcat] += 1
    
    # choose the next POI
    if nextcat == 'REST':
        nextpoi = choose_start_poi(all_seq_dict, 1)  # restart
        print('(REST)->')
    else:
        if prefer_NN_over_Pop == True:
            nextpoi = rule_NN(prevpoi, nextcat, poi_all, randomized)
        else:
            nextpoi = rule_Pop(prevpoi, nextcat, poi_all, randomized)
        print('(%s, POI %d)->' % (nextcat, nextpoi))
            
    prevcat = nextcat
    prevpoi = nextpoi
    n += 1









    



(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(Beach, POI 20)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 16)->
(Structure, POI 30)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(Amusement, POI 16)->
(REST)->
(Beach, POI 19)->
(REST)->
(Beach, POI 19)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 3)->
(REST)->
(Sport, POI 4)->
(REST)->
(Sport, POI 1)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(Beach, POI 21)->
(Structure, POI 29)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 4)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 24)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 17)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 6)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 29)->
(Structure, POI 28)->
(REST)->
(Structure, POI 29)->
(REST)->
(Cultural, POI 11)->
(Cultural, POI 6)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 11)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 2)->
(Amusement, POI 17)->
(Cultural, POI 13)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 1)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 3)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 1)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 8)->
(REST)->
(REST)->
(Amusement, POI 16)->
(Shopping, POI 24)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(Sport, POI 2)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 6)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 6)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 19)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 2)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 1)->
(REST)->
(Structure, POI 29)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(Cultural, POI 11)->
(REST)->
(REST)->
(Shopping, POI 24)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(Beach, POI 21)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(Beach, POI 19)->
(REST)->
(REST)->
(Sport, POI 3)->
(REST)->
(Structure, POI 28)->
(Structure, POI 29)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(REST)->
(REST)->
(Beach, POI 19)->
(REST)->
(Shopping, POI 24)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 14)->
(REST)->
(REST)->
(REST)->
(Sport, POI 3)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 4)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 7)->
(REST)->
(REST)->
(Beach, POI 19)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 29)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 7)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 11)->
(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 4)->
(REST)->
(REST)->
(Cultural, POI 10)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 11)->
(Beach, POI 19)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 19)->
(Cultural, POI 13)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 13)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 1)->
(Structure, POI 30)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 14)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 8)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(Cultural, POI 7)->
(Shopping, POI 25)->
(REST)->
(Structure, POI 29)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 4)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 23)->
(Beach, POI 21)->
(REST)->
(Cultural, POI 11)->
(Structure, POI 28)->
(REST)->
(Beach, POI 21)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 14)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->



In [401]:

    
obs_mat



In [402]:

    
# maximum likelihood estimation (MLE): normalise the observed counts of each row
est_mat = obs_mat.copy()
for r in est_mat.index:
    rowsum = est_mat.loc[r].sum()
    if rowsum == 0: continue  # deal with lack of data
    est_mat.loc[r] /= rowsum
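
The row-wise loop above can also be written without an explicit loop. The following is a sketch on a made-up 2x2 count matrix, with `row_normalise` as a hypothetical helper name; rows that sum to zero are left as zeros, mirroring the `continue` in the loop.

```python
import numpy as np
import pandas as pd

def row_normalise(counts):
    """Divide each row by its sum; rows that sum to zero stay all-zero."""
    rowsum = counts.sum(axis=1)
    safe = rowsum.where(rowsum != 0, 1.0)  # avoid division by zero
    return counts.div(safe, axis=0)

# Toy observation counts (illustrative values, not from the simulation).
toy_counts = pd.DataFrame([[3., 1.], [0., 0.]],
                          index=['A', 'B'], columns=['A', 'B'])
toy_est = row_normalise(toy_counts)
print(toy_est)
```

With only N = 1000 simulated steps, rows of rarely visited categories are based on few observations, so their estimates can differ noticeably from the corresponding rows of `trans_mat`.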



In [403]:

    
est_mat



In [404]:

    
trans_mat

	poiID	poiLon	poiLat
0	1	-79.379243	43.643183
1	2	-79.418634	43.632772
2	3	-79.380045	43.662175
3	4	-79.389290	43.641297
4	6	-79.392396	43.653662

	poiTheme	poiFreq	poiLon	poiLat
poiID
1	Sport	3506	-79.379243	43.643183
2	Sport	609	-79.418634	43.632772
3	Sport	688	-79.380045	43.662175
4	Sport	3056	-79.389290	43.641297
6	Cultural	986	-79.392396	43.653662

	Amusement	Beach	Cultural	Shopping	Sport	Structure	REST
Amusement	0.030501	0.043573	0.111111	0.037037	0.076253	0.034858	0.666667
Beach	0.013265	0.031688	0.044952	0.067797	0.014001	0.078113	0.750184
Cultural	0.031229	0.047522	0.027155	0.048880	0.014936	0.063815	0.766463
Shopping	0.013093	0.082651	0.047463	0.013093	0.015548	0.058101	0.770049
Sport	0.053985	0.029563	0.026992	0.016710	0.010283	0.026992	0.835476
Structure	0.028169	0.098592	0.087757	0.063922	0.026002	0.026002	0.669556
REST	0.010511	0.024024	0.034749	0.019091	0.026598	0.020163	0.864865

	Amusement	Beach	Cultural	Shopping	Sport	Structure	REST
Amusement	0.000000	0.000000	0.071429	0.071429	0.00000	0.071429	0.785714
Beach	0.000000	0.000000	0.032258	0.032258	0.00000	0.064516	0.870968
Cultural	0.000000	0.041667	0.041667	0.125000	0.00000	0.041667	0.750000
Shopping	0.000000	0.130435	0.000000	0.000000	0.00000	0.043478	0.826087
Sport	0.058824	0.000000	0.000000	0.000000	0.00000	0.058824	0.882353
Structure	0.028571	0.114286	0.057143	0.000000	0.00000	0.085714	0.714286
REST	0.014019	0.026869	0.022196	0.019860	0.01986	0.030374	0.866822

	photoID	photoLon	photoLat
0	7941504100	-79.380844	43.645641
1	4886005532	-79.391525	43.654335
2	4886006468	-79.391525	43.654335
3	4885404441	-79.391525	43.654335
4	4886008334	-79.391525	43.654335
