Trajectory Simulation

NOTE: Before running this notebook, run the script src/ijcai15_setup.py to set up the data.

1. Experimental Setup

The states of the Markov chain (MC) correspond to the categories of points of interest (POIs). There is also a special state "REST", which represents a user resting after some travelling.

Trajectories are simulated using the transition matrix of the MC. When choosing a specific POI within a given category, one of the following rules is used:

  1. The nearest neighbor of the current POI
  2. The most popular POI
  3. A random POI chosen with probability proportional to the reciprocal of its distance to the current POI
  4. A random POI chosen with probability proportional to its popularity
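As a minimal sketch of the first half of this setup, the snippet below walks a small Markov chain by repeatedly sampling the next state from the row of the current state. The three states, the probabilities, and the helper name `simulate_chain` are invented for illustration and are not taken from the data.

```python
import numpy as np

# Toy 3-state chain; the states and probabilities are invented for illustration.
states = ['Cultural', 'Sport', 'REST']
P = np.array([[0.1, 0.2, 0.7],
              [0.3, 0.1, 0.6],
              [0.2, 0.2, 0.6]])

def simulate_chain(P, states, start, n_steps, rng):
    """Return the list of n_steps + 1 states visited by the chain, beginning at `start`."""
    path = [start]
    for _ in range(n_steps):
        row = P[states.index(path[-1])]      # transition probabilities out of the current state
        path.append(str(rng.choice(states, p=row)))
    return path

rng = np.random.default_rng(0)
path = simulate_chain(P, states, 'REST', 10, rng)
print(' -> '.join(path))
```

Choosing a concrete POI inside the sampled category is a separate step, handled by the four rules listed above.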

1.1 Definitions

For user $u$ and POI $p$, define

  • Travel History: \begin{equation*} S_u = \{(p_1, t_{p_1}^a, t_{p_1}^d), \dots, (p_n, t_{p_n}^a, t_{p_n}^d)\} \end{equation*} where $t_{p_i}^a$ is the arrival time and $t_{p_i}^d$ the departure time of user $u$ at POI $p_i$

  • Travel Sequences: split $S_u$ if \begin{equation*} |t_{p_i}^d - t_{p_{i+1}}^a| > \tau ~(\text{e.g.}~ \tau = 8 ~\text{hours}) \end{equation*}

  • POI Popularity: \begin{equation*} Pop(p) = \sum_{u \in U} \sum_{p_i \in S_u} \delta(p_i == p) \end{equation*}
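The definitions above can be sketched in a few lines of plain Python. This is only an illustration on hypothetical visit tuples `(poi, arrival, departure)` with times in seconds; the helper names `split_history` and `popularity` are assumptions and do not appear in the notebook, which reads sequence IDs and POI frequencies directly from the data files.

```python
from collections import Counter

TAU = 8 * 60 * 60  # 8 hours in seconds, the example threshold from the definition

def split_history(history, tau=TAU):
    """Split a time-ordered list of (poi, arrival, departure) visits into travel sequences."""
    sequences = []
    for visit in history:
        # start a new sequence at the first visit, or when the gap between the
        # previous departure and this arrival exceeds tau
        if not sequences or abs(visit[1] - sequences[-1][-1][2]) > tau:
            sequences.append([visit])
        else:
            sequences[-1].append(visit)
    return sequences

def popularity(histories):
    """Count visits per POI over all users' travel histories."""
    return Counter(poi for history in histories.values() for poi, _, _ in history)

# Hypothetical histories: (POI id, arrival, departure)
histories = {
    'u1': [(6, 0, 600), (13, 1200, 1800), (6, 100000, 100300)],
    'u2': [(6, 50, 90)],
}
seqs_u1 = split_history(histories['u1'])
pop = popularity(histories)
print([len(s) for s in seqs_u1], dict(pop))
```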

1.2 Load Trajectory Data


In [372]:
%matplotlib inline

import os
import math
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

In [373]:
random.seed(123456789)
np.random.seed(123456789)  # the sampling below uses np.random.multinomial, so seed NumPy as well

In [374]:
data_dir = 'data/data-ijcai15'
#fvisit = os.path.join(data_dir, 'userVisits-Osak.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Osak.csv')
#fvisit = os.path.join(data_dir, 'userVisits-Glas.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Glas.csv')
#fvisit = os.path.join(data_dir, 'userVisits-Edin.csv')
#fcoord = os.path.join(data_dir, 'photoCoords-Edin.csv')
fvisit = os.path.join(data_dir, 'userVisits-Toro.csv')
fcoord = os.path.join(data_dir, 'photoCoords-Toro.csv')

In [375]:
suffix = fvisit.split('-')[-1].split('.')[0]

In [376]:
visits = pd.read_csv(fvisit, sep=';')
visits.head()


Out[376]:
photoID userID dateTaken poiID poiTheme poiFreq seqID
0 7941504100 10007579@N00 1346844688 30 Structure 1538 1
1 4886005532 10012675@N05 1142731848 6 Cultural 986 2
2 4886006468 10012675@N05 1142732248 6 Cultural 986 2
3 4885404441 10012675@N05 1142732373 6 Cultural 986 2
4 4886008334 10012675@N05 1142732445 6 Cultural 986 2

In [377]:
coords = pd.read_csv(fcoord, sep=';')
coords.head()


Out[377]:
photoID photoLon photoLat
0 7941504100 -79.380844 43.645641
1 4886005532 -79.391525 43.654335
2 4886006468 -79.391525 43.654335
3 4885404441 -79.391525 43.654335
4 4886008334 -79.391525 43.654335

In [378]:
# merge data frames according to column 'photoID'
assert(visits.shape[0] == coords.shape[0])
traj = pd.merge(visits, coords, on='photoID')
traj.head()


Out[378]:
photoID userID dateTaken poiID poiTheme poiFreq seqID photoLon photoLat
0 7941504100 10007579@N00 1346844688 30 Structure 1538 1 -79.380844 43.645641
1 4886005532 10012675@N05 1142731848 6 Cultural 986 2 -79.391525 43.654335
2 4886006468 10012675@N05 1142732248 6 Cultural 986 2 -79.391525 43.654335
3 4885404441 10012675@N05 1142732373 6 Cultural 986 2 -79.391525 43.654335
4 4886008334 10012675@N05 1142732445 6 Cultural 986 2 -79.391525 43.654335

In [379]:
num_photo = traj['photoID'].unique().shape[0]
num_user = traj['userID'].unique().shape[0]
num_seq = traj['seqID'].unique().shape[0]
num_poi = traj['poiID'].unique().shape[0]
pd.DataFrame([num_photo, num_user, num_seq, num_poi, num_photo/num_user, num_seq/num_user], \
             index = ['#photo', '#user', '#seq', '#poi', '#photo/user', '#seq/user'], columns=[str(suffix)])


Out[379]:
Toro
#photo 39419.000000
#user 1395.000000
#seq 6057.000000
#poi 29.000000
#photo/user 28.257348
#seq/user 4.341935

1.3 Compute POI Info

Compute POI (Longitude, Latitude) as the average coordinates of the assigned photos.


In [380]:
poi_coords = traj[['poiID', 'photoLon', 'photoLat']].groupby('poiID').agg(np.mean)
poi_coords.reset_index(inplace=True)
poi_coords.rename(columns={'photoLon':'poiLon', 'photoLat':'poiLat'}, inplace=True)
poi_coords.head()


Out[380]:
poiID poiLon poiLat
0 1 -79.379243 43.643183
1 2 -79.418634 43.632772
2 3 -79.380045 43.662175
3 4 -79.389290 43.641297
4 6 -79.392396 43.653662

Extract POI category and visiting frequency.


In [381]:
poi_catfreq = traj[['poiID', 'poiTheme', 'poiFreq']].groupby('poiID').first()
poi_catfreq.reset_index(inplace=True)
poi_catfreq.head()


Out[381]:
poiID poiTheme poiFreq
0 1 Sport 3506
1 2 Sport 609
2 3 Sport 688
3 4 Sport 3056
4 6 Cultural 986

In [382]:
poi_all = pd.merge(poi_catfreq, poi_coords, on='poiID')
poi_all.set_index('poiID', inplace=True)
poi_all.head()


Out[382]:
poiTheme poiFreq poiLon poiLat
poiID
1 Sport 3506 -79.379243 43.643183
2 Sport 609 -79.418634 43.632772
3 Sport 688 -79.380045 43.662175
4 Sport 3056 -79.389290 43.641297
6 Cultural 986 -79.392396 43.653662

1.4 Construct Travelling Sequences


In [383]:
# the first and last photo timestamps at each POI give the arrival and departure times
seq_all = traj[['userID', 'seqID', 'poiID', 'dateTaken']].copy()\
          .groupby(['userID', 'seqID', 'poiID']).agg(['min', 'max'])
seq_all.columns = seq_all.columns.droplevel()
seq_all.reset_index(inplace=True)
seq_all.rename(columns={'min':'arrivalTime', 'max':'departureTime'}, inplace=True)
seq_all['poiDuration(sec)'] = seq_all['departureTime'] - seq_all['arrivalTime']
seq_all.head()


Out[383]:
userID seqID poiID arrivalTime departureTime poiDuration(sec)
0 10007579@N00 1 30 1346844688 1346844688 0
1 10012675@N05 2 6 1142731848 1142732445 597
2 10012675@N05 3 6 1142916492 1142916492 0
3 10012675@N05 4 13 1319327174 1319332848 5674
4 10014440@N06 5 24 1196128621 1196128878 257

In [384]:
seq_start = seq_all[['userID', 'seqID', 'arrivalTime']].copy().groupby(['userID', 'seqID']).agg(np.min)
seq_start.rename(columns={'arrivalTime':'startTime'}, inplace=True)
seq_start.reset_index(inplace=True)
seq_start.head()


Out[384]:
userID seqID startTime
0 10007579@N00 1 1346844688
1 10012675@N05 2 1142731848
2 10012675@N05 3 1142916492
3 10012675@N05 4 1319327174
4 10014440@N06 5 1196128621

In [385]:
seq_end = seq_all[['userID', 'seqID', 'departureTime']].copy().groupby(['userID', 'seqID']).agg(np.max)
seq_end.rename(columns={'departureTime':'endTime'}, inplace=True)
seq_end.reset_index(inplace=True)
seq_end.head()


Out[385]:
userID seqID endTime
0 10007579@N00 1 1346844688
1 10012675@N05 2 1142732445
2 10012675@N05 3 1142916492
3 10012675@N05 4 1319332848
4 10014440@N06 5 1196128878

In [386]:
assert(seq_start.shape[0] == seq_end.shape[0])
user_seqs = pd.merge(seq_start, seq_end, on=['userID', 'seqID'])
user_seqs.head()
#user_seqs.loc[0, 'seqID']
#user_seqs['userID'].iloc[-1]


Out[386]:
userID seqID startTime endTime
0 10007579@N00 1 1346844688 1346844688
1 10012675@N05 2 1142731848 1142732445
2 10012675@N05 3 1142916492 1142916492
3 10012675@N05 4 1319327174 1319332848
4 10014440@N06 5 1196128621 1196128878

1.5 POI Category Transition Matrix

Generate the extended transition matrix of POI category for actual trajectories with a special category REST.
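For each user, sequences are ordered by start time, and a transition from the last POI category of the earlier sequence to REST is always added. If the time gap between the earlier sequence and the later sequence is less than 'timeGap' (e.g. 24 hours), a transition from REST to the first POI category of the later sequence is added; otherwise, a REST to REST transition is added.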
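To make the counting rule concrete, here is a small sketch on hand-made data. The input layout `{user: [(start, end, [category, ...]), ...]}` and the function name `toy_ext_transmat` are assumptions made for this illustration only; the notebook's own function works on the data frames built above.

```python
import pandas as pd

def toy_ext_transmat(user_seqs, time_gap):
    """user_seqs: {user: [(start, end, [cat, ...]), ...]}; returns row-normalised counts."""
    cats = sorted({c for seqs in user_seqs.values() for _, _, path in seqs for c in path})
    states = cats + ['REST']
    counts = pd.DataFrame(0.0, index=states, columns=states)
    for seqs in user_seqs.values():
        seqs = sorted(seqs, key=lambda s: s[0])          # order by start time
        for i, (start, end, path) in enumerate(seqs):
            for a, b in zip(path, path[1:]):              # transitions inside a sequence
                counts.loc[a, b] += 1
            if i > 0:
                prev_end, prev_path = seqs[i - 1][1], seqs[i - 1][2]
                counts.loc[prev_path[-1], 'REST'] += 1    # POI --> REST
                if start - prev_end < time_gap:
                    counts.loc['REST', path[0]] += 1      # REST --> POI
                else:
                    counts.loc['REST', 'REST'] += 1       # REST --> REST
    rowsum = counts.sum(axis=1)
    # divide each row by its sum; rows without data are left as zeros
    return counts.div(rowsum.where(rowsum > 0, 1.0), axis=0)

HOUR = 3600
toy = {'u1': [(0, 2 * HOUR, ['Sport', 'Cultural']),
              (10 * HOUR, 11 * HOUR, ['Beach']),
              (100 * HOUR, 101 * HOUR, ['Sport'])]}
toy_mat = toy_ext_transmat(toy, time_gap=24 * HOUR)
print(toy_mat)
```

In this toy input the second sequence starts 8 hours after the first one ends, so a REST to Beach transition is counted, while the third sequence starts 89 hours later, so a REST to REST transition is counted instead.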


In [387]:
def generate_ext_transmat(poi_all, seq_all, user_seqs, timeGap):
    """Calculate the extended transition matrix of POI category for actual trajectories with a special category REST.
       For each user, a transition from the last POI category of the earlier sequence to REST is always counted.
       If the time gap between the earlier and the later sequence is less than 'timeGap',
       a REST to POI transition is counted as well; otherwise, a REST to REST transition is counted.
    """
    assert(timeGap > 0)
    states = poi_all['poiTheme'].unique().tolist()
    states.sort()
    states.append('REST')
    
    ext_transmat = pd.DataFrame(data=np.zeros((len(states), len(states)), dtype=np.float64), \
                                index=states, columns=states)
    
    for user in user_seqs['userID'].unique():
        sequ = user_seqs[user_seqs['userID'] == user].copy()
        sequ.sort_values(by='startTime', ascending=True, inplace=True)
        prev_seqEndTime = None 
        prev_endPOICat = None 
        # sequence with length 1 should be considered
        for i in range(len(sequ.index)):
            idx = sequ.index[i]
            seqid = sequ.loc[idx, 'seqID']
            seq = seq_all[seq_all['seqID'] == seqid].copy()
            seq.sort_values(by='arrivalTime', ascending=True, inplace=True)
            for j in range(len(seq.index)-1):
                poi1 = seq.loc[seq.index[j], 'poiID']
                poi2 = seq.loc[seq.index[j+1], 'poiID']
                cat1 = poi_all.loc[poi1, 'poiTheme']
                cat2 = poi_all.loc[poi2, 'poiTheme']
                ext_transmat.loc[cat1, cat2] += 1
            
            # REST state
            if i > 0: 
                startTime = sequ.loc[idx, 'startTime']
                assert(prev_seqEndTime is not None)
                assert(startTime >= prev_seqEndTime)
                ext_transmat.loc[prev_endPOICat, 'REST'] += 1  # POI-->REST
                if startTime - prev_seqEndTime < timeGap:      # REST-->POI
                    poi0 = seq.loc[seq.index[0], 'poiID']
                    startPOICat = poi_all.loc[poi0, 'poiTheme']
                    ext_transmat.loc['REST', startPOICat] += 1
                else:                                          # REST-->REST
                    ext_transmat.loc['REST', 'REST'] += 1
                    
            # memorise info of previous sequence       
            prev_seqEndTime = sequ.loc[idx, 'endTime']
            poiN = seq.loc[seq.index[-1], 'poiID']
            prev_endPOICat = poi_all.loc[poiN, 'poiTheme']

    # normalize each row to get the transition probability from cati to catj
    for r in ext_transmat.index:
        rowsum = ext_transmat.loc[r].sum()
        if rowsum == 0: continue  # deal with lack of data
        ext_transmat.loc[r] /= rowsum
    return ext_transmat

In [388]:
timeGap = 24 * 60 * 60  # 24 hours

In [389]:
trans_mat = generate_ext_transmat(poi_all, seq_all, user_seqs, timeGap)
trans_mat


Out[389]:
Amusement Beach Cultural Shopping Sport Structure REST
Amusement 0.030501 0.043573 0.111111 0.037037 0.076253 0.034858 0.666667
Beach 0.013265 0.031688 0.044952 0.067797 0.014001 0.078113 0.750184
Cultural 0.031229 0.047522 0.027155 0.048880 0.014936 0.063815 0.766463
Shopping 0.013093 0.082651 0.047463 0.013093 0.015548 0.058101 0.770049
Sport 0.053985 0.029563 0.026992 0.016710 0.010283 0.026992 0.835476
Structure 0.028169 0.098592 0.087757 0.063922 0.026002 0.026002 0.669556
REST 0.010511 0.024024 0.034749 0.019091 0.026598 0.020163 0.864865

In [390]:
#trans_mat.columns[-1]
#trans_mat.loc['Sport']
#np.array(trans_mat.loc['Sport'])
#np.array(trans_mat.loc['Sport']).sum()

1.6 POI Transition Rules

When choosing a specific POI within a certain POI category, consider two types of rules:

  1. Rules based on the distance between the candidate POI and the current POI
  2. Rules based on the popularity of the candidate POI

In [391]:
def calc_dist(longitude1, latitude1, longitude2, latitude2):
    """Calculate the distance (unit: km) between two places on earth"""
    # convert degrees to radians
    lon1 = math.radians(longitude1)
    lat1 = math.radians(latitude1)
    lon2 = math.radians(longitude2)
    lat2 = math.radians(latitude2)
    radius = 6371.009 # mean earth radius is 6371.009km, en.wikipedia.org/wiki/Earth_radius#Mean_radius
    # The haversine formula, en.wikipedia.org/wiki/Great-circle_distance
    dlon = math.fabs(lon1 - lon2)
    dlat = math.fabs(lat1 - lat2)
    return 2 * radius * math.asin( math.sqrt( \
               (math.sin(0.5*dlat))**2 + math.cos(lat1) * math.cos(lat2) * (math.sin(0.5*dlon))**2 ))
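As a quick sanity check of the haversine formula, the sketch below restates it with NumPy so that it also accepts arrays. The name `haversine_km` is an assumption for this illustration; it is meant to mirror `calc_dist`, not replace it.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.009  # same mean radius as used in calc_dist

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; accepts scalars or NumPy arrays given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = np.sin(0.5 * (lat2 - lat1))**2 \
        + np.cos(lat1) * np.cos(lat2) * np.sin(0.5 * (lon2 - lon1))**2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# One degree of latitude along a meridian should be about 111.2 km.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)

# Distance between POI 1 and POI 4, using the mean coordinates listed in section 1.3.
d_1_4 = haversine_km(-79.379243, 43.643183, -79.389290, 43.641297)
print(one_degree, d_1_4)
```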

Distance based rules

  1. The nearest neighbor of the current POI
  2. A random POI chosen with probability proportional to the reciprocal of its distance to the current POI
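A small numeric sketch of the two distance-based rules, using made-up distances to three candidate POIs. The helper `inverse_distance_probs` is hypothetical, and `Generator.choice` is used here as one possible way to draw a single categorical sample.

```python
import numpy as np

def inverse_distance_probs(distances):
    """Normalise 1/d weights into a probability vector; distances must be positive."""
    d = np.asarray(distances, dtype=float)
    if np.any(d <= 0):
        raise ValueError('distances must be strictly positive')
    w = 1.0 / d
    return w / w.sum()

dists_km = [1.0, 2.0, 4.0]                 # hypothetical distances to three candidate POIs
probs = inverse_distance_probs(dists_km)   # weights 1, 1/2, 1/4 -> probabilities 4/7, 2/7, 1/7

nearest = int(np.argmin(dists_km))         # deterministic rule: index of the nearest neighbour
rng = np.random.default_rng(42)
sampled = int(rng.choice(len(dists_km), p=probs))  # randomised rule: one categorical draw
print(probs, nearest, sampled)
```

The nearest candidate is the most likely draw under the randomised rule, but the farther candidates keep a non-zero probability.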

In [392]:
def rule_NN(current_poi, next_poi_cat, poi_all, randomized):
    """
    Choose a specific POI within a category.
    If randomized == True,
    return a random POI chosen with probability proportional to the reciprocal of its distance to the current POI;
    otherwise, return the nearest neighbor of the current POI.
    """
    assert(current_poi in poi_all.index)
    assert(next_poi_cat in poi_all['poiTheme'].unique())
    poi_index = None
    if poi_all.loc[current_poi, 'poiTheme'] == next_poi_cat:
        poi_index = [x for x in poi_all[poi_all['poiTheme'] == next_poi_cat].index if x != current_poi]
    else:
        poi_index = poi_all[poi_all['poiTheme'] == next_poi_cat].index
    
    probs = np.zeros(len(poi_index), dtype=np.float64)
    for i in range(len(poi_index)):
        dist = calc_dist(poi_all.loc[current_poi, 'poiLon'], poi_all.loc[current_poi, 'poiLat'], \
                         poi_all.loc[poi_index[i],'poiLon'], poi_all.loc[poi_index[i],'poiLat'])
        assert(dist > 0.)
        probs[i] = 1. / dist
    
    idx = None
    if randomized == True:
        probs /= np.sum(probs) # normalise
        sample = np.random.multinomial(1, probs) # categorical/multinoulli distribution, i.e. multinomial distribution with n=1
        for j in range(len(sample)):
            if sample[j] == 1: 
                idx = j
                break
    else:
        idx = probs.argmax()
    assert(idx is not None)
    return poi_index[idx]

POI Popularity based rules

  1. The most Popular POI
  2. A random POI choosing with probability proportional to its popularity
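The popularity-based rules can also be sketched directly with pandas. The toy table below reuses the first few rows of the POI table shown earlier purely for illustration (the real Sport category may contain more POIs), and the helper `pop_candidates` is an assumption, not part of the notebook.

```python
import pandas as pd

# Toy POI table; the rows reuse the first entries shown earlier, for illustration only.
toy_pois = pd.DataFrame({'poiTheme': ['Sport', 'Sport', 'Sport', 'Sport', 'Cultural'],
                         'poiFreq': [3506, 609, 688, 3056, 986]},
                        index=pd.Index([1, 2, 3, 4, 6], name='poiID'))

def pop_candidates(pois, current_poi, next_cat):
    """POIs of category next_cat, excluding the POI the user is currently at."""
    cands = pois[pois['poiTheme'] == next_cat]
    return cands.drop(index=current_poi, errors='ignore')

cands = pop_candidates(toy_pois, current_poi=1, next_cat='Sport')
most_popular = cands['poiFreq'].idxmax()                                  # deterministic rule
sampled = cands.sample(n=1, weights='poiFreq', random_state=0).index[0]  # randomised rule
print(most_popular, sampled)
```

Excluding the current POI matters only when the next category equals the current one; otherwise the `drop` call leaves the candidate set unchanged.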

In [393]:
def rule_Pop(current_poi, next_poi_cat, poi_all, randomized):
    """
    Choose a specific POI within a category.
    If randomized == True,
    return a random POI chosen with probability proportional to its popularity;
    otherwise, return the most popular POI.
    """
    assert(current_poi in poi_all.index)
    assert(next_poi_cat in poi_all['poiTheme'].unique())
    poi_index = None
    if poi_all.loc[current_poi, 'poiTheme'] == next_poi_cat:
        poi_index = [x for x in poi_all[poi_all['poiTheme'] == next_poi_cat].index if x != current_poi]
    else:
        poi_index = poi_all[poi_all['poiTheme'] == next_poi_cat].index
    
    probs = np.zeros(len(poi_index), dtype=np.float64)
    for i in range(len(poi_index)):
        probs[i] = poi_all.loc[poi_index[i],'poiFreq']
    
    idx = None
    if randomized == True:
        probs /= np.sum(probs) # normalise
        sample = np.random.multinomial(1, probs) # categorical/multinoulli distribution, i.e. multinomial distribution with n=1
        for j in range(len(sample)):
            if sample[j] == 1: 
                idx = j
                break
    else:
        idx = probs.argmax()
    assert(idx is not None)
    return poi_index[idx]

1.7 Simulation


In [394]:
def extract_seq(seqid_set, seq_all):
    """Extract the actual sequences (i.e. a list of POI) from a set of sequence ID"""
    seq_dict = dict()
    for seqid in seqid_set:
        seqi = seq_all[seq_all['seqID'] == seqid].copy()
        seqi.sort_values(by='arrivalTime', ascending=True, inplace=True)
        seq_dict[seqid] = seqi['poiID'].tolist()
    return seq_dict

In [395]:
all_seqid = seq_all['seqID'].unique()

In [396]:
all_seq_dict = extract_seq(all_seqid, seq_all)

In [397]:
def choose_start_poi(all_seq_dict, seqLen):
    """choose the first POI in a random actual sequence"""
    assert(seqLen > 0)
    while True:
        seqid = random.choice(sorted(all_seq_dict.keys()))
        if len(all_seq_dict[seqid]) > seqLen:
            return all_seq_dict[seqid][0]

In [398]:
obs_mat = trans_mat.copy() * 0
obs_mat


Out[398]:
Amusement Beach Cultural Shopping Sport Structure REST
Amusement 0 0 0 0 0 0 0
Beach 0 0 0 0 0 0 0
Cultural 0 0 0 0 0 0 0
Shopping 0 0 0 0 0 0 0
Sport 0 0 0 0 0 0 0
Structure 0 0 0 0 0 0 0
REST 0 0 0 0 0 0 0

In [399]:
prefer_NN_over_Pop = True
randomized = True
N = 1000 # number of observations

In [400]:
prevpoi = choose_start_poi(all_seq_dict, 1)
prevcat = poi_all.loc[prevpoi, 'poiTheme']
nextpoi = None
nextcat = None
print('(%s, POI %d)->' % (prevcat, prevpoi))
n = 0
while n < N:
    # choose the next POI category
    # categorical/multinoulli distribution, special case of multinomial distribution (n=1)
    sample = np.random.multinomial(1, np.array(trans_mat.loc[prevcat]))
    nextcat = None
    for j in range(len(sample)):
        if sample[j] == 1: nextcat = trans_mat.columns[j]
    assert(nextcat is not None)
    
    obs_mat.loc[prevcat, nextcat] += 1
    
    # choose the next POI
    if nextcat == 'REST':
        nextpoi = choose_start_poi(all_seq_dict, 1)  # restart
        print('(REST)->')
    else:
        if prefer_NN_over_Pop == True:
            nextpoi = rule_NN(prevpoi, nextcat, poi_all, randomized)
        else:
            nextpoi = rule_Pop(prevpoi, nextcat, poi_all, randomized)
        print('(%s, POI %d)->' % (nextcat, nextpoi))
            
    prevcat = nextcat
    prevpoi = nextpoi
    n += 1


(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(Beach, POI 20)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 16)->
(Structure, POI 30)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(Amusement, POI 16)->
(REST)->
(Beach, POI 19)->
(REST)->
(Beach, POI 19)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 3)->
(REST)->
(Sport, POI 4)->
(REST)->
(Sport, POI 1)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(Beach, POI 21)->
(Structure, POI 29)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 4)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 24)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 17)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 6)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 29)->
(Structure, POI 28)->
(REST)->
(Structure, POI 29)->
(REST)->
(Cultural, POI 11)->
(Cultural, POI 6)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 11)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 2)->
(Amusement, POI 17)->
(Cultural, POI 13)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 1)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 3)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 1)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 8)->
(REST)->
(REST)->
(Amusement, POI 16)->
(Shopping, POI 24)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(Sport, POI 2)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 6)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 6)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 19)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 2)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 1)->
(REST)->
(Structure, POI 29)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(Cultural, POI 11)->
(REST)->
(REST)->
(Shopping, POI 24)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(Beach, POI 21)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(Beach, POI 19)->
(REST)->
(REST)->
(Sport, POI 3)->
(REST)->
(Structure, POI 28)->
(Structure, POI 29)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 21)->
(REST)->
(REST)->
(REST)->
(Beach, POI 19)->
(REST)->
(Shopping, POI 24)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 14)->
(REST)->
(REST)->
(REST)->
(Sport, POI 3)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 4)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 7)->
(REST)->
(REST)->
(Beach, POI 19)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 29)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 7)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 11)->
(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 4)->
(REST)->
(REST)->
(Cultural, POI 10)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 30)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 11)->
(Beach, POI 19)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 7)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 27)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 16)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 22)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Beach, POI 19)->
(Cultural, POI 13)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 13)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 1)->
(Structure, POI 30)->
(Structure, POI 28)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(REST)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 14)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 25)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Cultural, POI 8)->
(REST)->
(REST)->
(REST)->
(Structure, POI 28)->
(REST)->
(Cultural, POI 7)->
(Shopping, POI 25)->
(REST)->
(Structure, POI 29)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Sport, POI 4)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Shopping, POI 23)->
(Beach, POI 21)->
(REST)->
(Cultural, POI 11)->
(Structure, POI 28)->
(REST)->
(Beach, POI 21)->
(Shopping, POI 23)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(Amusement, POI 14)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->
(REST)->

In [401]:
obs_mat


Out[401]:
Amusement Beach Cultural Shopping Sport Structure REST
Amusement 0 0 1 1 0 1 11
Beach 0 0 1 1 0 2 27
Cultural 0 1 1 3 0 1 18
Shopping 0 3 0 0 0 1 19
Sport 1 0 0 0 0 1 15
Structure 1 4 2 0 0 3 25
REST 12 23 19 17 17 26 742

In [402]:
# MLE (maximum likelihood estimation) of the transition probabilities from the observed counts
est_mat = obs_mat.copy()
for r in est_mat.index:
    rowsum = est_mat.loc[r].sum()
    if rowsum == 0: continue  # deal with lack of data
    est_mat.loc[r] /= rowsum

In [403]:
est_mat


Out[403]:
Amusement Beach Cultural Shopping Sport Structure REST
Amusement 0.000000 0.000000 0.071429 0.071429 0.00000 0.071429 0.785714
Beach 0.000000 0.000000 0.032258 0.032258 0.00000 0.064516 0.870968
Cultural 0.000000 0.041667 0.041667 0.125000 0.00000 0.041667 0.750000
Shopping 0.000000 0.130435 0.000000 0.000000 0.00000 0.043478 0.826087
Sport 0.058824 0.000000 0.000000 0.000000 0.00000 0.058824 0.882353
Structure 0.028571 0.114286 0.057143 0.000000 0.00000 0.085714 0.714286
REST 0.014019 0.026869 0.022196 0.019860 0.01986 0.030374 0.866822

In [404]:
trans_mat


Out[404]:
Amusement Beach Cultural Shopping Sport Structure REST
Amusement 0.030501 0.043573 0.111111 0.037037 0.076253 0.034858 0.666667
Beach 0.013265 0.031688 0.044952 0.067797 0.014001 0.078113 0.750184
Cultural 0.031229 0.047522 0.027155 0.048880 0.014936 0.063815 0.766463
Shopping 0.013093 0.082651 0.047463 0.013093 0.015548 0.058101 0.770049
Sport 0.053985 0.029563 0.026992 0.016710 0.010283 0.026992 0.835476
Structure 0.028169 0.098592 0.087757 0.063922 0.026002 0.026002 0.669556
REST 0.010511 0.024024 0.034749 0.019091 0.026598 0.020163 0.864865
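One possible way to quantify how close the estimated matrix is to the original one is a per-row total variation distance. The sketch below applies it to the REST rows only, with the values copied from trans_mat and est_mat above; the choice of metric and the helper name `total_variation` are assumptions, not part of the original analysis.

```python
import numpy as np

cats = ['Amusement', 'Beach', 'Cultural', 'Shopping', 'Sport', 'Structure', 'REST']
# REST rows copied from trans_mat and est_mat above
rest_true = np.array([0.010511, 0.024024, 0.034749, 0.019091, 0.026598, 0.020163, 0.864865])
rest_est = np.array([0.014019, 0.026869, 0.022196, 0.019860, 0.019860, 0.030374, 0.866822])

def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

tv_rest = total_variation(rest_true, rest_est)
worst = cats[int(np.argmax(np.abs(rest_true - rest_est)))]  # column with the largest gap
print(tv_rest, worst)
```

Rows with few observations in obs_mat (for example Amusement, with 14 transitions) can be expected to deviate more than the REST row, which has several hundred.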