In this notebook, we give a detailed analysis of the trajectories extracted in the trajectory_construction notebook.
The analysis is based on two files, trajectory_photos.csv and trajectory_stats.csv. The first file holds detailed information about each point (photo/video) in each trajectory, and the second holds summary statistics for each trajectory.
At the end of the notebook, we show how to generate KML data, which can be used to plot trajectories on various map services (e.g. Google Maps), and show some examples of interesting trajectories on Google Maps.
Before the analysis, we import the required libraries.
In [1]:
%matplotlib inline
import os
import matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# pd.datetime was removed from recent pandas releases; pd.Timestamp gives the same date string
today = pd.Timestamp.today().strftime('%Y%m%d')
Pandas provides various data analysis tools. We load the two trajectory tables using the Pandas library.
In [2]:
# read data and convert timestamps
data_dir = '../data/'
photo_table = os.path.join(data_dir, 'trajectory_photos.csv')
traj_table = os.path.join(data_dir, 'trajectory_stats.csv')
traj = pd.read_csv(photo_table, delimiter=',', parse_dates=[3], skipinitialspace=True)
traj_stats = pd.read_csv(traj_table, delimiter=',', parse_dates=[3], skipinitialspace=True)
The first table comes from the trajectory_photos.csv file.
Here are five sample entries from the trajectory table. Each entry corresponds to a single photo/video.
The table consists of the following attributes (columns):
Trajectory_ID: trajectory ID of the entry (entries that belong to the same trajectory share the same trajectory ID)
Photo_ID: unique photo ID of the entry
User_ID: user ID
Timestamp: time at which the photo was taken
Longitude: longitude of the entry
Latitude: latitude of the entry
Accuracy: GPS accuracy level (16 is the most accurate, 1 is the least accurate)
Marker: 0 if the entry is a photo, 1 if the entry is a video
URL: Flickr URL of the entry
In [3]:
traj.head()
Out[3]:
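To make the column layout above concrete, here is a minimal sketch of how such a table can be queried. The rows in toy are made up for illustration (they are not taken from trajectory_photos.csv), and the variable names are our own.

```python
import pandas as pd

# Toy rows using the same column names as trajectory_photos.csv (values are made up).
toy = pd.DataFrame({
    'Trajectory_ID': [0, 0, 1, 0],
    'Photo_ID': [101, 102, 103, 104],
    'User_ID': ['u1', 'u1', 'u2', 'u1'],
    'Timestamp': pd.to_datetime(['2010-01-01 10:00', '2010-01-01 10:30',
                                 '2011-05-02 09:00', '2010-01-01 10:10']),
    'Longitude': [144.96, 144.97, 144.95, 144.965],
    'Latitude': [-37.81, -37.82, -37.80, -37.815],
    'Accuracy': [16, 16, 12, 15],
    'Marker': [0, 0, 1, 0],
})

# Points of one trajectory, in the order they were taken.
one_traj = toy[toy['Trajectory_ID'] == 0].sort_values('Timestamp')
photo_order = one_traj['Photo_ID'].tolist()

# Number of points per trajectory, and how many of them are videos (Marker == 1).
points_per_traj = toy.groupby('Trajectory_ID').size().to_dict()
videos_per_traj = toy.groupby('Trajectory_ID')['Marker'].sum().to_dict()
```

The same expressions should work on the real traj table, since they rely only on the column names listed above.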
The second table holds statistics for each trajectory. Each entry of this table corresponds to a single trajectory.
This table consists of the following attributes (columns):
Trajectory_ID: unique trajectory ID
User_ID: user ID
#Photo: number of photos in the trajectory
Start_Time: time at which the first photo was taken
Travel_Distance(km): sum of the distances between consecutive photos (Euclidean distance)
Total_Time(min): time gap between the first photo and the last photo
Average_Speed(km/h): Travel_Distance(km) / Total_Time(h), i.e. the total time converted to hours
In [4]:
traj_stats.head()
Out[4]:
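The per-trajectory statistics defined above can be sketched as follows. This is only an illustration under stated assumptions: the notebook does not show how degrees are converted to kilometres, so the equirectangular approximation and the 111.32 km-per-degree constant used here are our own choices, and approx_distance_km and trajectory_summary are hypothetical helper names.

```python
import math
import pandas as pd

def approx_distance_km(lon1, lat1, lon2, lat2):
    """Planar (Euclidean) distance in km via an equirectangular approximation."""
    km_per_deg = 111.32  # approximate length of one degree of latitude (assumption)
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = (lon2 - lon1) * km_per_deg * math.cos(mean_lat)
    dy = (lat2 - lat1) * km_per_deg
    return math.hypot(dx, dy)

def trajectory_summary(points):
    """Summarise one trajectory given as a DataFrame with Timestamp, Longitude, Latitude."""
    pts = points.sort_values('Timestamp')
    coords = list(zip(pts['Longitude'], pts['Latitude']))
    # Sum of distances between consecutive points.
    dist = sum(approx_distance_km(*a, *b) for a, b in zip(coords, coords[1:]))
    # Time gap between the first and the last point, in minutes.
    minutes = (pts['Timestamp'].iloc[-1] - pts['Timestamp'].iloc[0]).total_seconds() / 60.0
    # Average speed in km/h; undefined when all points share one timestamp.
    speed = dist / (minutes / 60.0) if minutes > 0 else float('nan')
    return {'#Photo': len(pts), 'Travel_Distance(km)': dist,
            'Total_Time(min)': minutes, 'Average_Speed(km/h)': speed}

# Made-up points: three photos taken 30 minutes apart, moving due south.
toy_points = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2010-01-01 10:00', '2010-01-01 10:30', '2010-01-01 11:00']),
    'Longitude': [144.96, 144.96, 144.96],
    'Latitude': [-37.80, -37.81, -37.82],
})
summary = trajectory_summary(toy_points)
```

Because the toy trajectory lasts exactly one hour, its average speed in km/h equals its travel distance in km, which is a quick sanity check on the formula.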
In [5]:
num_user = traj_stats['User_ID'].unique().size
num_traj = traj_stats['Trajectory_ID'].unique().size
avg_traj_per_user = num_traj/num_user
print('# users :', num_user)
print('# trajectories :', num_traj)
print('Average trajectories per user :', avg_traj_per_user)
In [6]:
# summarise the numeric columns only; IDs and the start time are not meaningful here
numeric_stats = traj_stats.drop(['Start_Time', 'Trajectory_ID', 'User_ID'], axis=1)
basic_stats = numeric_stats.agg(['min', 'max', 'median', 'mean'])
basic_stats
Out[6]:
The bar chart below shows the number of photos taken in each year.
In [7]:
# count the number of points (photos/videos) per year
yeardict = dict()
for dt in traj['Timestamp']:
    yeardict[dt.year] = yeardict.get(dt.year, 0) + 1
In [8]:
plt.figure(figsize=[9, 5])
plt.xlabel('Year')
plt.ylabel('#Photos')
X = list(sorted(yeardict.keys()))
Y = [yeardict[x] for x in X]
plt.bar(X, Y, width=1)
Out[8]:
The original dataset provides the accuracy of the geo-tag for each photo. An accuracy of 16 is the most accurate and an accuracy of 1 is the least accurate. (Flickr sets the accuracy to 16 by default when the user does not provide any accuracy information about the photo.)
The accuracy levels are described in the Flickr API documentation.
In [9]:
# count points at the maximum accuracy and at accuracy 11 or above
n_total = len(traj)
n_max = (traj['Accuracy'] == 16).sum()
n_high = (traj['Accuracy'] >= 11).sum()
print("Number and % of points at max accuracy (16):", n_max, 100. * n_max / n_total)
print("Number and % of points at accuracy >= 11:", n_high, 100. * n_high / n_total)
ax1 = plt.figure(figsize=[10,3]).add_subplot(111)
traj.hist(column=['Accuracy'], bins=15, ax=ax1)
Out[9]:
In [10]:
ax1 = plt.figure(figsize=[10,3]).add_subplot(111)
traj_stats.hist(column='Total_Time(min)', bins=50, ax=ax1)
Out[10]:
In [11]:
ax1 = plt.figure(figsize=[10,3]).add_subplot(111)
traj_stats.hist(column='Travel_Distance(km)', bins=50, ax=ax1)
Out[11]:
In [12]:
ax1 = plt.figure(figsize=[10,3]).add_subplot(111)
traj_stats.hist(column='Average_Speed(km/h)', bins=50, ax=ax1)
Out[12]:
KML files are a useful way to visualize trajectories on commercial map services such as Google Maps.
In this section, we show how to generate KML files to visualize trajectories on a map of Melbourne.
We implemented a KML file generator in the traj_visualise.py file, so let's first import that module.
In [13]:
import traj_visualise # for visualization on map
The traj_visualise.gen_kml function takes a list of trajectories as input and generates KML for that list of trajectories.
def gen_kml(fname, traj_data, traj_stats, traj_id_list, traj_name_list=None)
fname: output file path
traj_data: trajectory table
traj_stats: trajectory statistics table
traj_id_list: list of trajectory IDs
traj_name_list: list of names, one for each trajectory in traj_id_list
Uploading the generated KML files to Google My Maps makes it easy to browse the trajectories on a map.
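For readers unfamiliar with the format, here is a minimal sketch of what a KML document for a trajectory can look like. It is not the implementation of traj_visualise.gen_kml, which is not shown in this notebook; build_kml is a hypothetical helper that draws each trajectory as a single LineString placemark, and the coordinates are made up.

```python
import xml.etree.ElementTree as ET

def build_kml(named_paths):
    """Return a KML string with one LineString placemark per (name, [(lon, lat), ...]) pair."""
    kml = ET.Element('kml', xmlns='http://www.opengis.net/kml/2.2')
    doc = ET.SubElement(kml, 'Document')
    for name, coords in named_paths:
        placemark = ET.SubElement(doc, 'Placemark')
        ET.SubElement(placemark, 'name').text = name
        line = ET.SubElement(placemark, 'LineString')
        # KML expects "longitude,latitude,altitude" tuples separated by whitespace.
        ET.SubElement(line, 'coordinates').text = ' '.join(
            '{},{},0'.format(lon, lat) for lon, lat in coords)
    return ET.tostring(kml, encoding='unicode')

# Made-up two-point path near Melbourne.
kml_text = build_kml([('toy trajectory', [(144.96, -37.81), (144.97, -37.82)])])
```

Note that KML lists longitude before latitude, which matches the column order of the trajectory table.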
In this section, we show statistics for some extreme cases in our dataset and plot these trajectories on Google Maps.
Trajectory with the most photos: Link to Google Map
In [14]:
mostphoto_idx = traj_stats['#Photo'].idxmax()
# .ix was removed from pandas; idxmax returns an index label, so .loc is the equivalent lookup
mostphoto_traj_id = traj_stats.loc[mostphoto_idx].Trajectory_ID
output_file = '../data/most_photos.kml'
traj_visualise.gen_kml(output_file, traj, traj_stats, [mostphoto_traj_id], ['The most number of photos'])
traj_stats.loc[mostphoto_idx]
Out[14]:
Trajectory with the longest travel time: Link to Google Map
In [15]:
time_idx = traj_stats['Total_Time(min)'].idxmax()
time_traj_id = traj_stats.loc[time_idx].Trajectory_ID
output_file = '../data/longest_time.kml'
traj_visualise.gen_kml(output_file, traj, traj_stats, [time_traj_id], ['The longest travel time'])
traj_stats.loc[time_idx]
Out[15]:
Longest travel distance: Link to Google Map
In [16]:
longest_idx = traj_stats['Travel_Distance(km)'].idxmax()
longest_traj_id = traj_stats.loc[longest_idx].Trajectory_ID
output_file = '../data/longest.kml'
traj_visualise.gen_kml(output_file, traj, traj_stats, [longest_traj_id], ['longest_traj'])
traj_stats.loc[longest_idx]
Out[16]:
Fastest trajectory: Link to Google Map
In [17]:
fastest_idx = traj_stats['Average_Speed(km/h)'].idxmax()
fastest_traj_id = traj_stats.loc[fastest_idx].Trajectory_ID
output_file = '../data/fastest.kml'
traj_visualise.gen_kml(output_file, traj, traj_stats, [fastest_traj_id], ['fastest_traj'])
traj_stats.loc[fastest_idx]
Out[17]: