Process Melbourne Data using Methods presented in IJCAI15 Paper

NOTE: Please view this page via the IPython Notebook Viewer service; otherwise the within-page links may not work properly.

1. Dataset

Photos:

The photos were selected from the YFCC100M dataset. Melbourne's geo-coordinates are 37°48′49″S 144°57′47″E; photos inside the bounding box from (39.5S, 140.9E) to (35.5S, 148.5E) with accuracy = 16 are used, for a total of 87,362 photos.
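The selection described above can be sketched as a simple filter. This is only an illustration: the `Photo` record and its field names are hypothetical (they are not the YFCC100M column names), and the default bounds are the ones quoted in the paragraph above.

```python
from typing import NamedTuple


class Photo(NamedTuple):
    """Hypothetical photo record; field names are assumptions."""
    photo_id: str
    lon: float
    lat: float
    accuracy: int


def in_region(photo, min_lon=140.9, min_lat=-39.5,
              max_lon=148.5, max_lat=-35.5, accuracy=16):
    """Keep photos inside the bounding box with the required geo accuracy."""
    return (photo.accuracy == accuracy
            and min_lon <= photo.lon <= max_lon
            and min_lat <= photo.lat <= max_lat)


# Made-up records for illustration only.
photos = [
    Photo("a", 144.9631, -37.8136, 16),  # central Melbourne, accurate
    Photo("b", 144.9631, -37.8136, 12),  # right place, accuracy too low
    Photo("c", 151.2093, -33.8688, 16),  # outside the box
]
selected = [p for p in photos if in_region(p)]
```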

POIs:

POIs are from OpenStreetMap, e.g. by downloading the data from one of these mirrors:


$ wget ftp://ftp.spline.de/pub/openstreetmap/pbf/planet-latest.osm.pbf

Then clip the bounding box of Melbourne, [140.9, -38.7, 148.5, -35.5], e.g. using one of these tools:


$ osmconvert planet-latest.osm.pbf -b=140.9,-38.7,148.5,-35.5 -o=melbourne.osm

Then filter the POI tags of interest (described in the table below) from the tag list; the total number of POIs is 3360.

The Python script for filtering POI tags from the clipped data, which uses this Python library, is filter_node.py, e.g.


python2 filter_node.py Melb_tags.list

The file Melb_tags.list is available here.

| key | values |
|---|---|
| `amenity` | `college, library, school, university, arts_centre, cinema, fountain, planetarium, theatre, clock, place_of_worship, ranger_station, townhall` |
| `building` | `farm, cathedral, chapel, church, mosque, temple, synagogue, shrine, school, stadium, university, bridge` |
| `geological` | `_ALL_` (indicating all values) |
| `historic` | `_ALL_` (indicating all values) |
| `leisure` | `garden, nature_reserve, park, pitch, sports_centre, stadium, swimming_area, track, wildlife_hide` |
| `man_made` | `beacon, breakwater, bridge, communications_tower, embankment, dyke, groyne, lighthouse, pier, tower, windmill` |
| `natural` | `_ALL_` (indicating all values) |
| `tourism` | `attraction, artwork, gallery, museum, picnic_site, theme_park, viewpoint, zoo` |
| `waterway` | `river, riverbank, stream, dam, weir, waterfall` |
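The key/value rule in the table above can be sketched as a small predicate. This is not the code of filter_node.py; the names `WANTED` and `is_poi` are hypothetical, only a subset of the table is included for brevity, and an OSM node's tags are assumed to be available as a plain dict.

```python
ALL = "_ALL_"  # marker meaning "accept every value of this key"

# Subset of the tag table above, for illustration only.
WANTED = {
    "amenity": {"library", "university", "theatre"},
    "historic": ALL,
    "tourism": {"museum", "zoo"},
}


def is_poi(tags, wanted=WANTED):
    """Return True if any (key, value) tag of a node matches the table."""
    for key, value in tags.items():
        allowed = wanted.get(key)
        if allowed is None:
            continue  # key is not of interest
        if allowed == ALL or value in allowed:
            return True
    return False


# Made-up nodes for illustration.
print(is_poi({"amenity": "library", "name": "Some Library"}))  # True
print(is_poi({"amenity": "cafe"}))                             # False
```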

1.1 Simple Facts

Some simple facts about the Melbourne data, as well as the data of the four other cities used in the IJCAI15 paper, are summarized in the table below.

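| City | ΔLongitude (degree) | ΔLatitude (degree) | #POIs | #Users | #POI_Visits | #Travel_Sequences | Min_Distance_between_POI (km) | Max_Distance_between_POI (km) |
|---|---|---|---|---|---|---|---|---|
| Edinburgh | 0.25 | 0.08 | 28 | 1,454 | 33,944 | 5,028 | 0.088 | 16.354 |
| Toronto | 0.28 | 0.20 | 29 | 1,395 | 39,419 | 6,057 | 0.147 | 29.655 |
| Glasgow | 0.39 | 0.37 | 27 | 601 | 11,434 | 2,227 | 0.182 | 45.344 |
| Osaka | 4.34 | 1.07 | 27 | 450 | 7,747 | 1,115 | 0.216 | 410.46 |
| Melbourne | 6.84 | 2.81 | 270 | 1,306 | 44,748 | 10,599 | 2.01 | 616.80 |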

The distribution of sequence length for each city is shown below.

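| City | #Length 1 | #Length 2 | #Length 3 | #Length 4 | #Length 5 | #Length 6 | #Length 7 | #Length 8 | #Length 9 | #Length 10 | #Length 11 | #Length 12 | #Length 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Edinburgh | 3616 | 778 | 300 | 146 | 76 | 48 | 30 | 15 | 7 | 5 | 0 | 5 | 2 |
| Toronto | 5080 | 642 | 216 | 60 | 33 | 9 | 9 | 4 | 2 | 1 | 0 | 0 | 1 |
| Glasgow | 1876 | 239 | 77 | 20 | 10 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
| Osaka | 929 | 139 | 32 | 7 | 7 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Melbourne | 9817 | 672 | 81 | 22 | 4 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |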

1.3 Photo Scatter Plot

Photo Scatter Plot for Edinburgh | Glasgow | Osaka | Toronto | Melbourne

1.4 Issues & Solutions

Q: Picking POIs is a somewhat hard task

POIs picked manually according to the photo scatter plot are much better than the results of k-means clustering or kernel density estimation, but still not good enough.

A: With the help of OpenStreetMap and NationalMap/Google Maps, it would be much easier to select and visualize POIs.

Further processing of the POI data:

  1. Filter the list of POIs further, e.g. remove POIs that are too close to each other or that have too few photos (how to?)
  2. Classify/label each POI with the assistance of online maps and its original tag (manually?)

NOTES:

  • Both NationalMap and Google Maps support a header in the first line; when using NationalMap, specify longitude and latitude in the first line
  • Satellite images from Google Maps are better, while NationalMap does not restrict the number of markers (Google Maps allows at most 2000 markers per layer); markers on NationalMap sometimes float around when the map is dragged

Q: How to deal with POIs that are too close, e.g. 0-10m?

A:

Q: POIs are generally associated with multiple labels

How should these labels be defined? How should each POI be labelled?

A:

Q: Assigning a photo to a POI

Assigning a photo to a POI whenever their distance is less than 200m, as in the paper, does not seem to be a good idea, because

  • if the POI is not large, e.g. a building, 200m seems OK
  • if the POI is large, e.g. a natural park, 1-2km seems to be a reasonable distance
  • but we do not know the type of a POI at assignment time, as picking POIs and assigning photos are done at the same time

A: Assign a photo to the nearest POI if the distance between the two is less than, say 500m?
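A minimal sketch of the nearest-POI rule suggested above, assuming great-circle (haversine) distance on a spherical Earth and the tentative 500m threshold. The function names, the POI ids and the coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def assign_photo(photo, pois, max_dist_m=500.0):
    """Return the id of the nearest POI within max_dist_m, else None.

    photo: (lon, lat); pois: dict mapping poi_id -> (lon, lat).
    """
    best_id, best_dist = None, float("inf")
    for poi_id, (lon, lat) in pois.items():
        dist = haversine_m(photo[0], photo[1], lon, lat)
        if dist < best_dist:
            best_id, best_dist = poi_id, dist
    return best_id if best_dist < max_dist_m else None


# Two hypothetical POIs roughly 880m apart near central Melbourne.
pois = {"A": (144.9631, -37.8136), "B": (144.9731, -37.8136)}
near_a = assign_photo((144.9641, -37.8136), pois)   # about 90m from A
far_away = assign_photo((145.5, -37.8136), pois)    # tens of km away
```

A single global threshold keeps the rule independent of the POI type, which is unknown at assignment time; the threshold value itself remains an open choice.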

Q: The independence assumption for travel sequences seems implausible

Users' travel sequences are generated by splitting each user's travel history wherever consecutive POI visits are more than 8 hours apart. A typical trip, however, spans several days, so it would be represented by several travel sequences that depend on each other (e.g. user preference patterns such as beach-park-shopping, beach-beach-shopping, etc.)
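The 8-hour splitting rule can be sketched as follows. This is an illustration under simplifying assumptions: each visit is reduced to a single timestamp, and the function name and the sample visits are hypothetical.

```python
from datetime import datetime, timedelta


def split_sequences(visits, max_gap=timedelta(hours=8)):
    """Split one user's visits into travel sequences.

    visits: list of (poi_id, datetime). A new sequence starts whenever
    two consecutive visits are more than max_gap apart.
    """
    sequences, current, prev_time = [], [], None
    for poi_id, when in sorted(visits, key=lambda v: v[1]):
        if prev_time is not None and when - prev_time > max_gap:
            sequences.append(current)
            current = []
        current.append(poi_id)
        prev_time = when
    if current:
        sequences.append(current)
    return sequences


# Made-up visits: a 10-hour gap separates the second and third visit.
visits = [
    ("beach", datetime(2015, 1, 1, 9, 0)),
    ("park", datetime(2015, 1, 1, 10, 0)),
    ("shopping", datetime(2015, 1, 1, 20, 0)),
]
sequences = split_sequences(visits)  # [["beach", "park"], ["shopping"]]
```

In this example a single day out already yields two sequences that are treated as independent, which is the concern raised above.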

A:

2. Results

2.1 Precision, Recall and F1-score

Settings: Melbourne, $\eta$ = 0.5 (time-based user interest and POI popularity), leave-one-out evaluation; 28/110 ≈ 25.5% of the solutions are suboptimal.

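| Recall | Precision | F1-score |
|---|---|---|
| 0.735±0.177 | 0.735±0.177 | 0.735±0.177 |

| Value (Recall/Precision/F1-score) | 1.0 | 0.75 | 0.67 | 0.60 | 0.57 | 0.50 | 0.40 |
|---|---|---|---|---|---|---|---|
| Frequency | 30/110 | 7/110 | 54/110 | 2/110 | 1/110 | 14/110 | 2/110 |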

Box plot of Recall, Precision and F1-score
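A minimal sketch of how the three scores could be computed for one recommended sequence, assuming the common set-overlap definitions: recall is the number of shared POIs divided by the number of POIs in the actual sequence, precision is the number of shared POIs divided by the number of POIs in the recommended sequence, and F1 is their harmonic mean. The function name `tour_scores` and the POI ids are illustrative only.

```python
def tour_scores(recommended, actual):
    """Return (recall, precision, f1) based on the sets of visited POIs."""
    rec, act = set(recommended), set(actual)
    overlap = len(rec & act)
    precision = overlap / len(rec) if rec else 0.0
    recall = overlap / len(act) if act else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# Made-up sequences of POI ids: two of the three POIs are shared.
recall, precision, f1 = tour_scores([1, 5, 3], [1, 2, 3])
print(round(recall, 2), round(precision, 2), round(f1, 2))
```

Under these definitions the three scores coincide whenever the recommended and the actual sequence contain the same number of distinct POIs, which would be consistent with the identical Recall, Precision and F1-score columns reported above.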

Settings: Melbourne, $\eta$ = 0.0 (POI popularity only), leave-one-out evaluation; 29/110 ≈ 26.4% of the solutions are suboptimal.

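| Recall | Precision | F1-score |
|---|---|---|
| 0.732±0.176 | 0.732±0.176 | 0.732±0.176 |

| Value (Recall/Precision/F1-score) | 1.0 | 0.83 | 0.75 | 0.67 | 0.60 | 0.57 | 0.50 | 0.40 |
|---|---|---|---|---|---|---|---|---|
| Frequency | 29/110 | 1/110 | 6/110 | 55/110 | 2/110 | 1/110 | 14/110 | 2/110 |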

Box plot of Recall, Precision and F1-score

2.2 Transition Matrix

Transition matrix for recommended sequences, $\eta$ = 0.5:

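| | Beach | Cultural | Education | Forest | Leisure | ManMade | Natural | Park | Religion | Shopping | WaterBody |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Beach | 0.176 | 0.020 | 0.078 | 0.000 | 0.020 | 0.000 | 0.000 | 0.196 | 0.216 | 0.294 | 0.000 |
| Cultural | 0.429 | 0.143 | 0.000 | 0.000 | 0.143 | 0.000 | 0.000 | 0.000 | 0.143 | 0.143 | 0.000 |
| Education | 0.353 | 0.000 | 0.118 | 0.000 | 0.000 | 0.000 | 0.000 | 0.059 | 0.176 | 0.294 | 0.000 |
| Forest | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Leisure | 0.100 | 0.000 | 0.000 | 0.000 | 0.100 | 0.000 | 0.100 | 0.000 | 0.100 | 0.600 | 0.000 |
| ManMade | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Natural | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
| Park | 0.316 | 0.000 | 0.026 | 0.000 | 0.079 | 0.000 | 0.000 | 0.053 | 0.132 | 0.395 | 0.000 |
| Religion | 0.240 | 0.160 | 0.000 | 0.040 | 0.040 | 0.040 | 0.000 | 0.080 | 0.120 | 0.280 | 0.000 |
| Shopping | 0.186 | 0.010 | 0.039 | 0.010 | 0.059 | 0.000 | 0.010 | 0.167 | 0.069 | 0.451 | 0.000 |
| WaterBody | 0.143 | 0.143 | 0.286 | 0.143 | 0.000 | 0.000 | 0.000 | 0.143 | 0.000 | 0.143 | 0.000 |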

Transition matrix for recommended sequences, $\eta$ = 0.0:

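| | Beach | Cultural | Education | Forest | Leisure | ManMade | Natural | Park | Religion | Shopping | WaterBody |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Beach | 0.143 | 0.020 | 0.061 | 0.000 | 0.020 | 0.000 | 0.000 | 0.265 | 0.204 | 0.286 | 0.000 |
| Cultural | 0.571 | 0.143 | 0.000 | 0.000 | 0.143 | 0.000 | 0.000 | 0.000 | 0.143 | 0.000 | 0.000 |
| Education | 0.278 | 0.000 | 0.056 | 0.000 | 0.056 | 0.000 | 0.000 | 0.056 | 0.167 | 0.389 | 0.000 |
| Forest | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Leisure | 0.000 | 0.000 | 0.000 | 0.000 | 0.125 | 0.000 | 0.062 | 0.125 | 0.062 | 0.625 | 0.000 |
| ManMade | 0.000 | 0.000 | 0.333 | 0.333 | 0.000 | 0.000 | 0.000 | 0.333 | 0.000 | 0.000 | 0.000 |
| Natural | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
| Park | 0.342 | 0.000 | 0.105 | 0.000 | 0.105 | 0.026 | 0.000 | 0.026 | 0.053 | 0.316 | 0.026 |
| Religion | 0.174 | 0.174 | 0.000 | 0.043 | 0.043 | 0.043 | 0.000 | 0.087 | 0.174 | 0.261 | 0.000 |
| Shopping | 0.219 | 0.010 | 0.042 | 0.000 | 0.094 | 0.000 | 0.010 | 0.115 | 0.083 | 0.427 | 0.000 |
| WaterBody | 0.125 | 0.125 | 0.250 | 0.125 | 0.000 | 0.125 | 0.000 | 0.250 | 0.000 | 0.000 | 0.000 |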

Transition matrix for actual sequences:

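| | Beach | Cultural | Education | Forest | Leisure | ManMade | Natural | Park | Religion | Shopping | WaterBody |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Beach | 0.454 | 0.008 | 0.008 | 0.000 | 0.042 | 0.000 | 0.000 | 0.042 | 0.092 | 0.353 | 0.000 |
| Cultural | 0.027 | 0.027 | 0.054 | 0.000 | 0.108 | 0.000 | 0.000 | 0.189 | 0.189 | 0.378 | 0.027 |
| Education | 0.038 | 0.000 | 0.170 | 0.000 | 0.057 | 0.000 | 0.000 | 0.113 | 0.038 | 0.528 | 0.057 |
| Forest | 0.000 | 0.100 | 0.000 | 0.500 | 0.000 | 0.000 | 0.100 | 0.100 | 0.000 | 0.000 | 0.200 |
| Leisure | 0.045 | 0.164 | 0.060 | 0.000 | 0.119 | 0.000 | 0.015 | 0.075 | 0.060 | 0.448 | 0.015 |
| ManMade | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.500 | 0.500 | 0.000 |
| Natural | 0.000 | 0.000 | 0.091 | 0.000 | 0.000 | 0.000 | 0.818 | 0.091 | 0.000 | 0.000 | 0.000 |
| Park | 0.031 | 0.087 | 0.094 | 0.008 | 0.055 | 0.000 | 0.000 | 0.039 | 0.055 | 0.614 | 0.016 |
| Religion | 0.091 | 0.073 | 0.073 | 0.018 | 0.127 | 0.000 | 0.000 | 0.073 | 0.073 | 0.436 | 0.036 |
| Shopping | 0.137 | 0.045 | 0.033 | 0.002 | 0.073 | 0.000 | 0.000 | 0.104 | 0.043 | 0.545 | 0.017 |
| WaterBody | 0.172 | 0.000 | 0.138 | 0.172 | 0.069 | 0.000 | 0.000 | 0.069 | 0.000 | 0.345 | 0.034 |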
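Matrices like the ones above could be obtained by counting transitions between the categories of consecutive POI visits and normalizing each row. The sketch below assumes that reading (rows are the source category, columns the destination, each non-empty row sums to 1); the function name and the sample sequences are hypothetical.

```python
from collections import Counter


def transition_matrix(sequences, categories):
    """Row-normalized transition frequencies between POI categories.

    sequences: list of category sequences, one per travel sequence.
    Returns a nested dict matrix[src][dst]; rows without any outgoing
    transition are left as all zeros.
    """
    counts = {c: Counter() for c in categories}
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    matrix = {}
    for src in categories:
        total = sum(counts[src].values())
        matrix[src] = {dst: (counts[src][dst] / total if total else 0.0)
                       for dst in categories}
    return matrix


# Made-up category sequences for illustration.
categories = ["Beach", "Park", "Shopping"]
sequences = [["Beach", "Shopping", "Shopping"], ["Beach", "Park"]]
matrix = transition_matrix(sequences, categories)
```

Here `matrix["Beach"]` is split evenly between Park and Shopping, `matrix["Shopping"]["Shopping"]` is 1.0, and the Park row stays at zero because no sequence continues after a park visit.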