Find similar tags

Prepare

  • Download pre-computed tag occurrence and tag co-occurrence data

    cd $HOME/VisualSearch
    wget http://lixirong.net/data/csur2016/train1m-tagfreq.tar.gz
    tar xzf train1m-tagfreq.tar.gz
  • Set up the Jingwei environment

    source ~/github/jingwei/start.sh

In [1]:
from util.tagsim.flickr_similarity import FlickrContextSim
fcs = FlickrContextSim('train1m')


[tagsim.flickr_similarity.FlickrContextSim] 1198818 images, 347369 users, 62862 tags
[tagsim.flickr_similarity.FlickrContextSim] 4552497 tag pairs

In [2]:
# compute similarity for a given pair of tags
print fcs.compute('boeing', 'airplane'), fcs.compute('boeing', 'flight'), fcs.compute('boeing', 'street')


0.719895795351 0.595356835123 0.210904755059
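The notebook does not show how compute turns the downloaded occurrence and co-occurrence counts into a score. One common way to do this is to take the Normalized Google Distance (NGD) between the two tags and map it to a similarity with exp(-NGD). The sketch below assumes that definition; it is not necessarily what FlickrContextSim implements. The function name ngd_similarity and the counts used in the example are hypothetical.

```python
import math


def ngd_similarity(fx, fy, fxy, n):
    """Sketch of an NGD-based tag similarity (an assumed definition).

    fx, fy : number of images labeled with tag x / tag y
    fxy    : number of images labeled with both tags
    n      : total number of images in the collection
    """
    # Tags that never occur, or never co-occur, get zero similarity.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return 0.0
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(log_fx, log_fy)
    if denom <= 0:
        # Both tags occur in every image, so they are indistinguishable.
        return 1.0
    ngd = (max(log_fx, log_fy) - log_fxy) / denom
    return math.exp(-ngd)


# Toy counts, made up for illustration only.
print(ngd_similarity(100, 50, 20, 1000))
```

Under this definition a tag compared with itself has fx = fy = fxy, so the distance is 0 and the similarity is 1.0, and the score grows as the two tags co-occur more often.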

In [3]:
# find similar tags in a given vocabulary
vob = fcs.vob
query_tag = 'boeing'
# score every tag in the vocabulary against the query tag
tagscores = [(tag, fcs.compute(tag, query_tag)) for tag in vob]
# sort by similarity, highest first
tagscores.sort(key=lambda v: v[1], reverse=True)
# show the top 10 ranked tags
print tagscores[:10]


[('boeing', 1.0), ('airliner', 0.8043727834468062), ('747', 0.7838391722717928), ('airline', 0.7763174382609282), ('737', 0.7659568878683966), ('jet', 0.7334135162779946), ('aviation', 0.7331368438285586), ('aircraft', 0.7311557033886409), ('jetliner', 0.7290937741145928), ('airplane', 0.719895795350634)]
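Sorting the full vocabulary is fine for about 63k tags, but when only the top few results are needed, heapq.nlargest avoids the full sort. The sketch below wraps the ranking step in a helper. The names top_k_similar and toy_similarity are hypothetical, and toy_similarity is only a stand-in so the example runs without the Jingwei data.

```python
import heapq


def top_k_similar(query_tag, vocabulary, similarity, k=10):
    """Return the k (tag, score) pairs most similar to query_tag.

    similarity is any callable taking two tags and returning a number.
    """
    scored = ((tag, similarity(tag, query_tag)) for tag in vocabulary)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def toy_similarity(tag_a, tag_b):
    """Stand-in similarity: Jaccard overlap of the tags' character sets."""
    a, b = set(tag_a), set(tag_b)
    return len(a & b) / float(len(a | b))


vocabulary = ['boeing', 'being', 'street', 'bo']
print(top_k_similar('boeing', vocabulary, toy_similarity, k=3))
```

In the notebook, the same helper would be called as top_k_similar('boeing', fcs.vob, fcs.compute), using the vocabulary and compute method shown in the cells above.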