Notebook Name: AppendMicrosoftAIData

Author: Sreejith Menon (smenon8@uic.edu)

General Description:

The Microsoft Image Tagging API generates a bag of words that can be used to describe an image.
Think of these as the words (nouns) you would use to describe the image to a person who cannot see it. Each returned word has a confidence score associated with the prediction. Tags with low confidence are ignored. For experiment 2, the confidence threshold is hardcoded to 0.5.
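
The thresholding described above can be sketched as follows. This is only an illustration: the record layout (a list of dicts with "name" and "confidence" keys), the helper name high_confidence_tags, and the choice of ">=" rather than ">" are assumptions, not taken from the project's parsing code.

```python
def high_confidence_tags(tag_records, threshold=0.5):
    """Return the names of tags whose confidence is at least `threshold`.

    `tag_records` is assumed to be a list of dicts with "name" and
    "confidence" keys; the real parsed JSON may be laid out differently.
    """
    return [rec["name"] for rec in tag_records if rec["confidence"] >= threshold]


# Made-up sample data, for illustration only.
sample = [
    {"name": "zebra", "confidence": 0.98},
    {"name": "grass", "confidence": 0.91},
    {"name": "reptile", "confidence": 0.31},
]
kept = high_confidence_tags(sample)
print(kept)
```

With the sample above, only the tags at or above the 0.5 threshold are kept; the low-confidence "reptile" tag is dropped.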

This notebook takes the API data, which has already been parsed into a JSON file, and joins it with the share proportion results from the Amazon Mechanical Turk albums.

The idea is to check whether the occurrence of a certain word influences the share rate in any way.
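
One simple way to look at this question, sketched here on made-up data, is to join the two tables on the image ID and compare the mean share proportion of images that carry a given tag with the mean of those that do not. The helper name mean_share_by_presence and the toy values are hypothetical; this is not necessarily the analysis the notebook performs.

```python
import pandas as pd

# Toy data; the GIDs, proportions and tags are invented for illustration.
shares = pd.DataFrame({
    "GID": ["1", "2", "3", "4"],
    "Proportion": [80.0, 20.0, 60.0, 40.0],
})
tags = pd.DataFrame({
    "GID": ["1", "2", "3", "4"],
    "tags": [["zebra", "grass"], ["bus"], ["zebra"], ["grass", "sky"]],
})

# Join share proportions with tags on the shared image ID.
joined = pd.merge(shares, tags, on="GID")


def mean_share_by_presence(df, word):
    """Mean share proportion for images with and without `word` in their tags."""
    has_word = df["tags"].apply(lambda tag_list: word in tag_list)
    return (
        df.loc[has_word, "Proportion"].mean(),
        df.loc[~has_word, "Proportion"].mean(),
    )


with_zebra, without_zebra = mean_share_by_presence(joined, "zebra")
print(with_zebra, without_zebra)
```

A difference between the two means is only a first hint; it does not by itself establish that the tag influences sharing.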


In [3]:
import csv
import json
import JobsMapResultsFilesToContainerObjs as ImageMap
import DeriveFinalResultSet as drs
import DataStructsHelperAPI as DS
import importlib
import pandas as pd
import htmltag as HT
from collections import OrderedDict
#import matplotlib.pyplot as plt
import plotly.plotly as py
import cufflinks as cf # this is necessary to link pandas to plotly
cf.go_online()
flName = "../data/All_Zebra_Count_Tag_Output_Results.txt"
pd.set_option('display.max_colwidth', -1) # do not truncate long cells; pandas >= 1.0 expects None instead of -1
imgAlbumDict = ImageMap.genImgAlbumDictFromMap(drs.imgJobMap)
master = ImageMap.createResultDict(1,100)
imgShareNotShareList,noResponse = ImageMap.imgShareCountsPerAlbum(imgAlbumDict,master)
importlib.reload(ImageMap)
importlib.reload(DS)


Out[3]:
<module 'DataStructsHelperAPI' from '/Users/sreejithmenon/Google Drive/Project/AnimalPhotoBias/script/DataStructsHelperAPI.py'>

Rank list of images by share rates with Microsoft Image Tagging API output

This block of code builds a rank list of images in descending order of their share rates, with the Microsoft Image Tagging API results appended.

The output is a rank list of all the images by share rate, along with the tags for every image. The actual images can also be displayed alongside the rank list.

Known issue: the '<' and '>' characters of the HTML tags in the URL column are often interpreted literally. Future work: add escape handling for these characters in the HTML tags. Some of these code blocks could also be converted into methods.
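
A possible cause of the known issue, offered as an assumption rather than a confirmed diagnosis: pandas' DataFrame.to_html escapes '<', '>' and '&' by default (escape=True), so an <img> tag stored in a cell is written out as text. The sketch below uses a made-up one-row frame with a placeholder image path to show the difference when escape=False is passed.

```python
import pandas as pd

# Placeholder frame; the image path is not a real URL from the project.
df = pd.DataFrame({
    "GID": ["8737"],
    "URL": ['<img src="8737.jpeg" width="350">'],
})

# Default behaviour: angle brackets are converted to HTML entities.
escaped = df.to_html(index=False)

# With escape=False the cell content is emitted as raw markup.
raw = df.to_html(index=False, escape=False)

print("&lt;img" in escaped)
print("<img src=" in raw)
```

Disabling escaping applies to every cell in the frame, so it is only appropriate when all cell contents are trusted.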


In [23]:
header,rnkFlLst = DS.genlstTupFrmCsv("../FinalResults/rankListImages_expt2.csv")
rnkListDf = pd.DataFrame(rnkFlLst,columns=header)
rnkListDf['Proportion'] = rnkListDf['Proportion'].astype('float')
rnkListDf.sort_values(by="Proportion",ascending=False,inplace=True)

# create an overall giant csv
gidFtrs = ImageMap.genMSAIDataHighConfidenceTags("../data/GZC_data_tagged.json",0.5)
        
gidFtrsLst = DS.cnvrtDictToLstTup(gidFtrs)
df = pd.DataFrame(gidFtrsLst,columns=['GID','tags'])

shrPropsTags = pd.merge(rnkListDf,df,on='GID')

# shrPropsTags.to_csv("../FinalResults/resultsExpt2RankList_Tags.csv",index=False)
shrPropsTags['URL'] = '<img src = "https://socialmediabias.blob.core.windows.net/wildlifephotos/All_Zebra_Count_Images/' + shrPropsTags['GID'] + '.jpeg" width = "350">'

shrPropsTags.sort_values(by=['Proportion','GID'],ascending=False,inplace=True)
fullFl = HT.html(HT.body(HT.HTML(shrPropsTags.to_html(bold_rows = False,index=False))))

fullFl
# outputFile = open("../FinalResults/resultsExpt2RankList_Tags.html","w")
# outputFile.write(fullFl)
# outputFile.close()

Generate rank list of tags by share rate.
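
The per-tag aggregation performed in the next cell can be illustrated on toy data. In this sketch each row is assumed to be (GID, share, not_share, total) with counts stored as strings, mirroring how the cell indexes the CSV rows; the helper name aggregate_by_tag and all values are hypothetical.

```python
def aggregate_by_tag(rows, gid_tags):
    """Sum share / not_share / total counts over every image carrying each tag."""
    totals = {}
    for gid, share, not_share, total in rows:
        # Images without any tag entry contribute nothing.
        for tag in gid_tags.get(gid, []):
            counts = totals.setdefault(tag, {"share": 0, "not_share": 0, "total": 0})
            counts["share"] += int(share)
            counts["not_share"] += int(not_share)
            counts["total"] += int(total)
    return totals


# Made-up rows and tags, for illustration only.
rows = [("1", "8", "2", "10"), ("2", "3", "7", "10")]
gid_tags = {"1": ["zebra", "grass"], "2": ["grass"]}

tag_counts = aggregate_by_tag(rows, gid_tags)
print(tag_counts)
```

Here "grass" accumulates counts from both images, while "zebra" only reflects the first one.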


In [24]:
tgsShrNoShrCount = {}
for lst in rnkFlLst:
    # by position: lst[0] is the GID, lst[1..3] are the share, not-share and total counts
    tgs = gidFtrs[lst[0]]
    tmpDict = {'share': int(lst[1]), 'not_share': int(lst[2]), 'total' : int(lst[3])}
    for tag in tgs:
        # start from zero counts the first time a tag is seen
        oldDict = tgsShrNoShrCount.get(tag,{'share' : 0,'not_share' : 0,'total' : 0})
        oldDict['share'] = oldDict.get('share',0) + tmpDict['share']
        oldDict['not_share'] = oldDict.get('not_share',0) + tmpDict['not_share']
        oldDict['total'] = oldDict.get('total',0) + tmpDict['total']

        tgsShrNoShrCount[tag] = oldDict

In [5]:
## Append data into data frames and build visualizations
tgsShrCntDf = pd.DataFrame(tgsShrNoShrCount).transpose()
tgsShrCntDf['proportion'] = tgsShrCntDf['share'] * 100 / tgsShrCntDf['total']
tgsShrCntDf.sort_values(by=['proportion','share'],ascending=False,inplace=True)
tgsShrCntDf = tgsShrCntDf[['share','not_share','total','proportion']]
tgsShrCntDf.to_csv("../FinalResults/RankListTags.csv")

fullFl = HT.html(HT.body(HT.HTML(tgsShrCntDf.to_html(bold_rows = False))))

with open("../FinalResults/RankListTags.html","w") as outputFile:
    outputFile.write(fullFl)

In [20]:
iFrameBlock = []
fig = tgsShrCntDf['proportion'].iplot(kind='line',filename="All_Tags",title="Distribution of Tags")
iFrameBlock.append(fig.embed_code)
#plt.savefig("../FinalResults/RankListTags.png",bbox_inches='tight')

In [4]:
gidFtrs = ImageMap.genMSAIDataHighConfidenceTags("../data/GZC_data_tagged.json",0.5)
        
gidFtrsLst = DS.cnvrtDictToLstTup(gidFtrs)
df = pd.DataFrame(gidFtrsLst,columns=['GID','tags'])

In [5]:
df


Out[5]:
GID tags
0 8737 [grass, zebra, outdoor, animal, field, mammal, standing, grassy]
1 4373 [person, window]
2 2162 [giraffe, outdoor, grass, tree, animal, reptile, lizard, field, grassy]
3 3110 [grass, outdoor, zebra, sky, field, animal, mammal, standing, grassy]
4 7930 [grass, zebra, animal, outdoor, mammal, field, grassy]
5 3382 [grass, zebra, outdoor, field, animal, mammal, standing, grassy]
6 5967 [bus]
7 5638 [rock, wall, tree, stone, outdoor, rocky]
8 4259 [grass, zebra, outdoor, animal, field, mammal, standing, open, grassy]
9 7817 [grass, outdoor, sky, zebra, tree, field, herd, animal, mammal, group, grassy]
10 17 [grass, outdoor, sky, animal, mammal, field, giraffe, grassy, tall, open]
11 5053 [outdoor, grass, giraffe, tree, animal, sky, mammal, field, tall, dry]
12 4957 [zebra, outdoor, sky, grass, animal, mammal, ground, standing, field, group]
13 5902 [white, transport]
14 5449 [grass, outdoor, sky, field, nature, grassy, open, hill]
15 7115 [grass, outdoor, zebra, field, sky, animal, tree, mammal, tall, standing, grassy]
16 7764 [grass, zebra, outdoor, animal, sky, mammal, field, standing, dry]
17 6516 [grass, zebra, outdoor, sky, field, animal, mammal, herd, grassy, tall, group, open]
18 7098 [grass, outdoor, sky, tree, field, animal, mammal, dry, grassy, group]
19 7452 [grass, zebra, outdoor, animal, field, mammal, standing, dry, adult, grassy]
20 9205 [grass, outdoor, sky, field, animal, standing, group, mammal, grassy, herd, wild]
21 591 [grass, zebra, outdoor, tree, animal, mammal, field, standing]
22 7502 [grass, outdoor, sky, animal, field, mammal, brown, tall, grassy, standing, dry]
23 8791 [outdoor, grass, zebra, sky, animal, mammal, field, standing, grassy]
24 4451 [sky, outdoor, grass, field]
25 4286 [zebra, grass, outdoor, sky, animal, field, group, mammal, standing, herd]
26 4172 [outdoor, grass, sky, field, animal, mammal, open, grassy]
27 1479 [grass, zebra, outdoor, mammal, animal, field, grassy, group]
28 8663 [grass, zebra, outdoor, animal, field, mammal, standing, dry, tall, grassy]
29 7343 [zebra, outdoor, grass, animal, mammal, field, dry]
... ... ...
9363 7771 [grass, outdoor, zebra, sky, animal, field, mammal, standing]
9364 8793 [zebra, outdoor, animal, water, mammal, standing]
9365 6868 [outdoor, sky, car, tree, road, parked, parking]
9366 1110 [grass, outdoor, animal, field, mammal, elephant]
9367 7585 [grass, outdoor, sky, field, tall, grassy]
9368 8709 [grass, outdoor, zebra, animal, mammal, field, standing, brown, grassy, tall, dry, open]
9369 5856 [person, people]
9370 787 [grass, outdoor, field, animal, mammal, standing, zebra, tall, grassy, open]
9371 2408 [zebra, grass, outdoor, animal, mammal, field, ground, standing]
9372 8992 [grass, outdoor, zebra, animal, field, mammal, standing, tall, group, grassy]
9373 8606 [zebra, grass, outdoor, animal, mammal, field, standing, dry]
9374 3438 [grass, outdoor, animal, giraffe, mammal, sky, field, standing, green, grassy]
9375 9351 [grass, outdoor, sky, field, animal, mammal, giraffe, grassy]
9376 6013 [outdoor, ground, tree, sky]
9377 977 [outdoor, tree, grass, sky, mammal, animal, zebra, path, dirt]
9378 3756 [grass, outdoor, elephant, field, grassy, wild]
9379 2291 [grass, zebra, outdoor, sky, field, animal, mammal, open, group, plain, grassy]
9380 6124 [grass, outdoor, animal, giraffe, tree, mammal, field, standing, grassy]
9381 4550 [sky, outdoor, grass, field, open, mammal]
9382 6877 [outdoor, sky, car, road]
9383 4979 [grass, outdoor, zebra, tree, field, animal, mammal, antelope, tall, wild, standing, herd, brown, bushes]
9384 628 [grass, outdoor, sky, field, tall, brown, dry, standing, grassy]
9385 1763 [grass, zebra, outdoor, animal, mammal, field, standing, dry, grassy]
9386 7408 [grass, outdoor, animal, field, mammal, sky, herd, deer, wild, group, grassy, open]
9387 9370 [outdoor, tree, green, plant]
9388 6067 [person]
9389 8126 [grass, outdoor, zebra, field, animal, mammal, standing, dry, group, open]
9390 2663 [zebra, grass, outdoor, sky, field, mammal, animal, standing, tall, dry, group, brown, grassy]
9391 3312 [grass, zebra, outdoor, animal, field, mammal, standing, grassy]
9392 7564 [grass, outdoor, sky, field, open]

9393 rows × 2 columns

