Instructions

API Tokens and secrets.py

Add a file named secrets.py to the same directory as this notebook with a list of Instagram API Tokens, as follows:

    TOKENS = [
        '123',
        '456',
        '789'
    ]

In [3]:
import os
import string
from datetime import datetime, date, timedelta
import unicodedata

import pymongo

from instagram.client import InstagramAPI
from instagram.bind import InstagramAPIError

from nltk.corpus import stopwords
from nltk.metrics import edit_distance
from nltk.corpus import wordnet as wn
from gensim import corpora, models, similarities


%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from mongo_helper import (
    CLIENT, 
    DB, 
    TRANS_COLLECTION,
    USER_PAIRS_COLLECTION,
    VENMO_INSTAGRAM_MATCHES
)

from instagram_helper import (
    InstagramAPICycler, 
    get_all_paginated_data, 
    instagram_media_to_dict
)
from insta_query import query

from secrets import TOKENS

SLEEP_SECONDS = 60*60
API_CYCLER = InstagramAPICycler(TOKENS)

HEAVY_USER_THRESHOLD = 30

AFTER_CUTOFF_DATE = date(2015, 3, 1)

HOURS_RADIUS = 24

VENMO_DATE_FORMAT_STR = '%Y-%m-%dT%H:%M:%SZ'

STOPLIST = frozenset(stopwords.words('english'))

GRAPHS_PATH = os.path.join(os.getcwd(), 'graphs')
if not os.path.exists(GRAPHS_PATH):
    os.mkdir(GRAPHS_PATH)

Populate MongoDB with the Venmo-Instagram user matches


In [4]:
query(TOKENS, HEAVY_USER_THRESHOLD, repopulate=True)  # Run this to repopulate the user matches collection
# query(TOKENS, HEAVY_USER_THRESHOLD)


------RE-POPULATING MONGODB COLLECTION user_pairs------
Dropped existing user_pairs collection. Completed in 0.005010 seconds.
Completed aggregate query of trans collection. Completed in 147.505237 seconds.
------FINDING VENMO USERS WITH >30 TARGET USERS------
Completed aggregate query of user_pairs collection. Completed in 5.753324 seconds.

Venmo user ibotta transacted with 2054 other users
Venmo user marakimb transacted with 234 other users
Venmo user laurengil transacted with 108 other users
Venmo user Ben-Delhoum transacted with 108 other users
Venmo user zacharydewitt transacted with 111 other users
Venmo user Ben-Delhoum transacted with 108 other users
Venmo user MoveLootInc transacted with 18 other users
Venmo user Nguyen-Vu transacted with 102 other users
Venmo user ShoutInc transacted with 84 other users
Venmo user Indo-DVC transacted with 81 other users
Venmo user Texas-Darlins transacted with 78 other users
Venmo user Jon-Zelin transacted with 71 other users
Venmo user Shayna-Fader transacted with 69 other users
Venmo user JotAtEmory transacted with 69 other users
Venmo user GRUDental-2018 transacted with 68 other users
Venmo user David-Muoser transacted with 67 other users
Venmo user Jennifer-Henderson-1 transacted with 64 other users
Venmo user BlakeRuzich transacted with 2 other users
Venmo user jmudsp transacted with 58 other users
Venmo user fwedeorange transacted with 58 other users
Venmo user Jhameson-Ko transacted with 58 other users
Venmo user acelboncel transacted with 57 other users
Venmo user Hannah-Becker-1 transacted with 2 other users
Venmo user ErikBurbulla transacted with 56 other users
Venmo user Charlotte-Desplan transacted with 56 other users
Venmo user Hayden-Schenker transacted with 55 other users
Venmo user Sergio-Becerra transacted with 53 other users
Venmo user Jared-Robins-1 transacted with 53 other users
Venmo user Dylan-Slinger transacted with 53 other users
Venmo user Miki-Huber transacted with 52 other users
Venmo user Maria-Martinez-15 transacted with 52 other users
Venmo user Dimitrius-Chesne transacted with 52 other users
Venmo user Jonah-Mann transacted with 51 other users
Venmo user lexsilv transacted with 3 other users
Venmo user annawuyang transacted with 51 other users
Venmo user scorina transacted with 17 other users
Venmo user alessandralochen transacted with 49 other users
Venmo user Mayvenn transacted with 49 other users
Venmo user Josh-Levy-9 transacted with 2 other users
Venmo user kathy-z transacted with 48 other users
Venmo user harvardballroom transacted with 47 other users
Venmo user Matthew-Dunn transacted with 47 other users
Venmo user StartupPrinceton transacted with 47 other users
Venmo user John-Sopcisak transacted with 8 other users
Venmo user Sam-Rosenberg-5 transacted with 46 other users
Venmo user danieljacobs transacted with 46 other users
Venmo user AIChE-PennChapter transacted with 46 other users
Venmo user Shayna-Fertig transacted with 45 other users
Venmo user Matt-Segal transacted with 45 other users
Venmo user Margot-Byrne transacted with 1 other users
Venmo user Jake-Levine-5 transacted with 4 other users
Venmo user annejoan transacted with 44 other users
Venmo user Roshan-Ray transacted with 44 other users
Venmo user PiKapp-EtaNu transacted with 44 other users
Venmo user Michael-Rubin-7 transacted with 4 other users
Venmo user Kevin-Park-7 transacted with 44 other users
Venmo user Hayden-Goldberg transacted with 44 other users
Venmo user APO-Chi transacted with 44 other users
Venmo user Eli-Scheinholtz transacted with 1 other users
Venmo user lmaocatherine transacted with 1 other users
Venmo user AsherJZlotnik transacted with 16 other users
Venmo user Jody-Lie transacted with 12 other users
Venmo user George-Ingber transacted with 43 other users
Venmo user IsaacLin transacted with 43 other users
Venmo user Anand-Lakshminarayan transacted with 43 other users
Venmo user rudrapuri transacted with 42 other users
Venmo user bryansilf transacted with 17 other users
Venmo user SM-Dipali transacted with 42 other users
Venmo user Max-Och transacted with 42 other users
Venmo user tgbennett55 transacted with 42 other users
Venmo user Patrick-Maiden transacted with 1 other users
Venmo user AKPsi-Phi-Michigan transacted with 42 other users
Venmo user Andrew-Lay transacted with 42 other users
Venmo user Avery-McCann transacted with 42 other users
Venmo user Isaac-Nilsson transacted with 42 other users
Venmo user Tommy-Shott transacted with 41 other users
Venmo user graceliu transacted with 41 other users
Venmo user Brian-Zeoli transacted with 41 other users
Venmo user Angle-Skelly transacted with 41 other users
Venmo user StuartHarmon transacted with 40 other users
Venmo user Ari-Bender transacted with 40 other users
Venmo user jeffgrimes9 transacted with 40 other users
Venmo user Joe_Haber transacted with 40 other users
Venmo user Elizabeth-Kim-2 transacted with 40 other users
Venmo user Jonah-Hessels transacted with 40 other users
Venmo user viviguo transacted with 39 other users
Venmo user JohnMcDermott transacted with 39 other users
Venmo user Matt-Sant-Miller transacted with 39 other users
Venmo user RyanBasch transacted with 39 other users
Venmo user Victor-Vandekerckhove transacted with 39 other users
Venmo user Aziz-Maredia transacted with 39 other users
Venmo user kevinlie transacted with 39 other users
Venmo user Karen-Sun transacted with 39 other users
Venmo user ConnorMcCollough transacted with 39 other users
Venmo user Andre-Mangkuningrat transacted with 39 other users
Venmo user AllyLowery transacted with 39 other users
Venmo user Sam-Selig transacted with 39 other users
Venmo user Alexander-Wing transacted with 39 other users
Venmo user Alex-Grover transacted with 39 other users
Venmo user Jonah-Schiller transacted with 38 other users
Venmo user Cannon-Karns transacted with 38 other users
Venmo user Catalina-Bermudez transacted with 38 other users
Venmo user Justine-Harrison transacted with 38 other users
Venmo user Arjun-Dundoo transacted with 38 other users
Venmo user Riley-Healey transacted with 38 other users
Venmo user Jamie-Wilkie transacted with 38 other users
Venmo user OscarBromberg transacted with 38 other users
Venmo user James-Wreschner transacted with 38 other users
Venmo user Hayden-Freedman transacted with 3 other users
Venmo user Leah-Kornberg transacted with 5 other users
Venmo user Drew-Kirchhofer transacted with 38 other users
Venmo user Anhandoff transacted with 38 other users
Venmo user David-Caruso transacted with 38 other users
Venmo user Nick-Shee transacted with 38 other users
Venmo user Zachary-Marcus transacted with 37 other users
Venmo user Hoya-Stranger transacted with 5 other users
Venmo user JonKim transacted with 37 other users
Venmo user Sam-Yang-1 transacted with 37 other users
Venmo user claraw transacted with 37 other users
Venmo user Michelle-Vanessa transacted with 37 other users
Venmo user MattBaron transacted with 37 other users
Venmo user Mae-Wang transacted with 37 other users
Venmo user Justin-E-Hall transacted with 37 other users
Venmo user Fede-M transacted with 37 other users
Venmo user John-Cook-7 transacted with 37 other users
Venmo user Matt-Rife transacted with 37 other users
Venmo user uwdsp transacted with 37 other users
Venmo user Aaron-Gutierrez-1 transacted with 25 other users
Venmo user Devina-Unjoto transacted with 36 other users
Venmo user Teddy-Schneider transacted with 36 other users
Venmo user Sumeet-Singh-3 transacted with 36 other users
Venmo user Josh-Levey transacted with 36 other users
Venmo user Andrew-Reed-2 transacted with 36 other users
Venmo user Jeremy-Dickerson transacted with 11 other users
Venmo user Era-Kryzhanovskaya transacted with 36 other users
Venmo user Glen-Olsen transacted with 36 other users
Venmo user Anshum-Sood transacted with 36 other users
Venmo user Corey-Loman transacted with 36 other users
Venmo user Dillon-Stuart transacted with 36 other users
Venmo user mattbach1 transacted with 35 other users
Venmo user eortiz22 transacted with 35 other users
Venmo user Liam-Flynn-1 transacted with 35 other users
Venmo user Patrick-Sheehan transacted with 35 other users
Venmo user NaptimeZZ transacted with 35 other users
Venmo user roberthu transacted with 35 other users
Venmo user Michelle_Nie transacted with 35 other users
Venmo user Markus-Fjortoft transacted with 17 other users
Venmo user Annette-vanSwaay transacted with 35 other users
Venmo user Alan-Delaney transacted with 35 other users
Venmo user Ben-Gutman transacted with 35 other users
Venmo user Boulet transacted with 35 other users
Venmo user Eliza-Kagan-1 transacted with 35 other users
Venmo user philliptjipto transacted with 35 other users
Venmo user FDelgado transacted with 35 other users
Venmo user Adam-Gross-12 transacted with 7 other users
Venmo user Barret-Bender transacted with 34 other users
Venmo user Jesse-Mander transacted with 34 other users
Venmo user Zachary-Spiera transacted with 1 other users
Venmo user jdelpeon transacted with 34 other users
Venmo user Ryanluck3 transacted with 34 other users
Venmo user Ryan-Edmonson transacted with 34 other users
Venmo user Nicole-Hammons transacted with 34 other users
Venmo user Sheila-Deng transacted with 34 other users
Venmo user Scott-Ou transacted with 34 other users
Venmo user Aarian-Rahman transacted with 34 other users
Venmo user Mikael-Harseno transacted with 11 other users
Venmo user Samuel-Gooch transacted with 34 other users
Venmo user robbiek transacted with 34 other users
Venmo user mmaseda transacted with 3 other users
Venmo user emurray transacted with 34 other users
Venmo user Christopher-Teng transacted with 34 other users
Venmo user MattRudin transacted with 34 other users
Venmo user Brian-Pester transacted with 34 other users
Venmo user ConnorBrownell transacted with 34 other users
Venmo user Aron-Wegner transacted with 34 other users
Venmo user Tyler-Douglas-4 transacted with 12 other users
Venmo user kristopher-kwandy transacted with 33 other users
Venmo user Scott-Wurtzel transacted with 33 other users
Venmo user Scott-Greenberg-1 transacted with 33 other users
Venmo user bamsutler transacted with 33 other users
Venmo user Jordan-Elkins transacted with 3 other users
Venmo user Pratyusha-Gupta transacted with 33 other users
Venmo user HaniDaou transacted with 33 other users
Venmo user Philmon-Tanuri transacted with 22 other users
Venmo user Nathan-Fleetwood transacted with 33 other users
Venmo user Reika-Yoshino transacted with 33 other users
Venmo user Katrina-Simon transacted with 33 other users
Venmo user Stefan-Widjaja transacted with 33 other users
Venmo user Sunny-Shah-1 transacted with 33 other users
Venmo user Jordan-Pasetsky transacted with 33 other users
Venmo user Jessica-Phillips-12 transacted with 33 other users
Venmo user Jeremiah-P transacted with 4 other users
Venmo user Michael-Bryan-7 transacted with 8 other users
Venmo user Jake-Pozin transacted with 33 other users
Venmo user dizbick transacted with 33 other users
Venmo user DylanJohn-Gonzales transacted with 33 other users
Venmo user Laura-Nugent transacted with 33 other users
Venmo user venkyj transacted with 32 other users
Venmo user kyle1196 transacted with 32 other users
Venmo user howiebear transacted with 2 other users
Venmo user Chris-Kalamchi transacted with 32 other users
Venmo user Nikhila-Krishnan transacted with 32 other users
Venmo user Torin-DiSalvo transacted with 1 other users
Venmo user Spencer-Hagler transacted with 32 other users
Venmo user Sindhu-Pagadala transacted with 32 other users
Venmo user Troy-Kirwin transacted with 4 other users
Venmo user Nick-Shulder transacted with 32 other users
Venmo user jaser_rollins transacted with 32 other users
Venmo user Phil-Park-3 transacted with 32 other users
Venmo user seanchen transacted with 32 other users
Venmo user Kai-Xiao transacted with 32 other users
Venmo user Justin-Goldman-1 transacted with 32 other users
Venmo user Andy-Gifford transacted with 32 other users
Venmo user Mike-Profeta transacted with 32 other users
Venmo user Austin-Curtis-5 transacted with 32 other users
Venmo user Reed-Silverman transacted with 11 other users
Venmo user Griffin-Amdur transacted with 14 other users
Venmo user Genna-Mazzarella transacted with 32 other users
Venmo user Artie-Trotter transacted with 32 other users
Venmo user Jojochandra transacted with 32 other users
Venmo user Colin-Roth-1 transacted with 32 other users
Venmo user Austin-Hong-3 transacted with 1 other users
Venmo user Matt-Galetta transacted with 32 other users
Venmo user henbeazy transacted with 32 other users
Venmo user Ivan-Darmawan transacted with 32 other users
Venmo user Ari-Bleemer transacted with 32 other users
Venmo user Johnathan-Zeng transacted with 32 other users
Venmo user olin-seniors transacted with 31 other users
Venmo user Ross-Corday transacted with 31 other users
Venmo user PeterPacent transacted with 9 other users
Venmo user Paul-Pettas transacted with 31 other users
Venmo user Miles-Grofaorean transacted with 31 other users
Venmo user Andrew-Dickson1 transacted with 6 other users
Venmo user Ayo-Fagbemi transacted with 2 other users
Venmo user GarrettGriffithQuinn transacted with 31 other users
Venmo user Zachary-Miles-2 transacted with 31 other users
Venmo user Jason-Gottlieb transacted with 31 other users
Venmo user erabera transacted with 31 other users
Venmo user andresfarfan transacted with 31 other users
Venmo user R-Mehta transacted with 7 other users
Venmo user Ally-Garcia transacted with 31 other users
Venmo user Rathnam-Venkat transacted with 31 other users
Venmo user wtfmishison transacted with 31 other users
Venmo user Andrew-Starker transacted with 31 other users
Venmo user RaymondRobinson transacted with 31 other users
Venmo user Alex-Niebergall transacted with 31 other users
Venmo user Jacob-Schreiber-1 transacted with 31 other users
Venmo user lili transacted with 31 other users
Venmo user Ziyad-Knio transacted with 31 other users
Venmo user nictanjr transacted with 30 other users
Venmo user mkim7952 transacted with 30 other users
Venmo user giselaludiarto transacted with 30 other users
Venmo user cindylimsays transacted with 30 other users
Venmo user coletudor transacted with 30 other users
Venmo user Zach-Masserant transacted with 30 other users
Venmo user Winston-Budiman transacted with 30 other users
Venmo user Tina-Tian-1 transacted with 30 other users
Venmo user Sam-Bronstein transacted with 30 other users
Venmo user SEDS-UCSD transacted with 30 other users
Venmo user samfoggan transacted with 30 other users
Venmo user BeauHart235 transacted with 30 other users
Venmo user Zach-Groffsky transacted with 30 other users
Venmo user NoahSnyder transacted with 30 other users
Venmo user Perry-Goffner transacted with 30 other users
Venmo user Jave-Soetoyo transacted with 30 other users
Venmo user VincentCriscuolo transacted with 30 other users
Venmo user Simon-Amat transacted with 30 other users
Venmo user MattChalfant transacted with 6 other users
Venmo user Nick-Demko transacted with 2 other users
Venmo user Mike-Henni transacted with 30 other users
Venmo user NickDemkiw transacted with 19 other users
Venmo user Maxime-Cancre transacted with 30 other users
Venmo user MaidMyDay-CE transacted with 30 other users
Venmo user Leland-Chamlin transacted with 30 other users
Venmo user carinawijaya transacted with 30 other users
Venmo user Kaleb-Germinaro-1 transacted with 30 other users
Venmo user Cheryl-Liu transacted with 30 other users
Venmo user c_linarte transacted with 3 other users
Venmo user Alex-Hasslinger transacted with 30 other users
Venmo user Rohan-Mehrotra transacted with 2 other users
Venmo user Brett-Tracy transacted with 17 other users
Venmo user HaeJin_Park transacted with 30 other users
Venmo user Christian-Hansen transacted with 30 other users
Venmo user Grant-Cohen-1 transacted with 30 other users
Venmo user Anish-Kalra transacted with 9 other users
Venmo user Brendaveline transacted with 2 other users
Venmo user Orianna-Torres transacted with 30 other users
Venmo user Ben-Gargano transacted with 30 other users
Venmo user Alice-Ma-4 transacted with 30 other users
Venmo user Chinae-Gonzales transacted with 30 other users
Venmo user Peter-Nesbitt transacted with 30 other users
Venmo user Paddy-Nopany transacted with 30 other users
Venmo user Frances-Luong transacted with 30 other users
Venmo user Shannon-Turbidy transacted with 30 other users

Find Venmo users paired with Instagram users in MongoDB


In [5]:
user_matches = [result for result in VENMO_INSTAGRAM_MATCHES.find()]
print 'Total Venmo-Instagram user matches: %d' % len(user_matches)


Total Venmo-Instagram user matches: 408

Load the user's Venmo transactions from MongoDB

Includes transactions where the user is either the "actor" (payer) or the "target" (payee)


In [6]:
def venmo_user_trans(user_id):
    pipeline = [
        {"$unwind": "$transactions"},
        {"$match": {"$or": [
            {"actor.id": user_id},
            {"transactions.target.id": user_id}
        ]}},
        {"$sort": {"created_time": 1}}
    ]

    return [r for r in TRANS_COLLECTION.aggregate(pipeline)]

In [7]:
def parse_venmo_datetime(datetime_str):
    return datetime.strptime(datetime_str, VENMO_DATE_FORMAT_STR)

In [8]:
def get_venmo_trans_datetimes(transactions):
    return [parse_venmo_datetime( t.get('created_time') ) for t in transactions]

def get_instagram_datetimes(media):
    return [m.created_time for m in media]

Group Venmo and Instagram posts over time


In [9]:
def group_by_date(datetimes, min_date=None):
    results = {}
    for dt in datetimes:
        if min_date is None or dt.date() >= min_date:
            results[dt.date()] = results.setdefault(dt.date(), 0) + 1
    return results

def normalize_date_data(data_dict, all_dates):
    normalized = {}
    for date in all_dates:
        normalized[date] = data_dict.setdefault(date, 0)
    return normalized

Prepare the data for the plots


In [10]:
def get_instagram_api_data(instagram_user):
    instagram_id = instagram_user.get('id')
    media = get_all_paginated_data(API_CYCLER.api, 'user_recent_media', user_id=instagram_id, count=100)
    print '%d Media fetched for Instagram user %s (%s)' % (len(media), instagram_user.get('username'), instagram_user.get('id'))
    return media


def get_venmo_api_data(venmo_user):
    venmo_id = venmo_user.get('id')
    venmo_trans = venmo_user_trans(venmo_id)    
    print '%d Transactions fetched for Venmo user %s (%s)' % (len(venmo_trans), venmo_user.get('username'), venmo_user.get('id'))
    return venmo_trans


def get_api_data(venmo_user, instagram_user):
    media = get_instagram_api_data(instagram_user)
    venmo_trans = get_venmo_api_data(venmo_user)
    return venmo_trans, media


def normalize_for_plot(trans, media):
    venmo_trans_datetimes = get_venmo_trans_datetimes(trans)
    instagram_datetimes = get_instagram_datetimes(media)
    
    # Group media and transactions activity across individual days
    venmo_date_data = group_by_date(venmo_trans_datetimes, AFTER_CUTOFF_DATE)
    instagram_date_data = group_by_date(instagram_datetimes, AFTER_CUTOFF_DATE)
    
    full_date_set = set(instagram_date_data.keys()).union(venmo_date_data.keys())
    venmo_date_data_norm = normalize_date_data(venmo_date_data, full_date_set)
    instagram_date_data_norm = normalize_date_data(instagram_date_data, full_date_set)
    
    x = venmo_date_data_norm.keys()
    venmo_y = [venmo_date_data_norm[date] for date in venmo_date_data_norm.keys()]
    instagram_y = [instagram_date_data_norm[date] for date in instagram_date_data_norm.keys()]
    
    return x, venmo_y, instagram_y

Plot one user's Venmo and Instagram activity


In [11]:
width = 0.35
days = mdates.DayLocator()
weeks = mdates.WeekdayLocator()
date_fmt = mdates.DateFormatter('%d %b %Y')

def plot_user_data(x, venmo_y, instagram_y, venmo_user, instagram_user, fig_num):
    figure, ax = plt.subplots(figsize=(18, 4), num=fig_num)
    p1 = ax.bar(mdates.date2num(x), instagram_y, color='#ED913D', width=width, linewidth=0)
    p2 = ax.bar(mdates.date2num(x), venmo_y, color='#78b653', width=width, linewidth=0, bottom=instagram_y)
    
    # Formatting
    ax.xaxis.set_major_locator(weeks)
    ax.xaxis.set_minor_locator(days)
    ax.xaxis.set_major_formatter(date_fmt)
    ax.legend( (p1[0], p2[0]), ('Instagram', 'Venmo') )
    ax.set_ylabel('Activity')
    ax.set_xlabel('Dates')

    title_str = 'Venmo user %s (%s) | Instagram user %s (%s)' % (
        venmo_user.get('username'), venmo_user.get('id'),
        instagram_user.get('username'), instagram_user.get('id'),
    )
    ax.set_title(title_str)
    filename = '%s.png' % venmo_user.get('username')
    plt.savefig(os.path.join(GRAPHS_PATH, filename))
    plt.show()

Match Venmo "messages" and Instagram "captions"

Text Matching helper functions


In [12]:
def instagram_caption_words(instagram_media):
    raw_captions = [getattr(post.caption, 'text') for post in instagram_media if hasattr(post.caption, 'text')]
    return get_word_dictionary(raw_captions)


def venmo_message_words(venmo_messages):
    venmo_messages_raw = [tran.get('message') for tran in sample_trans]
    return get_word_dictionary(venmo_messages_raw)


def filter_ascii_punctuation(word):
    return ''.join([c for c in word if c not in string.punctuation])


def get_words(document):
    return [
        filter_ascii_punctuation(word) for 
        word in document.lower().replace('#', '').split() if word not in STOPLIST
   ]

def get_word_dictionary(documents):
    return [get_words(document) for document in documents]

Text matching with tf–idf


In [13]:
tfidf_threshold = 0.75
sim_match_index = 0
sim_match_words = 1

def text_matches(venmo_trans, instagram_media):
    instagram_words = instagram_caption_words(instagram_media)
#     venmo_words = venmo_message_words(trans)
#     v_word_dict = corpora.Dictionary(venmo_words)
    try:
        i_word_dict, tfidf_model, tfidf_index = build_tfidf_model(instagram_words) 
    except ValueError as e:
        yield None, None
#     print i_word_dict

#     doc = 'test sushi with pals'
    for i, msg in enumerate([tran.get('message') for tran in venmo_trans]):
#         vec_bow = i_word_dict.doc2bow(msg.lower().split())
        vec_bow = i_word_dict.doc2bow(get_words(msg))
        vec_tfidf = tfidf_model[vec_bow]
        sims = tfidf_index[vec_tfidf]
        tfidf_sims_list = [sim for sim in list(enumerate(sims)) if sim[sim_match_words] > tfidf_threshold]
#         report_results(tfidf_sims_list, instagram_words)
        if tfidf_sims_list:
#             print 'FOUND MATCHES FOR VENMO MSG %s' % msg
            yield venmo_trans[i], [instagram_media[match[sim_match_index]] for match in tfidf_sims_list]
        else:
            yield None, None


def build_tfidf_model(instagram_words):
    i_word_dict = corpora.Dictionary(instagram_words)
    i_corpus = [i_word_dict.doc2bow(word) for word in instagram_words]
    tfidf_model = models.TfidfModel(i_corpus)
    corpus_tfidf = tfidf_model[i_corpus]
    tfidf_index = similarities.MatrixSimilarity(tfidf_model[corpus_tfidf])
    return i_word_dict, tfidf_model, tfidf_index


def report_results(sims, media_captions):
    for i, sim in enumerate(sims):
        if sim[1] > 0:
            print '%s -- %s' % (sim, media_captions[i])

Match Venmo and Instagram updates around a date range


In [14]:
diff_after = timedelta(hours=-HOURS_RADIUS)
diff_before = timedelta(hours=HOURS_RADIUS)

def media_near_transaction(tran, media):
    tran_datetime = parse_venmo_datetime( tran.get('created_time') )
    after_datetime = tran_datetime + diff_after
    before_datetime = tran_datetime + diff_before
    return [
        m for m in media if 
        m.created_time > after_datetime and
        m.created_time < before_datetime
    ]

In [15]:
# # print media_captions_raw[0]
# # print venmo_messages_raw[0]

# def report_levenshtein_dist(i_captions, v_messages):
#     for ic in i_captions:
#         ld = sorted([(vm, edit_distance(vm, ic)) for vm in v_messages], key=lambda r: r[1], reverse=True)
#         print ic
#         for i, r in enumerate(ld[0:5]):
#             print '\t%d: %s' % (i+1, r)

# # report_levenshtein_dist(media_captions_raw, venmo_messages_raw)
# caption_words =  [[w for w in c.split()] for c in media_captions_raw]
# venmo_words = [[w for w in c.split()] for c in venmo_messages_raw]

# def wn_similarity(word1, word2):
#     return [(s1, s2, wn.path_similarity(s1, s2)) for s1 in wn.synsets(word1) for s2 in wn.synsets(word2)]

# # for doc in caption_words:
# #     for word in doc:
# #         for tran in venmo_words:
# #             for

In [16]:
# # test = u'\U0001f60d\U0001f61c\U0001f632'
# # print unicodedata.name(u'\U0001f632')

# edit_distance("dinner", "dinners")
# # print wn.synsets('fish')[0]
# # [s.hyponyms() for s in wn.synsets('fish')]

# [(s1.hyponyms(), s2.hyponyms(), wn.path_similarity(s1, s2)) for s1 in wn.synsets('lunch') for s2 in wn.synsets('dinner')]

# # wn.path_similarity(wn.synsets('fish'), wn.synsets('sushi'))

# # [exp1 for x in xSet for y in ySet] 

# # is equal to

# # result=[] 
# # for x in xSet:
# #   for y in ySet: 
# #     result.append(exp1)

In [17]:
# target_username = 'fwedeorange'
# sample = [m for m in user_matches if m.get('venmo').get('username') == target_username]
# sample_instagram = sample[-1].get('instagram')
# sample_venmo = sample[-1].get('venmo')
# sample_trans, sample_media = get_api_data(sample_venmo, sample_instagram)

In [18]:
# media1, media2, media3, media4 = sample_media[0], sample_media[1], sample_media[3], sample_media[0]
# m1 = [media1, media2]
# m3 = [media3, media4, media1]

# print set(m1).intersection(m3)

# print getattr(media1, 'id')
# # print sample_media[1] == sample_media[2]

Instagram updates are matched to Venmo transactions when the text matching and date range results return at least 1 common Instagram update


In [19]:
def venmo_instagram_matches(venmo_trans, instagram_media):
    for venmo_tran, instagram_caption_matches in text_matches(venmo_trans, instagram_media):
        if venmo_tran:
#             print venmo_tran.get('message'), [getattr(post.caption, 'text') for post in instagram_caption_matches]
            instagram_nearby_date = media_near_transaction(venmo_tran, instagram_media)
            vi_match = set(instagram_nearby_date).intersection(instagram_caption_matches)
            if vi_match:
                yield venmo_tran, list(vi_match)
            else:
                continue
#             yield venmo_tran, list(slam_dunkins) if slam_dunkins else None

Main loop


In [20]:
update_matches = []
errors = []
for i, user_pair in enumerate(user_matches):
    instagram_user = user_pair.get('instagram')
    venmo_user = user_pair.get('venmo')
    try:
        venmo_trans, instagram_media = get_api_data(venmo_user, instagram_user)
        print
        print 'Checking for matching Venmo and Instragram updates for user %s/%s' % (venmo_user.get('username'), instagram_user.get('username'))
        for va, ia in venmo_instagram_matches(venmo_trans, instagram_media):
            print update_matches.append((venmo, instagram))
        if update_matches:
            print 'FOUND MATCHING UPDATES %s' % update_matches
        else:
            pass
    except InstagramAPIError as e:
        if e.status_code == 400:
            error_str = "ERROR: Instagram user %s -- %s is set to private." % (instagram_user.get('username'), instagram_user.get('id'))
            errors.append(error_str)
        continue
#     print venmo_trans
#     print instagram_media
#     x, venmo_y, instagram_y = normalize_for_plot(venmo_trans, instagram_media)
#     plot_user_data(x, venmo_y, instagram_y, venmo_user, instagram_user, i)
#     match_pair_updates()


78 Media fetched for Instagram user ibotta (1235351878)
3577 Transactions fetched for Venmo user ibotta (820531)
WARNING:gensim.similarities.docsim:scanning corpus to determine the number of features (consider setting `num_features` explicitly)

Checking for matching Venmo and Instragram updates for user ibotta/ibotta
898 Media fetched for Instagram user alliekranick (10676783)
34 Transactions fetched for Venmo user alliekranick (1059179)
WARNING:gensim.similarities.docsim:scanning corpus to determine the number of features (consider setting `num_features` explicitly)

Checking for matching Venmo and Instragram updates for user alliekranick/alliekranick
281 Media fetched for Instagram user fwedeorange (22924935)
275 Transactions fetched for Venmo user fwedeorange (1435640)
WARNING:gensim.similarities.docsim:scanning corpus to determine the number of features (consider setting `num_features` explicitly)

Checking for matching Venmo and Instragram updates for user fwedeorange/fwedeorange
0 Media fetched for Instagram user adedinata706 (626233630)
43 Transactions fetched for Venmo user dre706 (1551758)
WARNING:gensim.similarities.docsim:scanning corpus to determine the number of features (consider setting `num_features` explicitly)

Checking for matching Venmo and Instragram updates for user dre706/adedinata706
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-20-8c53d162e0ef> in <module>()
      8         print
      9         print 'Checking for matching Venmo and Instragram updates for user %s/%s' % (venmo_user.get('username'), instagram_user.get('username'))
---> 10         for va, ia in venmo_instagram_matches(venmo_trans, instagram_media):
     11             print update_matches.append((venmo, instagram))
     12         if update_matches:

<ipython-input-19-2115cfd6aecb> in venmo_instagram_matches(venmo_trans, instagram_media)
      1 def venmo_instagram_matches(venmo_trans, instagram_media):
----> 2     for venmo_tran, instagram_caption_matches in text_matches(venmo_trans, instagram_media):
      3         if venmo_tran:
      4 #             print venmo_tran.get('message'), [getattr(post.caption, 'text') for post in instagram_caption_matches]
      5             instagram_nearby_date = media_near_transaction(venmo_tran, instagram_media)

<ipython-input-13-9a3a08d22654> in text_matches(venmo_trans, instagram_media)
     16     for i, msg in enumerate([tran.get('message') for tran in venmo_trans]):
     17 #         vec_bow = i_word_dict.doc2bow(msg.lower().split())
---> 18         vec_bow = i_word_dict.doc2bow(get_words(msg))
     19         vec_tfidf = tfidf_model[vec_bow]
     20         sims = tfidf_index[vec_tfidf]

UnboundLocalError: local variable 'i_word_dict' referenced before assignment

In [ ]: