In this notebook I explore some of the results obtained by running run_lda.py!

There are 3 main things that I do:

  • I take the names of those Twitter handles for which LDA could not find any substantive topics. That is to say, these Twitter handles have been inactive for a while, have tweets that are extremely scattered, or simply do not have enough tweets for any analysis to be carried out. I place the names of these Twitter users in a pickled list called LDA_identified_bad_handles.pkl.
  • I take the names of these bad handles and remove them from the list containing all the handles of the 2nd-degree connections we started with. I do this because I will need a 'pure' list of handles when running our TF-IDF analysis: I will only run TF-IDF on those Twitter users that have valid LDA results. I place this list of valid handles in a pickle file called verified_handles_lda.pkl.
  • I delete the 'bad handle' dictionaries from our list of dictionaries and pickle the resulting list into a file called final_database_lda_verified.pkl.

In [1]:
%run helper_functions.py
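
A note on the helpers: pickle_object and unpickle_object are defined in helper_functions.py rather than in this notebook. A minimal sketch of what they presumably look like, assuming pickle_object appends the .pkl extension itself (an assumption based on the filenames used below):

In [ ]:
import pickle

def pickle_object(obj, name):
    # assumed helper: serialize obj to <name>.pkl
    with open("{}.pkl".format(name), "wb") as f:
        pickle.dump(obj, f)

def unpickle_object(filename):
    # assumed helper: load a previously pickled object
    with open(filename, "rb") as f:
        return pickle.load(f)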

In [2]:
lst = unpickle_object("2nd_degree_connections_LDA_complete.pkl")

In [20]:
lst[8] #an example of a bad handle dictionary


Out[20]:
{'meereve': {'content': ['great blog thx minor detail model shld nby theta',
   'tensorflow nice doc candidate sampler',
   'cool good pointer choose nce play non word corpus good',
   'port excellent lda2vec',
   'neural aartist',
   'finally public molecular autoencoder let interpolate gradient base optimization compound',
   'gif get',
   'introduce variational autoencoder prose code tensorflow python',
   'mris close see synesthesia experience machine',
   'tale variational autoencoder overzealous globb',
   'blog go hood variational autoencoder',
   'vaes infuse math tensorflow',
   'introduce variational autoencoder prose code',
   'introduce variational autoencoder prose code blog',
   'note cite post come soon thx',
   'aha think line miss note paper thumbs easy fix nice result',
   'humbly offer machine learn gif',
   'build app visualize transcript',
   'find set',
   'stripper',
   'cell lar device',
   'cell lar device',
   'feign surprise',
   'trippy hil',
   'monterrey jelly',
   'go monterrey bay aquarium jelly',
   'feel great commit lowre branch',
   'sher minn like',
   'plz patient wait retweet',
   'set justwonder',
   'want know set write file obvs',
   'write file loser',
   'stump',
   'memoriez',
   'remember day',
   'canoe abstract',
   'set ception',
   'look mirror',
   'trippy',
   'mean tho amirite',
   'metaaaa',
   'camera ready',
   'know base',
   'interrobang',
   'save tmp file justwonder',
   'error',
   'look',
   'debug',
   'set',
   'hpy hllwn',
   'canoe',
   'existentialgarfield',
   'nyc',
   'glitchy',
   'daybreaker',
   'try size',
   'hey goin',
   'like cluster alogrithm kmean',
   'set',
   'brown cow',
   'skip header body datum',
   'length uri',
   'plz',
   'like thegetty',
   'murakami',
   'try',
   'think base64 encode',
   'look',
   'whataboutnow',
   'rainyday nyc',
   'try',
   'set swear',
   'laurapalmer',
   'bear coenbrother',
   'year',
   'like woodcut',
   'deadrabbit',
   'help',
   'book',
   'spooky',
   'meta inception',
   'civilwarhair',
   'catbookcat',
   'berry',
   'abstract',
   'like vuillard',
   'hey setbot think',
   'blackandgold',
   'puppy',
   'perspective',
   'usa usa',
   'nice light huh',
   'tricky',
   'think jmw turner',
   'come card game prefer set bridge',
   'wassup',
   'civilwarhair',
   'catbookcat',
   'berries',
   'abstract',
   'like vuillard?',
   'hey setbot, think',
   'though blackandgold',
   'puppies',
   "it's perspective",
   'usa usa',
   'nice light, huh',
   "here's tricky one",
   'think jmw turner',
   'comes card games, prefer set bridge',
   'wassup'],
  'favorite_count': [1,
   1,
   1,
   29,
   2,
   0,
   0,
   0,
   0,
   15,
   0,
   7,
   0,
   0,
   1,
   1,
   10,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   1,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0],
  'hashtags': [[],
   [],
   [],
   [],
   [],
   [],
   [],
   ['Tensorflow', 'python'],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   ['strippers'],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   ['justwondering'],
   [],
   [],
   [],
   [],
   [],
   [],
   ['canoe', 'abstract'],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   ['interrobang'],
   [],
   ['justwondering'],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   ['hpy', 'hllwn'],
   ['canoe'],
   ['existentialgarfield'],
   [],
   [],
   [],
   [],
   [],
   [],
   ['kmeans'],
   ['set'],
   [],
   [],
   [],
   [],
   [],
   [],
   [],
   ['thegetty'],
   ['murakami'],
   [],
   [],
   [],
   [],
   ['whataboutnow'],
   ['rainyday', 'nyc'],
   [],
   [],
   ['laurapalmer'],
   ['bears', 'coenbrothers'],
   [],
   [],
   ['deadrabbit'],
   [],
   [],
   ['books'],
   [],
   ['meta', 'inception'],
   ['civilwarhair'],
   ['catbookcat'],
   [],
   [],
   [],
   [],
   ['blackandgold'],
   ['puppies'],
   ['perspective'],
   [],
   [],
   [],
   [],
   [],
   []],
  'retweet_count': [0,
   0,
   0,
   7,
   0,
   229,
   1,
   20,
   1,
   15,
   6,
   1,
   39,
   8,
   0,
   0,
   1,
   8,
   1,
   1,
   0,
   0,
   1,
   1,
   0,
   0,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   0,
   1,
   1,
   1,
   1,
   1,
   1,
   0,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   0,
   1,
   1,
   0,
   0,
   1,
   1,
   0,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   0,
   1,
   1,
   1,
   1,
   1,
   2,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1],
  'tokenized_tweets': ['great blog thx minor detail model shld nby theta',
   'tensorflow nice doc candidate sampler',
   'cool good pointer choose nce play non word corpus good',
   'port excellent lda2vec',
   'neural aartist',
   'finally public molecular autoencoder let interpolate gradient base optimization compound',
   'gif get',
   'introduce variational autoencoder prose code tensorflow python',
   'mris close see synesthesia experience machine',
   'tale variational autoencoder overzealous globb',
   'blog go hood variational autoencoder',
   'vaes infuse math tensorflow',
   'introduce variational autoencoder prose code',
   'introduce variational autoencoder prose code blog',
   'note cite post come soon thx',
   'aha think line miss note paper thumbs easy fix nice result',
   'humbly offer machine learn gif',
   'build app visualize transcript',
   'find set',
   'stripper',
   'cell lar device',
   'cell lar device',
   'feign surprise',
   'trippy hil',
   'monterrey jelly',
   'go monterrey bay aquarium jelly',
   'feel great commit lowre branch',
   'sher minn like',
   'plz patient wait retweet',
   'set justwonder',
   'want know set write file obvs',
   'write file loser',
   'stump',
   'memoriez',
   'remember day',
   'canoe abstract',
   'set ception',
   'look mirror',
   'trippy',
   'mean tho amirite',
   'metaaaa',
   'camera ready',
   'know base',
   'interrobang',
   'save tmp file justwonder',
   'error',
   'look',
   'debug',
   'set',
   'hpy hllwn',
   'canoe',
   'existentialgarfield',
   'nyc',
   'glitchy',
   'daybreaker',
   'try size',
   'hey goin',
   'like cluster alogrithm kmean',
   'set',
   'brown cow',
   'skip header body datum',
   'length uri',
   'plz',
   'like thegetty',
   'murakami',
   'try',
   'think base64 encode',
   'look',
   'whataboutnow',
   'rainyday nyc',
   'try',
   'set swear',
   'laurapalmer',
   'bear coenbrother',
   'year',
   'like woodcut',
   'deadrabbit',
   'help',
   'book',
   'spooky',
   'meta inception',
   'civilwarhair',
   'catbookcat',
   'berry',
   'abstract',
   'like vuillard',
   'hey setbot think',
   'blackandgold',
   'puppy',
   'perspective',
   'usa usa',
   'nice light huh',
   'tricky',
   'think jmw turner',
   'come card game prefer set bridge',
   'wassup']}}

In [3]:
# each dictionary in lst has a single key: the twitter handle
handle_names = []
for dictionary in lst:
    name = list(dictionary.keys())
    handle_names.append(name)

In [4]:
handle_names = sum(handle_names, [])  # flatten the list of single-key lists
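
The sum(list_of_lists, []) idiom flattens one level of nesting: each element of handle_names is a single-key list, and repeated list addition concatenates them into one flat list of handle strings. A toy illustration using handles that appear later in this notebook:

In [ ]:
# toy example of the flattening step above
nested = [['emeader'], ['nickwooduk'], ['raulgarreta']]
flat = sum(nested, [])  # list + list, applied left to right
flat  # ['emeader', 'nickwooduk', 'raulgarreta']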

In [5]:
#an example of finding which users in my LDA results tweet about "machine" --> alluding to "machine learning"
cnt = -1

for handle in handle_names:
    cnt += 1
    try:
        topics = lst[cnt][handle]['LDA']

        if "machine" in topics:
            print(handle)
    except KeyError:  # handles with no LDA results are skipped
        pass


emeader
nickwooduk
raulgarreta
stevenkuyan

In [7]:
# handles to be removed as they do not have valid LDA analysis
handle_to_remove = []
cnt = -1

for handle in handle_names:
    cnt += 1
    sub_dict = lst[cnt][handle]

    if "LDA" not in sub_dict:
        handle_to_remove.append(handle)

indices = []

for handle in handle_to_remove:
    index = handle_names.index(handle)
    indices.append(index)
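
Since handle_names was built in the same order as lst, both lists could also be collected in a single pass with enumerate; this sketch is equivalent to the two loops above and avoids the .index() lookups (which would return only the first occurrence if a handle ever appeared twice):

In [ ]:
# equivalent single-pass version of the cell above
handle_to_remove, indices = [], []
for i, handle in enumerate(handle_names):
    if "LDA" not in lst[i][handle]:
        handle_to_remove.append(handle)
        indices.append(i)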

In [14]:
#extracting the handles with valid LDA results
bad_indices = frozenset(indices)  # build the set once; membership tests are O(1)
verified_handles_lda = [v for i, v in enumerate(handle_names) if i not in bad_indices]

In [34]:
handle_to_remove[:5] #a peek at the 'bad handles'


Out[34]:
['meereve', 'VoyageChicago', 'UKDanEdwards', 'SteveWAUGH1979', 'prithwic']

In [17]:
pickle_object(verified_handles_lda, "verified_handles_lda")

In [18]:
pickle_object(handle_to_remove, "LDA_identified_bad_handles")

In [30]:
#extracting the appropriate dictionaries to be used in TF-IDF analysis
final_database_lda_verified = [v for i, v in enumerate(lst) if i not in bad_indices]

In [33]:
pickle_object(final_database_lda_verified, "final_database_lda_verified")
