Constructing Ego Networks from Retweets

(using pre-saved files instead of Twitter authentication)

Yotam Shmargad
University of Arizona
Email: yotam@email.arizona.edu
Web: www.yotamshmargad.com

Introduction


Twitter has become a prominent online social network, playing a major role in how people all over the world share and consume information. Moreover, while some social networks have made it difficult for researchers to extract data from their servers, Twitter remains relatively open for now. This tutorial walks through how to construct a Twitter user’s ego network from the retweets that their tweets have received. Rather than focusing on who follows whom on Twitter, the method treats an edge as existing between two users if one has recently retweeted the other.

Conceptualizing edges as retweets has two primary benefits. First, it captures recent interactions between users rather than decisions that they may have made long ago (i.e. following each other) that may not translate into meaningful interaction today. Second, users often have many more followers than retweeters, so a network built from retweets is far smaller than one built from follows. The proposed method can thus be used to analyze even relatively popular users.
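
To make the edge definition concrete, here is a minimal sketch on made-up data. The user names and the `retweet_pairs` list are hypothetical, each pair is assumed to be (tweeter, retweeter), and the filter reflects the usual definition of an ego network (ego, ego's alters, and ties among those alters) rather than anything specific to the saved files used below.

```python
from collections import Counter

# Hypothetical (tweeter, retweeter) pairs: one pair per retweeted tweet
retweet_pairs = [
    ("ego", "alice"),
    ("ego", "bob"),
    ("ego", "alice"),   # alice retweeted a second ego tweet
    ("alice", "bob"),   # bob also retweeted one of alice's tweets
    ("alice", "carol"), # carol never retweeted ego, so she is not an alter
]

# Edge weight = number of the first user's tweets that the second user retweeted
edge_weight = Counter(retweet_pairs)

# Ego's alters are the users who retweeted ego
ego_alters = {retweeter for tweeter, retweeter in retweet_pairs if tweeter == "ego"}

# Keep ego's own edges plus edges that run between two alters
ego_network = {
    (tweeter, retweeter): count
    for (tweeter, retweeter), count in edge_weight.items()
    if tweeter == "ego" or (tweeter in ego_alters and retweeter in ego_alters)
}
print(ego_network)
```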

1. Importing libraries


In [ ]:
# Import the libraries we need
import os
import json
import time
import networkx
import matplotlib.pyplot as plt
from collections import Counter

In [ ]:
# Check working directory
os.getcwd()

In [ ]:
# Set working directory
os.chdir('FOLDER FOR SAVING FILES')

In [ ]:
# Check working directory
os.getcwd()

2. Pulling ego tweets


In [ ]:
# Read saved ego tweets
with open('egotweet.json', 'r') as file:
    ego = json.load(file)

In [ ]:
# Looking at a json object
ego[0]

In [ ]:
# Accessing an element of ego tweets
ego[0]["id_str"]

In [ ]:
# Storing the id of one of ego's tweets
egoid = ego[0]["id_str"]

In [ ]:
# Storing and printing ego tweet ids and retweet counts
tweetids = []
retweets = []

# Looping over an empty list does nothing, so no length check is needed
for egotweet in ego:
    tweetids.append(egotweet["id_str"])
    retweets.append(egotweet["retweet_count"])
    print(egotweet["id_str"], egotweet["retweet_count"])
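
The cells above use only a few fields of each tweet object. The sketch below shows that shape on made-up values; the ids and counts are hypothetical, and real tweet objects carry many more fields than the three shown here.

```python
import json

# Made-up stand-in for the contents of egotweet.json: a list of tweet objects
sample = json.loads("""
[
  {"id_str": "1001", "retweet_count": 3, "user": {"id_str": "42"}},
  {"id_str": "1002", "retweet_count": 0, "user": {"id_str": "42"}}
]
""")

# The same extraction as above, written as list comprehensions
sample_ids = [tweet["id_str"] for tweet in sample]
sample_counts = [tweet["retweet_count"] for tweet in sample]

# Tweets that were never retweeted cannot contribute any edges
retweeted_ids = [tweet["id_str"] for tweet in sample if tweet["retweet_count"] > 0]
print(sample_ids, sample_counts, retweeted_ids)
```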

3. Pulling retweeters


In [ ]:
# Sleep for 10 seconds
time.sleep(10)

In [ ]:
# Reading saved ego retweeters
with open('check.json', 'r') as file:
    check = json.load(file)

with open('self.json', 'r') as file:
    self = json.load(file)
    
with open('allretweeters.json', 'r') as file:
    allretweeters = json.load(file)

In [ ]:
# Printing tweet ids, retweet counts, 
# retweeters obtained, and whether a self tweet is included
for a, b, c, d in zip(tweetids,retweets,check,self):
    print(a, b, c, d)
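
Note that `zip` stops at the shortest of its inputs, so a saved file that is out of step with `tweetids` would be truncated silently rather than raise an error. One way to guard against that is sketched below; `check_aligned` is a hypothetical helper, and the `demo_` lists are made-up stand-ins for the real ones.

```python
def check_aligned(**named_lists):
    """Return each list's length, raising ValueError if the lengths differ."""
    lengths = {name: len(values) for name, values in named_lists.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Lists are not aligned: {lengths}")
    return lengths

# Made-up stand-ins for tweetids, retweets, check, and self
demo_ids = ["1001", "1002"]
demo_counts = [3, 0]
demo_check = [3, 0]
demo_self = [0, 0]

print(check_aligned(ids=demo_ids, counts=demo_counts,
                    check=demo_check, self_flags=demo_self))
```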

In [ ]:
len(allretweeters)

In [ ]:
allretweeters

4. Visualizing the network of retweeters


In [ ]:
# Assigning edge weight to be number of tweets retweeted
weight = Counter()
for (i, j) in allretweeters:
    weight[(i, j)] +=1

In [ ]:
weight

In [ ]:
# Defining weighted edges
weighted_edges = list(weight.items())

In [ ]:
weighted_edges
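
Pairs loaded from JSON arrive as lists, which cannot serve as dictionary keys; the loop above gets around this by unpacking each pair and rebuilding it as the tuple `(i, j)`. The sketch below shows the same counting step on made-up pairs, assuming each pair is ordered as [tweeter, retweeter].

```python
import json
from collections import Counter

# Made-up stand-in for allretweeters.json: [tweeter, retweeter] pairs
pairs = json.loads('[["ego", "alice"], ["ego", "bob"], ["ego", "alice"]]')

# Lists are unhashable, so each pair becomes a tuple before it is counted
demo_weight = Counter(tuple(pair) for pair in pairs)

# Each item has the shape ((tweeter, retweeter), weight)
demo_edges = list(demo_weight.items())
print(demo_edges)
```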

In [ ]:
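# Defining the network object, storing each weight as an edge attribute
G = networkx.Graph()
G.add_weighted_edges_from((i, j, w) for (i, j), w in weighted_edges)

In [ ]:
# Visualizing the network, reading edge widths in the order G stores its edges
networkx.draw(G, width=[w for _, _, w in G.edges(data='weight')])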

5. Pulling retweeter tweets


In [ ]:
# Defining the list of unique retweeters (the second user in each edge)
unique = [x[0][1] for x in weighted_edges]

In [ ]:
len(unique)

In [ ]:
unique
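
The list above contains no repeats only as long as every edge starts from the same user. If that may not hold, a sketch like the following drops repeats while keeping first-seen order; the edges, including the second tweeter `ego2`, are made up for illustration.

```python
# Made-up weighted edges in the same ((tweeter, retweeter), weight) shape
demo_edges = [
    (("ego", "alice"), 2),
    (("ego", "bob"), 1),
    (("ego2", "alice"), 1),  # alice appears a second time as a retweeter
]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
demo_unique = list(dict.fromkeys(edge[1] for edge, _ in demo_edges))
print(demo_unique)
```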

In [ ]:
# Reading saved retweeter tweets
with open('alters.json', 'r') as file:
    alters = json.load(file)

In [ ]:
len(alters)

In [ ]:
# Printing the number of tweets pulled for each retweeter
for alt in alters:
    print(len(alt))

In [ ]:
# Storing and printing alter ids, tweet ids, and retweet counts
altids = []
alttweetids = []
altretweets = []

for alt in alters:
    for alttweet in alt:
        altids.append(alttweet["user"]["id_str"])
        alttweetids.append(alttweet["id_str"])
        altretweets.append(alttweet["retweet_count"])
        print(alttweet["user"]["id_str"],alttweet["id_str"],alttweet["retweet_count"])

6. Pulling retweeters of retweeters


In [ ]:
# Reading saved alter retweeters
with open('altcheck.json', 'r') as file:
    altcheck = json.load(file)

with open('altself.json', 'r') as file:
    altself = json.load(file)
    
with open('altretweeters.json', 'r') as file:
    altretweeters = json.load(file)

with open('allalt.json', 'r') as file:
    allalt = json.load(file)

In [ ]:
# Printing alter user ids, tweet ids, retweet counts, 
# retweeters obtained, and whether a self tweet is included
for a, b, c, d, e in zip(altids,alttweetids,altretweets,altcheck,altself):
    print(a, b, c, d, e)

In [ ]:
len(allalt)

In [ ]:
allalt

7. Visualizing the full network of retweeters


In [ ]:
# Assigning edge weight to be number of alter tweets retweeted
weight = Counter()
for (i, j) in allalt:
    weight[(i, j)] += 1

In [ ]:
weight

In [ ]:
# Combining ego's weighted edges with the alters' weighted edges
all_edges = weighted_edges + list(weight.items())

In [ ]:
all_edges
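
Because `networkx.Graph` is undirected, an edge recorded as (i, j) and one recorded as (j, i) collapse into a single edge, so `all_edges` can hold more entries than the graph has edges. The sketch below shows one way to merge the two directions before plotting, using made-up counts. Summing the two directions is an assumption; taking the larger of the two would be another reasonable choice.

```python
from collections import Counter

# Made-up directed counts in the ((tweeter, retweeter), weight) shape
demo_all_edges = [
    (("ego", "alice"), 2),
    (("alice", "bob"), 1),
    (("alice", "ego"), 1),  # ego retweeted alice: same undirected edge as the first
]

# A frozenset ignores direction, so both orientations add to one weight
undirected = Counter()
for (i, j), w in demo_all_edges:
    undirected[frozenset((i, j))] += w

print(len(demo_all_edges), len(undirected))
```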

In [ ]:
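# Defining the full network object; counts for (i, j) and (j, i) are
# summed because they describe the same undirected edge
G = networkx.Graph()
for (i, j), w in all_edges:
    previous = G.get_edge_data(i, j, default={'weight': 0})['weight']
    G.add_edge(i, j, weight=previous + w)

In [ ]:
# Visualizing the full network, reading edge widths in the order G stores its edges
networkx.draw(G, width=[w for _, _, w in G.edges(data='weight')])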