visualizing discussions on twitter with networkx

kiran garimella, aalto university

michael mathioudakis, aalto university

social media

users generate digital content

status updates, blog posts, pictures, videos, reviews, ...

users interact

comments, likes, ratings, re-posts

digital traces

we can observe human interactions at global scale


microblogging platform

users post short messages, 'tweets'

since 2006, 300m+ active users

tweets, retweets, replies

can we learn something from the structure of people's interactions?

we'll do that by visualizing graphs



what is a graph?

data structure

two types of elements: nodes and edges
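
as a minimal sketch of the idea, nodes and edges can be kept in plain python containers. this is only an illustration, not how networkx stores graphs internally; the names `nodes`, `edges` and `neighbors` are made up for the example.

```python
# a tiny undirected graph stored in plain python containers
nodes = {'a', 'b', 'c', 'd'}
edges = {('a', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'd')}

# adjacency view: for every node, the set of nodes it shares an edge with
neighbors = {node: set() for node in nodes}
for node_a, node_b in edges:
    neighbors[node_a].add(node_b)
    neighbors[node_b].add(node_a)

for node in sorted(neighbors):
    print(node, '->', sorted(neighbors[node]))
```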

what are graphs used for?

represent social connections between people ...

... or represent networks e.g., road networks, computer networks


graph vs network

graphs with networkx

python library

create, process, visualize graphs

development started in 2004

mainly developed in 2014

building a graph

let's build the earlier example

In [2]:
import networkx as nx

# initialize
graph = nx.Graph()

people = ['jere', 'ella', 'miika', 'anniina', 'mikko', 'olli', 'laura', 'maria']
connections = [('jere', 'ella'), ('ella', 'anniina'), ('ella', 'miika'),
               ('mikko', 'ella'), ('anniina', 'mikko'), ('laura', 'jere'),
               ('olli', 'jere'), ('jere', 'maria'), ('miika', 'mikko'),
               ('maria', 'laura'), ('olli', 'laura')]

# add all nodes
for node in people:
    graph.add_node(node)

# add all edges
for node_a, node_b in connections:
    graph.add_edge(node_a, node_b)
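
a quick way to check the result is to ask the graph for its size and for the neighbors of one person. the sketch below rebuilds the same friendship graph with `add_edges_from`, which adds any missing nodes along the way, so the separate node loop is optional here.

```python
import networkx as nx

connections = [('jere', 'ella'), ('ella', 'anniina'), ('ella', 'miika'),
               ('mikko', 'ella'), ('anniina', 'mikko'), ('laura', 'jere'),
               ('olli', 'jere'), ('jere', 'maria'), ('miika', 'mikko'),
               ('maria', 'laura'), ('olli', 'laura')]

graph = nx.Graph()
# adds every edge and, implicitly, every node that appears in an edge
graph.add_edges_from(connections)

print(graph.number_of_nodes())          # 8 people
print(graph.number_of_edges())          # 11 connections
print(sorted(graph.neighbors('ella')))  # the friends of ella
```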

visualizing a graph

In [3]:
def get_pyplot_ax(rows = 1, columns = 1, figsize = (7, 7)):
    """ helper function: create a figure and its axes """
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(rows, columns, figsize = figsize)
    # the per-axis styling is not shown on the slide;
    # hiding the axis frame and ticks is an assumption
    if rows == 1 and columns == 1:
        ax.set_axis_off()
    elif rows == 1 or columns == 1:
        for subax in ax:
            subax.set_axis_off()
    else:
        for i in range(rows):
            for j in range(columns):
                ax[i][j].set_axis_off()
    return fig, ax

In [4]:
fig, ax = get_pyplot_ax()

nx.draw_networkx(graph, ax = ax, node_size = 2300, node_color = 'white')

fig.savefig('img/example_friends.png', dpi = 300)

In [5]:
fig, ax = get_pyplot_ax()

nx.draw_networkx(graph, ax = ax, node_size = 2300, node_color = 'black', with_labels = False)

fig.savefig('img/generic_graph.png', dpi = 300)

add color to nodes

In [6]:
boy_color = 'green'
girl_color = 'orange'

color_by_gender = {
    'jere': boy_color, 'miika': boy_color, 'mikko': boy_color, 'olli': boy_color,
    'ella': girl_color, 'anniina': girl_color, 'laura': girl_color, 'maria': girl_color
}

# create a list with one color per node, in the graph's node order
# (graph.nodes() replaces nodes_iter(), which was removed in networkx 2.0)
colors = [color_by_gender[node] for node in graph.nodes()]

fig, ax = get_pyplot_ax()
nx.draw_networkx(graph, ax = ax, node_size = 2300, node_color = colors);

custom labels

In [7]:
fig, ax = get_pyplot_ax()

labels = {}
for person in people:
    labels[person] = person.upper()
nx.draw_networkx(graph, labels = labels, ax = ax, node_size = 2300, node_color = colors);

fixed node positions

In [18]:
pos = nx.layout.spring_layout(graph)

fig, ax = get_pyplot_ax(rows = 1, columns = 2, figsize = (12, 5))

left_plot = ax[0]
right_plot = ax[1]

nx.draw_networkx(graph, pos = pos, ax = left_plot, node_size = 2300)
nx.draw_networkx(graph, pos = pos, ax = right_plot, node_size = 2300, node_color = colors)
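
the layout is a dictionary that maps each node to its coordinates, which is why passing the same `pos` to both plots keeps the nodes in place. as a small sketch, the `seed` argument of `spring_layout` can also make the layout repeatable across runs; the seed value 7 and the four-edge graph are arbitrary choices for the example.

```python
import networkx as nx
import numpy as np

graph = nx.Graph()
graph.add_edges_from([('jere', 'ella'), ('ella', 'anniina'),
                      ('ella', 'miika'), ('mikko', 'ella')])

# same seed, same starting positions, hence the same layout
pos_first = nx.spring_layout(graph, seed = 7)
pos_again = nx.spring_layout(graph, seed = 7)

# each entry is a pair of x, y coordinates for one node
print(pos_first['ella'])

same = all(np.allclose(pos_first[node], pos_again[node]) for node in graph.nodes())
print(same)
```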

analysis of a graph


In [9]:
number_of_friends = dict(graph.degree())  # person -> number of neighbors

for person in number_of_friends:
    print("{} has {} friends".format(person, number_of_friends[person]))

anniina has 2 friends
ella has 4 friends
olli has 2 friends
mikko has 3 friends
maria has 2 friends
jere has 4 friends
laura has 3 friends
miika has 2 friends


In [10]:
from clustering import spectral_clusters
partition = spectral_clusters(graph, 2)

In [11]:
partition

array([0, 1, 1, 1, 1, 0, 0, 0], dtype=int32)
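
the `clustering` module is not shown in these slides. as a rough stand-in, a two-way spectral split can be sketched with numpy: build the graph laplacian, take the eigenvector of its second-smallest eigenvalue (the fiedler vector), and split the nodes by sign. the function name `spectral_bisection` is hypothetical, and this is not necessarily what `spectral_clusters` does.

```python
import networkx as nx
import numpy as np

def spectral_bisection(graph):
    """ hypothetical 2-way split; returns one 0/1 label per node, in node order """
    nodes = list(graph.nodes())
    adjacency = nx.to_numpy_array(graph, nodelist = nodes)
    laplacian = np.diag(adjacency.sum(axis = 1)) - adjacency
    # eigh returns eigenvalues in ascending order, so column 1 belongs
    # to the second-smallest eigenvalue (the fiedler vector)
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)

friends = nx.Graph()
friends.add_edges_from([('jere', 'ella'), ('ella', 'anniina'), ('ella', 'miika'),
                        ('mikko', 'ella'), ('anniina', 'mikko'), ('laura', 'jere'),
                        ('olli', 'jere'), ('jere', 'maria'), ('miika', 'mikko'),
                        ('maria', 'laura'), ('olli', 'laura')])

labels = dict(zip(friends.nodes(), spectral_bisection(friends)))
for person in sorted(labels):
    print(person, labels[person])
```

which side gets label 0 and which gets label 1 is arbitrary, because an eigenvector is only defined up to its sign; only the grouping is meaningful.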

In [12]:
group_colors = []
for index, name in enumerate(graph.nodes()):
    side = partition[index]
    color = 'blue' if side == 1 else 'grey'
    group_colors.append(color)

In [13]:
fig, ax = get_pyplot_ax()
nx.draw_networkx(graph, ax = ax, node_color = group_colors, node_size = 2300);

back to twitter

  • the twitter data
  • retweets and replies


  • tweets collected using the streaming api (1% sample)
  • can also be collected from the REST api

a tweet

a retweet

original tweet author realDonaldTrump

retweet author terry_golfing


a reply

original tweet author realDonaldTrump

reply author tonyposnanski

text @realDonaldTrump have a better plan than the one you presented

building graph for retweets

In [14]:
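dataset = []  # list of tweets; loading the data is not shown on the slide
tweet =  None
# the bodies of the helper functions are not shown on the slide
def is_retweet(tweet):
    raise NotImplementedError
def get_retweet_author(tweet):
    raise NotImplementedError
def get_original_author(tweet):
    raise NotImplementedError
def contains_topic(tweet, topic):
    raise NotImplementedError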

In [15]:
topic = '#obamacare'
graph = nx.Graph()  # start from a new, empty graph for the retweets
for tweet in dataset:
    if contains_topic(tweet, topic) and is_retweet(tweet):
        from_who = get_retweet_author(tweet)
        to_who = get_original_author(tweet)
        graph.add_edge(from_who, to_who)
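
the helper functions above are shown without their implementations. the sketch below is one way they might look, assuming each tweet is a dictionary parsed from twitter's v1.1 json, where a retweet carries the original tweet under `retweeted_status` and the author's handle is in `user.screen_name`. the tweets are hand-made for the example, not real data, and the substring match for the topic is a simplification.

```python
import networkx as nx

# hand-made tweets (not real data), shaped like twitter's v1.1 json
original = {'user': {'screen_name': 'alice'}, 'text': 'my view on #obamacare'}
lunch = {'user': {'screen_name': 'erin'}, 'text': 'lunch time'}
dataset = [
    original,
    {'user': {'screen_name': 'bob'}, 'text': 'RT @alice: my view on #obamacare',
     'retweeted_status': original},
    {'user': {'screen_name': 'carol'}, 'text': 'RT @alice: my view on #obamacare',
     'retweeted_status': original},
    {'user': {'screen_name': 'dave'}, 'text': 'RT @erin: lunch time',
     'retweeted_status': lunch},
]

def is_retweet(tweet):
    # assumption: only retweets have a 'retweeted_status' field
    return 'retweeted_status' in tweet

def get_retweet_author(tweet):
    return tweet['user']['screen_name']

def get_original_author(tweet):
    return tweet['retweeted_status']['user']['screen_name']

def contains_topic(tweet, topic):
    # assumption: a case-insensitive substring match is good enough
    return topic.lower() in tweet['text'].lower()

retweet_graph = nx.Graph()
topic = '#obamacare'
for tweet in dataset:
    if contains_topic(tweet, topic) and is_retweet(tweet):
        retweet_graph.add_edge(get_retweet_author(tweet), get_original_author(tweet))

print(sorted(retweet_graph.nodes()))
```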


what happens when people disagree?

we notice a difference...

retweet graphs for controversial topics contain two well-separated clusters


we cluster each graph like before

In [16]:
partition = spectral_clusters(graph, 2)

to find sides if they exist

visualize with clusters

In [17]:
group_colors = []
for index, name in enumerate(graph.nodes()):
    side = partition[index]
    color = 'blue' if side == 1 else 'red'
    group_colors.append(color)

fig, ax = get_pyplot_ax()
nx.draw_networkx(graph, ax = ax, node_color = group_colors, node_size = 2300);

reply networks

what about reply networks?
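
a reply graph can be built the same way as the retweet graph. the sketch below assumes tweets in twitter's v1.1 json shape, where a reply names the user it answers in `in_reply_to_screen_name`. the first tweet is the reply example from the earlier slide; the other two are hand-made, and the helper `is_reply` is a hypothetical name.

```python
import networkx as nx

tweets = [
    {'user': {'screen_name': 'tonyposnanski'},
     'text': '@realDonaldTrump have a better plan than the one you presented',
     'in_reply_to_screen_name': 'realDonaldTrump'},
    {'user': {'screen_name': 'alice'}, 'text': 'good morning',
     'in_reply_to_screen_name': None},
    {'user': {'screen_name': 'bob'}, 'text': '@alice good morning to you',
     'in_reply_to_screen_name': 'alice'},
]

def is_reply(tweet):
    # assumption: the field is missing or None for tweets that are not replies
    return tweet.get('in_reply_to_screen_name') is not None

reply_graph = nx.Graph()
for tweet in tweets:
    if is_reply(tweet):
        reply_author = tweet['user']['screen_name']
        original_author = tweet['in_reply_to_screen_name']
        reply_graph.add_edge(reply_author, original_author)

print(sorted(reply_graph.nodes()))
```

an undirected `nx.Graph` is used here to match the retweet graph; `nx.DiGraph` would also keep the direction of each reply.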


  • intro to networkx: a python library to create, process, visualize graphs
  • application: twitter discussions
    • controversial and non-controversial topics
    • retweet and reply graphs
    • retweet graphs are clearly separated for controversial topics
    • replies also go from one side to the other
    • consistent pattern for many instances we've seen
  • graphs can offer insights into human interactions

more about networkx

slides at