In this section we are going to parse the tweets we collected and build the social network of interactions between Twitter users. We will also see how to analyze the network using NetworkX.
Tweets are saved in JSON (JavaScript Object Notation) format. JSON is plain text written using JavaScript object notation.
The `json` Python module makes it easy to load JSON files into Python dictionaries.
In [1]:
# load tweets
import json

filename = 'AI2.txt'
tweet_list = []
# tweets are stored as UTF-8 JSON, one tweet per line
with open(filename, 'r', encoding='utf-8') as fopen:
    # each line corresponds to one tweet
    for line in fopen:
        # skip blank separator lines; the original test `line != '\n'`
        # let whitespace-only lines through, crashing json.loads
        if line.strip():
            tweet_list.append(json.loads(line))
Let's look at the information contained in a tweet
In [2]:
# take the third tweet of the list (index 2, not the first as the
# original comment claimed)
tweet = tweet_list[2]
In [3]:
# each tweet is a python dictionary
type(tweet)
Out[3]:
In [4]:
# all the 'entries' of the dictionary
tweet.keys()
Out[4]:
you can find a description of the fields in the Twitter API documentation: https://dev.twitter.com/overview/api/tweets
In [5]:
#creation time
tweet['created_at']
Out[5]:
In [6]:
# text of the tweet
print(tweet['text'])
In [7]:
# user info
tweet['user']
Out[7]:
In [8]:
# 'user' is itself a dict nested inside the tweet dict
print(type(tweet['user']))
tweet['user']['name']
Out[8]:
In [9]:
# unique id of the user
tweet['user']['id']
Out[9]:
In [10]:
#is the tweet a retweet?
'retweeted_status' in tweet
Out[10]:
In [11]:
if 'retweeted_status' in tweet:
print(tweet['retweeted_status'])
# the `retweeted_status` is also a tweet dictionary
In [12]:
# user id and name of the retweeted user?
if 'retweeted_status' in tweet:
print(tweet['retweeted_status']['user']['id'])
print(tweet['retweeted_status']['user']['name'])
In [13]:
# is the tweet a reply?
'in_reply_to_user_id' in tweet and tweet['in_reply_to_user_id'] is not None
Out[13]:
In [14]:
# 'entities' contains the hashtags, urls and usernames used in the tweet
tweet['entities']
Out[14]:
In [15]:
# user id of the mentioned users
for mention in tweet['entities']['user_mentions']:
print(mention['id'])
In [16]:
# is the tweet a quote?
'quoted_status' in tweet
Out[16]:
We will use the python module NetworkX
to construct and analyze the social network.
A short introduction to networkx: https://github.com/networkx/notebooks
There are four types of interactions between two users in Twitter:
In [17]:
# let's define some functions to extract the interactions from tweets
def getTweetID(tweet):
    """Return the unique ID of *tweet*, or None when the field is absent."""
    tweet_id = tweet.get('id')
    return tweet_id
def getUserIDandScreenName(tweet):
    """Return (user_id, screen_name) of the tweet's author.

    Falls back to (None, None) when the tweet carries no 'user' field.
    Either element may itself be None if the nested key is missing.
    """
    author = tweet.get('user')
    if author is None:
        return (None, None)
    return author.get('id'), author.get('screen_name')
def getRetweetedUserIDandSreenName(tweet):
    """Return (user_id, screen_name) of the retweeted user.

    Returns (None, None) when the tweet is not a retweet.
    NOTE: the "Sreen" typo in the name is kept because callers use it.
    """
    source = tweet.get('retweeted_status')
    if source is None:
        return (None, None)
    # a retweeted_status is itself a full tweet dict
    return getUserIDandScreenName(source)
def getRepliedUserIDandScreenName(tweet):
    """Return (user_id, screen_name) of the user this tweet replies to.

    Both elements are None when the tweet is not a reply.
    """
    return (tweet.get('in_reply_to_user_id'),
            tweet.get('in_reply_to_screen_name'))
def getUserMentionsIDandScreenName(tweet):
    """Return a list of (user_id, screen_name) tuples for every user
    mentioned in the tweet (this includes retweeted and replied users).

    Returns an empty list when the tweet has no 'entities' field or
    no 'user_mentions' entry.
    """
    mentions = []
    entities = tweet.get('entities')
    if entities is not None:
        # BUG FIX: 'user_mentions' itself may be missing, in which case
        # .get() returns None and the original raised TypeError when
        # iterating; default to an empty list instead
        for mention in entities.get('user_mentions') or []:
            mention_id = mention.get('id')
            screen_name = mention.get('screen_name')
            mentions.append((mention_id, screen_name))
    return mentions
def getQuotedUserIDandScreenName(tweet):
    """Return (user_id, screen_name) of the user this tweet quotes,
    or (None, None) when the tweet is not a quote."""
    quoted = tweet.get('quoted_status')
    if quoted is None:
        return (None, None)
    # a quoted_status is itself a full tweet dict
    return getUserIDandScreenName(quoted)
def getAllInteractions(tweet):
    """Return the tweeter and every user they interacted with.

    Returns:
        ((tweeter_id, tweeter_screenname), [(id, screenname), ...])
        The list is deduplicated, excludes the tweeter themself and the
        (None, None) placeholder. Returns ((None, None), []) when the
        tweeter cannot be identified.
    """
    tweeter = getUserIDandScreenName(tweet)
    # without an identifiable author there is nothing to record
    if tweeter[0] is None:
        return (None, None), []

    # a set deduplicates (id, screen_name) pairs: reply, retweet and
    # quote targets plus every explicit mention
    partners = {
        getRepliedUserIDandScreenName(tweet),
        getRetweetedUserIDandSreenName(tweet),
        getQuotedUserIDandScreenName(tweet),
    }
    partners.update(getUserMentionsIDandScreenName(tweet))
    # drop self-interactions and the "not present" marker
    partners.discard(tweeter)
    partners.discard((None, None))
    return tweeter, list(partners)
In [18]:
print(getUserIDandScreenName(tweet))
print(getAllInteractions(tweet))
In [19]:
import networkx as nx
# define an empty Directed Graph
# A directed graph is a graph where edges have a direction:
# here an edge goes from the user who sent the tweet to
# the user they interacted with (retweeted, replied, mentioned or quoted)
G = nx.DiGraph()
# loop over all the tweets and add edges if the tweet includes interactions
for tweet in tweet_list:
    # author of the tweet plus everyone they interacted with
    tweeter, interactions = getAllInteractions(tweet)
    tweeter_id, tweeter_name = tweeter
    # add an edge to the Graph for each interaction partner
    for interaction in interactions:
        interact_id, interact_name = interaction
        # add_edge creates the end nodes automatically if they are
        # not already in the network
        G.add_edge(tweeter_id, interact_id)
        # store the screen name as a node attribute
        # NOTE(review): `G.node[...]` is NetworkX 1.x API; in 2.x+ this
        # attribute access is `G.nodes[...]` — confirm installed version
        G.node[tweeter_id]['name'] = tweeter_name
        G.node[interact_id]['name'] = interact_name
In [20]:
# The graph's node are contained in a dictionary
print(type(G.node))
In [21]:
#print(G.node.keys())
# the keys are the user_id
print(G.node[tweeter_id])
In [22]:
# each node is itself a dictionary with node attributes as key,value pairs
print(type(G.node[tweeter_id]))
In [23]:
# edges are also contained in a dictionary
print(type(G.edge))
In [24]:
# we can see all the edges going out of this node
# each edge is a dictionary inside this dictionary with a key
# corresponding to the target user_id
print(G.edge[tweeter_id])
In [25]:
# so we can access the edge using the source user_id and the target user_id
G.edge[tweeter_id][interact_id]
Out[25]:
In [26]:
G.number_of_nodes()
Out[26]:
In [27]:
G.number_of_edges()
Out[27]:
In [28]:
# listing all nodes
node_list = G.nodes()
node_list[:3]
Out[28]:
In [29]:
# degree of a node
print(G.degree(node_list[2]))
print(G.in_degree(node_list[2]))
print(G.out_degree(node_list[2]))
In [30]:
# dictionaries with the degree of all nodes (NetworkX 1.x returns dicts)
all_degrees = G.degree(node_list)  # total degree, ignoring edge direction
in_degrees = G.in_degree(node_list)
# BUG FIX: the original called G.in_degree here as well, so
# 'out_degrees' silently duplicated 'in_degrees'
out_degrees = G.out_degree(node_list)
In [31]:
# average degree
2*G.number_of_edges()/G.number_of_nodes()
Out[31]:
In [32]:
import numpy as np
np.array(list(all_degrees.values())).mean()
Out[32]:
In [33]:
np.array(list(in_degrees.values())).mean()
Out[33]:
In [34]:
np.array(list(out_degrees.values())).mean()
Out[34]:
In [35]:
# maximum degree
max(all_degrees.values())
Out[35]:
In [36]:
# build a list of (user_id, username, degree) tuples for all nodes
degree_node_list = []
# NOTE(review): `nodes_iter` and `G.node` are NetworkX 1.x API
# (removed in 2.x) — confirm installed version
for node in G.nodes_iter():
    degree_node_list.append((node, G.node[node]['name'], G.degree(node)))
print('Unordered user, degree list')
print(degree_node_list[:10])
# sort the list by degree in descending order
degree_node_list = sorted(degree_node_list, key=lambda x:x[2], reverse=True)
print('Ordered user, degree list')
print(degree_node_list[:10])
In [37]:
# we need to import matplolib for making plots
# and numpy for numerical computations
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
For directed graphs we can define two types of components:
Weakly connected component (WCC): maximal set of nodes where there exists a path in at least one direction between each pair of nodes.
Strongly connected component (SCC): maximal set of nodes where there exists a path in both directions between each pair of nodes.
Weakly connected giant (largest) component (WCGC): Largest WCC Strongly connected giant (largest) component (SCGC): Largest SCC
In [38]:
# nx.weakly_connected_components yields one set of nodes per
# (weakly) connected component; collect them sorted from
# largest to smallest
components = sorted(nx.weakly_connected_components(G),
                    key=len, reverse=True)
In [39]:
# size of each component, in the same order as `components`
comp_sizes = [len(component) for component in components]
In [40]:
# plot the histogram of component sizes
hist = plt.hist(comp_sizes, bins=100)
In [41]:
# histogram with logarithmic y scale
hist = plt.hist(comp_sizes, bins=100, log=True)
plt.xlabel('component size')
plt.ylabel('number of components')
Out[41]:
In [42]:
# sizes of the ten largest components
comp_sizes[:10]
Out[42]:
In [43]:
# let's make a new graph which is the subgraph of G corresponding to
# the largest connected component
# let's find the largest component
largest_comp = components[0]
LCC = G.subgraph(largest_comp)
In [44]:
G.number_of_nodes()
Out[44]:
In [45]:
LCC.number_of_nodes()
Out[45]:
In [46]:
# let's plot the degree distribution inside the LCC
degrees = nx.degree(LCC)
degrees
Out[46]:
In [47]:
degree_array = np.array(list(degrees.values()))
hist = plt.hist(degree_array, bins=100)
In [48]:
# using logarithmic scales
hist = plt.hist(degree_array, bins=100, log=True)
plt.xscale('log')
In [49]:
# logarithmic scale with logarithmic bins
N, bins, patches = plt.hist(degree_array, bins=np.logspace(0,np.log10(degree_array.max()+1), 20), log=True)
plt.xscale('log')
plt.xlabel('k - degree')
plt.ylabel('number of nodes')
Out[49]:
In [50]:
# Degree probability distribution (P(k))
# since we have logarithmic bins, we need to
# take into account the fact that the bins
# have different lenghts when normalizing
bin_lengths = np.diff(bins) # length of each bin
# total histogram mass, weighting each count by its bin length
summ = np.sum(N*bin_lengths)
normalized_degree_dist = N/summ
# check normalization: should print 1.0
print(np.sum(normalized_degree_dist*bin_lengths))
hist = plt.bar(bins[:-1], normalized_degree_dist, width=np.diff(bins))
plt.xscale('log')
plt.yscale('log')
plt.xlabel('k (degree)')
plt.ylabel('P(k)')
Out[50]:
In [51]:
import random
def getGCsize(G):
    """Return the number of nodes in the largest connected component of G."""
    return max(len(component) for component in nx.connected_components(G))
In [52]:
# list that will contain the size of the GC as we remove nodes
rnd_attack_GC_sizes = []
# work on the undirected version of the largest component
LCCundirected = nx.Graph(LCC)
# NOTE(review): relies on nodes() returning a mutable list
# (NetworkX 1.x); in 2.x it returns an immutable NodeView — confirm
nodes_list = LCCundirected.nodes()
while len(nodes_list) > 1:
    # record the size of the current giant component
    rnd_attack_GC_sizes.append(getGCsize(LCCundirected))
    # pick a node uniformly at random
    rnd_node = random.choice(nodes_list)
    # remove it from the graph
    LCCundirected.remove_node(rnd_node)
    # keep the candidate list in sync with the graph
    nodes_list.remove(rnd_node)
In [53]:
# convert list to numpy array for elementwise arithmetic
rnd_attack_GC_sizes = np.array(rnd_attack_GC_sizes)
# normalize by the initial size of the GC (first recorded value)
GC_rnd = rnd_attack_GC_sizes/rnd_attack_GC_sizes[0]
# q = fraction of removed nodes, from 0 to 1
q = np.linspace(0,1,num=GC_rnd.size)
plt.plot(q,GC_rnd)
plt.xlabel('q')
plt.ylabel('GC')
Out[53]:
In [54]:
# high-degree (targeted) attack: remove nodes from highest degree down
LCCundirected = nx.Graph(LCC)
# node -> degree mapping (NetworkX 1.x nx.degree returns a dict here)
node_deg_dict = nx.degree(LCCundirected)
# node ids sorted by ASCENDING degree, so .pop() below yields the
# node with the highest initial degree.
# NOTE(review): degrees are computed once up front; the ranking is
# not recomputed as the graph shrinks
nodes_sorted = sorted(node_deg_dict, key=node_deg_dict.get)
# list that will contain the size of the GC as we remove nodes
hd_attack_GC_sizes = []
while len(nodes_sorted) > 1:
    hd_attack_GC_sizes.append(getGCsize(LCCundirected))
    # remove the highest-degree node remaining in the list
    node = nodes_sorted.pop()
    LCCundirected.remove_node(node)
In [55]:
hd_attack_GC_sizes = np.array(hd_attack_GC_sizes)
# normalize by the initial size of the GC
GC_hd = hd_attack_GC_sizes/hd_attack_GC_sizes[0]
# q = fraction of removed nodes, from 0 to 1
q = np.linspace(0,1,num=GC_hd.size)
# compare the two attack strategies on the same axes
plt.plot(q,GC_rnd, label='random attack')
plt.plot(q,GC_hd, label='High-Degree attack')
plt.xlabel('q')
plt.ylabel('GC')
plt.legend()
Out[55]:
In [57]:
# export the largest connected component in GraphML format
nx.write_graphml(LCC, 'twitter_lcc_AI2.graphml')
We can now open the file with Gephi to visualize the graph
In [ ]: