In [1]:
import networkx as nx
import matplotlib.pyplot as plt # for plotting graphs
%matplotlib inline
In [2]:
GA = nx.read_gexf('../data/ga_graph.gexf')
The idea of link prediction was first proposed by Liben-Nowell and Kleinberg in 2004 as the following question:
"Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future?"
It's an enticing idea and has led to many interesting developments in the network literature. For our example, the question could be rephrased as:
"Given a snapshot of the Grey's Anatomy relationship network, can we infer which new relationships are likely to occur in the near future?"
Sounds awesome, but how does it work?
The most popular measures for link prediction analyze the "proximity" of nodes in a network. One way to measure proximity is to see what proportion of neighbors a pair of nodes share. This can be captured succinctly with the Jaccard index.
In the context of a network, we're comparing sets of neighbors:
$$ Jaccard = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|} $$ where $\Gamma(u)$ denotes the set of neighbors of $u$.
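Before calling the NetworkX helper, it may help to compute the definition by hand on a tiny toy graph (hypothetical data, not the show's network):

```python
import networkx as nx

# Toy graph (hypothetical, not the Grey's Anatomy data):
# edges a-b, a-c, b-c, b-d
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")])

def jaccard(G, u, v):
    """Jaccard coefficient computed directly from the definition."""
    nu, nv = set(G[u]), set(G[v])
    return len(nu & nv) / len(nu | nv)

# c's neighbors are {a, b}; d's only neighbor is b,
# so the pair shares 1 of 2 distinct neighbors.
print(jaccard(G, "c", "d"))  # -> 0.5
```

A score of 1 means two nodes share exactly the same neighborhood; 0 means they share no neighbors at all.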
In [3]:
preds_jc = nx.jaccard_coefficient(GA)
In [4]:
pred_jc_dict = {}
for u, v, p in preds_jc:
    pred_jc_dict[(u, v)] = p
In [5]:
sorted(pred_jc_dict.items(), key=lambda x:x[1], reverse=True)[:10]
Out[5]:
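As an aside, `nx.jaccard_coefficient` scores every non-adjacent pair by default; if you only care about candidates involving one node, you can pass an explicit `ebunch` of pairs instead. A minimal sketch on a toy graph (not the show's data):

```python
import networkx as nx

G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")])

# Restrict scoring to candidate pairs involving "d" that are not yet edges
candidates = [("d", n) for n in G.nodes
              if n != "d" and not G.has_edge("d", n)]
scores = {(u, v): p for u, v, p in nx.jaccard_coefficient(G, candidates)}
print(scores)
```

This is handy on larger graphs, where scoring all non-edges can get expensive.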
In [6]:
extra_attrs = {'finn':('Finn Dandridge','M','S'),
'olivia':('Olivia Harper','F','S'),
'steve':('Steve Murphy','M','S'),
'torres':('Callie Torres','F','B'),
'colin':('Colin Marlow','M','S'),
'grey':('Meredith Grey','F','S'),
'mrs. seabury':('Dana Seabury','F','S'),
'altman':('Teddy Altman','F','S'),
'tucker':('Tucker Jones','M','S'),
'ben':('Ben Warren','M','S'),
"o'malley":("George O'Malley",'M','S'),
'thatch grey':('Thatcher Grey','M','S'),
'susan grey':('Susan Grey','F','S'),
'derek':('Derek Shepherd','M','S'),
'chief':('Richard Webber','M','S'),
'addison':('Addison Montgomery','F','S'),
'karev':('Alex Karev','M','S'),
'hank':('Hank','M','S'),
'lexi':('Lexie Grey','F','S'),
'adele':('Adele Webber','F','S'),
'owen':('Owen Hunt','M','S'),
'sloan':('Mark Sloan','M','S'),
'arizona':('Arizona Robbins','F','G'),
'izzie':('Izzie Stevens','F','S'),
'preston':('Preston Burke','M','S'),
'kepner':('April Kepner','F','S'),
'bailey':('Miranda Bailey','F','S'),
'ellis grey':('Ellis Grey','F','S'),
'denny':('Denny Duquette','M','S'),
'yang':('Cristina Yang','F','S'),
'nancy':('Nancy Shepherd','F','S'),
'avery':('Jackson Avery','M','S')}
In [7]:
for i in GA.nodes():
    GA.nodes[i]["full_name"] = extra_attrs[i][0]
    GA.nodes[i]["gender"] = extra_attrs[i][1]
    GA.nodes[i]["orientation"] = extra_attrs[i][2]
In [8]:
GA.nodes['grey']
Out[8]:
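The attribute loop above can also be expressed with `nx.set_node_attributes`, which accepts a dict-of-dicts keyed by node. A sketch using a one-node stand-in rather than the full graph:

```python
import networkx as nx

G = nx.Graph()
G.add_node("grey")

# Same tuple layout as extra_attrs: (full name, gender, orientation)
extra = {"grey": ("Meredith Grey", "F", "S")}
attrs = {n: {"full_name": f, "gender": g, "orientation": o}
         for n, (f, g, o) in extra.items()}
nx.set_node_attributes(G, attrs)
print(G.nodes["grey"])  # all three attributes attached in one call
```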
The preferential attachment method mirrors the "rich get richer" phenomenon -- nodes that already have many connections are the ones most likely to gain future connections.
Essentially, the measure is the product of a node pair's degrees:
$$ PA = |\Gamma(u)| \cdot |\Gamma(v)| $$ where $\Gamma(u)$ denotes the set of neighbors of $u$, so $|\Gamma(u)|$ is the degree of $u$.
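On the same kind of toy graph as before (hypothetical data again), the measure is just a degree product:

```python
import networkx as nx

# Toy graph: b is the hub with degree 3
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")])

def preferential_attachment(G, u, v):
    """Degree product |Gamma(u)| * |Gamma(v)|."""
    return G.degree(u) * G.degree(v)

# deg(a) = 2, deg(b) = 3, so the pair scores 6
print(preferential_attachment(G, "a", "b"))  # -> 6
```

Note how pairs involving the hub `b` dominate: preferential attachment always favors high-degree nodes, whereas Jaccard rewards overlapping neighborhoods.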
In [9]:
preds_pa = nx.preferential_attachment(GA)
In [10]:
pred_pa_dict = {}
for u, v, p in preds_pa:
    pred_pa_dict[(u, v)] = p
In [11]:
sorted(pred_pa_dict.items(), key=lambda x:x[1], reverse=True)[:10]
Out[11]: