Link Prediction

In this notebook, we demonstrate how to use GraphLab Create to construct a simple Link Prediction classifier. A link prediction classifier is a classifier that can predict the probability of the existence (or non-existence) of a link in a social network (SN). Namely, given a social network, such as Facebook, a link prediction classifier can help to determine if a link between two users exist. Our construction is general and not limited to SN. For example, in a phone call network our classifier can predict whether two subscribers will call each other.

There are many diverse methods to construct a link prediction classifier, such as using the SN topology, using the SN users' personal details or using posts and other published online content.

The main goal of this notebook is to demonstrate how to use GraphLab's SFrame and SGraph to extract various SN topological features. These topological features can later be utilized to perform various prediction tasks, such as link prediction, and predicting students' exam scores.

This notebook is organized as follows: First, we download a link prediction training dataset and constract a graph from it. Next, we will illustrate how GraphLab's SFrame and SGraph objects can be utilized to extract various SN topological features for each link. Finally, we will demonstrate how to use those features inside a link prediction classifier.

Downloading the Dataset

In this notebook, we will use dataset which was published by BGU Social Network Research Group. We will download the Google+ dataset which consists of around 3 million links, and generate a SGraph object containing the social links.



In [1]:

    
import graphlab as gl

# Loading the links dataset into a SFrame object
sf_links = gl.SFrame.read_csv("https://static.turi.com/datasets/bgu_directed_network_googleplus/g_plus_pos_and_neg_links.csv.gz")

# Let's view the data
print sf_links.head(3)

# Creating SGraph object from the SFrame object
g = gl.SGraph().add_edges(sf_links, src_field='src', dst_field='dst')









    



[INFO] This commercial license of GraphLab Create is assigned to engr@turi.com.

[INFO] Start server at: ipc:///tmp/graphlab_server-3600 - Server binary: /Library/Python/2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1441218013.log
[INFO] GraphLab Server Version: 1.5.2






    



PROGRESS: Downloading https://static.turi.com/datasets/bgu_directed_network_googleplus/g_plus_pos_and_neg_links.csv.gz to /var/tmp/graphlab-bickson/3600/000000.gz
PROGRESS: Finished parsing file https://static.turi.com/datasets/bgu_directed_network_googleplus/g_plus_pos_and_neg_links.csv.gz
PROGRESS: Parsing completed. Parsed 100 lines in 1.35685 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file https://static.turi.com/datasets/bgu_directed_network_googleplus/g_plus_pos_and_neg_links.csv.gz
PROGRESS: Parsing completed. Parsed 3013792 lines in 1.60591 secs.
+-------+-------+-------+
|  src  |  dst  | class |
+-------+-------+-------+
| 16400 | 60081 |   1   |
| 55441 | 34187 |   1   |
| 59472 | 67007 |   1   |
+-------+-------+-------+
[3 rows x 3 columns]

Let use the SGraph's summary function to get details on the loaded SGraph object



In [2]:

    
g.summary()









    Out[2]:





{'num_edges': 3013792, 'num_vertices': 211187}

Our directed graph has around 3 million edges (referred to as links) and over 200,000 vertices (referred to as users). Those links are roughtly divided into two. Positive links (labeled 1), and Negative links (labeled 0). The positive links are observed links in the social network, while negative links were added at random so our binary classifier could learn from the negative examples as well. Note, that if you like to use your own dataset, you will need to add negative (unobserved) edges at random for our construction to work.

Topological Feature Extraction

Let's create an SFrame obejct with all user ids.



In [3]:

    
users_sf = g.get_vertices()
users_sf.rename({"__id": "id"}) 
users_sf.head(5)









    Out[3]:





    
        id
    
    
        5
    
    
        7
    
    
        8
    
    
        10
    
    
        27
    

[5 rows x 1 columns]

The users sframe can be used for calculating several topological feature for each link in the dataset. We will start by calculating various simple topological features, such as the each user's in-degree (i.e. the user's number of followers) and user's out-degree (i.e. number of users the user is followling).

To calculate each vertex toplogical features, we will first create a SFrame object that contains each user's in-friends (i.e. the users in the network that follows the user), and the user's out-friends (the users in the network which the user follows).



In [4]:

    
# Calculating each vertices in and out degree
out_friends_sf = sf_links.groupby("src", {"out_friends": gl.aggregate.CONCAT("dst")})
out_friends_sf.rename({"src": "id"})
in_friends_sf = sf_links.groupby("dst", {"in_friends": gl.aggregate.CONCAT("src")})
in_friends_sf.rename({"dst": "id"})









    Out[4]:





    
        id
        in_friends
    
    
        211023
        [175821, 191454, 42927,
19425, 20037, 84186, ...
    
    
        21855
        [117830, 202635, 7922,
148051, 98174, 10749, ...
    
    
        88004
        [184267, 165121, 156061,
25801, 82540, 200953, ...
    
    
        79732
        [166236, 156056, 42127,
173758, 44003, 113404, ...
    
    
        63664
        [133355, 135427, 189724,
52481, 74786, 148446, ...
    
    
        127950
        [108832, 124060, 191707,
64471, 18708, 31194, ...
    
    
        7899
        [19009, 14262, 166769,
92858, 45224, 202978, ...
    
    
        25263
        [47241, 145136, 57020,
2642, 47271, 60974, ...
    
    
        130872
        [132401, 79291, 158521,
197742, 6452, 48452, ...
    
    
        208837
        [122872, 14865, 23427,
86661, 108812, 42584, ...
    

[211187 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Using the SFrame join operation, we create a single SFrame which consists of each user's in and out friends.



In [5]:

    
users_sf = users_sf.join(in_friends_sf, on="id", how="outer")
users_sf = users_sf.join(out_friends_sf, on="id", how="outer")

# we replace missing values with empty lists
users_sf = users_sf.fillna('in_friends',[])
users_sf = users_sf.fillna('out_friends',[])
users_sf.head(10)









    Out[5]:





    
        id
        in_friends
        out_friends
    
    
        211023
        [175821, 191454, 42927,
19425, 20037, 84186, ...
        [178359, 191454, 114855,
117085, 95731, 150495, ...
    
    
        21855
        [117830, 202635, 7922,
148051, 98174, 10749, ...
        [41086, 49199, 75658,
55823, 67923, 199185, ...
    
    
        88004
        [184267, 165121, 156061,
25801, 82540, 200953, ...
        [194693, 53575, 36045,
162977, 184267, 82540, ...
    
    
        79732
        [166236, 156056, 42127,
173758, 44003, 113404, ...
        [42127, 56032, 190023,
137866, 56552, 166236, ...
    
    
        63664
        [133355, 135427, 189724,
52481, 74786, 148446, ...
        [189724, 148446, 74702,
42791, 133355, 153655, ...
    
    
        127950
        [108832, 124060, 191707,
64471, 18708, 31194, ...
        [179818, 185573, 85706,
207053, 27056, 44792, ...
    
    
        7899
        [19009, 14262, 166769,
92858, 45224, 202978, ...
        [45224, 82615, 202978,
68792, 53078, 66750, ...
    
    
        25263
        [47241, 145136, 57020,
2642, 47271, 60974, ...
        [91704, 23796, 71943,
32645, 189962, 109815, ...
    
    
        130872
        [132401, 79291, 158521,
197742, 6452, 48452, ...
        [108790, 43254, 114610,
79291, 56216, 104160, ...
    
    
        208837
        [122872, 14865, 23427,
86661, 108812, 42584, ...
        [193657, 203867, 177456,
107268, 12804, 161402, ...
    

[10 rows x 3 columns]

Using the created SFrame with each user in-friends and out-friends, we calculate several simple topological features for each user, such as each user's in-degree (i.e. number of followers) and each user's out-degree (i.e. number of users the user is followling).



In [6]:

    
#out_degree - number of users each vertex is following
users_sf['out_degree'] = users_sf["out_friends"].apply(lambda l: len(l) )

#in_degree - number of users following each vertex
users_sf['in_degree'] = users_sf["in_friends"].apply(lambda l: len(l) )

#all_degree - number of uniuqe users that following or are followed by each user
users_sf['all_friends'] = users_sf[['in_friends', 'out_friends']].apply(lambda r: list(set(r['in_friends']) | set(r['out_friends'])))
users_sf['all_degree'] = users_sf["all_friends"].apply(lambda l: len(l) )

#bi_degree - number of uniuqe users that are both following and followed by each user
users_sf['bi_friends'] = users_sf[['in_friends', 'out_friends']].apply(lambda r: list(set(r['in_friends']) & set(r['out_friends'])))
users_sf['bi_degree'] = users_sf["bi_friends"].apply(lambda l: len(l) )

users_sf.head(10)









    Out[6]:





    
        id
        in_friends
        out_friends
        out_degree
        in_degree
        all_friends
    
    
        211023
        [175821, 191454, 42927,
19425, 20037, 84186, ...
        [178359, 191454, 114855,
117085, 95731, 150495, ...
        8
        7
        [19425.0, 39588.0,
20037.0, 84934.0, ...
    
    
        21855
        [117830, 202635, 7922,
148051, 98174, 10749, ...
        [41086, 49199, 75658,
55823, 67923, 199185, ...
        8
        7
        [206059.0, 117830.0,
5607.0, 75658.0, ...
    
    
        88004
        [184267, 165121, 156061,
25801, 82540, 200953, ...
        [194693, 53575, 36045,
162977, 184267, 82540, ...
        10
        11
        [165121.0, 53575.0,
194693.0, 9350.0, ...
    
    
        79732
        [166236, 156056, 42127,
173758, 44003, 113404, ...
        [42127, 56032, 190023,
137866, 56552, 166236, ...
        9
        9
        [187648.0, 56032.0,
44003.0, 57284.0, ...
    
    
        63664
        [133355, 135427, 189724,
52481, 74786, 148446, ...
        [189724, 148446, 74702,
42791, 133355, 153655, ...
        11
        13
        [52481.0, 135427.0,
133516.0, 74702.0, ...
    
    
        127950
        [108832, 124060, 191707,
64471, 18708, 31194, ...
        [179818, 185573, 85706,
207053, 27056, 44792, ...
        19
        29
        [186883.0, 71174.0,
138631.0, 98699.0, ...
    
    
        7899
        [19009, 14262, 166769,
92858, 45224, 202978, ...
        [45224, 82615, 202978,
68792, 53078, 66750, ...
        11
        10
        [19009.0, 202978.0,
199941.0, 149921.0, ...
    
    
        25263
        [47241, 145136, 57020,
2642, 47271, 60974, ...
        [91704, 23796, 71943,
32645, 189962, 109815, ...
        7
        9
        [71943.0, 32645.0,
47271.0, 47241.0, ...
    
    
        130872
        [132401, 79291, 158521,
197742, 6452, 48452, ...
        [108790, 43254, 114610,
79291, 56216, 104160, ...
        8
        9
        [104160.0, 188507.0,
48452.0, 43254.0, ...
    
    
        208837
        [122872, 14865, 23427,
86661, 108812, 42584, ...
        [193657, 203867, 177456,
107268, 12804, 161402, ...
        7
        17
        [23427.0, 107268.0,
86661.0, 200330.0, ...
    


    
        all_degree
        bi_friends
        bi_degree
    
    
        14
        [191454.0]
        1
    
    
        15
        []
        0
    
    
        18
        [162977.0, 184267.0,
82540.0] ...
        3
    
    
        16
        [166236.0, 42127.0]
        2
    
    
        20
        [133355.0, 189724.0,
148446.0, 74702.0] ...
        4
    
    
        44
        [108832.0, 179740.0,
207053.0, 22318.0] ...
        4
    
    
        18
        [45224.0, 202978.0,
82615.0] ...
        3
    
    
        16
        []
        0
    
    
        16
        [79291.0]
        1
    
    
        24
        []
        0
    

[10 rows x 9 columns]

Now, we have several degree feautres for each user. Lets utilize these users' features and create features for each link in our positive and negative links dataset. Namely, for each link in the data conists of two users u and v, we will create SFrame with each user's degree features.

Note: runining can take several minutes.



In [7]:

    
sf_links = sf_links.join(users_sf, on={"src": "id"}, how="right")
sf_links.rename({"in_friends": "src_in_friends", "out_friends": "src_out_friends",
           "all_friends": "src_all_friends", "all_degree": "src_all_degree",
           "bi_friends": "src_bi_friends", "bi_degree": "src_bi_degree",
           "in_degree": "src_in_degree", "out_degree": "src_out_degree"
           })

sf_links = sf_links.join(users_sf, on={"dst": "id"}, how="right")
sf_links.rename({"in_friends": "dst_in_friends", "out_friends": "dst_out_friends",
           "all_friends": "dst_all_friends", "all_degree": "dst_all_degree",
           "bi_friends": "dst_bi_friends", "bi_degree": "dst_bi_degree",
           "in_degree": "dst_in_degree", "out_degree": "dst_out_degree"})

sf_links.head(10)









    Out[7]:





    
        src
        dst
        class
        src_in_friends
        src_out_friends
        src_out_degree
        src_in_degree
    
    
        16400
        60081
        1
        [398, 164476, 158289,
25877, 45051, 103519, ...
        [60081, 72458, 55255,
23667, 45842, 510, ...
        187
        60
    
    
        55441
        34187
        1
        [47069, 203601, 202891,
74718, 40220, 95893, ...
        [34187, 169679, 20260,
2828, 166652, 104248, ...
        12
        9
    
    
        59472
        67007
        1
        [153627, 170956, 59785,
107075, 42942, 119730, ...
        [67007, 68849, 13940,
66935, 207534, 202531, ...
        101
        102
    
    
        79102
        174248
        1
        [17824, 86736, 40718,
80705, 183565, 206244, ...
        [174248, 58527, 94279,
82240, 150362, 115046, ...
        378
        125
    
    
        135415
        63941
        1
        [25504, 159023, 79854,
85472, 899, 180460, ...
        [63941, 123425, 33739,
6473, 2925, 154425, ...
        86
        54
    
    
        21650
        154098
        1
        [105155, 20292, 166059,
71126, 510, 187171, ...
        [154098, 171451, 145995,
10888, 164476, 122311, ...
        571
        221
    
    
        140358
        152017
        1
        [26754, 196070, 181832,
89802, 127264, 150731, ...
        [152017, 21469, 181065,
45442, 186338, 55170, ...
        142
        169
    
    
        131272
        19556
        1
        [177562, 119155, 4050,
80338, 84221, 113372, ...
        [19556, 142061, 14344,
98752, 191884, 31489, ...
        101
        72
    
    
        49627
        29754
        1
        [54190, 29754, 84452,
126486, 208641, 135083, ...
        [29754, 40361, 126486,
209533, 29482, 105542, ...
        9
        11
    
    
        44402
        82787
        1
        [169958, 17147, 171733,
138035, 118118, 28173, ...
        [82787, 117486, 196181,
107067, 84349, 172619, ...
        307
        533
    


    
        src_all_friends
        src_all_degree
        src_bi_friends
        src_bi_degree
        dst_in_friends
    
    
        [80897.0, 46086.0,
10761.0, 56332.0, ...
        210
        [2304.0, 80897.0,
85769.0, 175755.0, ...
        37
        [16400, 59651, 209476,
58116, 200236, 168261, ...
    
    
        [146608.0, 34187.0,
20260.0, 166652.0, ...
        20
        [202891.0]
        1
        [55441, 88052, 104248,
168497, 30237, 61864, ...
    
    
        [171522.0, 169480.0,
122380.0, 139277.0, ...
        131
        [130689.0, 128259.0,
96214.0, 194310.0, ...
        72
        [59472, 97322, 22318,
106406, 208920, 191232, ...
    
    
        [21504.0, 189443.0,
176133.0, 110598.0, ...
        410
        [172039.0, 89625.0,
150362.0, 151074.0, ...
        93
        [79102, 46738, 92118,
127523, 169659, 67969, ...
    
    
        [147885.0, 148227.0,
85016.0, 123425.0, ...
        101
        [98563.0, 44406.0,
11142.0, 86153.0, 899.0, ...
        39
        [135415, 19114, 120093,
161025, 86153, 48014, ...
    
    
        [18432.0, 192513.0,
188420.0, 79883.0, ...
        608
        [75268.0, 7690.0,
170507.0, 138254.0, ...
        184
        [21650, 191196, 136266,
165000, 154970, 72022, ...
    
    
        [53254.0, 116396.0,
79883.0, 107778.0, ...
        233
        [75138.0, 197419.0,
75141.0, 68999.0, ...
        78
        [140358, 185088, 47848,
172151, 89802, 14902, ...
    
    
        [31489.0, 14344.0,
91145.0, 117258.0, ...
        132
        [176897.0, 159618.0,
36099.0, 14344.0, ...
        41
        [131272, 91512, 120125,
121418, 74268, 192961, ...
    
    
        [208641.0, 118339.0,
84452.0, 154853.0, ...
        18
        [29754.0, 126486.0]
        2
        [49627, 35625, 116496,
57902, 120441, 10251, ...
    
    
        [139264.0, 30727.0,
182281.0, 75787.0, ...
        585
        [139264.0, 80388.0,
30330.0, 209926.0, ...
        255
        [44402, 116002, 138049,
46956, 48378, 42649, ...
    


    
        dst_out_friends
        dst_out_degree
        dst_in_degree
        dst_all_friends
        dst_all_degree
    
    
        [44239, 83549, 58116,
51593, 58974, 22393, ...
        13
        26
        [185339.0, 132768.0,
59651.0, 58116.0, ...
        35
    
    
        [24309, 111920, 3614,
189367, 86360, 67261, ...
        54
        51
        [66824.0, 2825.0,
146698.0, 152605.0, ...
        68
    
    
        [76733, 117581, 101235,
105747, 80, 103174, ...
        475
        81
        [36864.0, 190465.0,
26626.0, 131086.0, ...
        500
    
    
        [92118, 165263, 172529,
27573, 203266, 48146, ...
        57
        148
        [147457.0, 139778.0,
110598.0, 108555.0, ...
        161
    
    
        [124927, 11369, 128442,
160446, 165522, 38165, ...
        100
        32
        [109575.0, 155662.0,
119825.0, 4867.0, ...
        110
    
    
        [136266, 161788, 158610,
97845, 103542, 33709, ...
        33
        20
        [59264.0, 5249.0,
174980.0, 23685.0, ...
        44
    
    
        [99670, 12501, 8152,
34452, 127380, 163437, ...
        249
        192
        [132100.0, 169989.0,
116396.0, 60429.0, ...
        295
    
    
        [191884, 97353, 169522,
143748, 113372, 165445, ...
        38
        38
        [167040.0, 176897.0,
49666.0, 143748.0, ...
        55
    
    
        [12437, 80439, 49627,
10251, 110007, 124514, ...
        9
        10
        [107945.0, 124514.0,
49627.0, 110007.0, ...
        17
    
    
        [203017, 48413, 146265,
18714, 10205, 155203, ...
        12
        99
        [90862.0, 177158.0,
203017.0, 82443.0, ...
        106
    


    
        dst_bi_friends
        dst_bi_degree
    
    
        [147448.0, 121340.0,
59651.0, 58116.0] ...
        4
    
    
        [66824.0, 2825.0,
106891.0, 11411.0, ...
        37
    
    
        [191232.0, 125568.0,
16514.0, 128259.0, ...
        56
    
    
        [39808.0, 67969.0,
203266.0, 205955.0, ...
        44
    
    
        [86153.0, 48014.0,
52880.0, 114321.0, ...
        22
    
    
        [5249.0, 23685.0,
136266.0, 150830.0, ...
        9
    
    
        [132100.0, 135691.0,
103443.0, 3673.0, ...
        146
    
    
        [176897.0, 159618.0,
113372.0, 194085.0, ...
        21
    
    
        [10251.0, 49627.0]
        2
    
    
        [31152.0, 203017.0,
10205.0, 104078.0, ...
        5
    

[10 rows x 19 columns]

Beside adding each link's users u and v degree features, to create a decent link prediction classifier, we also need to add features based on the strength of connection between the users. In this notebook, we will add for each link three simple type of features:

Common-Friends Features - the number of friends both u and v have in common.
Total-Friends Features - the number of friends both u and v have in together.
Jaccard coefficient- the number of Common-Friends divided by the Number of Total-Friends.

Lets define the each feature type function:



In [8]:

    
def common_friends(u, v, u_friends, v_friends):
    u_friends = set(u_friends)
    if v in u_friends:
            u_friends.remove(v)

    v_friends = set(v_friends)
    if u in v_friends:
        v_friends.remove(u)
    return len(u_friends & v_friends)

def total_friends(u, v, u_friends, v_friends):
    u_friends = set(u_friends)
    if v in u_friends:
        u_friends.remove(v)

    v_friends = set(v_friends)
    if u in v_friends:
        v_friends.remove(u)

    return len(u_friends | v_friends)

def jacc_coef(u,v, u_friends, v_friends):
    t = total_friends(u,v,u_friends,v_friends)
    if  t == 0:
        return 0
    return common_friends(u,v,u_friends, v_friends)/ float(t)

Using these three features type we created 12 new features (4 feature for each feature type) that are based on direction of the friendship between u and v and their friends.

Please note that The formal mathematical defintion of the feature presented throught this section can be found in Fire et al. 2014.

The following code block may take a few minutes.



In [9]:

    
sf_links['common_friends'] = sf_links[['src','dst', 'src_all_friends', 'dst_all_friends']].apply(lambda r: common_friends(r['src'], r['dst'],r['src_all_friends'], r['dst_all_friends']))
sf_links['common_bi_friends'] = sf_links[['src','dst', 'src_bi_friends', 'dst_bi_friends']].apply(lambda r: common_friends(r['src'], r['dst'],r['src_bi_friends'], r['dst_bi_friends']))
sf_links['common_in_friends'] = sf_links[['src','dst', 'src_in_friends', 'dst_in_friends']].apply(lambda r: common_friends(r['src'], r['dst'],r['src_in_friends'], r['dst_in_friends']))
sf_links['common_out_friends'] = sf_links[['src','dst', 'src_out_friends', 'dst_out_friends']].apply(lambda r: common_friends(r['src'], r['dst'],r['src_out_friends'], r['dst_out_friends']))

sf_links['total_friends'] = sf_links[['src','dst', 'src_all_friends', 'dst_all_friends']].apply(lambda r: total_friends(r['src'], r['dst'],r['src_all_friends'], r['dst_all_friends']))
sf_links['total_bi_friends'] = sf_links[['src','dst', 'src_bi_friends', 'dst_bi_friends']].apply(lambda r: total_friends(r['src'], r['dst'],r['src_bi_friends'], r['dst_bi_friends']))
sf_links['total_in_friends'] = sf_links[['src','dst', 'src_in_friends', 'dst_in_friends']].apply(lambda r: total_friends(r['src'], r['dst'],r['src_in_friends'], r['dst_in_friends']))
sf_links['total_out_friends'] = sf_links[['src','dst', 'src_out_friends', 'dst_out_friends']].apply(lambda r: total_friends(r['src'], r['dst'],r['src_out_friends'], r['dst_out_friends']))


sf_links['jacc_coef'] = sf_links[['src','dst', 'src_all_friends', 'dst_all_friends']].apply(lambda r: jacc_coef(r['src'], r['dst'],r['src_all_friends'], r['dst_all_friends']))
sf_links['bi_jacc_coef'] = sf_links[['src','dst', 'src_bi_friends', 'dst_bi_friends']].apply(lambda r: jacc_coef(r['src'], r['dst'],r['src_bi_friends'], r['dst_bi_friends']))
sf_links['in_jacc_coef'] = sf_links[['src','dst', 'src_in_friends', 'dst_in_friends']].apply(lambda r: jacc_coef(r['src'], r['dst'],r['src_in_friends'], r['dst_in_friends']))
sf_links['out_jacc_coef'] = sf_links[['src','dst', 'src_out_friends', 'dst_out_friends']].apply(lambda r: jacc_coef(r['src'], r['dst'],r['src_out_friends'], r['dst_out_friends']))
sf_links.head(10)









    Out[9]:





    
        src
        dst
        class
        src_in_friends
        src_out_friends
        src_out_degree
        src_in_degree
    
    
        16400
        60081
        1
        [398, 164476, 158289,
25877, 45051, 103519, ...
        [60081, 72458, 55255,
23667, 45842, 510, ...
        187
        60
    
    
        55441
        34187
        1
        [47069, 203601, 202891,
74718, 40220, 95893, ...
        [34187, 169679, 20260,
2828, 166652, 104248, ...
        12
        9
    
    
        59472
        67007
        1
        [153627, 170956, 59785,
107075, 42942, 119730, ...
        [67007, 68849, 13940,
66935, 207534, 202531, ...
        101
        102
    
    
        79102
        174248
        1
        [17824, 86736, 40718,
80705, 183565, 206244, ...
        [174248, 58527, 94279,
82240, 150362, 115046, ...
        378
        125
    
    
        135415
        63941
        1
        [25504, 159023, 79854,
85472, 899, 180460, ...
        [63941, 123425, 33739,
6473, 2925, 154425, ...
        86
        54
    
    
        21650
        154098
        1
        [105155, 20292, 166059,
71126, 510, 187171, ...
        [154098, 171451, 145995,
10888, 164476, 122311, ...
        571
        221
    
    
        140358
        152017
        1
        [26754, 196070, 181832,
89802, 127264, 150731, ...
        [152017, 21469, 181065,
45442, 186338, 55170, ...
        142
        169
    
    
        131272
        19556
        1
        [177562, 119155, 4050,
80338, 84221, 113372, ...
        [19556, 142061, 14344,
98752, 191884, 31489, ...
        101
        72
    
    
        49627
        29754
        1
        [54190, 29754, 84452,
126486, 208641, 135083, ...
        [29754, 40361, 126486,
209533, 29482, 105542, ...
        9
        11
    
    
        44402
        82787
        1
        [169958, 17147, 171733,
138035, 118118, 28173, ...
        [82787, 117486, 196181,
107067, 84349, 172619, ...
        307
        533
    


    
        src_all_friends
        src_all_degree
        src_bi_friends
        src_bi_degree
        dst_in_friends
    
    
        [80897.0, 46086.0,
10761.0, 56332.0, ...
        210
        [2304.0, 80897.0,
85769.0, 175755.0, ...
        37
        [16400, 59651, 209476,
58116, 200236, 168261, ...
    
    
        [146608.0, 34187.0,
20260.0, 166652.0, ...
        20
        [202891.0]
        1
        [55441, 88052, 104248,
168497, 30237, 61864, ...
    
    
        [171522.0, 169480.0,
122380.0, 139277.0, ...
        131
        [130689.0, 128259.0,
96214.0, 194310.0, ...
        72
        [59472, 97322, 22318,
106406, 208920, 191232, ...
    
    
        [21504.0, 189443.0,
176133.0, 110598.0, ...
        410
        [172039.0, 89625.0,
150362.0, 151074.0, ...
        93
        [79102, 46738, 92118,
127523, 169659, 67969, ...
    
    
        [147885.0, 148227.0,
85016.0, 123425.0, ...
        101
        [98563.0, 44406.0,
11142.0, 86153.0, 899.0, ...
        39
        [135415, 19114, 120093,
161025, 86153, 48014, ...
    
    
        [18432.0, 192513.0,
188420.0, 79883.0, ...
        608
        [75268.0, 7690.0,
170507.0, 138254.0, ...
        184
        [21650, 191196, 136266,
165000, 154970, 72022, ...
    
    
        [53254.0, 116396.0,
79883.0, 107778.0, ...
        233
        [75138.0, 197419.0,
75141.0, 68999.0, ...
        78
        [140358, 185088, 47848,
172151, 89802, 14902, ...
    
    
        [31489.0, 14344.0,
91145.0, 117258.0, ...
        132
        [176897.0, 159618.0,
36099.0, 14344.0, ...
        41
        [131272, 91512, 120125,
121418, 74268, 192961, ...
    
    
        [208641.0, 118339.0,
84452.0, 154853.0, ...
        18
        [29754.0, 126486.0]
        2
        [49627, 35625, 116496,
57902, 120441, 10251, ...
    
    
        [139264.0, 30727.0,
182281.0, 75787.0, ...
        585
        [139264.0, 80388.0,
30330.0, 209926.0, ...
        255
        [44402, 116002, 138049,
46956, 48378, 42649, ...
    


    
        dst_out_friends
        dst_out_degree
        dst_in_degree
        dst_all_friends
        dst_all_degree
    
    
        [44239, 83549, 58116,
51593, 58974, 22393, ...
        13
        26
        [185339.0, 132768.0,
59651.0, 58116.0, ...
        35
    
    
        [24309, 111920, 3614,
189367, 86360, 67261, ...
        54
        51
        [66824.0, 2825.0,
146698.0, 152605.0, ...
        68
    
    
        [76733, 117581, 101235,
105747, 80, 103174, ...
        475
        81
        [36864.0, 190465.0,
26626.0, 131086.0, ...
        500
    
    
        [92118, 165263, 172529,
27573, 203266, 48146, ...
        57
        148
        [147457.0, 139778.0,
110598.0, 108555.0, ...
        161
    
    
        [124927, 11369, 128442,
160446, 165522, 38165, ...
        100
        32
        [109575.0, 155662.0,
119825.0, 4867.0, ...
        110
    
    
        [136266, 161788, 158610,
97845, 103542, 33709, ...
        33
        20
        [59264.0, 5249.0,
174980.0, 23685.0, ...
        44
    
    
        [99670, 12501, 8152,
34452, 127380, 163437, ...
        249
        192
        [132100.0, 169989.0,
116396.0, 60429.0, ...
        295
    
    
        [191884, 97353, 169522,
143748, 113372, 165445, ...
        38
        38
        [167040.0, 176897.0,
49666.0, 143748.0, ...
        55
    
    
        [12437, 80439, 49627,
10251, 110007, 124514, ...
        9
        10
        [107945.0, 124514.0,
49627.0, 110007.0, ...
        17
    
    
        [203017, 48413, 146265,
18714, 10205, 155203, ...
        12
        99
        [90862.0, 177158.0,
203017.0, 82443.0, ...
        106
    


    
        dst_bi_friends
        dst_bi_degree
        common_friends
        common_bi_friends
        common_in_friends
        common_out_friends
    
    
        [147448.0, 121340.0,
59651.0, 58116.0] ...
        4
        6
        0
        3
        1
    
    
        [66824.0, 2825.0,
106891.0, 11411.0, ...
        37
        3
        0
        1
        1
    
    
        [191232.0, 125568.0,
16514.0, 128259.0, ...
        56
        17
        3
        3
        17
    
    
        [39808.0, 67969.0,
203266.0, 205955.0, ...
        44
        67
        8
        36
        27
    
    
        [86153.0, 48014.0,
52880.0, 114321.0, ...
        22
        19
        3
        3
        19
    
    
        [5249.0, 23685.0,
136266.0, 150830.0, ...
        9
        25
        3
        5
        22
    
    
        [132100.0, 135691.0,
103443.0, 3673.0, ...
        146
        134
        55
        88
        100
    
    
        [176897.0, 159618.0,
113372.0, 194085.0, ...
        21
        34
        18
        23
        26
    
    
        [10251.0, 49627.0]
        2
        0
        0
        0
        0
    
    
        [31152.0, 203017.0,
10205.0, 104078.0, ...
        5
        75
        4
        74
        4
    


    
        total_friends
        total_bi_friends
        total_in_friends
        total_out_friends
        jacc_coef
        bi_jacc_coef
    
    
        237
        41
        82
        198
        0.0253164556962
        0.0
    
    
        83
        38
        58
        64
        0.0361445783133
        0.0
    
    
        612
        123
        178
        557
        0.0277777777778
        0.0243902439024
    
    
        502
        129
        236
        407
        0.133466135458
        0.062015503876
    
    
        190
        56
        81
        165
        0.1
        0.0535714285714
    
    
        625
        190
        235
        581
        0.04
        0.0157894736842
    
    
        392
        167
        271
        289
        0.341836734694
        0.329341317365
    
    
        151
        42
        85
        111
        0.225165562914
        0.428571428571
    
    
        33
        2
        19
        16
        0.0
        0.0
    
    
        614
        256
        557
        314
        0.122149837134
        0.015625
    


    
        in_jacc_coef
        out_jacc_coef
    
    
        0.0365853658537
        0.00505050505051
    
    
        0.0172413793103
        0.015625
    
    
        0.0168539325843
        0.0305206463196
    
    
        0.152542372881
        0.0663390663391
    
    
        0.037037037037
        0.115151515152
    
    
        0.0212765957447
        0.0378657487091
    
    
        0.324723247232
        0.346020761246
    
    
        0.270588235294
        0.234234234234
    
    
        0.0
        0.0
    
    
        0.132854578097
        0.0127388535032
    

[10 rows x 31 columns]

Similar to the above method, we can also extract for each link, such as each user PageRank and neighborhood subgraph size. We let the reader to try to add these features (and maybe some additional features) by themselves.

Let us move to the next section, in which we explain how the extracted links' features can be utilized to create a link prediction classifier.

Constructing a Link Prediction Classifier

In order to create a link prediction classifier using our constructed links dataset (sf), lets first randomly split our dataset into training that contains 20% of the dataset, and testing datasets that contains 80% of the dataset.



In [10]:

    
train, test = sf_links.random_split(0.2)

We now can use GraphLab Create's classfication toolkit and create and evaluate a link prediction classifier based only on u and v degree features:



In [11]:

    
degree_features_list = [c for c in train.column_names() if "degree" in c]
print "Degree Features %s" % degree_features_list
cls = gl.classifier.create(train,features=degree_features_list, target="class")
results = cls.evaluate(test)
print results









    



Degree Features ['src_out_degree', 'src_in_degree', 'src_all_degree', 'src_bi_degree', 'dst_out_degree', 'dst_in_degree', 'dst_all_degree', 'dst_bi_degree']
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

PROGRESS: The following methods are available for this type of problem.
PROGRESS: BoostedTreesClassifier, RandomForestClassifier, SVMClassifier, LogisticClassifier
PROGRESS: The returned model will be chosen according to validation accuracy.
PROGRESS: Boosted trees classifier:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 572642
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 8
PROGRESS: Number of unpacked features : 8
PROGRESS: Starting Boosted Trees
PROGRESS: --------------------------------------------------------
PROGRESS:   Iter      Accuracy          Elapsed time
PROGRESS:         (training) (validation)
PROGRESS:      0   8.777e-01   8.784e-01        2.43s
PROGRESS:      1   8.788e-01   8.792e-01        4.73s
PROGRESS:      2   8.790e-01   8.795e-01        7.43s
PROGRESS:      3   8.795e-01   8.799e-01       10.22s
PROGRESS:      4   8.800e-01   8.804e-01       15.51s
PROGRESS:      5   8.800e-01   8.808e-01       18.61s
PROGRESS:      6   8.801e-01   8.804e-01       20.28s
PROGRESS:      7   8.804e-01   8.806e-01       22.26s
PROGRESS:      8   8.803e-01   8.807e-01       24.04s
PROGRESS:      9   8.802e-01   8.809e-01       25.56s
PROGRESS: Random forest classifier:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 572642
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 8
PROGRESS: Number of unpacked features : 8
PROGRESS: Starting Boosted Trees
PROGRESS: --------------------------------------------------------
PROGRESS:   Iter      Accuracy          Elapsed time
PROGRESS:         (training) (validation)
PROGRESS:      0   8.778e-01   8.785e-01        1.17s
PROGRESS:      1   8.758e-01   8.761e-01        2.11s
PROGRESS:      2   8.789e-01   8.794e-01        3.06s
PROGRESS:      3   8.786e-01   8.792e-01        4.03s
PROGRESS:      4   8.784e-01   8.789e-01        5.10s
PROGRESS:      5   8.782e-01   8.790e-01        5.96s
PROGRESS:      6   8.782e-01   8.789e-01        6.85s
PROGRESS:      7   8.782e-01   8.790e-01        7.77s
PROGRESS:      8   8.782e-01   8.790e-01        9.28s
PROGRESS:      9   8.783e-01   8.794e-01       10.30s
PROGRESS: SVM:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 572642
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 8
PROGRESS: Number of unpacked features : 8
PROGRESS: Number of coefficients    : 9
PROGRESS: Starting L-BFGS
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 3        | 0.000002  | 1.483738     | 0.500084          | 0.498494            |
PROGRESS: | 2         | 6        | 5.000000  | 2.059202     | 0.759162          | 0.760997            |
PROGRESS: | 3         | 7        | 5.000000  | 2.363243     | 0.788224          | 0.788502            |
PROGRESS: | 4         | 10       | 0.500000  | 2.767786     | 0.818641          | 0.819813            |
PROGRESS: | 5         | 11       | 0.500000  | 2.999682     | 0.820097          | 0.821170            |
PROGRESS: | 6         | 12       | 0.500000  | 3.238497     | 0.820701          | 0.821931            |
PROGRESS: | 10        | 17       | 1.000000  | 5.005246     | 0.815630          | 0.816536            |
PROGRESS: +-----------+----------+-----------+--------------+-------------------+---------------------+
PROGRESS: Logistic regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 572642
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 8
PROGRESS: Number of unpacked features : 8
PROGRESS: Number of coefficients    : 9
PROGRESS: Starting Newton Method
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | Iteration | Passes   | Elapsed Time | Training-accuracy | Validation-accuracy |
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: | 1         | 2        | 2.047448     | 0.758268          | 0.758779            |
PROGRESS: | 2         | 3        | 3.694924     | 0.788889          | 0.790752            |
PROGRESS: | 3         | 4        | 5.178241     | 0.806286          | 0.807202            |
PROGRESS: | 4         | 5        | 6.195404     | 0.814799          | 0.816106            |
PROGRESS: | 5         | 6        | 6.890665     | 0.817518          | 0.819085            |
PROGRESS: | 6         | 7        | 7.453886     | 0.817960          | 0.819713            |
PROGRESS: | 7         | 8        | 8.026372     | 0.817986          | 0.819746            |
PROGRESS: | 8         | 9        | 8.587899     | 0.817986          | 0.819746            |
PROGRESS: +-----------+----------+--------------+-------------------+---------------------+
PROGRESS: Model selection based on validation accuracy:
PROGRESS: ---------------------------------------------
PROGRESS: BoostedTreesClassifier          : 0.880912190117
PROGRESS: RandomForestClassifier          : 0.879422765035
PROGRESS: SVMClassifier                   : 0.816536
PROGRESS: LogisticClassifier              : 0.819746
PROGRESS: ---------------------------------------------
PROGRESS: Selecting BoostedTreesClassifier based on validation set performance.
{'confusion_matrix': Columns:
	target_label	int
	predicted_label	int
	count	int

Rows: 4

Data:
+--------------+-----------------+---------+
| target_label | predicted_label |  count  |
+--------------+-----------------+---------+
|      0       |        1        |  62376  |
|      0       |        0        | 1143095 |
|      1       |        0        |  227131 |
|      1       |        1        |  978335 |
+--------------+-----------------+---------+
[4 rows x 3 columns]
, 'accuracy': 0.8799193010850138}

We get pretty good accuracy of 0.88. Let us add also the link based features and use the Boosted-Trees classifier to create and evaluate a link predicdiction classifier.



In [12]:

    
link_features_list = ['common_friends', 'common_in_friends', 'common_out_friends', 'common_bi_friends', 'total_friends', 'total_in_friends', 'total_out_friends', 'total_bi_friends',
'jacc_coef', 'bi_jacc_coef', 'in_jacc_coef', 'out_jacc_coef']
cls = gl.classifier.boosted_trees_classifier.create(train,target="class", features=degree_features_list + link_features_list)
results = cls.evaluate(test)
print results









    



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

PROGRESS: Boosted trees classifier:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 572759
PROGRESS: Number of classes           : 2
PROGRESS: Number of feature columns   : 20
PROGRESS: Number of unpacked features : 20
PROGRESS: Starting Boosted Trees
PROGRESS: --------------------------------------------------------
PROGRESS:   Iter      Accuracy          Elapsed time
PROGRESS:         (training) (validation)
PROGRESS:      0   9.208e-01   9.185e-01        2.91s
PROGRESS:      1   9.238e-01   9.219e-01        5.34s
PROGRESS:      2   9.241e-01   9.221e-01        7.61s
PROGRESS:      3   9.334e-01   9.323e-01        9.99s
PROGRESS:      4   9.370e-01   9.349e-01       12.45s
PROGRESS:      5   9.386e-01   9.369e-01       14.79s
PROGRESS:      6   9.384e-01   9.370e-01       17.03s
PROGRESS:      7   9.393e-01   9.379e-01       19.54s
PROGRESS:      8   9.414e-01   9.405e-01       21.95s
PROGRESS:      9   9.424e-01   9.414e-01       24.39s
{'confusion_matrix': Columns:
	target_label	int
	predicted_label	int
	count	int

Rows: 4

Data:
+--------------+-----------------+---------+
| target_label | predicted_label |  count  |
+--------------+-----------------+---------+
|      0       |        1        |   9645  |
|      0       |        0        | 1195826 |
|      1       |        0        |  128186 |
|      1       |        1        | 1077280 |
+--------------+-----------------+---------+
[4 rows x 3 columns]
, 'accuracy': 0.9428309408333773}

Using the additional link features, we got considerbly better accuracy of around 0.94. We can try to further improve the accuracy by adding additional features or by increasing the size of the training dataset.

id	in_friends
211023	[175821, 191454, 42927, 19425, 20037, 84186, ...
21855	[117830, 202635, 7922, 148051, 98174, 10749, ...
88004	[184267, 165121, 156061, 25801, 82540, 200953, ...
79732	[166236, 156056, 42127, 173758, 44003, 113404, ...
63664	[133355, 135427, 189724, 52481, 74786, 148446, ...
127950	[108832, 124060, 191707, 64471, 18708, 31194, ...
7899	[19009, 14262, 166769, 92858, 45224, 202978, ...
25263	[47241, 145136, 57020, 2642, 47271, 60974, ...
130872	[132401, 79291, 158521, 197742, 6452, 48452, ...
208837	[122872, 14865, 23427, 86661, 108812, 42584, ...

id	in_friends	out_friends
211023	[175821, 191454, 42927, 19425, 20037, 84186, ...	[178359, 191454, 114855, 117085, 95731, 150495, ...
21855	[117830, 202635, 7922, 148051, 98174, 10749, ...	[41086, 49199, 75658, 55823, 67923, 199185, ...
88004	[184267, 165121, 156061, 25801, 82540, 200953, ...	[194693, 53575, 36045, 162977, 184267, 82540, ...
79732	[166236, 156056, 42127, 173758, 44003, 113404, ...	[42127, 56032, 190023, 137866, 56552, 166236, ...
63664	[133355, 135427, 189724, 52481, 74786, 148446, ...	[189724, 148446, 74702, 42791, 133355, 153655, ...
127950	[108832, 124060, 191707, 64471, 18708, 31194, ...	[179818, 185573, 85706, 207053, 27056, 44792, ...
7899	[19009, 14262, 166769, 92858, 45224, 202978, ...	[45224, 82615, 202978, 68792, 53078, 66750, ...
25263	[47241, 145136, 57020, 2642, 47271, 60974, ...	[91704, 23796, 71943, 32645, 189962, 109815, ...
130872	[132401, 79291, 158521, 197742, 6452, 48452, ...	[108790, 43254, 114610, 79291, 56216, 104160, ...
208837	[122872, 14865, 23427, 86661, 108812, 42584, ...	[193657, 203867, 177456, 107268, 12804, 161402, ...

all_degree	bi_friends	bi_degree
14	[191454.0]	1
15	[]	0
18	[162977.0, 184267.0, 82540.0] ...	3
16	[166236.0, 42127.0]	2
20	[133355.0, 189724.0, 148446.0, 74702.0] ...	4
44	[108832.0, 179740.0, 207053.0, 22318.0] ...	4
18	[45224.0, 202978.0, 82615.0] ...	3
16	[]	0
16	[79291.0]	1
24	[]	0

src	dst	class	src_in_friends	src_out_friends	src_out_degree	src_in_degree
16400	60081	1	[398, 164476, 158289, 25877, 45051, 103519, ...	[60081, 72458, 55255, 23667, 45842, 510, ...	187	60
55441	34187	1	[47069, 203601, 202891, 74718, 40220, 95893, ...	[34187, 169679, 20260, 2828, 166652, 104248, ...	12	9
59472	67007	1	[153627, 170956, 59785, 107075, 42942, 119730, ...	[67007, 68849, 13940, 66935, 207534, 202531, ...	101	102
79102	174248	1	[17824, 86736, 40718, 80705, 183565, 206244, ...	[174248, 58527, 94279, 82240, 150362, 115046, ...	378	125
135415	63941	1	[25504, 159023, 79854, 85472, 899, 180460, ...	[63941, 123425, 33739, 6473, 2925, 154425, ...	86	54
21650	154098	1	[105155, 20292, 166059, 71126, 510, 187171, ...	[154098, 171451, 145995, 10888, 164476, 122311, ...	571	221
140358	152017	1	[26754, 196070, 181832, 89802, 127264, 150731, ...	[152017, 21469, 181065, 45442, 186338, 55170, ...	142	169
131272	19556	1	[177562, 119155, 4050, 80338, 84221, 113372, ...	[19556, 142061, 14344, 98752, 191884, 31489, ...	101	72
49627	29754	1	[54190, 29754, 84452, 126486, 208641, 135083, ...	[29754, 40361, 126486, 209533, 29482, 105542, ...	9	11
44402	82787	1	[169958, 17147, 171733, 138035, 118118, 28173, ...	[82787, 117486, 196181, 107067, 84349, 172619, ...	307	533

src_all_friends	src_all_degree	src_bi_friends	src_bi_degree	dst_in_friends
[80897.0, 46086.0, 10761.0, 56332.0, ...	210	[2304.0, 80897.0, 85769.0, 175755.0, ...	37	[16400, 59651, 209476, 58116, 200236, 168261, ...
[146608.0, 34187.0, 20260.0, 166652.0, ...	20	[202891.0]	1	[55441, 88052, 104248, 168497, 30237, 61864, ...
[171522.0, 169480.0, 122380.0, 139277.0, ...	131	[130689.0, 128259.0, 96214.0, 194310.0, ...	72	[59472, 97322, 22318, 106406, 208920, 191232, ...
[21504.0, 189443.0, 176133.0, 110598.0, ...	410	[172039.0, 89625.0, 150362.0, 151074.0, ...	93	[79102, 46738, 92118, 127523, 169659, 67969, ...
[147885.0, 148227.0, 85016.0, 123425.0, ...	101	[98563.0, 44406.0, 11142.0, 86153.0, 899.0, ...	39	[135415, 19114, 120093, 161025, 86153, 48014, ...
[18432.0, 192513.0, 188420.0, 79883.0, ...	608	[75268.0, 7690.0, 170507.0, 138254.0, ...	184	[21650, 191196, 136266, 165000, 154970, 72022, ...
[53254.0, 116396.0, 79883.0, 107778.0, ...	233	[75138.0, 197419.0, 75141.0, 68999.0, ...	78	[140358, 185088, 47848, 172151, 89802, 14902, ...
[31489.0, 14344.0, 91145.0, 117258.0, ...	132	[176897.0, 159618.0, 36099.0, 14344.0, ...	41	[131272, 91512, 120125, 121418, 74268, 192961, ...
[208641.0, 118339.0, 84452.0, 154853.0, ...	18	[29754.0, 126486.0]	2	[49627, 35625, 116496, 57902, 120441, 10251, ...
[139264.0, 30727.0, 182281.0, 75787.0, ...	585	[139264.0, 80388.0, 30330.0, 209926.0, ...	255	[44402, 116002, 138049, 46956, 48378, 42649, ...

dst_out_friends	dst_out_degree	dst_in_degree	dst_all_friends	dst_all_degree
[44239, 83549, 58116, 51593, 58974, 22393, ...	13	26	[185339.0, 132768.0, 59651.0, 58116.0, ...	35
[24309, 111920, 3614, 189367, 86360, 67261, ...	54	51	[66824.0, 2825.0, 146698.0, 152605.0, ...	68
[76733, 117581, 101235, 105747, 80, 103174, ...	475	81	[36864.0, 190465.0, 26626.0, 131086.0, ...	500
[92118, 165263, 172529, 27573, 203266, 48146, ...	57	148	[147457.0, 139778.0, 110598.0, 108555.0, ...	161
[124927, 11369, 128442, 160446, 165522, 38165, ...	100	32	[109575.0, 155662.0, 119825.0, 4867.0, ...	110
[136266, 161788, 158610, 97845, 103542, 33709, ...	33	20	[59264.0, 5249.0, 174980.0, 23685.0, ...	44
[99670, 12501, 8152, 34452, 127380, 163437, ...	249	192	[132100.0, 169989.0, 116396.0, 60429.0, ...	295
[191884, 97353, 169522, 143748, 113372, 165445, ...	38	38	[167040.0, 176897.0, 49666.0, 143748.0, ...	55
[12437, 80439, 49627, 10251, 110007, 124514, ...	9	10	[107945.0, 124514.0, 49627.0, 110007.0, ...	17
[203017, 48413, 146265, 18714, 10205, 155203, ...	12	99	[90862.0, 177158.0, 203017.0, 82443.0, ...	106

dst_bi_friends	dst_bi_degree
[147448.0, 121340.0, 59651.0, 58116.0] ...	4
[66824.0, 2825.0, 106891.0, 11411.0, ...	37
[191232.0, 125568.0, 16514.0, 128259.0, ...	56
[39808.0, 67969.0, 203266.0, 205955.0, ...	44
[86153.0, 48014.0, 52880.0, 114321.0, ...	22
[5249.0, 23685.0, 136266.0, 150830.0, ...	9
[132100.0, 135691.0, 103443.0, 3673.0, ...	146
[176897.0, 159618.0, 113372.0, 194085.0, ...	21
[10251.0, 49627.0]	2
[31152.0, 203017.0, 10205.0, 104078.0, ...	5

total_friends	total_bi_friends	total_in_friends	total_out_friends	jacc_coef	bi_jacc_coef
237	41	82	198	0.0253164556962	0.0
83	38	58	64	0.0361445783133	0.0
612	123	178	557	0.0277777777778	0.0243902439024
502	129	236	407	0.133466135458	0.062015503876
190	56	81	165	0.1	0.0535714285714
625	190	235	581	0.04	0.0157894736842
392	167	271	289	0.341836734694	0.329341317365
151	42	85	111	0.225165562914	0.428571428571
33	2	19	16	0.0	0.0
614	256	557	314	0.122149837134	0.015625

in_jacc_coef	out_jacc_coef
0.0365853658537	0.00505050505051
0.0172413793103	0.015625
0.0168539325843	0.0305206463196
0.152542372881	0.0663390663391
0.037037037037	0.115151515152
0.0212765957447	0.0378657487091
0.324723247232	0.346020761246
0.270588235294	0.234234234234
0.0	0.0
0.132854578097	0.0127388535032

Link Prediction

Downloading the Dataset

Topological Feature Extraction

Constructing a Link Prediction Classifier

Further Readings