Part 1: Triangle Type Frequency

The frequency on triangle types according to social balance theory


In [4]:
from snpp.utils.data import load_csv_network
from snpp.utils.graph import get_triangles
from itertools import combinations
from collections import Counter
from tqdm import tqdm
import pandas as pd

In [5]:
g = load_csv_network('data/soc-sign-slashdot.txt').to_undirected()

In [6]:
g = g.to_undirected()

In [7]:
tris = get_triangles(g)


100%|██████████| 77350/77350 [00:25<00:00, 3069.49it/s] 

In [22]:
freqs = Counter()
for tri in tqdm(tris):
    neg_cnt = sum(1 for u, v in combinations(tri, 2) 
                    if g[u][v]['sign'] == -1)
    freqs[neg_cnt] += 1    
print(freqs)


100%|██████████| 548054/548054 [00:05<00:00, 106135.38it/s]
Counter({0: 395623, 2: 75290, 1: 65323, 3: 11818})


In [23]:
vals = np.array(list(freqs.values()))
cols = list(map(lambda c: "{} triangle(s)".format(c),
            freqs.keys()))
df = pd.DataFrame(data=[vals / np.sum(vals) * 100],
                  columns=cols)
print(df)


   0 triangle(s)  1 triangle(s)  2 triangle(s)  3 triangle(s)
0      72.186865      11.919081      13.737697       2.156357

Conclusion

There are very few weakly-balanced triangles (~2.15%).

So weak social balance is not suitable here.

In other words, strong social balance is sufficient.

Part 2: Triangle-free edges

edges in none of the triangles v.s. #edges in at least one triangle

Why? if there are many triangle-free edges,

in other words, if the clustering-coefficient is low

then triangle-based method is not suitable


In [14]:
freqs = Counter()
for u, v in tqdm(g.edges_iter()):
    if set(g.adj[u]).intersection(set(g.adj[v])):
        freqs['#edge-without-triangle'] += 1
    else:
        freqs['#edge-with-triangle'] += 1

print(freqs)
vals = np.array(list(freqs.values()))
print(vals / np.sum(vals))


468554it [00:10, 46271.12it/s] 
Counter({'#edge-without-triangle': 237386, '#edge-with-triangle': 231168})
[ 0.50663531  0.49336469]

Conclusion

There are quite triangle-free edges (more than half of the edges), thus triangle-based method is not suitable.

This indicates higher order cycles can be considered. However, enumerating length-n cycles is time-consuming.