In this problem, we study the problem of finding dense communities in networks. Assume $G$ = ($V,E$) is an undirected graph (e.g., representing a social network).
Our goal is to find an induced subgraph of $G$ whose density is not much smaller than $\rho^\ast(G)$. Such a set is very densely connected, and hence may indicate a community in the network represented by $G$. Also, since the graphs of interest are usually very large in practice, we would like the algorithm to be highly scalable. We consider the following algorithm:
Require: G=(V,E) and ε>0
S ̃,S ← V
while S = ̸= ∅ do
A(S) := {i ∈ S | deg_S(i) ≤ 2(1+ε)ρ(S)}
S ← S \ A(S)
if ρ(S) > ρ(S ̃) then
S ̃ ← S
end if
end while
return S ̃
The basic idea in the algorithm is that the nodes with low degrees do not contribute much to the density of a dense subgraph, hence they can be removed without significantly influencing the density.
An undirected graph specified as a list of edges, is provided at 'livejournal-undirected.txt.zip.' The data set consists of 499923 vertices and 7794290 edges. Treat each line in the file as an undirected edge connecting the two given vertices. Implement the algorithm described above using a streaming model. For this problem, let the density of an empty set be 0.
In [17]:
%pylab inline
import pylab
In [1]:
def initialVertices(numNodes):
V = set();
for i in range(numNodes):
V.add(i);
return V;
In [2]:
def initializeDeg(deg, S):
for s in S:
deg[s] = 0;
In [3]:
def computeDegree(S, edgesfile):
fedges = open(edgesfile, 'r');
deg = {};
initializeDeg(deg, S)
for line in fedges:
split = map(int, line.split())
if split[0] in S and split[1] in S:
deg[split[0]] += 1;
deg[split[1]] += 1;
return deg;
In [4]:
def computeDensitySet(S, edgesfile):
fedges = open(edgesfile, 'r');
num = 0;
for line in fedges:
split = map(int, line.split())
if split[0] in S and split[1] in S:
num += 1;
return float(num) / len(S)
In [5]:
def computeASet(S, deg, maxDens):
A = set();
for s in S:
if deg[s] <= maxDens:
A.add(s);
return A;
In [6]:
def findDenseCommunity(V, edgesfile, eps):
tS = V;
densTS = computeDensitySet(tS, edgesfile);
S = V;
densS = densTS;
numIter = 0;
while len(S) > 0:
numIter += 1;
maxDens = 2 * (1 + eps) * densS;
deg = computeDegree(S, edgesfile);
A = computeASet(S, deg, maxDens);
S = S - A;
if len(S) == 0:
break;
densS = computeDensitySet(S, edgesfile);
if densS > densTS:
tS = S;
densTS = densS;
return tS, numIter;
In [7]:
def writeMapToFile(values, filename):
f = open(filename, 'w');
for k in values.keys():
f.write(k);
f.write(' ');
f.write(str(values[k]));
f.write('\n');
In [8]:
def writeVectorToFile(vec, filename):
f = open(filename, 'w');
for i in range(len(vec)):
f.write(str(i));
f.write(' ');
f.write(str(vec[i]));
f.write('\n');
In [9]:
def countEdges(edgesfile, S):
fedges = open(edgesfile, 'r');
res = 0;
for line in fedges:
split = map(int, line.split())
if split[0] in S and split[1] in S:
res += 1;
return res;
In [10]:
def findDenseCommunityModified(V, edgesfile, eps):
tS = V;
densTS = computeDensitySet(tS, edgesfile);
S = V;
densS = densTS;
numIter = 0;
sizeS = [];
eVec = [];
rhoVec = [];
while len(S) > 0:
rhoVec.append(densS);
sizeS.append(len(S));
eVec.append(countEdges(edgesfile, S));
numIter += 1;
maxDens = 2 * (1 + eps) * densS;
deg = computeDegree(S, edgesfile);
A = computeASet(S, deg, maxDens);
S = S - A;
if len(S) == 0:
break;
densS = computeDensitySet(S, edgesfile);
if densS > densTS:
tS = S;
densTS = densS;
return tS, numIter, rhoVec, eVec, sizeS;
In [11]:
edgesfile = 'livejournal-undirected.txt';
numNodes = 499923;
numEdges = 7794290;
V = initialVertices(numNodes);
In [12]:
epsVec = [0.1, 0.5, 1, 2];
iterVec = {}
for i in range(len(epsVec)):
tS, numIter = findDenseCommunity(V, edgesfile, epsVec[i]);
iterVec[epsVec[i]] = numIter;
print(iterVec)
In [13]:
eps = 0.05;
tS, numIter, rhoVec, eVec, sizeS = findDenseCommunityModified(V, edgesfile, eps);
writeVectorToFile(rhoVec, 'rhoVecii');
writeVectorToFile(eVec, 'eVecii');
writeVectorToFile(sizeS, 'sizeSii');
print(numIter);
In [45]:
rhoVii = pylab.loadtxt('rhoVecii')
eVii = pylab.loadtxt('eVecii')
sizSii = pylab.loadtxt('sizeSii')
f2 = pylab.figure(figsize=(9,18))
p1 = f2.add_subplot(311)
p1.plot(rhoVii[:,0], rhoVii[:,1])
p1.set_xlabel("Iteration #")
p1.set_ylabel("rho(S)")
p2 = f2.add_subplot(312)
p2.plot(eVii[:,0], eVii[:,1])
p2.set_xlabel("Iteration #")
p2.set_ylabel("|S|")
p3 = f2.add_subplot(313)
p3.plot(sizSii[:,0], sizSii[:,1])
p3.set_xlabel("Iteration #")
p3.set_ylabel("|E(S)|")
Out[45]:
The algorithm above only describes how to find one dense component (or community). It is also possible to find multiple components by running the above algorithm to find the first dense component and then deleting all vertices (and edges) belonging to that component. To find the next dense component, run the same algorithm on the modified graph. Plot separate graphs showing $\rho(\tilde{S_j})$, |$E[\tilde{S_j}]$| and |$\tilde{S_j}$| as a function of $j$ where $j$ is the current community that has been found. You can stop if 20 communities have already been found. Have $\epsilon$ = 0.05.
Note: the simulation can take over an hour because you have to run the algorithm to find communities 20 times.
In [14]:
eps = 0.05;
numCommunities = 20;
In [15]:
def findSeveralDenseCommunities(V, edgesfile, eps, numCommunities):
rhoVec = [];
sizeTS = [];
eVec = [];
for i in range(numCommunities):
print(str(i));
tS, numIter = findDenseCommunity(V, edgesfile, eps);
sizeTS.append(len(tS));
eVec.append(countEdges(edgesfile, tS));
rhoVec.append(computeDensitySet(tS, edgesfile));
V = V - tS;
return rhoVec, eVec, sizeTS;
In [16]:
rhoVec, eVec, sizeTS = findSeveralDenseCommunities(V, edgesfile, eps, numCommunities);
writeVectorToFile(rhoVec, 'rhoVeciii');
writeVectorToFile(eVec, 'eVeciii');
writeVectorToFile(sizeTS, 'sizeTSiii');
In [46]:
rhoViii = pylab.loadtxt('rhoVeciii')
eViii = pylab.loadtxt('eVeciii')
sTSii = pylab.loadtxt('sizeTSiii')
f3 = pylab.figure(figsize=(9,18))
p1 = f3.add_subplot(311)
p1.plot(rhoViii[:,0], rhoViii[:,1])
p1.set_xlabel("Community #")
p1.set_ylabel("rho(S~)")
p2 = f3.add_subplot(312)
p2.plot(eViii[:,0], eViii[:,1])
p2.set_xlabel("Community #")
p2.set_ylabel("|S~|")
p3 = f3.add_subplot(313)
p3.plot(sTSii[:,0], sTSii[:,1])
p3.set_xlabel("Community #")
p3.set_ylabel("|E(S~)|")
Out[46]:
In [ ]: