In [1]:
from IPython.display import Image
Image(filename='pagerank1.jpeg')
Out[1]:
Suppose we compute PageRank with a β of 0.7, and we introduce the additional constraint that the sum of the PageRanks of the three pages must be 3, to handle the problem that otherwise any multiple of a solution will also be a solution. Compute the PageRanks a, b, and c of the three pages A, B, and C, respectively.
In [2]:
import numpy as np
# Column-stochastic transition matrix (columns = out-links of A, B, C)
# m1 = [ 0,   0, 0]
#      [0.5,  0, 0]
#      [0.5,  1, 1]
m1 = np.matrix([[0, 0, 0], [0.5, 0, 0], [0.5, 1, 1]])
beta = 0.7
# r = beta * m1 * r + ((1-beta)/N)
def r_p(r):
    return beta * m1 * r + np.matrix([0.1, 0.1, 0.1]).T
r = np.matrix([1.0/3, 1.0/3, 1.0/3]).T
for i in range(1000):
    r = r_p(r)
print "Final PageRank: \n" + str(r*3)
In [3]:
a = r[0] * 3
b = r[1] * 3
c = r[2] * 3
print 'a = ', a
print 'b = ', b
print 'c = ', c
print 'a + b = ', a + b
print 'b + c = ', b + c
print 'a + c = ', a + c
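As a cross-check, the same ranks can be obtained without iteration by solving the linear system (I - beta*M) r = (1-beta)/N * e directly and rescaling so the ranks sum to 3. This is a minimal sketch, not part of the original assignment.
In [ ]:
# Cross-check: solve (I - beta*M) r = (1-beta)/N * e, then rescale so a + b + c = 3
import numpy as np
M = np.array([[0, 0, 0], [0.5, 0, 0], [0.5, 1, 1]])
beta = 0.7
N = 3
r_direct = np.linalg.solve(np.eye(N) - beta * M, np.full(N, (1 - beta) / N))
print "Direct solution scaled to sum 3:", 3 * r_direct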
In [4]:
Image(filename='pagerank2.jpeg')
Out[4]:
Suppose we compute PageRank with β=0.85. Write the equations for the PageRanks a, b, and c of the three pages A, B, and C, respectively. Then, identify the correct equations representing a, b and c.
In [5]:
import numpy as np
# Column-stochastic transition matrix (columns = out-links of A, B, C)
# m2 = [ 0,   0, 1]
#      [0.5,  0, 0]
#      [0.5,  1, 0]
m2 = np.matrix([[0, 0, 1], [0.5, 0, 0], [0.5, 1, 0]])
beta = 0.85
def r_p(r):
    return beta * m2 * r + np.matrix([0.05, 0.05, 0.05]).T
r = np.matrix([1.0/3, 1.0/3, 1.0/3]).T
for i in range(1000):
    r = r_p(r)
print "Final PageRank: \n" + str(r)
In [6]:
a = r[0]
b = r[1]
c = r[2]
print "0.95a = ", 0.95*a, "= 0.9c + 0.05b = ", 0.9*c + 0.05*b
print "0.95b = ", 0.95*b, "= 0.475a + 0.05c = ", 0.475*a + 0.05*c
print "0.95c = ", 0.95*c, "= 0.9b + 0.475a = ", 0.9*b + 0.475*a
In [7]:
Image(filename='pagerank2.jpeg')
Out[7]:
Assuming no "taxation," compute the PageRanks a, b, and c of the three pages A, B, and C, using iteration, starting with the "0th" iteration where all three pages have rank a = b = c = 1. Compute as far as the 5th iteration, and also determine what the PageRanks are in the limit.
In [8]:
import numpy as np
# Column-stochastic transition matrix (columns = out-links of A, B, C)
# m3 = [ 0,   0, 1]
#      [0.5,  0, 0]
#      [0.5,  1, 0]
m3 = np.matrix([[0, 0, 1], [0.5, 0, 0], [0.5, 1, 0]])
beta = 1
r = np.matrix([1, 1, 1]).T
# Iterate well past the 5th step so the ranks approach their limit
for i in range(50):
    r = m3.dot(r)
    print i + 1
    print r
In [9]:
print "Final PageRank: \n" + str(r)
Consider the link graph below. First, construct L, the link matrix, as discussed in the HITS algorithm. Then, starting from a hubbiness vector of all 1s, compute the authority estimate a = L^T h, normalize it so its largest value is 1, compute the hubbiness estimate h = L a, normalize it the same way, and repeat these steps once more.
Now, identify the final estimates.
In [10]:
Image(filename='pagerank4.jpg')
Out[10]:
In [11]:
import numpy as np
# Normalize a vector by dividing all values so that the largest value is 1
def norm(Matrix):
    return Matrix / float(Matrix.max())

def estimate(L, h):
    # Estimate the authority vector: a = L^T h
    a = np.dot(L.T, h)
    # Normalize a so the largest value is 1
    a = norm(a)
    # Estimate the hubbiness vector: h = L a
    h = np.dot(L, a)
    # Normalize h so the largest value is 1
    h = norm(h)
    return a, h
# The vector h is (the transpose of) [1,1,1,1]
h = np.matrix([1,1,1,1]).T
# The link graph: 1->2; 1->3; 2->1; 3->4; 4->3
L = np.matrix([[0, 1, 1, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]])
In [12]:
# After step 1
a,h = estimate(L,h)
print "After step 1:"
print "authority:", np.round(a.T, decimals=3)
print "hubbiness:", np.round(h.T, decimals=3)
In [13]:
# After step 2 (repeat of step 1)
a,h = estimate(L,h)
print "Final estimate:"
print "authority:", np.round(a.T, decimals=3)
print "hubbiness:", np.round(h.T, decimals=3)
Compute the Topic-Specific PageRank for the following link topology. Assume that pages selected for the teleport set are nodes 1 and 2 and that in the teleport set, the weight assigned for node 1 is twice that of node 2. Assume further that the teleport probability, (1 - beta), is 0.3. Which of the following statements is correct?
In [14]:
Image(filename='pagerank4.jpg')
Out[14]:
In [15]:
import numpy as np
# Column-stochastic transition matrix for the link graph 1->2; 1->3; 2->1; 3->4; 4->3
# (each row below lists a node's out-links; the transpose makes the columns stochastic)
A = np.matrix([[ 0.0, 0.5, 0.5, 0.0 ],
               [ 1.0, 0.0, 0.0, 0.0 ],
               [ 0.0, 0.0, 0.0, 1.0 ],
               [ 0.0, 0.0, 1.0, 0.0 ]]).T
# Teleport distribution: node 1 gets weight 2/3 (twice node 2's 1/3); nodes 3 and 4 get 0
w = 1.0/3.0
B = np.matrix([[2*w, 2*w, 2*w, 2*w],
               [  w,   w,   w,   w],
               [  0,   0,   0,   0],
               [  0,   0,   0,   0]])
beta = 0.7
In [16]:
r = np.ones((A.shape[0], 1)) / A.shape[0]
for i in range(50):
    r = beta * np.dot(A, r) + (1 - beta) * np.dot(B, r)
    print i + 1
    print r
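Because every column of B is the same teleport distribution v = (2/3, 1/3, 0, 0)^T and the rank vector always sums to 1, the term B·r is simply v. The sketch below shows the equivalent update written with an explicit teleport vector; it reuses A and beta from above.
In [ ]:
# Equivalent update using the teleport vector v directly instead of the matrix B
v = np.matrix([2.0/3, 1.0/3, 0.0, 0.0]).T
r2 = np.ones((A.shape[0], 1)) / A.shape[0]
for i in range(50):
    r2 = beta * np.dot(A, r2) + (1 - beta) * v
print "Topic-specific PageRank:", np.round(r2.T, decimals=4)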
In [17]:
Image(filename='pagerank5.jpeg')
Out[17]:
There, k "second-tier" nodes act as intermediaries. The target page t has only to link to the k second-tier pages, and each of those pages links to m/k of the m supporting pages. Each of the supporting pages links only to t (although most of these links are not shown). Suppose the taxation parameter is β = 0.85, and x is the amount of PageRank supplied from outside to the target page. Let n be the total number of pages in the Web. Finally, let y be the PageRank of target page t. If we compute the formula for y in terms of k, m, and n, we get a formula with the form
y = ax + bm/n + ck/n
Note: To arrive at this form, it is necessary at the last step to drop a low-order term that is a fraction of 1/n. Determine coefficients a, b, and c, remembering that β is fixed at 0.85. Then, identify the value of these coefficients.
In [18]:
import numpy as np
beta = 0.85
# Solving y = x + beta^3*y + beta*(1-beta)*m/n + beta^2*(1-beta)*k/n
# (after dropping the low-order (1-beta)/n term for t itself) gives
#   y = x/(1-beta^3) + [beta/(1+beta+beta^2)]*(m/n) + [beta^2/(1+beta+beta^2)]*(k/n)
a = 1.0 / (1 - np.power(beta, 3))
b = beta / (1.0 + beta + np.power(beta, 2))
c = np.power(beta, 2) / (1.0 + beta + np.power(beta, 2))
print 'a = %f , b = %f , c = %f' % (a, b, c)
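As a sanity check (not part of the original problem), the coefficients can be verified numerically: iterate the fixed-point equations for the target page, one second-tier page, and one supporting page for some sample values of x, m, k, and n, and compare the resulting y with a·x + b·m/n + c·k/n. The sample values below are arbitrary, and the tiny remaining discrepancy is the dropped (1 - β)/n term.
In [ ]:
# Numerical check of y = a*x + b*m/n + c*k/n for arbitrary sample values of x, m, k, n
x, m, k, n = 0.002, 1000, 10, 1000000.0
y = z = p = 0.0    # PageRank of the target t, one second-tier page, one supporting page
for i in range(200):
    z = beta * y / k + (1 - beta) / n        # second-tier page: linked to only by t (t has k out-links)
    p = beta * z * k / m + (1 - beta) / n    # supporting page: linked to by one second-tier page (m/k out-links)
    y = x + beta * m * p + (1 - beta) / n    # target t: external x plus links from all m supporting pages
print "Iterated y:                 ", y
print "Formula a*x + b*m/n + c*k/n:", a * x + b * m / n + c * k / n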
In [ ]: