In [3]:
import numpy as np
import math
In [19]:
## Q2 Solution.
def hash(x):
return math.fmod(3 * x + 2, 11)
for i in xrange(1,12):
print hash(i)
This question involves three different Bloom-filter-like scenarios. Each scenario involves setting to 1 certain bits of a 10-bit array, each bit of which is initially 0.
Scenario A:
Scenario B:
Scenario C:
Let a, b, and c be the expected number of bits set to 1 under scenarios A, B, and C, respectively. Which of the following correctly describes the relationships among a, b, and c?
In [14]:
## Q3 Solution.
prob = 1.0 / 10
a = (1 - prob)**4
print a
b = (1 - ( 1 - (1 - prob)**2) )**2
print b
c = (1 - (1.0 /10 * 1.0 / 9))
print c
support = 2 => {2,4,6,8, ...} & {3, 6, 9,...}
In [18]:
## Q5 Solution.
vec1 = np.array([2, 1, 1])
vec2 = np.array([10, -7, 1])
print vec1.dot(vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
In [21]:
## Q6 Solution.
# probability that they agree at one particular band
p1 = 0.6**2
print (1 - p1)**3
In [26]:
## Q7 Solution.
p1 = 1 - (1 - .9)**3
p2 = 1 - (1 - .1)**3
print "new LSH is (.4, .6, {}, {})-sensitive family".format(p1, p2)
Suppose the Web consists of four pages A, B, C, and D, that form a chain A-->B-->C-->D
We wish to compute the PageRank of each of these pages, but since D is a "dead end," we will "teleport" from D with probability 1 to one of the four pages, each with equal probability. We do not teleport from pages A, B, or C. Assuming the sum of the PageRanks of the four pages is 1, what is the PageRank of page B, correct to two decimal places?
In [41]:
## Q9 Solution.
M = np.array([[0, 0, 0, .25],
[1, 0, 0, .25],
[0, 1, 0, .25],
[0, 0, 1, .25]])
r = np.array([.25, .25, .25, .25])
for i in xrange(30):
r = M.dot(r)
print r
In [44]:
## Q10 Solution.
print 1 - (1 - .3)*(1 - .4)
In [48]:
##Q12
L = np.array([[-.25, -.5, -.76, -.29, -.03, -.07, -.01],
[-.05, -.1, -.15, .20, .26, .51, .77 ]]).T
print L
V = np.array([[6.74, 0],[0, 5.44]])
print V
R = np.array([[-.57, -.11, -.57, -.11, -.57],
[-.09, 0.70, -.09, .7, -.09]])
print R
print L.dot(V).dot(R)
Recall that the power iteration does r=X·r until converging, where X is a nxn matrix and n is the number of nodes in the graph.
Using the power iteration notation above, what is matrix X value when solving topic sensitive Pagerank with teleport set {0,1} for the following graph? Use beta=0.8. (Recall that the teleport set contains the destination nodes used when teleporting).
In [53]:
X = 0.8 * np.array([[1.0/3, 0, 0],
[1.0/3, 0, 0],
[1.0/3, 1, 0]])
X += 0.2 * np.array([[.5, .5, .5],
[.5, .5, .5],
[ 0, 0, 0]])
print X
Suppose we run K-means clustering over the following set of points in 2-d space using the L1 distance metric:
Recall that the L1 distance between two points is the sum of their distances along each dimension,
In [68]:
from scipy.cluster.vq import kmeans,vq
def L1(p1, p2):
return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])
c1 = [(1,1), (4,4)]
points = np.array([[1, 1], [2, 1], [2, 2], [3, 3], [4, 2], [2, 4], [4, 4]]
)
# for point in points:
# minDist = 9999
# minIdx = None
# for point in points:
# print points
All advertisers have a budget of 3, and ties are broken in favor of the advertiser with the lower index (e.g., A1 beats A2). Queries appear in the following order:
Q1, Q2, Q3, Q3, Q1, Q2, Q3, Q1, Q4, Q1
Which advertiser’s budget is exhausted first?
A 2 4P Q R S T
B 3 1 2
C 5 5
D 4 3 2 E 4 5 1
</pre>
In [69]:
## Solution
vec1 = np.array([0, 1, -1, 0, 0])
vec2 = np.array([0, 1, 0, 0, -1])
print vec1.dot(vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
The Utility Matrix below captures the ratings of 5 users (A,B,C,D,E) for 5 movies (P,Q,R,S,T). Each known rating is a number between 1 and 5, and blanks represent unknown ratings. Let (U,M) denote the rating of movie M by user U. We evaluate a Recommender System by withholding the ratings (A,P), (B,Q), and (C,S). The recommender system estimates (A,P)=1, (B,Q)=4, and (C,S)=5. What is the RMSE of the Recommender System, rounded to 2 decimal places?
P Q R S TA 2 4
B 3 1 2
C 5 5
D 4 3 2 E 4 5 1
</pre>
In [71]:
## Solution
print "RMSE = {}".format(math.sqrt(2.0 / 3))
In [ ]: