After the Age of Search (see the PageRank notebook), we are now in the Age of Recommendation. This notebook is about Netflix recommendations using Simon Funk's algorithm as implemented in IncrementalSVD.jl by Aaron Windsor.
weight: 1 kg -> 1 kg $\pm$ 0.000000001 kg
running: 100 m -> 42,195 m or 100 m < 10 s
mathematics: exam -> state competition -> Olympiad
search, recommending: good -> excellent
Google (and others)
Netflix, Amazon Prime, PickBox, ... - online streaming of movies and shows
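The SVD $M=U\Sigma V^T$ of the rating matrix is approximated by a low-rank matrix (for example, of rank $25$).
The rating matrix itself is sparse, since most users rate only a few movies, while the approximating matrix is full: it supplies a predicted rating for every user-movie pair, and these predictions are good enough to be useful.
The prize for an efficient approximation algorithm (the Netflix Prize) was $\$$1,000,000.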
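To make the low-rank idea concrete, here is a minimal sketch using only Julia's standard `LinearAlgebra` library. The small matrix `M` is made-up toy data with every entry known, so the ordinary truncated SVD applies directly; real rating matrices have mostly missing entries, which is why an incremental method is needed instead. The choice `k = 2` is purely illustrative.

```julia
using LinearAlgebra

# Hypothetical toy "ratings" matrix (rows = users, columns = movies), made-up values
M = [5.0 4.0 1.0 1.0;
     4.0 5.0 1.0 2.0;
     1.0 1.0 5.0 4.0;
     2.0 1.0 4.0 5.0;
     5.0 5.0 2.0 1.0]

F = svd(M)      # thin SVD: M = F.U * Diagonal(F.S) * F.Vt
k = 2           # target rank (the notebook later uses 25 on the real data)

# Keep only the k largest singular values and the matching singular vectors
Mk = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]

# Eckart-Young theorem: the spectral-norm error equals the (k+1)-th singular value
err = opnorm(M - Mk)
println("rank of approximation: ", rank(Mk))
println("approximation error:   ", err)
println("sigma_(k+1):           ", F.S[k + 1])
```

The last two printed numbers agree up to rounding, which is the sense in which the truncated SVD is the best rank-`k` approximation.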
In [1]:
# using Pkg
# Pkg.add(PackageSpec(url="https://github.com/aaw/IncrementalSVD.jl"))
# or
# pkg> add https://github.com/aaw/IncrementalSVD.jl
In [1]:
using IncrementalSVD
In [2]:
varinfo(IncrementalSVD)
Out[2]:
In [3]:
rating_set = load_small_movielens_dataset()
Out[3]:
In [4]:
propertynames(rating_set)
Out[4]:
In [5]:
# The format is (user, movie, rating)
rating_set.training_set
Out[5]:
In [6]:
rating_set.test_set
Out[6]:
In [7]:
# Users and their IDs
rating_set.user_to_index
Out[7]:
In [8]:
# Movies and their IDs
rating_set.item_to_index
Out[8]:
In [19]:
# We can extract the titles ...
keys(rating_set.item_to_index)
Out[19]:
In [20]:
# or codes
values(rating_set.item_to_index)
Out[20]:
In [23]:
# Which movies did the user "3000" rate?
user_ratings(rating_set, "3000")
Out[23]:
In [24]:
# Let us find the exact title and code for "Blade Runner"
for k in keys(rating_set.item_to_index)
    if occursin("Blade", k)
        println(k)
    end
end
In [19]:
# Did the user "3000" rate "Blade Runner"? (No output means no rating.)
for k in user_ratings(rating_set, "3000")
    if occursin("Blade Runner", k[1][2])
        println(k)
    end
end
In [25]:
# How did the user "3000" rate "Sling Blade"?
for k in user_ratings(rating_set, "3000")
    if occursin("Sling Blade", k[1][2])
        println(k)
    end
end
In [21]:
get(rating_set.item_to_index,"Blade Runner (1982)",0)
Out[21]:
In [26]:
# This takes about half a minute
model = train(rating_set, 25);
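The idea behind Funk-style training can be sketched in a few lines: fit user and movie factor vectors by stochastic gradient descent using only the known ratings, so that missing entries never enter the error. The sketch below is an illustration under stated assumptions, not the code of IncrementalSVD.jl. Funk's original description trains one feature at a time, whereas this simplified variant updates all `k` features together. The function name `sgd_factorize`, the toy data, and the values of `lr`, `reg` and `epochs` are all hypothetical.

```julia
using LinearAlgebra, Random

# Hypothetical toy data: (user index, movie index, rating) triples.
# Users 1-2 like movies 1-2, users 3-4 like movies 3-4; some pairs are left unrated.
ratings = [(1, 1, 5.0), (1, 2, 5.0), (1, 3, 1.0),
           (2, 1, 5.0), (2, 3, 1.0), (2, 4, 1.0),
           (3, 1, 1.0), (3, 3, 5.0), (3, 4, 5.0),
           (4, 2, 1.0), (4, 3, 5.0), (4, 4, 5.0)]

# Root-mean-square error of the factor model on a list of known ratings
rmse(P, Q, data) =
    sqrt(sum((r - dot(P[u, :], Q[i, :]))^2 for (u, i, r) in data) / length(data))

function sgd_factorize(data, n_users, n_items; k = 2, lr = 0.01, reg = 0.02,
                       epochs = 2000, rng = MersenneTwister(0))
    P = 0.1 .* randn(rng, n_users, k)   # user factors, small random start
    Q = 0.1 .* randn(rng, n_items, k)   # movie factors, small random start
    for _ in 1:epochs
        for (u, i, r) in data
            err = r - dot(P[u, :], Q[i, :])   # error on one known rating
            for f in 1:k
                pu, qi = P[u, f], Q[i, f]
                # Gradient step with a small regularization term
                P[u, f] += lr * (err * qi - reg * pu)
                Q[i, f] += lr * (err * pu - reg * qi)
            end
        end
    end
    return P, Q
end

P, Q = sgd_factorize(ratings, 4, 4)
println("training RMSE: ", rmse(P, Q, ratings))
println("prediction for the unrated pair (user 1, movie 4): ", dot(P[1, :], Q[4, :]))
```

A predicted rating is simply the inner product of a user's factor vector with a movie's factor vector, so the model can be queried for pairs that never appeared in the training data.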
In [27]:
propertynames(model)
Out[27]:
In [28]:
model.U
Out[28]:
In [29]:
model.S
Out[29]:
In [30]:
model.V
Out[30]:
In [38]:
similar_items(model, "Friday the 13th (1980)",max_results=20)
Out[38]:
In [37]:
# Take a look at the function
@which similar_items(model, "Friday the 13th (1980)")
Out[37]:
In [39]:
similar_items(model, "Citizen Kane (1941)")
Out[39]:
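One common way to define "similar" in a factor model is the cosine of the angle between two movies' factor vectors. Whether `similar_items` uses exactly this measure is an assumption here; the sketch below only illustrates the general technique. The matrix `V`, the titles, and the helper `most_similar` are hypothetical and unrelated to the trained `model`.

```julia
using LinearAlgebra

# Hypothetical movie-factor matrix: one row per movie, one column per latent feature
titles = ["A", "B", "C", "D"]
V = [ 0.9  0.1;
      0.8  0.2;
     -0.1  0.9;
      0.7 -0.6]

# Cosine similarity of two factor vectors
cosine(a, b) = dot(a, b) / (norm(a) * norm(b))

# Rank all other movies by cosine similarity to movie i (hypothetical helper)
function most_similar(V, titles, i; max_results = 2)
    scores = [(titles[j], cosine(V[i, :], V[j, :])) for j in axes(V, 1) if j != i]
    sort!(scores; by = last, rev = true)
    return scores[1:min(max_results, length(scores))]
end

nearest = most_similar(V, titles, 1)
for (title, score) in nearest
    println(title, "  ", round(score; digits = 3))
end
```

Cosine similarity ignores the length of the vectors, so two movies count as similar when their factors point in the same direction, regardless of how many ratings each has received.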
In [41]:
similar_users(model,"3000",max_results=20)
Out[41]:
In [42]:
# What rating does the approximate model predict for user "3000"
# and "Blade Runner (1982)" (the user gave no true rating)?
IncrementalSVD.get_predicted_rating(model, "3000", "Blade Runner (1982)")
Out[42]:
In [43]:
# What rating does the approximate model predict for user "3000"
# and "Citizen Kane (1941)" (the user gave no true rating)?
IncrementalSVD.get_predicted_rating(model, "3000", "Citizen Kane (1941)")
Out[43]:
In [44]:
# What rating does the approximate model predict for user "3000"
# and "Sling Blade (1996)" (true rating 5.0)?
IncrementalSVD.get_predicted_rating(model, "3000", "Sling Blade (1996)")
Out[44]:
In [45]:
# What rating does the approximate model predict for user "3000"
# and "Time to Kill, A (1996)" (true rating 1.0)?
IncrementalSVD.get_predicted_rating(model, "3000", "Time to Kill, A (1996)")
Out[45]: