Building a song recommender

Fire up GraphLab Create


In [1]:
import graphlab

Load music data


In [2]:
song_data = graphlab.SFrame('song_data.gl/')


[INFO] GraphLab Server Version: 1.6.1

Explore data

Each row of the music data records how many times a user listened to a song, along with the details of that song.


In [ ]:
song_data.head()

In [ ]:
graphlab.canvas.set_target('ipynb')

In [ ]:
song_data['song'].show()

In [ ]:
len(song_data)

Count the number of unique users in the dataset


In [ ]:
users = song_data['user_id'].unique()

In [ ]:
len(users)

Create a song recommender


In [ ]:
train_data, test_data = song_data.random_split(0.8, seed=0)

Simple popularity-based recommender


In [ ]:
popularity_model = graphlab.popularity_recommender.create(train_data,
                                                         user_id='user_id',
                                                         item_id='song')

Use the popularity model to make some predictions

A popularity model makes the same prediction for all users, so it provides no personalization.


In [ ]:
popularity_model.recommend(users=[users[0]])

In [ ]:
popularity_model.recommend(users=[users[1]])
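
To illustrate why a popularity model cannot personalize, here is a minimal pure-Python sketch. It is not GraphLab's implementation: the toy `listens` data and the `recommend_popular` helper are hypothetical, and ranking by number of distinct listeners is an assumption made for illustration. Note that the function takes no user argument at all, so every user receives the same list.

```python
from collections import Counter

# Hypothetical toy listening log of (user_id, song) pairs; not taken from song_data.
listens = [
    ("u1", "Song A"), ("u2", "Song A"), ("u3", "Song A"),
    ("u1", "Song B"), ("u2", "Song B"),
    ("u3", "Song C"),
]

def recommend_popular(listens, k=2):
    """Return the k songs with the most distinct listeners."""
    # Deduplicate pairs so a user who replays a song is counted once.
    counts = Counter(song for _, song in set(listens))
    # Sort by listener count (descending), then by title to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [song for song, _ in ranked[:k]]

top_songs = recommend_popular(listens)
print(top_songs)
```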

Build a song recommender with personalization

We now create a model that allows us to make personalized recommendations to each user.


In [ ]:
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')

Applying the personalized model to make song recommendations

As you can see, different users get different recommendations now.


In [ ]:
personalized_model.recommend(users=[users[0]])

In [ ]:
personalized_model.recommend(users=[users[1]])
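
One way to see how item similarity leads to personalized results is the sketch below. It assumes Jaccard similarity between the sets of listeners of two songs and scores each unheard song by its mean similarity to the songs a user has already played. The data and the helpers `build_listeners`, `jaccard`, and `recommend_for_user` are hypothetical, and GraphLab's actual scoring may differ.

```python
from collections import defaultdict

# Hypothetical toy data of (user_id, song) pairs; not taken from song_data.
listens = [
    ("u1", "Song A"), ("u1", "Song B"),
    ("u2", "Song A"), ("u2", "Song B"), ("u2", "Song C"),
    ("u3", "Song C"), ("u3", "Song D"),
]

def build_listeners(listens):
    """Map each song to the set of users who listened to it."""
    listeners = defaultdict(set)
    for user, song in listens:
        listeners[song].add(user)
    return listeners

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0

def recommend_for_user(user, listens, k=2):
    """Rank songs the user has not heard by mean similarity to songs they have."""
    listeners = build_listeners(listens)
    seen = set(song for u, song in listens if u == user)
    if not seen:
        return []  # no listening history, so nothing to compare against
    scores = {}
    for candidate in listeners:
        if candidate in seen:
            continue
        sims = [jaccard(listeners[candidate], listeners[s]) for s in seen]
        scores[candidate] = sum(sims) / len(sims)
    # Highest score first; ties are broken alphabetically by title.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [song for song, _ in ranked[:k]]

recs_u1 = recommend_for_user("u1", listens)
recs_u3 = recommend_for_user("u3", listens)
print(recs_u1)
print(recs_u3)
```

Because the scores depend on each user's own listening history, the two users in this toy example end up with different lists.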

We can also apply the model to find songs similar to any song in the dataset


In [ ]:
personalized_model.get_similar_items(['With Or Without You - U2'])

In [ ]:
personalized_model.get_similar_items(['Chan Chan (Live) - Buena Vista Social Club'])
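
A similar-song lookup can be sketched as a nearest-neighbor ranking over listener sets. The example below is an illustration under stated assumptions rather than what `get_similar_items` does internally: the `listeners` mapping and the `similar_songs` helper are hypothetical, and Jaccard overlap is assumed as the similarity measure.

```python
# Hypothetical toy data: song -> set of users who listened to it.
listeners = {
    "Song A": {"u1", "u2", "u3"},
    "Song B": {"u1", "u2"},
    "Song C": {"u3", "u4"},
    "Song D": {"u5"},
}

def similar_songs(song, listeners, k=2):
    """Return the k other songs with the highest Jaccard overlap of listeners."""
    target = listeners[song]
    scored = []
    for other, users in listeners.items():
        if other == song:
            continue
        union = target | users
        overlap = len(target & users) / float(len(union)) if union else 0.0
        scored.append((other, overlap))
    # Highest overlap first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]

neighbors = similar_songs("Song A", listeners)
print(neighbors)
```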

Quantitative comparison between the models

We now formally compare the popularity and the personalized models using precision-recall curves.


In [ ]:
# Compare (major, minor) as integers; slicing the version string would misorder e.g. "1.10".
version = tuple(int(part) for part in graphlab.version.split('.')[:2])
models = [popularity_model, personalized_model]
if version >= (1, 6):
    model_performance = graphlab.compare(test_data, models, user_sample=0.05)
    graphlab.show_comparison(model_performance, models)
else:
    %matplotlib inline
    model_performance = graphlab.recommender.util.compare_models(test_data, models, user_sample=0.05)

The precision-recall curves show that the personalized model performs much better than the popularity model.
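
For readers unfamiliar with the metric, the sketch below shows how one point on a precision-recall curve can be computed for a single user. It is a simplified illustration, not GraphLab's evaluation code: the `precision_recall_at_k` helper and the example lists are hypothetical. Precision is the fraction of the top-k recommendations that the user actually listened to in the held-out data, and recall is the fraction of the held-out songs that appear in the top k.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(len(top_k)) if top_k else 0.0
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

# Hypothetical example: a ranked list from a model, and the songs the
# user actually listened to in the held-out test set.
recommended = ["Song A", "Song B", "Song C", "Song D"]
relevant = {"Song B", "Song D", "Song E"}

# One (precision, recall) point per cutoff k traces out the curve.
curve = [precision_recall_at_k(recommended, relevant, k) for k in range(1, 5)]
for k, (precision, recall) in enumerate(curve, start=1):
    print(k, precision, recall)
```

Averaging these points over many users, for each cutoff, gives one curve per model; a curve that sits higher indicates better recommendations.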

