Building a song recommender

Fire up GraphLab Create



In [2]:

    
import graphlab

Load music data



In [3]:

    
song_data = graphlab.SFrame('song_data.gl/')

Explore data

Each row of the music data records how many times a user listened to a song, along with the song's title and artist.



In [4]:

    
song_data.head(5)









    Out[4]:





    
user_id                                       song_id             listen_count  title            artist         song
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...  SOAKIMP12A8C130995  1             The Cove         Jack Johnson   The Cove - Jack Johnson
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...  SOBBMDR12A8C13253B  2             Entre Dos Aguas  Paco De Lucia  Entre Dos Aguas - Paco De Lucia ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...  SOBXHDL12A81C204C0  1             Stronger         Kanye West     Stronger - Kanye West
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...  SOBYHAJ12A6701BF1D  1             Constellations   Jack Johnson   Constellations - Jack Johnson ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...  SODACBL12A8C13C273  1             Learn To Fly     Foo Fighters   Learn To Fly - Foo Fighters ...
[5 rows x 6 columns]

Showing the most popular songs in the dataset



In [5]:

    
graphlab.canvas.set_target('ipynb')



In [6]:

    
song_data['song'].show()



In [7]:

    
len(song_data)









    Out[7]:





1116609

Count number of unique users in the dataset



In [ ]:

    
users = song_data['user_id'].unique()



In [ ]:

    
len(users)

Create a song recommender



In [ ]:

    
train_data,test_data = song_data.random_split(.8,seed=0)
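
The split above sends roughly 80% of the rows to `train_data` and the rest to `test_data`, and the fixed seed makes it repeatable. The sketch below illustrates that idea in plain Python. It is not GraphLab Create's implementation: the function name `random_split_sketch` is hypothetical, and it assumes each row is assigned independently with the given probability.

```python
import random

def random_split_sketch(rows, fraction, seed=None):
    """Split rows into two lists; each row lands in the first with probability `fraction`."""
    rng = random.Random(seed)  # a seeded generator makes the split repeatable
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

# Hypothetical toy data standing in for the rows of song_data.
toy_rows = list(range(1000))
toy_train, toy_test = random_split_sketch(toy_rows, 0.8, seed=0)
print(len(toy_train), len(toy_test))  # sizes are approximately 800 and 200
```

Because assignment is random, the first part holds only approximately 80% of the rows; reusing the same seed reproduces the same split.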

Simple popularity-based recommender



In [ ]:

    
popularity_model = graphlab.popularity_recommender.create(train_data,
                                                         user_id='user_id',
                                                         item_id='song')

Use the popularity model to make some predictions

A popularity model makes the same prediction for all users, so it provides no personalization.



In [ ]:

    
popularity_model.recommend(users=[users[0]])



In [ ]:

    
popularity_model.recommend(users=[users[1]])
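
To make the idea of a popularity model concrete, here is a minimal pure-Python sketch. It is not GraphLab Create's implementation: it assumes popularity means the number of distinct listeners per song, and that songs a user has already heard are filtered out of that user's list. The function name `popularity_recommend` and the toy data are hypothetical.

```python
from collections import Counter

def popularity_recommend(interactions, user, k=3):
    """Return the k most-listened-to songs that `user` has not heard yet."""
    listeners = Counter()
    seen = set()
    for u, song in set(interactions):  # de-duplicate (user, song) pairs
        listeners[song] += 1
        if u == user:
            seen.add(song)
    # Rank by listener count (descending); break ties alphabetically.
    ranked = sorted(listeners.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked if song not in seen][:k]

# Hypothetical toy data: (user, song) pairs.
toy_listens = [("u1", "A"), ("u2", "A"), ("u3", "A"),
               ("u1", "B"), ("u2", "B"), ("u3", "C"), ("u4", "D")]

print(popularity_recommend(toy_listens, "u4", k=2))
print(popularity_recommend(toy_listens, "u1", k=2))
```

The global ranking is computed once and is identical for every user; in this sketch the only per-user step is dropping songs the user already knows.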

Build a song recommender with personalization

We now create a model that allows us to make personalized recommendations to each user.



In [ ]:

    
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')

Applying the personalized model to make song recommendations

As you can see, different users get different recommendations now.



In [ ]:

    
personalized_model.recommend(users=[users[0]])



In [ ]:

    
personalized_model.recommend(users=[users[1]])

We can also use the model to find songs similar to any song in the dataset.



In [ ]:

    
personalized_model.get_similar_items(['With Or Without You - U2'])



In [ ]:

    
personalized_model.get_similar_items(['Chan Chan (Live) - Buena Vista Social Club'])
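
The notebook does not say how the model measures similarity between songs. As an illustration only, the sketch below scores two songs by the Jaccard similarity of their listener sets (shared listeners divided by all listeners of either song), which is one common choice for item-similarity recommenders. The function name `jaccard_similar_items` and the toy data are hypothetical, and this is not GraphLab Create's code.

```python
from collections import defaultdict

def jaccard_similar_items(interactions, song, k=3):
    """Return up to k (song, score) pairs most similar to `song` by Jaccard similarity."""
    listeners = defaultdict(set)
    for user, s in interactions:
        listeners[s].add(user)
    target = listeners.get(song, set())
    scores = []
    for other, users in listeners.items():
        if other == song:
            continue
        union = target | users
        score = float(len(target & users)) / len(union) if union else 0.0
        if score > 0:  # keep only songs that share at least one listener
            scores.append((other, score))
    # Highest score first; break ties alphabetically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]

# Hypothetical toy data: (user, song) pairs.
toy_pairs = [("u1", "A"), ("u2", "A"), ("u3", "A"),
             ("u1", "B"), ("u2", "B"), ("u3", "C"), ("u4", "D")]

similar_to_a = jaccard_similar_items(toy_pairs, "A")
print(similar_to_a)
```

In the toy data, song B shares two of A's three listeners and C shares one, so B ranks above C; D has no listeners in common with A and is left out.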

Quantitative comparison between the models

We now formally compare the popularity and the personalized models using precision-recall curves.



In [ ]:

    
# Compare the version numerically; the string slice version[:3] would read "1.10" as "1.1".
version = tuple(int(part) for part in graphlab.version.split('.')[:2])
if version >= (1, 6):
    model_performance = graphlab.compare(test_data, [popularity_model, personalized_model], user_sample=0.05)
    graphlab.show_comparison(model_performance, [popularity_model, personalized_model])
else:
    %matplotlib inline
    model_performance = graphlab.recommender.util.compare_models(test_data, [popularity_model, personalized_model], user_sample=.05)

The precision-recall curves show that the personalized model performs much better than the popularity model.
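
For readers unfamiliar with the metric, the sketch below shows how precision and recall could be computed for one user at a single cutoff k; a precision-recall curve plots these values as k varies. The function name `precision_recall_at_k` and the toy lists are hypothetical, and the averaging over sampled users that the comparison performs is not shown.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations against the held-out songs."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = float(hits) / k if k else 0.0  # share of recommendations that were right
    recall = float(hits) / len(relevant) if relevant else 0.0  # share of held-out songs found
    return precision, recall

# Hypothetical example: a ranked recommendation list and the songs the
# user actually listened to in the test set.
toy_recommended = ["A", "B", "C", "D"]
toy_relevant = {"B", "D", "E"}

for k in (1, 2, 4):
    print(k, precision_recall_at_k(toy_recommended, toy_relevant, k))
```

Raising k can only keep recall the same or increase it, while precision may fall, which is why the two are usually read together as a curve.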


