In [2]:

    
import graphlab as gl



In [3]:

    
song_data = gl.SFrame('song_data.gl/')









    



[INFO] This non-commercial license of GraphLab Create is assigned to iliassweb@gmail.comand will expire on September 22, 2016. For commercial licensing options, visit https://dato.com/buy/.

[INFO] Start server at: ipc:///tmp/graphlab_server-6590 - Server binary: /home/zax/anaconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1445494515.log
[INFO] GraphLab Server Version: 1.6.1

Explore data



In [5]:

    
song_data.head()









    Out[5]:

user_id	song_id	listen_count	title	artist	song
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SOAKIMP12A8C130995	1	The Cove	Jack Johnson	The Cove - Jack Johnson
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SOBBMDR12A8C13253B	2	Entre Dos Aguas	Paco De Lucia	Entre Dos Aguas - Paco De Lucia ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SOBXHDL12A81C204C0	1	Stronger	Kanye West	Stronger - Kanye West
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SOBYHAJ12A6701BF1D	1	Constellations	Jack Johnson	Constellations - Jack Johnson ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SODACBL12A8C13C273	1	Learn To Fly	Foo Fighters	Learn To Fly - Foo Fighters ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SODDNQT12A6D4F5F7E	5	Apuesta Por El Rock 'N' Roll ...	Héroes del Silencio	Apuesta Por El Rock 'N' Roll - Héroes del ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SODXRTY12AB0180F3B	1	Paper Gangsta	Lady GaGa	Paper Gangsta - Lady GaGa
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SOFGUAY12AB017B0A8	1	Stacked Actors	Foo Fighters	Stacked Actors - Foo Fighters ...
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SOFRQTD12A81C233C0	1	Sehr kosmisch	Harmonia	Sehr kosmisch - Harmonia
b80344d063b5ccb3212f76538f3d9e43d87dca9e ...	SOHQWYZ12A6D4FA701	1	Heaven's gonna burn your eyes ...	Thievery Corporation feat. Emiliana Torrini ...	Heaven's gonna burn your eyes - Thievery ...

[10 rows x 6 columns]



In [4]:

    
gl.canvas.set_target('ipynb')



In [6]:

    
song_data['song'].show()



In [7]:

    
len(song_data)









    Out[7]:





1116609

Count the number of unique users



In [34]:

    
users = song_data['user_id'].unique()



In [9]:

    
len(users)









    Out[9]:





66346

Build a recommender system

Split the data into train and test data



In [6]:

    
train_data, test_data = song_data.random_split(.8, seed=0)
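`random_split(.8, seed=0)` samples roughly 80% of the rows, not of the users, so a user's listening history can be divided between the two sets and some users can end up only in `test_data`. The training log further down reports 66085 users in `train_data`, against the 66346 unique users counted above. A minimal plain-Python sketch of a seeded row-level split, using a hypothetical `random_split` helper and toy data (this is an illustration, not GraphLab's implementation):

```python
import random


def random_split(rows, fraction, seed=0):
    """Send each row to train with probability `fraction`, otherwise to test.

    Hypothetical helper illustrating a row-level split; not GraphLab's code.
    """
    rng = random.Random(seed)  # seeded, so the same split is reproduced
    train, test = [], []
    for row in rows:
        (train if rng.random() < fraction else test).append(row)
    return train, test


# Toy (user_id, song) observations; hypothetical data.
rows = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"),
        ("u3", "b"), ("u3", "c"), ("u4", "d")]
train, test = random_split(rows, 0.8, seed=0)

# Rows, not users, are split: any user listed here appears only in test.
train_users = set(user for user, _ in train)
test_only_users = set(user for user, _ in test) - train_users
print(sorted(test_only_users))
```

Splitting rows keeps the train/test ratio close to the requested fraction, at the cost of evaluating some users the model never saw during training.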

Simple popularity-based recommender



In [11]:

    
popularity_model = gl.popularity_recommender.create(train_data,
                                                    user_id='user_id',
                                                    item_id='song')









    



PROGRESS: Recsys training: model = popularity
PROGRESS: Warning: Ignoring columns song_id, listen_count, title, artist;
PROGRESS:     To use one of these as a target column, set target = <column_name>
PROGRESS:     and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 893580 observations with 66085 users and 9952 items.
PROGRESS:     Data prepared in: 1.34965s
PROGRESS: 893580 observations to process; with 9952 unique items.

Use the popularity model to make some predictions



In [14]:

    
popularity_model.recommend(users=[users[0]])









    Out[14]:

user_id	song	score	rank
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Sehr kosmisch - Harmonia	4754.0	1
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Undo - Björk	4227.0	2
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	You're The One - Dwight Yoakam ...	3781.0	3
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Dog Days Are Over (Radio Edit) - Florence + The ...	3633.0	4
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Revelry - Kings Of Leon	3527.0	5
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Horn Concerto No. 4 in E flat K495: II. Romance ...	3161.0	6
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Secrets - OneRepublic	3148.0	7
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Fireflies - Charttraxx Karaoke ...	2532.0	8
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Tive Sim - Cartola	2521.0	9
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Drop The World - Lil Wayne / Eminem ...	2053.0	10

[10 rows x 4 columns]

Recommendations for user 1 (users[1])



In [16]:

    
popularity_model.recommend(users=[users[1]])









    Out[16]:

user_id	song	score	rank
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Sehr kosmisch - Harmonia	4754.0	1
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Undo - Björk	4227.0	2
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	You're The One - Dwight Yoakam ...	3781.0	3
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Dog Days Are Over (Radio Edit) - Florence + The ...	3633.0	4
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Revelry - Kings Of Leon	3527.0	5
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Horn Concerto No. 4 in E flat K495: II. Romance ...	3161.0	6
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Secrets - OneRepublic	3148.0	7
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Hey_ Soul Sister - Train	2538.0	8
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Fireflies - Charttraxx Karaoke ...	2532.0	9
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Tive Sim - Cartola	2521.0	10

[10 rows x 4 columns]
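The two popularity lists are the same global ranking, except that "Hey_ Soul Sister - Train" (score 2538.0) appears at rank 8 for user 1 but not at all for user 0, which suggests the model skips songs a user already has in the training data. Since the model was trained without a target, the scores look like plain observation counts per song; assuming that, the idea can be sketched in plain Python. `popularity_recommend` and the toy data are hypothetical:

```python
from collections import Counter


def popularity_recommend(observations, user, k=10):
    """Rank songs by number of observations, skipping songs `user` has seen.

    Assumption: score = count of (user, song) rows per song, as the
    popularity scores above appear to be; this is not GraphLab's code.
    """
    counts = Counter(item for _, item in observations)
    seen = set(item for u, item in observations if u == user)
    ranked = sorted((item for item in counts if item not in seen),
                    key=lambda item: (-counts[item], item))  # ties by name
    return [(item, counts[item], rank)
            for rank, item in enumerate(ranked[:k], start=1)]


# Hypothetical (user_id, song) observations.
obs = [("u1", "Undo"), ("u2", "Undo"), ("u3", "Undo"),
       ("u1", "Revelry"), ("u2", "Secrets"), ("u3", "Secrets")]
print(popularity_recommend(obs, "u1", k=2))
print(popularity_recommend(obs, "new_user", k=2))
```

Every user gets the same ranking minus their own history, which is why this model is not personalized.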

Recommender with personalization



In [18]:

    
personalized_model = gl.item_similarity_recommender.create(train_data,
                                                           user_id='user_id',
                                                           item_id='song')









    



PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Ignoring columns song_id, listen_count, title, artist;
PROGRESS:     To use one of these as a target column, set target = <column_name>
PROGRESS:     and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 893580 observations with 66085 users and 9952 items.
PROGRESS:     Data prepared in: 1.39029s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 9952 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 1.85622         |
PROGRESS: | 2000            | 1.93614         |
PROGRESS: | 3000            | 2.0068          |
PROGRESS: | 4000            | 2.08326         |
PROGRESS: | 5000            | 2.16111         |
PROGRESS: | 6000            | 2.23223         |
PROGRESS: | 7000            | 2.30307         |
PROGRESS: | 8000            | 2.37842         |
PROGRESS: | 9000            | 2.46742         |
PROGRESS: +-----------------+-----------------+
PROGRESS: Finished training in 2.76698s
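The item-similarity model compares songs by the users who listened to them. The notebook does not set `similarity_type`; assuming Jaccard similarity (likely the default), two songs with listener sets A and B score |A ∩ B| / |A ∪ B|. The helpers below are hypothetical; `most_similar` returns (similar song, score, rank) rows shaped like the `get_similar_items` output further down:

```python
from collections import defaultdict


def item_user_sets(observations):
    """Map each song to the set of users who listened to it."""
    users_by_item = defaultdict(set)
    for user, item in observations:
        users_by_item[item].add(user)
    return users_by_item


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two user sets (0.0 if both empty)."""
    union = len(a | b)
    return float(len(a & b)) / union if union else 0.0


def most_similar(observations, item, k=10):
    """Rank all other songs by Jaccard similarity to `item`, highest first."""
    users_by_item = item_user_sets(observations)
    target = users_by_item.get(item, set())
    scored = [(other, jaccard(target, users))
              for other, users in users_by_item.items() if other != item]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [(other, score, rank)
            for rank, (other, score) in enumerate(scored[:k], start=1)]


# Hypothetical (user_id, song) observations.
obs = [("u1", "A"), ("u2", "A"), ("u3", "A"),
       ("u1", "B"), ("u2", "B"),
       ("u3", "C"), ("u4", "C")]
# B shares 2 of 3 listeners with A; C shares 1 of 4.
print(most_similar(obs, "A"))
```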

Apply the personalized model to make recommendations



In [19]:

    
personalized_model.recommend(users=[users[0]])









    Out[19]:

user_id	song	score	rank
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Cuando Pase El Temblor - Soda Stereo ...	0.0194504525792	1
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Fireflies - Charttraxx Karaoke ...	0.0145048191769	2
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Love Is A Losing Game - Amy Winehouse ...	0.0142992063828	3
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Marry Me - Train	0.0141649731998	4
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Secrets - OneRepublic	0.0136169436052	5
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Sehr kosmisch - Harmonia	0.0134355710515	6
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	No Dejes Que... - Caifanes ...	0.0134191754754	7
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Y solo se me ocurre amarte (Unplugged) - ...	0.0133210385369	8
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	Te Hacen Falta Vitaminas - Soda Stereo ...	0.0129302853556	9
c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ...	OMG - Usher featuring will.i.am ...	0.0128012717244	10

[10 rows x 4 columns]



In [20]:

    
personalized_model.recommend(users=[users[1]])









    Out[20]:

user_id	song	score	rank
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Riot In Cell Block Number Nine - Dr Feelgood ...	0.0375	1
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Sei Lá Mangueira - Elizeth Cardoso ...	0.0331632653061	2
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	The Stallion - Ween	0.0322580645161	3
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Rain - Subhumans	0.0314716312057	4
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	West One (Shine On Me) - The Ruts ...	0.0307080895662	5
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Back Against The Wall - Cage The Elephant ...	0.0301204819277	6
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Life Less Frightening - Rise Against ...	0.0284431137725	7
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	A Beggar On A Beach Of Gold - Mike And The ...	0.0230024907156	8
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Audience Of One - Rise Against ...	0.0193938442211	9
279292bb36dbfc7f505e36ebf038c81eb1d1d63e ...	Blame It On The Boogie - The Jacksons ...	0.0189873417722	10

[10 rows x 4 columns]
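Unlike the popularity counts, the personalized scores (roughly 0.01 to 0.04 here) are similarity values, and the two users now get different lists. How GraphLab aggregates item-item similarities into a user score is not shown in this notebook; this sketch assumes the common item-item rule of averaging a candidate's similarity to every song in the user's history. `score_candidates` and the toy similarity table are hypothetical:

```python
def score_candidates(history, similarity, candidates, k=10):
    """Score unseen candidates by average similarity to the user's history.

    Assumption (not confirmed for GraphLab): score = mean similarity between
    the candidate and each song in `history`. `similarity` maps song pairs
    to a score; missing pairs count as 0.0.
    """
    if not history:
        return []  # no listening history, nothing to average over

    def sim(a, b):
        # Look the pair up in either order, since similarity is symmetric.
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    scores = {}
    for cand in candidates:
        if cand in history:
            continue  # skip songs the user already listened to
        scores[cand] = sum(sim(h, cand) for h in history) / float(len(history))
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [(item, score, rank)
            for rank, (item, score) in enumerate(ranked[:k], start=1)]


# Hypothetical similarity table and listening history.
similarity = {("A", "C"): 0.25, ("B", "C"): 0.0,
              ("A", "D"): 0.5, ("B", "D"): 0.25}
print(score_candidates(["A", "B"], similarity, ["A", "B", "C", "D"]))
```

Averaging (rather than summing) keeps scores comparable between users with short and long histories.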



In [21]:

    
personalized_model.get_similar_items(['The Stallion - Ween'])









    



PROGRESS: Getting similar items completed in 0.001846






    Out[21]:

song	similar	score	rank
The Stallion - Ween	Blame It On The Boogie - The Jacksons ...	0.179104477612	1
The Stallion - Ween	Absence of Fear - War Of Ages ...	0.129032258065	2
The Stallion - Ween	Faint Resemblance - Rise Against ...	0.121739130435	3
The Stallion - Ween	Entertainment - Rise Against ...	0.118055555556	4
The Stallion - Ween	Halfway There - Rise Against ...	0.115384615385	5
The Stallion - Ween	To The Core - Rise Against ...	0.115044247788	6
The Stallion - Ween	Long Forgotten Sons - Rise Against ...	0.112426035503	7
The Stallion - Ween	Riot In Cell Block Number Nine - Dr Feelgood ...	0.111764705882	8
The Stallion - Ween	Great Awakening - Rise Against ...	0.0887096774194	9
The Stallion - Ween	Hairline Fracture - Rise Against ...	0.0866141732283	10

[10 rows x 4 columns]



In [22]:

    
personalized_model.get_similar_items(['Chan Chan (Live) - Buena Vista Social Club'])









    



PROGRESS: Getting similar items completed in 0.003067






    Out[22]:

song	similar	score	rank
Chan Chan (Live) - Buena Vista Social Club ...	Murmullo - Buena Vista Social Club ...	0.188118811881	1
Chan Chan (Live) - Buena Vista Social Club ...	La Bayamesa - Buena Vista Social Club ...	0.187192118227	2
Chan Chan (Live) - Buena Vista Social Club ...	Amor de Loca Juventud - Buena Vista Social Club ...	0.184834123223	3
Chan Chan (Live) - Buena Vista Social Club ...	Diferente - Gotan Project	0.0214592274678	4
Chan Chan (Live) - Buena Vista Social Club ...	Mistica - Orishas	0.0205761316872	5
Chan Chan (Live) - Buena Vista Social Club ...	Hotel California - Gipsy Kings ...	0.019305019305	6
Chan Chan (Live) - Buena Vista Social Club ...	Nací Orishas - Orishas	0.0191570881226	7
Chan Chan (Live) - Buena Vista Social Club ...	Le Moulin - Yann Tiersen	0.0187969924812	8
Chan Chan (Live) - Buena Vista Social Club ...	Gitana - Willie Colon	0.0187969924812	9
Chan Chan (Live) - Buena Vista Social Club ...	Criminal - Gotan Project	0.018779342723	10

[10 rows x 4 columns]

Quantitative comparison between the models

Precision-Recall



In [27]:

    
import matplotlib
%matplotlib inline
model_performance = gl.recommender.util.compare_models(test_data,
                                                      [popularity_model, personalized_model],
                                                      user_sample=0.05)









    



compare_models: using 2931 users to estimate model performance
PROGRESS: Evaluate model M0
PROGRESS: recommendations finished on 1000/2931 queries. users per second: 10665.1
PROGRESS: recommendations finished on 2000/2931 queries. users per second: 12174.8

Precision and recall summary statistics by cutoff





    



[WARNING] Model trained without a target. Skipping RMSE computation.






    



+--------+-----------------+------------------+
| cutoff |  mean_precision |   mean_recall    |
+--------+-----------------+------------------+
|   1    | 0.0296827021494 | 0.00775176338124 |
|   2    | 0.0279767997271 | 0.0148200410227  |
|   3    | 0.0260434436484 | 0.0203007756488  |
|   4    | 0.0242238143978 | 0.0248677324419  |
|   5    | 0.0225861480723 | 0.0295442066936  |
|   6    | 0.0216080973502 | 0.0346141254023  |
|   7    | 0.0204220889994 | 0.0379537678668  |
|   8    | 0.0198311156602 | 0.0422943455953  |
|   9    | 0.0187649266462 | 0.0448458380393  |
|   10   |  0.017911975435 | 0.0478489032481  |
+--------+-----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1
PROGRESS: recommendations finished on 1000/2931 queries. users per second: 1549.2
PROGRESS: recommendations finished on 2000/2931 queries. users per second: 1538.02

Precision and recall summary statistics by cutoff





    



[WARNING] Model trained without a target. Skipping RMSE computation.






    



+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.195496417605 | 0.0650005691075 |
|   2    |  0.159501876493 | 0.0996978063636 |
|   3    |  0.141248720573 |  0.126881945626 |
|   4    |  0.12572500853  |  0.146405349361 |
|   5    |  0.114227226203 |  0.162845861349 |
|   6    |  0.104912998976 |  0.177262898309 |
|   7    | 0.0971876980065 |  0.189555715144 |
|   8    | 0.0907540088707 |  0.201852444534 |
|   9    | 0.0846127601501 |  0.21178175335  |
|   10   | 0.0805527123849 |  0.222249525371 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
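In the two tables above, the item-similarity model (M1) beats the popularity model (M0) at every cutoff: at cutoff 10, mean precision is about 0.081 versus 0.018 and mean recall about 0.222 versus 0.048. `compare_models` averages these per-user metrics over the 2931 sampled users. Assuming the standard definitions (precision@k = hits in the top k divided by k; recall@k = hits divided by the number of the user's held-out songs), a plain-Python sketch with hypothetical helpers and toy data:

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a single user.

    Assumption: precision divides by the cutoff k, recall by the number
    of the user's held-out (test) songs.
    """
    hits = len(set(recommended[:k]) & set(relevant))
    precision = float(hits) / k
    recall = float(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(recs_by_user, test_items_by_user, k):
    """Average precision@k and recall@k over every user with test songs."""
    pairs = [precision_recall_at_k(recs_by_user.get(user, []), items, k)
             for user, items in test_items_by_user.items()]
    if not pairs:
        return (0.0, 0.0)
    n = float(len(pairs))
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)


# Hypothetical ranked recommendations and held-out test songs.
recs = {"u1": ["A", "B", "C"], "u2": ["C", "D", "E"]}
test_items = {"u1": {"B", "X"}, "u2": {"Y"}}
for k in (1, 2, 3):
    print((k, mean_precision_recall(recs, test_items, k)))
```

As k grows, recall can only increase while precision tends to fall, which matches the trend in both tables.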



In [28]:

    
import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots()

# compare_models returns one result per model, in the order passed (M0, M1)
pr_curves_by_model = [res['precision_recall_overall'] for res in model_performance]

pr_curve = pr_curves_by_model[0].sort('recall')
ax.plot(list(pr_curve['recall']), list(pr_curve['precision']),
        'blue', label='M0: popularity')

pr_curve = pr_curves_by_model[1].sort('recall')
ax.plot(list(pr_curve['recall']), list(pr_curve['precision']),
        'green', label='M1: item similarity')

ax.set_title('Precision-Recall Averaged Over Users')
ax.set_xlabel('Recall')
ax.set_ylabel('Precision')
ax.legend()

# The inline backend renders the figure itself; fig.show() only raised a
# "non-GUI backend" UserWarning here.
plt.show()

Counting unique users for some artists

Kanye West



In [35]:

    
Kanye_West = song_data[song_data['artist']=='Kanye West']



In [38]:

    
Kanye_West['user_id'].unique().show()

Kanye West unique users



In [41]:

    
Kanye_West_users = Kanye_West['user_id'].unique()



In [42]:

    
len(Kanye_West_users)









    Out[42]:





2522

Foo Fighters unique users



In [45]:

    
Foo_Fighters_users = song_data[song_data['artist']=='Foo Fighters']['user_id'].unique()



In [47]:

    
len(Foo_Fighters_users)









    Out[47]:





2055

Taylor Swift



In [53]:

    
Taylor_Swift_users = song_data[song_data['artist']=='Taylor Swift']['user_id'].unique()



In [55]:

    
len(Taylor_Swift_users)









    Out[55]:





3246

Lady GaGa



In [56]:

    
Lady_GaGa_users = song_data[song_data['artist']=='Lady GaGa']['user_id'].unique()



In [58]:

    
len(Lady_GaGa_users)









    Out[58]:





2928
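Each of the four cells above filters `song_data` by one artist and counts the distinct `user_id` values (Taylor Swift has the most listeners of the four, 3246). The same counts can be computed for every artist in a single pass by collecting a set of user ids per artist. A plain-Python sketch with a hypothetical helper and toy rows:

```python
from collections import defaultdict


def unique_listeners_by_artist(rows):
    """Count distinct user_ids per artist from (user_id, artist) rows."""
    users = defaultdict(set)
    for user_id, artist in rows:
        users[artist].add(user_id)  # a set ignores repeat listens
    return dict((artist, len(ids)) for artist, ids in users.items())


# Hypothetical (user_id, artist) rows; u1 listened to Kanye West twice.
rows = [("u1", "Kanye West"), ("u1", "Kanye West"), ("u2", "Kanye West"),
        ("u2", "Foo Fighters"), ("u3", "Lady GaGa")]
print(unique_listeners_by_artist(rows))
```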

Groupby-aggregate

Most popular artist



In [8]:

    
groupby_artist = song_data.groupby(key_columns='artist', operations={'total_count': gl.aggregate.SUM('listen_count')})

Sorting groupby_artist



In [10]:

    
groupby_artist









    Out[10]:

artist	total_count
The Dells	274
Lil Jon / The East Side Boyz ...	197
Tom Petty And The Heartbreakers ...	2867
Blackstreet	747
Ratatat	3727
Shotta	82
Airscape	130
Mecano	172
Moimir Papalescu & The Nihilists ...	177
Brad Paisley	2731

[3375 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.



In [19]:

    
groupby_artist.sort('total_count')









    Out[19]:

artist	total_count
William Tabbert	14
Reel Feelings	24
Beyoncé feat. Bun B and Slim Thug ...	26
Diplo	30
Boggle Karaoke	30
harvey summers	31
Nâdiya	36
Kanye West / Talib Kweli / Q-Tip / Common / ...	38
Aneta Langerova	38
Jody Bernal	38

[3375 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.



In [20]:

    
groupby_artist.sort('total_count', ascending=False)









    Out[20]:

artist	total_count
Kings Of Leon	43218
Dwight Yoakam	40619
Björk	38889
Coldplay	35362
Florence + The Machine	33387
Justin Bieber	29715
Alliance Ethnik	26689
OneRepublic	25754
Train	25402
The Black Keys	22184

[3375 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
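`groupby` with `gl.aggregate.SUM('listen_count')` adds up play counts per artist, so `total_count` measures total plays rather than distinct listeners; sorting with `ascending=False` then puts Kings Of Leon (43218 plays) first. An equivalent group-sum-sort in plain Python, with a hypothetical helper and toy rows:

```python
from collections import defaultdict


def total_listens_by_artist(rows):
    """Sum listen_count per artist and sort by total, largest first."""
    totals = defaultdict(int)
    for artist, listen_count in rows:
        totals[artist] += listen_count
    # Most-played artist first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical (artist, listen_count) rows.
rows = [("Coldplay", 5), ("Train", 2), ("Coldplay", 1), ("Diplo", 2)]
print(total_listens_by_artist(rows))
```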

Split the data into train and test data



In [22]:

    
train_data, test_data = song_data.random_split(.8, seed=0)



In [23]:

    
subset_test_data = test_data['user_id'].unique()[0:10000]



In [25]:

    
personalized_model = gl.item_similarity_recommender.create(train_data,
                                                           user_id='user_id',
                                                           item_id='song')









    



PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Ignoring columns song_id, listen_count, title, artist;
PROGRESS:     To use one of these as a target column, set target = <column_name>
PROGRESS:     and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 893580 observations with 66085 users and 9952 items.
PROGRESS:     Data prepared in: 1.23434s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 9952 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 1.68233         |
PROGRESS: | 2000            | 1.7462          |
PROGRESS: | 3000            | 1.80827         |
PROGRESS: | 4000            | 1.87111         |
PROGRESS: | 5000            | 1.95594         |
PROGRESS: | 6000            | 2.02921         |
PROGRESS: | 7000            | 2.10021         |
PROGRESS: | 8000            | 2.17565         |
PROGRESS: | 9000            | 2.2453          |
PROGRESS: +-----------------+-----------------+
PROGRESS: Finished training in 2.54794s



In [27]:

    
# Recommend 1 song to each of these users
recommended_songs = personalized_model.recommend(subset_test_data, k=1)



PROGRESS: recommendations finished on 1000/10000 queries. users per second: 1714.06
PROGRESS: recommendations finished on 2000/10000 queries. users per second: 1800.46
PROGRESS: recommendations finished on 3000/10000 queries. users per second: 1833.78
PROGRESS: recommendations finished on 4000/10000 queries. users per second: 1833.45
PROGRESS: recommendations finished on 5000/10000 queries. users per second: 1828.53
PROGRESS: recommendations finished on 6000/10000 queries. users per second: 1833.59
PROGRESS: recommendations finished on 7000/10000 queries. users per second: 1828.71
PROGRESS: recommendations finished on 8000/10000 queries. users per second: 1835.12
PROGRESS: recommendations finished on 9000/10000 queries. users per second: 1838.31
PROGRESS: recommendations finished on 10000/10000 queries. users per second: 1838.18



In [ ]:


# Count how many times each song was recommended, most frequent first
song_counts = recommended_songs.groupby(key_columns='song',
                                        operations={'count': gl.aggregate.COUNT()})
song_counts.sort('count', ascending=False)
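The last step presumably aims to find which songs are recommended most often when each of the first 10,000 test users receives a single recommendation (`k=1`). In plain Python that is a frequency count over the recommended songs; `most_recommended` and the toy rows are hypothetical:

```python
from collections import Counter


def most_recommended(recommendations):
    """Count how often each song appears in (user_id, song) recommendation rows.

    Returns (song, count) pairs, most frequent first.
    """
    counts = Counter(song for _, song in recommendations)
    return counts.most_common()


# Hypothetical one-song-per-user recommendations.
recs = [("u1", "Undo"), ("u2", "Secrets"), ("u3", "Undo")]
print(most_recommended(recs))
```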



Recommend song to 10000 test users