In [5]:

    
import graphlab

A newer version of GraphLab Create (v1.6.1) is available! Your current version is v1.6.

You can use pip to upgrade the graphlab-create package. For more information see https://dato.com/products/create/upgrade.



In [6]:

    
# Load the song listening data from a saved SFrame directory.
song_data = graphlab.SFrame('song_data.gl/')

[INFO] This non-commercial license of GraphLab Create is assigned to zhanglh13@fudan.edu.cn and will expire on September 21, 2016. For commercial licensing options, visit https://dato.com/buy/.

[INFO] Start server at: ipc:///tmp/graphlab_server-20440 - Server binary: c:\home\courses\machine learning - uw\dato\lib\site-packages\graphlab\unity_server.exe - Server log: C:\Users\linghao\AppData\Local\Temp\graphlab_server_1443785484.log.0
[INFO] GraphLab Server Version: 1.6



In [8]:

    
# Distinct user IDs across the whole dataset.
users = song_data['user_id'].unique()



In [14]:

    
# Rows (listening records) for each artist. Despite the variable names,
# these are not yet de-duplicated by user.
kanye_listeners = song_data[song_data['artist'] == 'Kanye West']
foo_listeners = song_data[song_data['artist'] == 'Foo Fighters']
taylor_listeners = song_data[song_data['artist'] == 'Taylor Swift']
lady_listeners = song_data[song_data['artist'] == 'Lady GaGa']



In [15]:

    
# Number of listening records per artist (a user may appear more than once).
print len(kanye_listeners)
print len(foo_listeners)
print len(taylor_listeners)
print len(lady_listeners)



In [18]:

    
# Number of unique users who listened to each artist.
print len(kanye_listeners['user_id'].unique())
print len(foo_listeners['user_id'].unique())
print len(taylor_listeners['user_id'].unique())
print len(lady_listeners['user_id'].unique())



In [7]:

    
# Total listen count per artist.
listen_count = song_data.groupby(
    key_columns='artist',
    operations={'total_count': graphlab.aggregate.SUM('listen_count')})



In [11]:

    
# Artist with the highest total listen count.
listen_count.sort('total_count', ascending=False)[0]

    Out[11]:

{'artist': 'Kings Of Leon', 'total_count': 43218L}



In [25]:

    
# Artist with the lowest total listen count.
listen_count.sort('total_count', ascending=True)[0]

    Out[25]:

{'artist': 'William Tabbert', 'total_count': 14L}
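
For readers without GraphLab Create installed, the group-by, sum, and sort steps above can be sketched in plain Python. This is only an illustration of the idea: the records below are made-up toy data, not rows from `song_data`, and `total_listens_by_artist` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical toy records of (artist, listen_count); not taken from song_data.
records = [
    ("Artist A", 3),
    ("Artist B", 1),
    ("Artist A", 5),
    ("Artist C", 2),
    ("Artist B", 4),
]


def total_listens_by_artist(rows):
    """Sum listen counts per artist, mirroring a group-by with a SUM aggregate."""
    totals = defaultdict(int)
    for artist, count in rows:
        totals[artist] += count
    return dict(totals)


totals = total_listens_by_artist(records)

# Equivalent of sorting by total_count and taking the first row.
most_popular = max(totals, key=totals.get)
least_popular = min(totals, key=totals.get)
```

Here `most_popular` and `least_popular` play the role of the first row after sorting in descending and ascending order respectively.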



In [12]:

    
# Roughly 80/20 train/test split; the fixed seed makes it reproducible.
train_data, test_data = song_data.random_split(.8, seed=0)
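
A seeded random split like the one above can be sketched in plain Python as follows. This is not GraphLab's implementation. It assumes each row is assigned independently to the first split with the given probability, so the split sizes are only approximately 80/20. `bernoulli_split` is a hypothetical helper name.

```python
import random


def bernoulli_split(rows, fraction, seed=None):
    """Send each row to the first list with probability `fraction`."""
    rng = random.Random(seed)  # private generator, so the seed fixes the result
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


# Hypothetical stand-in for the rows of a dataset.
rows = list(range(1000))
train, test = bernoulli_split(rows, 0.8, seed=0)

# Re-running with the same seed reproduces the same split.
train_again, test_again = bernoulli_split(rows, 0.8, seed=0)
```

Fixing the seed matters when results are compared across runs, since any change in the split changes what the model is trained and evaluated on.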



In [13]:

    
personalized_model = graphlab.item_similarity_recommender.create(
    train_data,
    user_id='user_id',
    item_id='song')

PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Ignoring columns song_id, listen_count, title, artist;
PROGRESS:     To use one of these as a target column, set target = <column_name>
PROGRESS:     and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 893580 observations with 66085 users and 9952 items.
PROGRESS:     Data prepared in: 1.04606s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 9952 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 2.25813         |
PROGRESS: | 2000            | 2.34013         |
PROGRESS: | 3000            | 2.41514         |
PROGRESS: | 4000            | 2.49414         |
PROGRESS: | 5000            | 2.56915         |
PROGRESS: | 6000            | 2.64215         |
PROGRESS: | 7000            | 2.71916         |
PROGRESS: | 8000            | 2.79916         |
PROGRESS: | 9000            | 2.91217         |
PROGRESS: +-----------------+-----------------+
PROGRESS: Finished training in 3.34019s
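
The training log does not show how the item similarities are computed. As a rough illustration only: when no target column is given, item similarity recommenders commonly score a pair of items by the Jaccard similarity of the sets of users who interacted with each. I believe this is also the GraphLab Create default, but treat that as an assumption. The toy interactions and helper names below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user_id, song) pairs; not taken from train_data.
interactions = [
    ("u1", "song_a"), ("u1", "song_b"),
    ("u2", "song_a"), ("u2", "song_b"), ("u2", "song_c"),
    ("u3", "song_b"), ("u3", "song_c"),
    ("u4", "song_a"),
]


def users_by_item(pairs):
    """Map each item to the set of users who interacted with it."""
    item_users = defaultdict(set)
    for user, item in pairs:
        item_users[item].add(user)
    return item_users


def jaccard(a, b):
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0


def item_similarities(pairs):
    """Jaccard similarity for every unordered pair of items."""
    item_users = users_by_item(pairs)
    return {
        (i, j): jaccard(item_users[i], item_users[j])
        for i, j in combinations(sorted(item_users), 2)
    }


similarities = item_similarities(interactions)
```

In this toy data, `song_a` and `song_b` share two of four distinct listeners, so their similarity is 0.5. A recommender built on these scores would favor songs similar to the ones a user has already played.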



In [14]:

    
# Up to the first 10,000 distinct users in the test set.
subset_test_users = test_data['user_id'].unique()[0:10000]



In [16]:

    
# Top-1 recommendation for each user in the subset.
recommendations = personalized_model.recommend(subset_test_users, k=1)

PROGRESS: recommendations finished on 1000/10000 queries. users per second: 2083.22
PROGRESS: recommendations finished on 2000/10000 queries. users per second: 2120.77
PROGRESS: recommendations finished on 3000/10000 queries. users per second: 2176.94
PROGRESS: recommendations finished on 4000/10000 queries. users per second: 2153.89
PROGRESS: recommendations finished on 5000/10000 queries. users per second: 2113.15
PROGRESS: recommendations finished on 6000/10000 queries. users per second: 2086.11
PROGRESS: recommendations finished on 7000/10000 queries. users per second: 2091.93
PROGRESS: recommendations finished on 8000/10000 queries. users per second: 2075.11
PROGRESS: recommendations finished on 9000/10000 queries. users per second: 2057.5
PROGRESS: recommendations finished on 10000/10000 queries. users per second: 1982.83



In [17]:

    
# Number of users for whom each song was the top recommendation.
recommend_count = recommendations.groupby(
    key_columns='song',
    operations={'count': graphlab.aggregate.COUNT})



In [18]:

    
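recommend_count.sort('count', ascending=False)

    Out[18]:

+-----------------------------------------------------+-------+
| song                                                | count |
+-----------------------------------------------------+-------+
| Undo - Björk                                        | 431   |
| Secrets - OneRepublic                               | 383   |
| Revelry - Kings Of Leon                             | 232   |
| You're The One - Dwight Yoakam ...                  | 169   |
| Fireflies - Charttraxx Karaoke ...                  | 123   |
| Hey_ Soul Sister - Train                            | 105   |
| Horn Concerto No. 4 in E flat K495: II. Romance ... | 98    |
| Sehr kosmisch - Harmonia                            | 73    |
| OMG - Usher featuring will.i.am ...                 | 58    |
| Dog Days Are Over (Radio Edit) - Florence + The ... | 54    |
+-----------------------------------------------------+-------+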

[3133 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
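
Counting how often each song is the top-1 recommendation and ranking the result amounts to a frequency count. A minimal plain-Python sketch of that step, using made-up recommendations rather than the model output above:

```python
from collections import Counter

# Hypothetical top-1 recommendations as (user_id, song) pairs.
top1 = [
    ("u1", "Song X"),
    ("u2", "Song Y"),
    ("u3", "Song X"),
    ("u4", "Song Z"),
    ("u5", "Song X"),
    ("u6", "Song Y"),
]

# Count users per recommended song, then rank from most to least recommended.
song_counts = Counter(song for _, song in top1)
ranked = song_counts.most_common()
```

A ranking like this shows how concentrated the recommendations are: a few songs at the top with large counts mean many users receive the same suggestion.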
