In [5]:
import graphlab




In [6]:
song_data = graphlab.SFrame('song_data.gl/')



In [8]:
# All distinct user IDs in the dataset.
users = song_data['user_id'].unique()

In [14]:
# Filter the rows for each artist. Despite the names, these are rows of song_data, not distinct listeners.
kanye_listeners = song_data[song_data['artist'] == 'Kanye West']
foo_listeners = song_data[song_data['artist'] == 'Foo Fighters']
taylor_listeners = song_data[song_data['artist'] == 'Taylor Swift']
lady_listeners = song_data[song_data['artist'] == 'Lady GaGa']

In [15]:
# Number of rows per artist (a user can appear more than once).
print len(kanye_listeners)
print len(foo_listeners)
print len(taylor_listeners)
print len(lady_listeners)


3775
3429
6227
4129

In [18]:
# Number of distinct users per artist.
print len(kanye_listeners['user_id'].unique())
print len(foo_listeners['user_id'].unique())
print len(taylor_listeners['user_id'].unique())
print len(lady_listeners['user_id'].unique())


2522
2055
3246
2928
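
The gap between the two sets of numbers above (rows per artist versus distinct users per artist) can be illustrated without GraphLab. The following is a minimal sketch on hypothetical toy records; the `records` list and the helper `count_rows_and_listeners` are illustrative names, not part of the notebook or of the GraphLab API.

```python
# Hypothetical toy records standing in for rows of song_data.
records = [
    {'user_id': 'u1', 'artist': 'Kanye West'},
    {'user_id': 'u1', 'artist': 'Kanye West'},   # same user, second row
    {'user_id': 'u2', 'artist': 'Kanye West'},
    {'user_id': 'u2', 'artist': 'Taylor Swift'},
]

def count_rows_and_listeners(rows, artist):
    """Return (number of rows, number of distinct users) for one artist."""
    matching = [r for r in rows if r['artist'] == artist]
    unique_users = set(r['user_id'] for r in matching)
    return len(matching), len(unique_users)

n_rows, n_listeners = count_rows_and_listeners(records, 'Kanye West')
print('%d rows, %d distinct listeners' % (n_rows, n_listeners))
```

The row count is never smaller than the distinct-user count, which matches the pattern in the two outputs above.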

In [7]:
# Total listen_count per artist: group by artist and sum the listen counts.
listen_count = song_data.groupby(key_columns='artist', operations={'total_count': graphlab.aggregate.SUM('listen_count')})

In [11]:
# Artist with the largest total listen count.
listen_count.sort('total_count', ascending=False)[0]


Out[11]:
{'artist': 'Kings Of Leon', 'total_count': 43218L}

In [25]:
# Artist with the smallest total listen count.
listen_count.sort('total_count', ascending=True)[0]


Out[25]:
{'artist': 'William Tabbert', 'total_count': 14L}
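
For readers without GraphLab installed, the group-by-and-sum step and the most/least popular lookup can be sketched in plain Python. This is only an illustration on hypothetical toy rows; `rows` and `total_listens_by_artist` are made-up names, and the sketch does not reproduce how GraphLab implements `groupby` internally.

```python
from collections import defaultdict

# Hypothetical toy rows with the two columns used by the aggregation.
rows = [
    {'artist': 'A', 'listen_count': 3},
    {'artist': 'B', 'listen_count': 1},
    {'artist': 'A', 'listen_count': 2},
]

def total_listens_by_artist(rows):
    """Sum listen_count per artist, like a group-by with a SUM aggregate."""
    totals = defaultdict(int)
    for r in rows:
        totals[r['artist']] += r['listen_count']
    return dict(totals)

totals = total_listens_by_artist(rows)
most_popular = max(totals, key=totals.get)    # largest total
least_popular = min(totals, key=totals.get)   # smallest total
print('%s %s' % (most_popular, least_popular))
```

Taking the maximum or minimum directly avoids sorting the whole table when only the top or bottom entry is needed.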

In [12]:
# 80/20 train/test split; the fixed seed makes the split reproducible.
train_data, test_data = song_data.random_split(.8, seed=0)
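
A seeded random split can be sketched as follows. This assumes each row is assigned independently to the first part with probability `fraction`, which may differ in detail from what GraphLab's `random_split` does; the function name `random_split_rows` is hypothetical.

```python
import random

def random_split_rows(rows, fraction, seed=0):
    """Send each row to the first list with probability `fraction`."""
    rng = random.Random(seed)   # private generator, so the seed fully fixes the split
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

train_rows, test_rows = random_split_rows(list(range(1000)), 0.8, seed=0)
print('%d train rows, %d test rows' % (len(train_rows), len(test_rows)))
```

Under this assumption the split sizes are only approximately 80/20, but the same seed always yields the same partition.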

In [13]:
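# No target column is passed, so only the user/song interactions are used (see the warning in the output).
personalized_model = graphlab.item_similarity_recommender.create(train_data,
                                                                user_id='user_id',
                                                                item_id='song')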


PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Ignoring columns song_id, listen_count, title, artist;
PROGRESS:     To use one of these as a target column, set target = <column_name>
PROGRESS:     and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 893580 observations with 66085 users and 9952 items.
PROGRESS:     Data prepared in: 1.04606s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 9952 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 2.25813         |
PROGRESS: | 2000            | 2.34013         |
PROGRESS: | 3000            | 2.41514         |
PROGRESS: | 4000            | 2.49414         |
PROGRESS: | 5000            | 2.56915         |
PROGRESS: | 6000            | 2.64215         |
PROGRESS: | 7000            | 2.71916         |
PROGRESS: | 8000            | 2.79916         |
PROGRESS: | 9000            | 2.91217         |
PROGRESS: +-----------------+-----------------+
PROGRESS: Finished training in 3.34019s
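
The "item similarity statistics" in the log can be illustrated with a small sketch. It assumes Jaccard similarity between the sets of users who listened to each song, which I believe is the default when no target column is given, though the notebook itself does not confirm this. The function `jaccard_item_similarity` and the toy data are hypothetical.

```python
from collections import defaultdict

def jaccard_item_similarity(interactions):
    """interactions: iterable of (user, item) pairs.

    Returns {(item_a, item_b): |users of both| / |users of either|}
    for every unordered pair of items.
    """
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)
    items = sorted(users_by_item)
    sims = {}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            both = len(users_by_item[a] & users_by_item[b])
            either = len(users_by_item[a] | users_by_item[b])
            sims[(a, b)] = both / float(either)
    return sims

# Hypothetical toy interactions: s1 and s2 share two of three listeners.
toy = [('u1', 's1'), ('u1', 's2'), ('u2', 's1'),
       ('u2', 's2'), ('u3', 's1'), ('u3', 's3')]
sims = jaccard_item_similarity(toy)
print(sorted(sims.items()))
```

This all-pairs loop is quadratic in the number of items, so it is suitable only for illustration, not for the 9952 items in the training log.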

In [14]:
# Restrict to the first 10,000 distinct users in the test set.
subset_test_users = test_data['user_id'].unique()[0:10000]

In [16]:
# One recommendation (k=1) per user.
recommendations = personalized_model.recommend(subset_test_users, k=1)


PROGRESS: recommendations finished on 1000/10000 queries. users per second: 2083.22
PROGRESS: recommendations finished on 2000/10000 queries. users per second: 2120.77
PROGRESS: recommendations finished on 3000/10000 queries. users per second: 2176.94
PROGRESS: recommendations finished on 4000/10000 queries. users per second: 2153.89
PROGRESS: recommendations finished on 5000/10000 queries. users per second: 2113.15
PROGRESS: recommendations finished on 6000/10000 queries. users per second: 2086.11
PROGRESS: recommendations finished on 7000/10000 queries. users per second: 2091.93
PROGRESS: recommendations finished on 8000/10000 queries. users per second: 2075.11
PROGRESS: recommendations finished on 9000/10000 queries. users per second: 2057.5
PROGRESS: recommendations finished on 10000/10000 queries. users per second: 1982.83

In [17]:
# Count how many users each song was recommended to.
recommend_count = recommendations.groupby(key_columns='song', operations={'count': graphlab.aggregate.COUNT})

In [18]:
# Most frequently recommended songs first.
recommend_count.sort('count', ascending=False)


Out[18]:
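+-----------------------------------------------------+-------+
| song                                                | count |
+-----------------------------------------------------+-------+
| Undo - Björk                                        | 431   |
| Secrets - OneRepublic                               | 383   |
| Revelry - Kings Of Leon                             | 232   |
| You're The One - Dwight Yoakam ...                  | 169   |
| Fireflies - Charttraxx Karaoke ...                  | 123   |
| Hey_ Soul Sister - Train                            | 105   |
| Horn Concerto No. 4 in E flat K495: II. Romance ... | 98    |
| Sehr kosmisch - Harmonia                            | 73    |
| OMG - Usher featuring will.i.am ...                 | 58    |
| Dog Days Are Over (Radio Edit) - Florence + The ... | 54    |
+-----------------------------------------------------+-------+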
[3133 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
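
The count-and-sort step over the top-1 recommendations can be sketched in plain Python with `collections.Counter`. The `toy_recommendations` list is hypothetical data, one (user, song) pair per user, and is not taken from the notebook's results.

```python
from collections import Counter

# Hypothetical top-1 recommendations: one (user, song) pair per user.
toy_recommendations = [
    ('u1', 'Song A'),
    ('u2', 'Song A'),
    ('u3', 'Song B'),
]

# Count how many users received each song, then rank by that count.
song_counts = Counter(song for _, song in toy_recommendations)
ranked = song_counts.most_common()   # list of (song, count), largest count first
print(ranked[0])
```

With one recommendation per user, the counts sum to the number of users queried, so each count reads directly as a number of users.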