Last.fm

Author : Kartik Jagdale (https://github.com/kartikjagdale)

Last.fm is a music discovery service that gives you personalised recommendations based on the music you listen to.

Here we are going to do some machine learning and data analysis on a Last.fm dataset in order to recommend the next songs to the user.

We are going to use the NearestNeighbors algorithm to predict the next songs that a user would like to hear.

Note: The dataset retrieved from Last.fm [LastFM_Matrix.csv] contains 1257 records and 285 songs (the column labels are artist names; this notebook calls them songs).



In [4]:

    
# First Import some essential Libraries
import os
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity # For calculating similarity matrix
from sklearn.neighbors import NearestNeighbors



In [5]:

    
DIR_PATH = os.getcwd() # Get current directory

lfm = pd.read_csv(os.path.join(DIR_PATH, "LastFM_Matrix.csv")) # Load dataset
lfm.head() #Display Head of the dataset









    Out[5]:






  
    
      
   user  a perfect circle  abba  ac/dc  adam green  aerosmith  afi  air  alanis morissette  alexisonfire  ...  timbaland  tom waits  tool  tori amos  travis  trivium  u2  underoath  volbeat  yann tiersen
0     1                 0     0      0           0          0    0    0                  0             0  ...          0          0     0          0       0        0   0          0        0             0
1    33                 0     0      0           1          0    0    0                  0             0  ...          0          0     0          0       0        0   0          0        0             0
2    42                 0     0      0           0          0    0    0                  0             0  ...          0          0     0          0       0        0   0          0        0             0
3    51                 0     0      0           0          0    0    0                  0             0  ...          0          0     0          0       0        0   0          0        0             0
4    62                 0     0      0           0          0    0    0                  0             0  ...          0          0     0          0       0        0   0          0        0             0
    
  

5 rows × 286 columns

Let's list the column names in the dataset: the user column followed by the song names.



In [6]:

    
songs = pd.DataFrame(lfm.columns)
songs.head(10)









    Out[6]:






  
    
      
                   0
0               user
1   a perfect circle
2               abba
3              ac/dc
4         adam green
5          aerosmith
6                afi
7                air
8  alanis morissette
9       alexisonfire

Now let's keep only the song columns and make a new DataFrame.



In [7]:

    
lfm_songs = lfm.drop("user", axis=1) # Drop the user column
lfm_songs.head() # Show Head









    Out[7]:






  
    
      
   a perfect circle  abba  ac/dc  adam green  aerosmith  afi  air  alanis morissette  alexisonfire  alicia keys  ...  timbaland  tom waits  tool  tori amos  travis  trivium  u2  underoath  volbeat  yann tiersen
0                 0     0      0           0          0    0    0                  0             0            0  ...          0          0     0          0       0        0   0          0        0             0
1                 0     0      0           1          0    0    0                  0             0            0  ...          0          0     0          0       0        0   0          0        0             0
2                 0     0      0           0          0    0    0                  0             0            0  ...          0          0     0          0       0        0   0          0        0             0
3                 0     0      0           0          0    0    0                  0             0            0  ...          0          0     0          0       0        0   0          0        0             0
4                 0     0      0           0          0    0    0                  0             0            0  ...          0          0     0          0       0        0   0          0        0             0
    
  

5 rows × 285 columns



In [8]:

    
lfm_songs.shape #gives out total rows and columns









    Out[8]:





(1257, 285)

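Calculate the cosine similarity between songs to get the similarity matrix. The DataFrame is transposed so that each row passed to cosine_similarity is a song rather than a user.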



In [9]:

    
data_similarity = cosine_similarity(lfm_songs.T) # Transpose: rows become songs, giving a 285 x 285 song-to-song matrix
data_similarity









    Out[9]:





array([[ 1.        ,  0.        ,  0.01791723, ...,  0.06506   ,
         0.05216405,  0.        ],
       [ 0.        ,  1.        ,  0.05227877, ...,  0.        ,
         0.02536731,  0.        ],
       [ 0.01791723,  0.05227877,  1.        , ...,  0.02039967,
         0.13084898,  0.        ],
       ..., 
       [ 0.06506   ,  0.        ,  0.02039967, ...,  1.        ,
         0.        ,  0.        ],
       [ 0.05216405,  0.02536731,  0.13084898, ...,  0.        ,
         1.        ,  0.02969569],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.02969569,  1.        ]])
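To make the transpose in the cell above concrete, here is a minimal sketch on a made-up 4-user by 3-song matrix. The values and the song names `s1` to `s3` are invented for illustration and do not come from the dataset. It checks that `cosine_similarity` on the transposed matrix agrees with the textbook formula dot(a, b) / (|a| |b|) worked out by hand for one pair of song columns.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical listening matrix: rows are users, columns are songs (1 = listened).
toy = pd.DataFrame(
    [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 1, 0]],
    columns=["s1", "s2", "s3"],
)

# Transpose so that each row is a song; the result is a songs x songs matrix.
toy_sim = cosine_similarity(toy.T)

# Same value by hand for the pair (s1, s2): dot(a, b) / (|a| * |b|).
a, b = toy["s1"].to_numpy(), toy["s2"].to_numpy()
by_hand = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(toy_sim.shape)                       # (3, 3)
print(np.isclose(toy_sim[0, 1], by_hand))  # True
```

Songs `s1` and `s3` share no listeners in this toy matrix, so their similarity is 0, which is the same reason many entries in the real matrix are 0.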

Now that we have the similarity matrix, let's use the k-nearest-neighbours algorithm to predict the recommendations. But first we will label the matrix.



In [10]:

    
type(data_similarity)









    Out[10]:





numpy.ndarray

Let's convert it into a DataFrame.



In [11]:

    
data_similarity_df = pd.DataFrame(data_similarity, columns=(lfm_songs.columns), index=(lfm_songs.columns))



In [12]:

    
data_similarity_df.head()# similarity Matrix









    Out[12]:






  
    
      
                  a perfect circle      abba     ac/dc  adam green  aerosmith       afi       air  alanis morissette  alexisonfire  alicia keys  ...  timbaland  tom waits      tool  tori amos    travis   trivium        u2  underoath   volbeat  yann tiersen
a perfect circle          1.000000  0.000000  0.017917    0.051554   0.062776  0.000000  0.051755           0.060718             0     0.000000  ...   0.047338   0.081200  0.394709   0.125553  0.030359  0.111154  0.024398    0.06506  0.052164      0.000000
abba                      0.000000  1.000000  0.052279    0.025071   0.061056  0.000000  0.016779           0.029527             0     0.000000  ...   0.000000   0.000000  0.000000   0.061056  0.029527  0.000000  0.094916    0.00000  0.025367      0.000000
ac/dc                     0.017917  0.052279  1.000000    0.113154   0.177153  0.067894  0.075730           0.038076             0     0.088333  ...   0.044529   0.067894  0.058241   0.039367  0.000000  0.087131  0.122398    0.02040  0.130849      0.000000
adam green                0.051554  0.025071  0.113154    1.000000   0.056637  0.000000  0.093386           0.000000             0     0.025416  ...   0.000000   0.146516  0.083789   0.056637  0.082169  0.025071  0.022011    0.00000  0.023531      0.088045
aerosmith                 0.062776  0.061056  0.177153    0.056637   1.000000  0.000000  0.113715           0.100056             0     0.061898  ...   0.052005   0.029735  0.025507   0.068966  0.033352  0.000000  0.214423    0.00000  0.057307      0.000000
    
  

5 rows × 285 columns



In [13]:

    
data_similarity_df.index.is_unique # Check that no song is repeated









    Out[13]:





True

Now we will apply the NearestNeighbors algorithm to the similarity matrix to get the recommendations.



In [14]:

    
neigh = NearestNeighbors(n_neighbors=285)
neigh.fit(data_similarity_df) # Fit the data









    Out[14]:





NearestNeighbors(algorithm='auto', leaf_size=30, metric='minkowski',
         metric_params=None, n_neighbors=285, p=2, radius=1.0)
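The output above shows the defaults `metric='minkowski'` and `p=2`, which means neighbours are ranked by Euclidean distance between rows of the similarity matrix. The sketch below illustrates this on a hypothetical 4 x 4 similarity matrix (the numbers are invented). It also shows that, when the training rows themselves are queried, each row comes back as its own closest neighbour.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical 4 x 4 similarity matrix (symmetric, ones on the diagonal).
toy_sim = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

# Default metric is Minkowski with p=2, i.e. Euclidean distance between rows.
nn = NearestNeighbors(n_neighbors=2).fit(toy_sim)
distances, indices = nn.kneighbors(toy_sim)

print(indices[:, 0])  # [0 1 2 3]: each row's closest neighbour is itself
print(indices[:, 1])  # [1 0 3 2]: the closest *other* row
```

Rows 0 and 1 have similar similarity profiles, as do rows 2 and 3, so they end up as each other's nearest neighbours.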



In [15]:

    
#Copy the predicted data to a new DataFrame
model = pd.DataFrame(neigh.kneighbors(data_similarity_df, return_distance=False))
model.head() #gives you integer values instead of song names









    Out[15]:






  
    
      
   0    1    2    3    4    5    6    7    8    9  ...  275  276  277  278  279  280  281  282  283  284
0  0  277   81   70  189  206  108  235  264   80  ...  216  147   60   90  159  254  261   57   32  218
1  1  221   88  165  174  175   83  208  113  103  ...  230   33  213  172   19   79  162  150  125  241
2  2  128  172   36  190   75  182  116  258  140  ...  218   39  263  248   57   68  179  261   17   32
3  3  255  267   25  276   47   84  104  266   59  ...  213   11   90   20  238   79   92  162  150  125
4  4  281  157  158  115   93  106   78  103  262  ...  253   10   19  162   22  241   39  125   20  150
    
  

5 rows × 285 columns



In [16]:

    
final_model = pd.DataFrame(data_similarity_df.columns.values[model.values], index=data_similarity_df.index) # Map neighbour indices to song names (plain NumPy arrays, since recent pandas rejects 2-D indexing of an Index)
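The lookup in the cell above relies on NumPy integer-array indexing: indexing a 1-D array of names with a 2-D array of positions returns a 2-D array of names. Here is a minimal sketch with hypothetical labels `s1` to `s3` and made-up neighbour indices.

```python
import numpy as np
import pandas as pd

# Hypothetical labels and neighbour indices (one row of indices per song).
names = pd.Index(["s1", "s2", "s3"])
neighbour_idx = np.array([[0, 2, 1],
                          [1, 0, 2],
                          [2, 0, 1]])

# Indexing the underlying NumPy array with a 2-D integer array looks up
# every position at once and keeps the 2-D shape.
labelled = pd.DataFrame(names.to_numpy()[neighbour_idx], index=names)
print(labelled)
```

Row `s1` of the result reads `s1, s3, s2`, that is, the names at positions 0, 2 and 1.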



In [17]:

    
final_model.head() #preview final Model









    Out[17]:






  
    
      
                                 0                1              2              3                4                5          6                7                      8                   9  ...                    275               276             277               278                 279                  280                 281                 282                283                284
a perfect circle  a perfect circle             tool          dredg       deftones  nine inch nails   porcupine tree   godsmack           staind  the smashing pumpkins       dream theater  ...  red hot chili peppers        katy perry        coldplay         ensiferum         leona lewis            the kooks  the pussycat dolls  christina aguilera            beyonce            rihanna
abba                          abba  robbie williams  elvis presley        madonna  michael jackson             mika      duffy            queen        groove coverage       frank sinatra  ...               slipknot      billy talent       rammstein         metallica      arctic monkeys            disturbed         linkin park   killswitch engage          in flames   system of a down
ac/dc                        ac/dc      iron maiden      metallica  black sabbath          nirvana  die toten hosen  motorhead       hammerfall          the offspring        judas priest  ...                rihanna        bloc party       the shins  the decemberists  christina aguilera  death cab for cutie        modest mouse  the pussycat dolls        arcade fire            beyonce
adam green              adam green   the libertines    the strokes   babyshambles        tom waits      bright eyes    editors  franz ferdinand            the streets           cocorosie  ...              rammstein       amon amarth       ensiferum    as i lay dying     subway to sally            disturbed         equilibrium         linkin park  killswitch engage          in flames
aerosmith                aerosmith               u2   led zeppelin  lenny kravitz     guns n roses     eric clapton    genesis     dire straits          frank sinatra  the rolling stones  ...            the killers  all that remains  arctic monkeys       linkin park              atreyu     system of a down          bloc party           in flames     as i lay dying  killswitch engage
    
  

5 rows × 285 columns

The model above gives all 285 neighbours for each song, but we only want the top 10 recommendations, so let's trim the DataFrame a bit.



In [18]:

    
top10 = final_model[list(final_model.columns[:11])] # Column 0 is the song itself; columns 1-10 are its 10 nearest neighbours



In [19]:

    
top10.head()









    Out[19]:






  
    
      
                                 0                1              2              3                4                5          6                7                      8                   9                       10
a perfect circle  a perfect circle             tool          dredg       deftones  nine inch nails   porcupine tree   godsmack           staind  the smashing pumpkins       dream theater                    opeth
abba                          abba  robbie williams  elvis presley        madonna  michael jackson             mika      duffy            queen        groove coverage       frank sinatra              hans zimmer
ac/dc                        ac/dc      iron maiden      metallica  black sabbath          nirvana  die toten hosen  motorhead       hammerfall          the offspring        judas priest          bloodhound gang
adam green              adam green   the libertines    the strokes   babyshambles        tom waits      bright eyes    editors  franz ferdinand            the streets           cocorosie  queens of the stone age
aerosmith                aerosmith               u2   led zeppelin  lenny kravitz     guns n roses     eric clapton    genesis     dire straits          frank sinatra  the rolling stones              deep purple
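Column 0 of the table above always repeats the song itself, because every song is its own nearest neighbour. If only the recommendations are wanted, one option is to slice from column 1 onwards. The sketch below uses a hypothetical ranked-neighbour table (`toy_model`, with invented names) and `k = 2`; it is an illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical ranked-neighbour table: column 0 repeats the song itself.
toy_model = pd.DataFrame(
    [["s1", "s3", "s2", "s4"],
     ["s2", "s1", "s4", "s3"]],
    index=["s1", "s2"],
)

k = 2  # number of recommendations to keep (10 in the notebook)
top_k = toy_model.iloc[:, 1:k + 1].copy()  # skip column 0, keep the next k
top_k.columns = range(1, k + 1)            # rank labels 1..k
print(top_k)
```

With `k = 10` on `final_model`, the same slice would give ten recommendation columns without the self-match.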

Now let's put our results in a CSV file called top10.csv.



In [20]:

    
top10.to_csv("top10.csv",index_label = "Index") # store data in csv file

Now let's read the CSV file back to check that it was saved.



In [21]:

    
pd.read_csv("top10.csv").head() # Read back the file written above









    Out[21]:






  
    
      
              Index                 0                1              2              3                4                5          6                7                      8                   9                       10
0  a perfect circle  a perfect circle             tool          dredg       deftones  nine inch nails   porcupine tree   godsmack           staind  the smashing pumpkins       dream theater                    opeth
1              abba              abba  robbie williams  elvis presley        madonna  michael jackson             mika      duffy            queen        groove coverage       frank sinatra              hans zimmer
2             ac/dc             ac/dc      iron maiden      metallica  black sabbath          nirvana  die toten hosen  motorhead       hammerfall          the offspring        judas priest          bloodhound gang
3        adam green        adam green   the libertines    the strokes   babyshambles        tom waits      bright eyes    editors  franz ferdinand            the streets           cocorosie  queens of the stone age
4         aerosmith         aerosmith               u2   led zeppelin  lenny kravitz     guns n roses     eric clapton    genesis     dire straits          frank sinatra  the rolling stones              deep purple
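In the output above the song names come back as an ordinary `Index` column with a fresh 0..n row index. If the song-name index should be restored on load, `read_csv` accepts `index_col`. The sketch below shows the round trip on a hypothetical two-row table written to a temporary file (the file name `toy_top.csv` is made up).

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for the top10 table.
toy = pd.DataFrame([["s1", "s2"], ["s2", "s1"]], index=["s1", "s2"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_top.csv")  # hypothetical file name
    toy.to_csv(path, index_label="Index")
    # index_col turns the "Index" column back into the row index.
    restored = pd.read_csv(path, index_col="Index")

print(restored.index.tolist())  # ['s1', 's2']
```

Note that the column labels are read back as the strings `"0"` and `"1"` rather than integers.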

`Conclusion`

To conclude, we have created a model that recommends the next song a user would like to hear, using Last.fm data.

Further, we can now use this model to build an API and use it in our website or web app to recommend songs to the user.
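As a rough idea of what such an API could wrap, here is a minimal sketch of a lookup helper. The function name `recommend`, its parameters, and the tiny table are all hypothetical; in practice the table would be the one saved to top10.csv, indexed by song name.

```python
import pandas as pd

def recommend(table, song, k=10):
    """Return up to k neighbours of `song`, skipping the song itself."""
    if song not in table.index:
        return []                      # unknown song: nothing to recommend
    row = table.loc[song].tolist()
    return [name for name in row if name != song][:k]

# Hypothetical stand-in for the table saved to top10.csv.
toy_table = pd.DataFrame(
    [["abba", "queen", "mika"],
     ["queen", "abba", "mika"]],
    index=["abba", "queen"],
)
print(recommend(toy_table, "abba", k=2))   # ['queen', 'mika']
print(recommend(toy_table, "unknown"))     # []
```

Returning an empty list for an unknown song is one possible design choice; a real service might instead fall back to globally popular songs.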

Github Link : https://github.com/kartikjagdale/Last.fm-Song-Recommender


