Last.fm is a music discovery service that gives you personalised recommendations based on the music you listen to.
Here we apply some machine learning and data analysis to a Last.fm dataset in order to recommend the next songs to a user.
We are going to use the NearestNeighbors algorithm to predict which songs a user is likely to want to hear next.
Note: the dataset was retrieved from Last.fm [LastFM_Matrix.csv] and contains 1257 records and 285 songs.
In [4]:
# First, import some essential libraries
import os
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity # For calculating similarity matrix
from sklearn.neighbors import NearestNeighbors
In [5]:
DIR_PATH = os.getcwd()  # Get the current directory
lfm = pd.read_csv(os.path.join(DIR_PATH, "LastFM_Matrix.csv"))  # Load the dataset
lfm.head()  # Display the first rows of the dataset
Out[5]:
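The CSV itself is not reproduced here. To try the remaining cells without it, a small stand-in with the same layout (a `user` column followed by one column per song) can be built by hand. This is a sketch under assumptions: the 0/1 "listened" values, the `song_i` names, and the sizes are all made up, not taken from LastFM_Matrix.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for LastFM_Matrix.csv: one row per user, a "user"
# column, then one 0/1 column per song (assumed: 1 = the user listened to it).
# Song names, sizes and values are invented for illustration only.
rng = np.random.default_rng(seed=0)
n_users, n_songs = 20, 6
song_names = [f"song_{i}" for i in range(n_songs)]

toy = pd.DataFrame(rng.integers(0, 2, size=(n_users, n_songs)), columns=song_names)
toy.insert(0, "user", np.arange(1, n_users + 1))

print(toy.shape)  # (20, 7): the user column plus 6 songs
print(toy.head())
```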
Let's look at the column names in the dataset: the user column followed by the song names.
In [6]:
songs = pd.DataFrame(lfm.columns)
songs.head(10)
Out[6]:
Now let's keep only the song columns by dropping the user column, and store them in a new DataFrame.
In [7]:
lfm_songs = lfm.drop("user", axis=1)  # drop the user column
lfm_songs.head()  # show the first rows
Out[7]:
In [8]:
lfm_songs.shape  # (number of rows, number of song columns)
Out[8]:
Calculate the cosine similarity between every pair of songs to get the similarity matrix.
In [9]:
data_similarity = cosine_similarity(lfm_songs.T)  # transpose so each row is a song; result is songs x songs
data_similarity
Out[9]:
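`cosine_similarity` compares rows, which is why the song matrix is transposed first. For two songs with listening vectors a and b, the score is (a · b) / (‖a‖ ‖b‖). The sketch below checks that formula by hand against scikit-learn on a tiny made-up user x song matrix (not the Last.fm data).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user x song matrix: rows = users, columns = songs.
listens = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
])

# cosine_similarity compares rows, so transpose to make each row a song.
songs = listens.T

# By hand: cos(a, b) = (a . b) / (||a|| * ||b||) for every pair of songs.
norms = np.linalg.norm(songs, axis=1)
manual = (songs @ songs.T) / np.outer(norms, norms)

sk = cosine_similarity(songs)
print(np.allclose(manual, sk))  # True
print(sk.shape)                 # (4, 4): one row and one column per song
```

The diagonal is 1 because every song is perfectly similar to itself, and the matrix is symmetric.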
We now have the similarity matrix. Before applying the k-nearest neighbours algorithm to predict recommendations, we will label the matrix with the song names.
In [10]:
type(data_similarity)
Out[10]:
The result is a plain NumPy array, so let's convert it into a DataFrame labelled with the song names.
In [11]:
data_similarity_df = pd.DataFrame(data_similarity, columns=(lfm_songs.columns), index=(lfm_songs.columns))
In [12]:
data_similarity_df.head()  # similarity matrix
Out[12]:
In [13]:
data_similarity_df.index.is_unique  # check that no song name is repeated
Out[13]:
Now we apply the NearestNeighbors algorithm to the similarity matrix to get the recommendations.
In [14]:
neigh = NearestNeighbors(n_neighbors=285)
neigh.fit(data_similarity_df) # Fit the data
Out[14]:
In [15]:
# Store the neighbour positions for every song in a new DataFrame
model = pd.DataFrame(neigh.kneighbors(data_similarity_df, return_distance=False))
model.head()  # integer column positions, not song names yet
Out[15]:
In [16]:
final_model = pd.DataFrame(data_similarity_df.columns.to_numpy()[model.to_numpy()], index=data_similarity_df.index)  # map positions to song names
In [17]:
final_model.head() #preview final Model
Out[17]:
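The cells above treat each song's row of similarities as a feature vector, and `NearestNeighbors` (Euclidean distance by default) ranks songs by how close those similarity profiles are. The sketch below repeats that pipeline on a tiny made-up matrix, and for comparison adds a different technique the notebook does not use: ranking each song's row directly by cosine similarity. The two rankings can differ, but in both the first entry is the song itself.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

# Hypothetical user x song matrix with made-up song names.
listens = pd.DataFrame(
    [[1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 1],
     [1, 0, 1, 1],
     [0, 0, 1, 1]],
    columns=["a", "b", "c", "d"],
)

sim = pd.DataFrame(cosine_similarity(listens.T),
                   index=listens.columns, columns=listens.columns)

# Same approach as the notebook: nearest neighbours over similarity profiles.
neigh = NearestNeighbors(n_neighbors=len(sim)).fit(sim)
positions = neigh.kneighbors(sim, return_distance=False)  # integer positions
by_profile = pd.DataFrame(sim.columns.to_numpy()[positions], index=sim.index)

# Alternative (not the notebook's method): sort each row by similarity, highest first.
order = np.argsort(-sim.to_numpy(), axis=1, kind="stable")
by_similarity = pd.DataFrame(sim.columns.to_numpy()[order], index=sim.index)

print(by_profile)
print(by_similarity)
```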
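The model above ranks all 285 songs for every song, but we only want the top 10 recommendations. The first column is normally the song itself (its distance to itself is zero), so we keep the first 11 columns: the song plus its 10 nearest neighbours.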
In [18]:
top10 = final_model.iloc[:, :11]  # the song itself plus its 10 nearest neighbours
In [19]:
top10.head()
Out[19]:
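Because each song is normally its own nearest neighbour, column 0 of the table repeats the song name. If only the recommendations themselves are wanted, one option is to skip that column. The sketch below uses a small hypothetical neighbour table and `k = 2` instead of 10. It also shows a name-based variant that removes the song wherever it appears, which is an assumption about how ties with an identical song profile might be handled, not something the notebook does.

```python
import pandas as pd

# Hypothetical neighbour table shaped like final_model: one row per song,
# column 0 = nearest neighbour (normally the song itself), then the rest.
final_model = pd.DataFrame(
    [["a", "c", "b", "d"],
     ["b", "d", "a", "c"],
     ["c", "d", "a", "b"],
     ["d", "c", "b", "a"]],
    index=["a", "b", "c", "d"],
)

k = 2  # the notebook keeps 10; 2 keeps this toy example readable

# Positional slice, skipping column 0 (the song itself).
top_k = final_model.iloc[:, 1:k + 1]

# Name-based variant: drop the song wherever it appears in its own row.
top_k_by_name = pd.DataFrame(
    [[s for s in row if s != song][:k] for song, row in final_model.iterrows()],
    index=final_model.index,
)

print(top_k)
print(top_k_by_name)
```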
Now let's save our results to a CSV file called top10.csv.
In [20]:
top10.to_csv("top10.csv", index_label="Index")  # store data in a CSV file
Now let's read the CSV file back to check that it was saved.
In [21]:
pd.read_csv("top10.csv").head()
Out[21]:
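Reading the file back without `index_col` turns the saved "Index" column into an ordinary column. The sketch below does the round trip in memory with `io.StringIO` (instead of writing top10.csv) on a made-up two-row table, and passes `index_col="Index"` so the song names come back as the index.

```python
import io

import pandas as pd

# Small hypothetical stand-in for top10 (made-up song names).
top = pd.DataFrame([["a", "c", "b"], ["b", "d", "a"]], index=["a", "b"])

buf = io.StringIO()  # in-memory file instead of top10.csv
top.to_csv(buf, index_label="Index")
buf.seek(0)

# index_col restores the song names as the index instead of a plain column.
restored = pd.read_csv(buf, index_col="Index")
print(restored)
```

Note that the integer column labels come back as the strings "0", "1", "2", since CSV headers are read as text.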
To conclude, we have built a model that uses Last.fm data to recommend the next songs a user is likely to want to hear.
This model could now be exposed through an API and used in a website or web app to recommend songs to users.
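As a starting point for such an API, the lookup itself can be wrapped in a function. This is a minimal sketch: `recommend` is a hypothetical helper name, and it assumes a table laid out like top10 (indexed by song name, with the song itself normally in column 0). Any web framework around it is left out.

```python
import pandas as pd


def recommend(song: str, table: pd.DataFrame, k: int = 10) -> list:
    """Hypothetical helper: return up to k recommended songs for `song`.

    Assumes `table` is indexed by song name and each row lists that song's
    nearest neighbours, normally starting with the song itself.
    """
    if song not in table.index:
        raise KeyError(f"unknown song: {song!r}")
    neighbours = table.loc[song].tolist()
    # Skip the song itself so it is never recommended to its own listeners.
    return [s for s in neighbours if s != song][:k]


# Usage with a toy table (made-up song names).
toy = pd.DataFrame([["a", "c", "b"], ["b", "a", "c"]], index=["a", "b"])
print(recommend("a", toy, k=2))  # ['c', 'b']
```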
Github Link : https://github.com/kartikjagdale/Last.fm-Song-Recommender