Debugging the recommender system, part 2

In this notebook we test different variations of the recommender system in order to see which parameters or modifications to the algorithm improve its accuracy. The tests are performed on the Fourcity TripAdvisor dataset.

We are going to analyze the results of the following test cases: the similarity metric (euclidean distance vs. Pearson correlation), the neighbourhood size (all neighbours vs. only the top-5 most similar users), and the minimum number of items two users must have rated in common for their similarity to be taken into account (the builder's default, two, and three).


In [1]:
import sys
sys.path.append('/Users/fpena/UCC/Thesis/projects/yelp/source/python')
import time
from etl import ETLUtils
from evaluation.mean_absolute_error import MeanAbsoluteError
from evaluation.root_mean_square_error import RootMeanSquareError
from recommenders.adjusted_weighted_sum_recommender import AdjustedWeightedSumRecommender
from recommenders.similarity.single_similarity_matrix_builder import SingleSimilarityMatrixBuilder

def perform_cross_validation(reviews, recommender, num_folds):

    start_time = time.time()
    split = 1 - (1/float(num_folds))
    total_mean_absolute_error = 0.
    total_root_mean_square_error = 0.
    total_coverage = 0.
    num_cycles = 0

    for i in xrange(0, num_folds):
        # print('Num cycles:', i)
        start = float(i) / num_folds
        train, test = ETLUtils.split_train_test(reviews, split=split, shuffle_data=False, start=start)
        recommender.load(train)
        _, errors, num_unknown_ratings = predict_rating_list(recommender, test)
        mean_absolute_error = MeanAbsoluteError.compute_list(errors)
        root_mean_square_error = RootMeanSquareError.compute_list(errors)
        num_samples = len(test)
        coverage = float(num_samples - num_unknown_ratings) / num_samples

        if mean_absolute_error is not None:
            total_mean_absolute_error += mean_absolute_error
            total_root_mean_square_error += root_mean_square_error
            total_coverage += coverage
            num_cycles += 1
        else:
            print('Mean absolute error is None!!!')


    final_mean_absolute_error = total_mean_absolute_error / num_cycles
    final_root_mean_square_error = total_root_mean_square_error / num_cycles
    final_coverage = total_coverage / num_cycles
    execution_time = time.time() - start_time

    # print('Final mean absolute error: %f' % final_mean_absolute_error)
    # print('Final root mean square error: %f' % final_root_mean_square_error)
    # print('Final coverage: %f' % final_coverage)
    # print("--- %s seconds ---" % execution_time)

    result = {
        'MAE': final_mean_absolute_error,
        'RMSE': final_root_mean_square_error,
        'Coverage': final_coverage,
        'Execution time': execution_time
    }

    return result

def predict_rating_list(predictor, reviews):
    """
    For each one of the reviews this method predicts the rating for the
    user and item contained in the review and also returns the error
    between the predicted rating and the actual rating the user gave to the
    item

    :param predictor: the object used to predict the rating that will be given
     by a the user to the item contained in each review
    :param reviews: a list of reviews (the test data)
    :return: a tuple with a list of the predicted ratings and the list of
    errors for those predictions
    """
    predicted_ratings = []
    errors = []
    num_unknown_ratings = 0.

    for review in reviews:

        user_id = review['user_id']
        item_id = review['offering_id']
        predicted_rating = predictor.predict_rating(user_id, item_id)
        actual_rating = review['overall_rating']

        # print(user_id, item_id, predicted_rating)

        error = None

        # print('Predicted rating', predicted_rating)

        if predicted_rating is not None:
            error = abs(predicted_rating - actual_rating)
        else:
            num_unknown_ratings += 1

        predicted_ratings.append(predicted_rating)
        errors.append(error)

    return predicted_ratings, errors, num_unknown_ratings
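
Before loading the data it is worth making the fold bookkeeping explicit. The loop above passes split = 1 - 1/num_folds and a moving start offset to ETLUtils.split_train_test; assuming that function holds out a contiguous block of len(reviews) * (1 - split) records beginning at start * len(reviews) as the test set (an assumption, since its implementation is not shown here), the test windows visited for five folds would look like this:

num_folds = 5
num_reviews = 1000                                 # hypothetical dataset size, for illustration only
split = 1 - (1 / float(num_folds))                 # fraction of the data used for training, here 0.8
fold_size = int(round(num_reviews * (1 - split)))  # records held out for testing in each fold

for i in xrange(num_folds):
    start = float(i) / num_folds
    # Assumed behaviour of ETLUtils.split_train_test: the test set is the
    # contiguous block of fold_size records starting at this offset, and
    # everything else is used for training.
    test_begin = int(start * num_reviews)
    test_end = test_begin + fold_size
    print('Fold %d: test records [%d, %d)' % (i, test_begin, test_end))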

In [2]:
# We load the dataset
file_path = '/Users/fpena/tmp/filtered_reviews_multi_non_sparse_shuffled.json'
reviews = ETLUtils.load_json_file(file_path)

# We set the number of folds used for cross-validation
num_folds = 5
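
As a reference for the fields the evaluation relies on, predict_rating_list reads exactly three keys from each review. A record shaped like the following (key names taken from the code above, values purely hypothetical) is all that is needed:

sample_review = {
    'user_id': 'some_user_id',   # identifier of the reviewer
    'offering_id': 12345,        # identifier of the hotel being rated
    'overall_rating': 4.0        # rating the user actually gave
}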

In [3]:
sim_matrix_builder = SingleSimilarityMatrixBuilder('euclidean')
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = None
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 31.982892990112305, 'MAE': 0.8317007573477152, 'Coverage': 0.795192291146864, 'RMSE': 1.0799606235587949}

In [4]:
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = 5
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 29.14725089073181, 'MAE': 0.8425623710213761, 'Coverage': 0.795192291146864, 'RMSE': 1.0905084316087541}

In [5]:
sim_matrix_builder._similarity_metric = 'pearson'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = None
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 36.9143340587616, 'MAE': 1.0055819107710042, 'Coverage': 0.11069431300008606, 'RMSE': 1.2985063560574599}

In [6]:
sim_matrix_builder._similarity_metric = 'pearson'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = 5
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 39.12229084968567, 'MAE': 1.0064615011122213, 'Coverage': 0.11069431300008606, 'RMSE': 1.2990208838735255}

Consider only similarities based on two or more common items


In [7]:
sim_matrix_builder._min_common_items = 2
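
The internals of SingleSimilarityMatrixBuilder are not shown in this notebook, so the sketch below is only an illustration of the idea behind _min_common_items: a similarity score is computed (and trusted) only when two users have rated at least that many items in common. The helper name and signature here are hypothetical.

def filtered_similarity(ratings_1, ratings_2, min_common_items, similarity):
    # ratings_1 and ratings_2 map item_id -> rating for two users;
    # similarity is any callable that takes two equal-length rating lists.
    common_items = set(ratings_1) & set(ratings_2)
    if len(common_items) < min_common_items:
        return None  # not enough overlap to trust the similarity
    return similarity(
        [ratings_1[item] for item in common_items],
        [ratings_2[item] for item in common_items])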

In [8]:
sim_matrix_builder._similarity_metric = 'euclidean'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = None
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 20.482923984527588, 'MAE': 0.92726772331640761, 'Coverage': 0.26506925922739394, 'RMSE': 1.1833355369778542}

In [9]:
sim_matrix_builder._similarity_metric = 'euclidean'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = 5
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 22.972856998443604, 'MAE': 0.93037424513630496, 'Coverage': 0.26506925922739394, 'RMSE': 1.1858269139961131}

In [10]:
sim_matrix_builder._similarity_metric = 'pearson'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = None
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 17.226240873336792, 'MAE': 1.0055819107710042, 'Coverage': 0.11069431300008606, 'RMSE': 1.2985063560574599}

In [11]:
sim_matrix_builder._similarity_metric = 'pearson'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = 5
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 16.79896903038025, 'MAE': 1.0064615011122213, 'Coverage': 0.11069431300008606, 'RMSE': 1.2990208838735255}

Consider only similarities based on three or more common items


In [12]:
sim_matrix_builder._min_common_items = 3

In [13]:
sim_matrix_builder._similarity_metric = 'euclidean'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = None
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 15.401348114013672, 'MAE': 0.94803251038255743, 'Coverage': 0.06500215090768305, 'RMSE': 1.1789930775780015}

In [14]:
sim_matrix_builder._similarity_metric = 'euclidean'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = 5
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 15.888621091842651, 'MAE': 0.94825341677990094, 'Coverage': 0.06500215090768305, 'RMSE': 1.1792000620569696}

In [15]:
sim_matrix_builder._similarity_metric = 'pearson'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = None
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 16.67840600013733, 'MAE': 1.0583364721367245, 'Coverage': 0.0358857437838768, 'RMSE': 1.3042785211062962}

In [16]:
sim_matrix_builder._similarity_metric = 'pearson'
awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
awsr._num_neighbors = 5
result = perform_cross_validation(reviews, awsr, num_folds)
print(result)


{'Execution time': 15.447357892990112, 'MAE': 1.0583364721367245, 'Coverage': 0.0358857437838768, 'RMSE': 1.3042785211062962}

As we can see, the best results are obtained with the euclidean distance metric, which gives both lower errors and much higher coverage than Pearson correlation. Requiring a minimum number of common items hurts rather than helps: restricting the similarities to users with at least two (or three) items in common sharply reduces coverage and increases both MAE and RMSE. Finally, using all the neighbours performs slightly better than using only the top-5 most similar users.
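
For completeness, the twelve configurations evaluated above could also be run from a single loop instead of cell by cell. The sketch below reuses only the classes and attributes already used in this notebook, and assumes that the builder's default behaviour corresponds to _min_common_items = 1 (which was not set explicitly in the first experiments):

all_results = []

for metric in ['euclidean', 'pearson']:
    for min_common_items in [1, 2, 3]:
        for num_neighbors in [None, 5]:
            # Build a fresh similarity matrix builder and recommender for
            # each configuration, then run the same cross-validation as above.
            sim_matrix_builder = SingleSimilarityMatrixBuilder(metric)
            sim_matrix_builder._min_common_items = min_common_items
            awsr = AdjustedWeightedSumRecommender(sim_matrix_builder)
            awsr._num_neighbors = num_neighbors
            result = perform_cross_validation(reviews, awsr, num_folds)
            result['Similarity metric'] = metric
            result['Min common items'] = min_common_items
            result['Num neighbors'] = num_neighbors
            all_results.append(result)

for result in all_results:
    print(result)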