The purpose of this Data Science Experience (DSX) project is to show how recommendations can be generated with Apache Spark and integrated into a web application. The project uses a randomly generated (but biased) dataset of approximately 2,000 movies and 500,000 ratings.
The overall web application architecture can be seen here:
There is a live demo web application available here: https://movie-recommend-demo.mybluemix.net
Below you can see a screenshot from the demo web application, where the logged-in user has searched for movies with 'harry' in the title and is rating one of them.
The project is split into a number of different notebooks that focus on specific steps.
In this notebook, we perform some basic exploratory analysis of the ratings dataset before we jump into machine learning.
Here we use Spark's Machine Learning Library (MLlib) to train a machine learning model on the data.
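To give a feel for what such a model learns, here is a minimal sketch of alternating least squares (ALS), the matrix factorization approach behind MLlib's collaborative filtering. It is a toy, rank-1, pure-Python illustration, not Spark's API or the code used in the notebook; the ratings, the function names, and the `reg` and `iterations` values are all assumptions made up for the example.

```python
# Toy rank-1 alternating least squares (ALS) in pure Python.
# Illustration only: this is not Spark MLlib's API or implementation.

# Hypothetical (user_id, movie_id, rating) triples.
ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 1.0),
]


def solve_side(ratings, fixed, own_index, other_index, reg):
    """Closed-form ridge update for one side while the other side is held fixed."""
    numerator, denominator = {}, {}
    for row in ratings:
        own, other, rating = row[own_index], row[other_index], row[2]
        numerator[own] = numerator.get(own, 0.0) + rating * fixed[other]
        denominator[own] = denominator.get(own, 0.0) + fixed[other] ** 2
    return {key: numerator[key] / (denominator[key] + reg) for key in numerator}


def train_als_rank1(ratings, iterations=20, reg=0.1):
    """Alternate between solving for user factors and movie factors."""
    movie_factor = {movie: 1.0 for _, movie, _ in ratings}
    user_factor = {}
    for _ in range(iterations):
        user_factor = solve_side(ratings, movie_factor, 0, 1, reg)
        movie_factor = solve_side(ratings, user_factor, 1, 0, reg)
    return user_factor, movie_factor


def rmse(ratings, user_factor, movie_factor):
    """Root mean squared error of the predictions user_factor * movie_factor."""
    squared = [(r - user_factor[u] * movie_factor[m]) ** 2 for u, m, r in ratings]
    return (sum(squared) / len(squared)) ** 0.5


user_factor, movie_factor = train_als_rank1(ratings)

# Compare against a trivial baseline in which every factor is 1.0.
baseline = rmse(
    ratings,
    {u: 1.0 for u, _, _ in ratings},
    {m: 1.0 for _, m, _ in ratings},
)
fitted = rmse(ratings, user_factor, movie_factor)
print(f"baseline RMSE={baseline:.3f}, fitted RMSE={fitted:.3f}")
```

A predicted rating is simply `user_factor[u] * movie_factor[m]`. A real model uses a higher rank (several factors per user and movie) and runs the same alternating updates in a distributed fashion.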
In this notebook, we simulate a new user's movie ratings and then use those ratings to predict movies for them.
The model trained with Apache Spark is designed to be built as a batch process. In this notebook, we investigate how to augment the batch-generated model with ratings from new users so that we can provide recommendations without waiting for the next batch run.
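One common way to do this with a factorization model is "fold-in": keep the movie factors from the last batch run fixed and solve a small regularized least-squares problem for the new user's factor vector. The sketch below illustrates that idea only; it is not necessarily the approach taken in the notebook, and the movie factors, movie ids, ratings, and `reg` value are hypothetical.

```python
# Fold-in sketch: derive a factor vector for a new user from fixed movie factors.
# All names and numbers below are hypothetical and for illustration only.

# Rank-2 movie factors, as a batch-trained factorization model might produce.
movie_factors = {
    "movie_a": [1.2, 0.1],
    "movie_b": [0.9, 0.4],
    "movie_c": [0.1, 1.3],
    "movie_d": [0.2, 1.1],
}


def solve_spd(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination.

    The matrix is assumed symmetric positive definite, so no pivoting is needed.
    """
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fold_in_user(new_ratings, movie_factors, reg=0.1):
    """Solve (V^T V + reg * I) u = V^T r over the movies the new user rated."""
    rank = len(next(iter(movie_factors.values())))
    gram = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
    rhs = [0.0] * rank
    for movie_id, rating in new_ratings.items():
        v = movie_factors[movie_id]
        for i in range(rank):
            rhs[i] += rating * v[i]
            for j in range(rank):
                gram[i][j] += v[i] * v[j]
    return solve_spd(gram, rhs)


def recommend(user_vector, movie_factors, already_rated, top_n=2):
    """Rank unrated movies by the dot product of user and movie factors."""
    scores = {
        movie_id: sum(u * f for u, f in zip(user_vector, factors))
        for movie_id, factors in movie_factors.items()
        if movie_id not in already_rated
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


new_user_ratings = {"movie_a": 5.0, "movie_c": 1.0}
user_vector = fold_in_user(new_user_ratings, movie_factors)
recommendations = recommend(user_vector, movie_factors, new_user_ratings)
print(recommendations)
```

Because only a small linear system is solved per user, this can run at request time, while the movie factors themselves are still refreshed by the batch job.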
If you haven't set up your own instance of the demo web application with Cloudant and Compose Redis, this step will walk you through that process.
In this notebook, we install the latest Spark Cloudant library so that we can use Cloudant as a source of rating data and as a destination for the generated recommendations.
In this notebook, we walk through setting up a web application where users can rate movies and receive recommendations using the Cloudant Datastore Recommender.
If you have any questions about this project, please contact me at chris.snow@uk.ibm.com