ReadWriteWeb: Netflix is offering $1 million to the team that can improve its recommendation algorithm by 10%. It's been over 2 years now, with the leading company at 9.63%. There is some skepticism, though, that 10% will be reached anytime soon, because now the contestants are making only incremental progress. Do you expect the 10% mark to be reached soon?
Satnam: Netflix's recommendation engine, Cinematch, uses an item-to-item algorithm (similar to Amazon's) with a number of heuristics. Given that Netflix's recommendation system has been very successful in the real world, it is pretty impressive that teams have been able to improve on it by as much as 9.63%. Of course, the Netflix competition doesn't take into account speed of implementation or the scalability of the approach. It simply focuses on the quality of recommendations in terms of closing the gap between user rating and predicted rating. So, it isn't clear whether Netflix will be able to leverage all of the innovation coming out of this competition. Also, the Netflix data doesn't contain much information to allow for a content-based approach; it's for this reason that teams are focusing on collaborative filtering techniques.
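To make "closing the gap between user rating and predicted rating" concrete: the contest scored entries by root-mean-square error (RMSE), and the percentages quoted here are relative reductions in RMSE against Cinematch. The following is a minimal sketch of that arithmetic. The ratings and predictions are made-up illustrative numbers, and the function names are hypothetical, not taken from any contest tooling.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage reduction in RMSE relative to a baseline system."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Illustrative (invented) data: five true ratings and two sets of predictions.
actual = [4, 3, 5, 2, 1]
baseline = [3.5, 3.5, 4.0, 3.0, 2.0]   # stands in for the incumbent system
candidate = [3.8, 3.2, 4.5, 2.5, 1.5]  # stands in for a contest entry

base_err = rmse(actual, baseline)
cand_err = rmse(actual, candidate)
print(f"baseline RMSE:  {base_err:.4f}")
print(f"candidate RMSE: {cand_err:.4f}")
print(f"improvement:    {improvement_pct(base_err, cand_err):.2f}%")
```

Note that the measure says nothing about runtime or scalability, which is the limitation raised in the answer above.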
The challenges to reaching the 10% mark are:
Skewed data: The data set for the competition consists of more than 100 million anonymous movie ratings, using a scale of one to five stars, made by 480,000 users for 17,770 movies. Note that the user-item data set for this problem is sparsely populated, with nearly 99% of user-item entries being zero. The distribution of movies per user is skewed. The median number of ratings per user is 93. About 10% of users rated 16 or fewer movies, while 25% of users rated 36 or fewer. Two users rated as many as 17,000 movies. Similarly, the ratings per movie are also skewed: almost half the user base rated one popular movie (Miss Congeniality); about 25% of movies had 190 or fewer ratings; and a handful of movies were rated fewer than 10 times.
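The sparsity and skew figures can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted above; because "more than 100 million" is treated as exactly 100 million, the results are approximations.

```python
def matrix_stats(num_ratings, num_users, num_movies):
    """Return (cells, density, sparsity, mean ratings per user)."""
    cells = num_users * num_movies      # every possible user-movie pair
    density = num_ratings / cells       # fraction of pairs with a rating
    sparsity = 1.0 - density            # fraction of pairs with no rating
    mean_per_user = num_ratings / num_users
    return cells, density, sparsity, mean_per_user


# Figures quoted in the text; 100 million is a lower bound on the rating count.
cells, density, sparsity, mean_per_user = matrix_stats(
    num_ratings=100_000_000, num_users=480_000, num_movies=17_770
)

print(f"possible user-movie pairs: {cells:,}")
print(f"filled:  {density:.2%}")
print(f"empty:   {sparsity:.2%}")
print(f"mean ratings per user: {mean_per_user:.0f}")
```

Roughly 1.2% of the matrix is filled, which matches the "nearly 99%" empty figure. The mean of about 208 ratings per user, against a median of 93, is another sign of the skew: a small number of very heavy raters pull the average up.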
The approach: The winning team, BellKor, spent more than 2,000 combined hours poring over data to find the winning solution. The winning solution was a linear combination of 107 sets of predictions. Many of the algorithms involved either the nearest-neighbor method (k-NN) or latent factor models, such as SVD/factorization and Restricted Boltzmann Machines (RBMs).
The winning solution uses k-NN to predict the rating for a user, using both the Pearson-r correlation and cosine methods to compute the similarities, with corrections to remove item-specific and user-specific biases. Latent semantic models are also widely used in the winning solution.
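As a rough illustration of the ingredients named here (bias removal, Pearson and cosine similarity, and a k-nearest-neighbor prediction), here is a small item-based sketch in plain Python. It is not BellKor's code: the data, the function names, the simple mean-based bias estimates, and the choice of k=2 are all assumptions made for a toy example.

```python
import math
from collections import defaultdict


def baselines(ratings):
    """Global mean, per-user bias and per-item bias from {user: {item: rating}}."""
    all_r = [r for items in ratings.values() for r in items.values()]
    mu = sum(all_r) / len(all_r)
    devs = defaultdict(list)
    for items in ratings.values():
        for i, r in items.items():
            devs[i].append(r - mu)
    b_i = {i: sum(d) / len(d) for i, d in devs.items()}
    b_u = {u: sum(r - mu - b_i[i] for i, r in items.items()) / len(items)
           for u, items in ratings.items()}
    return mu, b_u, b_i


def cosine(x, y):
    """Cosine similarity of two {user: value} vectors over their common users."""
    common = x.keys() & y.keys()
    dot = sum(x[u] * y[u] for u in common)
    nx = math.sqrt(sum(x[u] ** 2 for u in common))
    ny = math.sqrt(sum(y[u] ** 2 for u in common))
    return dot / (nx * ny) if nx and ny else 0.0


def pearson(x, y):
    """Pearson correlation: cosine similarity after centering on common users."""
    common = x.keys() & y.keys()
    if len(common) < 2:
        return 0.0
    mx = sum(x[u] for u in common) / len(common)
    my = sum(y[u] for u in common) / len(common)
    return cosine({u: x[u] - mx for u in common}, {u: y[u] - my for u in common})


def predict(ratings, user, item, k=2, sim=pearson):
    """Baseline estimate plus a similarity-weighted average of neighbor residuals."""
    mu, b_u, b_i = baselines(ratings)
    res = defaultdict(dict)  # item -> {user: rating minus baseline}
    for u, items in ratings.items():
        for i, r in items.items():
            res[i][u] = r - (mu + b_u[u] + b_i[i])
    base = mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)
    scored = [(sim(res[item], res[j]), res[j][user])
              for j in ratings.get(user, {}) if j != item]
    top = sorted((s, r) for s, r in scored if s > 0)[-k:]  # k most similar items
    weight = sum(s for s, _ in top)
    estimate = base + (sum(s * r for s, r in top) / weight if weight else 0.0)
    return min(5.0, max(1.0, estimate))  # clamp to the one-to-five star scale


# Invented toy data: four users, three movies.
toy = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "B": 2, "C": 5},
    "dan": {"A": 5, "C": 1},
}
print(predict(toy, "dan", "B", sim=pearson))
print(predict(toy, "dan", "B", sim=cosine))
```

Similarities are computed on the residuals left after subtracting the biases, so a neighbor contributes only what the baseline could not already explain. With real data, k would be far larger than in this toy case.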
The BellKor team found it important to use a variety of models that compensated for each other's shortcomings. No one model alone could have gotten the BellKor team to the top of the competition. The combined set of models achieved an improvement of 8.43% over Cinematch, while the best model -- a hybrid of k-NN applied to output from RBMs -- improved the result by 6.43%. The biggest improvement by LSI methods was 5.1%, with the best pure k-NN model scoring below that. (K for the k-NN methods was in the range of 20 to 50.) The BellKor team also applied a number of heuristics to further improve the results.
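The idea of models compensating for each other's shortcomings can be shown with the smallest possible blend: two sets of predictions and one weight. This is a deliberately simplified sketch, not the team's 107-way combination. The data is invented, the function names are hypothetical, and the weight is constrained so the two coefficients sum to one.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted ratings."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def best_blend_weight(actual, pred_a, pred_b):
    """Weight w that minimizes the squared error of w*pred_a + (1-w)*pred_b.

    Setting the derivative of the squared error to zero gives
    w = sum((y - b) * (a - b)) / sum((a - b) ** 2).
    """
    num = sum((y - b) * (a - b) for y, a, b in zip(actual, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5  # identical models: any weight works


def blend(pred_a, pred_b, w):
    """Linear combination of two prediction lists."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


# Invented data: model A tends to predict too high, model B too low.
actual = [4.0, 2.0, 5.0, 3.0]
model_a = [4.5, 2.5, 5.5, 3.5]
model_b = [3.5, 1.5, 4.5, 2.5]

w = best_blend_weight(actual, model_a, model_b)
combined = blend(model_a, model_b, w)
print(f"weight on model A: {w:.2f}")
print(f"RMSE A: {rmse(actual, model_a):.3f}")
print(f"RMSE B: {rmse(actual, model_b):.3f}")
print(f"RMSE blend: {rmse(actual, combined):.3f}")
```

Because the two models err in opposite directions, the blend beats either one alone. In practice the weights would be fitted on held-out ratings rather than on the data used for scoring, since fitting and scoring on the same ratings overstates the gain.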
The BellKor team demonstrates a number of guidelines for building a winning solution to this kind of competition:
The final solution will be along the same lines, combining multiple models with heuristics. Contestants will probably reach the magic 10% mark in the next year or two.
ReadWriteWeb: Some people think the 10% mark can't be reached with algorithms alone, but that the "human" element is required. For example, ClerkDogs is a service that hires actual former video-store clerks to "create a database that is much richer and deeper than the collaborative filtering engines." It's a similar approach to that of Pandora, which has 50 employees who listen to and tag songs. How far do you think algorithms can go in making recommendations?
Satnam: Recommendation systems are not perfect. A number of elements go into making successful ones, including approach, the speed of computing results, heuristics, the exploration and exploitation of coefficients, and so on. But it has been shown in the real world that the more personalized you can make recommendations, the higher the click-through rate, the stickier the application, and the lower the bounce rate.
Using humans to form a rich database for recommendations may work for small applications, but it would probably be too expensive to scale. I don't see them competing against each other, human versus machine. Even with human/expert recommendations, one first needs to find a human/expert with tastes similar to those of the user, especially if you want to go after the long tail.
Comments
Thanks for the interview. It was good enough to convince me to buy the book :)
We've been working on a recommendation system for local search for a while now and have found that, while we can achieve good results, improvements come very slowly.
One of the challenges of recommendation systems is that you also need to filter based on how practical the recommendation is. For movies or RSS feeds this isn't a big deal, since everyone has access (for the most part). However, in local search, if a CF algorithm recommends a restaurant 1,000 miles away, it's pretty useless. We've had to build in additional logic to balance the practical nature of the recommendation with the pure mathematical accuracy.
I hope the book covers it, but I've been looking for better, standard frameworks for testing CF algorithms. I know I could build one (and I have), but as these systems get more popular it would be great to have a standard way to compare algorithms.
Great, thought-provoking interview - thanks.
Regarding the observation - "The key point about this particular recommendation engine is its strong use of an ontology, similar in concept to tags, to develop a common vocabulary for items and users." We apply our ontology in what seems like a somewhat similar way at Jinni, for movie and TV show recommendations. Very curious to try out NextBio!
It is important to remember that the problem the "Netflix Prize" is trying to solve is not synonymous with the "problem of how to make a good recommendation application."
Netflix has narrowly defined their problem as: Can we predict with 10% more "accuracy" how people will rate movies they have already seen?
But, an important part of recommendation discovery is helping people find things they don't already know about. Not to mention the fact that ratings are one of the most tedious and inaccurate methods of eliciting preference information from viewers.
Algorithm building by scientists is essential, but in no way will it alone help the recommendation world build better applications.
Human + computer hybrid systems are very powerful--the trick is how to make the most of the small amount of extremely valuable human input.
/Michael Papish
CEO, MediaUnbound
Another aspect of recommendation systems that we've been looking at is not just how much a person likes an 'item' but WHY they like it. While our focus is on businesses, primarily restaurants and local services, the concept applies to movies as well.
If you think about it, there are often movies you like for a very specific reason, e.g. you may like a certain actor or director and thus like a movie you ordinarily wouldn't. I see this all the time with my wife, who will watch a movie with Brad Pitt that she would never watch otherwise.
We (theSUGGESTR.com) plan on implementing some features soon to try to help us determine the WHY of a person's rating, which hopefully will lead us to better (and more) suggestions.
While I applaud Netflix for inspiring some great efforts in the machine learning community--and at an impressively low price, even if they do have to pay out the $1M--I still think everyone is solving the wrong problem.
For more details:
http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/
What a great book. We used its concepts a lot when we built http://www.dynamicalsoftware.com/cogenuity which is a challenge-based collective intelligence platform where organizations can direct some of the smartest brains on the web to solving their problems.