Earlier this week we posted a Guide to Recommender Systems, as part of our series on recommendation technologies. In this post we look at some of the challenges in building or deploying a recommender system. And yes, Napoleon Dynamite is one of them.
This week an event called Recked was held in Amsterdam, aimed at engineers interested in these systems. The event was hosted by Wakoopa and Strands (we've embedded the presentations below). In those presentations, there were some hints at the problems that these companies have to overcome to build an effective recommender system.
Perhaps the biggest issue facing recommender systems is that they need a lot of data to make recommendations effectively. It's no coincidence that the companies most identified with having excellent recommendations are those with a lot of consumer user data: Google, Amazon, Netflix, Last.fm. As illustrated in the slide below from Strands' presentation at Recked, a good recommender system first needs item data (from a catalog or other form), then it must capture and analyze user data (behavioral events), and then the magic algorithm does its work. The more item and user data a recommender system has to work with, the stronger the chances of getting good recommendations. But it can be a chicken-and-egg problem: to get good recommendations you need a lot of data, which means a lot of users, and to attract a lot of users you need good recommendations.

This issue was pointed out in ReadWriteWeb's comments by Paul Edmunds, CEO of 'intelligent recommendations' company Clicktorch. Paul commented that systems are usually "biased towards the old and have difficulty showing new".
An example of this was blogged by David Reinke of StyleHop, a resource and community for fashion enthusiasts. David noted that "past behavior [of users] is not a good tool because the trends are always changing" (emphasis ours). Clearly an algorithmic approach will find it difficult if not impossible to keep up with fashion trends. Most fashion-challenged people - I fall into that category - rely on trusted fashion-conscious friends and family to recommend new clothes to them.
David Reinke went on to say that "item recommendations don't work because there are simply too many product attributes in fashion and each attribute (think fit, price, color, style, fabric, brand, etc) has a different level of importance at different times for the same consumer." He did point out though that social recommenders may be able to 'solve' this problem.
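One partial mitigation for the "biased towards the old" problem Paul Edmunds describes is to let older behavior count for less. The sketch below is only an illustration under our own assumptions: the function name, the 30-day half-life and the sample numbers are hypothetical, and it does nothing to solve the harder problem of anticipating trends.

```python
def decayed_score(event_ages_days, half_life_days=30.0):
    """Sum of user events for one item, with each event discounted by its age.

    An event loses half its weight every `half_life_days`.
    The 30-day default is an arbitrary illustrative value, not a recommendation.
    """
    return sum(0.5 ** (age / half_life_days) for age in event_ages_days)


# Hypothetical data: an old item with three purchases 90 days ago,
# and a new item with a single purchase today.
old_item = decayed_score([90, 90, 90])
new_item = decayed_score([0])

# By raw counts the old item wins 3 to 1; with decay the new item ranks higher.
print(old_item < new_item)
```

The half-life becomes one more variable to tune: too short and the system forgets genuine long-term tastes, too long and it stays stuck in the past.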
Again suggested by Paul Edmunds, the issue here is that while today I have a particular intention when browsing e.g. Amazon - tomorrow I might have a different intention. A classic example is that one day I will be browsing Amazon for new books for myself, but the next day I'll be on Amazon searching for a birthday present for my sister (actually I got her a gift card, but that's beside the point).
On the topic of user preferences, recommender systems may also incorrectly label users - a la this classic Wall St Journal story from 2002, If TiVo Thinks You Are Gay, Here's How to Set It Straight.
In our post on the Netflix Prize, the $1 million prize offered by Netflix to any third party that delivers a collaborative filtering algorithm improving on Netflix's own recommendations algorithm by 10%, we noted that there was an issue with eccentric movies: the type of movie that people either love or hate, such as Napoleon Dynamite. These types of items are difficult to make recommendations on, because the user reaction to them tends to be diverse and unpredictable.
Music is full of these items. Would you have guessed that this author is a fan of both Metallica and The Carpenters? I doubt Last.fm would make that recommendation.
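One simple way to at least spot love-it-or-hate-it items is to look at how spread out their ratings are, not just at the average. The following is a rough sketch under our own assumptions (the thresholds and the sample ratings are made up for illustration, and this is not how Netflix or Last.fm do it):

```python
from statistics import mean, pstdev


def is_polarizing(ratings, min_ratings=5, spread_threshold=1.5):
    """Flag an item whose ratings are widely spread around the average.

    `min_ratings` and `spread_threshold` are hypothetical tuning values
    for a 1-5 star scale; a real system would need to calibrate them.
    """
    if len(ratings) < min_ratings:
        return False  # too little data to say anything
    return pstdev(ratings) >= spread_threshold


# Made-up ratings on a 1-5 scale.
love_or_hate = [1, 5, 1, 5, 5, 1, 5, 1]
consensus = [3, 3, 4, 3, 3, 2, 3, 3]

# Both items have the same average rating, so the mean alone cannot
# tell them apart; only the spread differs.
print(mean(love_or_hate) == mean(consensus))
print(is_polarizing(love_or_hate), is_polarizing(consensus))
```

Flagging such items does not make them predictable, but it lets a system treat them differently, for example by recommending them only when it has strong evidence about a particular user.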
We're stating the obvious here, but the below slide from Strands' presentation at Recked illustrates that it takes a lot of variables to do even the simplest recommendations (and we imagine the below variables only scratch the surface):

So far only a handful of companies have really gotten recommendations to a high level of user satisfaction - Amazon, Netflix (although of course they are looking for a 10% improvement on their algorithm) and Google are some names that spring to mind. But for those select few success stories, there are hundreds of other websites and apps that are still struggling to find the magic formula for recommending new products or content to their users. Indeed we at ReadWriteWeb would love to get readers clicking around our site more to discover other content, and we've tried several plug-ins and methods to achieve this - but we're not satisfied yet.
There are many other issues that can happen with recommender systems - some offer up too many 'lowest common denominator' recommendations, some don't support The Long Tail enough and just recommend obvious items, outliers can be a problem, and so on.
Let us know your thoughts and what other problems recommendation technologies face as they continue to ramp up.
Also, we've embedded the three presentations (Strands, Wakoopa and Reccoon) from Recked below:
Comments
Couldn't agree more about the lack-of-data point. Very few free data sources exist with large quantities of useful data to process. Many of these sites require the user to do a good amount of work before the ratings become accurate. This is particularly hard for newer / smaller startups.
I also think that the point about unpredictable items is a big one. It is made more problematic in that most recommender systems work just on the raw data - they rarely look at the WHY behind the rating. Take Netflix: it knows you love a movie, but I don't believe its algorithm tries to figure out WHY you love it - is it the actor, the genre, the producer, the score? I think Netflix may use some of this information, but most sites don't go beyond the user, item, rating data points.
Another area that I see for improvement is the actual algorithms themselves. From what I've seen, most algorithms don't have the concept of a confidence level, and those that do use it only as a threshold. Basically, when the algorithm computes the similarity of two items, it rarely factors in how many common raters the two items have. So if item 1 and item 2 have 2 users who rated both a 5, they will have a higher correlation than item 3 and item 4, which have 20 users in common, if any of those ratings are not the same. Now implementing a threshold is easy, but weighting based on confidence isn't commonly done (at least in the algorithms I've found in the past).
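The confidence weighting described in the comment above can be sketched roughly as follows. This is a minimal illustration, not any particular product's algorithm: it assumes cosine similarity over co-rated users as the raw measure, and the `shrinkage` constant and sample ratings are hypothetical.

```python
from math import sqrt


def cosine(xs, ys):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0


def item_similarity(ratings_a, ratings_b, shrinkage=0):
    """Similarity of two items based on the users who rated both.

    ratings_a / ratings_b map user id -> rating.
    With shrinkage > 0 the raw similarity is scaled by n / (n + shrinkage),
    where n is the number of common raters, so pairs supported by few
    users are pulled toward 0. The constant is an assumed tuning value.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n == 0:
        return 0.0
    raw = cosine([ratings_a[u] for u in common], [ratings_b[u] for u in common])
    return raw * n / (n + shrinkage)


# Items 1 and 2: only two common raters, both gave 5 to both items.
item1 = {"u1": 5, "u2": 5}
item2 = {"u1": 5, "u2": 5}

# Items 3 and 4: twenty common raters, one rating differs.
item3 = {f"u{i}": 5 for i in range(20)}
item4 = dict(item3, u0=4)

# Unweighted, the two-rater pair looks more similar.
print(item_similarity(item1, item2) > item_similarity(item3, item4))
# With shrinkage, the twenty-rater pair comes out ahead.
print(item_similarity(item1, item2, 10) < item_similarity(item3, item4, 10))
```

Unlike a hard threshold, the scaling factor never discards a pair outright; it just makes the system trust thinly supported similarities less.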
1. recommender systems typically don't work as well for people whose tastes fall out of the mainstream. partly that's due purely to numbers. by definition, "out of the mainstream" means there are fewer people in that category. but also, recommender systems don't really work for people with "eclectic" tastes. it's easy to recommend something to someone with narrow tastes. but people with broad horizons are disappointed by recommendation systems (see point 2)
2. the goal of a recommendation varies wildly by product. case in point...
- when i want a recommendation for an electric razor, i want something VERY SIMILAR to what I've been searching on, just BETTER
- when i want a recommendation for a MOVIE or MUSIC, i may or may not want a recommendation similar to what I've been searching on. sometimes i want something TOTALLY DIFFERENT from what i've been listening to. eg. i get tired of Pandora because it always plays the same stuff. itunes GENIUS system recommends the same stuff over and over.
human tastes and desires are transient, ephemeral, unpredictable, chaotic, fractal. the more sophisticated and eclectic the taste, the harder to match.
With our recommendation engine (in our "Gift Lists" system) we recognized the issues involved and decided not to try and reverse-engineer the human psyche.
Even if you like Metallica and The Carpenters we provide results in line with your tastes (as a bonus, we strip out things you already have).
Obviously it's not perfect, but we've asked a lot of people how good it is, and so far it's pretty dang spot on.
The problems kayvaan brings up of "sophisticated" and "eclectic" taste are not that hard to overcome. At StyleHop.com we let users identify their fashion peers and show them the top ranked styles by that group. A woman in suburban Des Moines may want to see the top ranked styles of women in her neighborhood. When her mood is different, she may not want to look like anyone else on her block and select women living in the East Village as her fashion peers. Whether eclectic or mainstream we surface a mix of products that meets her expectations.
For that small proportion of folks who truly have a style (or musical taste) all their own, they probably don't even like the premise of recommender systems. In my experience this small percentage of the population views the search process as creative expression, and tools that automate that expression are not so interesting to them.
Richard regarding your last comment "Indeed we at ReadWriteWeb would love to get readers clicking around our site more to discover other content, and we've tried several plug-ins and methods to achieve this - but we're not satisfied yet."
i suggest you guys try www.outbrain.com i am sure you won't regret it (disclosure i am an investor)
David;
Why do you put sophisticated and eclectic in "quotes"? ;)
I like the idea of search as creative expression.
Anyway, you say that the problem of eclectic taste is not hard to overcome. I would argue that also depends on the product category. I personally think music is the most difficult category, partially because musical styles vary so incredibly. I don't think I necessarily have a musical taste that's way out there -- it's just that it's super broad.
I like almost every genre, as long as the music is good (well-played, emotive, original, etc.). There's not a recommendation engine out there that I've found that can consistently recommend music that's good but also varied.
If I enter music of one genre, the recommendations are within the genre or in nearby genres. The engine won't recommend bluegrass if I enter a hip-hop search.
Yet I love bluegrass and hip-hop and would love recommendations in either genre.
When it comes down to it, I think what I'm saying is that recommendation engines fail not when a person has a taste "all their own" but rather when a person doesn't have a genre-defined "taste".
So - I like music of any genre that's well-played, emotive, original. What do YOU recommend? ;)
I read an excellent post on the subject of data vs. algorithms; I can only "recommend" it: http://anand.typepad.com/datawocky/2008/03/more-data-usual.html
Hi,
There are some problems with recommender systems that are hard to solve generally (eclectic tastes).
The bootstrapping problem, however, arises mostly because recommenders use purely external relations to the items and don't try to analyze/understand the actual content - for example movie, text or music.
For example: systems based on views & likes have trouble recommending "new content" - they can only observe external relationships, which at the time of content creation don't yet exist.
However, when you have enough 'rating' data (likes, views, anything), the algorithms for recommendations are pretty standardized across the industry (take a look at Taste - http://taste.sourceforge.net/ if you want to play with your own ideas - a full Java framework for development of recommender systems, with many of the algorithms already in there).
@richard: you might want to try Zemanta's related articles feature - there is a MovableType plug-in, but more importantly the API can be used to do any kind of placement you want. You can set up the feeds you want recommendations from, so links point to your properties and the news sources that you cherish enough.
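To make the contrast concrete: a content-based approach can score a brand-new item from its own attributes, before anyone has viewed or liked it. The sketch below is only an illustration under simple assumptions. It uses Jaccard overlap of hand-assigned tags, and the item names and tags are hypothetical; real content analysis of a movie, text or song is far harder.

```python
def jaccard(a, b):
    """Overlap between two tag sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_new_items(liked_tags, new_catalog, top_n=3):
    """Rank new items (no views or likes yet) by tag overlap with a liked item."""
    scored = [(jaccard(liked_tags, tags), name) for name, tags in new_catalog.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then name
    return [name for _, name in scored[:top_n]]


# Hypothetical tags for something the user already liked.
liked = {"comedy", "indie", "high-school"}

# Hypothetical new releases with no behavioral data at all.
new_catalog = {
    "new_a": {"comedy", "indie"},
    "new_b": {"action", "sequel"},
    "new_c": {"comedy", "romance", "indie", "high-school"},
}

print(similar_new_items(liked, new_catalog))  # ['new_c', 'new_a']
```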
Andraz Tori, Zemanta
Hey Richard,
Really nice post!
At mSpoke, one of the things we've focused on is eliminating the challenges around "Changing User Preferences."
Basically, most systems require you to go back and change the actual data behind the recommendations (i.e. Amazon lets you remove items from your history) or give it additional new attention data (i.e. the TiVo example you referenced) as an input.
At mSpoke, our recommendation engine (http://www.mspoke.com/engine.html) actually exposes the preferences to the user so they can go back and adjust them directly. While we typically work with content, to extend your example: if the system learned you were interested in cooking gadgets from shopping for your wife, sister and mother, instead of having to make manual adjustments to all 3 you could just click an option (when seeing future recommendations) that said 'don't show me cooking gadgets.'
Also, one other comment in response to your comment "Indeed we at ReadWriteWeb would love to get readers clicking around our site more to discover other content, and we've tried several plug-ins and methods to achieve this - but we're not satisfied yet." You should actually look at the related content system NewsGator has built for publishers (http://www.newsgatorwidgets.com/relatedcontent.aspx). Full disclosure, some of our metadata at mSpoke is used in this product - but that said, I think they've built a system that really finally achieves this obvious use case for recommendation engines.
-- Sean Ammirati
CEO, mSpoke
---
Note: I do contribute to RWW, but to the extent you can segment perspectives ... this comment is from my perspective as CEO of mSpoke -- not a RWW contributor.
Hi Kayvaan.
You might enjoy one of my old posts titled "All Women Want to Dress the Same" (http://tinyurl.com/d4rrhc)
Essentially my argument is that very few women dress in a way that's unique and all their own. Would love your feedback if you have any.
I think for a recommendation system to be good at what it does, it's fine to ask the user to do some basic legwork up front. In the music example, give the recommenders a chance by telling them first whether you are in the mood for blues or punk, for instance. Then, based on your previously stated preferences, it's likely the algorithm can show you a range of things you might like.
Humans have to provide a level of specificity for computers to do their jobs. I don't think it's ever realistic for music algorithms to do cross-genre recos well. In my mind that would be like going to a kitchen recommender system and saying you really like these spoons and this coffee pot and this type of olive oil. Would we expect a recommender to be able to then say, "well you love this dishwasher"? I know it's a really silly example....my point is that recommender systems should focus on helping consumers search and sort more quickly. The problem on the web is too much choice and not enough filtering. Rather than build these tools for folks that like discovery, build them for folks looking for answers.
Richard,
First off, thank you for spawning one of the first informed discussions of recommender systems I've seen in the past few years. Your recognition of the massive investment Amazon has made to achieve their success is unique. Most people recognize their accomplishment as great, but few have dug into "how."
All of the points raised by you and the commenters above are also quite interesting--reminiscent of the halls of the Amazon Personalization team.
The grand map (mess!) of retail data have always been a challenge both in their shape (i.e., tons of power-curves) and their form (i.e., multitudes of semi-orthogonal dimensions).
I want to highlight three concepts which have not been addressed thus far and which are critical in differentiating high-quality and effective recommendations from the cruft. Not that the problem wasn't complex enough, but these are real--and are the reason that only a few companies have been able to break through the noise and recognize the full value of personalization.
1. The Consumer Experience
No one has commented thus far on this. Amazing, that just because we're talking about complex algorithms, we forget that the core consumer is well, a consumer. UI matters. A lot! Even simple things like explaining why a recommendation is made, or making recommendations navigable features instead of advertisements have been fundamentally ignored by most vendors.
2. Adaptive Algorithms
The data we look at are fundamental and critical, but they aren't sufficient. Even how we use the data is not sufficient. We must learn from Reinforcement Learning. Algorithms like collaborative filtering are a thing of the past. With the next generation of technologies, such as Ensemble Learning (which is what we use @ richrelevance), we can build systems that can address many of the problems highlighted above, for example identifying "eccentric" recommendations in real-time.
3. The Human Element--Intuition!
So many people treat recommendations like a pure science problem. Let's not forget that these are business tools that we're building--and we've got a great set of resources, the retailer's expertise, to leverage in making great recommendations. Intuition can never be replicated by a machine! At {rr}, we've invested thousands of man-hours in developing merchandising reports and controls.
I wrote a brief post on our blog for anyone interested...
Richard, thanks again for starting a great conversation. And to those of you out there building recommendation engines--keep up the great work :)
--Selly
The kind of recommender system I'd love to find is one where all registered members of a social network can submit "tips" and vote at regular intervals (say monthly, then weekly as the size of the community grows) on their top 10 favorites - new that month. Tips would be tagged and have macro-categories + members can self-tag. Contributors of most popular tips each month would win prizes and badges for their profile in the member directory and more visibility on the site. That way it would seem the cream of the content could rise to the top...... hopefully
Anyone seen a site like that or have insights into the difficulty of this kind of recommender option?
Thanks all for your comments, fascinating topic!
Also to the folks who suggested apps I should check out for RWW (Ouriel, Sean, Andraz), thanks I will look at them all.
Dave Sellinger said...
With the next generation of technologies, such as Ensemble Learning
Dave, research on the various topics related to recommender systems, including Ensemble Learning, is already quite advanced in the computing literature. An almost limitless range of algorithms and methodologies is available there, at implementers' fingertips.
New methodologies will continue to emerge and old ones will continue to improve, because all that matters is that retrieval accuracy increases. Regardless of whether it is Ensemble Learning, Support Vector, Multidimensional Reduction, Soft-Computing Hybrids, blah, blah, blah, the only thing that matters is which method is more accurate on the day compared to existing methods. No method can sustain superiority forever, because new methods are being published (or developed, if they're proprietary) and old ones are being abandoned. So we will see retrieval accuracy getting better over time.
No one has commented on two key features missing in most recommendation systems: transparency and control. No matter how well you think you can read my mind and guess what's in my heart, you're still guessing, and you'll never do as well as you could if you involved me in the process. But, in order for me to help you help me, you need to help me understand and guide what you are doing.
I blogged about this a few months ago:
http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/
Can't agree more, Daniel. Explanation, and feedback based on explanation, is an almost open commercial area of recommender systems, although scientific work has already been done on it. It is still a great idea for commercial application.
The problem, imo, is: are people aware enough yet to demand such an improvement?
@ Daniel (#15)
Transparency and control may not have been explicitly mentioned in this post, or in the comments, but those are two things that Amazon nailed quite a while ago.
Whenever I log into Amazon and check my recommendations, every product has a "Why is this recommended for you?" link, and I can control which of my recent purchases/things I reviewed/looked at/added to my wish list are used to make recommendations, and which aren't. Simple and very effective.
I think there are two broad types of recommendation systems: those that encourage specificity (eg, with the goal to get you to buy more by presenting you with things that exactly match your taste), and those which encourage serendipity. www.Everything2.com does a great job of the latter. Their wiki-like content is deep-linked, and the recommendations at the bottom of each article are based primarily on use paths through the site. It is incredibly effective at keeping users on the site. I think blogs and other sites that are more content- than task-oriented could benefit greatly from this type of recommendation system.
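Recommendations based on use paths can be approximated by counting which page visitors most often open next. The sketch below is a guess at the general idea, not a description of how Everything2 actually works; the function names and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for path in sessions:
        for current_page, next_page in zip(path, path[1:]):
            if current_page != next_page:  # ignore reloads of the same page
                transitions[current_page][next_page] += 1
    return transitions


def recommend_next(transitions, page, top_n=3):
    """Pages most often visited right after `page`, most frequent first."""
    counts = transitions.get(page, Counter())
    return [next_page for next_page, _ in counts.most_common(top_n)]


# Hypothetical click paths, one list per visit.
sessions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "c"],
    ["x", "b", "c"],
]

transitions = build_transitions(sessions)
print(recommend_next(transitions, "a"))  # ['b', 'c']
print(recommend_next(transitions, "b"))  # ['c', 'd']
```

Because the counts come from what readers actually did rather than from content similarity, this kind of list can surface unexpected but well-trodden connections, which is where the serendipity comes from.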
Richard,
I don't think Changing User Preferences is a recommendation problem; that's a search problem. Whether you buy a gift or a book depends on what you search for. A recommender system cannot predict your demand; all it can do is recommend books to you first if it knows that you love reading.
Or am I misunderstanding your idea, and your point is that recommendation across different fields (movies, music, wood, wine, clothes etc.) is still a challenge?