<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:,2009:/1/tag:www.readwriteweb.com,2009://1.13476-</id>
  <updated>2009-11-23T17:43:38Z</updated>
  <title>Comments for Netflix Prize: Will the $1 Million be Won in 2009?</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.23-en</generator>
  <entry>
    <id>tag:www.readwriteweb.com,2009://1.13476</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=13476" title="Netflix Prize: Will the $1 Million be Won in 2009?" />
    <published>2009-01-22T02:06:51Z</published>
    <updated>2009-01-22T18:24:18Z</updated>
    <title>Netflix Prize: Will the $1 Million be Won in 2009?</title>
    <summary>We&apos;re starting a new series here on ReadWriteWeb about recommendation engines. We identified recommendations as one of 5 trends to watch at the start of 2008; and that&apos;s even more so at the beginning of 2009. We also have a page dedicated to recommendation technologies in our stock presentation entitled What&apos;s Next on the Web?....</summary>
    <author>
      <name>Richard MacManus</name>
      <uri>http://www.readwriteweb.com</uri>
    </author>
    
    <category term="NYT" />
    
    <category term="Recommendation" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><img src="http://www.readwriteweb.com/images/netflix_prize09b.jpg" />We're starting a new series here on ReadWriteWeb about recommendation engines. We identified recommendations as one of <a href="http://www.readwriteweb.com/archives/toolkit-08.php">5 trends to watch</a> at the start of 2008; and that's even more so at the beginning of 2009. We also have a page dedicated to recommendation technologies in our stock presentation entitled <a href="http://www.slideshare.net/ricmac/readwriteweb-presentation-dec08-presentation">What's Next on the Web?</a>. In this post we look at <a href="http://www.netflix.com/">Netflix</a>; and in particular update you on the $1 million challenge that Netflix set in order to find 'the next big thing' in recommendations.</p>
]]>
      <![CDATA[<h2>Quick Refresher on Recommendation Technologies</h2>

<p>Before we check in with the Netflix Prize, let's refresh our knowledge of recommendations technology as it pertains to the Web. Basically the idea is that given a set of ratings for a particular user, along with those of the whole user base, a recommendation system will come up with new items that the user may like. Personalization is the driving force behind this, because the more new things a retailer or service can offer a user, the more chance the user will buy / like them.</p>

<p>In his influential post of 2 years ago, <a href="http://www.readwriteweb.com/archives/recommendation_engines.php">The Art, Science and Business of Recommendation Engines</a>, Alex Iskold suggested 4 approaches to recommendations:</p>
<ul>
  <li>Personalized recommendation - recommend things based on the individual's past behavior</li>
  <li>Social recommendation - recommend things based on the past behavior of similar users</li>
  <li>Item recommendation - recommend things based on the item itself</li>
  <li>A combination of the three approaches above</li>
</ul>
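<p>To make the "social recommendation" approach in the list above concrete, here is a minimal Python sketch of user-based collaborative filtering. It is only an illustration under our own assumptions: the users, titles and scores are made up, the function names <code>cosine</code> and <code>recommend</code> are hypothetical, and cosine similarity is just one of several similarity measures a real system might use.</p>

```python
import math

# Hypothetical ratings on a 1-5 scale; users, titles, and scores are made up.
ratings = {
    "ann":  {"Alien": 5, "Predator": 4, "Amelie": 1},
    "bob":  {"Alien": 4, "Predator": 5, "Blade Runner": 5},
    "cara": {"Amelie": 5, "Alien": 1, "Chocolat": 4},
}

def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a).intersection(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Rank items the user has not rated by the similarity-weighted ratings of others."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, score in their.items():
            if item in ratings[user]:
                continue  # skip items the user has already rated
            scores[item] = scores.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    ranked = {i: scores[i] / weights[i] for i in scores if weights[i] != 0}
    return sorted(ranked, key=ranked.get, reverse=True)

print(recommend("ann", ratings))
```

<p>In this toy data, "ann" rates much like "bob" and unlike "cara", so bob's unseen title is ranked ahead of cara's. A personalized or item-based approach would replace the similarity between users with one based on the individual's own history or on the items themselves.</p>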

<p>The two Internet companies that have been most prominent in using recommendations are Amazon.com and Netflix. Others, such as Google, have used it as well - but more as a background enabling technology. </p>

<h2>Netflix Prize</h2>
<p>The <a href="http://www.netflixprize.com/">Netflix Prize</a> is a competition that Netflix - the U.S. online movie rental service - began on October 2, 2006. Its aim is to &quot;substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences.&quot; Netflix put up a prize of $1,000,000 for a third party to come up with a collaborative filtering algorithm that improves on Netflix's own recommendations algorithm (called Cinematch) by at least 10%. The contest has been going for over 2 years now, with no grand prize winner yet. However, the latest <a href="http://www.netflixprize.com/leaderboard">leaderboard</a> shows that a group called <em>BellKor in BigChaos</em> is closing in on the magical 10% - as of writing they are at 9.63%. There are currently 7 competitors who have gone over 9%, the second best being PragmaticTheory with 9.46%.</p>
<p><img src="http://www.readwriteweb.com/images/netflix_prize_leaderboard_jan09.png" /></p>
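<p>The percentages on the leaderboard are improvements over Cinematch's prediction error. As far as we know, the contest measures that error as root mean squared error (RMSE) on held-out ratings. The short Python sketch below shows the arithmetic; the ratings and predictions in it are made-up numbers, and the helper names <code>rmse</code> and <code>improvement_pct</code> are our own, not part of any Netflix tooling.</p>

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual):
        raise ValueError("rating lists must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage by which candidate_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse

# Hypothetical held-out ratings and two sets of predictions (made-up numbers).
actual = [4, 3, 5, 1, 2]
baseline = [3.5, 3.5, 4.0, 2.5, 3.0]    # stands in for the existing system
candidate = [3.8, 3.2, 4.4, 2.0, 2.6]   # stands in for a challenger

print(round(improvement_pct(rmse(baseline, actual), rmse(candidate, actual)), 1))
```

<p>Under this measure a team "at 9.63%" has an error 9.63% lower than the baseline's, and the grand prize requires pushing that figure to 10%.</p>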
<p><a href="http://www.commendo.at/prize08/team.html">BellKor in BigChaos</a> is a partnership between a group of current and former AT&amp;T researchers (two of them still working at AT&amp;T Labs in New Jersey) and a company called Commendo Research from Austria. They were the recipients of Netflix's 2008 Progress Prize, with a 9.44% improvement over Netflix's Cinematch algorithm. Netflix is awarding a $50,000 progress prize every year until the 10% goal is met.</p>
<p>The New York Times had <a href="http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?_r=2&ref=magazine&pagewanted=all">an extensive profile</a> of the Netflix Prize in November. The piece notes that Netflix's current algorithm, Cinematch, was introduced in 2000 and has since gone on to be a driver for 60% of Netflix's rentals. What's more, it's also a boon for The Long Tail, because as NYT stated &quot;it also often steers a customer's attention away from big-grossing hits toward smaller, independent movies.&quot; 70% of what Netflix customers order comes from the long tail - &quot;older movies or small, independent ones.&quot; In 2006, Netflix noted that improvements in Cinematch's performance had plateaued. So it released data for third parties to try to come up with improvements to Netflix's own recommendation engine. As of November 2008, the data was made up of 17,770 movies with ratings by 480,189 users.</p>
<p>Netflix is also busy on other fronts to tap into 'the wisdom of the crowds' - in late September 2008 it <a href="http://www.readwriteweb.com/archives/netflix_api_launches_tomorrow.php">released its much anticipated API</a>, available at <a href="http://developer.netflix.com/">developer.netflix.com</a>. An example of the type of application this may encourage is <a href="http://feedflix.com/">Feedflix</a>, a third party app that we <a href="http://www.readwriteweb.com/archives/feedflix.php">profiled last April</a>. It offers a variety of useful data that may help Netflix users select better movies.</p>
<p><img src="http://www.readwriteweb.com/images/feedflix-stats.jpg" /></p>
<h2>Will the 10% Mark be Reached in 2009?</h2>
<p>It's difficult to say whether the $1M Netflix Prize will finally be won in 2009. On the positive side, the current leader is only 0.37 percentage points away from claiming the prize. The NYT article suggested that the top 10 on the leaderboard all use very similar mathematical theories (&quot;singular value decomposition&quot; being the main one) and that differences between the teams are merely &quot;tweaks&quot;. There's a sense, though, that reaching the magical 10% mark will require a <em>breakthrough</em>, rather than continued incremental improvements. The problem appears to be eccentric movies, the type that people either love or hate - such as Napoleon Dynamite. According to NYT, &quot;a small group of mainly independent movies represents more than half of the remaining errors in the way of winning the prize&quot;.</p>
<p><img src="http://www.readwriteweb.com/imgClerkDogs.gif" align="right" />Some people think that the 10% ceiling will not be reached using algorithms. <a href="http://www.clerkdogs.com/">ClerkDogs</a> is a service that <a href="http://www.readwriteweb.com/archives/clerk_dogs_movie_recommendations.php">we profiled in December</a> and its approach is to hire real-life former video store clerks to &quot;create a database that is much richer and deeper than the collaborative filtering engines.&quot; In other words, it's the <a href="http://www.reuters.com/article/pressRelease/idUS150722+09-Dec-2008+BW20081209">opposite principle</a> to what Netflix is trying to do with computer algorithms. Founder Stuart Skorman thinks that the Netflix algorithmic approach to matchmaking has reached a ceiling; and that the only thing left to do is bring humans into the equation. He says it's a similar approach to Pandora, which has 50 employees who listen to songs and tag them. Skorman knows a thing or two about the online movie rental industry, having founded Reel.com in the mid-90s and sold it 3 years later for $100 million to Hollywood Entertainment. </p>
<p><img src="http://www.readwriteweb.com/imgClerkDogRecommendation.jpg" /></p>
<p>We hope the prize is claimed this year, as a 10% increase in recommendation effectiveness on Netflix is a big improvement that will benefit consumers. But we also think the human element that ClerkDogs is advocating will be an essential piece of the puzzle going forward - expert human content is always the most valuable, although it generally costs more too. Perhaps Netflix will end up buying ClerkDogs? That would be an interesting mashup!</p>
<h2>ReadWriteWeb Resources for Recommendation Technologies</h2>
<p>We will be profiling other recommendation companies in upcoming posts. We also invite you to explore using our custom <strong>ReadWriteWeb Resources</strong>:</p>
<ul>
  <li><a href="http://www.readwriteweb.com/RWWRecommenderfeeds2.opml">RWW Recommendation Industry Feed Favorites OPML file</a> (save link)</li>
  <li><a href="http://feeds.feedburner.com/ReadwritewebRecommendationFeedFavs-Aiderssbest">RWW Recommendation Industry Feeds - Best of Feed</a> (copy and paste to your reader)</li>
  <li><a href="http://snipurl.com/1wcbs" target="_blank"><em>Click to preview the above feeds before subscribing</em></a> (pop-up window)</li>
  <li><a href="http://www.google.com/coop/cse?cx=000893276566003557773%3A5w3dmryrdru">RWW Recommendation Site Search</a> (Visit and Bookmark)</li>
</ul>]]>
    </content>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.13476-comment:124028</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.13476" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php#c124028" />
    <title>Comment from Falafulu Fisi on 2009-01-21</title>
    <author>
        <name>Falafulu Fisi</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Richard, </p>

<p>Three others and I (software developers, 2 from Wellington and 2 from Auckland) were going to register a NZ team to participate in the Netflix 2008 $1 mil competition, and I was to lead the team. A week after I had been alerted to it, I told the others that we had no chance at all, so we didn't register.</p>

<p>I was alerted to the competition by a Java architect from Wellington who wanted to form a NZ team. I knew various existing and new algorithms for online recommendation that have been published in the computing literature over the last few years, so I thought that I could just pick a superior one (ie, one with the lowest error rate) and develop it for the competition.</p>

<p>Since none of us do original research (however we do implement algorithms that are published in the literature), I told the other 3 that to participate in the competition we would have to invent something totally original (ie, never published before) and not just pick a superior algorithm from the literature and hope that other competitors wouldn't pick the same one.</p>

<p>I saw the Netflix competition being circulated on one of the DSP (digital signal processing) mailing lists that I am subscribed to, and that finalized my decision not to participate. All the members of this DSP list do original research (ie, they can invent new algorithms), so it wouldn't be a surprise if the majority of them registered. If we (the NZ team) did participate, we wouldn't have any chance at all against those DSP R&D participants (assuming they did participate).</p>

<p>I also saw the competition being advertised on the KDnuggets mailing list and website (a site for data-mining researchers), which confirmed to me that all the powerful brains in the world were going to get involved in this competition, so a small team from NZ wouldn't have a chance at all.</p>]]>
    </content>
    <published>2009-01-22T05:38:19Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.13476-comment:124075</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.13476" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php#c124075" />
    <title>Comment from Phoebe on 2009-01-22</title>
    <author>
        <name>Phoebe</name>
        <uri>http://www.jinni.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.jinni.com">
        <![CDATA[<p>Our approach at Jinni has also been that more data beyond people's ratings history is needed to really improve recommendations. (We wrote about this on our blog: <a href="http://blog.jinni.com/2008/11/if-i-like-this-why-will-i-love-that/" rel="nofollow">http://blog.jinni.com/2008/11/if-i-like-this-why-will-i-love-that/</a>.) We generate personalized recommendations based on our semantic catalog - analyzing the movie plots, moods, styles, etc. that each person enjoys. Try it out and see what you think - <a href="http://www.jinni.com" rel="nofollow">http://www.jinni.com</a>.</p>]]>
    </content>
    <published>2009-01-22T18:36:36Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.13476-comment:124098</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.13476" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php#c124098" />
    <title>Comment from Falafulu Fisi on 2009-01-22</title>
    <author>
        <name>Falafulu Fisi</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Phoebe said...<br />
<i>Our approach at Jinni has also been that more data beyond people's ratings history.</i></p>

<p>Academic/Industry researches has already covered this domain in the last couple of years or so and there is no doubt that it will improve over time (ie, getting more accurate - lower classification error). The majority of online recommendations today are based on a 2D data matrix (ie, rows and columns) of user by item-rating, such as the example below. The example has a database of 6 users who give ratings on 4 different movie items (perhaps Star-Wars, Alien, Predator and Indiana-Jones). The rating scale ranges from 1 to 5. User1 rates Star-Wars=2, Alien=3, Predator=1 and Indiana-Jones=2.</p>

<p>User1:  2,  3, 1, 2  <br />
User2:  5,  1, 2, 4<br />
User3:  4,  5, 4, 1<br />
User4:  1,  3, 5, 2<br />
User5:  2,  5, 5, 5<br />
User6:  5,  3, 1, 4</p>

<p>This data matrix is then run through different algorithms, for example one that reduces its dimension from [6 rows by 4 columns] to a lower one, such as [6 rows by 2 columns]. <a href="http://en.wikipedia.org/wiki/Data_clustering" rel="nofollow">Clustering</a> follows, to identify groupings of similar tastes from moviegoers' ratings. When one searches for a movie item in the database, the recommender system tries to locate the set of clusters nearest to the queried movie title. It then returns the items in those clusters to the user as the next best matches.</p>
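<p>The reduce-then-cluster steps above can be sketched in a few lines of Python with NumPy. This is only a rough illustration under several assumptions that go beyond the description: the reduction is done with a singular value decomposition of the mean-centred matrix, the clustering is plain k-means over the users, the number of clusters is fixed at 2, and the function names <code>reduce_dimensions</code> and <code>kmeans</code> are hypothetical. A production recommender would differ in all of these choices.</p>

```python
import numpy as np

# Ratings matrix from the example above: 6 users (rows) by 4 movies (columns).
ratings = np.array([
    [2, 3, 1, 2],
    [5, 1, 2, 4],
    [4, 5, 4, 1],
    [1, 3, 5, 2],
    [2, 5, 5, 5],
    [5, 3, 1, 4],
], dtype=float)

def reduce_dimensions(matrix, k):
    """Project mean-centred rows onto the top-k right singular vectors."""
    centred = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster label per row of points."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random (assumed initialisation).
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    return labels

reduced = reduce_dimensions(ratings, k=2)   # shape (6, 2)
labels = kmeans(reduced, k=2)
print(reduced.round(2))
print(labels)
```

<p>Each user ends up as a point in 2 dimensions with a cluster label. Finding the clusters nearest to a query then amounts to comparing the query's reduced coordinates with the cluster centres.</p>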

<p>There are tons of algorithms available today for doing recommendation; my example above describes just one way of doing it.</p>

<p>The next revolution will come from multi-dimensional dataset recommendation, ie, datasets that are more than just a 2D matrix (User-by-ItemRating). Datasets will be 3D, 4D, 5D and so forth. This is based on <a href="http://en.wikipedia.org/wiki/Tensor" rel="nofollow">Tensor Calculus</a> (a.k.a. multi-linear algebra). Tensors are not a new subject; they have been around for over 100 years, but it is only recently that researchers have realized their potential use for data analysis apart from their use in Physics/Engineering. Einstein used tensors to develop his General Theory of Relativity around 1913.</p>
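<p>As a rough illustration of what "more than 2D" means in practice, the Python sketch below builds a small 3D array of users by movies by tags and extracts low-dimensional factors for each axis by unfolding the array into a matrix and taking its leading singular vectors. That is one step of a higher-order SVD, which is only one of several tensor methods. Everything here is assumed for illustration: the data is random, the third axis being "tags" is a guess at what an extra dimension might hold, and the names <code>unfold</code> and <code>mode_factors</code> are hypothetical.</p>

```python
import numpy as np

# Hypothetical 3D data: 4 users x 3 movies x 2 tags, filled with made-up counts.
rng = np.random.default_rng(0)
tensor = rng.integers(0, 3, size=(4, 3, 2)).astype(float)

def unfold(t, mode):
    """Flatten a tensor into a matrix whose rows index the chosen mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_factors(t, mode, k):
    """Top-k left singular vectors of the mode unfolding (one HOSVD factor)."""
    u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
    return u[:, :k]

user_factors = mode_factors(tensor, mode=0, k=2)    # shape (4, 2)
movie_factors = mode_factors(tensor, mode=1, k=2)   # shape (3, 2)
tag_factors = mode_factors(tensor, mode=2, k=2)     # shape (2, 2)
print(user_factors.shape, movie_factors.shape, tag_factors.shape)
```

<p>With a 2D matrix there are only two such factor sets (users and items); each extra dimension of data adds another.</p>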

<p>Here is a paper in which tensor algorithms are used for recommendations. It appeared in <i>Proceedings of the 2008 ACM conference on Recommender systems</i>, with the title:</p>

<p><a href="http://portal.acm.org/citation.cfm?id=1454008.1454017" rel="nofollow">Tag recommendations based on tensor dimensionality reduction</a></p>

<p>I went to the Jinni website just to have a look, and the description of their system doesn't seem to contain anything new that is not already covered in the literature. It may be new to others who don't follow the peer-reviewed computing literature, but for those of us who do follow it, there is nothing ground-breaking in what Jinni is doing. I am not saying that Jinni can't be up there and competitive/innovative, and good luck, but sometimes I am alarmed at how easily VCs can be mesmerized by the good messaging words companies use to describe/promote their products. It is new to VCs, but it's old hat to those of us who follow the trend in academic/industry technology research and publications.</p>

<p>By the way Phoebe, here is the full table of contents for the <a href="http://portal.acm.org/toc.cfm?id=1454008&type=proceeding&coll=GUIDE&dl=GUIDE&CFID=19282164&CFTOKEN=36019957" rel="nofollow">Proceedings of the 2008 ACM conference on Recommender systems</a>; just pass the link on to your chief technology officer to see if he/she finds something interesting in there to implement. Only the abstract of each article can be viewed online, but the full paper can be purchased online from ACM. The authors of each paper have their email contacts available there, so they can be contacted with queries about their algorithms, whether implementation issues or requests for clarification of their algorithm derivations.</p>

<p>Authors are always keen to help someone who is interested in implementing their published algorithms if questions are being forwarded to them and this has always been my experience (authors are generous).<br />
</p>]]>
    </content>
    <published>2009-01-22T20:50:01Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.13476-comment:124128</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.13476" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php#c124128" />
    <title>Comment from Andraz Tori on 2009-01-22</title>
    <author>
        <name>Andraz Tori</name>
        <uri>http://www.zemanta.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.zemanta.com">
        <![CDATA[<p>@falafulu Fisi<br />
Yeah, new methods are exciting, but VCs are not so naive as you present them. Among other things ours hired an academic from the field to evaluate our knowledge and our algorithms before investing.</p>

<p>And also VCs don't always need to invest in the 'perfect' technology. They invest in technology that does the job + business plan + people capable of executing.</p>

<p>bye<br />
Andraz Tori, Zemanta<br />
</p>]]>
    </content>
    <published>2009-01-23T00:24:26Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2009://1.13476-comment:124174</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2009://1.13476" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/netflix_prize_2009.php#c124174" />
    <title>Comment from Phoebe on 2009-01-23</title>
    <author>
        <name>Phoebe</name>
        <uri>http://www.jinni.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://www.jinni.com">
        <![CDATA[<p>Falafulu Fisi, thanks for your comments. To clarify, our recommendations are based on semantic information and not solely on ratings. In describing what "Academic/Industry researches has already covered in the last couple of years," you refer to ratings-based collaborative filtering methods that are different from semantic approaches like ours. The Jinni team includes world-class scientists who are engaged with new developments in their fields.</p>

<p>Semantic approaches have several advantages compared to collaborative filtering, including eliminating cold start problems (ie, needing a large number of users and rating data before quality recommendations can be generated), mostly avoiding irrelevancy, and offering a better discovery tool for long tail content. Our approach is also a basis for meaning-based search as well as recommendations. </p>

<p><i>"Yeah, new methods are exciting, but VCs are not so naive as you present them."</i><br />
Thank you, Andraz Tori, I think you are right.</p>]]>
    </content>
    <published>2009-01-23T13:50:06Z</published>
  </entry>

</feed>