Written by Guest Blogger Emre Sokullu and edited by Richard MacManus.

Personalized Content is one of the two most popular approaches in next generation news sites - the other is Power of Masses, which we will cover in a future post. The leading examples of these approaches are reddit for Personalized Content and digg for Power of Masses. In this article, we will cover the personalized content approach and in particular reddit. We will describe the technical details and compare existing personalized content solutions.
First, a brief technical explanation: the Personalized Content approach uses a technique very similar to that of spam detection software. The idea is that everyone has their own pattern of reading. To recognize your pattern, Personalized Content services omit stopwords (common words such as "the" and "of") and extract keywords from the news you read, then use Bayesian statistical analysis to predict what kind of news you will like or dislike in future.
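To make the spam-filter analogy concrete, here is a minimal Python sketch of the general technique: a naive Bayes classifier that drops stopwords, counts keywords from stories a reader liked or disliked, and predicts the label of a new story. It is an illustration under stated assumptions, not reddit's (or anyone else's) actual code. The stopword list, the `NaiveBayesReader` class, the like/dislike labels and the add-one smoothing are all choices made for this example.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption); real systems use far larger lists.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "was", "with", "at", "by", "from", "this", "that"}

LABELS = ("like", "dislike")


def extract_keywords(text):
    """Lowercase the text, split it into words and drop the stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


class NaiveBayesReader:
    """Learns one reader's pattern from stories they marked 'like' or 'dislike'."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.story_counts = Counter()

    def train(self, text, label):
        """Record the keywords of a story the reader rated."""
        self.word_counts[label].update(extract_keywords(text))
        self.story_counts[label] += 1

    def log_score(self, text, label):
        """log P(label) + sum of log P(word | label), with add-one smoothing."""
        total_stories = sum(self.story_counts.values())
        # Smoothed prior, so an untrained label does not produce log(0).
        score = math.log((self.story_counts[label] + 1) / (total_stories + len(LABELS)))
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        # One extra vocabulary slot for words never seen in training.
        vocab_size = len(vocab) + 1
        total_words = sum(self.word_counts[label].values())
        for word in extract_keywords(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def predict(self, text):
        """Return whichever label has the higher posterior score."""
        return max(LABELS, key=lambda label: self.log_score(text, label))


reader = NaiveBayesReader()
reader.train("Python web framework release adds async support", "like")
reader.train("New Lisp compiler benchmarks published", "like")
reader.train("Celebrity gossip from the awards show", "dislike")
reader.train("Awards show fashion and celebrity photos", "dislike")

print(reader.predict("Python compiler release notes"))  # like
print(reader.predict("Celebrity awards party photos"))  # dislike
```

Working in log-probabilities avoids floating-point underflow when a story has many keywords, and add-one (Laplace) smoothing keeps a single unseen word from driving a score to zero.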
Reddit, backed by Paul Graham's Y Combinator startup program, is the leading player in this field - and has put a lot of effort into having the best algorithms. Reddit has tried out two languages to achieve optimal results. They started with Lisp, which is known as a very suitable programming language for artificial intelligence and natural language processing applications. But then they turned to Python, a more widely used language in the Web 2.0 world.
However, as the dharmesh.com site explains in detail, many users still complain about not receiving relevant news recommendations. This may be a bad sign, because it suggests that their pattern recognition technology doesn't work in some cases, even within a limited span of news. Nevertheless, reddit appears to be on the right path - the latest code changes received positive signals from their community.
But competition is heating up for Reddit. For instance, an Israeli startup called Spotback targets a wider audience and offers a more attractive, Digg-like user interface. Their job is harder, though, as they're covering a greater span of news. See TechCrunch's recent review of Spotback for more details.
Some sites are taking a wider approach to personalized news. Instead of personalizing the news flowing just within their own site (as Reddit does), they try to personalize external RSS feeds. As a result, their algorithms have to cover much more ground, because in theory this means they can personalize news sites, blogs and more. A pioneering company in this area was SearchFox, which Yahoo acquired almost immediately, in January. SearchFox enabled you to personalize your RSS feeds. Indeed, its flexibility may allow Yahoo to integrate this technology into every corner of their network.
Personalized Start Pages (like Netvibes and Pageflakes) are also in this space, because feed filtering can be a differentiating factor for them. Imagine a start page full of your favourite widgets, RSS feeds and tools - but instead of seeing all the news flowing from your favourite sites, you see only a smaller, filtered set of relevant news items. However, we have yet to see a working, satisfactory prototype of this.
Greece-based Feeds2.0 and San Jose-based LeapTag (which launched at the latest DemoFall) are tackling the same "machine learning" problem of personalized news from different perspectives. Feeds2.0 is doing exactly what SearchFox did: filtering RSS feeds. LeapTag is still in private beta and does link recommendation via its downloadable browser plug-ins.

Feeds 2.0 process
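Filtering external feeds, as SearchFox and Feeds2.0 are described as doing (and as a filtering start page might), can be sketched as scoring each RSS item against a reader's interest profile and keeping only the items above a threshold. This minimal sketch assumes a hand-written keyword-weight profile; a real service would presumably learn those weights from the reader's behaviour. The sample feed, the `PROFILE` weights, the threshold and the function names are hypothetical, and this is not a description of how any of the services mentioned work internally.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Tech News</title>
    <item>
      <title>Python 2.5 released</title>
      <description>New language features for Python developers</description>
    </item>
    <item>
      <title>Celebrity phone review</title>
      <description>A gossip round-up from the launch party</description>
    </item>
    <item>
      <title>Ajax start pages compared</title>
      <description>Netvibes and Pageflakes add RSS widgets</description>
    </item>
  </channel>
</rss>"""

# Hypothetical interest profile: keyword -> weight. Positive weights mark
# topics the reader likes, negative weights topics they want filtered out.
PROFILE = {"python": 2.0, "rss": 1.5, "ajax": 1.0, "pages": 0.5,
           "celebrity": -2.0, "gossip": -1.5}


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_item(title, description, profile):
    """Sum the profile weights of every word in the item's title and description."""
    return sum(profile.get(w, 0.0) for w in words(title + " " + description))


def filter_feed(xml_text, profile, threshold=1.0):
    """Return the titles of items scoring at least `threshold`, best first."""
    root = ET.fromstring(xml_text)
    kept = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        score = score_item(title, description, profile)
        if score >= threshold:
            kept.append((score, title))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in kept]


print(filter_feed(SAMPLE_FEED, PROFILE))
# ['Python 2.5 released', 'Ajax start pages compared']
```

The threshold is the knob that decides how aggressively the start page trims the flow: raise it and fewer, more relevant items survive.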
Let's also not forget one of the longest-running personalized news sites of this era - Findory. It aims to be a personalized newspaper for the Web. Findory creator Greg Linden is an insightful commentator on personalized news, and he says it is a technically challenging space. As he noted at the time SearchFox was acquired:
"Building scalable personalization systems is hard. Techniques that work fine on toy problems completely break down at scale. The systems have to be designed from the start to do fast recommendations in real-time for hundreds of thousands of users."

Findory process
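Not every service relies on keyword statistics. In the comments on this post, Greg Linden explains that Findory uses a form of social filtering: readers implicitly share the articles they read, and each reader sees what is popular among people who seem to share their interests. The sketch below is a generic user-based collaborative filter in that spirit, not Findory's actual algorithm. The reader names, article IDs and the cosine-over-sets similarity are assumptions for illustration.

```python
import math

# Hypothetical reading histories: reader -> set of article IDs they read.
HISTORY = {
    "ann": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "carol": {"a5", "a6"},
    "dave": {"a2", "a3", "a4", "a7"},
}


def similarity(a, b):
    """Cosine similarity between two readers' sets of read articles."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(reader, history, top_n=3):
    """Rank articles the reader hasn't seen by how much similar readers read them."""
    mine = history[reader]
    scores = {}
    for other, theirs in history.items():
        if other == reader:
            continue
        sim = similarity(mine, theirs)
        if sim == 0.0:
            continue  # readers with no overlap contribute nothing
        for article in theirs - mine:
            scores[article] = scores.get(article, 0.0) + sim
    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores, key=lambda article: (-scores[article], article))
    return ranked[:top_n]


print(recommend("ann", HISTORY))    # ['a4', 'a7']
print(recommend("carol", HISTORY))  # []
```

Note that `recommend` compares the reader against every other reader on each request. That is fine for a toy problem, but it is the kind of approach Linden warns breaks down when recommendations must be computed in real time for hundreds of thousands of users.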
The graph below shows the current Alexa traffic for the following personalized news sites: reddit, Spotback, Feeds2.0 and Findory. Each of these sites has a slightly different focus; nevertheless, it is clear that reddit has the most traffic.

The next graph shows that reddit, the leader in personalized content, is far behind Digg (the leader of the Power of Masses approach). Therefore, we can say that personalized content still has a long way to go.

Our guess is that personalized content will become a more popular paradigm in about 1 to 2 years, provided of course that the technical challenges can be overcome. That is by no means certain, since a lot of smart developers think that personalized content is a huge challenge.
Personalized news has a couple of main attractions. In theory, if your news is personalized, it's not as vulnerable to gaming as the power of masses approach. Plus, people are getting busier every day, so personalized news has strong appeal as a potential solution to information overload.
We're not sure who will end up being the key player in this space - maybe a giant like Google, maybe an existing startup like reddit, or maybe a whole new startup. But one thing we're sure of: the current personalized news services still need more work and the technical issues around personalizing content are far from solved.
Listed below are links to blogs that reference this entry: Personalized News: A Market Overview.
There is a good high-level overview article by Emre Sokullu and Richard MacManus from Read/Write Web on Personalized News. From the article: "Personalized Content is one of the two most popular approaches in next generation news sites - the other is Po...
This past week we had some excellent posts and discussions on Read/WriteWeb. In fact I was literally exhausted by the end of the week and it's taken me a couple of days to recover from all the goings on :-)...
Comments
Best article I've ever read :) :P
Posted by: Emre Sokullu | October 18, 2006 2:16 PM
How reliable is Alexa for judging things the way the article does?
Posted by: Juha | October 18, 2006 6:33 PM
Yes, the usual 'your mileage may vary' disclaimer about web stats applies. I think in this case it's the high-level trends that are worth noting: reddit's huge lead over the other PN sites, and digg's huge lead over reddit. Drilling down further than that with Alexa data is always dangerous ;-)
Posted by: Richard MacManus | October 18, 2006 6:41 PM
Juha: I think Alexa is pretty trustworthy here. It is bullshit beyond the top 10K, but it gives good results for top-tier sites like Digg and reddit. Moreover, these two have similar audiences, so we're not comparing apples with oranges here. We could even say the difference between them should be more in Digg's favour, because with its third release Digg is no longer only a geek site but more mainstream.
Posted by: Emre Sokullu | October 18, 2006 7:11 PM
No, Alexa is useless for anything concerning Digg, and these stats are massively over-estimating its size. Look at the sudden jump in Digg's ranking in April - that was caused by a bunch of Digg users downloading the Alexa toolbar. If you ignore the jump caused by this event, you see that Digg is still bigger, but not by such a startling degree.
Posted by: Pete Cashmore | October 18, 2006 8:21 PM
Pete, thanks for explaining the background of this jump.
But it's still the best tool for measuring web traffic, and many organizations, including top-tier VC firms, rely on it. What else do we have to hand? How can this be solved? There's no way, it seems.
That jump may be a special case for Digg alone; no other big site has such an attached community. Adidas.com can't advertise Alexa or ask its visitors to use it, and Google would never advertise the Alexa Toolbar instead of the Google Toolbar. But on Digg, a story like that can make it to the front page. So Digg is an exception, but yes, it seems it was a bad idea to use it here.
Posted by: Emre Sokullu | October 18, 2006 8:47 PM
I think that the key to using Alexa correctly is to compare things. In other words, it's a good RELATIVE measure. By itself, the number for a site is not as meaningful.
I also think that page views metric is best, as it is not perturbed like rank.
Alex
Posted by: Alex Iskold | October 18, 2006 8:56 PM
By the way, I should confess: I love Spotback's slogan. I find it very effective; it's short but explains the idea well: "Stop reading irrelevant news"
Posted by: Emre Sokullu | October 18, 2006 9:43 PM
Pete, if you look at Google Trends you'll see a similar trend:
http://www.google.com/trends?q=digg%2C+reddit
Re traffic: based on reported figures, digg does 8 million monthly uniques (ref: http://www.federatedmedia.net/authors/digg)
The same source (FM) says that Reddit does 800,000 monthly uniques (ref: http://www.federatedmedia.net/authors/reddit)
So 10 times the traffic... which is pretty much what Alexa says. Hmmmm.
Posted by: Richard MacManus | October 18, 2006 10:03 PM
For my news feeds, I use dotso.com
You can't customize it at the moment, but it gives all the main syndicates I am interested in.
Posted by: Bjorn Tarper | October 18, 2006 10:47 PM
Congratulations on your first blog entry here ;)
Posted by: Honor Gunday | October 18, 2006 11:20 PM
I thought I'd write in French to make it a bit international... now if some Japanese bloggers trackback this post too, it's complete ;)
By the way, your own blog turned out nice too, and it's great that you're using grou.ps. 10/10.
The correct link for "feeds 2.0" is www.feeds2.com, and they are currently in private beta.
Posted by: George S. | October 19, 2006 3:21 AM
Thanks George, fixed that URL.
Posted by: Richard MacManus | October 19, 2006 3:26 AM
Excellent article! But I think you're forgetting one important player: myFeedz, the social newspaper.
Posted by: Marius | October 19, 2006 6:13 AM
http://myfeedz.com/
It was launched as a private beta in March 2006 by the Romanian start-up InterAKT Online. The company has since been acquired by Adobe, and myFeedz will hopefully be featured on Adobe Labs, as you can read here:
http://www.interaktonline.com/FAQ/#Q5
myFeedz has been in public beta since August 15, and aims to offer personalized news, indexed via RSS feeds and automatically tagged by its engine.
You can read about its development on the myFeedz blog: http://www.interaktonline.com/FAQ/#Q5
The fact that Adobe decided to continue this technology on Labs should spark some interest in the application (which can and should stand up to all the competitors you mentioned).
The application also offers a pretty solid API, making it perfect for integration with popular blogging engines.
If you want to see the future of personal news, go here:
http://neurokinetikz.com
Posted by: neurokinetikz | October 19, 2006 6:27 AM
Wow. This is one of the best articles I have read. Good comparison, realistic facts and figures.
Posted by: managed dedicated server | October 19, 2006 7:19 AM
Great article. You left out two sites I think are very important. The first is Clipmarks (www.clipmarks.com) and the other is Newsvine.
Posted by: Rob | October 19, 2006 7:03 PM
Rob, Newsvine is just power of masses, similar to Digg and Netscape. Clipmarks doesn't seem to be personalized content either.
myFeedz does seem to be in that field, though. I would have liked to cover it, but I wasn't aware of it. Its popularity is still low and the website seems very slow, but I wish them the best of luck too. Someone should tackle this problem; I don't want to see irrelevant news on my start page anymore. I'm sure I'm not alone in this.
Posted by: Emre Sokullu | October 19, 2006 9:59 PM
If I may do a bit of shameless self-promotion - we have been talking about the need for a personal experience of the web for a little while now, and actively working to solve the problem. You can find a post I made about it recently here:
http://www.touchstonelive.com/blog/2006/08/people-powered-news-done.html
Also I am hoping that all the services mentioned will collaborate around a standard called APML which, like OPML, will allow users to export their 'personal profile' and migrate it from service to service.
You can find out more about APML at www.apml.org
Posted by: Chris Saad | October 19, 2006 10:15 PM
Chris, I'm a big OPML supporter too. Check out the open source PHP OPML APIs we released at Grou.ps: http://trac.grou.ps/wiki/OPMLWriter
I also advocate Microformats and OpenID. Tantek Celik (CTO of Technorati) backs all of these very strongly; his blog is http://tantek.com
Posted by: Emre Sokullu | October 20, 2006 12:58 AM
Oops! The myFeedz blog link I sent was incorrect. The correct one is:
Posted by: Marius | October 20, 2006 2:45 AM
http://blog.myfeedz.com/
We're working on improving the performance (we are aware of the issues). As for popularity, myFeedz is still only three months old. ;)
Thanks!
Great to hear, Emre - if you or anyone else wants to be involved in the APML workgroup, please drop me a line!
Posted by: Chris Saad | October 20, 2006 5:00 AM
Nice article, Emre and Richard.
Just a brief clarification: not all personalized news sites use Bayesian statistical analysis over keywords.
Findory, for example, uses a form of social filtering where Findory readers anonymously and implicitly share the articles they find and enjoy with other Findory readers.
Oversimplifying a little, it is a bit like Digg except that rather than seeing a front page of the generally most popular articles, you see a front page of the articles that are most popular for a group of people who seem to share your interests.
Posted by: Greg Linden | October 20, 2006 10:03 AM
Whoa... not including Tailrank is a fairly large omission.
Posted by: Kevin Burton | October 21, 2006 9:35 AM
A few more notes.
Tailrank has an open API, so if you're building an RSS reader you can incorporate personal recommendations based on our tech.
Also, Greg is right. The mathematical technique isn't really a requirement here; Bayesian analysis is only one approach.
Comparing Reddit to the other players here is apples and oranges. I'd be willing to bet $20 that the majority of Reddit's traffic is to the home page, which isn't personalized.
This isn't necessarily bad for Reddit of course but unfair to the other personalized news engines.
Posted by: Kevin Burton | October 21, 2006 9:42 AM
A few more notes too!
Nice report! I really enjoyed it. Talking about start pages: between Goowy, Pageflakes and Netvibes, I prefer Pageflakes because of their new version 2.0. It's really nice now: colors, templates, and the gallery. The best thing I see in it is the sharing feature.
Good day, guys!
Rafa
Posted by: Rafael Zina | October 22, 2006 4:35 PM
Very interesting article, and it seems to reflect much of my gut feeling about the need.
Looking at the primitive search most services seem to offer, including Google, I was asked "Can't you write one?" and thought: sure. Instead of sitting back and getting frustrated and angry about how some of the mainline players collaborate with totalitarian governments to filter news, I thought: then DO IT.
So we started to work on http://www.ibu.de (IBU News).
On the search side (as a developer of search technology), it's quite feature-rich. One of my favorite human-interface features is [Scan]. It's a kind of word wheel/autocompletion, but a whole lot more: just start typing in the Scan field (opened with [Scan]), like bag (yes, it's AJAX at work). It even works with wildcards like h?zbollah or *anon, and even with named paths in the RSS tree, like the TITLE of an ITEM of a CHANNEL (RSS\CHANNEL\ITEM\TITLE). It lets one discover new words like "hizbollywood" in the news. It's all "realtime": no canned dictionaries or anything else, and as the news is continuously updated, the words continuously change.
And one can do a lot of powerful searches, like looking for articles with terms in the same field instance, or... (and it's not just about words but other objects such as numbers, dates, geospatial data, etc.)
Anyway, it's all early in development and a LOT of features still need their human interfaces.
I'd really LOVE some feedback, suggestions, comments...
P.S.: This is really the first time I've mentioned the development (or even the existence) to a larger public...
Edward C. Zimmermann
Posted by: Edward C. Zimmermann | October 23, 2006 3:55 AM
NONMONOTONIC Lab of BSn
http://www.nonmonotonic.net