algorithm - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/algorithm
Copyright 2009 Richard MacManus (readwriteweb@gmail.com)
Sun, 22 Nov 2009 19:36:29 -0800

New Twitter Anti-Spam Bot Causes Chaos

Twitter Anti-Spam Bot Punishes Community Managers and Causes Follower Counts to Drop

Did you notice a big drop in your Twitter follower numbers yesterday? It seems that the Twitter team recently decided to step up their Twitter spammer detection, and, in typical Twitter fashion, their algorithm sent the service haywire, leading to yet another sighting of the Fail Whale while the issue was resolved. Meanwhile, Twitterers everywhere were in an uproar over their lost follower counts.

Spam Detection Goes Too Far

Earlier this week, ZDNet reported that many Twitter users were no longer able to add followers thanks to the new limits put in place to discourage spamming. Unfortunately, this action caused some major trouble for community managers, like Pandora's Lucia Willow, for example, who stated her case over on Get Satisfaction. In addition to Pandora, Comcast, Jet Blue, and several others were also affected. In order to add new followers, they had to delete older ones - not a good idea for those that want to stay tuned into their community.

In addition to causing problems for community managers, there were even some cases of follower limits placed on those who had a 1:1 ratio of people followed to followers. And although Twitter has not confirmed the cause of the dropped follower counts, it's likely that the new anti-spam bot is to blame.

As we wrote earlier this year, many companies are using Twitter for customer service, meaning that they will be following people at higher rates than regular Twitter users due to the fact that they follow back those that follow them. This is certainly a legitimate way to use the service and one that should not be punished through a blind algorithm that can't distinguish a community manager from a spammer.
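To illustrate the point, consider a purely hypothetical rule of the kind the paragraph above objects to. Twitter has not published its detection logic, so the cap value, the function names, and the account figures below are all assumptions made for this sketch. A rule that looks only at how many accounts someone follows treats a community manager who follows back customers exactly like a bulk follower, whereas even a crude ratio check separates the two.

```python
from dataclasses import dataclass

# Assumed threshold for illustration only; not Twitter's actual limit.
HYPOTHETICAL_FOLLOW_CAP = 2000


@dataclass
class Account:
    name: str
    following: int
    followers: int


def blocked_by_blind_cap(account: Account) -> bool:
    """Blind rule: block new follows once the following count reaches the cap."""
    return account.following >= HYPOTHETICAL_FOLLOW_CAP


def blocked_by_ratio_rule(account: Account, max_ratio: float = 1.5) -> bool:
    """Ratio-aware rule: above the cap, block only if following far exceeds followers."""
    if account.following < HYPOTHETICAL_FOLLOW_CAP:
        return False
    return account.following > max_ratio * max(account.followers, 1)


# Made-up accounts: one follows back its customers (about 1:1), one mass-follows.
community_manager = Account("support_desk", following=5000, followers=5200)
bulk_follower = Account("buy_cheap_stuff", following=5000, followers=40)

for acct in (community_manager, bulk_follower):
    print(acct.name, blocked_by_blind_cap(acct), blocked_by_ratio_rule(acct))
```

In this sketch the blind cap blocks both accounts, while the ratio-aware rule blocks only the bulk follower.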

While we appreciate the fact that the Twitter team is fighting the spam problem (an "ongoing battle," says Biz Stone), you would think that they would have considered this potential ramification of implementing their new pattern-detecting technology. It's almost as if Twitter itself does not even know what constitutes a spammer. If that's the case, they should ask the community for guidance before rolling out a brand new anti-spam bot.

Ironically, in the midst of these issues, a post on the Twitter Blog on Wednesday was about a new Twitter app, TwitterCounter, that lets you track the number of followers you have on Twitter.

All we can say about that is...well...this may not have been the best time to release that news.

Were you affected by the follower limits? Tell us your story in the comments (or just share your thoughts on this issue!)

http://www.readwriteweb.com/archives/new_twitter_anti-spam_bot_causes_chaos.php
Posted in Trends by Sarah Perez, Thu, 24 Jul 2008 06:00:00 -0800

The Decline and Fall of Quality on Digg

If you're even peripherally involved in the social news space you are probably familiar with the rather rocky relationship that Digg has with its core community. Fueled partly by a need to counter false accusations from disgruntled community members who claim that Digg is rigged (i.e. that a core group of users decide what content is promoted), partly by the desire to encourage non-core members to participate more passionately, and partly by a need to affect a level of diversity and equality that would appear promising to potential acquirers, Digg has changed its algorithm again and again to artificially favor certain categories over others (i.e. world news and politics over technology) and to favor relatively new users over long-time, active users.

This is a guest post by Muhammad Saleem, a social media consultant and a top-ranked community member on multiple social news sites.

Let me assure you that this is not just another rant about how top users aren't being treated fairly. This post isn't about top users or new users; it is purely about the quality of the content on the front page of Digg and the causes of the decline in that quality. Chief among those causes are the lack of transparency and an imbalance in the algorithm that favors certain users over others and ultimately results in diversity, but poorer (if not poor) content. To understand the change and its effects, I analyzed the available data on different users' submitting habits, the Digg algorithm's promotion habits, and the reaction of the community to the content that is ultimately promoted.

The metrics used to gauge content popularity are quite straightforward: the absolute number of Diggs per story and the absolute number of comments per story. The Diggs indicate how many other community members - the ones that don't vote to make the story popular but find it worthwhile once it is promoted - like a story, and the comments indicate how much engagement and conversation, apart from Digging, the story generates. The data used here is for the 20 most prolific users, but when I tested the top 50 the results were similar.

Let's first look at the user rankings based purely on the number of stories promoted in a 30 day period. The users are ranked so that the person with the most stories promoted appears first and the one with the fewest appears last, the idea being that the person with the most quality content gets the most attention, or more appropriately, that person's content gets the most attention (the most basic principle behind all of Digg and the content promotion algorithm). The ranking is as follows:

Now let's look at the same rankings (i.e., users are ranked by the number of stories promoted), but look at how many Diggs an average story from those users gets once it is promoted. If the algorithm works well in determining quality and is not flawed or artificially helping some users over others, the graph should be exactly the same as above and the people with the most stories promoted are also the people that get the most Diggs per story (because the algorithm is only promoting the best of the best and nothing else).

As you can see, the graph is not the same. What you see instead is that there are some users whose stories are promoted more frequently even though they don't perform as well, while there are other users whose stories consistently outperform but aren't promoted as frequently. What this means is that some users have an easier time getting their content promoted (for whatever reason) but once the content does get promoted, it largely falls flat on the front page.
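One way to quantify how far the two rankings diverge is a rank correlation. The sketch below is not the analysis used in this post; it computes Spearman's rank correlation between stories promoted and average Diggs per story, using made-up numbers for hypothetical users. A value near 1 would mean the most-promoted users also draw the most Diggs per story; a value near 0 or below would indicate the mismatch described above.

```python
def ranks(values):
    """Return 1-based ranks (largest value gets rank 1); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data, NOT taken from Digg:
# (stories promoted in 30 days, average Diggs per story).
users = {
    "user_a": (90, 650),
    "user_b": (75, 700),
    "user_c": (60, 900),
    "user_d": (45, 1200),
    "user_e": (30, 1500),
}
promoted = [p for p, _ in users.values()]
avg_diggs = [d for _, d in users.values()]
rho = spearman(promoted, avg_diggs)
print(f"Spearman rho = {rho:.2f}")
```

The made-up data is deliberately extreme (the two orderings are exact opposites), so it shows the worst case rather than anything measured on Digg.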

Now let's look at how the promoted content engages the community in conversation. Again the users are ranked by the number of stories they got promoted (as in the first graph).

Again you can see that some users have an easier time getting their content promoted to the front page but once the content gets there, people aren't really that interested in talking about it. Others, however, have content that everyone has something to say about (even though the algorithm won't let them get stories on the front page as often).

And finally, let's look at the number of stories promoted and the number of Diggs received per story (average) on the same chart (note I had to multiply the promotion data by 10 so the graph would be visible compared to the Diggs per story):

As you can see, there is a huge disparity between the number of stories promoted and how viral those stories go. Ultimately, what's happening is that people whose stories almost always get 1200+ Diggs are getting throttled (for whatever reason), whereas people whose stories routinely get under 1000 Diggs once promoted are being favored. It doesn't matter if we're talking about top users or not, because the end result is that lower quality content is promoted to the front page more often than content that performs better after promotion. Case in point: why is it that user "zaibatsu" - whose average Digg-per-story number outpaces that of any Digg user in history (1775 Diggs per story) - gets on average only one story promoted per day while users getting less than 700 Diggs per story get promoted multiple times per day?

Note: This isn't meant as an attack on any user and his or her submissions. All the users mentioned and displayed on the graphs are my friends and I respect their submissions. This post is just meant to point out the flaws of Digg's content promotion algorithm.

http://www.readwriteweb.com/archives/the_decline_and_fall_of_quality_on_digg.php
Posted in Trends by Guest Author, Thu, 01 May 2008 18:10:21 -0800

Alexa Updates Its Web Rankings - Still Not Good Enough

Amazon-owned Alexa has announced a major update to its 10-year-old web ranking system. Previously, Alexa's rankings were based solely on data collected from the downloadable Alexa Toolbar, but now the company is aggregating data from multiple sources. That's good news, but it may be too little, too late for a company whose rankings have faded in relevance in recent years.

Alexa launched its web site rankings in 1998 based on data from its toolbar software. In the late 90s and early part of this decade, Alexa was more or less the only place people could turn for public ranking data on the web at large, and so its rankings -- though often inaccurate -- were widely quoted. At the time, unless you wanted to pay for data from firms like Nielsen, comScore, or HitWise, it was Alexa or nothing. Alexa rank became a metric that people actually paid attention to and took seriously.

But in recent years, that has changed. Alexa now faces competition from Compete, which launched a similar public service in 2006 (our coverage), and from Quantcast, which was founded in 2005. Both of those companies gather data from numerous outside sources and their rankings are generally seen as more accurate than Alexa's.

"In recent months we've heard from our Alexa users that understanding Internet usage beyond Alexa Toolbar users was increasingly of interest," wrote Alexa in the announcement of their rankings overhaul. Recent months? The inaccuracy of the toolbar-based rankings has been discussed for years, which is why we think this might be too little, too late for Alexa.

Beyond the problem of public perception, Alexa also still displays its data in non-standard ways. The hard-to-understand pageviews per million, reach per million, and rank are not easily compared to other data sources, which makes Alexa's information less useful than it could be, even if it is presumably now more accurate.
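For readers who want to set a per-million figure next to percentage-based numbers from other sources, the conversion is simple arithmetic: a value per million divided by 10,000 is a percentage. The sketch below uses a made-up figure and a hypothetical helper name. It converts units only and says nothing about how each provider samples its audience.

```python
def per_million_to_percent(value_per_million: float) -> float:
    """Convert a 'per million' figure to a percentage (1,000,000 per million = 100%)."""
    return value_per_million / 10_000


# Made-up example: a reach of 2,500 per million users is 0.25% of users.
example_reach = 2_500
print(f"{example_reach} per million = {per_million_to_percent(example_reach):.2f}%")
```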

Historical data on Alexa is currently only available for the past 9 months while the company recalculates old data with its new ranking algorithm.

http://www.readwriteweb.com/archives/alexa_updates_its_web_rankings.php
Posted in Products by Josh Catone, Wed, 16 Apr 2008 19:20:34 -0800