algorithms - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/algorithms
Copyright 2012 Richard MacManus (readwriteweb@gmail.com). Wed, 15 Feb 2012 14:45:00 -0800

Palo Alto Researchers Create Tool for Dealing with Twitter's "Information Overload"

Researchers at the Palo Alto Research Center (PARC) are developing a new Twitter client application that aims to derive meaning from the never-ending influx of tweets. The application, called "Eddi," automatically groups tweets into topics, whether those topics are mentioned explicitly or, unlike in most Twitter clients that offer topic browsing, only implicitly. The result is a Twitter app you can use to quickly find the popular discussions within your own Twitter stream, by search, tag cloud, timeline or category list. It even suggests tweets you might want to read, helping you separate the signal from the noise.

Project "Eddi"

Ed Chi, area manager and principal scientist for the Augmented Social Cognition Research Group at PARC, told MIT's Technology Review that people tend to "dip in" to the Twitter stream from time to time rather than consume it all at once. The Eddi project was created to make those brief dips into Twitter more valuable.

Eddi, named after the eddies that form in a stream, has the barebones look of something built by data researchers rather than web designers. But the user interface isn't the point: the algorithms behind it are the tool's standout feature.

In order to filter Twitter's content, Eddi provides two tools: a topic browser that shows tweets broken down into categories and a recommendation engine.

Twitter Topics - And Not Just the Popular Ones

The idea of browsing Twitter by topic is not unique - plenty of Twitter apps do the same, as does Twitter's own search interface at search.twitter.com. But the problem with most of these systems is that they rely on keywords or hashtags - the latter being the annotations preceded by the pound sign (#) which users add to their tweets to make them searchable.

When there is a major event, such as the Icelandic volcano eruption, the sheer volume of tweets gives an algorithm a lot of information to work with, explained Michael Bernstein, a researcher at the Computer Science and Artificial Intelligence Lab at MIT who is involved with the project. What's harder is figuring out the topics of tweets that are more unique.

"The essence of the approach is to coerce a tweet to look more like a search query and then get a search engine to tell us more," says Bernstein. After cleaning up each tweet, the tool feeds it into Yahoo's Build your Own Search Service (BOSS) to surface web pages related to that tweet. Those pages help the system assign the tweet to the appropriate topic.
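Eddi's code isn't published, but the "coerce a tweet into a search query" step can be sketched roughly as follows. This is a minimal illustration under our own assumptions: the stopword list, the `tweet_to_query` name and the six-term cap are hypothetical, and the actual call to Yahoo BOSS is deliberately left out rather than guessed at.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real system would use a larger one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in", "on", "at",
             "and", "or", "for", "with", "from", "this", "that", "it", "rt", "just", "so"}

def tweet_to_query(tweet: str, max_terms: int = 6) -> str:
    """Turn a tweet into a short keyword query (illustrative sketch, not Eddi's code)."""
    text = re.sub(r"https?://\S+", " ", tweet)  # drop links
    text = re.sub(r"@\w+", " ", text)           # drop @mentions
    text = text.replace("#", " ")               # keep hashtag words, drop the '#'
    words = re.findall(r"[a-z0-9']+", text.lower())
    terms = []
    for w in words:
        # skip stopwords, very short tokens and duplicates
        if w not in STOPWORDS and len(w) > 2 and w not in terms:
            terms.append(w)
    return " ".join(terms[:max_terms])

print(tweet_to_query("RT @bob: Ash cloud from the #Iceland volcano grounds flights http://t.co/x"))
```

In a full pipeline, the resulting query would be sent to a web search API, and the titles or categories of the returned pages would serve as candidate topic labels for the tweet.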

Recommendation Engine

The second part of the system is a recommendation engine that ranks tweets by how interesting they are likely to be to you. To determine this, Eddi's algorithms look at your own tweets and your interactions with other Twitter users.
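How Eddi weighs these signals isn't described. As a rough, hypothetical illustration, the sketch below scores each candidate tweet by how much its vocabulary overlaps with the user's own tweets, plus a bonus for authors the user interacts with. The function names, input shapes and the 0.5 weight are all assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_tweets(candidates, my_tweets, interactions, weight=0.5):
    """Order candidate tweets by a hypothetical interest score (not Eddi's formula).

    candidates:   list of (author, text) pairs from the user's stream
    my_tweets:    texts of the user's own tweets, used as an interest profile
    interactions: dict mapping author -> number of replies/retweets/mentions exchanged
    weight:       how much the social signal counts relative to topical overlap
    """
    profile = Counter(w for t in my_tweets for w in tokenize(t))
    max_inter = max(interactions.values(), default=0) or 1
    scored = []
    for author, text in candidates:
        words = set(tokenize(text))
        # average profile frequency of the tweet's distinct words
        topical = sum(profile[w] for w in words) / (len(words) or 1)
        # interaction count normalized to the range 0..1
        social = interactions.get(author, 0) / max_inter
        scored.append((topical + weight * social, author, text))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(author, text) for _, author, text in scored]
```

Combining a content signal with a social signal is a common pattern in recommenders; the balance between the two is exactly the kind of parameter a real system would tune against user feedback.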

The new system will go live on the web for public testing sometime this summer. In the meantime, you can sign up for another of PARC's experimental Twitter recommendation engines, called ZeroZero88.

Posted by Sarah Perez on Fri, 30 Apr 2010 07:09:40 -0800 in Recommendation Engines: http://www.readwriteweb.com/archives/palo_alto_researchers_create_tool_for_dealing_with_twitter_information_overload.php
What Websites Do You Like? New Twitter Tool Will Tell You

The Website Taste Predictor is a new Twitter tool that analyzes your Twitter account in order to recommend websites you would like. The project uses Twitter's OAuth authentication protocol to access your account, so you don't have to enter your username and password to try it out. How exactly it works, we can't say. There's no "about" page, FAQ or other explanation. In fact, there's not even a credit as to who made it, only a URL. But the URL is a big hint: it's hosted on the MIT.edu domain under the directory ~peretti. And just who is ~peretti? Only the co-founder of the Huffington Post and the viral tracker BuzzFeed, Jonah Peretti.

New Twitter Tool From HuffPo and BuzzFeed Co-Founder?

Peretti is a graduate of the MIT Media Lab, has taught at NYU and the Parsons School of Design, consulted for major brands like Sony Pictures and Procter & Gamble and created several viral experiments like the Nike sweatshop email and FundRace.org. However, he's best known for co-founding BuzzFeed, The Huffington Post, ContagiousMedia.org, and the Eyebeam Open Lab. So if this "Website Taste Predictor" is also his creation (we've contacted him to confirm), you know it's not going to be your run-of-the-mill Twitter tool.

For what it's worth, we're nearly 100% sure about Peretti's involvement. The tool is hosted under his account on MIT's servers, he tweeted about it on April 7th, and he personally responded to a comment about it on Digg (the fact that this post never hit the Digg homepage is a testament to all that is going wrong over there). Still, while these clues point to Peretti as the creator, you can never be too sure. We'll wait for official word and update accordingly.

Website Taste Predictor in Action

So what does the Taste Predictor actually do? Well, it doesn't just parse your Twitter history to spit back a list of links you've tweeted. That would be too easy.

It appears to delve deeper than that and function as a true recommendation engine. Whether it looks at keywords, follower lists or sites related to those you post links to, we can't be sure, but we do know this: the app is right on the money. And I mean downright scary right.

In my case, for example, the list included a large group of sites I read regularly, mainly tech-focused blogs and mainstream media sites, plus a handful of sites I check out less often. What I don't know is how it figured out that I've been known to gaze at the occasional lolcat, fail photo, web comic or celebrity gossip post when my brain needs a break from all this tech. I certainly never tweeted about those things, nor do I follow people who do. So how did it know?

More importantly, though, the tool actually pointed me to a few sites that I really should be reading more often like the image-heavy online paper Newser, the op-ed content network True/Slant and mobile app analytics site Localytics whose blog I just subscribed to.

In other words, the Website Taste Predictor is accurate and useful, or, as Peretti recently tweeted himself: "I think this is the kind of awesome new Twitter App @FredWilson was talking about!"

Posted by Sarah Perez on Wed, 14 Apr 2010 07:21:58 -0800 in Product Reviews: http://www.readwriteweb.com/archives/what_websites_do_you_like_new_twitter_tool_will_tell_you.php
Google Now Scanning RSS, Atom Feeds, May Experiment with Real-Time Protocols in Future

According to a post on Google's Webmaster Central blog, Google is now discovering web sites by automatically scanning RSS and Atom feeds. The new process will help Google identify web pages more quickly and will let users find new content in search results as soon as it goes live. While not exactly "real-time," using feeds to spot updates to websites is arguably faster than the traditional crawling techniques Google has used in the past. And Google may get faster still: the post also notes that the company may soon explore mechanisms like the real-time protocol PubSubHubbub to identify updated items.
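Part of what makes feeds attractive for discovery is that they list new URLs, with timestamps, in a predictable format. The sketch below uses only Python's standard `xml.etree.ElementTree` to pull (link, date) pairs out of an RSS 2.0 or Atom feed. It says nothing about how Google's own pipeline works; it only illustrates why checking a feed is cheaper than re-crawling a whole site.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom elements live in this XML namespace

def feed_links(xml_text: str):
    """Return (link, date) pairs from an RSS 2.0 or Atom feed document."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":
        # RSS 2.0: <rss><channel><item><link>..</link><pubDate>..</pubDate></item>
        for item in root.iter("item"):
            items.append((item.findtext("link"), item.findtext("pubDate")))
    elif root.tag == ATOM + "feed":
        # Atom: <feed><entry><link href=".."/><updated>..</updated></entry>
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            href = link.get("href") if link is not None else None
            items.append((href, entry.findtext(ATOM + "updated")))
    return items
```

A crawler could compare these dates against what it has already indexed and fetch only the new or changed URLs.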

The blog post doesn't say whether RSS and Atom discovery is displacing traditional web crawling for feed-enabled sites, but it seems likely that Google will opt for the faster method when one is available. As Vanessa Fox notes on the SearchEngineLand blog, since it's unknown at this time whether Google is using feeds in place of traditional crawling, it may make sense to publish full feeds rather than partial ones so that Google indexes your content faster.

Real-Time Web Crawling in the Future?

Although it was only briefly mentioned in the post, Google hinted that it may begin looking into other mechanisms such as PubSubHubbub, an open protocol that provides near-instant notification of updates. No details were provided beyond that one sentence, but the announcement clearly shows that Google has seen the writing on the wall and knows that the real-time web is the future. This is one trend the company isn't planning to ignore.

The real-time web, heavily influenced by the speed of Twitter and other rapid-fire social networking updates, has created a desire among internet users for faster access to information. That desire has, in turn, led to new real-time protocols such as the above-mentioned PubSubHubbub and its counterpart RSSCloud. If Google began using these technologies to scan the web, its search results wouldn't just be updated faster; they would be updated in real time, with information appearing in search listings as soon as it was published to the web.
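In PubSubHubbub, a subscriber asks a hub to notify it when a feed (the "topic") changes, and the hub confirms the request by calling the subscriber back with a challenge string that must be echoed. The sketch below builds the subscription request body and handles that verification callback, using parameter names from the 0.3 draft of the spec (`hub.mode`, `hub.topic`, `hub.callback`, `hub.verify`, `hub.challenge`). The function names are hypothetical, the HTTP transport is omitted, and nothing here reflects how Google might actually use the protocol.

```python
from urllib.parse import urlencode, parse_qs

def subscription_body(topic: str, callback: str) -> str:
    """Form-encoded body a subscriber POSTs to a hub to subscribe to a feed."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,        # the feed URL we want updates for
        "hub.callback": callback,  # where the hub should send notifications
        "hub.verify": "sync",      # ask the hub to verify before replying
    })

def verify_intent(query_string: str, wanted_topics: set):
    """Answer the hub's verification GET: echo hub.challenge only for topics we requested."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    if params.get("hub.mode") == "subscribe" and params.get("hub.topic") in wanted_topics:
        return 200, params.get("hub.challenge", "")
    return 404, ""  # refuse subscriptions we never asked for
```

Once the subscription is verified, the hub POSTs new feed content to the callback URL whenever the publisher pings it, which is what makes the updates near-instant instead of waiting for the next poll.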

That, of course, would create a whole new set of challenges for the search engine, most notably: how do you rank real-time results? Google's search algorithm was built on PageRank, which scores a page by the number and importance of the pages that link to it, so ranking results so fresh that nothing links to them yet could prove difficult. However, Google already does this to some extent. Over time, its ranking algorithm has evolved beyond pure link analysis and can, on some occasions, rank sites with fresher, more fitting content above sites with more links. And if anyone can figure out the proper algorithm for mixing real-time content with static pages and ranking both appropriately, it's got to be Google. In fact, we'll probably soon see how they plan to address this issue when they incorporate Twitter search results into their index, as announced last week.
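To make the freshness problem concrete, here is the textbook power-iteration form of PageRank (damping factor 0.85, with pages that have no outbound links spreading their score evenly). It is a simplified model, not Google's production ranking, but it shows why a page nobody links to yet ends up with only the minimum baseline score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page gets the same "random jump" baseline
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # split this page's score among the pages it links to
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # dangling page: spread its score evenly over all pages
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

scores = pagerank({"a": ["b"], "b": ["a"], "c": ["a"], "fresh": []})
```

In this toy graph, "fresh" has no inbound links, so it never rises above the baseline no matter how relevant its content is; a freshness-aware ranker has to bring in some other signal, such as publication time.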

...But Until Then, Google Delivering Faster, Fresher Results Instead

Although the PubSubHubbub mention may have been the most exciting part of the announcement, real-time search results aren't here just yet. In the meantime, we'll have to be content with faster results. The post advises website owners who block Google's crawler, Googlebot, from their RSS/Atom feeds to unblock it via their robots.txt file. If unsure, webmasters can test their feed URLs with the robots.txt tester in Google Webmaster Tools, as the post recommends.
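For a quick local check, Python's standard `urllib.robotparser` can report whether a robots.txt file lets the `Googlebot` user agent fetch a given feed URL. The example.com URLs are placeholders. Python's parser is simpler than Google's (for instance, it doesn't implement Google's `*` and `$` path wildcards), so the Webmaster Tools tester remains the authoritative check.

```python
from urllib.robotparser import RobotFileParser

def googlebot_can_fetch(robots_txt: str, url: str) -> bool:
    """Check a robots.txt body (as text) against a URL for the Googlebot user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules without fetching anything
    return parser.can_fetch("Googlebot", url)

robots = "User-agent: Googlebot\nDisallow: /feeds/\n"
print(googlebot_can_fetch(robots, "http://example.com/feeds/rss.xml"))
print(googlebot_can_fetch(robots, "http://example.com/articles/1"))
```

A site that disallows its feed directory like this would be blocking exactly the discovery path the post describes.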

Posted by Sarah Perez on Fri, 30 Oct 2009 06:44:01 -0800 in Google: http://www.readwriteweb.com/archives/google_now_scanning_rss_atom_feeds.php
Identify Any Website's Sentiment with ContextSense

ContextSense is a newly launched sentiment extraction technology from Wingify, a company focused on website optimization solutions. Part of Wingify's core product, which helps website owners identify visitor demographics and behavior, target ads and optimize landing pages, ContextSense demonstrates how the company's contextual targeting technology works. To use the tool, you simply enter a URL or a piece of text, and it reveals the overall sentiment of the website (positive or negative), relevant tags, concepts, categories and contextually similar links. The result is a quick glimpse into what a site is all about.

Tags, Concepts, and Sentiment

To test ContextSense's accuracy, we entered the URL for ReadWriteWeb.com (but of course). The result was mostly on target: a top-ten list of our main concepts that included software, Google, iPhone, news and media, commenting, semantic web and more. The last three items in the list (AJAX, class libraries and JavaScript) were off base. Perhaps that's why they were greyed out while the rest of the list was in black. There's no explanation of what the shading means, but that seems a logical inference.

The categories list was similar to the concepts list except it showed more of a drill-down as to what broader topics the concepts came from. For example, for "Semantic Web," the associated category was "Reference > Knowledge Management > Knowledge Representation > Semantic Web."
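The drill-down above reads like a walk up a category taxonomy. A minimal sketch, assuming the taxonomy is stored as a hypothetical child-to-parent map (ContextSense's actual data model isn't described):

```python
# Hypothetical taxonomy fragment: each category points to its parent.
PARENT = {
    "Semantic Web": "Knowledge Representation",
    "Knowledge Representation": "Knowledge Management",
    "Knowledge Management": "Reference",
}

def category_path(concept: str) -> str:
    """Walk from a concept up to the root and format the breadcrumb root-first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return " > ".join(reversed(path))

print(category_path("Semantic Web"))
```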

The tool also ranked our site as "slightly positive," which makes sense since we're passionate about technology and don't (often) post negative reviews - we tend to just skip product reviews for those sites and services we don't think much of.
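Wingify doesn't explain how ContextSense arrives at a label like "slightly positive." One common, much simpler technique is lexicon-based scoring: average the polarity of known words, then map the result onto labeled bands. The word list, weights and band boundaries below are invented for illustration.

```python
import re

# Tiny hypothetical lexicon (polarity from -2 to +2); real systems score thousands of terms.
LEXICON = {"great": 2, "love": 2, "useful": 1, "good": 1, "passionate": 1,
           "bad": -1, "slow": -1, "broken": -2, "hate": -2}

def sentiment(text: str):
    """Return (average polarity, label) for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    score = sum(hits) / len(hits) if hits else 0.0
    if score > 1:
        label = "positive"
    elif score > 0:
        label = "slightly positive"
    elif score == 0:
        label = "neutral"
    elif score >= -1:
        label = "slightly negative"
    else:
        label = "negative"
    return score, label

print(sentiment("A great and useful tool, though a bit slow"))
```

Here one mildly negative word ("slow") pulls an otherwise positive text down into the "slightly positive" band.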

The final bit of analysis presented us with a list of "contextually sensed links from the internet." According to Wingify, this is meant to surface other sites that focus on some of the same concepts as the one in question. This is where ContextSense was more questionable. You would expect it to identify similar technology weblogs here, like those that light up Techmeme on a daily basis, but instead it returned what appeared to be a list of web products and resource sites. Many were iPhone-focused, including appSafari, iAppCat, AppCraver and appRater. Others seemed completely random, like Piggy Bank, a semantic Firefox extension, and PurpleBunny, a social commenting utility. After the fun of exploring sites we'd never heard of wore off, reality set in: this tool isn't quite ready for prime time. Tests of other URLs produced similar results.
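One generic way to find "contextually similar" pages is to compare word-count vectors with cosine similarity. The sketch below uses that stand-in technique, not ContextSense's unpublished method, and the candidate URLs and page texts are hypothetical. With a purely term-based approach, frequent words like "iPhone" can pull in app directories rather than fellow tech blogs, which is one plausible explanation for results like the ones above.

```python
import math
import re
from collections import Counter

def term_vector(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(source_text: str, candidates: dict, k: int = 3):
    """candidates maps URL -> page text; returns the k URLs most similar to source_text."""
    src = term_vector(source_text)
    ranked = sorted(candidates, key=lambda url: cosine(src, term_vector(candidates[url])), reverse=True)
    return ranked[:k]
```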

Sentiment Analysis: the Next Big Attraction for the Web?

That being said, we like the concept behind ContextSense and its simple interface (type a URL, click a button!). However, it goes to show that contextual sentiment analysis is still a new and struggling technology with plenty of room for improvement.

To date, there are only a few companies providing solutions. Recent examples of publicly available tools include Evri's sentiment engine as well as Twitter trackers twendz, Twitrratr, and Tweetfeel. Other companies, like San Francisco-based Scout Labs, provide subscription services to businesses wanting to track the sentiment appearing in blogs, news articles, online forums, and social networking sites as it relates to their brand. Then there is the Financial Times' site Newssift which tracks sentiments about business topics in the news.

But determining sentiment, which requires computers and algorithms to sense the meanings behind our words in our everyday language, is difficult to do. Slang, sarcasm, and various expressions are easy for other humans to understand but don't make sense in the binary world of machines. Yet, with all the content on the net today, not to mention the rise of social media as a marketing and advertising platform, we expect to see more companies attempting to solve this challenge in the near future.
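Negation is one of the simpler cases to illustrate. In the sketch below (hypothetical lexicon and negator list), a naive word count reads "not good" as positive; flipping the polarity of the few words after a negator fixes that case, while sarcasm such as "Oh great, another outage" still fools it.

```python
import re

LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}   # hypothetical
NEGATORS = {"not", "never", "no", "isn't", "don't"}            # hypothetical

def score(text: str, handle_negation: bool = True, window: int = 3) -> int:
    """Sum word polarities, optionally flipping those that follow a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    total, flip_left = 0, 0
    for w in words:
        if handle_negation and w in NEGATORS:
            flip_left = window  # flip the polarity of the next `window` words
            continue
        polarity = LEXICON.get(w, 0)
        total += -polarity if flip_left > 0 else polarity
        flip_left = max(0, flip_left - 1)
    return total

print(score("This is not good", handle_negation=False))
print(score("This is not good"))
print(score("Oh great, another outage"))
```

The first call scores the phrase as positive, the second correctly as negative, and the third still as positive: no word-level rule recognizes the sarcasm.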

ContextSense makes a valid attempt at website sentiment and contextual analysis, but ultimately its solution still needs work. It's available either as an online tool or as an API for developers; contact Wingify for more information.

Posted by Sarah Perez on Wed, 02 Sep 2009 07:06:05 -0800 in Product Reviews: http://www.readwriteweb.com/archives/identify_any_websites_sentiment_with_contextsense.php
Facial Recognition Comes to Facebook

This morning, Face.com announced that it is bringing advanced facial recognition technology to Facebook by way of a new application called Photo Finder. Using proprietary facial scanning algorithms, the application scans your photos and your friends' public photos to identify and suggest tags for the untagged people in them. The results are highly accurate, almost frighteningly so, and should lead to some interesting discoveries as the app spreads through Facebook once it becomes public.


How Photo Finder Works

Face.com's facial recognition software can scan millions of photos in a relatively short time. Although results aren't available immediately after you add the application, you can view them while the scan is still in progress. There's no exact time frame for how long the process takes; it depends on the number of friends and photos you have, among many other factors. However, Photo Finder saves its results for future use: if your friends later add the app too, it won't need to rescan photos that have already been analyzed.
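Reusing earlier results amounts to caching the analysis per photo. A minimal sketch, assuming a hypothetical `analyze` function that returns the faces found in a photo (Face.com's real detector is proprietary and not shown):

```python
class ScanCache:
    """Remember per-photo results so each photo is analyzed only once (hypothetical design)."""

    def __init__(self, analyze):
        self.analyze = analyze   # hypothetical: photo_id -> list of detected face ids
        self.results = {}
        self.calls = 0           # how many photos were actually analyzed

    def faces_in(self, photo_id):
        if photo_id not in self.results:
            self.calls += 1
            self.results[photo_id] = self.analyze(photo_id)
        return self.results[photo_id]

    def scan(self, photo_ids):
        return {pid: self.faces_in(pid) for pid in photo_ids}

cache = ScanCache(lambda pid: ["face-" + pid])
cache.scan(["p1", "p2"])   # first user's photos
cache.scan(["p2", "p3"])   # a friend's scan overlaps on p2, which is not re-analyzed
```

After both scans, only three photos have been analyzed, because the shared photo p2 was served from the cache.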

To begin using Photo Finder, you simply add it to your profile as you would with any other Facebook app. You then click the "Get Started" button and Photo Finder will launch its scan. There's nothing else you need to do until the scan is complete. You can leave the page to return to other parts of Facebook or even close the web page altogether - Photo Finder will continue to run as you've already granted it offline access when you initially added it.

After Photo Finder has finished its work, you can return to review the results. You can click on the "Me" button to see your own photos or click on the "Friends" button to discover those belonging to your friends. Next to each user, all the Facebook photos of that particular person are displayed. The ones in which they're already tagged via Facebook are outlined in blue and the ones where they've been "auto-tagged" by the application are outlined in orange.

On the auto-tagged photos, you can click a green checkmark to confirm the match or a red "X" if the match is incorrect. Upon hitting the red "X," you're presented with a dialog box where you can fill in the name of the person who is actually in the photo or you can click a button that reads "unknown" if you don't know who it is.

If you'd like, you can later navigate to the "Who's This?" section within the application to help tag all the "unknowns" in your network. The software also identifies how accurate a match is by displaying a percentage beneath each photograph.
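Face.com's matching algorithm is proprietary, so the sketch below uses a generic stand-in: each face is represented as a numeric feature vector, an untagged face is compared against vectors from photos where people are already tagged, and the closest match is suggested only if its cosine similarity clears a threshold. The vectors, the 0.8 threshold and the similarity-as-percentage display are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def suggest_tag(face, known, threshold=0.8):
    """Return (name, percent) for the closest known face, or ("unknown", percent) below threshold.

    face:  feature vector of an untagged face (hypothetical representation)
    known: dict mapping a person's name -> feature vector from their tagged photos
    """
    best_name, best_sim = "unknown", -1.0
    for name, vec in known.items():
        sim = cosine(face, vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    percent = round(max(best_sim, 0.0) * 100)  # confidence shown beneath the photo
    return (best_name if best_sim >= threshold else "unknown"), percent
```

Raising the threshold trades fewer wrong auto-tags for more faces left as "unknown," which is where a confirm/reject step like Photo Finder's becomes useful.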

To keep track of the photos of your closest friends, Photo Finder presents an option that allows you to add people to a "Watch List." This is a section of the application where all the related photos for those on the list are tracked.

Privacy Concerns?

The Photo Finder application may sound a bit frightening at first, given its ability to uncover long-lost and hidden Facebook photos. However, the company has gone to great lengths to make sure the application respects your privacy. For one, the app does not tag photos within Facebook itself. Photos are tagged only within the application, so no one can see the tags unless they, too, are running the app.

Photo Finder also correlates its settings with your Facebook privacy settings. So, for example, if you've specified that a certain subset of your friends may not see your tagged Facebook photos, that is also reflected within the application.

When you're auto-tagged in a photo, you are the first one to be alerted via Facebook's notification mechanism. You can then either approve the photo or untag it (hide it from the other users of the Photo Finder app). If you untag yourself, none of your friends will be alerted to this action.
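The privacy rules described in this section can be expressed as a simple visibility filter. This is a hypothetical model of the behavior the article describes, not Face.com's code; the class and field names are invented.

```python
from dataclasses import dataclass, field

@dataclass
class AutoTag:
    photo_id: str
    subject: str
    hidden: bool = False  # set when the subject untags themselves

@dataclass
class Viewer:
    name: str
    uses_app: bool
    # subjects whose Facebook privacy settings hide their tagged photos from this viewer
    blocked_by: set = field(default_factory=set)

def visible_tags(tags, viewer):
    """Auto-tags a given viewer may see under the rules the article describes."""
    if not viewer.uses_app:
        return []  # auto-tags are never written back to Facebook itself
    return [t for t in tags if not t.hidden and t.subject not in viewer.blocked_by]
```

Because hiding a tag only flips a flag on the tag itself, nothing needs to be sent to friends when someone untags themselves.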

The Technology

The algorithm behind Photo Finder has been in development since 2007. Unlike many of the facial recognition algorithms out there today, this one does not require people to face the camera head-on in order for it to be accurate. Instead, it focuses on identifying people in "everyday photos" - that is, photos taken from different angles, out-of-focus shots, photos in low lighting, or those in which people are making odd facial expressions, etc.

That the algorithm excels at matching people with their pictures has been at least partially confirmed by an independent study conducted by the University of Massachusetts. The Photo Finder team submitted only one component of its algorithm for evaluation, and its accuracy far exceeded that of competing approaches. In the study's results, the Photo Finder algorithm is identified as the "hybrid descriptor-based, funneled" model, which shows the highest "true positive" rate on the chart.
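The "true positive" rate on such a chart is the share of same-person image pairs that a verifier correctly accepts at a given decision threshold; results like these are usually plotted against the false positive rate as the threshold varies. A small sketch of that calculation, assuming a similarity score and a ground-truth label for each image pair:

```python
def rates(scores, same_person, threshold):
    """True and false positive rates of a face verifier at one decision threshold.

    scores:      similarity score for each image pair
    same_person: True if that pair really shows the same person
    """
    tp = sum(1 for s, y in zip(scores, same_person) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, same_person) if not y and s >= threshold)
    positives = sum(1 for y in same_person if y)
    negatives = len(same_person) - positives
    return tp / positives, fp / negatives

print(rates([0.9, 0.8, 0.4, 0.7, 0.2], [True, True, True, False, False], 0.6))
```

Sweeping the threshold from high to low and plotting each pair of rates traces the curve; a model whose curve sits higher accepts more true matches for the same number of false ones.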

Join the Private Alpha

The Face.com technology has been in private alpha testing for several months with a group of 150 users, mostly friends and colleagues of the founders: Gil Hirsch (CEO), Yaniv Taigman (CTO), Eden Shochat and Moti Shniberg. During that time, 20 million photos were scanned, identifying around 30,000 people.

You can join the private alpha using this invite link: http://face.com/invite.php?promo_code=S226566001

Only 100 invitations are available.

The company can't yet confirm when the app will go public, only that it expects to have a better sense of where it stands sometime later this year, after tens of millions of photos have been scanned. It doesn't want to rush things: it's one thing to open the app to select users and quite another to make Photo Finder available to scan the 15 billion photos hosted on Facebook. But one day in the not-too-distant future, that's exactly what the company plans to do.

Face.com is currently angel funded and looking to raise a VC round.

Posted by Sarah Perez on Tue, 24 Mar 2009 08:00:00 -0800 in Product Reviews: http://www.readwriteweb.com/archives/facial_recognition_comes_to_facebook.php