semantics - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/semantics en Copyright 2012 Richard MacManus readwriteweb@gmail.com Tue, 14 Feb 2012 15:30:00 -0800 http://www.sixapart.com/movabletype/?v=4.35-en http://blogs.law.harvard.edu/tech/rss

Making Decisions With Machines and People: 3 New Cyborg Q&A Services

The following post was originally titled The Robot Made Me Do It: Comparing Three New Cyborg Q&A Services and ran a week and a half ago. It's a slow morning around here and we thought readers who missed this the first time might appreciate a chance to see it now.

One part people, one part machine. Is that a formula for more effective decision making? A number of high-profile entrepreneurs believe it is, and they are starting companies based on the idea.

In the following post we take a look at three of the most exciting startups entering this emerging market. The movement is a logical development now that millions of people are comfortable posting information online. The web's next step is to leverage machine learning. These are three companies to watch that are doing just that - combining user input with technology that improves its performance by gathering and processing data. In this case they are doing it in order to help people make better decisions, but these are just some of the first consumer technologies that will enter the cyborg-like space that combines people and machines in order to better serve people.

The three services we look at are Aardvark, Hunch and Swingly. Unfortunately none of these services are wide open to the public yet. If you go to their sites and request an invite, you should get one soon. You might also try asking around on other networks like Twitter or Facebook; two of the three services discussed below have invites in the wild now.

Aardvark

vark.com (Our initial review)

Premise: Ask any question by IM and your question will be routed to a tagged "expert" on the topic, among your friends and their networks.

Logic: There appears to be some semantic analysis of the tags given users by their friends and themselves, cross referenced with semantic analysis of the questions asked in order to find the right fit. We presume there is or will be some logic judging the history of successful answers from users so as to rank relative expertise.
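
To make the routing idea concrete, here is a minimal sketch of tag-based matching. It is not Aardvark's actual algorithm, which we can only guess at; the profile data, the `route_question` name and the simple word-overlap score are all assumptions made for illustration.

```python
import re

# Hypothetical profiles: user -> tags assigned by themselves or their friends.
PROFILES = {
    "alice": {"cooking", "portland", "bicycles"},
    "bob": {"python", "databases", "cooking"},
    "carol": {"travel", "photography"},
}


def route_question(question, profiles):
    """Rank users by how many of their tags appear as words in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    scores = {user: len(tags & words) for user, tags in profiles.items()}
    # Keep only users with at least one matching tag; break ties by name.
    return sorted(
        (user for user, score in scores.items() if score > 0),
        key=lambda user: (-scores[user], user),
    )


print(route_question("Where can I find good cooking classes in Portland?", PROFILES))
```

A real system would presumably go beyond literal word overlap and, as we speculate above, weigh each user's history of successful answers.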

History of one query.

Aardvark2.jpg

An IM thread.
aardvark3.jpg

Editing user profile.
Aardvark1.jpg

User experience: High coolness factor when a real person quickly answers your question. How reliable that person is regarding the topic of the question is not readily apparent. Interesting IM interface facilitates relatively sophisticated interactions based on short commands. Fun to browse through open questions; smart deference to email when people aren't available by IM. Can be irritating to be interrupted by other people's questions by IM, but not such a big deal. Web interface is quite nice but I've hardly ever seen it -- just asking and answering questions through IM.

How It Differs From the Others: IM interface offers almost zero barrier to entry and a powerful hook to return to the service over time. Machine learning focuses on identifying human experts, and search is rich with human interaction, thinly mediated by a smart system. You could call this a friend-network-based, semantically powered expert discovery and conversation system.

Stage: Closed beta; new users get 50 invites. Has been in the works for years and is relatively well baked.

Backing: Made up largely of ex-Googlers. The parent company is called The Mechanical Zoo and has raised $6 million from very hip VC firm August Capital and Ron Conway's Baseline Ventures.

For more info, see this review on VentureBeat.

Hunch

hunch.com (Our previous coverage)

Premise: You may like the same advice for common questions that people with similar tastes like.

Logic: A series of decision topics have been populated with questions concerning factors to consider for each decision. Users go through and answer those questions and are then presented with a series of answers that other people who answered the questions the same way and who have similar tastes have said they are happy with. It's hard to explain but really easy to use. Users can add "factors to consider" questions to any question. There's a really interesting social networking component to it as well.
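
Since the logic is hard to explain in words, a toy sketch may help. This is not Hunch's implementation; the sample users, the question keys and the `recommend` function are invented for illustration, and the scoring (one vote per shared answer) is the simplest scheme we could think of.

```python
from collections import Counter

# Hypothetical data: each user's answers to profile questions,
# paired with the outcome that user said they were happy with.
USERS = [
    ({"clowns_funny": "yes", "edits_video": "no"}, "keep current RAM"),
    ({"clowns_funny": "yes", "edits_video": "no"}, "keep current RAM"),
    ({"clowns_funny": "no", "edits_video": "yes"}, "upgrade RAM"),
    ({"clowns_funny": "yes", "edits_video": "yes"}, "upgrade RAM"),
]


def recommend(my_answers, users):
    """Weight each liked outcome by the number of answers shared with that user."""
    votes = Counter()
    for answers, liked in users:
        shared = sum(1 for q, a in my_answers.items() if answers.get(q) == a)
        votes[liked] += shared
    return votes.most_common()


me = {"clowns_funny": "yes", "edits_video": "no"}
print(recommend(me, USERS))
```

The outcome favored by the people whose answers most resemble yours ends up at the top of the list.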

Home page: random questions; taste-profile-building question about you, users.

Hunch3.jpg

Answering a question as part of a larger question.
hunch2.jpg

Answer page, with opportunity to edit inquiry.
Hunch1.jpg

User experience: Using Hunch is an odd experience, but it's a whole lot of fun once you get it figured out a little bit. Much of the User Experience design is a model that you'll wish every website followed. It's quite game-like. That said, the site can be overwhelming and make your brain hurt. The service tells me that most people who said they think clowns are funny (as I did), and who don't do video editing on their computers, also liked the answer "no, you probably don't need to upgrade your Mac's RAM." I don't really know what to make of that. You'll probably want to go back, though, and you'll probably want to clap your hands and smile each time you do.

How It Differs From the Others: By far the most "involved" for users of these three services. The user experience is very structured but it's also a lot of fun. You could call this a profile-driven, crowd-built recommendation system.

Stage: Closed beta; new users get a very limited number of invites. One co-founder says it's still quite rough around the edges, but if that's the case we sure can't see it.

Backing: The company has raised $2 million in VC funding and has an executive team of successful startup founders who've sold other companies, most prominently Caterina Fake, one of the co-founders of Flickr, who is now Chief Product Officer at Hunch.

To read more about Hunch, see the company's official FAQ.

Swingly

swingly.com

Premise: Answers to any question you have can be found out around the web. Swingly finds those answers hidden in plain text articles, databases and other Q&A sites. Then it makes them structured for easy sorting in response to queries.

Logic: This un-launched company uses Seti@Home-style distributed computing to perform Natural Language Processing on pages all around the web, hunting for information that can be turned into Questions and Answers to serve up to Swingly users. The company believes that "next-gen search should [include] 'micro-retrieval,' rather than return pages, and return only the content (word/sentence/paragraph) you need."

A screen shot from earlier this week.

swingly1-1.jpg

Some sample answers to questions asked of Swingly.
swinglypic2.jpg

The system claims it understands subtle differences between questions.
swinglypic3.jpg

User experience: We've not been able to test Swingly yet, but it looks relatively straightforward so far. There will be any number of additional services built out as well, including a widget for bloggers to offer Q&A functionality on their sites. When you talk about billions of pieces of structured data that you can query with common questions, almost anything is possible. That said, Q&A is a field that several other companies have done a good job nailing already, from Yahoo Answers to ChaCha to Mahalo.

How It Differs From the Others: Swingly is the most mysterious of the three services and the most likely to become "a platform." It's also the most likely to suffer from the Powerset dilemma: hype, hyper-nerdy ambitions, big expectations, lackluster launch, $100m payday from Microsoft, getting turned into a term of derision among some in the industry and maybe buying a yacht.

Stage: Closed alpha right now. Starting to make the first public rumblings with screen shots, Twitter presence, initial PR outreach. "Alpha coming in late March and a public beta in mid-May. The alpha version will use an index of about 850 million question-answer pairs (more than all the Q&A sites put together) and will only be searchable. The beta release will consist of about 5 billion question-answer pairs and will include full questions and answers plus semantic search capabilities." - CEO Andy Hickl, last month

Backing: Dallas, Texas-based Swingly CEO and founder Andy Hickl is also CEO and President of the very related-looking Language Computer Corporation. CNN calls that company "closely held."

One thing's for sure - we're going to hear a lot more about Swingly. The company is working with Porter Novelli's Josh Dilworth, one of the smartest and most effective PR agents in the tech industry. Dilworth has a history of working with uber-nerd companies and getting them huge media coverage. His recent clients include database super-search engine Wolfram|Alpha (our review) and the most-discussed consumer semantic web company to date, Twine (our most recent coverage).

To follow the unfolding of Swingly, check out Hickl's personal blog.

Those are three companies we'll be watching closely as they break new ground in the combination of social and machine learning online. Which would you be most likely to go to first with a question? We'd love to hear from readers who have thought about this field, who are doing work in it as well, or who have initial impressions about these services that they would like to share. We expect to see a whole lot more like this in the near future.

Title photo Cyborg 2.0 by Y0si CC on Flickr

http://www.readwriteweb.com/archives/making_decisions_with_machines_and_people_3_new_cy.php http://www.readwriteweb.com/archives/making_decisions_with_machines_and_people_3_new_cy.php Analysis Fri, 08 May 2009 08:44:38 -0800 Marshall Kirkpatrick
The Robot Made Me Do It: Comparing Three New Cyborg Q&A Services

One part people, one part machine. Is that a formula for more effective decision making? A number of high-profile entrepreneurs believe it is, and they are starting companies based on the idea.
http://www.readwriteweb.com/archives/the_robot_made_me_do_it_comparing_three_new_cyborg_q_and_a_services.php http://www.readwriteweb.com/archives/the_robot_made_me_do_it_comparing_three_new_cyborg_q_and_a_services.php Analysis Thu, 30 Apr 2009 18:53:45 -0800 Marshall Kirkpatrick
Web 3.0 Conference: Real-World Value from Semantics and Analytics

Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products.

From May 19th to 20th, mediabistro will hold its Web 3.0 Conference in New York City at the New Yorker Hotel. The conference focuses on the semantic web, mashups, text and data analytics, and how they add real-world value to end users and businesses.

The last phase of the web, which has been referred to as Web 2.0, was more about AJAX-driven interactivity and social media. The Web 3.0 conference focuses on technologies that make the Web and data management substantially smarter.

Keynote speakers at the conference include:

  • Christine Connors, Global Director of Semantic Technology Solutions, Dow Jones;
  • Aza Raskin, Head of User Experience, Mozilla Labs;
  • Thomas Tague, Calais Initiative Lead, Thomson Reuters;
  • Loren Grossman, Global Chief Strategy Officer, Rapp/Omnicom.

While some think of Web 3.0 as an almost science-fiction-like intelligent Web, the truth is that a lot of here-and-now technology can make your Web and corporate applications smarter and more profitable. This includes everything from extracting insights from customer behaviors to serve them better, to breaking down the corporate information silos spread throughout your company so that your business information can become actionable insights.

The next generation of the Web is about leveraging the massive amounts of information you have or intend to collect or find available on the Web to make more profitable, efficient businesses and services. This concept will be one of the major drivers of profit as we push past the "2.0" generation and seek the "what's next" of the Internet.

For more details and to register for the conference, visit www.web3event.com. ReadWriteWeb readers save 15% with the discount code XRWW. For best available rates, register by 29 April 2009.

http://www.readwriteweb.com/archives/web_30_conference_real-world_value_from_semantics_analytics.php http://www.readwriteweb.com/archives/web_30_conference_real-world_value_from_semantics_analytics.php Sponsors Mon, 27 Apr 2009 13:00:00 -0800 RWW Sponsor
Interactive iPhone Kiosk Lets You Play with Semantic Web

Two German researchers, Simon Bergweiler and Matthieu Deru, came up with a way to explain the heady concept of the semantic web, aka "Web 3.0," to everyday people who aren't as steeped in technology advancements and lingo as perhaps we are. To do this, the researchers set up an experimental kiosk that lets you use semantic web capabilities with only an iPhone and a swish of your finger.

The kiosk, or "shared interaction space" as they prefer calling it, uses MP3 files to demonstrate semantic web technologies. MP3 files were chosen because they are easy to understand as being "things" that can have additional data attached to them ("artist," "album," "year," etc.). This additional data in MP3 files is stored in "ID3 tags," which are basically the portion of the file that tells the computer about that extra information. An MP3 file on an iPhone, then, is already a semantically annotated object which can be read by a computer.
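
To show what "read by a computer" can mean here, the sketch below parses the older ID3v1 layout, in which the tag is the final 128 bytes of the file: the marker `TAG` followed by fixed-width title, artist, album and year fields. This is only an illustration; the function name `read_id3v1` and the sample bytes are our own, we don't know how the researchers' kiosk reads tags, and most current files carry the more elaborate ID3v2 format at the start of the file instead.

```python
def read_id3v1(data: bytes):
    """Parse an ID3v1 tag from the final 128 bytes of an MP3 file's contents."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None  # no ID3v1 tag present

    def text(raw: bytes) -> str:
        # Fields are fixed width and padded with NUL bytes (sometimes spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }


def pad(value: str, width: int) -> bytes:
    """Encode a string and pad it with NUL bytes to a fixed width."""
    return value.encode("latin-1").ljust(width, b"\x00")


# A made-up "file": placeholder audio bytes followed by a hand-built tag.
fake_mp3 = (
    b"\xff\xfb" * 10
    + b"TAG"
    + pad("Song", 30)
    + pad("Band", 30)
    + pad("Album", 30)
    + pad("2008", 4)
    + pad("", 30)   # comment field
    + bytes([17])   # genre index
)

print(read_id3v1(fake_mp3))
```

For a real file you would pass in the bytes read from disk, for example `read_id3v1(open(path, "rb").read())`.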

Wait, What's the Semantic Web Again?

Why is this considered semantic technology? Because the core idea behind the semantic web, the next big leap in computing technology, is a web where "things" (like MP3 files, for example) can be read and understood by computers. A semantic web would better understand our search queries and how objects are linked with each other. The understanding that comes from that network of linked information could even bring about a sort of artificial intelligence, as the web could then deliver information to us that the human mind wouldn't have been able to access on its own. That's somewhat of a simplification of the semantic web, but it works for our purposes in understanding this experiment.

The Semantic iPhone Kiosks

To demonstrate semantics in action, participants in the experiment would place an iPhone on the kiosk's surface and watch as a circle appeared around it. Next to the iPhone, a list of songs arranged by artist, title, or genre would then appear. Elsewhere on the screen were things called "spotlets," or intelligent information agents that performed actions when the songs were dragged to them. For example, one spotlet played the MP3 when it was dragged there; another played YouTube music videos from the same band.

To better understand spotlets, you can check out this YouTube video of spotlets in action.

The tabletop computer looks to be very much like a homegrown version of Microsoft's Surface computer, except for the fact that the cameras that detect the action taking place on the screen were strung up above the computer instead of housed inside and underneath the screen as they are with Surface. The researchers are calling this the Comet system, short for "Collaborative Media Exchange Terminal."

The rest of the interactions that take place on the computer's screen are the sort of natural user interface actions that we've come to expect from touch screen technology. You can actually touch and drag the MP3s from one spotlet to the next, playing music, watching videos, and getting recommendations for other songs you might like.

According to a CNN article about the technology, the researchers will also soon be launching a web site version of their system, where you'll be able to drag icons with a cursor instead of your finger. You'll also be able to use speech commands to interact with the system, which could be an interesting development in home entertainment systems.

This iPhone interactive kiosk is a great example of how the semantic web doesn't have to be a dry concept, unexciting to anyone who isn't a technophile or programmer. Instead, it shows the promise of what the semantic web could bring and how it could impact our everyday life. Keep your eye on this team for more information about their technology in the future - they are definitely a group to watch.

http://www.readwriteweb.com/archives/interactive_iphone_kiosk_lets_you_play_with_semantic_web.php http://www.readwriteweb.com/archives/interactive_iphone_kiosk_lets_you_play_with_semantic_web.php Product Reviews Mon, 22 Dec 2008 07:17:45 -0800 Sarah Perez
Law 2.0 News: Mumboe Uses Semantics To Pull Key Data From Contracts

Mumboe isn't just another enterprise collaboration suite. Instead, they focus on doing one thing and doing it well: making business agreements searchable. That's a unique need they fill, which is why they already have 3000 customers using their free Express solution after launching only earlier this spring. To compete with the handful of other vendors in this narrow space, Mumboe has now added a new feature called On-Demand Contract Intelligence, which takes advantage of the service's semantic processing engine to deliver something the others don't: automatic extraction of data.

In these uncertain times, there has been a lot of speculation about what sort of web apps will survive the U.S. financial crisis. Clones and other sites offering little to no value may disappear, but ones that offer something unique may have a shot. Those that deliver something of value to a business may be even more likely to weather the storm. Although Mumboe hasn't been the only business contract management solution around - others like EchoSign, DocuSign, and Entrust, for example, are also available - their competitors tend to offer document management through the use of digital signatures. What Mumboe does is different - business agreement tracking and management. Through their dashboard you can keep tabs on upcoming and past-due tasks as well as deadlines. You can also create, view, or search for agreements in your online database. Now Mumboe is introducing another feature that lets them stand out from the crowd even more than before: On-Demand Contract Intelligence.

Auto-Extracting Data

To use this new feature, you begin by uploading documents into Mumboe as usual, a process that involves nothing more than browsing for the file on your PC and uploading it to Mumboe. The OCR software in the system will scan the document's text and make the text searchable. Once that is complete, you can select the new "Auto Extract" button, which lets you automatically extract the key details from the document, such as the parties involved and the agreement term. Those details are displayed for your reference along with the exact place in the document where they were found. You'll also be able to see an excerpt of the text that contains those key words and phrases. You can choose to save the data as is or edit it as you see fit. If the system extracts terms you don't need to track, you can simply discard those items and keep the rest. When you're finished, just click "Done."

Pulling out the details:

Why This Matters

For consumers and everyone else outside the target audience, Mumboe and other similar types of applications seem dry and boring. But those in need of better tools, mainly those working in the legal profession, will see this auto-extract capability as one sexy new feature. Why? Because according to the International Association for Contract and Commercial Management (IACCM), poorly managed contracts result in over $153 billion in missed savings and revenue per year. Given our current economic conditions, those numbers cannot be ignored.

The only hurdle Mumboe has to overcome is the fear lawyers (a typically conservative bunch) have of moving to an online application. Their fear and mistrust of any sort of cloud app, misguided in our opinion, keeps them doing business the old-fashioned way - reading through pages and pages of documents themselves instead of enlisting the aid of some "new fangled" system to help them out. But for those who do take the leap, their efficiency will increase dramatically. If there's any occupation that understands the concept of "time is money," law would be it, and that's why Mumboe has a shot.

How To Join the Beta

The Mumboe auto-extract features are in private beta at the moment, but you can sign up to join here: https://app.mumboe.com/registration/beta_promotion_index. Mumboe will give out invites to RWW readers exclusively.

More About Mumboe:

http://www.readwriteweb.com/archives/law_20_news_mumboe_uses_semantics_to_pull_data_from_contracts.php http://www.readwriteweb.com/archives/law_20_news_mumboe_uses_semantics_to_pull_data_from_contracts.php Enterprise Wed, 01 Oct 2008 06:40:00 -0800 Sarah Perez
Hakia Announces Semantic API

Semantic search engine Hakia today announced a set of APIs that opens up their natural language processing and search platform to developers. Hakia's Syndication Web Services really comes in two parts: search queries, which allow developers to add web search functionality leveraging Hakia's five billion page index, and XML feed calls, which give developers access to Hakia's underlying natural language processing technology. The latter of the two is clearly the more compelling of the offerings.

Mobile video firm Berggi released Berggi Search, a mobile search application that lets users search Hakia's index via the API from mobile phones. Berggi is leveraging the part of Hakia's API that lets developers lean on the company's search platform -- that, however, is not the part that really interests us.

What is more interesting are the XML feed calls that Hakia is offering that give access to their underlying NLP engine. Right now, only the "Summarizer" element is available. Summarizer, which Hakia says can be used to suggest tags or abstracts, analyzes and extracts meaning from large blocks of text or the contents of URLs. Other elements that are not yet available are Categorizer, which identifies "categorical phrases" in text, Characterizer, which "identifies and expands descriptive keywords or tags," and Text Meaning Representation.

Hakia has an XML testing form up on their Club Hakia page, and in our testing it seemed a little rough around the edges. Compared to our testing of Open Calais from Reuters (our coverage), the summaries and tags the XML testing form returned using the Summarizer element weren't very impressive. Mostly, it seemed to just return the headline or first sentence as the summary for articles we threw at it. And for RWW articles, Hakia Summarizer would suggest as tags the tags that we entered by hand in MovableType.

Hakia's Syndication Web Services are free for up to 30,000 requests per day for search services (unlimited free queries for Quotes and Cartoons), and free for up to 1,000 requests per day for XML feed calls. Have you had a chance to play with Hakia's new semantic API? If so, what did you think? How does it compare to Calais or Semantic Hacker? Let us know in the comments below.

Full Disclosure: Occasional ReadWriteWeb contributor Emre Sokullu is a technology evangelist at Hakia.

http://www.readwriteweb.com/archives/hakia_announces_semantic_api.php http://www.readwriteweb.com/archives/hakia_announces_semantic_api.php Semantic Web Thu, 19 Jun 2008 12:56:42 -0800 Josh Catone
Thinkbase: Mapping the World's Brain If Freebase is an "open shared database of the world's knowledge," then Thinkbase (found via information aesthetics) is a mind map of the world's knowledge. The interesting and incredibly addictive Freebase visualization and search tool is the brainchild of master's degree student Christian Hirsch at the University of Auckland. Thinkbase is one of the cool proof of concept applications built on top of Freebase that we mentioned last week.

As we've mentioned here on RWW, Freebase is best suited for complex inferencing queries -- the type that expose relationships between various entities to figure out an answer. Things like, "What's the name of the actor who was in both 'The Lord of the Rings' and 'From Hell'?" (Answer: Ian Holm)
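To make that kind of query concrete, here is a minimal sketch of the underlying idea: intersecting the casts of two films. The `CAST` table and the `actors_in_all` helper are our own illustrative assumptions with a few hand-picked cast members; this is not Freebase's data model or query API.

```python
# Toy cast lists: a hand-picked subset for illustration only,
# NOT pulled from Freebase and NOT using its query language.
CAST = {
    "The Lord of the Rings: The Fellowship of the Ring": {
        "Elijah Wood", "Ian McKellen", "Ian Holm", "Viggo Mortensen",
    },
    "From Hell": {
        "Johnny Depp", "Heather Graham", "Ian Holm", "Robbie Coltrane",
    },
}


def actors_in_all(films, cast=CAST):
    """Return the set of actors who appear in every one of the given films."""
    casts = [cast[title] for title in films]
    # An empty film list has no meaningful intersection, so return an empty set.
    return set.intersection(*casts) if casts else set()


common = actors_in_all(list(CAST))
print(sorted(common))
```

The answer falls out of the relationship between entities (film to actor) rather than from matching keywords on a page, which is the distinction the paragraph above is drawing.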

Thinkbase doesn't necessarily answer those questions -- at least not directly, but it does allow people to visually explore the relationships that Freebase can expose. Thinkbase employs the Thinkmap visualization software to visually represent the semantic relationships between objects on Freebase as an interactive mind map. Each object on the map is represented by an icon that corresponds to the type of object it is: for example, a person, place, movie, song, or artwork.

The site uses a two-pane display, putting the relationship map in the left pane, and the Freebase entry for the active node in the right pane. Every node on a Thinkbase map can be expanded to see concepts related to that object, or collapsed to clean the graph of relationships you're unconcerned with. Every map you create can also be linked to via a dynamic share URL.

Thinkbase is a really fun visual front end to the Freebase database that exposes the semantic relationships that such a database can reveal in a compelling way. Alex Iskold wrote last week that the problem with semantic search is that we're asking the wrong questions. Tools like Thinkbase can help us start to think about what type of questions we should be asking by clearly showing the type of semantic relationships that databases like Freebase excel at finding.

http://www.readwriteweb.com/archives/thinkbase_mapping_the_worlds_brain.php http://www.readwriteweb.com/archives/thinkbase_mapping_the_worlds_brain.php Product Reviews Thu, 05 Jun 2008 10:30:01 -0800 Josh Catone
Making the Web Searchable: The Story of SearchMonkey Last week at the SemTech 2008 Conference that took place in San Jose, Yahoo! Researcher Peter Mika spoke in detail about the company's new SearchMonkey search platform initiative. Mika talked broadly about his work looking at metadata on the web, and how that led to the birth of SearchMonkey. This post is based on notes from that talk.

History of Web Page Annotations

The motivating question for Mika's presentation was: How can we make web search better by leveraging web annotation? There are many kinds of annotations, but Mika focused on simple data and lightweight semantics, and began by reviewing the history and evolution of annotations to explain how we got to where we are today.

One of the first methods of annotating HTML was Simple HTML Ontology Extensions (SHOE). This method allowed for the declaration of ontologies as well as relationships between the entities on HTML pages. The problem with it was that it introduced new tags that were not part of standard HTML and were not recognized by most browsers.

In 2003 Tantek Celik started work on Microformats - a way to embed light semantics using XHTML. Microformats are now driven by a community of developers, which evangelizes existing formats and is working on new ones. The major focus of this effort is to leverage standards, but Microformats are limited because they don't share common syntax. Every microformat looks different and there are no ontologies, and no schemas.

Things get particularly complicated when you start combining different Microformats, for example, when you describe that a person wrote a review at a particular event. In addition to this, Microformats have no concept of unique identity, and for this reason are largely incompatible with other Semantic Web efforts. Yet, Microformats took off and have become somewhat widespread. So, the takeaway here is that simple things can quickly gain adoption.
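To show what "no common syntax" means in practice, here is a rough sketch that reads a few hCard fields out of a page using only Python's standard library. The sample markup, the `HCardParser` class and the choice of fields are our own assumptions for illustration; the class names `vcard`, `fn`, `org` and `url` come from hCard itself. Note that the parser has to hard-code hCard's vocabulary, so a different microformat would need its own parser.

```python
from html.parser import HTMLParser

# Sample page fragment marked up with hCard class names (made-up person and company).
HCARD = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example Corp</span>
  <a class="url" href="http://example.com/">home page</a>
</div>
"""


class HCardParser(HTMLParser):
    """Collect a handful of hCard fields; far from a complete hCard parser."""

    TEXT_FIELDS = {"fn", "org"}  # fields whose value is the element's text

    def __init__(self):
        super().__init__()
        self.card = {}
        self._field = None  # text field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        # hCard stores the URL in the href attribute, not in the link text.
        if "url" in classes and "href" in attrs:
            self.card["url"] = attrs["href"]
        hit = classes & self.TEXT_FIELDS
        self._field = hit.pop() if hit else None

    def handle_data(self, data):
        if self._field and data.strip():
            self.card[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        self._field = None


parser = HCardParser()
parser.feed(HCARD)
print(parser.card)
```

Nothing in the markup says which vocabulary is in use or gives the person a unique identifier, which is the limitation described above.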

Another way of providing metadata that emerged recently is tagging. As an example, Flickr uses tags for photos to enable its users to annotate and describe the content. The problem with tags is that there is no agreement on meaning, so the same tag on Flickr and del.icio.us can mean different things, and there's no way to be sure which tag means what. Tags are a much more personal way of annotating information; they are not objective.

In 2005, Ian Davis, CTO of Semantic Web infrastructure company Talis, proposed eRDF - a form of RDF that can be embedded into HTML (compatible with HTML4). There is a simple mapping from eRDF to RDF so you can use any RDF/OWL vocabulary. But eRDF is not full RDF -- it has limitations. For example, there are no data types and there are no blank nodes. Also, each page can only "talk" about itself and not about other pages.

Finally, the W3C published RDFa, the latest embedding of RDF in XHTML, which has full RDF support. RDFa adds complexity in terms of implementation, but at the moment, gives the best way to embed RDF into HTML.
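For a feel of what embedded RDF looks like, here is a deliberately tiny sketch that pulls subject-predicate-object triples out of RDFa-style attributes (`about`, `property` and `xmlns:` prefixes). The sample page and the `TinyRDFa` class are our own assumptions; this is nowhere near a conformant RDFa processor, since it ignores nesting, `rel`, `typeof`, datatypes and much more.

```python
from html.parser import HTMLParser

# Made-up page fragment using the Dublin Core vocabulary via an xmlns prefix.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="http://example.com/post/1">
  <span property="dc:title">Hello</span>
  <span property="dc:creator">Jane Example</span>
</div>
"""


class TinyRDFa(HTMLParser):
    """Toy extractor: one flat subject, literal objects only."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> namespace URI, from xmlns:* attributes
        self.subject = None  # most recent value of an about attribute
        self.pending = None  # property waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        self.pending = attrs.get("property")

    def handle_data(self, data):
        if self.pending and self.subject and data.strip():
            prefix, _, local = self.pending.partition(":")
            # Expand the prefix into a full URI when we know it.
            predicate = self.prefixes.get(prefix, prefix + ":") + local
            self.triples.append((self.subject, predicate, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None


extractor = TinyRDFa()
extractor.feed(PAGE)
for triple in extractor.triples:
    print(triple)
```

Unlike the microformat case, the vocabulary travels with the page as a namespace URI, so the same extractor works for any RDF vocabulary a publisher chooses.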

How Much Metadata is Out There?

Given the increasing trend towards web annotations, the natural question is: just how much metadata is already out there? Peter Mika set out to answer this question and created a prototype, called Microsearch. The idea was to look at web pages and to see how much metadata was there. Beyond that, Mika was also interested in what type of metadata, as well as the ratio between annotated and plain HTML pages.

With the Microsearch exercise, Mika wanted to demonstrate what could be done to enhance search with this information. For each type of metadata, Mika augmented search results with additional links and information. For example, maps, events, information from hCard, etc. are presented in an enhanced way, unlike what we're used to seeing with today's search engines.

Mika discovered a few interesting things. First, about 53% of queries have 1 page with metadata in the top 10 results. However, lots of the data Mika saw was not clean and contained information that was not well formed, and performance was pretty poor due to lack of an index. So the unfortunate conclusion that Mika came to was that RDF templating was difficult and the approach was not easily scalable. Finally, Mika realized that metadata really needs to be on the page for users to see, because otherwise there is a big opportunity for semantic spam.

The Birth of SearchMonkey

The point of any experiment is to draw the right conclusions. Looking at the facts, Mika and the Yahoo! search team realized that they could not count on enhancing search by leveraging metadata on today's web - it simply does not exist to the extent needed. At the same time, it was clear that enhancing search results and cross linking them to other pieces of information on the web is compelling and potentially disruptive. Yahoo! realized that in order to make this work, they need to incentivize and enable publishers to control search result presentation. And thus, SearchMonkey was born.

SearchMonkey is a system that motivates publishers to use semantic annotations, and is based on existing semantic standards and industry standard vocabularies. It provides tools for developers to create compelling applications that enhance search results. The main focus of these applications is on the end user experience - enhanced results contain what Yahoo! calls an "infobar" - a set of overlays to present additional information. For example, with SearchMonkey, LinkedIn is able to surface additional information from the user profile, Netflix can present a blurb about the plot and a rating for a movie, and Barnes & Noble can embed a preview of a book.

SearchMonkey's aim is to make information presentation more intelligent when it comes to search results by enabling the people who know each result best - the publishers - to define what should be presented and how.

A Better Search Experience Ahead

This first version of SearchMonkey is just the first small step towards creating a better search experience. Much more is planned, but even with this first simple version, we can clearly see the power of semantics and annotations in web pages. By creating the right incentive for publishers and putting them in control, Yahoo! is aiming to raise the bar on search results, and, who knows, maybe even start attracting converts from Google's plain-looking results.

http://www.readwriteweb.com/archives/semtech_making_the_web_searchable_searchmonkey.php http://www.readwriteweb.com/archives/semtech_making_the_web_searchable_searchmonkey.php Semantic Web Tue, 27 May 2008 20:29:34 -0800 Alex Iskold
Reuters Launches Calais 2.0 - Now With Pop-Culture Thomson Reuters' Calais, a semantic markup API that we first reviewed in February, has reached its 2.0 release. The latest version aims to fix one of the main issues with Calais -- that it was too focused on business. Because Calais has roots as Clearforest, the rules it applies while parsing text are biased toward the language of business, which meant that its utility was limited. Version 2.0 has added new semantic entity types in an effort to rectify that.

Calais 2.0 has a dozen new semantic entity types, which Reuters says will increase its utility for "pop-culture publishers and bloggers covering media, music, entertainment and sports, as well as those covering pharmaceuticals, medicine and healthcare." In addition to expanded semantic identification capabilities, Calais 2.0 can now print results in the Simple Tags format and Microformats, as well as the original RDF.

More than 3,200 developers have signed up to work with Calais since launch, according to product lead Thomas Tague, who said in a press release that Calais and plugins and services built on the API will "make it easy to kick-start metatagging and enter the era of the Semantic Web."

Along with an updated web site, a handful of new code samples and libraries, Thomson Reuters is announcing three new plugins that utilize Calais.

  • Calais Marmoset is a tool that enables developers to automatically create metadata for use with Yahoo!'s open search platform, SearchMonkey (our coverage).
  • Calais is also announcing the official release of Tagaroo, a WordPress plugin that allows bloggers to automatically tag relevant people, places and things in their posts, as well as pull in semantically relevant Flickr photos. We wrote recently about an unofficial WordPress plugin for Calais, and noted that its utility would be limited mainly to business and tech bloggers because those were the API's strengths. Calais 2.0 should theoretically improve the utility of both plugins for a wider variety of bloggers.
  • Though they've been out since last month, Thomson Reuters is also officially introducing their Calais plugins for Drupal, a popular content management system, that it developed with Phase2Technology.

Calais is an awesome top-down semantic API that can help fuel the bottom-up approach by combing unstructured data and spitting out structured tags. We're excited for the second version of Reuters' product and the added utility that new semantic entity types should bring.

http://www.readwriteweb.com/archives/calais_20_launches.php http://www.readwriteweb.com/archives/calais_20_launches.php Product Reviews Sun, 18 May 2008 21:01:01 -0800 Josh Catone
Semantify Your Web Apps with Triplify Alright, "semantify" may not be an actual word, but you can probably guess at its meaning: "add a semantic layer to." In this case, we're looking at a small plugin called Triplify that reveals the semantic structures of web applications by converting their database content into semantic formats.

About Triplify

To grasp what this all means, we'll translate into plain English:

A large part of the content on the web is generated by web applications that are driven by databases on the back-end. For example, look at the top 15 most popular web apps hosted at Sourceforge:

Sourceforge Projects, Image via Triplify.org

However, the structure and semantics in the relational databases behind apps such as those above are not accessible to search engines. What Triplify does is use the structured nature of the databases behind these and other, similar apps to generate semantic data.

How It Works

The Triplify plugin generates database views by performing a small number of queries against the web app's database. These views are then converted into a semantic format - either RDF, JSON, or Linked Data representations. Once in this format, data can then be shared and accessed on the Semantic Web.
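The general idea of turning query results into triples can be sketched in a few lines. What follows is our own illustration, not Triplify's code or configuration format: the `rows_to_ntriples` helper, the `example.org` URIs, the sample `posts` table and the convention that the first column is the row's identifier are all assumptions made for the example.

```python
import sqlite3

BASE = "http://example.org/blog/"    # hypothetical base URI for resources
VOCAB = "http://example.org/vocab#"  # hypothetical vocabulary for column names


def rows_to_ntriples(conn, query, kind):
    """Run a query and emit one N-Triples line per non-null column value.

    Assumes the first selected column identifies the row.
    """
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        subject = f"<{BASE}{kind}/{row[0]}>"
        for column, value in zip(columns[1:], row[1:]):
            if value is None:
                continue
            # Escape backslashes and quotes so the literal stays well formed.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{VOCAB}{column}> "{literal}" .')
    return lines


# Stand-in for a web app's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'Hello world', 'sarah')")

for line in rows_to_ntriples(conn, "SELECT id, title, author FROM posts", "post"):
    print(line)
```

Each row becomes a resource with its own URI and each column becomes a property of that resource, which is what makes the data addressable by other Semantic Web tools.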

Triplify Overview, Image via Triplify.org

To install the plugin, you download and extract the folder containing the script into your web app. Then download a Triplify configuration matching your Web application or create a new one. There's an example file to get started with, or you can use one of the files already available, like this one for WordPress or this one for Joomla.

Finally, integrate the plugin into your web application. (More info here).

Benefits

Once the web app has been "triplified," search engines can better evaluate the content, and semantic search engines, like Sindice, SWSE, or Swoogle can do the same.

But even better, once Triplify is installed, your web app becomes easily mashable with other web data sources via a tool like Yahoo! Pipes, for example.

The Challenge

Because those behind Triplify feel strongly about expediting the deployment of the Semantic Web, they're posing a challenge to the web developer community: develop the most innovative and promising semantifications and win fabulous prizes!

The first prize is a MacBook Air, second prize is an Asus EeePC, and third prize is an iPod Touch.

To get a better idea of what they will be looking for, check out the Challenge page of the Triplify site.

http://www.readwriteweb.com/archives/semantify_your_web_apps_with_triplify.php http://www.readwriteweb.com/archives/semantify_your_web_apps_with_triplify.php Product Reviews Mon, 21 Apr 2008 10:51:34 -0800 Sarah Perez
Australian Museum Uses Open Calais to Tag Collection The Powerhouse Museum of Science and Design in Sydney, Australia has begun to utilize the Reuters Open Calais API (our coverage) to tag their collection. The museum's online collection database houses some 66,303 objects, so tagging them all by hand would be quite a task. By using the Open Calais web service, the museum is able to automate much of the process.

That the museum has so much of its collection online is actually quite impressive in its own right. About 70% of the museum's electronically documented collection is online in the database, which went live in June 2006. Museum objects are searchable, taggable (by humans) and painstakingly described.

However, there are so many objects that even though users can help to tag them, many of them haven't yet been tagged. Sebastian Chan, who is the Manager of Web Services at the museum, told us that Open Calais is being used to complement the people-powered tagging they've had running for two years. "What Open Calais lets us do now is connect people, places and companies across our collection and has already revealed many new pathways through our dataset (navigating by designer or inventor is now much easier for example)," he said.

The automatically generated tags at right were created by the API for some swim wear designed by Speedo for the 1991 Australian swimming team that competed at the World Swimming Championships in Perth. Open Calais was correctly able to identify some important locations in the document -- Perth where the competition took place, and Sydney where Speedo is based -- as well as an important corporation (Speedo). It also picked up the name of the designer, and the name of the person who owned the suits before the museum.

However, as you can see, the API made some mistakes too -- it classified "World Championships" as a company, and mistook the general text "international swimming organisation" as an actual organized body. It missed the actual organization (FINA) and probably should have picked up the MacRae Knitting Mills company, which was a predecessor to Speedo. Further, because Open Calais is built around people, places, and companies, general information about items may be lost on it. Tags that would be obvious to humans, such as swimming, swim wear, Olympics, or the year 1991, are beyond the scope of Open Calais.

"These errors and other like them reveal Open Calais' history as Clearforest in the business world," said Chan. "The rules it applies when parsing text as well as the entities that it is 'aware' of are rooted in the language of enterprise, finance and commerce." On the other hand, according to Chan, the technology has already revealed "many new connections between objects," even though it has so far been deployed only very sparingly across the collection.

Powerhouse's use of Open Calais may be the first large scale deployment of the technology across a large public data set. It will be interesting to see the results as they evolve. "It is important to remember that there is no way that this structured data could be generated manually - the volume of legacy data is too great and the burden on curatorial and cataloguing staff would be too great," reminded Chan.

http://www.readwriteweb.com/archives/australian_museum_uses_open_calais.php http://www.readwriteweb.com/archives/australian_museum_uses_open_calais.php Trends Tue, 01 Apr 2008 16:45:34 -0800 Josh Catone
Swotti - A Semantic Opinions Aggregator Swotti is a new semantic search engine that aggregates opinions about products to help you make purchasing decisions. With Swotti, you can learn from the good and bad experiences of others as the site gathers together reviews and feedback from across the web and categorizes them to provide you with more information about the product you're interested in. What's unique about this search engine is that it uses semantics to do so.

There isn't a lot of info about Swotti on their main site - no FAQ, no blog, no how-to section; it's just a search box on a white page. But as you begin typing, search suggestions appear underneath the search box, making it easier to find what you're looking for. Click on search and you'll be taken to a product reviews page, where you'll be amazed at the amount of data displayed.

Swotti aggregates opinions about products from product review sites, forums and discussion boards, web sites and blogs, and then categorizes those reviews according to which feature or aspect of the product is being reviewed, tagging each one accordingly and rating it as positive or negative.

Take the iPhone for example - each review is tagged with keywords like Design, Usability, Display, Reliability, Noise, Battery, Service, Camera, Keypad, Size, etc. Based on the number of positive reviews for a tag, a rating for that feature is given. Bar charts show green bars for good, yellow for average, or red for bad reviews. And they seem to be pretty accurate, at least for the iPhone - "design" is 5 green bars, "speed" is 3 red bars.
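Swotti hasn't published how it turns reviews into ratings, so the following is only a guess at the general shape of such a calculation: count the positive share of reviews per feature tag and map it to a color. The `feature_ratings` function, the 60% and 40% thresholds and the sample data are all our own assumptions.

```python
from collections import defaultdict


def feature_ratings(reviews, good=0.6, bad=0.4):
    """Summarize (feature_tag, is_positive) pairs per tag.

    Returns {tag: (color, positive_share)}. The thresholds are invented
    for illustration and are not Swotti's.
    """
    counts = defaultdict(lambda: [0, 0])  # tag -> [positive, total]
    for tag, positive in reviews:
        counts[tag][0] += int(positive)
        counts[tag][1] += 1

    ratings = {}
    for tag, (pos, total) in counts.items():
        share = pos / total
        if share >= good:
            color = "green"
        elif share >= bad:
            color = "yellow"
        else:
            color = "red"
        ratings[tag] = (color, round(share, 2))
    return ratings


# Made-up sample: 9 of 10 design reviews positive, 1 of 5 speed reviews positive.
sample = (
    [("design", True)] * 9 + [("design", False)]
    + [("speed", True)] + [("speed", False)] * 4
)
print(feature_ratings(sample))
```

The hard part, of course, is the step this sketch takes for granted: deciding automatically which feature a review is about and whether it is positive.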

There is even a pie chart that summarizes the views. In the iPhone example, 15% said "I Love," 11% said "Too Expensive," 11% said "Worst." (Note to those who hated your iPhones: please send them this way.)

Product images display on the left and the reviews themselves, linked to the original source, display on the right. The reviews can also be sorted to display the best reviews, the worst, or the most relevant. Beneath the sorting options, the number of reviews display.

iPhone Results in Swotti

What's interesting is that this data seems to have been collected, tagged, and rated using only Swotti's technology. This isn't Mahalo - no user-intervention here - it's all automated.

One problem with the site seems to be the English spelling and wording; for example, "Adjective" was spelled "Adjetive." Since the site is also offered in Spanish, it's likely that the Spanish version was created first and this is an English translation. However, this is only a minor drawback.

Whether it gets it right all the time - that's the real issue. The problem lies in similarly named products, obviously something that is still being sorted out. For example, a search for the Lenovo x300 also returned results for the Dell Latitude x300. I couldn't filter out the Dell results by using -dell in my query a la Google, as that returned a "No enough opinions" result (Yep, that's the English again).

Clicking on "Are you unsatisfied with your results? Help us" gave me a Spanish entry form which returned a bunch of code when I submitted my comments...although at the bottom it did say "Gracias por haber dado tu opinion," so maybe it went through anyway.

Although these issues would have to be worked out for the site to become mainstream, they don't detract from Swotti's potential - Swotti is reading, categorizing, and rating data from the web on its own. A great concept which hopefully will get better with time. Definitely worth watching.

http://www.readwriteweb.com/archives/swotti_a_semantic_opinions_aggregator.php http://www.readwriteweb.com/archives/swotti_a_semantic_opinions_aggregator.php Product Reviews Fri, 21 Mar 2008 10:08:31 -0800 Sarah Perez
Live Semantic Service Inform.com Takes $15m Investment Semantic analysis service Inform.com announced today that the company has received a $15 million investment from Spark Capital. Inform analyzes content from online publishers and inserts links from a publisher's own content archives, affiliated sites or the web at large to augment content being published. The company says it already has more than 100 clients, including CNN.com, WashingtonPost.com and the Economist. Those who would contend that semantic web technology has not arrived can stick that in their pipes and smoke it.

Inform says its technology determines the semantic meaning of key words in millions of news stories around the web every day in order to recommend related content. The theory is that by automating the process of relevant link discovery and inclusion, Inform can easily add substantial value to a publisher's content. Inform also builds out automatic topic pages, something you can see around WashingtonPost and CNN.com. It sounds like a solid value proposition to me. This is the kind of thing that semantic technology is best at providing: making content machine readable allows the human mind to focus on genuinely creative work instead of determining things like what constitutes related content.

Standards?

No. Inform crunches straight text and outputs HTML. I asked whether they publish content with any standards-based semantic markup and they said that actual publishing is up to publishers. That's a shame; I don't see any reason why Inform wouldn't participate in the larger semantic web to make its publishers' content more discoverable. Perhaps when you've got 100 live clients and now $15m in the bank, it feels like there's no reason to open up and play nice with a movement of dreamers having trouble getting other apps out of academia.

Different Approaches

While many publishers have been criticized for linking only to their own internal pages for reference (including many leading blogs) it's good to see that Inform at least provides the option of including outside links. That is, after all, one of the most important characteristics of the web - links from one site to another.

Inform indexes blogs, audio and video as well as standard web pages. It's a smart idea and similar to a number of related companies you may be more familiar with. Our own Alex Iskold runs AdaptiveBlue, a semantic company that offers related links tied to links already added by publishers and a semantic browser plug-in. SystemOne is an elegant system that offers related content automatically during the writing process. Lijit is a custom search engine of sorts, allowing you and your readers to manually search through a confined set of content.

The key way the above services probably differ is the degree of automation that they offer. Inform is highly automated, once a publisher sets up general rules for bringing in related content. A publisher might, for example, choose to insert a link to its own content on any term it has published more than 20 articles about in the past week, or on any term for which its affiliate network can provide content above a certain minimum percentage of relevance.
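A rule like the first one is easy to picture in code. This is a hypothetical sketch of such a publisher rule, not Inform's implementation: the `linkable_terms` function, its parameters and the article representation are our own assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def linkable_terms(articles, now, min_articles=20, window=timedelta(days=7)):
    """Return the terms covered by MORE than min_articles recent articles.

    articles: iterable of (published_at, terms) pairs, where terms is any
    iterable of strings. Each article counts a term at most once.
    """
    cutoff = now - window
    counts = Counter()
    for published_at, terms in articles:
        if published_at >= cutoff:
            counts.update(set(terms))
    return {term for term, n in counts.items() if n > min_articles}


# Made-up archive: 21 recent "election" stories, 20 recent "budget" stories,
# and 30 "olympics" stories that are a month old.
now = datetime(2008, 1, 23)
archive = (
    [(now - timedelta(days=1), ["election"])] * 21
    + [(now - timedelta(days=2), ["budget"])] * 20
    + [(now - timedelta(days=30), ["olympics"])] * 30
)
print(linkable_terms(archive, now))
```

Only "election" clears the bar here: "budget" has exactly 20 articles rather than more than 20, and the "olympics" stories fall outside the one-week window. The relevance-percentage rule would need a relevance score per candidate link, which is the part Inform's linguistics presumably supplies.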

There's some heavy math and linguistics going on at Inform and it's a good example of how proprietary technology is headed for the bank while open standards based approaches dawdle. In theory openness and standards should be clear winners in terms of ultimate value delivered to any company, someday. In the meantime, publishers can deploy Inform's semantic technology now.

http://www.readwriteweb.com/archives/inform_funding.php http://www.readwriteweb.com/archives/inform_funding.php Product Reviews Wed, 23 Jan 2008 12:03:35 -0800 Marshall Kirkpatrick