semantic search - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/semantic search en Copyright 2009 Richard MacManus readwriteweb@gmail.com Mon, 23 Nov 2009 13:08:45 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss

Hakia Relaunches With 'Credible Sites'

Semantic search engine Hakia announced a major redesign of its site today, including the addition of 'credible sites' to its search index. In order to create this index of trustworthy sites, Hakia is asking volunteers to submit credible, peer reviewed sources. Credible sites are currently limited to health and environmental topics, but Hakia is planning to expand this quickly. By adding these credible sources, Hakia wants to go beyond '10 blue links' and give its users an alternative to popularity driven approaches like Google's PageRank. Hakia has also added a 'Galleries' section, which is a structured directory of some of the most popular search topics.

Credible Sources

In order to create this index of credible and trustworthy sites, Hakia is relying on volunteers. Hakia is specifically recruiting librarians, though it seems anybody can sign up, which could potentially leave the site open to spammers. Hakia asks submitters for their professional credentials, but it is not clear if the company will actually check these.

Hakia uses a very strict definition for what makes a site credible. To be included in the index, a site should have gone through a peer review process, not have any commercial bias, and the information should be current. The fact that Hakia insists on only adding peer reviewed sites should greatly enhance the signal-to-noise ratio of the search results.

Great Structured Results

In our tests, we were often impressed by Hakia's ability to structure its regular search results. For 'Sarah Palin', for example, Hakia organizes the results by official websites, images, news, biography, awards, and speeches. A search for 'Portland, OR,' on the other hand, first displays general information about the city, images, transportation options, and restaurant guides.

All results now also feature images and user-generated content.

Whenever we tried to ask more general questions ("What is a blog?"), however, Hakia's results were often underwhelming and uneven. Sometimes we got results that were spot-on, while at other times, the results barely had anything to do with our query.

Hakia also introduced 'my hakia,' a personal start page which still looks a bit unfinished, but seems to rely on Hakia's expertise in structuring search results to give users more background information about current events.

Overall, we liked Hakia's updates and we are looking forward to the expansion of the 'credible sources' to other topics, as we were quite impressed with the results it returns already.

http://www.readwriteweb.com/archives/hakia_relaunches_with_credible.php http://www.readwriteweb.com/archives/hakia_relaunches_with_credible.php News Mon, 06 Oct 2008 10:24:55 -0800 Frederic Lardinois
Live Search: Powerset Integration Already Going Live

Microsoft only acquired the semantic search engine Powerset a little more than a month ago, but today, the Powerset team announced the first integration of its search technology into Microsoft's Live Search. Specifically, Live Search will now show better instant answers for queries like "San Francisco weather" and return better results based on Freebase and Wikipedia articles. Currently, these Powerset enhanced results will only appear for a random set of users, but over time, we assume that most of these features will be rolled out for everybody.

Powerset has also integrated xRank biographies into Live Search, which, at least for us, appeared in almost every related search. Live Search will also make use of Powerset's Factz engine to display better related searches.

It is encouraging to see that Microsoft has been able to integrate Powerset's technology into its own products this quickly. Live Search, which is far behind Google in terms of market share, needs exactly these kinds of features to make its search more relevant.

After the acquisition was announced, we wondered if a combination of Microsoft and Powerset could indeed beat Google. Judging from these first results of the Powerset integration, we can at least conclude that Microsoft will make a strong effort to beat Google in terms of search relevance. Whether this is enough to challenge Google's dominance remains to be seen.

http://www.readwriteweb.com/archives/live_search_powerset_integrati.php http://www.readwriteweb.com/archives/live_search_powerset_integrati.php Products Wed, 17 Sep 2008 10:26:50 -0800 Frederic Lardinois
Semantics + Google = SemantiFind

SemantiFind is a newly launched semantic search tool which made its debut at the recent DEMO conference. Unlike other semantic search engines such as Hakia and the recently acquired Powerset, SemantiFind isn't looking to create a whole new search engine from scratch. Instead, they decided to improve upon the one engine we already use: Google.

How To Use SemantiFind

To get started with SemantiFind, you must first create an account. You can then download the browser plugin which installs the SemantiFind toolbar. This plugin is available for both Internet Explorer and Firefox.

To begin using SemantiFind, you must go to www.google.com - the service won't work from iGoogle or the Google search box in your browser. After you enter your search term in the box as usual, you are prompted to indicate the precise meaning of your term before starting your query. This is done through a drop-down box that displays specific terms and their definitions. For example, if you were searching for "Georgia," you would be presented with the option to select either the U.S. state or the former Soviet republic.

Once you've selected the word which matches your search term, you'll then be taken to the search results page. The results are simply Google results as you would normally see them, but the extraneous noise from items that don't match your desired query will not display (in theory).

When you find a page you like, you can then mark it as being relevant and useful with one click of the "Semantify" button on the toolbar. This page will then be included in your future searches and will also become a part of the SemantiFind community so others may benefit, too. Those "semantified" pages will display at the top of future search results in a separate box.

Does It Work?

Unfortunately, SemantiFind is one of those tools that's good in theory, but not so good in practice. When performing some test searches, results were not as precise as they should have been. For example, in the above-mentioned search for "Georgia," a search for the U.S. state returned Google results for the country as well. Also, the SemantiFind search box included a link to a Valleywag story about the Russian invasion of Georgia the country. Obviously, whoever marked that story as relevant to a search for the U.S. state made a mistake, but that just goes to show why search engines that rely on people to filter the results might not work. Human error shouldn't be a factor in web searches.

Without the "community" element to SemantiFind, the technology could have potential if they would work on providing more accurate results. However, "wisdom of the crowds" is the precise angle they're going for with this tool as they believe it will lead to the best results. We're not so sure, but it's still nice to see some innovation happening in the semantic search space.

http://www.readwriteweb.com/archives/semantics_google_semantifind.php http://www.readwriteweb.com/archives/semantics_google_semantifind.php Products Wed, 17 Sep 2008 09:00:00 -0800 Sarah Perez
Does Microsoft + Powerset Beat Google?

What can the plan be with Microsoft's purchase of hot startup Powerset? The 3-year-old company, founded by Dr Barney Pell, recently launched a semantic search experience for Wikipedia.

It is doubtful that Microsoft bought the company just to enhance Live Search. Possibly the plan is to replicate the Wikipedia solution, then incorporate Powerset into Internet Explorer. In this post we look at what the thinking behind the acquisition might be.

Most initial reviews found the Powerset product release underwhelming. Critics appreciated the innovative semantic UI and recognized its potential, but believed it didn't vastly improve Wikipedia. So in view of the lukewarm reviews, the acquisition by Microsoft was unexpected. The $100M price tag is around 5x the $12M Series A plus $8M of investment put into the company. Microsoft execs must believe Powerset can be a weapon in its battle with Google.

What Powerset is today

Given a set of unstructured information, Powerset applies Natural Language Processing techniques to extract the key semantic concepts from the text. It then builds a semantic index (similar to Google's) as well as a conceptual graph of relationships between entities. This graph is typically expressed in RDF triples.
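To make the idea of a graph expressed as triples concrete, here is a minimal sketch in Python. It is not Powerset's implementation, which we have not seen: the facts, the predicate names (`birth_year`, `spouse`, `type`) and the `match` helper are all hypothetical, and plain tuples stand in for real RDF.

```python
from typing import Optional

# A triple is (subject, predicate, object), the basic unit of an RDF-style graph.
Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# Hand-written facts for illustration only; the predicate names are made up.
TRIPLES: list[Triple] = [
    ("Henry_VIII", "type", "Person"),
    ("Henry_VIII", "birth_year", "1491"),
    ("Henry_VIII", "spouse", "Anne_Boleyn"),
    ("Anne_Boleyn", "type", "Person"),
]


def match(pattern: Pattern, triples: list[Triple]) -> list[Triple]:
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        triple
        for triple in triples
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]


# "When was Henry VIII born?" becomes a pattern with an unknown object.
answers = match(("Henry_VIII", "birth_year", None), TRIPLES)
```

A natural language front end would only need to turn the question into such a pattern; the lookup itself is a simple filter over the graph.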

One of Powerset's innovations is surfacing semantics in the user interface. A contextual gadget is overlaid on the content to help users navigate the unstructured information.

Many thought Powerset to be a generic semantic search engine, but its first product is limited to Wikipedia. It is not trivial to scale the technology to the entire web.

Why Powerset is Powerful

When semantic technologies emerged a few years ago, people started talking about how semantic web and/or semantic search might be a Google killer. The talk was supported by logic that semantic search can deliver more relevant results because it "knows" the content.

The industry now realizes that isn't the case. Semantic search has no huge advantage over the statistical approach used by Google. We discussed this in the post Semantic Search - Myth and Reality.

What is powerful about Powerset? Precisely that it doesn't try to search the web as a whole. Right now, the solution works on Wikipedia, but the infrastructure is generic, so any other site could also be enhanced. The contextual outline developed can be used to navigate any content.

Instead of dealing with the whole web, the idea may be firstly to build solutions for specific sites.

Head-on with Google?

Powerset as it is today is no Google killer. At this point only something with huge traction and momentum would stand a chance.

In the search market, Google has a strong hold - potentially stronger if the Yahoo deal goes through. People are conditioned to Google: it's simple and, yes, imperfect, but it's good enough and the results are still better than Live Search.

If Microsoft bought Powerset with the goal to incorporate it into Live Search, then it's likely to be another acquisition to make little impact on the bottom line. In fact, the announcement on the Live Search blog states just that. The number one reason is acquiring talent; the second is the belief that NLP and semantic algorithms will be able to patch holes in today's search.

Today Powerset brings only interesting technology; it doesn't bring traction. So what were they thinking up in Redmond? There may be a more subtle play, leveraging the fact that Powerset works well on knowledge sets like Wikipedia.

Possibly Microsoft plans to deploy Powerset across its own sites, then perhaps incorporate Powerset into Internet Explorer.

Imagine going to Wikipedia and having a semantic overlay on each page. Now imagine scaling this experience across major information sources around the web.

Providing contextual, semantic experience allows Microsoft to retain eyes longer, shaving off the time people spend searching Google.

This is an important point because Google doesn't make money on search - it makes money on advertising.

Can Microsoft ever beat Google in Advertising?

The real problem Microsoft is seeking to solve is advertising. Until now the web has figured out two fundamentals for advertising - portals and search.

Portals show ads on each page; the more people browse the content, the more ads are shown and the more money is made. The search model emerged as an alternative, now more successful, path to advertising dollars.

With Powerset and other semantic technologies, there's another model: contextual information exploration overlaid on existing content.

If Microsoft can figure out how to keep eyes off Google's home page, the game will shift dramatically. The browser is one of Microsoft's most powerful tools - and the default box is Live Search.

If Microsoft wants to win over advertisers, it might just do more with the browser. Incorporating aspects of Powerset's semantic navigator into the browser by default could be a game changer. This is not a straightforward play. A large company with bureaucracy and execution problems is unlikely to be able to merge semantics into the browser quickly and elegantly.

Conclusion

The Powerset acquisition is an interesting move by Microsoft. This hot semantic startup was on everyone's radar.

What can the plan be? It is doubtful that Microsoft bought the company just to enhance Live Search. Possibly the plan is to replicate the Wikipedia solution, then incorporate Powerset into Internet Explorer.

That is a bold play requiring exact execution - not the kind Redmond has shown lately.

What do you think Microsoft is going to do with Powerset? What are the other applications of this technology that you can think of?

http://www.readwriteweb.com/archives/does_microsoft_powerset_beat_google.php http://www.readwriteweb.com/archives/does_microsoft_powerset_beat_google.php Analysis Thu, 03 Jul 2008 01:39:30 -0800 Alex Iskold
Rumor: Microsoft to Acquire Powerset for $100 Million

Venturebeat reports that Microsoft might be close to acquiring the San Francisco based semantic search engine Powerset for about $100 Million. No announcement has been made yet by either party. We contacted Microsoft, but did not get an answer beyond "Microsoft does not comment on rumors or speculation." We will update this post once we receive more information.

Rumors about Microsoft's interest in Powerset had been swirling around the Valley since last month, when Dan Farber first brought up the possibility in a post on CNet.

The consumer-facing side of Powerset, which has already launched, currently only searches Wikipedia articles, but Microsoft is most likely more interested in using the underlying technology for its own search products like Live Search. Powerset's specialty is providing answers through natural language queries like "When was Henry VIII born?" Powerset licensed this technology from Xerox PARC.

Having backing from Microsoft could help the small company to expand beyond Wikipedia and start indexing more of the Internet. Powerset's technology is still unproven to work well for anything but Wikipedia, but if Powerset does manage to scale beyond this, then it would allow users to by-pass Google's keyword driven search in favor of just getting a direct answer to a large number of their questions.

Microsoft's search products have struggled to gain any ground back from Google's search. Currently, Google has almost a 70% share of the search market, while MSN/Live Search has about 9.5%.

Powerset's capabilities have generally received very positive reviews. In his original piece on this, Dan Farber argued that Powerset's ability to create connections between concepts, relationships, and meanings could give it an edge over Google's keyword and PageRank driven search.

We first reviewed Powerset vs. Google in May and at the time, Josh Catone's impression wasn't quite as positive and he concluded that "Powerset doesn't do a markedly better job of finding answers than Google for most queries."

Powerset was funded in a $12.5 Million Series A round by Foundation Capital, Founders Fund and various angel investors.

For a more in-depth look at the state of semantic search in general, see also Alex Iskold's article on the myth and reality of semantic search.

http://www.readwriteweb.com/archives/rumor_microsoft_powerset.php http://www.readwriteweb.com/archives/rumor_microsoft_powerset.php News Thu, 26 Jun 2008 15:35:33 -0800 Frederic Lardinois
Evri Beta Launches: Search Less - Understand More

Evri, a Paul Allen backed semantic search engine, is launching into a limited beta tonight. Evri was first shown publicly at the D6 conference. Evri's CEO Neil Roseman likes to talk about Evri in terms of organizing content instead of calling it a search engine. At its core, however, Evri definitely is a search engine, though it adds a very sophisticated semantic layer on top of its results that emphasizes the relationships between different search terms.

In its early stages, Evri is only going to start out with a limited set of results and possible search terms, based on what it considers to be the most popular terms and people. This approach of starting with only the most popular terms is reminiscent of Mahalo. However, unlike Mahalo, which relies on paid editors and volunteers to create its results, Evri completely relies on its algorithms to create connections between people, products, concepts, and events.

Evri especially prides itself on having developed a system that can distinguish between grammatical elements such as subjects, verbs, and objects to create these connections. In his demo at D6, Roseman described the system as being similar to "an army of 7th grade grammar students graphing the Web."

Evri is entering in direct competition with a number of recent entries to the semantic search market, especially Powerset and Hakia. Powerset, however, only indexes Wikipedia articles, while Hakia tries to index all of the web, but focuses less on the relationships between objects and more on providing highly organized results for a given term.

You can sign up for invites to Evri on their homepage. The first wave of users should be receiving invites tonight.

For a more in-depth look at the state of semantic search, see also Alex Iskold's article on the myth and reality of semantic search.

http://www.readwriteweb.com/archives/evri_beta_launches_search_less.php http://www.readwriteweb.com/archives/evri_beta_launches_search_less.php News Tue, 24 Jun 2008 21:01:00 -0800 Frederic Lardinois
Thinkbase: Mapping the World's Brain

If Freebase is an "open shared database of the world's knowledge," then Thinkbase (found via information aesthetics) is a mind map of the world's knowledge. The interesting and incredibly addictive Freebase visualization and search tool is the brainchild of master's degree student Christian Hirsch at the University of Auckland. Thinkbase is one of the cool proof of concept applications built on top of Freebase that we mentioned last week.

As we've mentioned here on RWW, Freebase is best suited for complex inferencing queries -- the type that expose relationships between various entities to figure out an answer. Things like, "What's the name of the actor who was in both 'The Lord of the Rings' and 'From Hell'?" (Answer: Ian Holm.)

Thinkbase doesn't necessarily answer those questions -- at least not directly -- but it does allow people to visually explore the relationships that Freebase can expose. Thinkbase employs the Thinkmap visualization software to represent the semantic relationships between objects on Freebase as an interactive mind map. Each object on the map is represented by an icon that corresponds to the type of object it is: for example, person, place, movie, song, or artwork.

The site uses a two-pane display, putting the relationship map in the left pane, and the Freebase entry for the active node in the right pane. Every node on a Thinkbase map can be expanded to see concepts related to that object, or collapsed to clean the graph of relationships you're unconcerned with. Every map you create can also be linked to via a dynamic share URL.

Thinkbase is a really fun visual front end to the Freebase database that exposes the semantic relationships that such a database can reveal in a compelling way. Alex Iskold wrote last week that the problem with semantic search is that we're asking the wrong questions. Tools like Thinkbase can help us start to think about what type of questions we should be asking by clearly showing the type of semantic relationships that databases like Freebase excel at finding.

http://www.readwriteweb.com/archives/thinkbase_mapping_the_worlds_brain.php http://www.readwriteweb.com/archives/thinkbase_mapping_the_worlds_brain.php Products Thu, 05 Jun 2008 10:30:01 -0800 Josh Catone
Semantic Search: The Myth and Reality

For a few years now people have been talking about semantic search. Any technology that stands a chance to dethrone Google is of great interest to all of us, particularly one that takes advantage of long-awaited and much-hyped semantic technologies. But no matter how much progress has been made, most of us are still underwhelmed by the results. In head-to-head comparisons with Google, the results have not come out much different. What are we doing wrong?

For example, when asked "What is the capital of France?", both approaches come back with the correct answer: Paris. Also, a lot of queries that we are used to typing into Google in abbreviated form come back with similar results if we type them using natural language. Clearly something is off. We all know that semantic technologies are powerful, but how and why? In this post we will show that the problem is that we are asking the wrong questions.

The mistake is that semantic search engines present us with a Google-like search box and allow us to enter free form queries. So we type the things that we are used to asking - primitive queries. It never occurs to us to type in "What actor starred in both Pulp Fiction and Saturday Night Fever?" or "What two US Senators received donations from a foreign entity?" We type simple questions, but this is not where the power of semantic search lies. Let's look at the spectrum of semantic technologies from Google, to SearchMonkey, to Powerset, and Freebase to understand what is going on.

What Problem Are We Trying to Solve?

The first confusion in the space comes from the fact that semantic search is being positioned as the answer to all possible problems - from modern search, currently dominated by Google, to problems that are computationally impossible. The situation is made more difficult by the fact that right now there is only a thin range of problems where semantic search can clearly do better. This range is complex queries involving inferencing and reasoning over a complex data set.

As shown in the diagram above, basic queries are easily handled by Google. Sadly, natural language processing gives little advantage when it comes to this category of problems. Google correctly answers the question about Leonardo Da Vinci's birthday, leaving no opportunity to improve the search by understanding the nouns and the verbs that the user typed in.

Before looking at the problems that are perfect for semantic search, let's look at the hardest problems. These are computationally challenging problems that really have nothing to do with understanding semantics. The misconception has been perpetuated since the early days of the Semantic Web that somehow, because we will annotate the web, we will be able to solve these super complex problems. This is simply not true. There are fundamental limits to what we can compute, and a class of problems that have an exponential number of possible solutions is not going to be magically solved because we represent data as RDF.

The good news is that there is a set of problems that are great for semantic search. These are the problems we have been solving so wonderfully with relational databases. Way too often we forget that semantic technologies are here to help us represent relational data spread over the entire web - so it should be no surprise to us that it is relational queries that semantic search engines would excel at.
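To see why such questions are relational at heart, consider the actor question from earlier in this post. The sketch below is only an illustration using Python's built-in `sqlite3` module; the `appearances` table, its columns and the handful of rows are assumptions made up for this example, not anyone's real schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database of (actor, film) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE appearances (actor TEXT, film TEXT)")
    conn.executemany(
        "INSERT INTO appearances VALUES (?, ?)",
        [
            ("John Travolta", "Pulp Fiction"),
            ("John Travolta", "Saturday Night Fever"),
            ("Samuel L. Jackson", "Pulp Fiction"),
            ("Ian Holm", "From Hell"),
            ("Ian Holm", "The Lord of the Rings"),
        ],
    )
    return conn


def actors_in_both(conn: sqlite3.Connection, film_a: str, film_b: str) -> list[str]:
    """Self-join the table to find actors who appear in both films."""
    rows = conn.execute(
        """
        SELECT a.actor
        FROM appearances AS a
        JOIN appearances AS b ON a.actor = b.actor
        WHERE a.film = ? AND b.film = ?
        ORDER BY a.actor
        """,
        (film_a, film_b),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
result = actors_in_both(conn, "Pulp Fiction", "Saturday Night Fever")
```

The query is a plain self-join. The hard part for a semantic search engine is not running it but assembling the table from unstructured pages in the first place.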

The Spectrum of Semantic Search Players

But semantic search is not just about the questions that we are asking. Because the web is just a bunch of unstructured HTML pages, semantic search is also about the underlying data. At its most structured extreme we find Freebase - the semantic database of everything. Freebase is accessible via free text search, but more importantly via MQL (Metaweb Query Language). MQL is essentially JSON with wildcards. Using it you can construct any query against Freebase and the result will be the same query with answers filled in.
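The "same query with answers filled in" idea can be imitated in a few lines of Python. This is a toy query-by-example matcher, not MQL and not the Freebase API: the records, the property names (`type`, `name`, `capital_of`) and the `fill` function are all hypothetical, and `None` plays the role of the wildcard.

```python
# Hypothetical records standing in for a structured knowledge base.
RECORDS = [
    {"type": "city", "name": "Paris", "capital_of": "France"},
    {"type": "city", "name": "Rome", "capital_of": "Italy"},
    {"type": "person", "name": "Leonardo da Vinci"},
]


def fill(query: dict, records: list[dict]) -> list[dict]:
    """Return a copy of the query for every matching record, wildcards filled in.

    A value of None in the query means "any value, please tell me what it is".
    """
    results = []
    for record in records:
        matches = all(
            key in record and (wanted is None or record[key] == wanted)
            for key, wanted in query.items()
        )
        if matches:
            results.append({key: record[key] for key in query})
    return results


# "What is the capital of France?" phrased as a template with one blank.
answer = fill({"type": "city", "name": None, "capital_of": "France"}, RECORDS)
```

The appeal of this style is that the question and the answer share one shape, so there is nothing ambiguous about what was asked.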

Powerset, in a way, is just a relational database. It operates against certain, structured information. On the other end of the spectrum is Google, which is all about statistical frequencies and very little semantics. The recently launched SearchMonkey from Yahoo! is an interesting twist. It does not add anything to the result set, but instead uses semantic annotations to present a richer, more interactive and useful user interface.

Companies like Hakia and Powerset are probably working the hardest. These companies are trying to simultaneously build Freebase-like structures on the fly and then do natural language queries on top of them. The difference is that Hakia is using (likely similar) technology to query over the entire web, while Powerset has (probably shrewdly) chosen to restrict the search to Wikipedia.

Are Hakia, Powerset and Freebase All That Different?

This analysis brings up a question - which of these technologies are different and which are essentially the same? Let's get the easy one out of the way first. Yahoo!'s SearchMonkey is no different from Google or any other search, as far as the core search technology is concerned. The difference is simply in the presentation layer. SearchMonkey is smart about creating a better user experience by letting publishers present the search results to the users in the best possible way.

But when it comes to Hakia, Powerset and Freebase the situation is much more complicated. On the surface all these products are different - Hakia lets you search the whole web, Powerset is restricted to Wikipedia (and Freebase!) and Freebase itself has two search interfaces - the search box and query language. Here is the problem - the natural language interface has nothing to do with the underlying data representation.

The fact is that all of these semantic search technologies allow people to type in arbitrarily complex questions and then interpret these queries and execute them against their databases. Fundamentally, Hakia, Powerset, and Freebase are databases. Fundamentally, all of them have some kind of Natural Language Processing that translates the question into a canonical query over the database.

To gain insight into all of this, think about Freebase and its query language MQL. Unlike natural language, which allows all sorts of constructs, MQL is unambiguous. This JSON-like language allows users to construct precise statements against Freebase. The fact that Powerset allows natural language queries does not mean that inside Powerset there is no database. On the contrary, there is surely a database much like the one beneath the Freebase search box. What is really different about Freebase and Powerset is the data gathering approach and user experience.

Back to the Future: It's All About UI

Probably the most striking revelation about the semantic search space concerns the user interface. First, to go on a tangent, Powerset got it right by realizing that semantics needs to be surfaced in the UI. After a user searches Powerset, a contextual gadget, aware of the semantics of the results, helps the user complete the search experience.

Yet the biggest mistake that I think Powerset is making is also in the UI. The search box that everyone is familiar with via traditional web search engines needs to go. Having a simplistic search interface hurts Powerset and Hakia, and to a lesser extent Freebase, which is not positioning itself as generic search.

Think about the recent launch of Powerset. The company released a vastly better way to interact with one of the most important sources of information on the web - Wikipedia. But what did the critics say? "Let's see if this is a Google killer." And the answer to that is "no."

But what if Powerset restricted what can be searched? What if instead of a search box there was another interface, or what if they told users not to look up things that they can find easily on Google? Why is it that new companies are expected to improve on the algorithm that has ruled the web for over a decade? Instead, the expectation should really be to solve the problems that cannot be solved by Google today.

Conclusion

Semantic search is an upcoming technology that has set expectations way too high. We have all been misled into thinking that these technologies are here to dethrone Google by delivering better search results. Neither of those things is true. What is true, however, is that semantic search is going to be big and it is going to help us answer questions that we simply cannot answer today - complex, inferencing queries asked over the entire web as if it were a database.

In order for these semantic search technologies to make a dent in the market, they need to clean up their messaging and, most importantly, their user interface. Presenting a search box is both misleading and detrimental, as people associate it with the simplistic questions that Google solves without any problems. To really showcase semantic search, these companies need to come up with innovative UIs that will help users understand the power that is being put at their fingertips.

As always, please tell us what you think. What should semantic search companies do to gain their place in the marketplace?

http://www.readwriteweb.com/archives/semantic_search_the_myth_and_reality.php http://www.readwriteweb.com/archives/semantic_search_the_myth_and_reality.php Trends Thu, 29 May 2008 14:15:01 -0800 Alex Iskold
Semantic Travel Search Engine UpTake Launches

According to a comScore study done last year, booking travel over the Internet has become something of a nightmare for people. It's not that using any of the booking engines is difficult, it's just that there is so much information out there that planning a vacation is overwhelming. According to the comScore study, the average online vacation plan comes together through 12 travel-related searches and visits to 22 different web sites over the course of 29 days. Semantic search startup UpTake (formerly Kango) aims to make that process easier.

UpTake is a vertical search engine that has assembled what it says is the largest database of US hotels and activities -- over 400,000 of them -- from more than 1,000 different travel sites. Using a top-down approach, UpTake looks at its database of over 20 million reviews, opinions, and descriptions of hotels and activities in the US and semantically extracts information about those destinations. You can think of it as Metacritic for the travel vertical, but rather than just arriving at an aggregate rating (which it does), UpTake also attempts to figure out some basic concepts about a hotel or activity based on what it learns from the information it reads. Examples include whether the hotel is family friendly, whether it would be good for a romantic getaway, and whether it is eco friendly.

"UpTake matches a traveler with the most useful reviews, photos, etc. for the most relevant hotels and activities through attribute and sentiment analysis of reviews and other text, the analysis is guided by our travel ontology to extract weighted meta-tags," said President Yen Lee, who was co-founder of the CitySearch San Francisco office and a former GM of Travel at Yahoo!

What UpTake isn't, is a booking engine like Expedia, a meta price search engine like Kayak, or a travel community. UpTake is strictly about aggregation of reviews and semantic analysis and doesn't actually do any booking. According to the company only 14% of travel searches start at a booking engine, which indicates that people are generally more interested in doing research about a destination before trying to locate the best prices. Many listings on the site have a "Check Rates" button, however, which gets hotel rates from third party partner sites -- that's actually how UpTake plans to make money.

The way UpTake works is by applying its specially created travel ontology, which contains concepts, relationships between those concepts, and rules about how they fit together, to the 20 million reviews in its database. The ontology allows UpTake to extract meaning from structured or semi-structured data by telling its search engine things like "a pool is a type of hotel amenity and kids like pools." That means hotels with pools score some points when evaluating if a hotel is "kid friendly." The ontology also knows, though, that a nude pool might be inappropriate for kids, and thus that would take points away when evaluating for kid friendliness.
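
To make the pool example concrete, here is a minimal sketch of ontology-guided scoring. It is not UpTake's actual ontology or code: the `ONTOLOGY` table, the weights, and the function names are all invented for illustration. It only assumes that each concept has a parent type and a per-theme weight that can add or subtract points.

```python
# Hypothetical mini-ontology: every concept records its parent type ("is_a")
# and how many points it adds to (or removes from) a theme. Weights are invented.
ONTOLOGY = {
    "pool": {"is_a": "hotel amenity", "weights": {"kid friendly": 2.0}},
    "nude pool": {"is_a": "pool", "weights": {"kid friendly": -5.0}},
    "playground": {"is_a": "hotel amenity", "weights": {"kid friendly": 1.5}},
}


def is_kind_of(concept, ancestor):
    """Follow "is_a" links upward and report whether we reach ancestor."""
    while concept in ONTOLOGY:
        concept = ONTOLOGY[concept]["is_a"]
        if concept == ancestor:
            return True
    return False


def theme_score(concepts, theme):
    """Sum the theme weights of every recognized concept; unknown ones add nothing."""
    total = 0.0
    for concept in concepts:
        entry = ONTOLOGY.get(concept)
        if entry is not None:
            total += entry["weights"].get(theme, 0.0)
    return total


family_hotel = ["pool", "playground"]
adult_resort = ["pool", "nude pool"]
print(theme_score(family_hotel, "kid friendly"))   # 3.5
print(theme_score(adult_resort, "kid friendly"))   # -3.0
print(is_kind_of("nude pool", "hotel amenity"))    # True
```

The point of the sketch is that the same word ("pool") can help or hurt a theme depending on the more specific concept it belongs to, which is the kind of rule a flat keyword list can't express.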

A simplified example ontology is depicted below.

In addition to figuring out where destinations fit into vacation themes -- like romantic getaway, family vacation, girls getaway, or outdoor -- the site also does sentiment matching to determine if users liked a particular hotel or activity. The search engine looks for sentiment words such as "like," "love," "hate," "cramped," or "good view," and knows what they mean and how they relate to the theme of the hotel and how people felt about it. It figures that information into the score it assigns each destination.
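
A rough idea of what sentiment matching can look like is sketched below. The sentiment words come from the paragraph above, but the weights, the `SENTIMENT` table, and the `sentiment_score` function are assumptions made for this example, not anything UpTake has published. The sketch also ignores negation ("didn't like"), which a real system would have to handle.

```python
import re

# Hypothetical sentiment lexicon: the words are the ones quoted in the article,
# the numeric weights are invented for this example.
SENTIMENT = {"like": 1, "love": 2, "hate": -2, "cramped": -1, "good view": 1}


def sentiment_score(review):
    """Add up the weights of every lexicon word or phrase found in the review."""
    text = review.lower()
    score = 0
    for phrase, weight in SENTIMENT.items():
        # Word boundaries keep "like" from matching inside "unlike".
        hits = re.findall(r"\b" + re.escape(phrase) + r"\b", text)
        score += weight * len(hits)
    return score


print(sentiment_score("We love the good view, but the room was cramped."))  # 2
print(sentiment_score("I hate this place."))                                # -2
```

A score like this could then be folded into the overall rating for a destination, alongside the theme scores.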

Conclusion

Yesterday, we looked at semantic, natural language processing search engine Powerset and found in some quick early testing that the results weren't that much different than Google. "If Google remains 'good enough,' Powerset will have a hard time convincing people to switch," we wrote. But while semantic search may feel rather clunky for the broader global web, it makes a lot of sense in specific verticals. The ontology is a lot more focused and the site also isn't trying to answer specific questions, but rather attempting to semantically determine general concepts, such as romanticness or overall quality. The upshot is that the results are tangible and useful.

I asked Yen Lee what UpTake thought about the top-down vs. the traditional bottom-up approach. Lee told me that he thinks the top-down approach is a great way to lead into the bottom-up Semantic Web. Lee thinks that top-down efforts to derive meaning from unstructured and semi-structured data, as well as efforts such as Yahoo!'s move to index semantic markup, will provide an incentive for content publishers to start using semantic markup on their data. Lee said that many of UpTake's partners have already begun to ask how to make it easier for the site to read and understand their content.

Vertical search engines like UpTake might also provide the consumer face for the Semantic Web that can help sell it to consumers. Being able to search millions of reviews and opinions and have a computer understand how they relate to the type of vacation you want to take is the sort of palpable evidence needed to sell the Semantic Web idea. As these technologies get better, and data becomes more structured, then we might see NLP search engines like Powerset start to come up with better results than Google (though don't think for a minute that Google would sit idly by and let that happen...).

What do you think of UpTake? Let us know in the comments below.

http://www.readwriteweb.com/archives/semantic_travel_search_uptake.php http://www.readwriteweb.com/archives/semantic_travel_search_uptake.php Products Wed, 14 May 2008 06:00:00 -0800 Josh Catone
i360 Adds Semantics to Everything Tony Sukiennik believes the power of the people trumps the power of the algorithm when it comes to the development of semantic technology. His company, infoGenome, a startup that has been in stealth mode for about four and a half years, wants to harness that power by making semantics easy via its innovative drag-and-drop functionality. The i360 software he's developed is essentially the "Mahalo of semantic apps," relying on human knowledge to add meaningful layers of metadata to the information we work with every day. With i360, you can add semantics to everything.

People-Powered Semantics

When you're doing a web search, you instantly know what information is relevant and which isn't. At i360, they call this flash of understanding an "instant of information insight." In a split second you can identify something as being useful, but the problem in today's world is that there are too many ways to store that information - you can tag it, bookmark it, save it to file, email it, blog about it, share it with others, and so on. Overwhelmed by choices, busy people often choose to "just remember it," a decision that leads to the inevitable: forgetting. The human mind is already overloaded with input, so it isn't the ideal repository for storing all the complexities of our information-filled lives.

Instead, software should be doing the remembering for us. That's where i360 comes in. The application itself is really just a prototype of this conceptual idea, but one that Tony hopes Google might be interested in. Or maybe Microsoft. (He plans on proposing his ideas to both companies to see who bites.)

What the i360 software does is provide a way to quickly mark up the data you're working with - be it a link on the web, an email, a file, or anything - and add meaning to it with semantics. This process is done via a quick drag-and-drop into the app.

That isn't to say that this technology is using semantics in the technical sense of the word - it's not about converting everything into machine-readable formats for use on the semantic web; what it is doing, though, is adding semantics to everything by assigning meaning to that email, that PDF, that link, that note, that spreadsheet, etc. Meaning that only you, and not a computer or an algorithm, could know. In doing so, the technology is not focused on a semantic web per se, but a semantic database of your own, made up of not only web links, but also files, contacts, emails, keywords, and more, and knowing how they all are associated with each other.

Although Tony believes that we shouldn't give up on the algorithm - by all means, research should continue in that area - he feels strongly that his technology, which taps into the power of the human brain, gives people the ability to organize and assign value to information in a way that a machine cannot.

How It Works

What i360 does is complex and sort of hard to understand if you're not working with it directly. In fact, it's easier to understand if you work backwards from the end result of using the technology.

For example, imagine you do a Google Desktop Search or a Google Enterprise Search, and, instead of just links to items that match keywords, you get something a little more like this:

Augmented Search Results

You can see that by using the software, you've managed to associate people, documents, notes, and more with the original file.

These associations are made via a "fire and move on" drag-and-drop methodology. See a useful link? Drag-and-drop it into i360. Highlight some text and drag and drop that as the item's description. Click a button and a screenshot is added automatically. Now associate that link with a person. That person with a Word document. That document with a search and an email...and so forth, and so on.
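
One way to picture the result of all that dragging and dropping is as a graph of associated items. The sketch below is purely illustrative: we don't know how i360 stores its data, and the `AssociationStore` class and its methods are hypothetical names made up for this example.

```python
from collections import defaultdict


class AssociationStore:
    """Hypothetical store of two-way associations between arbitrary items."""

    def __init__(self):
        self._links = defaultdict(set)

    def associate(self, a, b):
        # Associations are symmetric: linking a to b also links b to a.
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, item):
        """Return everything reachable from item by following associations."""
        seen = {item}
        stack = [item]
        while stack:
            current = stack.pop()
            for neighbor in self._links.get(current, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        seen.discard(item)
        return seen


store = AssociationStore()
store.associate("link: product page", "person: Tony")
store.associate("person: Tony", "doc: proposal.doc")
store.associate("doc: proposal.doc", "email: follow-up")
print(sorted(store.related("link: product page")))
```

Starting from the saved link, the lookup walks out to the person, the document, and the email, which is roughly the "augmented search result" idea described above.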

Saving a Web Page

Within a company, the i360 technology can also be used to work with internally running applications, like Microsoft's SharePoint, for example...or any other application for which you have the cooperation of the vendor or access to the app's code base. With 100 lines of code, these applications can pass data back to the i360 environment as just another informational nugget that can be associated with a person, a file, or anything else.

There's more this application can do, too. For example, searches themselves could begin in a more structured format - focusing on just what you're interested in finding (see example below). Each item you're researching can be available with one click from a sidebar - no saving to del.icio.us required.

Focused Searching

The results of your searches can then be transformed into a new file with links (see below), retaining the same structure of your own headings and listed items, and that file can then be emailed to someone else or published as a page available publicly on the web. If you find something new to add to it, be it another link or a file or anything else, you can just drag-and-drop that new item to i360 to update the results on the fly.

Formatted Results Can Be Shared With Others

A project team in the workplace could use the application together, associating people and emails and files and searches with each other, creating a database of content surrounding their project. A year later, an employee in another department could search via their company's enterprise search and find all the information in that project and how it all interrelates, even if all the original team members had moved on to other jobs in other companies. No more would "everything is stored in that one guy's head" be the norm. Employees could move on, but the data they created or found, and the way that data relates to other data, would remain.

Where It Needs Improvement

As a concept - simple drag-and-drop semantics - the technology is fascinating. In practice though, it's still very rough. You couldn't install i360 and be off and running in minutes - you would still need training to know how to use it as it exists in its present form. In today's world of bubbly web apps, anything that isn't immediately intuitive isn't going to be adopted by the majority of users. The whole Enterprise 2.0 trend is about bringing the simplicity of consumer applications into the corporate world, and, although that is this software's goal, unfortunately, I can't say that it achieves it.

The UI itself is confusing. They've made some interesting choices - the address bar is at the bottom, for example; buttons are labeled with things like "E+" - a reference to the name of a portion of the software suite, but one that is meaningless to the new user. The graphics and fonts used look ancient.

The UI

Conclusion

However, that being said, if you can look past the UI to the underlying idea, there's something about this concept - human-powered semantics - semantics over everything - that could be great, if someone could just make it pretty. It could even be the future.

http://www.readwriteweb.com/archives/i360_adds_semantics_to_everything.php http://www.readwriteweb.com/archives/i360_adds_semantics_to_everything.php Products Mon, 05 May 2008 12:55:27 -0800 Sarah Perez
Semantify Your Web Apps with Triplify Alright, "semantify" may not be an actual word, but you can probably guess at its meaning: "add a semantic layer to." In this case, we're looking at a small plugin called Triplify that reveals the semantic structures of web applications by converting their database content into semantic formats.

About Triplify

To grasp what this all means, we'll translate into plain English:

A large part of the content on the web is generated by web applications that are driven by databases on the back-end. For example, look at the top 15 most popular web apps hosted at Sourceforge:

Sourceforge Projects, Image via Triplify.org

However, the structure and semantics in the relational databases behind apps such as those above are not accessible to search engines. What Triplify does is use the structured nature of the databases behind these and other, similar apps to generate semantic data.

How It Works

The Triplify plugin generates database views by performing a small number of queries against the web app's database. These views are then converted into a semantic format - either RDF, JSON, or Linked Data representations. Once in this format, data can then be shared and accessed on the Semantic Web.
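
The general idea of turning database rows into triples can be sketched in a few lines. To be clear, this is not Triplify's code or configuration format (Triplify is a plugin you configure with queries for your own app). The example below is a stand-alone illustration that assumes an in-memory SQLite table, a made-up base URI, and a made-up vocabulary, and it emits every value as a plain string literal in N-Triples syntax.

```python
import sqlite3

BASE = "http://example.org/blog/"      # hypothetical base URI for resources
VOCAB = "http://example.org/vocab#"    # hypothetical vocabulary for column names


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return text.replace("\n", "\\n").replace("\r", "\\r")


def rows_to_ntriples(conn, query, kind):
    """Run query; the first column names the resource, the rest become properties."""
    cursor = conn.execute(query)
    columns = [d[0] for d in cursor.description]
    lines = []
    for row in cursor:
        subject = f"<{BASE}{kind}/{row[0]}>"
        for column, value in zip(columns[1:], row[1:]):
            if value is None:
                continue  # a NULL column simply produces no triple
            lines.append(f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'Hello world', 'alice')")
for line in rows_to_ntriples(conn, "SELECT id, title, author FROM posts", "post"):
    print(line)
```

Each row becomes one resource and each remaining column becomes one statement about it, which is why a handful of queries is enough to expose a whole app's content.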

Triplify Overview, Image via Triplify.org

To install the plugin, you download and extract the folder containing the script into your web app. Then download a Triplify configuration matching your Web application or create a new one. There's an example file to get started with, or you can use one of the files already available, like this one for WordPress or this one for Joomla.

Finally, integrate the plugin into your web application. (More info here).

Benefits

Once the web app has been "triplified," search engines can better evaluate the content, and semantic search engines like Sindice, SWSE, or Swoogle can do the same.

But even better, once Triplify is installed, your web app becomes easily mashable with other web data sources via a tool like Yahoo! Pipes, for example.

The Challenge

Because those behind Triplify feel strongly about expediting the deployment of the Semantic Web, they're posing a challenge to the web developer community: develop the most innovative and promising semantifications and win fabulous prizes!

The first prize is a MacBook Air, second prize is an Asus EeePC, and third prize is an iPod Touch.

To get a better idea of what they will be looking for, check out the Challenge page of the Triplify site.

http://www.readwriteweb.com/archives/semantify_your_web_apps_with_triplify.php http://www.readwriteweb.com/archives/semantify_your_web_apps_with_triplify.php Products Mon, 21 Apr 2008 10:51:34 -0800 Sarah Perez
Using Semantic Search to Cure Disease, Prevent Animal Testing One of the big trends in 2008 has been the emergence of what I call Semantic Apps - a kind of 'Web 2.0 Meets Semantic Web' app typified by startups like Twine, Hakia, Quintura, Powerset and others. Another growing trend is health 2.0, web-based health apps and services. What's interesting is that those two trends are crossing over, with semantic health search engines beginning to make an impact.

Two such apps to cross our desk lately were 1) CureHunter, which claims to be able to find cures for diseases using semantic technologies; and 2) Go3R, an app that claims to provide information transparency "for the prevention of animal testing".

Health is an area where Semantic technologies can be put to great use, due to the overwhelming amount of data in the healthcare industry and the fact that it's largely inaccessible to the general public (despite most of it being our data).

CureHunter - Can it Really Cure Diseases?

CureHunter is an example of the new semantically-charged health search engines popping up. As the name suggests, it is a web service that aims to find cures for diseases. Judge Schonfeld is the CEO and Chief Scientist of CureHunter and he described it to us in an email as a "Medical Data Mining engine system that uses an intelligent semantic processor linked to a network graph theory module to read the scientific literature (entire NLM archive 1949-2008 >) and compute new cures for human diseases completely autonomously." That's a mouthful, but I've highlighted the key points: it uses semantic processing, network graphs and most interestingly claims to "compute new cures" automatically.

The following graphic (excerpted) illustrates CureHunter's approach. Essentially it tries to analyse health research data and compute cures:



CureHunter is pretty complex, but I did some tests for diabetes type 1 to see if I could find a "cure". The results were overwhelming, in an 'info overload' kind of way:

It outlined some interesting "cures", but much of the information was not something patients would understand. It seems like a great resource for doctors and physicians though. So to answer the question in the subheader, can CureHunter really cure diseases? Probably only if you're a doctor or physician who knows how to interpret the wealth of data that CureHunter serves up.

Go3R - Prevents or Amplifies Animal Testing?

The idea of having a health database that includes animal testing results isn't something most people would find very appealing. However Go3R, developed in four months by a company from Germany called Transinsight, claims to be a "knowledge-based search engine for alternative methods to animal experiments." (emphasis ours) The site aims to enable scientists to "take advantage of the benefits of semantic searches for the area of alternative methods in accordance with the 3Rs principle [Replacement, Reduction and Refinement]." Transinsight is already known in the web 2.0 world for GoPubMed, a health search engine that AltSearchEngines has covered before.

You could view Go3R in two ways. The first is the version Transinsight pushes in its press release: that this app makes it easier to find alternatives to animal testing. However the second point of view is that this is a big database that includes animal experiment results, and so it might be seen to amplify the practice of animal testing. For example I searched for "diabetes" and the number 2 result was a test on rats:

Whether you see this as further exploitation of animal testing, or (as Transinsight says) an app that will "lead to a significant reduction of animal experiments", it is an interesting use of semantic technologies(!).

Conclusion

Health search engines are nothing new - indeed both Google and Microsoft have made important announcements in this domain over the past year. In October 2007 Microsoft unveiled HealthVault, a consumer health and search site. In February this year Google announced a pilot program of their health records application called Google Health. A week later, Microsoft acquired Medstory - a vertical search engine for health information. There is also a lot of interest among startups - see our report from the Health 2.0 Conference in March and another report from a healthcare panel at SXSW later that month. Also our network blog AltSearchEngines continuously covers health search engines.

But I'm liking this latest trend for semantically-powered health search engines. If ever there was a compelling need for Semantic Apps to help users make sense of and organize data, it's in health. CureHunter and Go3R are two apps to look out for.

http://www.readwriteweb.com/archives/semantic_search_engines_health.php http://www.readwriteweb.com/archives/semantic_search_engines_health.php Products Wed, 09 Apr 2008 15:16:23 -0800 Richard MacManus
Swotti - A Semantic Opinions Aggregator Swotti is a new semantic search engine that aggregates opinions about products to help you make purchasing decisions. With Swotti, you can learn from the good and bad experiences of others as the site gathers together reviews and feedback from across the web and categorizes them to provide you with more information about the product you're interested in. What's unique about this search engine is that it uses semantics to do so.

There isn't a lot of info about Swotti on their main site - no FAQ, no blog, no how-to section; it's just a search box on a white page. But as you begin typing, search suggestions appear underneath the search box, making it easier to find what you're looking for. Click on search and you'll be taken to a product reviews page, where you'll be amazed at the amount of data displayed.

Swotti aggregates opinions about products from product review sites, forums and discussion boards, web sites and blogs. It then categorizes those reviews by the feature or aspect of the product being reviewed, tags them accordingly, and rates each review as positive or negative.

Take the iPhone for example - each review is tagged with keywords like Design, Usability, Display, Reliability, Noise, Battery, Service, Camera, Keypad, Size, etc. Based on the number of positive reviews for a tag, a rating for that feature is given. Bar charts show green bars for good, yellow for average, or red for bad reviews. And they seem to be pretty accurate, at least for the iPhone - "design" is 5 green bars, "speed" is 3 red bars.
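
Here is a small sketch of how tagged, rated opinions could be rolled up into per-feature ratings. Swotti hasn't said how it actually computes its bars, so the `summarize` function, the 60%/40% color thresholds, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize(opinions):
    """Turn (tag, is_positive) pairs into {tag: (positive_share, color)}."""
    counts = defaultdict(lambda: [0, 0])  # tag -> [positive count, total count]
    for tag, is_positive in opinions:
        counts[tag][1] += 1
        if is_positive:
            counts[tag][0] += 1

    summary = {}
    for tag, (positive, total) in counts.items():
        share = positive / total
        # Invented thresholds: mostly positive is green, mostly negative is red.
        if share >= 0.6:
            color = "green"
        elif share >= 0.4:
            color = "yellow"
        else:
            color = "red"
        summary[tag] = (share, color)
    return summary


# Made-up sample data, not real Swotti results.
opinions = (
    [("design", True)] * 4 + [("design", False)]
    + [("speed", True)] + [("speed", False)] * 3
)
for tag, (share, color) in summarize(opinions).items():
    print(f"{tag}: {share:.0%} positive -> {color}")
```

The hard part, of course, isn't this arithmetic but getting the tags and the positive/negative calls right in the first place, which is where the semantic analysis comes in.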

There is even a pie chart that summarizes the views. In the iPhone example, 15% said "I Love," 11% said "Too Expensive," 11% said "Worst." (Note to those who hated your iPhones: please send them this way.)

Product images display on the left and the reviews themselves, linked to the original source, display on the right. The reviews can also be sorted to display the best reviews, the worst, or the most relevant. Beneath the sorting options, the number of reviews is displayed.

iPhone Results in Swotti

What's interesting is that this data seems to have been collected, tagged, and rated using only Swotti's technology. This isn't Mahalo - no user-intervention here - it's all automated.

One problem with the site seems to be the English spelling and wording: "Adjective," for example, was spelled "Adjetive." Since the site is also offered in Spanish, it's likely that the Spanish version was created first and this is an English translation. However, this is only a minor drawback.

Whether it gets it right all the time - that's the real issue. The problem lies in similarly named products, which is obviously still being sorted out. For example, a search for the Lenovo x300 also returned results for the Dell Latitude x300. I couldn't filter out the Dell results by using -dell in my query a la Google, as that returned a "No enough opinions" result (Yep, that's the English again).

Clicking on "Are you unsatisfied with your results? Help us" gave me a Spanish entry form which returned a bunch of code when I submitted my comments...although at the bottom it did say "Gracias por haber dado tu opinion," so maybe it went through anyway.

Although these issues would have to be worked out for the site to become mainstream, they don't detract from Swotti's potential - Swotti is reading, categorizing, and rating data from the web on its own. It's a great concept which hopefully will get better with time. Definitely worth watching.

http://www.readwriteweb.com/archives/swotti_a_semantic_opinions_aggregator.php http://www.readwriteweb.com/archives/swotti_a_semantic_opinions_aggregator.php Products Fri, 21 Mar 2008 10:08:31 -0800 Sarah Perez
Hakia Licenses its Semantic Search Technology Semantic search engine hakia is announcing today, at the Search Engine Strategies conference in New York, that it is licensing its proprietary OntoSem technology to other companies. This will enable third parties to build semantic search applications. The first such customer to be made public is RiverGlass, Inc, a provider of real-time analytics. RiverGlass will integrate hakia's OntoSem technology into its analysis software.

This is an interesting development by hakia - and has some parallels to the young Google, which you'll recall started out by licensing its search technology to the likes of Yahoo. But the parallels end there, because this move by hakia is more about licensing their underlying search technology to power the proprietary applications of other companies - whereas Google was a branded search app integrated into Yahoo's front-end.

According to hakia, this is what their OntoSem technology does:

  • information retrieval, analysis, and distribution
  • text summarization
  • information assurance and security
  • machine translation
  • ontology support
  • terminology standardization
  • supply chain automation

Essentially, it will enable third parties to find and use "the meaning of language" in their applications. Hakia's definition of 'semantic search' by the way differs from the traditional Semantic Web definition, in that hakia search aims to automatically determine meaning from search queries using its algorithms - whereas Semantic Web is all about adding metadata to information to enable connections between data.

At this early stage there aren't any visuals from RiverGlass showing how they're using hakia technology, but the company told us that "we will see the biggest boon in increased relevancy of results".

Disclosure: hakia is a RWW sponsor

http://www.readwriteweb.com/archives/hakia_licenses_semantic_search.php http://www.readwriteweb.com/archives/hakia_licenses_semantic_search.php Alt Search Engines Tue, 18 Mar 2008 08:00:00 -0800 Richard MacManus