semantics - ReadWriteWeb http://www.readwriteweb.com/feeds/search/semantics en Copyright 2009 Richard MacManus readwriteweb@gmail.com Mon, 23 Nov 2009 21:12:49 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss

Tim Berners-Lee Says the Time for the Semantic Web is Now

In an hour long interview posted today about the Semantic Web, W3C Director Tim Berners-Lee says all the pieces are in place to move full steam ahead and realize the potential of a world of structured, machine readable data. Available as a part of the Talking with Talis semantic web podcast series, the interview (listen here) is summarized on interviewer Paul Miller's new ZDNet blog dedicated to the semantic web. A full transcript is available here.

It's an important conversation and a good introduction to what the semantic web is. Also notable is the way that Berners-Lee sees Semantics and Data Portability as very related. Some highlights are excerpted below.

My standard explanation of the value of the Semantic Web is this:
Once our software is capable of deriving meaning from web pages it looks at for us, then there's a whole lot of work that will already be done, allowing our human, creative minds to reach new heights.

In the interview with Miller, however, Berners-Lee emphasized that it's not just about web pages. He told Miller that the core pieces are in place today for developers to build robust Semantic Web applications:

“I think… we’ve got all the pieces to be able to go ahead and do pretty much everything… [Y]ou should be able to implement a huge amount of the dream, we should be able to get huge benefits from interoperability using what we’ve got. So, people are realizing it’s time to just go do it.”

On the topic of challenges still faced, Berners-Lee said:

“There’s an awful lot of data out there. And I think, one of the huge misunderstandings about the Semantic Web is, ‘oh, the Semantic Web is going to involve us all going to our HTML pages and marking them up to put semantics in them.’ Now, there’s an important thread there, but to my mind, it’s actually a very minor part of it. Because I’m not going to hold my breath while other people put semantics in by hand… So, where is the data going to come from? It’s already there. It’s in databases…”

Other topics of the interview include whether leading social networks are likely to implement semantic web technologies, how semweb engagement benefits companies and what users can do to move the technology forward.

We've covered the Semantic Web extensively here at RWW. See below for a list of posts on the topic.

http://www.readwriteweb.com/archives/tbl_calls_for_semweb.php Semantic Web Wed, 27 Feb 2008 10:50:33 -0800 Marshall Kirkpatrick
2007 Semantic Technology Conference - Showcased Big Internet Potential

The Semantic Conference was held last week in San Jose. I went along to check it out. In the keynotes, Oracle's Robert Shimp noted that attendance had never been so high and remarked that this was clear proof that the semantics industry is growing at a tremendous pace. Half of the attendees were from high-tech startups that we've never (or barely) heard of. But we will probably hear of them in a few years! The other half were from huge organizations like NASA, Department of Defense, US Air Force, Stanford University, Lockheed Martin, Boeing, Ford Motors, Microsoft, IBM, Oracle, Sun Microsystems, Google and Walmart. In other words, the audience was very diverse.

The first day of the event was full of tutorials and therefore the academic level was very high. The most widely discussed topic was OWL (Web Ontology Language), which was accepted as a web standard by W3C about 3 years ago.

The second day started with the keynotes. Oracle, the title sponsor of the event, underlined their strong commitment to semantics and their unique support of RDF in Oracle 10g database systems. NASA Headquarters CTO Andrew Schain started his speech by noting their internal data problems, such as a high volume of unorganized data spread across distributed silos. He finished with their semantics-based solution to this - the beta-stage RDF data browser Jspace, which is expected to be released as a stable open source version very soon.

Yahoo's Dave Beckett talked about the early stage semantic web practices at Yahoo - products such as Movies, Finance, Food and Underground.

CNET and TextDigger

Most of the sessions were showcases of semantic technologies on various topics - health sciences, web search, criminal justice, legal publishing, and more. TextDigger, a semantic web search company, talked about their CNET case study. In order to further increase their pageviews and ad revenues, CNET collaborated with their spin-off company TextDigger to overhaul their search recommendation system; they combined their archaic collaborative filtering based system (which depends on the wisdom of crowds and statistical methods) with a new semantic equivalence based one and compared how their users responded. The results were very promising for TextDigger and the whole semantics space - across 700,000 unique queries per day, the percentage of clickthroughs increased from 7.9% to 19.1%, and pageviews reached a level previously seen only at Christmas time. As a result, CNET decided to stick with this semantics-based solution for their search recommendations.
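To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes (our assumption, not something TextDigger stated) that both percentages apply to the same 700,000 daily queries and that each clickthrough is counted once per query.

```python
# Figures quoted in the CNET case study above.
queries_per_day = 700_000
ctr_before = 0.079   # 7.9% clickthrough with the old system
ctr_after = 0.191    # 19.1% clickthrough with the semantic system

# Assumption: one clickthrough per query, same query volume before and after.
clicks_before = round(queries_per_day * ctr_before)
clicks_after = round(queries_per_day * ctr_after)

# Absolute change in percentage points and relative change versus the old rate.
absolute_gain_points = (ctr_after - ctr_before) * 100
relative_lift = (ctr_after - ctr_before) / ctr_before

print(f"Clickthroughs per day: {clicks_before} -> {clicks_after}")
print(f"Absolute gain: {absolute_gain_points:.1f} percentage points")
print(f"Relative lift: {relative_lift:.0%}")
```

Under those assumptions the clickthrough rate rose by 11.2 percentage points, which is well over double the old rate.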

AskMeNow and Convera

Another semantic search upstart (but a public one, with a $20M market capitalization already) to show their wares was AskMeNow. They showcased their semantic search capabilities integrated with mobile solutions, claiming that they have over a million active users asking questions on the move.

Convera was another semantic search company at the event. It focuses solely on custom tailored, domain specific enterprise search solutions.

Other Highlights

Exhibitors at the conference included a wide variety of companies. Visual Knowledge is perhaps the most interesting one, because they showcased the very first semantics-powered video game concept, Treasure Hunt. Saltlux, which comes from Korea, showcased how semantics and mobile 3G technologies can work together to bring location-aware services to your mobile; hopefully we will see similar innovations in the USA too in the near future. Syntactica's Intelligent Lexicon product allows you to abstract (i.e. summarize in a few paragraphs) a long dissertation of hundreds of pages. Government-funded Cycorp showcased their terrorism data mining product, which can successfully answer extremely complex domain-specific questions, like "were there any attacks on targets of symbolic value to Muslims in 1987 that coincided with a Christian holy day".

All in all, the Semantic Conference was a wonderful opportunity to mix with many people trying to solve the problem of information organization. The Internet is a great utility, but it may one day become much more usable - with breakthroughs in the semantics field that are coming soon to the mainstream. There's a lot of potential in the companies and technologies mentioned in this post!

http://www.readwriteweb.com/archives/2007_semantic_technology_conference.php Events Mon, 28 May 2007 01:31:50 -0800 Emre Sokullu
i360 Adds Semantics to Everything

Tony Sukiennik believes the power of the people trumps the power of the algorithm when it comes to the development of semantic technology. His company, infoGenome, a startup that has been in stealth mode for about four and a half years, wants to harness that power by making semantics easy via its innovative drag-and-drop functionality. The i360 software he's developed is essentially the "Mahalo of semantic apps," relying on human knowledge to add meaningful layers of metadata to the information we work with every day. With i360, you can add semantics to everything.

People-Powered Semantics

When you're doing a web search, you instantly know which information is relevant and which isn't. At i360, they call this flash of understanding an "instant of information insight." In a split second you can identify something as being useful, but the problem in today's world is that there are too many ways to store that information - you can tag it, bookmark it, save it to file, email it, blog about it, share it with others, and so on. Overwhelmed by choices, busy people often choose to "just remember it," a decision that leads to the inevitable: forgetting. The human mind is already overloaded with input, so it isn't the ideal repository for storing all the complexities of our information-filled lives.

Instead, software should be doing the remembering for us. That's where i360 comes in. The application itself is really just a prototype of this conceptual idea, but one that Tony hopes Google might be interested in. Or maybe Microsoft. (He plans on proposing his ideas to both companies to see who bites.)

What the i360 software does is provide a way to quickly mark up and add meaning to the data you're working with - be it a link on the web, an email, a file, or anything else. This is done via a quick drag-and-drop into the app.

That isn't to say that this technology is using semantics in the technical sense of the word - it's not about converting everything into machine-readable formats for use on the semantic web; what it is doing, though, is adding semantics to everything by assigning meaning to that email, that PDF, that link, that note, that spreadsheet, etc. Meaning that only you, and not a computer or an algorithm, could know. In doing so, the technology is not focused on a semantic web per se, but a semantic database of your own, made up of not only web links, but also files, contacts, emails, keywords, and more, and knowing how they all are associated with each other.

Although Tony believes that we shouldn't give up on the algorithm - by all means, research should continue in that area - he feels strongly that his technology, which taps into the power of the human brain, gives people the ability to organize and assign value to information in a way that a machine cannot.

How It Works

What i360 does is complex and sort of hard to understand if you're not working with it directly. In fact, it's easier to understand if you work backwards from the end result of using the technology.

For example, imagine you do a Google Desktop Search or a Google Enterprise Search, and, instead of just links to items that match keywords, you get something a little more like this:

Augmented Search Results

You can see that by using the software, you've managed to associate people, documents, notes, and more with the original file.

These associations are made via a "fire and move on" drag-and-drop methodology. See a useful link? Drag-and-drop it into i360. Highlight some text and drag and drop that as the item's description. Click a button and a screenshot is added automatically. Now associate that link with a person. That person with a Word document. That document with a search and an email...and so forth, and so on.

Saving a Web Page

Within a company, the i360 technology can also be used to work with internally running applications, like Microsoft's SharePoint, for example...or any other application to which you have the cooperation of the vendor or access to the app's code base. With 100 lines of code, information from these applications can pass data from the app itself back to the i360 environment as just another informational nugget that can be associated with a person, a file, or anything else.

There's more this application can do, too. For example, searches themselves could begin in a more structured format - focusing on just what you're interested in finding (see example below). Each item you're researching can be available with one click from a sidebar - no saving to del.icio.us required.

Focused Searching

The results of your searches can then be transformed into a new file with links (see below), retaining the same structure of your own headings and listed items, and that file can then be emailed to someone else or published as a page available publicly on the web. If you find something new to add to it, be it another link or a file or anything else, you can just drag-and-drop that new item to i360 to update the results on the fly.

Formatted Results Can Be Shared With Others

A project team in the workplace could use the application together, associating people and emails and files and searches with each other, creating a database of content surrounding their project. A year later, an employee in another department could search via their company's enterprise search and find all the information in that project and how it all interrelates, even if all the original team members had moved on to other jobs in other companies. No more would "everything is stored in that one guy's head" be the norm. Employees could move on, but the data they created or found, and the way that data relates to other data, would remain.

Where It Needs Improvement

As a concept - simple drag-and-drop semantics - the technology is fascinating. In practice though, it's still very rough. You couldn't install i360 and be off and running in minutes - you would still need training to know how to use it as it exists in its present form. In today's world of bubbly web apps, anything that isn't immediately intuitive isn't going to be adopted by the majority of users. The whole Enterprise 2.0 trend is about bringing the simplicity of consumer applications into the corporate world, and, although that is this software's goal, unfortunately, I can't say that it achieves it.

The UI itself is confusing. They've made some interesting choices - the address bar is at the bottom, for example; buttons are labeled with things like "E+" - a reference to the name of a portion of the software suite, but one that is meaningless to the new user. The graphics and fonts used look ancient.

The UI

Conclusion

However, that being said, if you can look past the UI to the underlying idea, there's something about this concept - human-powered semantics - semantics over everything - that could be great, if someone could just make it pretty. It could even be the future.

http://www.readwriteweb.com/archives/i360_adds_semantics_to_everything.php Products Mon, 05 May 2008 12:55:27 -0800 Sarah Perez
Spock - Vertical Search Done Right

There has been quite a lot of buzz lately around a vertical search engine for people, called Spock. While still in private beta, the engine has already impressed users with its rich feature set and social aspects. Yet, there is something that has gone almost unnoticed - Spock is one of the best vertical semantic search engines built so far. There are four things that make their approach special:

  • The person-centric perspective of a query
  • Rich set of attributes that characterize people (geography, birthday, occupation, etc.)
  • Usage of tags as links or relationships between people
  • Self-correcting mechanism via user feedback loop

Spock's focus on people

The only kind of search result that you get from Spock is a list of people, and it interprets any query as if it were about people. So whether you search for democrats or ruby on rails or new york, the results will be lists of people associated with the query. In that sense, the algorithm is probably a flavor of the PageRank or frequency analysis algorithms used by Google - but tailored to people.

Rich semantics, tags and relationships

As a vertical engine, Spock knows important attributes that people have. Even in the beta stage, the set is quite rich: name, gender, age, occupation and location just to name a few. Perhaps the most interesting aspect of Spock is its usage of tags. Firstly, all frequent phrases that Spock extracts via its crawler become tags. In addition, users can also add tags. So Spock leverages a combination of automated tags and people power for tagging.

A special kind of tag in Spock is called 'relationships' - and it's the secret sauce that glues people together. For example, Chelsea is related to Clinton because she is his daughter, but Bush is related to Clinton because he is the successor to the title of President. The key thing here is that relationships are explicit in Spock. These relationships taken together weave a complex web of connections between people that is completely realistic. Spock gives us a glimpse of how semantics emerge out of the simple mechanism of tagging.
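The idea of explicit relationships can be illustrated with a few lines of Python. This is only a sketch of the concept, not Spock's actual data model: the `relate` and `related_to` helpers and the in-memory dictionary are our own hypothetical names, and the two relationships are the ones from the example above.

```python
from collections import defaultdict

# Hypothetical store: each person maps to a list of (label, other person) tags.
relationships = defaultdict(list)

def relate(person, label, other):
    """Record an explicit, labeled relationship from person to other."""
    relationships[person].append((label, other))

def related_to(target):
    """Return sorted (person, label) pairs for everyone explicitly linked to target."""
    return sorted(
        (person, label)
        for person, links in relationships.items()
        for label, other in links
        if other == target
    )

# The two relationships from the example in the text.
relate("Chelsea", "daughter of", "Clinton")
relate("Bush", "successor as President to", "Clinton")

for person, label in related_to("Clinton"):
    print(f"{person} is {label} Clinton")
```

Because the label travels with the link, a query can tell a family tie from a professional one, which a plain "these two names appear together" association cannot do.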

Feedback loops

The voting aspect of Spock also harnesses the power of automation and people. It is a simple, yet very interesting way to get feedback into the system. Spock is experimenting with letting people vote on the existing "facts" (tags/relationships) and it re-arranges information to reflect the votes. To be fair, the system is not yet tuned to do this correctly all the time - it's hard to know right from wrong. However, it is clear that a flavor of this approach in the near future will 'teach' computers what the right answer is.
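A feedback loop of this kind might look roughly like the sketch below. We don't know how Spock actually weighs votes, so this assumes the simplest possible scheme (one point up or down per vote, facts re-sorted by net score), and the facts themselves are made up for illustration.

```python
from collections import Counter

# Hypothetical net vote tallies for the "facts" (tags/relationships) on one profile.
votes = Counter()

def vote(fact, up=True):
    """Add one vote for (up=True) or against (up=False) a fact."""
    votes[fact] += 1 if up else -1

def ranked_facts():
    """Return facts ordered by net votes, highest first; ties break alphabetically."""
    ordered = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [fact for fact, _ in ordered]

# Made-up votes from users on made-up facts.
vote("politician")
vote("politician")
vote("author")
vote("astronaut", up=False)

print(ranked_facts())
```

Even this naive version shows the difficulty mentioned above: a handful of mistaken or malicious votes is enough to reorder the "facts", so a real system would need some way to weigh voters.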

Limitations of Spock's approach

The techniques that we've discussed are very impressive, but they have limitations. The main problem is that Spock is likely to have much more complete information about celebrities and well known people than about ordinary people. The reason for it is the amount of data. More people are going to be tagging and voting on the president of the United States than on ordinary people. Unless of course, Spock breaks out and becomes so viral that a lot of local communities form - much like on Facebook. While it's possible, at this point it does not seem too likely. But even if Spock just becomes a search engine that works best for famous people, it is still very useful and powerful.

Conclusion

Spock is fascinating because of its focus and leverage of semantics. Using tags as relationships and the feedback loop strike me as having great potential to grow a learning system organically, in the manner that learning systems evolve in nature. Most importantly, it is pragmatic and instantly useful.

http://www.readwriteweb.com/archives/spock_vertical_search_done_right.php Startups Tue, 26 Jun 2007 06:10:00 -0800 Alex Iskold
Everything You Wanted to Know About Semantic Technology, But Were Afraid to Ask (at SemTech 09)

Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products. This one is by Hakia, one of the participants in the recent 2009 Semantic Technology Conference.

Participants in the 2009 Semantic Technology Conference walked away considering fundamental questions about what is and isn't semantic technology. The relevance of this post's title will hopefully become clear by the end to those of you mischievous readers who may have stumbled upon it with other ideas. The conference was a great and well-organized affair in San Jose, California. One of the highlights was the Semantic Search Keynote panel, with all of the major players on stage (Ask, Bing, Google, Hakia, TrueKnowledge, and Yahoo!), as seen in the picture below.


Bear in mind that semantic technology can be as heavy and stifling for any audience as stem-cell research can be to high-school students. But Carla Thompson of Guidewire did a terrific job of coming up with discussion topics and moderating the panel. Everyone survived the ordeal without any sign of dozing.

Despite the positive outcome, some responses from the panelists made me wonder if we should go back to the basic question of, "What is semantic search?" Or, better yet, what isn't semantic search? Here is my list:

Structured Data

Folks, semantic technology is not structured data. A database that can, given the query "social drinking," pull up a list of beer brands, their manufacturers, and their contact information has nothing to do with semantics. Some people seem to have the impression that a search engine somehow uses semantic technology if it retrieves structured data for its results. It is a trick as old as the ancient Egyptians who used beats to organize harvesting information. Organized information is not semantic information.

Morphology

If a search engine is robust and returns the same results for the query "top ten" as it does for "top 10" (i.e. it recognizes that "ten" means "10"), calling the search engine semantic would be a stretch. Anyone could come up with a substitution list like this without a drop of linguistic knowledge. Similarly, distinguishing the name "Fisher" from the noun "fisher" by detecting the capitalization of the first letter does not go beyond the application of simple linguistic rules. These capabilities are not semantic search capabilities.
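To see how little is involved, here is a minimal Python sketch of both tricks. The word list and function names are hypothetical, written only to illustrate the point; no real search engine's code is being shown.

```python
# Hypothetical substitution list: number words mapped to digits.
# A hand-written table like this needs no linguistic knowledge at all.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}

def normalize_query(query):
    """Replace number words with digits, token by token."""
    tokens = query.split()
    return " ".join(NUMBER_WORDS.get(token.lower(), token) for token in tokens)

def looks_like_name(token):
    """Crude capitalization rule: 'Fisher' counts as a name, 'fisher' does not."""
    return token[:1].isupper()

print(normalize_query("top ten") == normalize_query("top 10"))
print(looks_like_name("Fisher"), looks_like_name("fisher"))
```

Both functions are table lookups and character checks. Neither knows what "ten" or "fisher" means, which is exactly the argument being made.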

Syntax

A certain amount of semantic information can be salvaged from syntax. Unfortunately, if syntax were enough for us to detect the meaning of text, then an 8-year-old with perfect reading ability (i.e. who is able to syntactically parse strings of English-language letters) could be expected to understand the meaning of Shakespeare's works. The difference between reading and understanding is the difference between syntax and semantics. The former requires the skill to parse things out, while the latter requires a vast amount of associative knowledge.

Statistics

An infinite number of monkeys typing on an infinite number of keyboards would eventually come up with the complete text of the Declaration of Independence. This is a scientific statement; it is not a joke. However, if a search engine is expected to be semantically relevant using statistical algorithms, one would have to wait until the monkeys finished their job. Statistics have no place in semantic technology. A simple test would reveal that. For example, your brain is able to understand a unique sequence of words that you have never seen before, such as "Polar bears don't eat alligator eggs before dawn." If semantics were built on statistics, computers and algorithms would not understand this and billions of other sentences.

Scalability

Scalability is the narrow bridge between science and technology. What you can carry from science to technology over this bridge determines the level of capabilities in the real world. The science of semantics is huge and stems from the roots of philosophy. But Web search is a very particular problem with stringent constraints (a narrow bridge). Designing semantic algorithms to drive a Web search engine is like walking on egg shells and requires a completely new approach. Thus, a semantic search algorithm could be very sophisticated but still not suitable for the Web.

These five areas cover what isn't semantic search and should help readers understand the questions that emerged from the Semantic Technology Conference. Structured data, morphology, syntax, statistics, and scalability are key areas to discuss moving forward. Of course, contrary to the title of this post, no one was actually afraid of asking these questions. But if you caught the reference in the title, that was your semantic brain in action, one last example of what semantic technology is.

http://www.readwriteweb.com/archives/everything_to_know_about_semantic_technology_at_semtech_09.php Sponsors Fri, 26 Jun 2009 05:00:18 -0800 RWW Sponsor
Powerset and hakia - Quest For The Semantic Web

This week I spoke with Barney Pell, CEO of Powerset, and Melek Pulatkonak, COO of hakia. In both (separate) conversations we discussed how the Semantic Web is getting very close. The Semantic Web as defined by Tim Berners-Lee is: "a universal platform for the exchange of data, information and knowledge." I think Barney and Melek would agree that the only thing preventing the Semantic Web so far has been an inefficient use of horsepower - or a lack of it.

Speed, Power and Getting There

Semantics is expressed meaning in language, code or "other" representations of information. My discussions with Barney and Melek revealed the fundamental differences in architecture and philosophy between hakia and Powerset. The index systems of the two companies are fundamentally different, as is their philosophy - but their goals and visions are remarkably similar. They are also different in the way they apply what I term horsepower to natural language search. As with Shelby vs. Ferrari, it is possible for different approaches to achieve a desired result - given enough horsepower.

Hakia has built their search in-house, refining and sculpting the QDex indexing system (like an Enzo Ferrari). Their view is that processing power should be maximized with super efficiency, via fuzzy logic and advanced semantics. Powerset, on the other hand, utilizes basically the same inverted indexing system as Google - but backed by natural language and immensely powerful processing that essentially “overpowers” the long tail query (like the GT 500). This is a vast oversimplification, but the elements involved reveal the larger story.
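For readers who haven't met the term, an inverted index simply maps each word to the documents that contain it. The Python sketch below shows the basic idea only; it is a toy with hypothetical documents and function names, and makes no claim about how Google, Powerset or hakia actually build or query their indexes.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical mini-corpus for illustration.
DOCS = {
    1: "semantic search for the web",
    2: "natural language processing",
    3: "web search with semantic indexing",
}

index = build_inverted_index(DOCS)
print(sorted(search(index, "semantic search")))
```

A keyword index like this matches words, not meaning: a query for "natural web" finds nothing here, even though documents 2 and 3 are plainly related to it. Closing that gap is what the natural language layer on top is for.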

Technology (horsepower), communication (language) and people make up the semantic Web. What the Web has lacked is not "language" but the adequate application of processing power. As Barney said: "Even five years ago we did not have the processing capability to even attempt this, but five years from now these answers will seem elementary." Google's system, shown below, currently consumes massive horsepower with comparatively limited results - at least according to hakia and Powerset!

Diagram of Google's inverted index and search (courtesy -changturtle)

Unbending Humans

Barney described the relationship between people and computers as people being "bent" around or adapted to technology in order to utilize it. With the advent of services like Facebook, programs and applications are beginning to “understand” each other. Everyone reading this has been “forced” by technology to conform to varied “bending events”, in order to use it. Barney explained this idea by calling Facebook and the iPhone true innovations approaching total “community engagement.” Barney also said that “Facebook will become one of the primary communications platforms of the future.” Given this new perspective, I could not agree more because Facebook is one heck of a representation of information for a social network. Essentially, hakia, Powerset, Facebook and others are bending the machines to engage humans. And in a way, Facebook is the semantic Web in a microcosm - but in its infancy.

Semantics and Search?

Search is a critical part of our daily lives, but the interface has changed very little over the years. We define search as the act of typing in a query on Google and getting results. This is a type of search, but how many other kinds of “searches” do we perform? In an earlier article, Josh Catone wrote about Yahoo!’s contention that search will not determine the future of the Web. Josh rightly asked if Facebook and MySpace might be better positioned if “personalization” was to be the future of the Web.

Conclusion

I should make it clear that neither Barney nor Melek really consider themselves as "Google Killers". Powerset and hakia are not in a race either against each other or to overtake Google, but they are on a quest for better Web communication and engagement. Both efforts emphasize the necessity for “the system” to be able to universally understand and handle data without ambiguity. Viewing Facebook and others as functional repositories of semantic data is essential in seeing the long view. Whether we are talking about object oriented data, textual semantics or complex algorithms, the semantic Web is about making people “bend” less for technology.

http://www.readwriteweb.com/archives/powerset_and_hakia_quest_for_semantic_web.php Analysis Fri, 20 Jul 2007 00:15:56 -0800 Phil Butler
The State of the Market in Semantic Technologies

Tom Tague from Thomson Reuters' OpenCalais team gave a keynote speech today at SemTech in San Jose. His presentation was a wonderful wrap-up of current semantic technology trends, and what we can expect over the next few years.

To open, he said that where we are now in the evolution of the Web is content rich, but information poor - plus "experientially deficient". He suggested that 'web 3.0' is about cleaning up the mess of web 2.0 and improving interfaces. In terms of semantic technology, he explained that over the past 5 years it has evolved from invention of standards to a period of commercial innovation on top of those inventions. While standards are still being worked on, now "we are at an inflection point where innovation is exploding."

Tague called Calais, the project he leads at Thomson Reuters, "a web service a.k.a. plumbing". They've had 13 releases, talked with 100+ customers about Calais, and have 13,000 registered developers. He put the ideas that he's been talking about with customers and developers into 6 buckets, which we've listed with sub-categories below.

Tools

  • Semantic data management
  • Semantic data generation
  • Databases
  • Integration and workflow

Tague said that tools are important, particularly in the enterprise. He sounded a note of caution to tools vendors: they need to simplify their stories and offer "simple basic tools."

Social

  • Semantics-powered link sharing
  • Network mining
  • News sharing
  • Tweet mining

Tague said that we shouldn't focus on providing "frosting" on top of current social Web tools. He advised focusing on commercial imperatives, such as the categories above.

Advertising

  • Semantic ad placement
  • Contextual ad placement
  • Semantically driven landing pages
  • Mashup ads

There are clearly opportunities to improve advertising using semantic technology, said Tague.

Search

Tague noted that semantic search may be "the answer to the question nobody is asking." He said that we should look at general "semantic search" vs domain specific semantically-enhanced search. The latter is where the commercial opportunity actually is, but he questioned the economics of general semantic search.

Publishing

He put this into 3 sub-categories:

  • A-Content Producers - from back office to user experience
  • B-Editorial + Aggregation Publishing Models
  • C-Robotic publishing - aggregation only

Tague explained that Calais has really focused on this over the last 8-9 months. He said that classic publishers can get an enormous amount of value from this. Right now the big focus is "back in the boiler room," for example to cut editors from 3 to 2. He expects that later on more focus will go on enhancing the user experience.

Tague thinks that B is the biggest opportunity, using Huffington Post as an example. He said that it gives a "near newspaper like experience" at perhaps a 5th of the cost. It's an area where they're seeing adoption of Calais.

Interface

Tague noted that gaming is a huge industry that the semantic technology industry can learn from. He listed these attributes:

  • Great story line
  • High interactivity, immediate responsiveness
  • No interruptions
  • Graphically engaging
  • Seamless
  • Fun

So he asked who out there is trying to really change the user experience in semantic technology? He listed 4 companies (all of whom we've profiled on ReadWriteWeb):

  • Zemanta
  • Apture
  • Feedly
  • Glue

Tague told the audience that the next big innovation in interface will be something that stays with the user where they are, which will be mobile and in the browser.

To sum up, Tague suggested that semantic technologies vendors should decide whether they care about semantics or about user value. If it's semantics, then be a tools vendor. He said the basic building blocks are out there already, so focus on user experience.

Disclosure: SemTech has been a recent sponsor of ReadWriteWeb

http://www.readwriteweb.com/archives/the_state_of_the_market_in_semantic_technologies.php http://www.readwriteweb.com/archives/the_state_of_the_market_in_semantic_technologies.php Conferences Tue, 16 Jun 2009 09:23:17 -0800 Richard MacManus
blueorganizer: Interview with adaptiveblue founder and CTO Alex Iskold Alex Iskold was at DEMOfall last week, but not only to live-blog the event and do interviews for Read/WriteWeb :-) He was also promoting his own product blueorganizer, so I thought it's only fair to turn the tables and interview him about DEMO - particularly as blueorganizer was regarded as one of DEMO's highlights by both Techcrunch and ZDNet.

Also adaptiveblue has just released a brand new version of the blueorganizer. New features include the "autobluemark" (which automatically collects objects from the sites that users visit often), blogs collection with popularity ranking built in, smart filtering (which brings iTunes-like flexible selectors to the blueorganizer), a google desktop widget and much more.

Richard: What is your company about?

Alex: adaptiveblue was founded with the vision to build the next generation of smart browsing and personalization technologies. Our first product, the blueorganizer extension, is focused on bringing the semantics of everyday objects into the browser to make users more productive.

Richard: Why did you start this company?

Alex: I have been thinking about personalization and semantics for quite some time. I saw that there was a gap between theoretical thinking about the semantic web and practical steps to get to it and wanted to help bridge it. Ironically my previous startup, Information Laboratory (which was sold to IBM), was focused on the structure of complex systems like software, power grids and society. So I think that understanding of the structure can take you very far, but to build truly personalized online experiences you need to understand the semantics of things.

Richard: Tell us what adaptiveblue has achieved so far?

Alex: We have developed and launched our product in record time - just short of 5 months. We also created innovative and important pieces of infrastructure for blueorganizer. We leveraged XML and JavaScript to roll out new collections and actions in a very short time, without having to do JavaScript coding. Finally, we just had an amazing launch at DEMOfall. It has been a great success and we are very pleased.

blueorganizer

Richard: What are your major challenges?

Alex: There are a couple major challenges. Number one is building the user base - standing out from the crowd. DEMOfall helped us address that in an excellent way. Another challenge is expanding and growing in the right way. We are here to build products that people use without expanding to be a 30 people company. Our challenge is to scale and we are going to address it by being smart about our software infrastructure and resources.

Richard: What are you going to build in the next 12 months?

Alex: We are going to add more collections like images, video and people. Expect support for microformats and more smart browsing stuff. We are also planning to start work and roll out some backend personalization technologies. But we can't talk about them yet :)

Richard: What is the most important thing for a start up to be successful?

Alex: Passion, closely followed by people, focus and agility.

Richard: What web sites / blogs do you use / read often?

Alex: Techcrunch, Read/Write Web, Peter Rip's blog, Headrush. Use Basecamp from 37signals a lot, and cvsdude to store our code.

Richard: Which 'web 2.0' things are noise and which are signals?

Alex: Signals are true innovations, noise are clones.

Richard: How did you find DEMOFall?

Alex: We found this show fantastic! The energy and the crowds were just amazing. We got so much out of it and were very well received. We highly recommend the show to all companies that are launching new products.

Disclaimer: not only is Alex a regular R/WW contributor, but blueorganizer is a sponsor too.

http://www.readwriteweb.com/archives/blueorganizer_interview.php http://www.readwriteweb.com/archives/blueorganizer_interview.php DEMOfall 2006 Mon, 02 Oct 2006 18:11:31 -0800 Richard MacManus
Top-Down: A New Approach to the Semantic Web Earlier this week we wrote about the classic approach to the semantic web and the difficulties with that approach. While the original vision of the layer on top of the current web, which annotates information in a way that is "understandable" by computers, is compelling, there are technical, scientific and business issues that have been difficult to address.

One of the technical difficulties that we outlined was the bottom-up nature of the classic semantic web approach. Specifically, each web site needs to annotate information in RDF, OWL, etc. in order for computers to be able to "understand" it.

As things stand today, there is little reason for web site owners to do that. The tools that would leverage the annotated information do not exist and there has not been any clearly articulated business and consumer value. Which means that there is no incentive for the sites to invest money into being compatible with the semantic web of the future.

But there are alternative approaches. We will argue that a more pragmatic, top-down approach to the semantic web not only makes sense, but is already well on the way toward becoming a reality. Many companies have been leveraging existing, unstructured information to build vertical, semantic services. Unlike the original vision, which is rather academic, these emergent solutions are driven by business and market potential.

In this post, we will look at the solution that we call the top-down approach to the semantic web, because instead of requiring developers to change or augment the web, this approach leverages and builds on top of the current web as-is.

Why Do We Need The Semantic Web?

The complexity of the original vision of the semantic web and the lack of clear consumer benefits make the whole project unrealistic. The simple question, "Why do we need computers to understand semantics?", remains largely unanswered.

While some of us think that building AI is cool, the majority of people think that AI is a little bit silly, or perhaps even unsettling. And they are right. AI for the sake of AI does not make any sense. If we are talking about building intelligent machines, and if we need to spend money and energy annotating all the information in the world for them, then there needs to be a very clear benefit.

Stated the way it is, the semantic web becomes a vision in search of a reason. What if the problem was restated from the consumer point of view? Here is what we are really looking forward to with the semantic web:

  • Spend less time searching
  • Spend less time looking at things that do not matter
  • Spend less time explaining what we want to computers

A consumer focus and clear benefit for businesses needs to be there in order for the semantic web vision to be embraced by the marketplace.

What If The Problem Is Not That Hard?

If all we are trying to do is to help people improve their online experiences, perhaps the full "understanding" of semantics by computers is not even necessary. The best online search tool today is Google, which is an algorithm based, essentially, on statistical frequency analysis and not semantics. Solutions that attempt to improve Google by focusing on generalized semantics have so far not been finding it easy to do so.

The truth is that the understanding of natural language by computers is a really hard problem. We have the language ingrained in our genes. We learn language as we grow up. We learn things iteratively. We have the chance to clarify things when we do not understand them. None of this is easily replicated with computers.

But what if that understanding is not even necessary to build the first generation of semantic tools? What if instead of trying to teach computers natural language, we hard-wired into computers the concepts of everyday things like books, music, movies, restaurants, stocks and even people? Would that help us be more productive and find things faster?

Simple Semantics: Nouns And Verbs

When we think about a book we think about a handful of things - title and author, maybe genre and the year it was published. Typically, though, we couldn't care less about the publisher, edition and number of pages. Similarly, recipes provoke thoughts about cuisine and ingredients, while movies make us think about the plot, director, and stars.

When we think of people, we also think about a handful of things: birthday, where they live, how we're related to them, etc. The profiles found on popular social networks are great examples of simple semantics based around people.

Books, people, recipes and movies are all examples of nouns. The things that we do on the web around these nouns, such as looking up similar books, finding more people who work for the same company, getting more recipes from the same chef and looking up pictures of movie stars, are similar to verbs in everyday language. These are contextual actions that are based on the understanding of the noun.

What if semantic applications hard-wired understanding and recognition of the nouns and then also hard-wired the verbs that make sense? We are actually well on our way to doing just that. Vertical search engines like Spock, Retrevo, ZoomInfo, the page annotating technology from ClearForest, Dapper, and the Map+ extension for Firefox are just a few examples of top-down semantic web services.
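To make the noun and verb idea concrete, here is a minimal sketch of what "hard-wiring" might look like. Everything in it (the `Noun` class, the `VERBS` table and the `actions_for` helper) is a hypothetical illustration, not the design of any of the services named above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Noun:
    """An everyday object with the handful of attributes we care about."""
    kind: str
    attributes: Dict[str, str] = field(default_factory=dict)


# Hypothetical hard-wired verb table: each noun kind maps to the
# contextual actions that make sense for it.
VERBS: Dict[str, List[str]] = {
    "book": ["find similar books", "find more by the same author"],
    "person": ["find colleagues at the same company"],
    "recipe": ["get more recipes from the same chef"],
    "movie": ["look up pictures of the stars"],
}


def actions_for(noun: Noun) -> List[str]:
    """Return the verbs that apply to a noun, or nothing if the kind is unknown."""
    return VERBS.get(noun.kind, [])
```

The point of the sketch is that nothing here "understands" language: the application only needs to recognize which kind of noun it is looking at, and the sensible verbs follow from a lookup.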

The Top-Down Semantic Web Service

The essence of a top-down semantic web service is simple - leverage existing web information, apply specific, vertical semantic knowledge and then redeliver the results via a consumer-centric application. Consider the vertical search engine Spock, which scans the web for information about people. It knows how to recognize names in HTML pages and it also looks for common information about people that all people have - birthdays, locations, marital status, etc. In addition, Spock "understands" that people relate to each other. If you look up Bush, then Clinton will show up as a predecessor. If you look up Steve Jobs, then Bill Gates will come up as a rival.

In other words, Spock takes simple, everyday semantics about people and applies it to the information that already exists online. The result? A unique and useful vertical search engine for people. Further, note that Spock does not require the information to be re-annotated in RDF and OWL. Instead, the company builds adapters that use heuristics to get the data. The engine does not actually have full understanding of semantics about people, however. For example, it does not know that people like different kinds of ice cream, but it doesn't need to. The point is that by focusing on a simple semantics, Spock is able to deliver a useful end-user service.

Another, much simpler, example is the Map+ add-on for Firefox. This application recognizes addresses and provides a map popup using Yahoo! Maps. It is the simplicity of this application that precisely conveys the power of simple semantics. The add-on "knows" what addresses look like. Sure, sometimes it makes mistakes, but most of the time it tags addresses in online documents properly. So it leverages existing information and then provides direct end user utility by mashing it up with Yahoo! Maps.
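We do not know how Map+ actually recognizes addresses, but the general idea of "knowing what addresses look like" can be sketched with a simple pattern. The regular expression and the `find_addresses` helper below are our own assumptions, and they cover only a narrow style of US street address.

```python
import re

# Assumed heuristic: a house number, one to three capitalized words,
# then a common street suffix. Real address recognition is far messier.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr)\b\.?"
)


def find_addresses(text):
    """Return every substring of text that looks like a street address."""
    return [match.group(0) for match in ADDRESS_RE.finditer(text)]
```

A heuristic like this also shows why such tools sometimes make mistakes: a phrase such as "3 Big Dr Pepper cans" contains "3 Big Dr", which fits the pattern even though it is not an address.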

The Challenges Facing The Top-Down Approach

Despite being effective, the somewhat simplistic top-down approach has several problems. First, it is not really the semantic web as it is defined; instead, it's a group of semantic web services and applications that create utility by leveraging simple semantics. So the proponents of the classic approach would protest and they would be right. Another issue is that these services do not always get semantics right because of ambiguities. Because the recognition is algorithmic and not based on an underlying RDF representation, it is not perfect.

It seems to me that it is better to have simpler solutions that work 90% of the time than complex ones that never arrive. The key questions here are: How exactly are mistakes handled? And, is there a way for the user to correct the problem? The answers will be left up to the individual application. In life we are used to other people being unpredictable, but with computers, at least in theory, we expect things to work the same every time.

Yet another issue is that these simple solutions may not scale well. If the underlying unstructured data changes can the algorithms be changed quickly enough? This is always an issue with things that sit on top of other things without an API. Of course, if more web sites had APIs, as we have previously suggested, the top-down semantic web would be much easier and more certain.

Conclusion

While the original vision of the semantic web is grandiose and inspiring, in practice it has been difficult to achieve because of the engineering, scientific and business challenges. The lack of specific and simple consumer focus makes it mostly an academic exercise. In the meantime, existing data is being leveraged by applying simple heuristics and making assumptions about particular verticals. What we have dubbed top-down semantic web applications have been appearing online and improving end user experiences by leveraging semantics to deliver real, tangible services.

Will the bottom-up semantic web ever happen? Possibly. But, at the moment the precise path to get there is not quite clear. In the meantime, we can all enjoy a better online experience and get to where we need to go faster thanks to simple top-down semantic web services.

http://www.readwriteweb.com/archives/the_top-down_semantic_web.php http://www.readwriteweb.com/archives/the_top-down_semantic_web.php Analysis Thu, 20 Sep 2007 16:22:36 -0800 Alex Iskold
Semantic Wave 2008 - Free Summary Report for RWW Readers Project10X has just released a 400-page study of semantic technologies and their market impact, entitled Semantic Wave 2008: Industry Roadmap to Web 3.0 and Multibillion Dollar Market Opportunities. The report discusses the emergence of semantic technologies for consumer and enterprise applications, and the evolution from Web 2.0 to the so-called "Web 3.0".

A free 27-page summary of Project10X’s Semantic Wave 2008 Report has been made available to ReadWriteWeb readers.

You need to provide your name, email address and answer a few non-invasive questions, but the summary report is well worth it.

The report defines Web 3.0 as "about representing meanings, connecting knowledge, and putting these to work in ways that make our experience of internet more relevant, useful, and enjoyable." In other words, it's the Semantic Web. I'm not a big fan of the Web 3.0 moniker, but I do agree that we've entered an era where Semantic technologies will enhance and extend the current Social Web era. We've written a lot about this on ReadWriteWeb - check out Alex Iskold's Semantic Web: What Is The Killer App? for a recent example.

The report also defines a "Web 4.0", as follows: "Web 4.0 will come later. It is about connecting intelligences in a ubiquitous Web where both people and things reason and communicate together." The following diagram is a good overview of the concepts tying these Web versions together:

Note: I think the number in the top left is supposed to be a 3.

The report correctly points out that the new era of Semantic Apps isn't restricted to traditional W3C technologies. It states that "as a platform, Web 3.0 will embrace all semantic technologies and open standards that can be applied on top of the current Web. It is not restricted just to current Semantic Web standards."

There's some useful discussion on the type of products we can expect in this Semantic Web. For example, on web browsers: "Web 3.0 browsers will understand semantics of data, will broker information, and automatically interpret metadata." We've discussed before on this blog how Firefox 3 will act as an information broker, through the use of microformats and other technologies.

The report also outlines some intriguing future trends, for example on identity: "The trend is towards semantic avatars that enable individuals to manage and control their personal information, where ever it is across the net."

Another interesting trend is "collective knowledge systems", where users "collaborate to add content, semantics, models, and behaviors, and where systems learn and get better with use." Twine and Freebase are two apps that spring to mind here. See ReadWriteWeb's 10 Semantic Apps to Watch for more on this.

Check out the summary report for more; it's an excellent primer on these topics. Thanks to Mills Davis, founder and managing director of Project10X, for forwarding it to us. The full report features 150 case studies in 14 industry sectors, so this is a comprehensive study of the emerging Semantic Web.

http://www.readwriteweb.com/archives/semantic_wave_2008_free_report.php http://www.readwriteweb.com/archives/semantic_wave_2008_free_report.php Trends Thu, 17 Jan 2008 21:40:56 -0800 Richard MacManus
Retrevo - What Vertical Search Will Become Written by Alex Iskold and edited by Richard MacManus.

During DEMOfall 06, we wrote about the Retrevo vertical search engine for electronics. Retrevo is one of the more advanced vertical search engines, because it uses sophisticated mining and crawling technologies. In terms of UI, their aim is to blur the boundaries between a search engine and a portal. Instead of just giving you a list of search results, Retrevo creates the feel of a specialized application which has a semantic understanding of the electronics space.

Retrevo is a vertical search engine, so initially what you get is a search box. You can search for anything electronics related, for example Sony Digital Camera. The result is a portal-style page like this:


Retrevo results page

Instead of a single results list that you would get from a traditional search engine like Google, Retrevo automatically groups the results into major categories: Most Popular Results, Manufacturer Info, Reviews and Articles, Forums and Blogs, Daily Deals, and Stores. These groupings are both intelligent and useful, regardless of what you are trying to do. In addition, Retrevo provides a handy preview pane that allows you to see each result without navigating to its page.

Closer look at how the results are organized

If you are shopping for an electronics item, you will find the Reviews and Articles section helpful. This section brings together gadget reviews from places like CNET. If you already own the item, then use the Manufacturer Info section to get access to product manuals. Retrevo told us they have a special crawler focused on enabling just this feature.  

Another handy section is Forums and Blogs. You probably will visit it once you need to troubleshoot something - i.e. read up on what other people did when they had the same issues.

In addition to these features, Retrevo recently launched the Daily Deals tab. You can find electronics deals a dime a dozen these days. But the one that comes with Retrevo seems different, because it is context sensitive and fresh. In general, deals are annoying because they blink in your face with things you are not interested in. But if you are shopping for Sony Digital Camera and there is a good deal for that particular brand and item type available, it makes sense to check it out.


Daily deals

Summary

Relevancy, semantics and presentation are the things that make Retrevo compelling. Could you do the same (re)search for Sony Digital Camera on Google? Absolutely. But it will take you much longer and you are not going to enjoy the experience. Retrevo proves the case for vertical search in a simple and effective way. 

We anticipate that more vertical search engines will excel in relevancy, semantics and presentation in the coming year. But tell us if you agree - take a look at Retrevo and give us your feedback. Also, what vertical search engines (if any) do you use currently?

http://www.readwriteweb.com/archives/retrevo_vertical_search.php http://www.readwriteweb.com/archives/retrevo_vertical_search.php Search Services Tue, 05 Dec 2006 01:00:12 -0800 Alex Iskold
The Road to the Semantic Web Written by Alex Iskold and edited by Richard MacManus.

John Markoff's recent article in the NY Times has generated an interesting discussion about Web 3.0 being the long-promised Semantic Web. For instance, a short post on Fred Wilson's blog had a lot of lengthy comments attempting to define Web 1.0, Web 2.0 and Web 3.0. Some people think that the Semantic Web is about AI, some claim that it is more about semantics, while others say that it is about data annotation. All agree, however, that we will all be wonderfully more productive and simply happier when it arrives. Let's take a look at the ingredients, definitions and approaches to the Semantic Web so that we can recognize it when it is finally here.

What is the Semantic Web?

Wikipedia defines the Semantic Web as a project that intends to create a universal medium for information exchange by putting documents with computer-processable meaning (semantics) on the World Wide Web. The core idea is to create the meta data describing the data, which will enable computers to process the meaning of things. Once computers are equipped with semantics, they will be capable of solving complex semantical optimization problems. For example, as John Markoff describes in his article, a computer will be able to instantly return relevant search results if you tell it to find a vacation on a 3K budget.

In order for computers to be able to solve problems like this one, the information on the web needs to be annotated with descriptions and relationships. Basic examples of semantics consist of categorizing an object and its attributes. For example, books fall into a Books category where each object has attributes such as the author, the number of pages and the publication date. The basic example of a relationship comes from various social networks that we are part of. In one network the relationship might be a friend of, in another a family member and in another works with.

RDF, OWL and the mathematical approach to annotation

There are billions of fairly unstructured HTML pages which contain no annotations or meta data. The fundamental engineering question is how can we go from today's unstructured web to one rich with semantical information? The W3C-authored specs for RDF (Resource Description Framework) and OWL (Web Ontology Language) attempt to enable the collective capture and description of information, along with the ontology and the relationships with other pieces of information, in a rigorous, mathematical way.

RDF is a data model, commonly serialized as XML, which enables description of relationships via predicates. Wikipedia explains: The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".
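As a rough illustration of the triple described above, the sketch below models "The sky has the color blue" as a subject, predicate and object and prints it in the line-based N-Triples notation. The `http://example.org/` namespace, the `hasColor` predicate name and the `Triple` and `to_ntriples` names are placeholders we made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; a real vocabulary would define its own URIs.
EX = "http://example.org/"

sky_is_blue = Triple(EX + "sky", EX + "hasColor", EX + "blue")


def to_ntriples(triple):
    """Serialize a triple of URIs as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


print(to_ntriples(sky_is_blue))
```

Each of the three parts is just a specially formatted string, which is all the quoted definition asks for; the rigor comes from everyone agreeing on what those strings denote.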

OWL is another W3C language, typically written in XML, used for describing and reasoning about ontologies. In a nutshell, OWL facilitates semantic descriptions such as Dog is an animal or Dog has four legs. There are three flavors of OWL: OWL Lite, OWL DL and OWL Full - each flavor capturing a different side of a trade off between expressiveness and computability. This RDF/OWL framework is comprehensive, but is difficult for people without a background in mathematics and computer science to understand. Given that this is a bottom up approach, it is clear that if it is to succeed, there needs to exist an automated mechanism that takes existing HTML content and turns it into RDF and OWL meta data. This, however, is a chicken-and-egg problem because if we could already do this, the problem would not be there to begin with. Still we can envision tooling which does 80% of the work automatically and then interacts with the person to complete the other 20% of the work.

Microformats

Recognizing the complexity of RDF and OWL, a group of people are trying a different approach called Microformats. The goal of microformats is to embed the basic semantics right into HTML pages. It is not as expressive right now as RDF and OWL, but it is very compact and uses available XHTML facilities to add semantics to the pages. For example, there is a microformat for describing contact information called hCard. Using hCard it is possible to annotate the HTML so that a microformat-aware browser or a search engine can deduce the information about a person such as first and last name, a company or a phone number. Another mature microformat called hCalendar enables page authors to describe events. Many popular event sites, such as Facebook and Yahoo! Local use this format to annotate events in their HTML pages.
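To show how little machinery a microformat-aware tool needs, here is a minimal sketch that pulls a name, company and phone number out of hCard markup using only Python's standard `html.parser` module. The class names `fn`, `org` and `tel` come from hCard itself; the `HCardParser` class, the `parse_hcard` helper and the sample contact are our own, and the sketch assumes each property is a simple element containing plain text.

```python
from html.parser import HTMLParser

# hCard property class names this sketch looks for.
HCARD_PROPS = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collects the text of elements whose class is an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in HCARD_PROPS:
                self._current = name

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()


def parse_hcard(html):
    """Return a dict of the hCard properties found in an HTML fragment."""
    parser = HCardParser()
    parser.feed(html)
    return parser.card


sample = (
    '<div class="vcard"><span class="fn">Jane Doe</span>, '
    '<span class="org">Example Corp</span>, '
    '<span class="tel">555-0100</span></div>'
)
print(parse_hcard(sample))
```

Because the annotations are ordinary `class` attributes, the page still renders normally for people while giving software an unambiguous handle on each piece of contact information.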

Leaving the aesthetics of the representation aside, the microformats approach is clearly simpler than RDF and OWL. And even though it is less powerful, it is becoming very popular. Many site authors are starting to embed microformats into their HTML pages. We are also seeing some early examples of search engines based on microformats, like this one from Technorati. The simple gain in using microformats and doing search is removing ambiguity. In a way, it is similar to the vertical search engine - which knows which vertical you are searching. With microformats inside the pages, the data is also no longer ambiguous, so the search results are more precise.

Still, there are some issues with microformats. The first one is the same as with the previous bottom up approach - people have to do the work to annotate the pages. The good news is that since the format is simpler, more can be done via reverse engineering and automation. The second issue is that the current set of microformats does not cover many things that we encounter online. For example, we are not aware of a format that would help represent a book or a movie. Many more formats need to be created before they can really "cover" the web.

Semantic Web is Personalized Web

The problem of annotating data is very complex and is far from being solved completely. But let’s leave it aside for a moment and think of what we can be doing once all the data becomes annotated. The promise is that we will be doing less of what we are doing now - namely sifting through piles of irrelevant information. Given that the amount of information is growing exponentially and our tolerance is shrinking, this is a very intriguing proposition. If the computer can return relevant results instantly, we can potentially save a ton of time.

But having semantics and knowing all relationships between the data is not enough to do that. Take the simple example of a travel agency. When you show up there for the first time, the agent does not know what to offer you, even though she knows the semantics of travel, the relationships between things and the prices of everything. In order to be effective, she needs to know where you've been already and what kind of destinations you like. That’s why she asks you questions. All services that we receive work this way and the results are better and more refined over time, because service people have time to learn what you like.

So the second important ingredient of the Semantic Web, the one that will facilitate productivity, is a set of persistent personal preferences. Once the computer knows your preferences and has a semantical representation of them online, it can then run an algorithm to deliver you precise, personalized results. To put it differently, your personal preferences are the filter that needs to be applied to the results that the computer returns in response to: Find a vacation for under 3K. And when this happens, then we can claim that the Semantic Web has arrived.
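The idea of preferences acting as a filter can be sketched in a few lines. The `Vacation` and `Preferences` classes and the `find_vacations` function are hypothetical, and the two preference fields (kinds of trips you like, places you have already been) are simply taken from the travel agent example above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Vacation:
    destination: str
    kind: str      # for example "beach", "city" or "ski"
    price: int     # total price in dollars


@dataclass
class Preferences:
    liked_kinds: Set[str]
    already_visited: Set[str]


def find_vacations(offers: List[Vacation], budget: int, prefs: Preferences) -> List[Vacation]:
    """Keep offers within budget that match the preferences, cheapest first."""
    matches = [
        offer for offer in offers
        if offer.price <= budget
        and offer.kind in prefs.liked_kinds
        and offer.destination not in prefs.already_visited
    ]
    return sorted(matches, key=lambda offer: offer.price)
```

Without the `prefs` argument the function could only apply the budget, which is exactly the position of the travel agent meeting you for the first time.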

Conclusion

So will the 'Web 3.0' be the Semantic Web? Probably. But are we there yet? Not quite. It will take some time to annotate the world's information and then to capture personal information in the right way, to enable the kinds of applications that we have discussed. We are certainly getting close and it will be interesting to see how things unfold over the next few years.

Incidentally, if you would like us to write more about the Semantic Web please let us know and we will do follow up posts.

http://www.readwriteweb.com/archives/semantic_web_road.php http://www.readwriteweb.com/archives/semantic_web_road.php Web Theory Tue, 14 Nov 2006 13:26:50 -0800 Alex Iskold
The Semantic Desktop? SDS Brings Semantics To Excel When you hear the word "semantic" you likely think of the semantic web - the supposed next iteration of the World Wide Web that features structured data and specific protocols that aim to bring about an "intelligent" web. But the concept of semantics doesn't necessarily apply just to the web - it can apply to other things as well, like your desktop...or even your Excel spreadsheets, according to Ian Goldsmid, founder of Semantic Business Intelligence, whose new app, SDS, brings a semantic system to spreadsheets.

Semantic Spreadsheets

The problem with spreadsheets that their system is trying to address has to do with those who need to derive data from multiple spreadsheets (two or more). Although it's easy enough to perform sorts, build macros, and create formulas within one spreadsheet, when needing to compare values in multiple spreadsheets the process becomes more difficult.

The company's app, The Semantic Discovery System for Excel, or just SDS for short, looks for similar columns or rows between the sheets and then "semantically" connects them. They don't appear to just be throwing that term around either - the app uses the same W3C Semantic Web technologies (RDF, OWL, SPARQL) to help you capture "meaning, intelligence, and knowledge" from the data saved in your spreadsheets.
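We have not seen how SDS does this internally, so the following is only a sketch of the general idea of finding a column two sheets have in common and connecting rows through it. It uses plain Python dictionaries rather than RDF, OWL or SPARQL, and the `shared_columns` and `connect` helpers, along with the sample sheets, are invented for illustration.

```python
def shared_columns(sheet_a, sheet_b):
    """Return the column names that appear in both sheets.

    Each sheet is assumed to be a list of row dicts with identical keys.
    """
    columns_a = set(sheet_a[0]) if sheet_a else set()
    columns_b = set(sheet_b[0]) if sheet_b else set()
    return sorted(columns_a & columns_b)


def connect(sheet_a, sheet_b, key):
    """Merge rows from both sheets that have the same value in column key."""
    index = {row[key]: row for row in sheet_b}
    return [
        {**row, **index[row[key]]}
        for row in sheet_a
        if row[key] in index
    ]


sales = [{"sku": "A1", "units": 3}, {"sku": "B2", "units": 5}]
prices = [{"sku": "A1", "price": 10}, {"sku": "C3", "price": 7}]

for column in shared_columns(sales, prices):
    print(column, connect(sales, prices, column))
```

This is the same operation a database join performs, which is why the comparison with database products in the next section is a fair one.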

Do We Need Semantic Desktop Apps?

Does SDS solve a business problem that is not yet being addressed through current technologies? In my experience, the short answer to this question is "no." (But wait, there's more...)

Typically, when a business has need of comparing and analyzing large amounts of data, the solution is to turn to a database product that can then be queried and from which custom reports can be pulled. And a business doesn't need to spend a lot of money on a robust solution to do so - even a smaller business can create a database by using inexpensive desktop software.

However, the difference between using a database technology and "semantically connecting" some spreadsheets comes down to for whom this product is being built. In the past, databases and other business intelligence apps were built as if the creators knew that the only person using them would be an I.T. guy or gal. SDS, instead, aims to satisfy the needs of the non-technical end user.

Is this another example of tech populism at work? It certainly looks like it. Yet, in this case their market is small - a non-technical user who's also a power user with Excel? There's usually some overlap there. Not to mention, by the time you've achieved "power user" status, you've often also figured out how to do more complicated things in Excel...like, say, formulas that work across spreadsheets, for example - the very pain points this app is trying to address.

Still, it's an interesting concept to think of taking the semantic web capabilities and integrating them into everyday programs to add a layer of intelligence to these programs as well. Done correctly, it could improve the capabilities of our favorite software apps without making the programs overly complex, which is what typically happens when you add more features.

What do you think? Is the Semantic Desktop (that is, semantically-enabled desktop apps) right around the corner? Or is this product and those like it too niche to find an audience? Let us know what you think in the comments.

http://www.readwriteweb.com/archives/the_semantic_desktop_sds_brings_semantics_to_excel.php Products Wed, 13 Aug 2008 06:30:00 -0800 Sarah Perez
Do Semantic Search Companies Need a Semantic Map? It's All Semantics...

This week we reported that Cognition had announced "the largest commercially available Semantic Map of the English language." In our interview with Cognition CEO Scott Janus, we asked him to compare Cognition's technologies to those of other semantic search companies Hakia and Powerset. Janus pointed to their large Semantic Map as the main differentiator. Indeed he told us that semantic search companies "must include a comprehensive semantic map" to be successful.

Is this true? We sought a response from both Hakia and Microsoft-owned Powerset on this semantically charged question.

Cognition claims that its Semantic Map has over 10 million semantic connections, including "over 4 million semantic contexts (word meanings that create contexts for specific meanings of other related words)".

Hakia CEO Riza C. Berkan responded in the comments to the original article that "hakia is deploying Ontological Semantics (OntoSem)", which he described as "a network of concepts reflecting ontology." He went on to say that hakia covers "over [a] million words in English".

However Berkan noted that the size of a Semantic Map does not necessarily matter: "the sheer size of the collection of words or concepts does not represent, by any means, the capability of the system." Hakia's position is that "there is no silver bullet for a semantic solution that will succeed", as long as the system developed is scalable and imposes "minimum reliance on 'words'".

Semantopoly: Advance token to nearest Semantic Context

At this point we were still confused. Cognition uses the term "semantic map" and said it was necessary to have. One of the commenters on the original post agreed with that assumption. Yet Hakia's Riza Berkan didn't use the term "semantic map". So we asked Hakia in a follow-up email, does it or does it not have a semantic map? Dr. Christian Hempelmann, Hakia's Chief Scientific Officer, responded:

"The term sometimes comes up in the context of data integration, but "Semantic map" is not a term used in linguistics. I can only speculate that it is what is commonly called an ontology. To the degree that they let us on about it in the documentation on their website, Cognition operates with only 2 main relations, much like WordNet: hyperonymy/hyponymy (e.g. cat is-a feline is-a mammal; their "taxonomy") and synonymy (e.g., "buy" means almost the same as "purchase"; their "thesaurus"). Furthermore, this map is not independent of English, cannot grow into other languages. hakia, on the other hand, has an ontology with many more relations, effectively raising our "semantic map" to the size of a higher power, and can and is already growing into other languages."
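The two relations Hempelmann describes can be pictured as a pair of very simple data structures. The sketch below is only a reading of his examples, not a representation of Cognition's or hakia's actual systems; the names IS_A, SYNONYMS, hypernyms and are_synonyms are hypothetical.

```python
# Illustrative sketch of the two relations named in the quote.
# The data and names are assumptions, not any vendor's real model.

# "Taxonomy": each term points to its more general term (is-a).
IS_A = {"cat": "feline", "feline": "mammal"}

# "Thesaurus": groups of words with nearly the same meaning.
SYNONYMS = [{"buy", "purchase"}]


def hypernyms(term):
    """Follow is-a links upward and return the chain of broader terms."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def are_synonyms(a, b):
    """Return True if both words appear in the same synonym group."""
    return any(a in group and b in group for group in SYNONYMS)


cat_chain = hypernyms("cat")
same_meaning = are_synonyms("buy", "purchase")
```

With only these two relations, the structure can say that a cat is a kind of mammal and that "buy" and "purchase" are interchangeable, but nothing else. An ontology with more relation types, as Hempelmann claims for hakia, would add further kinds of links between concepts.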

We also tried to get a comment from Powerset, but as of writing we haven't received it.

So, are we all clearer now on what a Semantic Map is, whether it is needed, and whether size matters? Er, it depends. If you think you know the answers, tell us in the comments please!

http://www.readwriteweb.com/archives/do_semantic_search_companies_need_a_semantic_map.php Analysis Fri, 19 Sep 2008 15:05:28 -0800 Richard MacManus
New Version of BlueOrganizer Launched - Semantic Web In Action?

Today AdaptiveBlue released a new version of its BlueOrganizer product, a Firefox extension that aims to provide extra contextual information to you while browsing the Web. Basically after you install BlueOrganizer in your Firefox, it enables you to discover all kinds of relevant content while you're browsing - such as books, music, links, related information, etc. Essentially then, it adds personalization and semantics into the browser (Firefox).

AdaptiveBlue is the company of Alex Iskold, a regular writer on Read/WriteWeb. He is one of the smartest technologists I know and his posts here are a consistent source of inspiration and conversation amongst the R/WW community. Which leads me to say: what's interesting about the new release of BlueOrganizer is that it puts many of Alex's theories about Web 2.0 and the Semantic Web into practice. I'll explain how in this post.

What's New?

First, what has been released today? Here are the highlights of the release, code-named BlueOrganizer Denim:

  • BlueMenu in the Firefox Toolbar - the menu is now accessible from the Firefox toolbar, without the need to right-click;
  • SmartLinks - BlueOrganizer users can now publish SmartLinks that contain contextual shortcuts to related information. See demo here;
  • Re-designed BlueMarks and sidebar, for saving and managing your personal information (see screenshot below);
  • Improved Content Recognition - faster, more precise and works on any page, link and text selection.


BlueMarks and sidebar

Techcrunch has more coverage of the features, but for this post I want to turn now to the Semantic Web elements.

Top Down Semantic Web

Most of the web 2.0 products you see nowadays use bottom-up content organization techniques. For example, del.icio.us and Flickr use tags from users to organize content, and MySpace and YouTube are based on user-generated content that is mostly discovered via a user's social network along with search. But BlueOrganizer is different - it is a "Top-down Semantic Web" approach.

Alex's theory is that the Semantic Web will arrive gradually, using a top-down approach. In the case of BlueOrganizer, that means it adds semantics to basic web elements: pages, links, text. With BlueOrganizer, a user can add a description to a web page (they might note down "web 2.0 blog" about readwriteweb.com, for example). BlueOrganizer then uses "a combination of parsers, web services and analysis algorithms" to take that unstructured content and turn it into structured content - such as related websites to readwriteweb.com. This structure is then retained, as the content is used again and again in BlueOrganizer. Alex has a neat phrase to describe this, saying it "empowers people to re-write the web using automation." Here's an illustration of the concept:

Conclusion

All up, BlueOrganizer is an excellent practical example of how semantic web technologies are creeping into Web 2.0. We forecast this in our 2007 predictions post at the end of 2006, which Alex contributed to. It's interesting, though, that the 'official' semantic web efforts, led of course by Sir Tim Berners-Lee, still aren't making much headway on the Web. But when you mix in user-generated content, APIs, "parsers, web services and analysis algorithms" and the rest of the techniques that AdaptiveBlue is using, well, just maybe Sir Tim will get his Semantic Web after all... just perhaps not as he envisaged it.

Disclosure: Alex Iskold, CEO of AdaptiveBlue, is a writer for R/WW.

http://www.readwriteweb.com/archives/blueorganizer_semantic_web.php Startups Wed, 23 May 2007 01:26:12 -0800 Richard MacManus