semantic technology - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/semantic technology en Copyright 2012 Richard MacManus readwriteweb@gmail.com Wed, 15 Feb 2012 12:30:00 -0800 http://www.sixapart.com/movabletype/?v=4.35-en http://blogs.law.harvard.edu/tech/rss

Drupal 7 Released, With Improved UI and Semantic Technology

The popular open source content management system Drupal releases its latest version today. Drupal 7 has been three years in the making, with code from thousands of contributors from over 200 countries.

Drupal 7 includes a number of improvements to both performance and usability. The enhancements to the UI mean easier administration, update management, accessibility and content creation. There's also a new image editor that allows users to resize and crop photos without having to leave the platform.

To improve website performance, Drupal 7 offers advanced caching, content delivery network support and master-slave replication. It also includes a new automated testing framework with over 30,000 built-in tests, which will allow users to check the integration of patches and modules in order to help maintain platform stability.

Drupal 7 also features RDFa semantic technology as part of its core. The design of the platform embeds semantic metadata that will make machine-to-machine search native for a Drupal 7 website. RDFa will be able to give search engines more details not visible to humans, such as latitude and longitude of a venue. According to Drupal's creator Dries Buytaert, "Adding semantic technology to Drupal core will make a notable contribution to the future of the web."
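To make the idea of "details not visible to humans" concrete, here is a minimal sketch in Python of how a crawler might read coordinates out of RDFa-style attributes. The sample markup, venue and coordinates are invented for illustration and are not taken from actual Drupal 7 output; the sketch only collects `property`/`content` attribute pairs and is not a full RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The markup and coordinates are made up for
# this example; real Drupal 7 output will look different.
SAMPLE_HTML = """
<div prefix="geo: http://www.w3.org/2003/01/geo/wgs84_pos#" typeof="geo:Point">
  <span property="geo:lat" content="37.3318"></span>
  <span property="geo:long" content="-121.8916"></span>
  <span>Downtown venue</span>
</div>
"""


class RdfaPropertyCollector(HTMLParser):
    """Collect property/content attribute pairs found on start tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Only keep elements that carry both a property name and a value.
        if "property" in attr_map and "content" in attr_map:
            self.properties[attr_map["property"]] = attr_map["content"]


def extract_properties(html):
    """Return a dict of machine-readable properties embedded in the HTML."""
    collector = RdfaPropertyCollector()
    collector.feed(html)
    return collector.properties


venue = extract_properties(SAMPLE_HTML)
print(venue)
```

A human reader of this fragment sees only "Downtown venue", while a program gets the latitude and longitude as well.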

The Drupal platform has seen increasing adoption and now powers hundreds of thousands of websites, including prominent ones such as WhiteHouse.gov and NASA.

http://www.readwriteweb.com/archives/drupal_7_released_with_improved_ui_and_semantic_te.php News Wed, 05 Jan 2011 08:05:40 -0800 Audrey Watters
Top 5 Web Trends of 2009: Structured Data

This week ReadWriteWeb will run a series of posts detailing what we think are the five biggest, most cutting-edge Web trends to come out of 2009. We'll be posting one trend analysis per day. Then at the end of the week we'll publish a major update to our standard presentation about web technology trends.

The first major Web trend we're looking at is Structured Data. In prior presentations, this has sometimes been referred to under the umbrella term of 'Semantic Web'. However, given the way 2009 has panned out so far, it's become clear that this trend is much more than the Semantic Web. In this post, we'll analyze the developments in Structured Data this year and provide you with three product examples: OpenCalais, Google and Wolfram Alpha.


Editor's note: This story is part of a series we call Redux, where we'll re-publish some of our best posts of 2009. As we look back at the year - and ahead to what next year holds - we think these are the stories that deserve a second glance. It's not just a best-of list, it's also a collection of posts that examine the fundamental issues that continue to shape the Web. We hope you enjoy reading them again and we look forward to bringing you more Web products and trends analysis in 2010. Happy holidays from Team ReadWriteWeb!

Web of Data, Not Documents

Tim Berners-Lee said in February this year that we're now in a Web of Data, rather than a Web of Documents. The organization that Berners-Lee heads, the W3C, has heavily promoted two key initiatives that are helping to build this Web of Data: the Semantic Web and more recently Linked Data.

However, over the past few years, we've seen that there are many other ways to structure data and enable others to build off it. The best current example is surely Twitter, whose API has historically been responsible for around 90% of Twitter's activity - via third party apps.

The basic principle of the Web of Data is still the same as what Alex Iskold articulated on ReadWriteWeb back in March 2007: "unstructured information will give way to structured information - paving the road to more intelligent computing."

Example 1: OpenCalais

Our first example product, OpenCalais, is probably the best current example of Linked Data (which is a type of structured data endorsed by W3C). Thomson Reuters, the international business and financial news giant, launched an API called OpenCalais in Feb '08. In a nutshell, OpenCalais turns unstructured HTML into semantically marked up data. It orders data into groups such as 'people,' 'places,' 'companies' and more. This way, third party applications and sites can build interesting new things from that data - one of the defining principles of Linked Data.

For a full explanation of Linked Data, read Alexander Korth's technical introduction "The Web of Data: Creating Machine-Accessible Information" from April 2009. I also explained the background and benefits of Linked Data in a May '09 post entitled "Linked Data is Blooming: Why You Should Care."

Example 2: Google Rich Snippets

In May this year, Google added structured data to its core search, in the form of a feature called 'Rich snippets.' Essentially this feature extracts and shows useful information from web pages, by way of structured data open standards such as microformats and RDFa. On launch in May, Google invited publishers to mark up their HTML. While it will take a while for this markup to become widespread, the fact that a huge company like Google implemented it shows the increasing importance of structured data on the Web.

Other big companies are also heading in this direction - in particular, Yahoo was an early leader.

Example 3: Wolfram Alpha

Ever since Wolfram|Alpha's much-hyped launch in May, we've been tracking this innovative product closely. It's a self-described "computational knowledge engine" and while it's not quite the Google killer some predicted, it has many potential uses.

Wolfram|Alpha has a search engine-like interface, allowing you to type natural language statements into it. But the main part of the product is the computations you can do on data. The product is premised on using and computing data. If Web 2.0 was about creating data (a.k.a. user generated content), then the next generation of the Web is all about using that data.

Conclusion

We can see from the above three examples that structured data is rapidly becoming a feature of today's Web. Companies like Thomson Reuters and Google are enabling data to be structured, and new types of products (like Wolfram|Alpha) will make use of structured data in ways we perhaps can't imagine right now.

ReadWriteWeb's Top 5 Web Trends of 2009:

  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data_1.php 2009 Redux Sat, 26 Dec 2009 14:00:00 -0800 Richard MacManus
Screencasts of Twine's Facelift; Does It Live Up to the Hype?

We've chronicled semantic web service Twine's birth, checkered youth, and recent woes, with traffic waning and criticism waxing.

We've been given screencasts of the new version of this knowledge management application - screencasts of both the consumer- and developer-facing facets of the site. Take a look, and let us know if the new Twine lives up to expectations. This new version, we are told, will be live by the end of the year.

The consumer product promises to supplant keyword search by treating the web like a huge database, with filtering capabilities that allow users to pare down search results to only the most relevant, applicable, and useful links.

Developers and other techies can check out this screencast exploring Twine's collaboratively authored ontologies:

The Twine folks see the new version as a realization of Tim Berners-Lee's vision of the semantic web. So what do ReadWriteWeb readers think: is the new Twine worth the wait? Does it live up to the hype? Leave your expert comments below.

http://www.readwriteweb.com/archives/screencasts_of_twines_facelift_does_it_live_up_to.php Semantic Web Fri, 18 Sep 2009 15:00:44 -0800 Jolie O'Dell
Top 5 Web Trends of 2009: Structured Data

This week ReadWriteWeb will run a series of posts detailing what we think are the 5 biggest, most cutting-edge Web trends to come out of 2009. We'll be posting one trend analysis per day. Then at the end of the week we'll publish a major update to our standard presentation about web technology trends.
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data.php Trends Mon, 07 Sep 2009 05:30:00 -0800 Richard MacManus
Eqentia Launches Semantic Portals - Competes with OpenCalais, Evri

At the SemTech conference in June I met with William Mougayar, founder and CEO of a semantic news platform called Eqentia. At the time the product was in development, but it is officially launching today. In a nutshell, Eqentia aggregates content into topics using semantic technology. In that respect it is similar to OpenCalais (our coverage) and Evri (our coverage). While all three products have different focuses, each semantically tags and aggregates content in a contextual manner.

The difference, claims Eqentia, is that "with Evri or OpenCalais, the onus is on the programmer." Eqentia says that with its product, "the content is already semanticized and all you have to do is to place it on your portal while preserving your SEO." The other two companies may disagree with that, but let's take a closer look at Eqentia.

Disclosure: We have decided to use this product on ReadWriteWeb, to fuel our upcoming topic pages. Expect this feature to launch within a few weeks.

At its heart, Eqentia is an aggregation platform. It promotes itself as "an aggregator of context, not just content." The way it does this is to add context in the navigation. Each portal has its own taxonomy, which Mougayar described as "a bit like a hierarchical tagging structure." He said that "we basically wrap any content with a semantic wrapper."

How it Works

Under the hood, Eqentia does "content harvesting" from social media sites such as Twitter, blogs and more. Currently Eqentia is getting content from over 13,000 feeds, collecting an estimated 65,000 articles daily.

Eqentia told us that it's indexed 20 million articles so far. The largest topic currently is Outsourcing, with 90,000 articles. Other topics include: Cloud Computing: 60,000; Supply Chain Management: 40,000; Twitter: 20,000; Social Media: 11,000.

Eqentia then does "text mining and filtering" and the results are run through an "Aggregation Engine" (which has rules for sources and filters). Finally there is what Eqentia calls "Semantics Management" - including entity extraction, taxonomy definition and controlled vocabulary.

What The User Gets

Eqentia is starting off with a focus on "professional" content topics. It will target business and technology content, ignoring more mainstream topics like current affairs, sports and entertainment.

Eqentia is launching with 3 products:

1) Out-of-the-box portals. These will give general users free access to topic streams (of which there are 12 at launch, with more coming). There will be email options, widgets and RSS feeds available.

2) Personalized portals. These can be private or public. [note: this is what ReadWriteWeb has signed up for]

3) Enterprise. A SaaS platform that can be customized. A stated use case is for large companies to "disseminate organized news intelligence for their employees across distinct groups or market segments."

Conclusion: Tough Competition, But Important Market

The proof will be in the pudding as to how Eqentia compares to OpenCalais and Evri. We've been very impressed with both OpenCalais and Evri in our previous coverage, so Eqentia has high standards to live up to. In particular Eqentia is going to have to nail the User Experience, because it is relying on its interface a lot to give value to the user.

Finally, Mougayar noted to us that "if web 2.0/social media rewarded the socially savvy user, the semantic web/web 3.0 will reward the research oriented user." It's a nice marketing line, but we are apt to agree that products like Eqentia, OpenCalais and Evri are bringing much needed smarts to the oceans of content on the Web.

http://www.readwriteweb.com/archives/eqentia.php Product Reviews Wed, 02 Sep 2009 06:00:00 -0800 Richard MacManus
Dorthy.com: A (Semantic) Search Engine for Dreams

Dorthy.com, a site we've been hearing about since late last year, has just raised $4 million from angel investors for their "new agey" concept of a search engine for dreams. Currently in private alpha, the site makes fluffy claims about how they're "reversing the traditional search process, continuously filtering and focusing the Universe of online content, to connect you with the best stuff around your interests and aspirations."

If you're not clear on what exactly that means, don't feel bad... but don't write them off either. Instead, think of Dorthy.com as a new take on the old 43Things, the site which encourages users to list goals, share progress, and cheer each other on. Dorthy does the same but gets you there by making interesting use of Web 3.0 technologies like AI and natural language search.

Semantic Search for Dreams

According to Jim Anderson, the About.com co-founder who was hired as Dorthy's CTO earlier this year, the site's search engine doesn't use keyword-based search but rather has the user enter a fully formed question, statement, or phrase like "run a marathon in 4 hours." Not only does the search engine parse the semantics of your input using its proprietary algorithms, it also learns from you, incrementally enhancing your results upon every visit.

As an example, Anderson describes how a fictional user named Jennifer might search for information about a trip to Paris. Because Jennifer had previously shared other background information like the fact that she's an avid marathon runner, fluent in French, wants to learn to cook French food, and hates cruises, Dorthy.com will retrieve specific information related to those interests. The results would be filtered to highlight info on cooking schools, shopping, and popular running routes in Paris - things that would be interesting to Jennifer specifically.

This example doesn't even necessarily count as a "dream," it seems - you could plan an actual trip to Paris using Dorthy's technology, too. However, the overall point of the service is to provide you with information about a particular goal or aspiration and then connect you with others who feel the same.

Using Dorthy

When performing searches on Dorthy, you'll have the option to create your own page on a specific topic or view the topic pages others have already created. These pages feature popular articles, videos, photos, and blog entries from the web and are constantly being updated with new content. When you find content you like, even if it's on someone else's page, you can easily copy it over to a page of your own.

After this initial "discovery" process is complete, you can use Dorthy's "Connect" feature to meet others also interested in your topic so you can share your progress and encourage each other, much like how the above-mentioned 43Things operates.

In the future, Dorthy hopes to expand beyond its consumer-targeted Web-based service to one that could benefit the enterprise (think "I want to go to a virtualization conference in Las Vegas"), or so reported eWeek earlier this year. They also plan on moving to mobile at some point.

At the moment, Dorthy.com is in private alpha, but you can sign up to join here.

http://www.readwriteweb.com/archives/dorthy_a_semantic_search_engine_for_dreams.php Search Fri, 21 Aug 2009 08:08:03 -0800 Sarah Perez
How Does the Web Feel? Evri's New Sentiment API Tells You

Semantic search engine Evri can now understand how the web feels with the launch of its new sentiment web API. While busy scouring the net for people, places, and things and determining the relationships between them, the search engine is now able to understand the feelings associated with these entities, too, be they positive or negative. Using the API, developers can build applications for things like market intelligence, market research, sports and entertainment, brand management, product reviews and more.

Not Just Good or Bad, but Who, What, and Why, Too

At first we thought Evri's API would simply rank things as positive or negative, much like the Twitter tracker twendz does today, highlighting positive, negative, and neutral items. However, the sentiment API does so much more, allowing you deeper insight into the "who's," "what's," and "why's" associated with the particular expression or feeling.

To be more specific, according to the announcement, Evri lets you:

  • Find the percentage of positive and negative expressions of sentiment made by an entity, or about an entity. For example, find out what percentage of what is being written about the iPhone is positive and what percentage is negative.
  • Discover who is criticizing and who is praising a particular person, place or thing. For example, see who is criticizing and praising Microsoft right now.
  • Read what praisers and critics are saying about an entity. For example, see what the GOP are saying about the Democrats.
  • Discover who or what your favorite entity is bashing and why. For example, see who Lance Armstrong is complaining about.
  • Discover who or what your favorite entity is praising and why. For example, see who the World Health Organization is commending and why.
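
The first two items in the list above boil down to counting and filtering labelled expressions. Here is a minimal Python sketch of that arithmetic. It does not call Evri's API; the `(source, target, polarity)` tuples and the reviewer names are invented sample data, and the function names are our own.

```python
from collections import Counter

# Hypothetical sample data: who said something, about what, and how.
# None of these records come from Evri.
EXPRESSIONS = [
    ("reviewer_a", "iPhone", "positive"),
    ("reviewer_b", "iPhone", "negative"),
    ("reviewer_c", "iPhone", "positive"),
    ("reviewer_a", "Microsoft", "negative"),
]


def sentiment_share(expressions, target):
    """Return the percentage of each polarity expressed about `target`."""
    counts = Counter(p for _, t, p in expressions if t == target)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing has been said about this entity
    return {polarity: 100.0 * n / total for polarity, n in counts.items()}


def sources_by_polarity(expressions, target, polarity):
    """Return the sorted list of sources praising or criticizing `target`."""
    return sorted({s for s, t, p in expressions if t == target and p == polarity})


print(sentiment_share(EXPRESSIONS, "iPhone"))
print(sources_by_polarity(EXPRESSIONS, "iPhone", "positive"))
```

The hard part, of course, is producing the labelled tuples from raw text in the first place; the sketch assumes that step has already been done.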

When unleashed upon the web as a whole, this could unearth a veritable goldmine of information. Just thinking of how many different ways it could be used is enough to blow anyone's mind. Of course, marketers will be the first to jump on board, looking for practical ways to track the feelings about their companies, clients, and brands and why they're changing, but an engine that understands sentiment could do so much more than just this. It can literally take the pulse of the web the way we take the pulse of Twitter using apps like the above-mentioned twendz to rank trends as positive or negative.

Demo: The "Vibology Meter"

To demonstrate what Evri can do, the company created a widget called the "Vibology Meter." (Sadly, no link is provided.) The widget not only ranks the good or bad "vibes" about a particular entity (in the example, Barack Obama), but also explores topics associated with that entity and whether or not the primary entity feels positively or negatively towards them. For example, the widget shows Obama is negative towards the GOP and Rush Limbaugh but feels positive about Michelle Obama. (Well, that's good!)

When you click on any one of the associated topics (or click on "anything" to see all topics of either positive or negative slant), you're then presented with a sidebar of information. Here, snippets from articles found on the web display along with a title, link, and timestamp.

Of course, this is just a simple example of the Evri API in action. We're sure the developers out there can think up even better ideas than this.

Challenges Ahead

The challenge now for Evri is to keep expanding its index in order to track more sources to rank. At the moment, the engine doesn't track a large slice of the web the way a typical search engine like Google does - in fact they don't even claim to be a search engine...despite what that "Go to" box on their homepage would have you believe. Instead, Evri looks specifically at the people, places, and things on the web and maps the connections between them.

To determine these connections - and now, the associated sentiments as well - Evri pulls from a limited number of "highly regarded" sources. That means you'll definitely see a site like CNN used to rank a person like Obama, but the myriad of tiny politico blogs will be ignored. That's actually a shame, since delving into this "long tail" of the web could give a better overall picture of how all people really feel, not just the sentiments expressed on high-profile sites written by top bloggers and journalists. Still, we know indexing and parsing this long tail is something that's much easier said than done.

In the end, what Evri's doing, even on this smaller scale, is definitely interesting. We hope to see the new API put to good use in the near future.

http://www.readwriteweb.com/archives/how_does_the_web_feel_evri_tells_you.php Semantic Web Fri, 14 Aug 2009 07:37:20 -0800 Sarah Perez
Faviki's Social Bookmarking Tool Makes Semantic Tagging Even Easier

When we first looked at Faviki, a social bookmarking application which made its debut last year, we were intrigued by their idea of "semantic tagging." What makes Faviki different from its competitors, services like del.icio.us, Diigo, and the now-defunct Ma.gnolia, is the way the service suggests tags to its users. The suggestions don't come from the community of Faviki users and their tagging history - they come from structured info extracted from the Wikipedia database.

Today, Faviki is releasing an upgrade to their service which will give you even better control over the tagging process, making bookmarking even easier than before. They're also announcing support for OpenID.

A Better Tagging Interface

The biggest upgrade today is Faviki's enhanced tagging interface. In the past, Faviki struggled with some of the tag suggestions pulled out of Wikipedia because they were too long and too hard to enter for practical use. Plus, users wanted to use tags of their own creation, not the tag suggestions.

For example, if someone is tagging an article about the soccer player "Filippo Inzaghi," they may want to tag it by the player's nickname "Pippo." Before, this was not possible. But now, if Faviki doesn't understand a tag, it will pull in possible matches and ask you "What exactly do you mean by ______?" After you pick your selection, Faviki will remember your choice.

This is an important change for the service because it means users can tag web pages any which way they want, but they're still linked to the structured data on the back-end. That way, when someone searches through Faviki's community tags, all the web pages for that particular item or concept will appear, even if people tagged them using their own personal keywords.
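
The behavior described here, personal tags that still point at one shared concept, can be sketched in a few lines of Python. This is our own illustration and not Faviki's implementation; the `TagResolver` class and its method names are hypothetical.

```python
class TagResolver:
    """Map free-form personal tags onto a fixed set of known concepts."""

    def __init__(self, concepts):
        self.concepts = set(concepts)
        self.aliases = {}  # lowercased personal tag -> concept name

    def resolve(self, tag):
        """Return the concept for `tag`, or None if the user must be asked."""
        if tag in self.concepts:
            return tag
        return self.aliases.get(tag.lower())

    def remember(self, tag, concept):
        """Store the user's answer to "What exactly do you mean by ...?"."""
        if concept not in self.concepts:
            raise ValueError(f"unknown concept: {concept}")
        self.aliases[tag.lower()] = concept


resolver = TagResolver(["Filippo Inzaghi"])
print(resolver.resolve("Pippo"))   # not known yet, so the user gets asked
resolver.remember("Pippo", "Filippo Inzaghi")
print(resolver.resolve("pippo"))   # now resolves to the shared concept
```

Because every alias resolves to the same concept name, a search on the concept can return pages that were bookmarked under any of the personal tags.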

Beyond Wikipedia

Another change in Faviki's service is the ability to define new tags. Prior to today, the service was limited to searching Wikipedia for tag suggestions, but now it has the whole web at its fingertips. If a tag is entered which doesn't match anything from Wikipedia, Faviki will search Google for relevant URLs and then ask if the links presented represent the same tag. As multiple users go through this process, Faviki learns what URLs best represent that concept and adds the new tags created by the users to its database.

API, OpenID, and More

Faviki has also just launched a Save/Edit API that provides a way to save and edit bookmarks from other applications. In addition, they've introduced support for OpenID. Other new features arriving today include a smarter autocomplete list, the ability to convert tags, spam control, the ability to export/backup your bookmarks, and a new tag description tooltip.

The only issue we have with Faviki is the same one we had before: there's still no import function available. That means you'll have to leave your extensive bookmark collection behind if you want to use this service. We suppose that it could be difficult to properly tag and match all of our old bookmarks, but without this feature, Faviki doesn't have the best shot at attracting the heaviest users of social bookmarking services.

http://www.readwriteweb.com/archives/favikis_social_bookmarking_tool_makes_semantic_tagging_easier.php Product Reviews Thu, 02 Jul 2009 06:04:01 -0800 Sarah Perez
Everything You Wanted to Know About Semantic Technology, But Were Afraid to Ask (at SemTech 09)

Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products. This one is by Hakia, one of the participants in the recent 2009 Semantic Technology Conference.

Participants in the 2009 Semantic Technology Conference walked away considering fundamental questions about what is and isn't semantic technology. The relevance of this post's title will hopefully become clear by the end to those of you mischievous readers who may have stumbled upon it with other ideas. The conference was a great and well-organized affair in San Jose, California. One of the highlights was the Semantic Search Keynote panel, with all of the major players on stage (Ask, Bing, Google, Hakia, TrueKnowledge, and Yahoo!), as seen in the picture below.


Bear in mind that semantic technology can be as heavy and stifling for any audience as stem-cell research can be to high-school students. But Carla Thompson of Guidewire did a terrific job of coming up with discussion topics and moderating the panel. Everyone survived the ordeal without any sign of dozing.

Despite the positive outcome, some responses from the panelists made me wonder if we should go back to the basic question of, "What is semantic search?" Or, better yet, what isn't semantic search? Here is my list:

Structured Data

Folks, semantic technology is not structured data. A database that can, given the query "social drinking," pull up a list of beer brands, their manufacturers, and their contact information has nothing to do with semantics. Some people seem to have the impression that a search engine somehow uses semantic technology if it retrieves structured data for its results. It is a trick as old as the ancient Egyptians who used beads to organize harvesting information. Organized information is not semantic information.

Morphology

If a search engine is robust and returns the same results for the query "top ten" as it does for "top 10" (i.e. it recognizes that "ten" means "10"), calling the search engine semantic would be a stretch. Anyone could come up with a substitution list like this without a drop of linguistic knowledge. Similarly, distinguishing the name "Fisher" from the noun "fisher" by detecting the capitalization of the first letter does not go beyond the application of simple linguistic rules. These capabilities are not semantic search capabilities.
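
To show how little is involved, here is a minimal Python sketch of such a substitution list. The word table and the function name are our own and are not drawn from any particular search engine.

```python
# A plain lookup table: no linguistic knowledge involved.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}


def normalize_query(query):
    """Lowercase the query and swap number words for digits."""
    tokens = query.lower().split()
    return " ".join(NUMBER_WORDS.get(token, token) for token in tokens)


print(normalize_query("top ten"))
print(normalize_query("Top 10"))
```

Both queries normalize to the same string, so they return the same results, yet nothing in the code understands what "ten" means.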

Syntax

A certain amount of semantic information can be salvaged from syntax. Unfortunately, if syntax were enough for us to detect the meaning of text, then an 8-year-old with perfect reading ability (i.e. who is able to syntactically parse strings of English-language letters) could be expected to understand the meaning of Shakespeare's works. The difference between reading and understanding is the difference between syntax and semantics. The former requires the skill to parse things out, while the latter requires a vast amount of associative knowledge.

Statistics

An infinite number of monkeys typing on an infinite number of keyboards would eventually come up with the complete text of the Declaration of Independence. This is a scientific statement; it is not a joke. However, if a search engine is expected to be semantically relevant using statistical algorithms, one would have to wait until the monkeys finished their job. Statistics have no place in semantic technology. A simple test would reveal that. For example, your brain is able to understand a unique sequence of words that you have never seen before, such as "Polar bears don't eat alligator eggs before dawn." If semantics were built on statistics, computers and algorithms would not understand this and billions of other sentences.

Scalability

Scalability is the narrow bridge between science and technology. What you can carry from science to technology over this bridge determines the level of capabilities in the real world. The science of semantics is huge and stems from the roots of philosophy. But Web search is a very particular problem with stringent constraints (a narrow bridge). Designing semantic algorithms to drive a Web search engine is like walking on egg shells and requires a completely new approach. Thus, a semantic search algorithm could be very sophisticated but still not suitable for the Web.

These five areas cover what isn't semantic search and should help readers understand the questions that emerged from the Semantic Technology Conference. Structured data, morphology, syntax, statistics, and scalability are key areas to discuss moving forward. Of course, contrary to the title of this post, no one was actually afraid of asking these questions. But if you caught the reference in the title, that was your semantic brain in action, one last example of what semantic technology is.

http://www.readwriteweb.com/archives/everything_to_know_about_semantic_technology_at_semtech_09.php Sponsors Fri, 26 Jun 2009 05:00:18 -0800 RWW Sponsor