Semantic Web - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/Semantic Web en Copyright 2009 Richard MacManus readwriteweb@gmail.com Mon, 23 Nov 2009 16:43:23 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss

The Future Is All About Context: The Pragmatic Web

The semantic Web has long been heralded as the future of the Web. Proponents have said that Web experiences will some day become more meaningful and relevant based on the AI-esque computational power of natural-language processing (NLP) and structured data that machines can understand and interpret.

However, with the rise of the social Web, we see that what truly makes our online experiences meaningful is not necessarily the Web's ability to approximate human language or to return search results with syntactical exactness. The value of the semantic Web will take time to materialize because the intelligent personal agents that are able to process this structured data still have a long way to go before becoming fully actualized.

This guest post was written by Alisa Leonard-Hansen.

Rather, meaningful and relevant experiences now are born out of the context of our identities and social graph: the pragmatics, or contextual meaning, of our online identities. My Web experience becomes more meaningful and relevant to me when it is layered with contextual social data based on my identity. This is the pragmatic Web.

We need to better understand our identity as it begins to define our experience of the Web and the network-enabled world we inhabit. Our online identity will increasingly be defined by three "pillars": who I say I am, what I do and say, and who I connect to (and who connects to me).

To clarify, our online identities consist primarily of three specific kinds of data:

  • Explicit or prescriptive data (i.e. the data that I input about myself: name, age, occupation, etc.);
  • Activity or behavioral data (i.e. what I do and say online);
  • Relationship data (i.e. my social graph and what my connections say about me).
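
The three kinds of data above can be pictured as one record per person. What follows is only a minimal sketch in Python: the class name `OnlineIdentity`, its field names, and the sample values are all made up for illustration and do not come from any real service.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineIdentity:
    # Explicit data: what the user states about themselves.
    explicit: dict = field(default_factory=dict)
    # Activity data: what the user does and says online.
    activity: list = field(default_factory=list)
    # Relationship data: the user's social graph.
    connections: set = field(default_factory=set)


# Made-up sample values, purely illustrative.
me = OnlineIdentity(explicit={"name": "Alice", "occupation": "editor"})
me.activity.append("commented on a post about data portability")
me.connections.add("bob")
```

Keeping the three kinds of data in separate fields makes it easy to see which of them a given site or service is actually drawing on.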

If we consider the power of this pragmatic Web (a highly relevant and individualized Web experience based on the ubiquity of our identity data), we find that it not only impacts individual user experience, but that it opens up entirely new opportunities for business online. The future is not "business as usual." Business models will be based on what Elias Bizannes of the Data Portability Project calls the "information value network-economic value," derived from services that focus on activities with comparative advantage and that leverage free access to data.

Consider this: as media companies scramble to identify new and innovative ways to advertise to the sea of nameless, pixeled users who graze through their content each day, a rich supply of highly valuable identity data lies just beneath the surface, left unmeasured and unmonetized.

Facebook is, at its core, perhaps the largest single database of this kind of online identity data: explicit, activity and relationship data. With the development of Facebook Connect, which allows for the "open" exchange of Facebook user data between Facebook and third parties, Facebook could conceivably (and will) create a Facebook Connect ad network (read: data exchange), supplied by the valuable and highly targetable user identity data that is currently siloed on Facebook's servers. This identity data within Facebook is what makes the activity in "social media" so valuable.

But the centralization of identity data on a few major networks (such as Facebook, Twitter and MySpace) won't realize the vision of the pragmatic Web. So, how will the pragmatic Web come to be? How do we realize the power of a dynamic Web that is based on our identities? We do so by empowering individuals to access and control their identity across any site or service, through standards that enable data portability and open Web inter-operability. The resulting vision is that of a highly personalized, dynamic, relevant and remixable Web experience, yielding greater access to information through discovery, communication and collaboration. For enterprise, this could mean the rise of innovative new business models, based on data-driven value exchange.

One final note on identity data as it relates to enterprise. As Bizannes points out, the value of this kind of identity data rests on the key factors of time and timeliness. Essentially, identity data is valuable only if it is recent. Facebook wouldn't be able to sell your (permissions-enabled) data to advertisers if it used your explicit data from a year ago rather than from today. So, Bizannes argues that real-time "access" to someone's identity matters most, and it's no longer about data "capture." Thus, as new business models arise out of monetizing permissions-enabled identity data, the value of the business models will depend on these entities having real-time access to the data.

Guest author: Alisa Leonard-Hansen is a digital strategist and Social Media Evangelist at iCrossing, a leading global digital agency. She is also the Communications Chair for the Data Portability Project and blogs about the social Web on her blog, TheWebisSocial.com. Follow her on Twitter @alisamleo.

http://www.readwriteweb.com/archives/future_all_about_context_the_pragmatic_web.php Analysis Fri, 20 Nov 2009 11:14:15 -0800 Guest Author
A New Commercial Ontology from Hakia

Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products.

We at Hakia are proud to announce our upcoming commercial ontology, perhaps the world's first. What is a commercial ontology? If you're asking this question you have just touched on an important distinction: fantasy versus reality. In the context of the Web, a commercial ontology is a realistic version of an ontology, as we explain below.

Realities of the Web

Hakia has accomplished two important innovations in building its commercial ontology (CO). The first is the development of concepts and lexicons that follow strict guidelines on the realities of Web operations. What are these realities? Most search queries on the Web reflect a single dimension of intent, almost exclusively relevant to commercial topics. "Commercial topics" here must be taken in the broadest sense possible. For example, if you were looking for "the benefits of foot massage" or "the director of the movie Last Emperor," your queries would fall into a commercial pattern. One distinguishing feature of commercial-pattern queries is that they come in short packages, including a name (onomasticon) or referring to something sold, bought, watched, heard, etc.

In contrast, many (if not all) ontologies that have been built to date (or claimed to exist) are focused on the use of language in the general sense, but not in the sense of commercial patterns on the Web. Therefore, their usefulness when tackling Web search queries is greatly compromised, sometimes to the point of absolute failure. If such an ontology could disambiguate a dozen different senses of the word "kill," it would be sad news if the last 100,000 queries in the search logs did not include a single occurrence of the word "kill." Like drowning in two-inch-deep water, such ontologies do not use their disambiguation capacities for nearly 80% of queries because the queries include nothing but onomasticons or are too short (under-articulated).

The Sequence Approach

The second innovation in the CO is the use of sequences instead of single words. A single word, like "kill," is the most ambiguous state of information and is hardly used in human communication without a strong implied context. As a result, building natural-language processing (NLP) systems by taking individual words as units of computation is an invitation to disaster.

In contrast, word sequences (two or more words) are inherently safe and highly descriptive. Take "road kill," for example. This sequence describes the corpse of an animal killed on the road by a passing vehicle. If a language processing system takes the sequence of words as a unit of computation, 99% of the ambiguity problem vanishes. There is no need to process the words "kill" and "road" separately, trace their senses, and locate convergence to identify the meaning of "road kill" if you can just take the sequence "road kill" itself as your unit of computation for mapping. This is depicted below:

Note the number of traces required in a conventional ontology approach compared to the sequence approach. The sequence approach requires a lot of data storage space (which is dirt cheap), whereas the conventional ontology approach requires a lot of CPU for a simple mapping task (which is expensive). But the bad news does not stop there. The trace routes in conventional ontology require manual work (impossible to automate), whereas sequence-based ontology can be easily built via automation.
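
To make the sequence approach concrete, here is a minimal sketch in Python. It is not Hakia's implementation: the table `SEQUENCES`, the function `map_sequences`, and the concept labels are hypothetical, and greedy longest-match lookup is just one simple way to treat a word sequence as the unit of computation.

```python
# Hypothetical sequence table: multi-word sequences map straight to a concept.
SEQUENCES = {
    ("road", "kill"): "animal killed by a vehicle",
    ("beat", "generation"): "literature",
}


def map_sequences(query, table=SEQUENCES, max_len=3):
    """Greedily match the longest known word sequence at each position."""
    words = query.lower().split()
    concepts, i = [], 0
    while i < len(words):
        # Try the longest candidate first, down to two-word sequences.
        for n in range(min(max_len, len(words) - i), 1, -1):
            key = tuple(words[i:i + n])
            if key in table:
                concepts.append(table[key])
                i += n
                break
        else:
            i += 1  # no known sequence starts here; skip this word
    return concepts
```

Each lookup is a single dictionary access, which reflects the trade-off described above: more storage for the table, very little computation per query. A lone word such as "kill" is deliberately left unmapped in this sketch.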

Perhaps not everyone will understand the second point above. Nevertheless, the scalability and performance of the end product will speak for themselves when Hakia puts the testing platform online.

Usage of the Commercial Ontology

The immediate use of the CO is for search queries, or document characterizations, not tied to any advertising in conventional systems. This unrecognized domain of search queries and characterizations means loss of revenue. Hakia's CO is designed to fill in this gap. For example, if the search query or page characterization is "beat generation," the CO can map it to "literature" on the fly. As a result, systems using the CO will have a much deeper understanding of the incoming terms, and thus will be able to recognize the underlying intent beyond the face value of the words. The same capability can be used in a number of places other than advertising with the same effect.

Stay tuned for the release of the first version of Hakia's commercial ontology.

http://www.readwriteweb.com/archives/new_commercial_ontology_from_hakia.php Sponsors Thu, 30 Jul 2009 05:00:17 -0800 RWW Sponsor
Glue API Links Data from Popular Social & Semantic Sites

Glue, a browser-based social network that appears on sites such as Amazon, Last.fm, Netflix, Yahoo! Finance, Wine.com, and Citysearch, today announced its public API for third-party developers.

Glue joins a family of available semantic APIs with a mix of unique semantic and social API features. The API is currently demoed in three apps: Glue Stream, Glue Quilt, and Glue Spider.

Glue Stream is "Glue Live" and shows what is happening on Glue in real time. Glue Quilt shows trending topics over the past 7 days. Glue Spider shows connections between people and sites around the web.

Glue's API is part of an expanding group of semantic web APIs such as Reuters' OpenCalais, Dapper, Freebase, Evri, and Zemanta, all of which help develop links and relationships between terms and concepts.

According to the company's website, the Glue API complements the existing semantic web family in two ways. "First, it brings the exciting social dimension to the equation, revealing how people connect around things and concepts instead of pages. Secondly, Glue enables developers to get meta data and related links for books, music, movies, video games, topics, stocks, stars, artists, wine from hundreds of popular sites, turning these sites into databases."

Currently the API is free and limited to 5,000 calls per day. Developers who need a higher limit should email support@getglue.com.
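
A client that wants to stay under a per-day cap like this could keep its own count. The sketch below is an assumption-laden illustration in Python: the class `DailyQuota` is hypothetical, and how Glue itself counts calls or when its day resets is not stated in the announcement.

```python
import datetime


class DailyQuota:
    """Client-side counter for a per-day call limit (5,000 in the post)."""

    def __init__(self, limit=5000):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_acquire(self, today=None):
        """Return True if another call fits in today's budget."""
        today = today or datetime.date.today()
        if today != self.day:  # first call of a new day resets the count
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Calling `try_acquire()` before each request lets an app back off on its own instead of finding out about the limit from a failed call.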

http://www.readwriteweb.com/archives/glue_api_links_data_from_popular_social_semantic_s.php Semantic Web Tue, 16 Jun 2009 15:00:00 -0800 Jolie O'Dell
Baratunde Thurston on Content Curation, Real-Time Search, and "Analytics Porn"

In New York City, on the 16th floor of the Roger Smith Hotel, we caught up with social media superhero Baratunde Thurston, web editor for The Onion.

Thurston started getting into this whole "Internet" thing in simpler times when the social web was called Usenet. He now carves out his niche at the overlap of the Venn diagram of comedy, politics, and tech. As an official Internet old-timer who makes it his business to stay relevant, Thurston has particularly useful insights on the business of curating applicable content with great efficiency and timeliness.

"I remember," Thurston said, "back in 1996 or 1997, when you could finish the Internet... You could stay up until two or three in the morning and go to sleep and know, 'I read the Internet today.'" Simpler times, indeed.

So, with the mind-boggling multiplicity of blogs, news sites, and social networks, how does a professional netizen maintain cultural and technological relevance? And what tools does the modern, socially cognizant webmaster use to track and optimize traffic in real time? Call us cruel, but we prefer you watch the video and hear it all firsthand.

http://www.readwriteweb.com/archives/baratunde_thurston_on_parsing_content_real-time_se.php People in Tech Mon, 18 May 2009 21:13:25 -0800 Jolie O'Dell
Understanding the New Web Era: Web 3.0, Linked Data, Semantic Web

I've been following a fascinating 3-part series of posts this week by Greg Boutin, founder of Growthroute Ventures. The series aimed to tie together 3 big trends, all based around structured data: 1) the still nascent "Web 3.0" concept, 2) the relatively new kid on the structured Web block, Linked Data, and 3) the long-running saga that is the Semantic Web. Greg's series is probably the best explanation I've read all year about the way these trends are converging. In this post I'll highlight some of Greg's thoughts and add some of my own.

Web 3.0: What Comes After 2.0 (!)

Part 1 of Boutin's series was about how Web 3.0 will not solve the vexing issue of Information Overload, at least not yet, because there is so much groundwork to lay first. Specifically, there is a lot of unstructured data on the Web right now; and it'll take a lot more sorting out before it gets to be structured.

Last year Boutin loosely defined web 3.0 as "the Web of Openness. A web that breaks the old siloes, links everyone everything everywhere, and makes the whole thing potentially smarter."

There is a lot of debate about what Web 3.0 is and the term itself is open to derision. In my view Web 3.0 is an unoriginal name for the next evolution of the Web. What's important to note though, is that there is a difference in the products we're seeing in 2009 compared to the ones we saw at the height of 'Web 2.0' (2005-08). If Web 2.0 was about user generated content and social applications such as YouTube and Wikipedia, then Web 3.0 is about open and more structured data - which essentially makes the Web more 'intelligent'.

The smarter the data, the more things we can do with it. The current trends we're seeing today - filtering content, real-time data, personalization - are evidence that 'Web 3.0' is upon us, if not yet well defined. We actually saw a great example of Web 3.0 this week, with Google's release of Search Options and Rich Snippets. Those features added real-time search, structured data, and more to Google's core search.

Linked Data: Structured Data, But Not Necessarily Semantic

In Part 2 of his series, Greg Boutin tackled Linked Data. He explained that "Linked Data offers a new medium to link structured data that is then more machine-readable." However, he added that Linked Data "does not by itself add any semantic meaning to the information, but it better carries that semantic information once you have it. So, while Linked Data is not semantic, creating links at the data level paves the way to a true Semantic Web."
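
A rough way to picture "links at the data level" is two data sets of subject-predicate-object triples, joined by a link saying that two URIs name the same thing. The Python sketch below is illustrative only: the `example.org` URIs, the data, and the `describe` helper are made up, and it follows `owl:sameAs` links one hop rather than doing full RDF processing.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical data sets, each a set of (subject, predicate, object) triples.
dataset_a = {
    ("http://example.org/a/Berlin", "http://example.org/vocab/label", "Berlin"),
    ("http://example.org/a/Berlin", SAME_AS, "http://example.org/b/city/berlin"),
}
dataset_b = {
    ("http://example.org/b/city/berlin", "http://example.org/vocab/country", "Germany"),
}


def describe(uri, *datasets):
    """Collect facts about uri, following sameAs links one hop across data sets."""
    triples = set().union(*datasets)
    aliases = {uri} | {o for s, p, o in triples if s == uri and p == SAME_AS}
    return {(p, o) for s, p, o in triples if s in aliases and p != SAME_AS}


facts = describe("http://example.org/a/Berlin", dataset_a, dataset_b)
```

The link lets a program gather facts from both data sets, but nothing here tells it what "country" means. That is Boutin's point: the links carry semantic information once you have it without adding any themselves.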

Alexander Korth wrote a guest post on ReadWriteWeb recently that explains Linked Data more. It is a concept that comes from the W3C, which has a Linking Open Data (LOD) project. The image below illustrates participating data sets as of March 2009. Some of the more well-known commercial data sets are Thomson Reuters' Open Calais project, Freebase, and DBpedia. As Alexander explained, the data sets are set up to re-use existing ontologies such as WordNet, FOAF, and SKOS and interconnect them.

According to Greg Boutin in Part 3 of his series, the Linked Data format "does not create smart data, it only enables it." He suggests that "technologies to turn unstructured data into structured data is really where we ought to invest, and focus our efforts." Another piece of advice he gives is that entrepreneurs would do well to "consider mashing up Linked Data with other technologies."

Semantic Web: Google Will Play a Big Role

So where does all of this leave the Semantic Web, that great white whale of the Internet? Boutin referenced a ReadWriteWeb post from October '08, that asked Where Are All The RDF-based Semantic Web Apps? And that is the crux of the problem with the Semantic Web. While Tim Berners-Lee claims that the Semantic Web is open for business, the reality is that there are precious few real-world apps that use RDF currently.

However, RDFa, which enables web publishers to embed RDF into HTML, gives some hope. Google announced this week that it will support RDFa in its "rich snippets", following on from Yahoo's brave SearchMonkey launch last year (which did a similar thing).
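
For readers who haven't seen RDFa, the idea is that ordinary HTML elements carry extra attributes such as `typeof` and `property`. The Python sketch below pulls those values out with the standard library's `html.parser`. It is a deliberate simplification and not a conforming RDFa processor: the `ex:` vocabulary and the sample markup are invented, and nesting, `content` attributes, and prefix resolution are all ignored.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect text inside elements that carry an RDFa `property` attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop

    def handle_data(self, data):
        # Store the first non-empty text that follows a property-bearing tag.
        if self._current and data.strip():
            self.properties[self._current] = data.strip()
            self._current = None


# Invented markup using a made-up "ex:" vocabulary prefix.
html = ('<div typeof="ex:Review">'
        '<span property="ex:itemreviewed">Some Restaurant</span> rated '
        '<span property="ex:rating">4</span></div>')
parser = PropertyExtractor()
parser.feed(html)
parser.close()
```

After parsing, `parser.properties` holds the reviewed item and its rating as plain key-value data, which is the kind of structure a search engine can turn into a richer result snippet.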

Google is going to play a big role in making the Semantic Web mainstream. We noticed here on ReadWriteWeb in January that Google had begun to expose semantic data in search results. We noted that Google appeared to be parsing the semantic structure from semi-structured or unstructured data. An anonymous commenter on Boutin's third post claimed that he'd published a similar finding 6 months prior to us - he said that "Google's algorithm [is] a lot more sophisticated than just statistical methodology and that it was definitely already developing semtech knowhow and capabilities [in mid-08]."

Google isn't the only big company doing this either. We've already mentioned Yahoo, but Microsoft paid over $100 million to try and do the same thing last summer when it acquired Powerset.

Conclusion

Web 3.0 is an amorphous term, and possibly one that people shouldn't even attempt to use. Nevertheless, it's clear to us that the time for structured data has come. We're beginning to see it in the current wave of Linked Data sets being released, and in the support that big companies, like Google and Yahoo, are showing for structured data. Who knows, maybe the Semantic Web is nearly upon us too.

http://www.readwriteweb.com/archives/understanding_the_new_web_era_web_30_linked_data_s.php Analysis Thu, 14 May 2009 05:15:00 -0800 Richard MacManus
Talis Takes on Amazon With Pot of Structured Data in the Sky

By making available databases of human genomic data, US census records and other data of public interest, the Amazon Public Data Sets are an incredible resource. They're like a 21st century Public Library for robots to patronize. In this emerging era of flourishing data-centric applications, though, the state of the art never stands still.

Forty-year-old British technology platform Talis announced this week that it now offers free, perpetual storage and keyless API access to semantically marked-up large data sets. The offering is called the Talis Connected Commons and it's the kind of thing that anyone with a geekish imagination can get excited about.

The Setting

If the current web economy is being rocked by easy publishing systems that make the people formerly known as "consumers" capable of publishing and socializing around content of their own creation - then the next step of internet evolution may come in the form of automated systems able to process meaning and patterns out of large amounts of user-created and other information. When structured, free and available programmatically in bulk - that data is like a big pot of gold for developers.

While Amazon offers free access to data sets, transport of the data is still paid for by users. The Talis Connected Commons also offers an API by default (a SPARQL endpoint, in particular) and is focused specifically on semantic data. The system is made for public sharing - two variations of Creative Commons licenses are supported for the data stored there. Talis is requesting that data set owners email a short description of their content to the company for approval and inclusion on the site.
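
For developers new to SPARQL, querying such an endpoint usually comes down to sending the query text as a `query` parameter over HTTP. Here is a minimal sketch in Python that only builds the request URL. The endpoint address is a placeholder, since the post does not give the real one, and `build_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder endpoint URL; the post does not give the real address.
ENDPOINT = "http://example.org/sparql"

# A generic query: return any ten triples in the store.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
```

Fetching `url` with any HTTP client would then return the matching triples, typically as XML or JSON depending on what the endpoint supports.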

In other words, there's no gold in the pot yet. Talis is more than well established and this offering is aimed at such a sweet spot that the only way the Connected Commons won't be filled with good data is if the company totally drops the ball. We don't expect that to happen.

The Plot

This project is in the same vein as Nova Spivack's forthcoming ontology authoring and hosting service, the vision of open source microblogging as the future of business intelligence and more.

There's a chain of events that news like this helps fill out. First, massive bodies of data are created or gathered: books are scanned, census data is collected, and patients donate their anonymous aggregate medical data to science. Next, the data is semantically analyzed and marked up (through any number of different semantic processing engines). Then, the data is stored and an API is made available (this is where the Talis Connected Commons comes in). Finally, developers build applications that leverage the smart data offered up through the platform; data visualizers find new stories to tell in images built from the marked-up data; and new relationships between people, organizations and concepts have the mist cleared away from them through systematic analysis of various permutations of previously unavailable structured data.

Amazon Public Datasets include things like human genomic data, US census data, and data parsed from Wikipedia. What will the Talis Connected Commons provide a home and API for? We look forward to finding out.

http://www.readwriteweb.com/archives/talis_takes_on_amazon_with_pot_of_structured_data.php Amazon Tue, 31 Mar 2009 14:02:00 -0800 Marshall Kirkpatrick
Look Out TinyURL; Bit.ly Gets Hot Silicon Valley Cash

Link shortening services are so common you can't throw a stone online without hitting one, but TinyURL is the undisputed champ. It's one of the oldest, its name says what it does and, despite repeated outages, its downtime is small enough that millions of people keep using it.

TinyURL has also allowed incomprehensible amounts of value, both in terms of technology and in terms of money, to sit on the table unclaimed. For years. Now a group of some of the web's hottest investors are betting a few million dollars that a smart TinyURL competitor called Bit.ly can take advantage of being the conduit through which millions of people visit sites of interest to them.

Today Bit.ly announced that it has raised about $2 million in its first round of funding. The round was led by Tim O'Reilly's venture fund and included money from Mitch Kapor (the founder of Lotus), Jeff Clavier, Ron Conway (early Google investor), the Accelerator Group and Howard Lindzon's new fund Social Leverage. Those names are some of the hottest in the startup scene, and all the companies in their various portfolios will now have a close business connection to Bit.ly.

We reviewed Bit.ly when the project launched last July and urged readers to use this service to shorten their long links instead of other services like TinyURL. Why do we care what service people use? Because we're fans of innovation and Bit.ly is aiming to be a platform for innovation like TinyURL should have been. If web 2.0 is about democratizing publishing, the next step is machine leveraging all the resulting data.

The Bit.ly Magic

What does Bit.ly do that's so special? They use all the data they see and make it available to third party developers who want to build on top of it. They keep track of the clickthrough numbers and can tell you what the hottest links on the web are at any time. See this @bitlynow Twitter account for one display of that information. Bit.ly says it resolved 20 million distinct URLs last week. That's the beginning of a really large database.
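
The core of this kind of tracking is easy to picture. The Python sketch below is not Bit.ly's code or API: the short codes are invented and `record_click` is a hypothetical helper. It simply counts clickthroughs per shortened link and asks for the current leaders.

```python
from collections import Counter

# Clickthrough counts keyed by short code (all codes here are invented).
clicks = Counter()


def record_click(short_code):
    """Count one clickthrough for a shortened link."""
    clicks[short_code] += 1


for code in ["abc", "xyz", "abc", "abc", "q1", "xyz"]:
    record_click(code)

# The two most-clicked links so far, with their counts.
hottest = clicks.most_common(2)
```

The hard part at Bit.ly's scale is doing this for millions of links in real time and filtering out unreliable reporting sources, not the counting itself.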

Bit.ly also uses Reuters Calais to extract semantic terms out of the pages that its shortened links point to. That's valuable information. Want to see the most popular web pages that talk about Dancing With The Stars, or the Federal Stimulus Package, or some other topic, in the last 30 minutes? Somebody wants to, you'd better believe, and that's the kind of real-time information that the Bit.ly API aims to make available. (Disclosure: Calais is an RWW sponsor.)

We've had some concerns about the clickthrough numbers that Bit.ly has reported but the company says they are going through a list of reporting sources that give them problems and eliminating them one at a time. The company says it is now reporting real-time traffic stats that are within 10% of what Google Analytics reports much later. We've been watching the numbers improve in accuracy when it comes to our numbers and can confirm that they are getting much better.

A number of people have looked at today's news and thought it was ridiculous that a link shortening business could raise $2 million in funding. We don't think it's ridiculous at all. Show us a service that can report in real time how many people are visiting millions of pages around the web and what those pages are about, that exposes that data in an API, and we'll show you a platform we're very excited to see work.

http://www.readwriteweb.com/archives/look_out_tinyurl_bitly_gets_hot_silicon_valley_h.php News Mon, 30 Mar 2009 11:42:44 -0800 Marshall Kirkpatrick
Twine Could Soon Surpass Delicious, Prepares Ontology Authoring Tool

Nova Spivack's semantic web company Twine is developing a free service to write and host semantic ontologies: the classification trees that enable machines to put concepts in topical context. Ready to play Aristotle and create an ontology of cheese, model airplanes, global anti-hunger organizations or any other topic?

What blogging was to publishing, a simple tool that made far more people able to participate, Twine's new ontology writing and hosting service could be to the act of teaching machines about new topics.

The company wouldn't let us publish the new service's name but says it is aiming for a launch date this year, as soon as a go-to-market strategy and appropriate partnerships are lined up. The ontologies created won't only work on Twine; they will be referenceable by semantic apps anywhere around the web.

Twine Could Surpass Delicious in a Matter of Months

Twine's public product lets people bookmark items like web pages and videos into topical collections. The service then analyzes the contents of all the bookmarks to identify the key concepts, people, places and other information automatically. It's like tagging in Delicious but automated and, in theory, more thorough than any human being would be in assigning tags.

Compete.com says Delicious gets about 2 million unique visitors a month and has stopped growing. Twine just passed 1 million uniques and is growing fast. Spivack said that 40% of that traffic comes from Google, and sure enough those Twine pages look awfully juicy from a spider's perspective. Spivack expects Twine to hit 2 million uniques in a matter of months and that looks like a credible claim to us.

The number of saved items is far greater in Delicious than in Twine - about 150 million vs. 3 million. Spivack says though that the company will soon turn back on its system that crawls all the links on bookmarked pages. Those linked-to pages will be automatically bookmarked and analyzed too, quickly expanding Twine's total archives.

So by this summer, Twine could be bigger and more visited than Delicious. We wrote a scathing review of the Twine user experience when the long-awaited service began to launch last year. The site has changed a lot since then and we're excited about the company's plans for the future. We are still concerned about the company's ability to make its interfaces really usable -- but if they can, then look out, internet.

Twine and the Semantic Web

The semantic web is a paradigm that adds standardized, structured markup to web content so that savvy applications can comprehend the key topics of any web page. Publishers can do that when they publish, or services like Twine can create the semantic markup from the outside. The automatic tagging Twine does is actually semantic markup.

For example, you can't ask Google today to show you all the book reviews around the web that were written by friends of yours who live in New York - but semantic search engines could make such a query trivial and use that information as the ground level for building more sophisticated features on top. It's a form of standardized metadata. It turns free text into data that can be mashed up.
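
Once that metadata exists, the query really is simple. The Python sketch below runs it over a handful of invented triples. The people, the reviews, and the `objects` and `subjects` helpers are all hypothetical, and a real semantic search engine would use a query language such as SPARQL over far more data.

```python
# Hypothetical triples a semantic crawler might have collected.
triples = [
    ("me", "knows", "ana"),
    ("me", "knows", "raj"),
    ("ana", "livesIn", "New York"),
    ("raj", "livesIn", "Boston"),
    ("review1", "type", "BookReview"),
    ("review1", "author", "ana"),
    ("review2", "type", "BookReview"),
    ("review2", "author", "raj"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known triple."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Friends of mine who live in New York.
friends_in_ny = objects("me", "knows") & subjects("livesIn", "New York")

# Book reviews written by one of those friends.
reviews = {r for r in subjects("type", "BookReview")
           if objects(r, "author") & friends_in_ny}
```

The whole query is two set intersections, which is why such questions become easy once free text has been turned into data.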

Semantics Plus Ontology Equals Meaning

Spivack says that his existing product, Twine, is just one of a number of applications that only extract key concepts (people, places, key terms) out of a web page. Placing those concepts in context is the next step.

Twine can tell you that a web page is about goat cheese, for example, but it doesn't yet know how to infer that the page is also about a dairy product - the larger category that is not explicitly stated in the article. An ontology is that context, be it a dairy ontology, a cheese ontology or a new node in the existing accepted ontology of food.
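
Here is a minimal sketch, in Python, of the inference step described above. The tiny food ontology `BROADER` and the function `broader_categories` are made up for illustration and say nothing about how Twine's own inference engine works.

```python
# Hypothetical "is a kind of" links from a small food ontology.
BROADER = {
    "goat cheese": "cheese",
    "cheese": "dairy product",
    "dairy product": "food",
}


def broader_categories(concept, ontology=BROADER):
    """Walk up the ontology and list every broader category."""
    found = []
    while concept in ontology:
        concept = ontology[concept]
        if concept in found:  # guard against accidental cycles
            break
        found.append(concept)
    return found
```

With this table, a page tagged "goat cheese" can also be found under "dairy product" and "food", even though neither term appears in the article.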

Those new ontologies can be created using Spivack's simple, open source authoring tool and then hosted on his open source community site for ontologies. Think open source authoring like WordPress, combined with code hosting and discussion like SourceForge.

Either Twine or a third party will then combine the extracted "entities" (people, places, key terms) with an appropriate ontology and that company's "inference engine" to build a full picture of what a web page is about and where it stands in relation to everything else.

Busting Out of the Tech Ghetto

The limited number of ontologies that have been authored to date are largely centered on technology topics. An easy ontology authoring tool could change that radically. A standardized, accessible ontology can shine a light on a whole new part of the world. Once that topic has been illuminated for the eyes of a semantics reading machine, web developers can build services that intelligently make use of the new information.

Spivack says that heavy-duty ontologies that require computationally intensive logic navigation will still need to be built using heavy-duty desktop apps. But web applications that just need data served up smartly will work well with the kinds of ontologies that can be written with Spivack's new authoring tool.

Ready for the whole, diverse internet to be contextually understandable by web applications? Ready to contribute to the creation of those contextual explanations yourself? Keep your eye on Nova Spivack because that's what he's aiming to make happen.

http://www.readwriteweb.com/archives/twine_could_soon_surpass_delicious_prepares_ontolo.php http://www.readwriteweb.com/archives/twine_could_soon_surpass_delicious_prepares_ontolo.php Authoring Tools Mon, 16 Mar 2009 16:48:12 -0800 Marshall Kirkpatrick
Happy 20th Birthday, World Wide Web On March 13th, 2009 the World Wide Web will turn 20 years old. Sir Tim Berners-Lee invented this world-changing layer on top of the Internet on this day in 1989. It's hard to overstate the impact this young technology has had already and it's even more exciting to think about where it's going in the future.

Berners-Lee has some great ideas about where the web should go next. His vision is of a major advance that could serve as the foundation for innovations that we can't even imagine today.

One year ago Berners-Lee said that all the pieces needed to build a new Semantic Web are now in place. Last month he gave an impassioned talk at the high-profile TED conference about a related concept called Linked Data, a set of ideas he outlined in 2006. The gist of the idea is that we need every institution that can do so to put raw data in a standardized format up on the web.

What's so exciting about raw data? We'll defer to Berners-Lee's 15 minute explanation at this year's TED conference. The video of his talk will be posted on the TED website early Friday morning, but ReadWriteWeb readers can check it out now.

Thank you Tim, for what you've done for the world already.

http://www.readwriteweb.com/archives/happy_20th_birthday_world_wide_web.php http://www.readwriteweb.com/archives/happy_20th_birthday_world_wide_web.php News Thu, 12 Mar 2009 23:50:17 -0800 Marshall Kirkpatrick
DEMO Trend: The Smarter Web (Part 2) Part Two of a Two-Part Series. Part one can be found here.

At this month's DEMO 09 conference, one of the most apparent trends was the emergence of several new intelligent web services. In this transitional period between Web 2.0 and Web 3.0 (or whatever it is that comes next), the tools of the future are just now being revealed. Although at first glance some of these services and applications may seem somewhat incomplete, in many cases they actually represent years' worth of work to have reached the point they're at now. These are no simple Web 2.0 applications; these are highly complex and intelligent tools of tomorrow's smarter web.

Yesterday, we examined a handful of services which represent this emerging class of intelligent services and today we'll look at a couple more.

A.I.-Powered Shopping (Gazaro)

Gazaro is a new service that lets you make what they call "personal sales fliers." Instead of sifting through the local paper to find the latest deals, you just tell Gazaro what sorts of products you're interested in. The service then scours the web for the best deals and presents its findings in a clean, easy-to-read interface. But Gazaro isn't simply a price comparison engine. It's a really smart one.

Gazaro knows that a "camera" is a "camera" or that an "LCD" is an "LCD." It isn't doing simple keyword matching; it really understands the difference. In other words, you'll never get results for a camera lens or camera accessories when you're searching for just a camera because Gazaro knows those are not the same things.

It can differentiate between items because it's powered by Artificial Intelligence (A.I.) on the back end. In this case, "A.I." is no buzzword - the company was incubated by Apption Software, which had developed A.I. technology for use in the enterprise. They realized that the same technology could deliver value in a consumer application as well, and from there came Gazaro.

When Gazaro goes out and crawls the internet, it compares the items it finds to the items it already knows in order to determine what exactly the new items are. If it encounters something it doesn't know, it makes an educated guess using its A.I. "brain." And the more it crawls, the more it learns.

After identifying what an item is, Gazaro then determines whether the item is actually a good deal. How good a deal it is gets expressed as the "Gazaro Deal Score." These deal ratings are based on Gazaro's knowledge of historical prices, how often an item goes on sale, what other retailers are selling it for now and what they've sold it for in the past. All that analysis is done using the A.I. technology in order to rate the deal on a scale from 1 to 10, with 10 being the best deal.
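Gazaro hasn't published how the score is computed, so the following is only a rough sketch of how a 1-to-10 rating could be derived from price history. The function name, the use of a median as the "typical" price and the five-points-per-step scaling are all assumptions, and the sketch ignores how often an item goes on sale.

```python
from statistics import median


def deal_score(current_price, historical_prices, competitor_prices):
    """Return an integer from 1 (poor) to 10 (best deal); purely illustrative."""
    # Assumption: the "typical" price is the median of past and rival prices.
    reference = median(list(historical_prices) + list(competitor_prices))
    if reference <= 0:
        raise ValueError("prices must be positive")
    # Fraction by which the current price undercuts the typical price.
    discount = (reference - current_price) / reference
    # Roughly one extra point per 5 percentage points of discount, capped at 10.
    score = 1 + round(discount * 20)
    return max(1, min(10, score))


print(deal_score(80, [100, 100], [100]))  # 20% below the typical price
```

A price at or above the typical level bottoms out at 1, while anything close to half off saturates at 10.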

To the consumer using the system, the complexity of what the A.I. is doing is all hidden behind the scenes. The end user only sees a simple interface where they can enter in the items they're shopping for and then find the best prices. Gazaro can also alert users to new sales and deals using email, RSS, or Twitter. At the moment, Gazaro is for consumer electronics shopping only, but in time the system could expand and learn more product categories.

Understanding Intentions (Primal Fusion)

Another company of interest is Primal Fusion whose new "thought networking" service is a semantic technology platform designed to help you research the subjects that interest you. Unfortunately, "thought networking" is a buzzword-sounding phrase that doesn't really convey what the system does. Primal Fusion essentially is an alternative to doing traditional web searches when you want to learn about a particular topic.

Once signed up for the Primal Fusion service, you enter in your topic in the search box provided and you'll see a tag cloud of words appear which are relevant to the word you initially searched on. You can either select those words by checking them or you can click on the individual words to further drill down into a more specific aspect of the original topic.

In the example they demonstrated today, a student researching climate change might see a tag cloud featuring words and phrases like "pollution," "co2," "greenhouse gases," etc. In addition, the service can also return relevant photos to your topic from sites like Flickr.

Initially, Primal Fusion searches Wikipedia to deliver the tag cloud, but once you have your specific interests checkmarked you can then change a drop-down box to search the web instead. This web searching is done courtesy of a Yahoo BOSS integration and it's here where Primal Fusion one-ups a normal search engine. Instead of just returning the top 5 or 10 results on the original keyword, it sifts through all the results found and returns only those relevant to your specific interests - even if those results would have been pages deep on a normal search query. Whatever Primal Fusion retrieves can then be extracted to a web page, document, or RSS feed. At the moment, Primal Fusion only extracts to web pages - files and feeds will come later. The web pages created by the service are public sites representing your research around a particular topic and are filled with links and images relevant to your query.
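The filtering step described above can be sketched in a few lines. This is not Primal Fusion's algorithm, which rests on a much deeper model of how concepts relate; the function name, the plain substring matching and the sample data are assumptions for illustration only.

```python
def filter_by_interests(results, interests, min_matches=1):
    """Keep results mentioning the chosen interests, best matches first.

    results: list of (title, snippet) tuples in the engine's original order.
    interests: terms the user checked in the tag cloud.
    """
    wanted = [term.lower() for term in interests]
    kept = []
    for rank, (title, snippet) in enumerate(results, start=1):
        text = f"{title} {snippet}".lower()
        matches = sum(term in text for term in wanted)
        if matches >= min_matches:
            kept.append((matches, rank, title))
    # More matched interests first; ties keep the engine's original order.
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, _, title in kept]


sample = [
    ("Climate change basics", "An overview of global warming"),
    ("CO2 and greenhouse gases", "How co2 emissions drive pollution"),
    ("Sports scores", "Latest results"),
    ("Pollution in cities", "Smog and air quality"),
]
print(filter_by_interests(sample, ["co2", "pollution"]))
```

Because the whole result list is scanned rather than just the first page, a relevant hit buried deep in the rankings can still surface.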

Because Primal Fusion comes off as somewhat of a confusing mind-mapping tool, many folks will probably miss the point: Primal Fusion is infrastructure, not an application. The way it understands the relationships between words and phrases and how it can then extract the most relevant search results based on that understanding is what's most important about the company's technology.

Remember: This Is Only the Beginning

If you go out and try most of the services we've profiled in this series, you might walk away feeling a bit disappointed. You'll probably be thinking of all the things the service doesn't do but that you wished it could. Or perhaps you'll find the UI unappealing or the recommendations provided somewhat incomplete. However, it's important to understand that many of these services aren't ready for mainstream use just yet. Instead, they represent the beginnings of tomorrow's web - a web that better understands the data it contains. And by better understanding itself, the new intelligent web of the future can then better understand and serve you.

http://www.readwriteweb.com/archives/demo_trend_the_smarter_web_part_2.php http://www.readwriteweb.com/archives/demo_trend_the_smarter_web_part_2.php Trends Tue, 03 Mar 2009 20:32:17 -0800 Sarah Perez
DEMO Trend: The Smarter Web Part One of a Two-Part Series

We're moving beyond the days of a simple search box in which you type a query and get a list of results. Today, companies are trying to build a smarter web - one that understands what things are, how they relate, and perhaps most importantly, what things you're going to like. But has Web 3.0 arrived in its full semantic glory? No, not yet. But it's clear we are getting closer than ever before.

The Recommended Web (Xmarks + StumbleUpon)

To begin, there's the seemingly minor announcement from Xmarks, the company formerly known as Foxmarks, but now rebranded thanks to their multi-browser support. Xmarks has introduced additional features to their bookmark synchronization product which include things like site suggestions and smarter search. By leveraging their large stash of data (600 million bookmarks), Xmarks is now able to recommend sites right within your search results. This is done by placing an Xmarks icon next to those results which are most popular, meaning most bookmarked, on their service. Also, when you visit a web site and click the Xmarks icon in your address bar, Xmarks will return a list of sites similar to the one you're currently browsing.

The data used to deliver these recommendations and suggestions are anonymized - a good thing considering that our browser bookmarks are often the ones we have specifically chosen not to share with others. For bookmarks to become recommended in this fashion, they must be fairly popular on the service - a level that's determined by the number of times saved as a percentage within a particular category.
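As a rough illustration of a popularity cut-off of that kind, the sketch below flags a site once its share of all saves in a category passes a threshold. Xmarks has not disclosed its actual rule; the 5% threshold, the data layout and the function name are assumptions.

```python
def recommended_sites(bookmark_counts, min_share=0.05):
    """Return, per category, the sites whose share of saves meets min_share.

    bookmark_counts: {category: {url: times_saved}}, built from anonymized totals.
    """
    recommended = {}
    for category, counts in bookmark_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # nothing saved in this category yet
        recommended[category] = sorted(
            url for url, saved in counts.items() if saved / total >= min_share
        )
    return recommended


counts = {"news": {"a.example": 90, "b.example": 8, "c.example": 2}}
print(recommended_sites(counts))
```

Only aggregate counts are needed here, which fits with the data being anonymized: no individual's bookmark list has to be inspected to decide what gets recommended.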

In a way, what Xmarks is doing is very similar to what StumbleUpon's browser extension does. Like Stumble, Xmarks annotates our search results, highlighting those that may be of value to us. Yet Xmarks takes it a step further by discovering related sites, too.

The Smarter Tracking Tool (Evri)

Another company revealing new innovations here at DEMO 09 is Evri, a semantic search engine which understands what's called "natural language." Evri knows the different parts of a sentence (subject, verb, object) and it knows how those parts are connected to each other.

Although still too raw to be your main search engine, Evri has a new "Collections" feature which lets you follow topics (aka search queries) that are of interest to you. After Evri returns a list of search results, which include Wikipedia entries, news articles, videos, and images, you can click the star labeled "Follow this" to continue to track that topic. What's missing from this feature, though, is an alerting system which will inform you of updates via email or RSS. However, the company says that's coming later on.

Evri is also branching out from being a web destination alone by introducing Evri widgets which can now be seen in action on the Washington Post's web site. These widgets parse the content on the page to deliver smart recommendations of similar articles both on the site itself as well as elsewhere on the web. 

Another new feature launching now is Evri's browser toolbar. By clicking on a button next to the Evri search box in the toolbar, the people, places, and things on a web page are highlighted. Click on these items and pop-ups appear with more information about the keyword, what's related to the topic plus news, images, and videos.

This additional layer of information on top of standard text makes browsing the web and reading articles a deeper and richer experience. No longer do you need to perform web searches in a separate window to understand definitions, context, and meaning. Instead, Evri's toolbar adds an intelligence to the web that was never there before. It's clear that the company is still working towards making that additional layer more accurate and more relevant, but conceptually the idea is solid.

The RSS Reader That Learns (Ensembli)

Ensembli, an RSS reader of sorts, takes a different approach to tracking topics than Evri does with its "Collections" feature. Where Evri's UI can sometimes feel a bit cluttered with its multimedia results, Ensembli's interface is simple - you just type in a topic and it will continue searching for new articles related to what you entered. But this reader doesn't simply pull information for you - it learns what you like. Every time you read, ignore, or discard a story, Ensembli gets to know your tastes a little bit better.

While this feed reader is far too simplified for RSS junkies like us, it's easy to see how Ensembli could be a good introductory tool for RSS beginners. Still, the sources it returns sometimes seem lacking and it's hard to say if this will ever be any more useful than a simple Google Alert, for example. Nevertheless, it's not really the feed reading itself which makes Ensembli intriguing; it's the learning element. Whatever algorithm is at work behind the scenes figuring out your likes and dislikes is the most important aspect of this new technology.
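Ensembli hasn't said what that algorithm is. One very simple way a reader could learn from read, ignore and discard actions is to keep a running weight per word, as in the sketch below. The class name, the feedback values and the word-level model are all assumptions, not a description of Ensembli's technology.

```python
from collections import defaultdict

# Assumed feedback strengths; a real system would tune or learn these.
FEEDBACK = {"read": 1.0, "ignore": -0.25, "discard": -1.0}


class TasteModel:
    """Keeps a running weight per word based on how stories were handled."""

    def __init__(self):
        self.weights = defaultdict(float)

    def record(self, title, action):
        """Nudge the weight of every word in the title by the action's value."""
        for word in set(title.lower().split()):
            self.weights[word] += FEEDBACK[action]

    def score(self, title):
        """Sum the learned weights of the words in a new story's title."""
        return sum(self.weights.get(word, 0.0) for word in set(title.lower().split()))

    def rank(self, titles):
        """Order candidate stories from most to least likely to appeal."""
        return sorted(titles, key=self.score, reverse=True)


model = TasteModel()
model.record("Semantic web search", "read")
model.record("Celebrity gossip roundup", "discard")
print(model.rank(["Celebrity news", "New semantic search tools"]))
```

Every interaction shifts the weights a little, which is the sense in which such a reader gets to know your tastes over time.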

Getting Smarter...Little by Little

Taken by themselves, the above announcements may have seemed more evolutionary than revolutionary, but look at them within a broader scope and you can see a pattern beginning to develop. In this transitional period from Web 2.0 to Web 3.0, we're starting to see tools and services that aim to expand upon the traditional search experience in order to deliver us to a more intelligent web. On this new web, we're moving beyond SEO and PageRank to determine relevance and instead are seeing new technologies develop that better understand meaning, context, and personal preferences.

Stay tuned...part 2 of "The Smarter Web" will continue tomorrow.

Image credit - dominiekth

http://www.readwriteweb.com/archives/demo_trend_the_smarter_web.php http://www.readwriteweb.com/archives/demo_trend_the_smarter_web.php Trends Mon, 02 Mar 2009 20:47:12 -0800 Sarah Perez
Yahoo to Enable Custom Semantic Search Engines Yahoo is bringing together two of its most interesting projects today, Yahoo BOSS (Build Your Own Search Service) and SearchMonkey, its semantic indexing and search result enhancement service. There were a number of different parts of the announcement - but the core of the story is simple.

Developers will now be able to build their own search engines using the Yahoo! index and search processing infrastructure via BOSS and include the semantic markup added to pages in both results parsing and the display of those results. There's considerable potential here for some really dazzling results.

We wrote about the genesis of Search Monkey here this Spring; it's an incredibly ambitious project. The end result of it is rich search results, where additional dynamic data from marked up fields can also be displayed on the search results page itself. So searching for a movie will show not just web pages associated with that movie, but additional details from those pages, like movie ratings, stars, etc. There are all kinds of possibilities for all kinds of data.

Is anyone using Yahoo! BOSS yet? Anyone who will be able to leverage Search Monkey for a better experience right away? Yahoo is encouraging developers to tag their projects bossmashup in Delicious. As you can see for yourself, there are a number of interesting proofs of concept there but not a whole lot of products. Of the products that are there, very few seem terribly compelling to us so far.

We must admit that the most compelling BOSS implementation so far is over at the site of our competitors TechCrunch. Their new blog network search implementation of BOSS is beautiful - you can see easily, for example, that TechCrunch network blogs have used the word ReadWriteWeb 7 times in the last 6 months. (In case you were wondering.)

Speaking of TechCrunch, that site's Mark Hendrickson covered the Yahoo BOSS/Search Monkey announcement today as well, and having worked closely on the implementation there he's got an interesting perspective on it. He points out that the new pricing model, free up to 10,000 queries a day, will likely only impact a handful of big sites - not BOSS add-ons like TechCrunch search or smaller projects.

The other interesting part of the announcement is that BOSS developers will now be allowed to use 3rd party ads on their pages leveraging BOSS - not just Yahoo ads. That's hopeful.

Can Yahoo do it? Can these two projects brought together lead to awesome search mashups all over the web? We've had very high hopes in the past. Now the proof will be in the pudding.

http://www.readwriteweb.com/archives/yahoo_to_enable_custom_semantic_search.php http://www.readwriteweb.com/archives/yahoo_to_enable_custom_semantic_search.php News Wed, 11 Feb 2009 09:14:32 -0800 Marshall Kirkpatrick
Google: "We're Not Doing a Good Job with Structured Data" During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google's Alon Halevy admitted that the search giant has "not been doing a good job" presenting the structured data found on the web to its users. By "structured data," Halevy was referring to the databases of the "deep web" - those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.

Google's Deep Web Search

Halevy, who heads the "Deep Web" search initiative at Google, described the "Shallow Web" as containing about 5 million web pages while the "Deep Web" is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google's automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web - dubbed "vertical searching" - Halevy also referenced two other types of Deep Web Search: semantic search and product search.

Google wants to also be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.

Can Google Dig into the Deep Web?

The question that remains is whether or not Google's current search engine technology is going to be adept at doing all the different types of Deep Web indexing or if they will need to come up with something new. As of now, Google uses the Big Table database and MapReduce framework for everything search related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data level synonyms discovery. These challenges are addressed by Infobright's technology, said Esterkin, but "Google will have to solve these problems the hard way."

Also mentioned during the speech was how Google plans to organize "aspects" of search queries. The company wants to be able to separate exploratory queries (e.g., "Vietnam travel") from ones where a user is in search of a particular fact ("Vietnam population"). The former query should deliver information about visa requirements, weather and tour packages, etc. In a way, this is like what the search service offered by Kosmix is doing. But Google wants to go further, said Halevy. "Kosmix will give you an 'aspect,' but it's attached to an information source. In our case, all the aspects might be just Web search results, but we'd organize them differently."

Yahoo Working on Similar Structured Data Retrieval

The challenges facing Google today are also being addressed by their nearest competitor in search, Yahoo. In December, Yahoo announced that they were taking their SearchMonkey technology in-house to automate the extraction of structured information from large classes of web sites. The results of that in-house extraction technique will allow Yahoo to augment their Yahoo Search results with key information returned alongside the URLs.

In this aspect of web search, it's clear that no single company yet dominates. However, even if a non-Google company surges ahead, it may not be enough to get people to switch engines. Today, "Google" has become synonymous with web search, just like "Kleenex" is a tissue, "Band-Aid" is an adhesive bandage, and "Xerox" is a way to make photocopies. Once that mark has been made on our collective psyche and the habit formed, people tend to stick with what they know, regardless of who does it better. That's a bit troublesome - if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until Google either duplicates or acquires the invention.

Still, it's far too soon to write Google off yet. They clearly have a lead when it comes to search and that came from hard work, incredibly smart people, and innovative technical achievements. No doubt they can figure out this Deep Web thing, too. (We hope).

http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php Trends Mon, 02 Feb 2009 07:32:07 -0800 Sarah Perez
BooRah Now Selling Semantic Restaurant Review Report Cards How will the semantic web be monetized? How about in the form of monthly reports tracking restaurant reviews on Yelp, CitySearch and hundreds of other websites, for sale to restaurateurs for just $25 per month? That's what semweb startup BooRah is betting on with its new product, the BooRah Restaurant Reputation Report.

When we say that semantic technology has a whole lot of awesome potential, this is a fun example of what we're talking about. If it can be done for restaurants, we expect similar analysis of online sentiment can be sold for all kinds of different real-world sectors.

The idea is that BooRah tracks positive and negative reviews of food, service and ambiance at restaurants across hundreds of online review sites. The service monitors trends toward negative and positive reviews, pulls out key quotes from users and offers other value adds based on its technology.
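To show what tracking those trends might involve once reviews have been classified, here is a small sketch that tallies net sentiment per aspect per month. It skips the hard part, the language analysis that decides whether a comment is positive or negative, and is not BooRah's method; the data layout and function names are assumptions.

```python
from collections import defaultdict


def monthly_sentiment(mentions):
    """Net sentiment per aspect per month.

    mentions: iterable of (month, aspect, polarity), where month is a sortable
    string such as "2009-01" and polarity is +1 (positive) or -1 (negative).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for month, aspect, polarity in mentions:
        totals[aspect][month] += polarity
    return {aspect: dict(sorted(by_month.items())) for aspect, by_month in totals.items()}


def trend(by_month):
    """Change in net sentiment between the earliest and latest month."""
    months = sorted(by_month)
    if len(months) < 2:
        return 0
    return by_month[months[-1]] - by_month[months[0]]


mentions = [
    ("2009-01", "food", 1), ("2009-01", "food", -1), ("2009-01", "service", -1),
    ("2009-02", "food", 1), ("2009-02", "service", -1), ("2009-02", "service", -1),
]
report = monthly_sentiment(mentions)
print({aspect: trend(by_month) for aspect, by_month in report.items()})
```

A positive trend means opinion on that aspect improved over the period, and a negative one means it slipped.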

Now restaurant owners can subscribe to receive a PDF of their monthly reports for an introductory price of $15 and a regular price of $25 per month. (Here's a sample report, in PDF format.)

Simple charts and a straightforward presentation can offer restaurant owners nervous about the Wild West of online opinion a bird's eye view of what's really going on, month by month. On the down side, the reports may enable those business owners to spot and track down negative reviewers to hassle them for the injustices they've no doubt done to a fine eatery.

Think many restaurants will go for it? That depends on how it's marketed, but we expect that today's coverage in the San Francisco Chronicle will help.

We first reviewed semantic and natural language processing review aggregation service BooRah this Spring and said we could foresee giving up Yelp for it. Then in December we called BooRah one of the Top 10 Semantic Web Products of 2008.

Now this latest offering has got us really excited; its simple utility and mainstream appeal are really compelling.

We love the idea of selling aggregate reports of online activity, intelligently analyzed, to mainstream businesses affected by online activity. Sales, marketing and PR firms have paid hefty sums for these kinds of reports, often clumsily gathered and presented, for years. Aim the semantic web at the problem, give it a good price point and offer it to a very large sector of businesses and we may just see some action in the semantic technology sector after all.

Update: Our original title for this story referenced Yelp, which we mistakenly thought was included in BooRah's aggregation of reviews. Yelp contacted us to say that it is in fact not included. We hope that will change soon - it would only make both sites more useful.

http://www.readwriteweb.com/archives/boorah_tracks_yelp_reviews.php http://www.readwriteweb.com/archives/boorah_tracks_yelp_reviews.php NYT Tue, 27 Jan 2009 10:59:21 -0800 Marshall Kirkpatrick
50+ Semantic Web Pros to Follow on Twitter Here at ReadWriteWeb, we find the Semantic Web fascinating. We write about it a lot. What is the semantic web? The way we explain it is that it's a paradigm advocating that the meaning of content on the web be made machine readable.

Why would you want to do that? Because once the "meaning" of text is automatically discernible, there's a whole new world of things we can do with content on the web. Far out things that full text search for the mere presence of keywords would never be able to accomplish. Who's working on the semantic web and how can you meet them? Read on.

In November, 2007 we published a list of 10 Semantic Web companies to watch. Then, one year later, we published a new list for 2008 of Semantic Web companies to watch.

Based on those lists, and reader suggestions in comments of other companies that should be watched, we present to you a list of 50+ Twitter users who work at Semantic Web companies. If you find this sector as interesting as we do, you might want to add some of these people to your microblogging community. You can click through the arrows in the iframe below to scroll through all the accounts and add the people listed. RSS readers who'd like to see the list should click through to the full post.

A handful of these are company accounts, but most are accounts from individual employees. Want to suggest anyone we missed? (We know there are lots we've missed!) Let us know in comments. You can also meet the RWW crew on Twitter.

If this iFrame is driving you batty, see also this old list of links to all the accounts displayed below.

http://www.readwriteweb.com/archives/50_semantic_web_pros_to_follow_on_twitter.php http://www.readwriteweb.com/archives/50_semantic_web_pros_to_follow_on_twitter.php Semantic Web Mon, 19 Jan 2009 18:48:45 -0800 Marshall Kirkpatrick