structured data - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/structured data
Copyright 2012 Richard MacManus - Tue, 14 Feb 2012 18:04:00 -0800

Beyond Social: Read/Write in The Era of Internet of Things

This blog was founded in 2003 on the philosophy of a read/write Web - a Web in which people can create content as easily as they consume it. This trend eventually came to be known as Web 2.0 - although others preferred Social Web - and was popularized by activities like blogging and social networking.

It would be easy to say that the 'social' element is still the primary part of today's Web, since the popular products of this era enable you to say what's on your mind (Facebook), what's happening (Twitter), or where you are (Foursquare). All of these are mostly social activities. But more significantly, these and other products output data that will increasingly be used to build personalized services for you.

The more data there is, the better Web services will be at delivering personal value to you. While part of this increase in data is coming from social data from the likes of Facebook and Twitter, much of it is coming from the Internet of Things and data uploaded by governments and organizations. In short: the read/write Web is now much more than the Social Web.

How We Went Beyond Social

So how did we arrive at a Web that is less about social and more about you?

It's not how much content you consume that is important, it's about what you do with data.

After the peak of Web 2.0, we (meaning all of us) began to get overwhelmed with the choice of content available. We thought we had to actually 'read' as much of that content as possible. So we watched YouTube, chatted on MySpace and Facebook, read blogs, followed lots of people on this new thing called Twitter, and so on. By the end of 2008, we were exhausted by all of this CONTENT. How could we possibly keep up?!

In 2010, we're still struggling to digest all of what social media throws at us. However, a shift has been happening since 2009 which alleviates the problem. We've begun to realize that it's not how much content we consume that is important: it's what we do with all of the social and other data available to us. The social is still important, but the resulting data is - slowly - becoming more important because it can be analyzed, filtered, mashed up and personalized.

Structured Data & Internet of Things

Two relatively new trends are driving this change.

If I was an entrepreneur or developer, I wouldn't be thinking about social anymore. I'd be thinking: How can I use all of this data and build on top of it?

The first is the increasing amount of data being uploaded to the Web by governments, organizations and people. Much of this data is being structured using Semantic Web technologies like RDFa or microformats. In other words, it is categorized and encoded with meaning that machines can process. Recent examples include U.S. and U.K. government data, Best Buy's store and product data and Facebook's Open Graph.
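To make "encoded with meaning that machines can process" concrete, here is a minimal sketch in Python of how a program might read microformat-style markup. The HTML snippet, the store details and the helper names (`PropertyExtractor`, `extract_properties`) are assumptions made up for illustration; only the idea of tagging elements with agreed class names comes from microformats.

```python
from html.parser import HTMLParser

# Illustrative markup: class names such as "fn" and "locality" label the data.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn org">Example Store</span>
  <span class="locality">Springfield</span>
  <span class="tel">555-0100</span>
</div>
"""

class PropertyExtractor(HTMLParser):
    """Collect the text of elements whose class names are known properties."""

    def __init__(self, properties):
        super().__init__()
        self.properties = set(properties)
        self.current = []  # property names attached to the element being read
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.current = [c for c in classes if c in self.properties]

    def handle_data(self, data):
        text = data.strip()
        if text:
            for name in self.current:
                self.found[name] = text

    def handle_endtag(self, tag):
        self.current = []

def extract_properties(html, properties):
    parser = PropertyExtractor(properties)
    parser.feed(html)
    return parser.found

store = extract_properties(SAMPLE_HTML, ["fn", "org", "locality", "tel"])
print(store)
```

Because the class names carry the meaning, the same few lines would work on any page that follows the convention, which is what makes this kind of data reusable.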

And then we have the Internet of Things: an evolving trend where real-world objects and 'things' are connected to the Internet via technologies such as sensors and RFID tags - everything from cars to houses to roads and more. The upshot is that the Web is about to experience a data explosion, as billions of sensors and other data input and output devices upload exabytes of new data to the Web.

How Do We Use This Data?

If we add together social data from the likes of Facebook and Twitter, data from governments and businesses, and data from sensors and RFID, this is a huge amount of data. Most of it isn't for "consuming." Rather, the value of all of this new Web data will be in how it's filtered, mixed together ("mashed up") and personalized in new Web services - most of which haven't yet been built.
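As a rough sketch of what "filtered, mashed up and personalized" could mean in code, the Python below joins two made-up data sources on a shared field and keeps only what matters to one user. The records, field names and the `mash_up` function are all hypothetical placeholders, not data from any real service.

```python
def mash_up(social_posts, sensor_readings, user_city):
    """Join posts and sensor readings on city, keeping only the user's city."""
    readings_by_city = {r["city"]: r for r in sensor_readings}
    results = []
    for post in social_posts:
        if post["city"] != user_city:
            continue  # filter: drop data that is irrelevant to this user
        reading = readings_by_city.get(post["city"], {})
        # mash up: attach the sensor value to the social post
        results.append({**post, "temperature_c": reading.get("temperature_c")})
    return results

# Placeholder records, invented purely to show the shape of the data.
posts = [
    {"user": "alice", "city": "Wellington", "text": "Trains are running late"},
    {"user": "bob", "city": "Auckland", "text": "Sunny at the harbour"},
]
readings = [{"city": "Wellington", "temperature_c": 12}]

personal_feed = mash_up(posts, readings, user_city="Wellington")
print(personal_feed)
```

The interesting work in a real service would be in choosing the join keys and the relevance filter; the mechanics themselves are small.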

Adam Greenfield is one of the leading thinkers of the Internet of Things; I interviewed him earlier this year about his book called Everyware. Greenfield recently wrote a post describing a near future scenario for non-technical people using the Web. He posited a use case where his mother would be able to plan a train trip to see her son, by creating an "ad-hoc service" that tapped into the Web and utilized real-time data sources.

In 2010, his mother would have to find and "read" several different applications in order to plot her travel schedule, and some of that information isn't even currently on the Web. Greenfield envisions a near future where his mother can essentially "write" her requirements into her mobile or other device, and the Web will deliver a personalized schedule to "read." You can view a diagram of Adam's concept here (PDF).

Don't Think Social, Think Data

Successful products in the Web 2.0 era had a strong social element: YouTube, MySpace and Flickr were a few relatively early examples. In the current era of the Web, which began to form in early 2009, the focus has shifted from social to data-driven software. Successful products of this era of the Web will be ones that filter, structure and personalize this vast amount of data coming onto the Web.

So if I was an entrepreneur or developer wondering what to build for this era of the Web, I wouldn't be thinking social. I'd be thinking: How can I use all of this data and build on top of it? There are incredible opportunities out there for you.

This current era of the Web doesn't have a name, which is probably a good sign! One thing is for sure though: It's still a read/write Web - only now you're reading and writing data from much more than just social services. You're increasingly interacting with "things," organizations, governments - virtually anything that can connect to the Web.

http://www.readwriteweb.com/archives/beyond_social_web_internet_of_things.php
Internet of Things - Mon, 19 Jul 2010 02:39:39 -0800 - Richard MacManus

Extractiv Launches "Semantics as a Service" Platform

Extractiv has quietly launched a service that crawls the Web for text on a specific topic, then transforms it into "structured semantic data." It's a direct competitor to Thomson Reuters' Calais product, which has been doing this for a couple of years now. This type of service is potentially valuable to media companies, search services and monitoring applications - because it turns messy, unorganized HTML content into data that is organized into categories and given other semantic 'meaning.'

I sat down with Extractiv CEO Shion Deysarkar at the recent Semantic Technology conference in San Francisco, to find out how Extractiv intends to compete with the more well-known and big media backed Calais.

How Extractiv Works

Extractiv is a joint venture between Houston-based web crawling service 80legs and natural language processing company LCC (which created Swingly, a Q&A service).

Deysarkar explained that Extractiv uses technology from both of its parent companies, to crawl the Web for content on a particular topic and then - using natural language processing - transform it into structured data. This video, produced by Extractiv, explains how the service might be used to crawl the Web for stories about smart phones over the past month.

The output of the crawl and analysis can be JSON or XML, two formats commonly used for structured data. Support for RDFa, a popular Semantic Web standard, will be available "soon" according to the company. Extractiv also offers an API, allowing customers to bypass the web site.
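For readers wondering what consuming such JSON output might look like, here is a small Python sketch. The response body and its field names (`entities`, `text`, `type`) are invented for illustration and are not Extractiv's actual schema.

```python
import json

# Hypothetical response body; the field names are assumptions, not a real schema.
SAMPLE_RESPONSE = """
{
  "entities": [
    {"text": "smart phone", "type": "PRODUCT"},
    {"text": "San Francisco", "type": "CITY"},
    {"text": "smart phone", "type": "PRODUCT"}
  ]
}
"""

def count_entities(response_body):
    """Return a {(type, text): mentions} tally from a JSON response."""
    counts = {}
    for entity in json.loads(response_body)["entities"]:
        key = (entity["type"], entity["text"])
        counts[key] = counts.get(key, 0) + 1
    return counts

entity_counts = count_entities(SAMPLE_RESPONSE)
print(entity_counts)
```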

Extractiv is free to try, but if you'll be a moderate or heavy user of the service then you'll have to pay (the pricing is as yet unavailable on the web site).

Extractiv vs Calais

Deysarkar told ReadWriteWeb that Extractiv is targeting "mid-market Calais customers" - such as media companies or those developing search applications, monitoring services, recommendation engines or aggregators. He also claimed that Extractiv goes beyond what Calais offers, because it can mine sentiment data (which is data about how people feel about products and services).

Extractiv also wants to "provide access to more types of semantic information than any other provider." As Andrew Hickl, CEO of partner company LCC, put it, "if you're interested in baseball pitchers, a generic type like PERSON just won't cut it."

At launch, Extractiv offers about 250 different types of named entities, but it aims to have more than 3000 different entity types by the end of the U.S. summer.

Preparing For the Future of the Web

The product is not aimed at the consumer market, so it's not for the faint-hearted and you need to know what to do with all of that XML or JSON data! It also remains to be seen how competitive it is with Calais, which is a proven performer and has many reputable companies as its customers. Some startups have taken on Calais before, but fallen short.

However, there is undoubtedly a need for products like Extractiv and Calais that turn the Web's unstructured data into meaningful, organized content. This is the future of the Web, because there is going to be a large increase in the quantity of data online over the next 5-10 years - and all of that data will need to be structured if we're going to make the best use of it.

http://www.readwriteweb.com/archives/extractiv_launches_semantics_as_a_service_platform.php
Structured Data - Mon, 12 Jul 2010 01:58:13 -0800 - Richard MacManus

How Twitter Annotations Could Bring the Real-Time and Semantic Web Together

Just because the new iPhone arrived in stores today doesn't mean the rest of the technology world shut down. In fact, today in San Francisco the 2010 Semantic Technology Conference continued its week-long series of talks and sessions about the semantic Web - the ability to understand and intelligently interpret content from the Web. A fascinating example of how the semantic Web is colliding with the real-time Web is Twitter and the impending release of annotations - and Ph.D. student Joshua Shinavier provided some intriguing semantic scenarios for their use.

Twitter posts already contain plenty of metadata that allows for smart filtering and organization, including date and location. With annotations, however, the metadata possibilities will be nearly endless. Tweet metadata could eventually contain information or links based on words or phrases in the tweet itself, other options added to the tweet, or even other external data like the weather in the sender's location at the time it was sent. Imagine being able to add an infinite number of hashtags to a post without wasting precious characters.
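One way to picture this is a tweet object that carries its metadata alongside the text rather than inside it. The Python sketch below is only an assumption about how such a structure could look; the `Tweet` class, the `annotate` helper and the namespaces are hypothetical and do not describe Twitter's actual annotation format.

```python
from dataclasses import dataclass, field

@dataclass
class Tweet:
    text: str
    # Metadata travels next to the text, so it uses none of the tweet's characters.
    annotations: dict = field(default_factory=dict)

def annotate(tweet, namespace, **attributes):
    """Attach key/value metadata under a namespace such as 'movie' or 'weather'."""
    tweet.annotations.setdefault(namespace, {}).update(attributes)
    return tweet

tweet = Tweet("Just saw a great film tonight")
annotate(tweet, "movie", title="Example Film", language="en")
annotate(tweet, "weather", condition="rain")
print(tweet)
```

The text stays exactly as the author typed it, while any number of namespaces can be added for software to filter on.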

As Shinavier points out in his presentation, semantic databases could then plug into the annotation metadata and provide real-time semantic information to those who seek it. By drawing on existing databases like GeoNames, Linked Movie Database and FOAF (Friend of a Friend), searchers could collect very specific kinds of tweets. They could ask for tweets about "places in developing countries," "English-language movies starring Chinese actors," or "songs by artists my friends like," says Shinavier.
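A query like "places in developing countries" boils down to joining tweet metadata against a reference dataset. The Python below is a minimal sketch of that join; the `PLACES` table stands in for something like GeoNames, and every row, field and function name in it is a made-up placeholder.

```python
# Stand-in for a reference dataset such as GeoNames; the rows are placeholders.
PLACES = {
    "Placeville": {"country": "Country A", "developing": True},
    "Sampletown": {"country": "Country B", "developing": False},
}

def tweets_about(tweets, predicate):
    """Yield tweets whose annotated place satisfies `predicate`."""
    for tweet in tweets:
        place = PLACES.get(tweet["annotations"].get("place"))
        if place is not None and predicate(place):
            yield tweet

stream = [
    {"text": "Market day!", "annotations": {"place": "Placeville"}},
    {"text": "Stuck in traffic", "annotations": {"place": "Sampletown"}},
    {"text": "No metadata here", "annotations": {}},
]

matches = list(tweets_about(stream, lambda place: place["developing"]))
print([t["text"] for t in matches])
```

Note that the tweet itself never says "developing country"; that knowledge lives in the reference data, which is the point of linking the two.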

Shinavier likens annotations to the real-time version of attributes from RDF (Resource Description Framework), which provide websites with extended semantic metadata. Since Twitter's annotations will be easy for developers to implement, the sheer size of the network will create the "long tail" of real-time semantic data, he says. The application of the semantic Web to annotations will make it easier for developers to create richer applications, which benefits the end user.

In basic terms, the Web is getting smarter. Not Skynet smart, but smart, and with the mashup of the real-time fire-hose of information coming from services like Twitter, the semantic Web can provide even deeper and richer interactions for users. Personally, I am highly anticipating the release of annotations because I know brilliant developers are going to create amazing applications that leverage metadata. Throwing in semantic recognition only sweetens the pot.

Image from Flickr user Colectivo Mambembe.

http://www.readwriteweb.com/archives/how_twitter_annotations_could_bring_the_real-time_semantic_web_together.php
Real-Time Web - Thu, 24 Jun 2010 18:40:00 -0800 - Chris Cameron

David Siegel: From Killer Web Sites to Semantic Web

One of the first web design books I bought was Creating Killer Web Sites, a 90s classic by David Siegel. That book was known for pushing visual style over HTML standards. It also encouraged the use of HTML hacks, for example using tables to create layouts. Siegel's techniques were basically workarounds, but they just worked in an era when building web pages was painful due to browser incompatibilities.

In Siegel's latest book, Pull, he tackles the Semantic Web. Once again, Siegel plays loosely with existing web standards.

Siegel's definition of 'Semantic Web' is much broader than that of many technologists. So, just as many Web standards advocates derided Siegel's version of web design back in the 90s, will they also cry foul of his version of the Semantic Web?

Pull is being positioned as a business guide to the emerging Semantic Web. It has similarities to Creating Killer Web Sites, which caught the wave of an emerging big trend of the mid-90s (web site design) and became a bestseller. Siegel is attempting to catch a second big online wave, with the Semantic Web in 2010.

Siegel explains the title in the introduction:

"This book describes the pull era, where customers pull everything to them on demand - products, services, information, knowledge, and advice. Much of the foundation for pulling is called the semantic web, a new way of packaging information to make it much more useful and reusable. Over the next ten to twenty years, it will change business from a lead-push model to a pull-follow model of interacting with customers."

It's hard to argue against the vision that the book outlines. However for many Semantic Web proponents, the foundational technologies are Resource Description Framework (RDF), Web Ontology Language (OWL), and Extensible Markup Language (XML). These standards allow web publishers to encode meaning - semantics - into their sites.
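To show what "encoding meaning" amounts to, here is a small Python sketch of the subject-predicate-object triple that underlies RDF. The `book:` identifiers and the `match` helper are assumptions made up for illustration; `dc:title` and `dc:creator` follow the naming of the Dublin Core vocabulary.

```python
# Each fact is a (subject, predicate, object) triple, the basic shape of RDF data.
TRIPLES = [
    ("book:pull", "dc:title", "Pull"),
    ("book:pull", "dc:creator", "David Siegel"),
    ("book:killer", "dc:title", "Creating Killer Web Sites"),
    ("book:killer", "dc:creator", "David Siegel"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

books_by_siegel = [
    s for s, _, _ in match(TRIPLES, predicate="dc:creator", obj="David Siegel")
]
print(books_by_siegel)
```

Because every fact has the same three-part shape, a program can answer "which books did this person write?" without knowing anything about the page layout the facts came from.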

David Siegel's definition of Semantic Web is far broader. On the book's accompanying website, The Power of Pull, there is a "Semantic Web Acid Test." It defines a semantic web business as one that has an "unambiguous" structure for its data. The book states that "some technologists feel that semantic web data must be expressed using a language called RDF," but Siegel disagrees. Instead, he believes that "simple, unambiguous formats are part of the semantic web."

The book is ultimately about how structured data will change how we do business. Frankly, the use of the term 'Semantic Web' in this book feels forced. Even so, I think it's a very useful book and offers detailed scenarios of how structured data will improve business. For example, chapter 4 is about retailers and outlines the benefits of RFID tags in retail - including describing a visit Siegel made to forward-thinking German retailer Metro Group.

Overall Pull is a solid and well-researched book. It's a good introduction for business people to structured data and the Semantic Web.

My one issue with the book is that Siegel's appropriation of the term 'Semantic Web' leaves me feeling a little uneasy. On the home page of his personal website is a blog post (entitled 'Why I Should be Apple's Next CEO'), in which Siegel claims that he "started talking about the Semantic Web in 1998, before Tim Berners-Lee coined the term." Whether that's true or not, it does raise the question: is Siegel's definition of the Semantic Web the same as Tim Berners-Lee's?

Disclosure: David Siegel posted me a copy of his new book.

http://www.readwriteweb.com/archives/david_siegel_pull_semantic_web.php
Analysis - Tue, 30 Mar 2010 02:33:43 -0800 - Richard MacManus

Why Google and Other Humans Don't Read Your Book Reviews

The book and media industries are going through interesting times, to put it mildly. As physical books prepare for their demise, the confusion around pricing of digital ones grows. Yet, whether physical or digital, to sell books you need marketing. People need to hear about a book before they buy it.

This is where the book review comes in. Every publicist and publisher's dream is to land a positive review with an authoritative source. A good review in The New York Times or the L.A. Times used to be a ticket to big sales. It sounds like it still should be, but it is not, because most book reviews are poorly formatted and cannot be recognized by Google and other software.

The Book Review That Nobody Saw

Let's take a look at this edgy review of Manhood by the L.A. Times. It is a pure joy to read - it is elegant, clever and gets to the heart of the issue. There is only one problem with it - nobody is going to read it, because Google can't find it.

Try running this very specific Google search - "Manhood" by Mels van Driel review - and you will not find the L.A. Times among the results - at least not within the first three pages that humans would care to flip through. How come, you might ask? Well, the answer is simple - there is nothing whatsoever that tells Google that this post is a book review about this particular book.

And this is not just an isolated problem with this book review from this particular newspaper. The issue is widespread across all major U.S. and international media outlets. Whether due to a lack of tools or a lack of understanding of how search engines and other software work, people routinely fail to make their content discoverable.

A Simple Way to Please Google

So how should book reviews be tagged?

To start with, the title needs to make it clear that this is a book review. Of course humans may find a more subtle title more enticing, but for the sake of the machine, Book Review: Manhood by Mels has to be present. It would be even better to mark up the fact that this is a book review, which part is the book title and which part is the author.

Next, the post needs to be adorned with the right tags and keywords. L.A. Times' reviews are certainly very clever, but again, Google does not get humor. Better tags would be the title of the book, the name of the author and the inconspicuous phrase "book review".

A Better Way to Please Google and Tim Berners-Lee

The tagging system described above is still error prone. A computer might not interpret it correctly and would miss this post in the search results. This is because that kind of description is not structured. Humans enjoy a wonderful ability to deal with fuzzy things; computers simply can't do it.


For a computer to understand content, it needs to be described using a markup language. This is a broad and complex topic that has been a focus of the so-called Semantic Web and structured data.

The right way of marking up content so that it can be understood by Google, other search engines and semantic technologies is to use a structured format such as ePub, the hReview microformat, abmeta or one of the other structured formats. Using a structured format removes the ambiguity and enables a computer to "know" what the review is about.
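As an illustration, here is a minimal Python sketch that renders a review using class names from the hReview microformat (`hreview`, `item`, `fn`, `reviewer`, `summary`). The `render_hreview` function and the exact layout of the markup are assumptions for this example, not a template any publisher uses.

```python
from html import escape

def render_hreview(book_title, author, reviewer, summary):
    """Render a book review as HTML using hReview class names."""
    return (
        '<div class="hreview">\n'
        f'  <span class="item"><span class="fn">{escape(book_title)}</span></span>\n'
        f'  by {escape(author)}\n'
        f'  <span class="reviewer vcard"><span class="fn">{escape(reviewer)}</span></span>\n'
        f'  <p class="summary">{escape(summary)}</p>\n'
        '</div>'
    )

markup = render_hreview(
    book_title="Manhood",
    author="Mels van Driel",
    reviewer="L.A. Times",
    summary="Book review: Manhood",
)
print(markup)
```

The visible text can stay as witty as the editor likes; the class names tell software which string is the item being reviewed and which is the reviewer.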

Making the content discoverable by Google in turn makes it discoverable by humans.

Tagging: It's All About the Money

Could it just be that book reviewers in major newspapers would get more page views if they did a better job tagging content? And then in turn, could it also be that if more people discovered clever and elegant reviews then more books would be sold? Even if you don't think so, there is way too much risk of getting this one wrong.

Doing appropriate, standard tagging and markup of book reviews is cheap and simple and should be part of the daily publishing routine. Each media company needs to invest in standards and guidelines around content markup. This is not just a matter of being a good citizen of the Web, it is a matter of making money.

Photo credit: Ivan Petrov

http://www.readwriteweb.com/archives/why_google_and_other_humans_dont_read_your_book_reviews.php
Semantic Web - Sun, 14 Feb 2010 18:00:00 -0800 - Alex Iskold

ReadWriteWeb's Top 5 Web Trends of 2009

Over the last week we ran a series of posts outlining the five biggest Internet trends of this year: Structured Data, Real-Time Web, Personalization, Mobile Web / Augmented Reality, Internet of Things. Effectively this was ReadWriteWeb's State of the Web 2009.

We've now compiled the main points into a single presentation, available on Slideshare and embedded below. You can view the presentation in full screen by clicking the "full" button at the bottom of the presentation. You can also download the presentation as a Powerpoint file. All of the links in the presentation are clickable, should you wish to explore a certain topic more.


Editor's note: This story is part of a series we call Redux, where we'll re-publish some of our best posts of 2009. As we look back at the year - and ahead to what next year holds - we think these are the stories that deserve a second glance. It's not just a best-of list, it's also a collection of posts that examine the fundamental issues that continue to shape the Web. We hope you enjoy reading them again and we look forward to bringing you more Web products and trends analysis in 2010. Happy holidays from Team ReadWriteWeb!

  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/readwritewebs_top_5_web_trends_of_2009.php
2009 Redux - Thu, 31 Dec 2009 14:00:00 -0800 - Richard MacManus

Top 5 Web Trends of 2009: Structured Data

This week ReadWriteWeb will run a series of posts detailing what we think are the five biggest, most cutting-edge Web trends to come out of 2009. We'll be posting one trend analysis per day. Then at the end of the week we'll publish a major update to our standard presentation about web technology trends.

The first major Web trend we're looking at is Structured Data. In prior presentations, this has sometimes been referred to under the umbrella term of 'Semantic Web'. However the way 2009 has panned out so far, it's become clear that this trend is much more than the Semantic Web. In this post, we'll analyze the developments in Structured Data this year and provide you with 3 product examples: OpenCalais, Google, Wolfram Alpha.



Web of Data, Not Documents

Tim Berners-Lee said in February this year that we're now in a Web of Data, rather than a Web of Documents. The organization that Berners-Lee heads, the W3C, has heavily promoted two key initiatives that are helping to build this Web of Data: the Semantic Web and more recently Linked Data.

However over the past few years, we've seen that there are many other ways to structure data and enable others to build off it. The best current example is surely Twitter, whose API has historically been responsible for around 90% of Twitter's activity - via third party apps.

The basic principle of the Web of Data is still the same as what Alex Iskold articulated on ReadWriteWeb back in March 2007: "unstructured information will give way to structured information - paving the road to more intelligent computing."

Example 1: OpenCalais

Our first example product, OpenCalais, is probably the best current example of Linked Data (which is a type of structured data endorsed by W3C). Thomson Reuters, the international business and financial news giant, launched an API called OpenCalais in Feb '08. In a nutshell, OpenCalais turns unstructured HTML into semantically marked up data. It orders data into groups such as 'people,' 'places,' 'companies' and more. This way, third party applications and sites can build interesting new things from that data - one of the defining principles of Linked Data.
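The grouping step described here can be sketched in a few lines of Python. The input list below is a hand-written stand-in for extracted entities, and `group_by_type` is a hypothetical helper; none of it reflects the real OpenCalais API or its output format.

```python
from collections import defaultdict

def group_by_type(entities):
    """Group (name, type) pairs into {type: [unique names in first-seen order]}."""
    grouped = defaultdict(list)
    for name, entity_type in entities:
        if name not in grouped[entity_type]:
            grouped[entity_type].append(name)
    return dict(grouped)

# Hand-written stand-in for what an extraction service might return.
extracted = [
    ("Tim Berners-Lee", "people"),
    ("San Francisco", "places"),
    ("Thomson Reuters", "companies"),
    ("Google", "companies"),
    ("Google", "companies"),
]

grouped = group_by_type(extracted)
print(grouped)
```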

For a full explanation of Linked Data, read Alexander Korth's technical introduction The Web of Data: Creating Machine-Accessible Information from April 2009. I also explained the background and benefits of Linked Data in a May '09 post entitled Linked Data is Blooming: Why You Should Care.

Example 2: Google Rich Snippets

In May this year, Google added structured data to its core search, in the form of a feature called 'Rich snippets.' Essentially this feature extracts and shows useful information from web pages, by way of structured data open standards such as microformats and RDFa. On launch in May, Google invited publishers to mark up their HTML. While it will take a while for this markup to become widespread, the fact that a huge company like Google implemented it shows the increasing importance of structured data on the Web.

Other big companies are also heading in this direction - in particular, Yahoo was an early leader.

Example 3: Wolfram Alpha

Ever since Wolfram|Alpha's much hyped launch in May, we've been tracking this innovative product closely. It's a self-described "computational knowledge engine" and while it's not quite the Google killer some predicted, it has many potential uses.

Wolfram|Alpha has a search engine-like interface, allowing you to type natural language statements into it. But the main part of the product is the computations you can do on data. The product is premised on using and computing data. If Web 2.0 was about creating data (a.k.a. user generated content), then the next generation of the Web is all about using that data.

Conclusion

We can see from the above three examples that structured data is rapidly becoming a feature of today's Web. Companies like Thomson Reuters and Google are enabling data to be structured, and new types of products (like Wolfram|Alpha) will make use of structured data in ways we perhaps can't imagine right now.

ReadWriteWeb's Top 5 Web Trends of 2009:

  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data_1.php
2009 Redux - Sat, 26 Dec 2009 14:00:00 -0800 - Richard MacManus

Factual Makes Publishing Open Data Easy

Factual, a new open data project founded by Gilad Elbaz, just launched its public beta today. Elbaz's last company, Applied Semantics, was acquired by Google in 2003 and became one of the core components of the search giant's AdSense contextual advertising product. Factual, which is mostly geared towards developers, is somewhat similar to Freebase, though Factual allows for a more free-form approach to building a database than Freebase. Factual provides users and developers with tools to create, contribute and mash up open data on any subject.

Factual also announced that Esther Dyson has joined the company's board of advisors.

For now, Factual only offers a relatively small repository of databases, though the company's current focus is on getting more developers to use its service and on bringing as much data as possible into the system.

Getting Data into Factual

To enter data, users can type it in field by field, which is tedious, or upload spreadsheets in most of the standard formats. The service also provides a number of easier ways to import data. You can, for example, give Factual the URL of any website or Wikipedia page that includes tables and the service will automatically create a new table based on this data. We tried this with tables from a number of sites and it generally worked well and only required a few edits. For advanced users, Factual also includes a number of more advanced extraction tools.
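To give a feel for what importing a table from a web page involves, here is a minimal Python sketch that turns an HTML table into a list of records. It is not Factual's importer; the `TableParser` class, the `table_to_records` helper and the sample table are assumptions for illustration, and the sketch ignores nested tables and merged cells.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the cell text of every table row on a page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def table_to_records(html):
    """Treat the first row as the header and return one dict per later row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]

# Made-up sample table.
SAMPLE = """
<table>
  <tr><th>Restaurant</th><th>Vegetarian</th></tr>
  <tr><td>Cafe Example</td><td>yes</td></tr>
  <tr><td>Sample Grill</td><td>no</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
print(records)
```

Real pages are messier than this sample, which is presumably why an automatic import still needs a few manual edits afterwards.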

Once the data is available on Factual, developers can use the API to read, write and mash this data up in any form they like. Users can also edit tables directly on the site or through an embedded table. In addition, users can mash up and combine existing tables.

Currently, Factual only offers one relatively basic embeddable widget that can only display the table without any graphical embellishments. The company plans to rely on developers to create other ways to access and display the data available on the service.

Not a Wiki

While Factual allows any user to make changes to the database, Factual's model is slightly different from the standard wiki approach where only the last edit is generally visible to the public. Changes made to a fact in a Factual database are more like votes for a certain entry. If three users or data sources say a restaurant doesn't offer vegetarian food, for example, and one user says it does, then the table will display the fact that the majority of users entered. Factual, however, will also display a question mark next to this disputed entry. Users can click on this question mark to see all the editors and data sources.
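The voting model can be sketched as a simple majority count. The Python below is an assumption about how such a rule might be written, not Factual's implementation; `resolve_fact` is a hypothetical name, and the sketch breaks ties in favor of the value submitted first.

```python
from collections import Counter

def resolve_fact(votes):
    """Return (displayed_value, disputed) for one cell, given every submitted value."""
    counts = Counter(votes)
    value, _ = counts.most_common(1)[0]  # the value with the most votes is shown
    disputed = len(counts) > 1           # any disagreement earns a question mark
    return value, disputed

# Three sources say the restaurant has no vegetarian food, one says it does.
value, disputed = resolve_fact(["no", "no", "no", "yes"])
print(value, disputed)
```

Unlike a wiki, where the last edit wins, the minority entry here is kept and only flagged, so a reader can still inspect who said what.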

Factual will try to weed out spam here as well, though given how new the service is, it's hard to evaluate how effective Factual's spam filters are.

License

Users who enter data into a Factual database do not automatically give up their copyright - though given that Factual focuses on facts, which typically can't be copyrighted anyway, this shouldn't be too much of a problem. Users can, however, choose an open license for their work, which might be necessary if the table they used to seed their database was licensed under a Creative Commons license, for example. Factual's FAQ explains this issue in greater detail.

Would You Use an Open Data Service?

With regard to the question of why businesses would open up their data, Gilad Elbaz told us yesterday that he believes open data could eventually go the way of open source, which also had a hard time gaining acceptance among businesses. While open source software is a tool that a lot of companies now use, data is usually what is at the heart of a company's products and it remains to be seen how many companies would really want to put their data into an open database. For now, we mostly expect non-profits and government organizations to make use of this service.

http://www.readwriteweb.com/archives/factual_makes_publishing_open_data_easy.php
News - Tue, 13 Oct 2009 05:00:00 -0800 - Frederic Lardinois

Google Squared Gets Some Much Needed Improvements

Google Squared launched to a lot of hype earlier this year, but the initial reaction from most pundits was rather negative. Squared, which gathers and displays structured data, often returned rather nonsensical results, and we would venture to guess that only a few people are actually using it now. Today, Google announced some updates to Squared that should make it more useful. For example, a search on Squared will now return up to 120 facts - up from 30 in the initial release.

As Google points out, a search for US presidents, for example, initially returned a table with only five presidents and three categories. Now, however, this table includes data on 20 presidents and lists up to six attributes. Squared also now gives users the option to sort columns - a feature that was sorely lacking in the first iteration of this product.


Squared is now more selective about the data it includes. And it also learns from edits and corrections that users make.

New: Export Data to Google Spreadsheets and CSV Files

In addition, Google gives users the option to export data to a Google Spreadsheet or a CSV file. This should make it a lot easier to actually do something interesting with this data. As an example, Google explains how to build a list of African countries and then create a scatter plot that examines the relationship between GDP and literacy rate in these countries.
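Once the data is in a CSV file, an analysis like the GDP-versus-literacy comparison takes only a few lines of code. The following is a minimal sketch, not Google's own example: the column names (`Name`, `GDP`, `Literacy rate`) and every value in the sample are placeholders invented for illustration, and a real export may be laid out differently. It computes the Pearson correlation, the number a scatter plot lets you eyeball.

```python
import csv
import io
import math

# Placeholder rows standing in for an exported CSV. The column names and
# every value are invented for illustration; they are not real statistics.
SAMPLE_CSV = """Name,GDP,Literacy rate
Country A,10,50
Country B,20,60
Country C,30,70
Country D,40,65
Country E,,80
"""


def load_points(csv_text, x_col, y_col):
    """Return (x, y) float pairs, skipping rows with missing or non-numeric cells."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            points.append((float(row[x_col]), float(row[y_col])))
        except (KeyError, TypeError, ValueError):
            continue  # incomplete row: leave it out rather than guess
    return points


def pearson(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_x == 0 or var_y == 0:
        raise ValueError("a column with no variation has no correlation")
    return cov / math.sqrt(var_x * var_y)


points = load_points(SAMPLE_CSV, "GDP", "Literacy rate")
print(len(points), "usable rows, r =", round(pearson(points), 3))
```

Rows with empty cells are skipped rather than filled in, which matters for a source like Squared where some facts may be missing or wrong.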

Will You Give it a Second Try?

Overall, the data that Google Squared now returns does indeed look more accurate than in earlier versions, though some results are still rather strange (to be fair, this is still a Google Labs product). We do wonder how useful a service like this really is. Are you likely to head over to Google Squared for research? Would you trust its results?

http://www.readwriteweb.com/archives/google_squared_gets_some_much_needed_improvements.php http://www.readwriteweb.com/archives/google_squared_gets_some_much_needed_improvements.php Google Fri, 09 Oct 2009 11:43:07 -0800 Frederic Lardinois
ReadWriteWeb's Top 5 Web Trends of 2009

Last week we ran a series of posts outlining the 5 biggest Internet trends of this year: Structured Data, Real-Time Web, Personalization, Mobile Web / Augmented Reality, Internet of Things. Effectively this was ReadWriteWeb's State of the Web 2009.

We've now compiled the main points into a single presentation, available on Slideshare and embedded below. You can view the presentation in full screen by clicking the "full" button at the bottom of the presentation. You can also download the presentation as a PowerPoint file. All of the links in the presentation are clickable, should you wish to explore a certain topic further.


  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009.php http://www.readwriteweb.com/archives/top_5_web_trends_of_2009.php Trends Mon, 14 Sep 2009 22:10:00 -0800 Richard MacManus
Top 5 Web Trends of 2009: Structured Data

This week ReadWriteWeb will run a series of posts detailing what we think are the 5 biggest, most cutting-edge Web trends to come out of 2009. We'll be posting one trend analysis per day. Then at the end of the week we'll publish a major update to our standard presentation about web technology trends.

The first major Web trend we're looking at is Structured Data. In prior presentations, this has sometimes been referred to under the umbrella term of 'Semantic Web'. However, given the way 2009 has panned out so far, it's become clear that this trend is much more than the Semantic Web. In this post, we'll analyze the developments in Structured Data this year and provide you with 3 product examples: OpenCalais, Google, Wolfram Alpha.

Web of Data, Not Documents

Tim Berners-Lee said in February this year that we're now in a Web of Data, rather than a Web of Documents. The organization that Berners-Lee heads, the W3C, has heavily promoted two key initiatives that are helping to build this Web of Data: the Semantic Web and more recently Linked Data.

However, over the past few years, we've seen that there are many other ways to structure data and enable others to build off it. The best current example is surely Twitter, whose API has historically been responsible for around 90% of Twitter's activity - via third-party apps.

The basic principle of the Web of Data is still the same as what Alex Iskold articulated on ReadWriteWeb back in March 2007: "unstructured information will give way to structured information - paving the road to more intelligent computing."

Example 1: OpenCalais

Our first example product, OpenCalais, is probably the best current example of Linked Data (which is a type of structured data endorsed by W3C). Thomson Reuters, the international business and financial news giant, launched an API called OpenCalais in Feb '08. In a nutshell, OpenCalais turns unstructured HTML into semantically marked up data. It orders data into groups such as 'people,' 'places,' 'companies' and more. This way, third party applications and sites can build interesting new things from that data - one of the defining principles of Linked Data.

For a full explanation of Linked Data, read Alexander Korth's technical introduction The Web of Data: Creating Machine-Accessible Information from April 2009. I also explained the background and benefits of Linked Data in a May '09 post entitled Linked Data is Blooming: Why You Should Care.

Example 2: Google Rich Snippets

In May this year, Google added structured data to its core search, in the form of a feature called 'Rich snippets.' Essentially this feature extracts and shows useful information from web pages, by way of structured data open standards such as microformats and RDFa. On launch in May, Google invited publishers to mark up their HTML. While it will take a while for this markup to become widespread, the fact that a huge company like Google implemented it shows the increasing importance of structured data on the Web.
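To see why this kind of markup makes extraction mechanical, here is a minimal sketch using only Python's standard library. It is not how Google's parser works; it simply collects the text of elements whose `class` attribute carries one of a few property names from the hReview microformat (`fn`, `rating`, `reviewer`). The sample HTML and its values are invented for illustration.

```python
from html.parser import HTMLParser

# Property names from the hReview microformat that this sketch looks for.
WANTED = {"fn", "rating", "reviewer"}

# Elements that never get a closing tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicroformatParser(HTMLParser):
    """Collect the text inside elements marked up with a wanted class name."""

    def __init__(self):
        super().__init__()
        self.parts = {}    # property name -> list of text fragments
        self._stack = []   # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(next((c for c in classes if c in WANTED), None))

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the innermost enclosing property, if any.
        for name in reversed(self._stack):
            if name is not None:
                self.parts.setdefault(name, []).append(text)
                break


def extract(html):
    """Return a dict mapping each property found to its joined text."""
    parser = MicroformatParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(chunks) for name, chunks in parser.parts.items()}


# Invented sample page fragment; the business and reviewer are placeholders.
SAMPLE_HTML = """
<div class="hreview">
  <span class="item"><span class="fn">Example Cafe</span></span><br>
  Rating: <span class="rating">4</span> out of 5,
  reviewed by <span class="reviewer">A. <b>Reader</b></span>
</div>
"""

print(extract(SAMPLE_HTML))
```

Text outside any marked-up element ("Rating:", "out of 5") is ignored, which is the point: the publisher's markup, not guesswork, tells the parser which words are the rating and which are the reviewer.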

Other big companies are also heading in this direction - in particular, Yahoo was an early leader.

Example 3: Wolfram Alpha

Ever since Wolfram|Alpha's much hyped launch in May, we've been tracking this innovative product closely. It's a self-described "computational knowledge engine" and while it's not quite the Google killer some predicted, it has many potential uses.

Wolfram|Alpha has a search engine-like interface, allowing you to type natural language statements into it. But the main part of the product is the computations you can do on data. The product is premised on using and computing data. If Web 2.0 was about creating data (a.k.a. user generated content), then the next generation of the Web is all about using that data.

Conclusion

We can see from the above three examples that structured data is rapidly becoming a feature of today's Web. Companies like Thomson Reuters and Google are enabling data to be structured, and new types of products (like Wolfram|Alpha) will make use of structured data in ways we perhaps can't imagine right now.

ReadWriteWeb's Top 5 Web Trends of 2009:

  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data.php http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data.php Trends Mon, 07 Sep 2009 05:30:00 -0800 Richard MacManus
Everything You Wanted to Know About Semantic Technology, But Were Afraid to Ask (at SemTech 09)

Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products. This one is by Hakia, one of the participants in the recent 2009 Semantic Technology Conference.

Participants in the 2009 Semantic Technology Conference walked away considering fundamental questions about what is and isn't semantic technology. The relevance of this post's title will hopefully become clear by the end to those of you mischievous readers who may have stumbled upon it with other ideas. The conference was a great and well-organized affair in San Jose, California. One of the highlights was the Semantic Search Keynote panel, with all of the major players on stage (Ask, Bing, Google, Hakia, TrueKnowledge, and Yahoo!), as seen in the picture below.


Bear in mind that semantic technology can be as heavy and stifling for any audience as stem-cell research can be to high-school students. But Carla Thompson of Guidewire did a terrific job of coming up with discussion topics and moderating the panel. Everyone survived the ordeal without any sign of dozing.

Despite the positive outcome, some responses from the panelists made me wonder if we should go back to the basic question of, "What is semantic search?" Or, better yet, what isn't semantic search? Here is my list:

Structured Data

Folks, semantic technology is not structured data. A database that can, given the query "social drinking," pull up a list of beer brands, their manufacturers, and their contact information has nothing to do with semantics. Some people seem to have the impression that a search engine somehow uses semantic technology if it retrieves structured data for its results. It is a trick as old as the ancient Egyptians, who used beads to organize harvesting information. Organized information is not semantic information.

Morphology

If a search engine is robust and returns the same results for the query "top ten" as it does for "top 10" (i.e. it recognizes that "ten" means "10"), calling the search engine semantic would be a stretch. Anyone could come up with a substitution list like this without a drop of linguistic knowledge. Similarly, distinguishing the name "Fisher" from the noun "fisher" by detecting the capitalization of the first letter does not go beyond the application of simple linguistic rules. These capabilities are not semantic search capabilities.
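To make the point concrete, both tricks fit in a few lines. This is a minimal sketch; the word list and function names are hypothetical and describe no particular search engine.

```python
import re

# A hand-written substitution list: no linguistic knowledge required.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}


def normalize_query(query):
    """Lowercase the query and replace spelled-out numbers with digits."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(NUMBER_WORDS.get(token, token) for token in tokens)


def looks_like_name(token):
    """Treat a token as a proper name if its first letter is capitalized."""
    return token[:1].isupper()


print(normalize_query("top ten") == normalize_query("top 10"))
print(looks_like_name("Fisher"), looks_like_name("fisher"))
```

A lookup table and a capitalization check produce the behavior described above, and neither involves any understanding of meaning.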

Syntax

A certain amount of semantic information can be salvaged from syntax. Unfortunately, if syntax were enough for us to detect the meaning of text, then an 8-year-old with perfect reading ability (i.e. who is able to syntactically parse strings of English-language letters) could be expected to understand the meaning of Shakespeare's works. The difference between reading and understanding is the difference between syntax and semantics. The former requires the skill to parse things out, while the latter requires a vast amount of associative knowledge.

Statistics

An infinite number of monkeys typing on an infinite number of keyboards would eventually come up with the complete text of the Declaration of Independence. This is a scientific statement; it is not a joke. However, if a search engine is expected to be semantically relevant using statistical algorithms, one would have to wait until the monkeys finished their job. Statistics have no place in semantic technology. A simple test would reveal that. For example, your brain is able to understand a unique sequence of words that you have never seen before, such as "Polar bears don't eat alligator eggs before dawn." If semantics were built on statistics, computers and algorithms would not understand this and billions of other sentences.

Scalability

Scalability is the narrow bridge between science and technology. What you can carry from science to technology over this bridge determines the level of capabilities in the real world. The science of semantics is huge and stems from the roots of philosophy. But Web search is a very particular problem with stringent constraints (a narrow bridge). Designing semantic algorithms to drive a Web search engine is like walking on egg shells and requires a completely new approach. Thus, a semantic search algorithm could be very sophisticated but still not suitable for the Web.

These five areas cover what isn't semantic search and should help readers understand the questions that emerged from the Semantic Technology Conference. Structured data, morphology, syntax, statistics, and scalability are key areas to discuss moving forward. Of course, contrary to the title of this post, no one was actually afraid of asking these questions. But if you caught the reference in the title, that was your semantic brain in action, one last example of what semantic technology is.

http://www.readwriteweb.com/archives/everything_to_know_about_semantic_technology_at_semtech_09.php http://www.readwriteweb.com/archives/everything_to_know_about_semantic_technology_at_semtech_09.php Sponsors Fri, 26 Jun 2009 05:00:18 -0800 RWW Sponsor
Google Squared is Live: Who Knew Structured Data Could Be So Unhelpful?

Three weeks ago Google demonstrated a new product in Labs called Google Squared; it's a search engine that creates structured data from big piles of information and lets users compare various things by their attributes. There have been suggestions that Google Squared will crush Wolfram Alpha. Well, Google Squared went live today and while it's a great idea, in reality the service doesn't look very useful. It doesn't look like it's going to crush anyone.

The user interface is inflexible, the data is odd looking and it's hard to imagine using Squared regularly. It's a great idea but we'll see where it goes.

Check out this example below, a Square for the search "dog breeds." It's cool that you can add major or minor medical concerns to the list of columns, but the selection of examples is really strange. The Labrador Retriever (surely the most common dog in this country) doesn't appear until you click through to #47 on the list, and German Shepherds aren't in the top 50. Call it structured data if you like; I call it a surefire recipe for making a bad dog buying decision.


All the other queries we tried were similarly "almost helpful." The dog breed example is actually unusually good. Sorting by a particular column isn't possible, when I define a content type you don't get to see it unless I share it with you, and the user experience is an odd mix of intriguing and maddening. The description fields would benefit from borrowing the first few lines of a Wikipedia article on a topic.

It is very impressive that when you request a square for a concept Google is unfamiliar with, you're prompted to offer up to five examples and then it goes out and builds the data set for you! Unfortunately, when I tried to give Squared some examples of "tech bloggers," it brought back a terrible picture of me and said that CNet's Caroline McCarthy is sixty-four years old. I'm pretty sure that's not true.

We're as excited as anyone about the future of creating structured data from the sea of information online, but Google Squared isn't very inspiring so far. We've been looking forward to it since interviewing Marissa Mayer, VP of Search Products and User Experience at Google, about Squared. When the day comes that you can slap a .xml or .csv onto the end of one of these Squared URLs and pull out data programmatically, that will be impressive.

Here's our review of Wolfram Alpha, which we said was likely to be a good service for engineers but not for anyone else. Hopefully it's still early days for all of these kinds of tools.

http://www.readwriteweb.com/archives/google_squared_is_live_who_knew_structured_data_co.php http://www.readwriteweb.com/archives/google_squared_is_live_who_knew_structured_data_co.php Data Services Wed, 03 Jun 2009 12:29:32 -0800 Marshall Kirkpatrick
Google: "We're Not Doing a Good Job with Structured Data"

During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google's Alon Halevy admitted that the search giant has "not been doing a good job" presenting the structured data found on the web to its users. By "structured data," Halevy was referring to the databases of the "deep web" - those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.

Google's Deep Web Search

Halevy, who heads the "Deep Web" search initiative at Google, described the "Shallow Web" as containing about 5 million web pages while the "Deep Web" is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google's automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web - dubbed "vertical searching" - Halevy also referenced two other types of Deep Web Search: semantic search and product search.

Google wants to also be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.

Can Google Dig into the Deep Web?

The question that remains is whether or not Google's current search engine technology is going to be adept at doing all the different types of Deep Web indexing or if they will need to come up with something new. As of now, Google uses the Bigtable database and the MapReduce framework for everything search related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data-level synonym discovery. These challenges are addressed by Infobright's technology, said Esterkin, but "Google will have to solve these problems the hard way."

Also mentioned during the speech was how Google plans to organize "aspects" of search queries. The company wants to be able to separate exploratory queries (e.g., "Vietnam travel") from ones where a user is in search of a particular fact ("Vietnam population"). The former query should deliver information about visa requirements, weather and tour packages, etc. In a way, this is like what the search service offered by Kosmix is doing. But Google wants to go further, said Halevy. "Kosmix will give you an 'aspect,' but it's attached to an information source. In our case, all the aspects might be just Web search results, but we'd organize them differently."

Yahoo Working on Similar Structured Data Retrieval

The challenges facing Google today are also being addressed by their nearest competitor in search, Yahoo. In December, Yahoo announced that they were taking their SearchMonkey technology in-house to automate the extraction of structured information from large classes of web sites. The results of that in-house extraction technique will allow Yahoo to augment their Yahoo Search results with key information returned alongside the URLs.

In this aspect of web search, it's clear that no single company has yet come to dominate. However, even if a non-Google company surges ahead, it may not be enough to get people to switch engines. Today, "Google" has become synonymous with web search, just like "Kleenex" is a tissue, "Band-Aid" is an adhesive bandage, and "Xerox" is a way to make photocopies. Once that psychological mark has been made on our collective psyche and the habit formed, people tend to stick with what they know, regardless of who does it better. That's a bit troublesome - if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until such point as Google either duplicates or acquires the invention.

Still, it's far too soon to write Google off yet. They clearly have a lead when it comes to search and that came from hard work, incredibly smart people, and innovative technical achievements. No doubt they can figure out this Deep Web thing, too. (We hope).

http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php Trends Mon, 02 Feb 2009 07:32:07 -0800 Sarah Perez
Semantic Tagging with Faviki

Faviki is a new social bookmarking tool that offers something that services like Ma.gnolia, del.icio.us, and Diigo do not - semantic tagging capabilities. What this means is that instead of having users haphazardly enter tags to describe the links they save, Faviki suggests tags for them to use. However, unlike other services, Faviki's suggestions don't just come from a community of users and their tagging history, but from structured information extracted straight out of the Wikipedia database.

About Faviki

Faviki's backend uses DBpedia, a community-maintained database created by extracting structured info from Wikipedia and turning that into a database which you can query. (You can read our previous coverage on DBpedia here).

This means that instead of just being words, the tags in this data model become references to objects which are categorized automatically. An example from the Faviki blog uses the tag "Coca-Cola." An item you tagged with this concept would actually reference the unique URL http://dbpedia.org/data/Coca-Cola (the tag is the last part of that URL). Under other tagging systems, the same item may have been tagged with cocacola, coca-cola, coca+cola, CocaCola, but in Faviki, it's simply "Coca-Cola." And because the tag structure already emanates from the largest collection of concepts in the world - Wikipedia - the format is already standardized and agreed upon by the community.
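The difference between a free-text tag and a concept reference can be sketched in a few lines. This is an illustration only, not Faviki's implementation: the lookup table and function names are hypothetical, and the URL prefix is simply the form quoted above.

```python
import re

# URL form quoted in the article; the tag is the last part of the URL.
DBPEDIA_PREFIX = "http://dbpedia.org/data/"

# Hypothetical lookup table: squashed spelling -> canonical concept label.
# A real service would draw these labels from DBpedia rather than hard-code them.
CONCEPTS = {"cocacola": "Coca-Cola"}


def squash(tag):
    """Reduce a free-text tag to lowercase letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def concept_url(tag):
    """Return the concept URL for a free-text tag, or None if it is unknown."""
    label = CONCEPTS.get(squash(tag))
    return DBPEDIA_PREFIX + label if label else None


for variant in ["cocacola", "coca-cola", "coca+cola", "CocaCola"]:
    print(variant, "->", concept_url(variant))
```

All four spellings resolve to the same URL, so anything attached to that concept is shared by every bookmark that uses it, however the user would have typed the tag.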

Using Faviki

Despite Faviki's lofty goals, it's just as easy to use as any other bookmarking service. Once you sign up, you can install a browser bookmarklet which you can use to save links and tag them. You can also search your tags or click through the site's tag cloud to view some of the most popular saved links from the Faviki community.

A Search on Faviki

Unfortunately, there is no way to import your bookmark collection from another service. This is probably because doing so would necessitate completely re-tagging every link - that would certainly require too much effort on the part of a user if it were a manual process, and I imagine it's also difficult to create a service that would automatically scan each link and tag it appropriately. However, without this option, it will be hard to get users to completely switch over from whatever service they are using now.

What Problem Faviki Solves

Because Faviki uses structured tagging, there is more that can be learned about a particular tag, its properties, and its connections to other tags. The system will automatically know what tags belong together and how they relate to others.

There has been a lot of discussion around this topic lately. At the recent Next Web conference in Amsterdam, Nova Spivack, the founder of Twine, predicted that over the next 10-15 years, tags will play an increasingly important role in the structure of the web, while keywords disappear.

If that turns out to be true, then Faviki represents a big step in that direction by offering a transitional service between social bookmarking and a purely semantic-based bookmarking service that would automatically know how to tag any content saved by discovering the semantic aspects already associated with that web page.

http://www.readwriteweb.com/archives/semantic_tagging_with_faviki.php http://www.readwriteweb.com/archives/semantic_tagging_with_faviki.php Product Reviews Mon, 26 May 2008 10:33:12 -0800 Sarah Perez