structured data - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/structured data en Copyright 2009 Richard MacManus readwriteweb@gmail.com Sun, 22 Nov 2009 19:36:29 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss

Factual Makes Publishing Open Data Easy

Factual, a new open data project founded by Gilad Elbaz, just launched its public beta today. Elbaz's last company, Applied Semantics, was acquired by Google in 2003 and became one of the core components of the search giant's AdSense contextual advertising product. Factual, which is mostly geared towards developers, is somewhat similar to Freebase, though Factual allows for a more free-form approach to building a database than Freebase. Factual provides users and developers with tools to create, contribute and mash up open data on any subject.

Factual also announced that Esther Dyson has joined the company's board of advisors.

For now, Factual offers only a relatively small repository of databases. The company's current focus is on getting more developers to use its service and on bringing as much data as possible into the system.

Getting Data into Factual

To enter data, users can type it in field by field, which is tedious, or upload spreadsheets in most of the standard formats. The service also provides a number of easier ways to import data. You can, for example, give Factual the URL of any website or Wikipedia page that includes tables, and the service will automatically create a new table based on this data. We tried this with tables from a number of sites and it generally worked well and only required a few edits. For advanced users, Factual also includes a number of more sophisticated extraction tools.
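To make the table-import idea concrete, here is a minimal sketch of pulling rows out of an HTML table using only Python's standard library. It is not Factual's importer, whose internals the article does not describe; the class name, the helper function and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every table row in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page; a real import would fetch the HTML from a URL.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Cafe A</td><td>Springfield</td></tr>
  <tr><td>Cafe B</td><td>Shelbyville</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

A sketch like this treats the first row as the header and ignores nested tables and merged cells, which is probably where the "few edits" mentioned above come in.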

Once the data is available on Factual, developers can use the API to read, write and mash up this data in any form they like. Users can also edit tables directly on the site or through an embedded table. In addition, users can mash up and combine existing tables.

Currently, Factual only offers one relatively basic embeddable widget that can only display the table without any graphical embellishments. The company plans to rely on developers to create other ways to access and display the data available on the service.

Not a Wiki

While Factual allows any user to make changes to the database, Factual's model is slightly different from the standard wiki approach where only the last edit is generally visible to the public. Changes made to a fact in a Factual database are more like votes for a certain entry. If three users or data sources say a restaurant doesn't offer vegetarian food, for example, and one user says it does, then the table will display the fact that the majority of users entered. Factual, however, will also display a question mark next to this disputed entry. Users can click on this question mark to see all the editors and data sources.
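The voting model described above can be sketched in a few lines of Python. This is only an illustration of the majority-wins idea with a "disputed" flag; the function name and the tie-breaking behaviour are assumptions, since the article does not say how Factual actually weighs users against data sources.

```python
from collections import Counter


def resolve_fact(votes):
    """Return (displayed_value, disputed) for one table cell.

    votes: list of values submitted by users or data sources.
    The most common value is displayed. The cell is flagged as disputed
    (the "question mark") when at least one vote disagrees with it.
    Ties go to the value that was submitted first (an assumption).
    """
    if not votes:
        return None, False
    counts = Counter(votes)
    value, _ = counts.most_common(1)[0]
    disputed = len(counts) > 1
    return value, disputed


# The restaurant example: three sources say "no", one says "yes".
displayed, disputed = resolve_fact(["no", "no", "no", "yes"])
print(displayed, disputed)
```

Keeping every vote, rather than overwriting the cell, is what lets the site show all editors and sources behind a disputed entry.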

Factual will also try to weed out spam here, though given how new the service is, it's hard to evaluate how effective its spam filters are.

License

Users who enter data into a Factual database do not automatically give up their copyright - though given that Factual focuses on facts, which typically can't be copyrighted anyway, this shouldn't be too much of a problem. Users can, however, choose an open license for their work, which might be necessary if the table they used to seed their database was licensed under a Creative Commons license, for example. Factual's FAQ explains this issue in greater detail.

Would You Use an Open Data Service?

As for why businesses would open up their data, Gilad Elbaz told us yesterday that he believes open data could eventually go the way of open source, which also had a hard time gaining acceptance among businesses. While open source software is a tool that a lot of companies now use, data is usually at the heart of a company's products, and it remains to be seen how many companies would really want to put their data into an open database. For now, we mostly expect non-profits and government organizations to make use of this service.

http://www.readwriteweb.com/archives/factual_makes_publishing_open_data_easy.php http://www.readwriteweb.com/archives/factual_makes_publishing_open_data_easy.php News Tue, 13 Oct 2009 05:00:00 -0800 Frederic Lardinois
Google Squared Gets Some Much Needed Improvements

Google Squared launched to a lot of hype earlier this year, but the initial reaction from most pundits was rather negative. Squared, which gathers and displays structured data, often returned rather nonsensical results, and we would venture to guess that only a few people are actually using it now. Today, Google announced some updates to Squared that should make it more useful. A search on Squared can now return up to 120 facts, up from 30 in the initial release.

As Google points out, a search for US presidents, for example, initially returned a table with only five presidents and three categories. Now, however, this table includes data on 20 presidents and lists up to six attributes. Squared also now gives users the option to sort columns - a feature that was sorely lacking in the first iteration of this product.


Squared is now more selective about the data it includes. And it also learns from edits and corrections that users make.

New: Export Data to Google Spreadsheets and CSV Files

In addition, Google gives users the option to export data to a Google Spreadsheet or a CSV file. This should make it a lot easier to actually do something interesting with this data. As an example, Google explains how to build a list of African countries and then create a scatter plot that examines the relationship between GDP and literacy rate in these countries.
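As a rough sketch of what "doing something interesting" with an exported CSV could look like, the Python below reads two numeric columns and computes their Pearson correlation, the number a scatter plot of GDP against literacy would visualize. The column names and every value in the sample are placeholders invented for illustration. They are not real statistics and not the actual layout of a Squared export.

```python
import csv
import io
import math


def load_columns(csv_text, x_col, y_col):
    """Read two numeric columns, skipping rows with missing or bad values."""
    xs, ys = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            x = float(row[x_col])
            y = float(row[y_col])
        except (KeyError, ValueError, TypeError):
            continue  # blank cell, non-numeric text, or missing column
        xs.append(x)
        ys.append(y)
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder rows (made-up values, hypothetical column names).
SAMPLE_CSV = """country,gdp_per_capita,literacy_rate
Country A,1000,50
Country B,2000,60
Country C,,70
Country D,4000,80
"""

gdp, literacy = load_columns(SAMPLE_CSV, "gdp_per_capita", "literacy_rate")
print(len(gdp), pearson(gdp, literacy))
```

Skipping incomplete rows matters here, because automatically gathered tables tend to have blank or odd cells.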

Will You Give it a Second Try?

Overall, the data that Google Squared now returns does indeed look more accurate than in earlier versions, though some results are still rather strange (to be fair, this is still a Google Labs product). We do wonder how useful a service like this really is. Are you likely to head over to Google Squared for research? Would you trust its results?

http://www.readwriteweb.com/archives/google_squared_gets_some_much_needed_improvements.php http://www.readwriteweb.com/archives/google_squared_gets_some_much_needed_improvements.php Google Fri, 09 Oct 2009 11:43:07 -0800 Frederic Lardinois
ReadWriteWeb's Top 5 Web Trends of 2009

Last week we ran a series of posts outlining the 5 biggest Internet trends of this year: Structured Data, Real-Time Web, Personalization, Mobile Web / Augmented Reality, Internet of Things. Effectively this was ReadWriteWeb's State of the Web 2009.

We've now compiled the main points into a single presentation, available on Slideshare and embedded below. You can view the presentation in full screen by clicking the "full" button at the bottom of the presentation. You can also download the presentation as a Powerpoint file. All of the links in the presentation are clickable, should you wish to explore a certain topic more.


  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009.php http://www.readwriteweb.com/archives/top_5_web_trends_of_2009.php Trends Mon, 14 Sep 2009 22:10:00 -0800 Richard MacManus
Top 5 Web Trends of 2009: Structured Data

This week ReadWriteWeb will run a series of posts detailing what we think are the 5 biggest, most cutting-edge Web trends to come out of 2009. We'll be posting one trend analysis per day. Then at the end of the week we'll publish a major update to our standard presentation about web technology trends.

The first major Web trend we're looking at is Structured Data. In prior presentations, this has sometimes been referred to under the umbrella term of 'Semantic Web'. However, given the way 2009 has panned out so far, it's become clear that this trend is much more than the Semantic Web. In this post, we'll analyze the developments in Structured Data this year and provide you with 3 product examples: OpenCalais, Google, Wolfram Alpha.

Web of Data, Not Documents

Tim Berners-Lee said in February this year that we're now in a Web of Data, rather than a Web of Documents. The organization that Berners-Lee heads, the W3C, has heavily promoted two key initiatives that are helping to build this Web of Data: the Semantic Web and more recently Linked Data.

However, over the past few years, we've seen that there are many other ways to structure data and enable others to build on it. The best current example is surely Twitter, whose API has historically been responsible for around 90% of Twitter's activity, via third party apps.

The basic principle of the Web of Data is still the same as what Alex Iskold articulated on ReadWriteWeb back in March 2007: "unstructured information will give way to structured information - paving the road to more intelligent computing."

Example 1: OpenCalais

Our first example product, OpenCalais, is probably the best current example of Linked Data (which is a type of structured data endorsed by W3C). Thomson Reuters, the international business and financial news giant, launched an API called OpenCalais in Feb '08. In a nutshell, OpenCalais turns unstructured HTML into semantically marked up data. It orders data into groups such as 'people,' 'places,' 'companies' and more. This way, third party applications and sites can build interesting new things from that data - one of the defining principles of Linked Data.

For a full explanation of Linked Data, read Alexander Korth's technical introduction The Web of Data: Creating Machine-Accessible Information from April 2009. I also explained the background and benefits of Linked Data in a May '09 post entitled Linked Data is Blooming: Why You Should Care.

Example 2: Google Rich Snippets

In May this year, Google added structured data to its core search, in the form of a feature called 'Rich snippets.' Essentially this feature extracts and shows useful information from web pages, by way of structured data open standards such as microformats and RDFa. On launch in May, Google invited publishers to mark up their HTML. While it will take a while for this markup to become widespread, the fact that a huge company like Google implemented it shows the increasing importance of structured data on the Web.
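To show what marked-up HTML gives a crawler to work with, here is a minimal Python sketch that reads an hCard microformat. hCard uses the root class `vcard` and property classes such as `fn`, `org`, `tel` and `url`. The parser class and the sample markup are hypothetical, and this is not how Google's Rich Snippets pipeline works internally. It also assumes well-formed markup with no unclosed void elements (such as a bare `<br>`) inside the card.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Extract a few hCard properties from elements inside class="vcard"."""

    PROPERTIES = ("fn", "org", "tel", "url")

    def __init__(self):
        super().__init__()
        self.cards = []     # one dict per hCard found
        self._card = None   # card currently being read, or None
        self._depth = 0     # element nesting depth inside the current card
        self._prop = None   # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._card is None:
            if "vcard" in classes:
                self._card = {}
                self._depth = 1
            return
        self._depth += 1
        for prop in self.PROPERTIES:
            if prop in classes:
                self._prop = prop
                break

    def handle_data(self, data):
        text = data.strip()
        if self._card is not None and self._prop and text:
            # Keep the first value seen for each property.
            self._card.setdefault(self._prop, text)

    def handle_endtag(self, tag):
        if self._card is None:
            return
        self._prop = None
        self._depth -= 1
        if self._depth == 0:  # closed the vcard root element
            self.cards.append(self._card)
            self._card = None


def parse_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


# Hypothetical publisher markup.
SAMPLE = """
<div class="vcard">
  <span class="fn">Example Bistro</span>
  <span class="org">Example Restaurants Ltd</span>
  <span class="tel">555-0100</span>
</div>
"""

cards = parse_hcards(SAMPLE)
print(cards)
```

Because the class names are standardized, the same few lines work on any site that adopts the format, which is the appeal for search engines.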

Other big companies are also heading in this direction - in particular, Yahoo was an early leader.

Example 3: Wolfram Alpha

Ever since Wolfram|Alpha's much hyped launch in May, we've been tracking this innovative product closely. It's a self-described "computational knowledge engine" and while it's not quite the Google killer some predicted, it has many potential uses.

Wolfram|Alpha has a search engine-like interface, allowing you to type natural language statements into it. But the main part of the product is the computations you can do on data. The product is premised on using and computing data. If Web 2.0 was about creating data (a.k.a. user generated content), then the next generation of the Web is all about using that data.

Conclusion

We can see from the above three examples that structured data is rapidly becoming a feature of today's Web. Companies like Thomson Reuters and Google are enabling data to be structured, and new types of products (like Wolfram|Alpha) will make use of structured data in ways we perhaps can't imagine right now.

ReadWriteWeb's Top 5 Web Trends of 2009:

  1. Structured Data
  2. The Real-Time Web
  3. Personalization
  4. Mobile Web & Augmented Reality
  5. Internet of Things
http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data.php http://www.readwriteweb.com/archives/top_5_web_trends_of_2009_structured_data.php Trends Mon, 07 Sep 2009 05:30:00 -0800 Richard MacManus
Everything You Wanted to Know About Semantic Technology, But Were Afraid to Ask (at SemTech 09)

Editor's note: we offer our long-term sponsors the opportunity to write 'Sponsor Posts' and tell their story. These posts are clearly marked as written by sponsors, but we also want them to be useful and interesting to our readers. We hope you like the posts and we encourage you to support our sponsors by trying out their products. This one is by Hakia, one of the participants in the recent 2009 Semantic Technology Conference.

Participants in the 2009 Semantic Technology Conference walked away considering fundamental questions about what is and isn't semantic technology. The relevance of this post's title will hopefully become clear by the end to those of you mischievous readers who may have stumbled upon it with other ideas. The conference was a great and well-organized affair in San Jose, California. One of the highlights was the Semantic Search Keynote panel, with all of the major players on stage (Ask, Bing, Google, Hakia, TrueKnowledge, and Yahoo!), as seen in the picture below.


Bear in mind that semantic technology can be as heavy and stifling for any audience as stem-cell research can be to high-school students. But Carla Thompson of Guidewire did a terrific job of coming up with discussion topics and moderating the panel. Everyone survived the ordeal without any sign of dozing.

Despite the positive outcome, some responses from the panelists made me wonder if we should go back to the basic question of, "What is semantic search?" Or, better yet, what isn't semantic search? Here is my list:

Structured Data

Folks, semantic technology is not structured data. A database that can, given the query "social drinking," pull up a list of beer brands, their manufacturers, and their contact information has nothing to do with semantics. Some people seem to have the impression that a search engine somehow uses semantic technology if it retrieves structured data for its results. It is a trick as old as the ancient Egyptians who used beats to organize harvesting information. Organized information is not semantic information.

Morphology

If a search engine is robust and returns the same results for the query "top ten" as it does for "top 10" (i.e. it recognizes that "ten" means "10"), calling the search engine semantic would be a stretch. Anyone could come up with a substitution list like this without a drop of linguistic knowledge. Similarly, distinguishing the name "Fisher" from the noun "fisher" by detecting the capitalization of the first letter does not go beyond the application of simple linguistic rules. These capabilities are not semantic search capabilities.
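The point about substitution lists is easy to demonstrate. The Python sketch below is a hypothetical query normalizer, with a lookup table and function name made up for illustration. It maps "top ten" and "top 10" to the same string with nothing more than a dictionary.

```python
import re

# Hand-written substitution list; no linguistic knowledge involved.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}


def normalize_query(query):
    """Lowercase the query and replace number words with digits."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(NUMBER_WORDS.get(token, token) for token in tokens)


print(normalize_query("Top Ten"))
print(normalize_query("top 10"))
```

Both calls produce the same normalized query, yet the code has no notion of what "ten" means.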

Syntax

A certain amount of semantic information can be salvaged from syntax. Unfortunately, if syntax were enough for us to detect the meaning of text, then an 8-year-old with perfect reading ability (i.e. who is able to syntactically parse strings of English-language letters) could be expected to understand the meaning of Shakespeare's works. The difference between reading and understanding is the difference between syntax and semantics. The former requires the skill to parse things out, while the latter requires a vast amount of associative knowledge.

Statistics

An infinite number of monkeys typing on an infinite number of keyboards would eventually come up with the complete text of the Declaration of Independence. This is a scientific statement; it is not a joke. However, if a search engine is expected to be semantically relevant using statistical algorithms, one would have to wait until the monkeys finished their job. Statistics have no place in semantic technology. A simple test would reveal that. For example, your brain is able to understand a unique sequence of words that you have never seen before, such as "Polar bears don't eat alligator eggs before dawn." If semantics were built on statistics, computers and algorithms would not understand this and billions of other sentences.

Scalability

Scalability is the narrow bridge between science and technology. What you can carry from science to technology over this bridge determines the level of capabilities in the real world. The science of semantics is huge and stems from the roots of philosophy. But Web search is a very particular problem with stringent constraints (a narrow bridge). Designing semantic algorithms to drive a Web search engine is like walking on egg shells and requires a completely new approach. Thus, a semantic search algorithm could be very sophisticated but still not suitable for the Web.

These five areas cover what isn't semantic search and should help readers understand the questions that emerged from the Semantic Technology Conference. Structured data, morphology, syntax, statistics, and scalability are key areas to discuss moving forward. Of course, contrary to the title of this post, no one was actually afraid of asking these questions. But if you caught the reference in the title, that was your semantic brain in action, one last example of what semantic technology is.

http://www.readwriteweb.com/archives/everything_to_know_about_semantic_technology_at_semtech_09.php http://www.readwriteweb.com/archives/everything_to_know_about_semantic_technology_at_semtech_09.php Sponsors Fri, 26 Jun 2009 05:00:18 -0800 RWW Sponsor
Google Squared is Live: Who Knew Structured Data Could Be So Unhelpful?

Three weeks ago Google demonstrated a new product in Labs called Google Squared; it's a search engine that creates structured data from big piles of information and lets users compare various things by their attributes. There have been suggestions that Google Squared will crush Wolfram Alpha. Well, Google Squared went live today and while it's a great idea, in reality the service doesn't look very useful. It doesn't look like it's going to crush anyone.

The user interface is inflexible, the data is odd looking and it's hard to imagine using Squared regularly. It's a great idea but we'll see where it goes.

Check out this example below, a Square for the search "dog breeds." It's cool that you can add major or minor medical concerns to the list of columns, but the selection of examples is really strange. The Labrador Retriever (surely the most common dog in this country) doesn't appear until you click through to #47 on the list, and German Shepherds aren't in the top 50. Call it structured data if you like; I call it a surefire recipe for making a bad dog buying decision.


All the other queries we tried were similarly "almost helpful." The dog breed example is actually unusually good. Sorting by a particular column isn't possible, a content type I define isn't visible to you unless I share it with you, and the user experience is an odd mix of intriguing and maddening. The description fields would benefit from borrowing the first few lines of a Wikipedia article on a topic.

It is very impressive that when you request a square for a concept Google is unfamiliar with, you're prompted to offer up to five examples and then it goes out and builds the data set for you! Unfortunately, when I tried to explain to Squared who some examples of "tech bloggers" were it brought back a terrible picture of me and said that CNet's Caroline McCarthy is sixty four years old. I'm pretty sure that's not true.

We're as excited as anyone about the future of creating structured data from the sea of information online, but Google Squared isn't very inspiring so far. We've been looking forward to it since interviewing Marissa Mayer, VP of Search Products and User Experience at Google, about Squared. When the day comes that you can slap a .xml or .csv on the end of one of these Squared URLs and pull out data programmatically, that will be impressive.

Here's our review of Wolfram Alpha, which we said was likely to be a good service for engineers but not for anyone else. Hopefully it's still early days for all of these kinds of tools.

http://www.readwriteweb.com/archives/google_squared_is_live_who_knew_structured_data_co.php http://www.readwriteweb.com/archives/google_squared_is_live_who_knew_structured_data_co.php Data Services Wed, 03 Jun 2009 12:29:32 -0800 Marshall Kirkpatrick
Google: "We're Not Doing a Good Job with Structured Data"

During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google's Alon Halevy admitted that the search giant has "not been doing a good job" presenting the structured data found on the web to its users. By "structured data," Halevy was referring to the databases of the "deep web" - those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.

Google's Deep Web Search

Halevy, who heads the "Deep Web" search initiative at Google, described the "Shallow Web" as containing about 5 million web pages while the "Deep Web" is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google's automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web - dubbed "vertical searching" - Halevy also referenced two other types of Deep Web Search: semantic search and product search.

Google wants to also be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.

Can Google Dig into the Deep Web?

The question that remains is whether Google's current search engine technology will be adept at all the different types of Deep Web indexing or whether the company will need to come up with something new. As of now, Google uses the Bigtable database and the MapReduce framework for everything search related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data level synonyms discovery. These challenges are addressed by Infobright's technology, said Esterkin, but "Google will have to solve these problems the hard way."

Also mentioned during the speech was how Google plans to organize "aspects" of search queries. The company wants to be able to separate exploratory queries (e.g., "Vietnam travel") from ones where a user is in search of a particular fact ("Vietnam population"). The former query should deliver information about visa requirements, weather and tour packages, etc. In a way, this is like what the search service offered by Kosmix is doing. But Google wants to go further, said Halevy. "Kosmix will give you an 'aspect,' but it's attached to an information source. In our case, all the aspects might be just Web search results, but we'd organize them differently."

Yahoo Working on Similar Structured Data Retrieval

The challenges facing Google today are also being addressed by their nearest competitor in search, Yahoo. In December, Yahoo announced that they were taking their SearchMonkey technology in-house to automate the extraction of structured information from large classes of web sites. The results of that in-house extraction technique will allow Yahoo to augment their Yahoo Search results with key information returned alongside the URLs.

In this aspect of web search, it's clear that no single company has yet come to dominate. However, even if a non-Google company surges ahead, it may not be enough to get people to switch engines. Today, "Google" has become synonymous with web search, just like "Kleenex" is a tissue, "Band-Aid" is an adhesive bandage, and "Xerox" is a way to make photocopies. Once that mark has been made on our collective psyche and the habit formed, people tend to stick with what they know, regardless of who does it better. That's a bit troublesome: if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until Google either duplicates or acquires the invention.

Still, it's far too soon to write Google off. They clearly have a lead when it comes to search and that came from hard work, incredibly smart people, and innovative technical achievements. No doubt they can figure out this Deep Web thing, too. (We hope.)

http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php Trends Mon, 02 Feb 2009 07:32:07 -0800 Sarah Perez
Semantic Tagging with Faviki

Faviki is a new social bookmarking tool that offers something that services like Ma.gnolia, del.icio.us, and Diigo do not - semantic tagging capabilities. What this means is that instead of having users haphazardly enter tags to describe the links they save, Faviki will suggest tags to be used instead. However, unlike other services, Faviki's suggestions don't just come from a community of users and their tagging history, but from structured information extracted straight out of the Wikipedia database.

About Faviki

Faviki's backend uses DBpedia, a community-maintained database created by extracting structured info from Wikipedia and turning that into a database which you can query. (You can read our previous coverage on DBpedia here).

This means that instead of just being words, the tags in this data model become references to objects which are categorized automatically. The Faviki blog cited an example using the tag "Coca-Cola." An item you tagged with this concept would actually reference the unique URL http://dbpedia.org/data/Coca-Cola (the tag is the last part of that URL). Under other tagging systems, the same item may have been tagged with cocacola, coca-cola, coca+cola, or CocaCola, but in Faviki, it's simply "Coca-Cola." And because the tag structure already comes from the largest collection of concepts in the world - Wikipedia - the format is already standardized and agreed upon by the community.
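The difference between free-form tags and concept references can be sketched in Python. The lookup table and function names below are hypothetical and stand in for whatever Faviki does against DBpedia. Only the URL pattern and the tag variants come from the example above.

```python
import re

DBPEDIA_BASE = "http://dbpedia.org/data/"

# Hypothetical lookup from a simplified key to a canonical concept name.
# A real service would query DBpedia rather than use a hard-coded table.
CANONICAL_CONCEPTS = {
    "cocacola": "Coca-Cola",
}


def tag_key(raw_tag):
    """Reduce a free-form tag to lowercase letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", raw_tag.lower())


def resolve_tag(raw_tag):
    """Return the concept URL for a free-form tag, or None if unknown."""
    concept = CANONICAL_CONCEPTS.get(tag_key(raw_tag))
    if concept is None:
        return None
    return DBPEDIA_BASE + concept


for variant in ["cocacola", "coca-cola", "coca+cola", "CocaCola"]:
    print(variant, "->", resolve_tag(variant))
```

All four spellings resolve to the same URL, so items tagged by different users end up attached to one concept instead of four unrelated strings.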

Using Faviki

Despite Faviki's lofty goals, it's just as easy to use as any other bookmarking service. Once you sign up, you can install a browser bookmarklet which you can use to save links and tag them. You can also search your tags or click through the site's tag cloud to view some of the most popular saved links from the Faviki community.

A Search on Faviki

Unfortunately, there is no way to import your bookmark collection from another service. This is probably because doing so would necessitate completely re-tagging every link. That would require too much effort on the part of a user if it were a manual process, and I imagine it's also difficult to create a service that would automatically scan each link and tag it appropriately. However, without this option, it will be hard to get users to completely switch over from whatever service they are using now.

What Problem Faviki Solves

Because Faviki uses structured tagging, there is more that can be learned about a particular tag, its properties, and its connections to other tags. The system will automatically know what tags belong together and how they relate to others.

There has been a lot of discussion around this topic lately. At the recent Next Web conference in Amsterdam, Nova Spivack, the founder of Twine, predicted that over the next 10-15 years, tags will play an increasingly important role in the structure of the web, while keywords disappear.

If that turns out to be true, then Faviki represents a big step in that direction by offering a transitional service between social bookmarking and a purely semantic-based bookmarking service that would automatically know how to tag any content saved by discovering the semantic aspects already associated with that web page.

http://www.readwriteweb.com/archives/semantic_tagging_with_faviki.php http://www.readwriteweb.com/archives/semantic_tagging_with_faviki.php Products Mon, 26 May 2008 10:33:12 -0800 Sarah Perez
Australian Museum Uses Open Calais to Tag Collection

The Powerhouse Museum of Science and Design in Sydney, Australia has begun to utilize the Reuters Open Calais API (our coverage) to tag their collection. The museum's online collection database houses some 66,303 objects, so tagging them all by hand would be quite a task. By using the Open Calais web service, the museum is able to automate much of the process.

That the museum has so much of its collection online is actually quite impressive in its own right. About 70% of the museum's electronically documented collection is online in the database which went live in June 2006. Museum objects are searchable, taggable (by humans) and painstakingly described.

However, there are so many objects that, even though users can help to tag them, many haven't yet been tagged. Sebastian Chan, who is the Manager of Web Services at the museum, told us that Open Calais is being used to complement the people-powered tagging they've had running for two years. "What Open Calais lets us do now is connect people, places and companies across our collection and has already revealed many new pathways through our dataset (navigating by designer or inventor is now much easier for example)," he said.

The automatically generated tags at right were created by the API for some swim wear designed by Speedo for the 1991 Australian swimming team that competed at the World Swimming Championships in Perth. Open Calais was correctly able to identify some important locations in the document -- Perth where the competition took place, and Sydney where Speedo is based -- as well as an important corporation (Speedo). It also picked up the name of the designer, and the name of the person who owned the suits before the museum.

However, as you can see, the API made some mistakes too -- it classified "World Championships" as a company, and mistook the general text "international swimming organisation" as an actual organized body. It missed the actual organization (FINA) and probably should have picked up the MacRae Knitting Mills company, which was a predecessor to Speedo. Further, because Open Calais is built around people, places, and companies, general information about items may be lost on it. Tags that would be obvious to humans, such as swimming, swim wear, Olympics, or the year 1991, are beyond the scope of Open Calais.
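One way a site could work with this kind of output is to group the extracted entities by type and apply manual corrections on top. The Python sketch below uses the entities named in the swimwear example. The tuple format, the function name and the "event" label are assumptions for illustration and do not reflect the actual Open Calais response format.

```python
from collections import defaultdict


def group_by_type(entities, corrections=None):
    """Group (name, type) pairs by type, applying optional overrides.

    corrections maps an entity name to a corrected type, or to None
    to drop the tag entirely.
    """
    corrections = corrections or {}
    grouped = defaultdict(list)
    for name, entity_type in entities:
        fixed_type = corrections.get(name, entity_type)
        if fixed_type is None:
            continue  # a curator chose to discard this tag
        grouped[fixed_type].append(name)
    return dict(grouped)


# Entities as described in the example, including the misclassification.
extracted = [
    ("Perth", "place"),
    ("Sydney", "place"),
    ("Speedo", "company"),
    ("World Championships", "company"),  # wrong: not a company
]

# A curator's override; "event" is a hypothetical type label.
curated = group_by_type(extracted, {"World Championships": "event"})
print(curated)
```

Layering human corrections over automatic tags fits the museum's stated aim of using the API alongside people-powered tagging, not as a replacement for it.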

"These errors and others like them reveal Open Calais' history as Clearforest in the business world," said Chan. "The rules it applies when parsing text as well as the entities that it is 'aware' of are rooted in the language of enterprise, finance and commerce." On the other hand, according to Chan, the technology has already revealed "many new connections between objects," even though it has so far been deployed only very sparingly across the collection.

Powerhouse's use of Open Calais may be the first large scale deployment of the technology across a large public data set. It will be interesting to see the results as they evolve. "It is important to remember that there is no way that this structured data could be generated manually - the volume of legacy data is too great and the burden on curatorial and cataloguing staff would be too great," Chan noted.

http://www.readwriteweb.com/archives/australian_museum_uses_open_calais.php http://www.readwriteweb.com/archives/australian_museum_uses_open_calais.php Trends Tue, 01 Apr 2008 16:45:34 -0800 Josh Catone
Yahoo! Pushes 26.5 Million Microformats Into the Wild

It was just a couple of weeks ago that Yahoo! announced that it would begin indexing semantic markup such as microformats in its search engine. That's a huge win for the bottom-up approach to building the Semantic Web, and provides an incentive for publishers to start adopting semantic markup like RDF and microformats. As a publisher, Yahoo! is also eating its own dogfood, so to speak, and putting microformats to use on its own sites.

Yesterday, Yahoo! announced that it had begun using microformats on its European shopping search engine Kelkoo. Specifically, Yahoo! Europe pushed out the biggest deployment yet of the draft hListing format, which is a new format used for marking up classifieds listings.

The actual number of hListings Yahoo! put out there was 26,456,448, as well as an additional 6,500 hCard listings describing merchants. "This bumper injection of structured data into Kelkoo’s pages makes it ripe for re-use, be that browser extensions to draw out product information on our pages, indexing services aggregating product listings together or mashing up the data for reuse in widgets," said developer Ben Ward of Yahoo! Europe.

Ward also indicated that Yahoo! hoped that other sites would adopt the hListing microformat. "After years of waiting for technology to move the web forward, it’s happening. There’s information out there now to pull off functionality we never had before. As web developers, there’s little to do but slip in microformatted mark-up wherever we can, and start having fun in consuming it," he said.

http://www.readwriteweb.com/archives/yahoo_kelkoo_microformats.php http://www.readwriteweb.com/archives/yahoo_kelkoo_microformats.php Products Fri, 28 Mar 2008 10:59:40 -0800 Josh Catone