open data - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/open data en Copyright 2012 Richard MacManus readwriteweb@gmail.com Mon, 13 Feb 2012 17:00:00 -0800 http://www.sixapart.com/movabletype/?v=4.35-en http://blogs.law.harvard.edu/tech/rss

How the DC's Metro Opened Up Its Data

Three years ago, the Washington Metropolitan Area Transit Authority looked lost, and so did many of its riders.

Those who hadn't memorized Metro's schedules had to employ its persnickety Trip Planner, a clunky Web form that not only won't let you click on a map to specify your location but also chokes on cities, states, Zip codes and even commas if you add them to a street address. Meanwhile, other U.S. cities had enjoyed transit directions from sites like Google Maps since at least 2005. But not DC.

Worse yet, after taking the first step toward sharing schedules, converting data to the standard General Transit Feed Specification format, Metro had halted the effort. In December 2008, a spokesman told the urban-development blog Greater Greater Washington that continuing it was "not in our best interest from a business perspective." That left Metro riders with kludgy, screen-scraping workarounds like DCist's text-messaging service.
But now, Metro rail and bus directions are clicks or taps away in third-party sites and applications, allowing passengers to benefit from such innovations as Google's stop-by-stop transit navigation on Android phones. The changes in between suggest a road map for other organizations having their own open-data debates.

Start lobbying to open a conversation. Greater Greater Washington editor David Alpert quickly had hundreds of signatures on a petition protesting the decision. That persuaded Metro to detail its objections: fear of losing $68,000 in yearly ad revenue from the Trip Planner page, a wish to be paid for its data, and the legalese around data sharing. That then widened the discussion from an argument over APIs to one over the proper use of taxpayer dollars.

"David came along and started this campaign," said Christopher Zimmerman, an Arlington County Board member who chaired Metro's board in 2008. "It wouldn't have happened without the public pressure." Alpert could also lobby Metro from closer in after joining its Riders' Advisory Council in January of 2009.

Rob Pegoraro worked for more than a decade covering technology for the Washington Post. His blog can be found here.

Flip the debate from potential profits to actual expenses. Gordon Linton, a former head of the Federal Transit Administration who served alongside Zimmerman on the Metro board, focused on the opportunity cost of giving data to other sites. He said the agency had given away resources - for example, free parking for car-sharing services - that it later realized could yield income.

"While we were raising fares and cutting service if we were investing staff time and energy for a product that would reap some financial benefit for those who would use it and sell it, then we in turn should get some money back," Linton said.

Zimmerman made the opposite argument: the ease of upgrading one aspect of the Metro experience. "We were having a lot of difficulties," he said. "If some of these things don't cost us anything or don't cost us a lot [to fix], we should do them right away."
Alpert suggested this debate encouraged Metro staffers to rethink things. "Staff may have felt they were under orders from the board to maximize revenue. Zimmerman gave them permission not to worry about that."

Oh, and Google never had any interest in paying for a schedule feed. Wrote spokeswoman Anne Espiritu: "We do not pay agencies for their data."

A change in leadership can help. All of this effort got sidetracked on June 22, 2009, when two Red Line trains collided and killed nine passengers. Things were set back further when general manager John Catoe unexpectedly resigned in early 2010 and his interim replacement, Richard Sarles, had to focus on safety upgrades.

But Sarles' earlier employer, NJ Transit, had provided rail schedules to Google back in 2008. He wanted to follow suit here, said Metro spokesman Dan Stessel.

Metro and Google signed a data-sharing agreement in July of 2010, once Google had dropped earlier demands for an indemnification clause. Metro directions showed up on Microsoft's Bing Maps in September 2010; they arrived on Google in May after a final shove from Sarles following his appointment as WMATA's full-time general manager in January.

(But even now, you must return to Metro's sites or a few third-party apps for bus and train arrival predictions. Google only announced a standard format for that data, GTFS-realtime, in August, and Metro is still weighing support for that. And some regional bus systems that Metro's Trip Planner includes have yet to provide their own GTFS data to mapping sites.)
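
For readers who haven't seen one, a GTFS feed is a ZIP archive of plain CSV text files; one of them, stops.txt, lists each stop with an ID, a name and coordinates. Here is a minimal sketch of reading such a file in Python. The field names follow the GTFS specification, but the sample rows and coordinates are invented for illustration and are not taken from Metro's feed.

```python
import csv
import io

# Hypothetical sample shaped like a GTFS stops.txt file.
# Field names come from the GTFS specification; the rows are invented.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Station A,38.8983,-77.0281
S2,Example Station B,38.9075,-77.0722
"""


def load_stops(text):
    """Parse stops.txt content into a dict keyed by stop_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["stop_id"]: (
            row["stop_name"],
            float(row["stop_lat"]),
            float(row["stop_lon"]),
        )
        for row in reader
    }


stops = load_stops(SAMPLE_STOPS_TXT)
for stop_id, (name, lat, lon) in stops.items():
    print(stop_id, name, lat, lon)
```

Because the files are ordinary CSV, any mapping site or hobbyist developer can consume them with standard tools, which is much of the format's appeal.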

What about the original financial arguments? We may never know how the math worked out: without detailed surveys, you can't draw a line from clicking on maps to boarding trains. But Stessel, who didn't provide a dollar cost for the work involved, suggested that the rationale merchants invoke to invest in intangibles like store or site designs works for public transit too: "Our primary motivation is improving the customer experience." Zimmerman had a more philosophical justification for how a government agency should act: "Putting information in the public domain is part of what we do."

Sometimes, being open isn't easy. You could say that there were many stops on the route that Metro took.

http://www.readwriteweb.com/archives/three_years_ago_the_washington.php http://www.readwriteweb.com/archives/three_years_ago_the_washington.php Data Portability Fri, 18 Nov 2011 03:00:00 -0800 Rob Pegoraro
On Data Markets and Their Evolution

More than two years after President Obama's memorandum on his open government initiative, thousands of public authorities and organizations worldwide have embraced the main idea behind it. Opening up data and making them publicly available on the Web has been recognized as a key to fostering transparency and collaboration within public administrations and with citizens.

From census data to cadastral maps, every day a new data set pops up on the Web, as a quick glance at the #opendata hashtag on Twitter shows.

Discovering and consuming open data

Davide Palmisano is a Semantic Web software engineer based in the emerging Silicon Roundabout of London. Davide is an open data enthusiast whose highest ambition is to speed up the rise of a new data economy. He is founder and CEO of Smartetics.

Since the open data movement has shown no signs of slowing, several hubs, or data markets, have been launched. This was a direct consequence of the need for ways to search across all the different data sets. According to a broad definition, data markets are platforms where users can search, browse and discover new data sets to fulfill their needs. The added value they bring varies according to the functionality they offer, making them something more than a simple vertical search engine.

For example, the Icelandic startup Datamarket.com provides a full-featured set of tools for visualizing data. Time series can be plotted with several different types of charts, and users can annotate them with dates drawn from the Guardian archive. The result is a handy way to make explicit the correlations between trends and historical events. End users can also access the data through REST APIs or export them as CSV or XML. Links to diagrams can even be shared on Twitter or Facebook, making Datamarket a fancy and pragmatic tool.

Factual, another platform, which recently raised $25 million in Series A funding, impresses mainly with its ability to join data sets. Different data sets are represented as different tables, somewhat like a relational database, on which end users can perform projections, selections and joins over table fields. Applications can then be built on top of the aggregated data and the result embedded in a third-party website.
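
To make the three relational operations concrete, here is a minimal sketch in plain Python. It is not Factual's API; the tables, field names and helper functions are all invented for illustration. Selection keeps the rows that match a condition, projection keeps only some fields, and a join combines rows from two tables that share a key.

```python
# Two invented tables, each a list of rows (dicts).
restaurants = [
    {"id": 1, "name": "Cafe Uno", "city_id": 10},
    {"id": 2, "name": "Bistro Due", "city_id": 20},
    {"id": 3, "name": "Trattoria Tre", "city_id": 10},
]
cities = [
    {"city_id": 10, "city": "London"},
    {"city_id": 20, "city": "Reykjavik"},
]


def select(rows, predicate):
    """Selection: keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def project(rows, fields):
    """Projection: keep only the named fields of each row."""
    return [{field: row[field] for field in fields} for row in rows]


def join(left, right, key):
    """Inner join: merge rows from both tables that share the same key value."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joined = join(restaurants, cities, "city_id")
in_london = select(joined, lambda row: row["city"] == "London")
result = project(in_london, ["name", "city"])
print(result)
```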

CKAN Data Hub is a remarkable initiative led by the Open Knowledge Foundation. It is probably the largest hub in terms of indexed data sets. Released as an open source project, it offers API access to search and browse the index, but it's not equipped with an explicit mechanism to directly manipulate the indexed data. What strongly differentiates CKAN from the others, however, is the emphasis it puts on data licensing: every data set can be published using any of a number of open licenses. Most of them are directly endorsed by the Open Knowledge Foundation, a group that is playing a leading role in this field.

The last goodie is Talis's Kasabi, which was demoed at the last Semantic Technology Conference in San Francisco. Kasabi offers interesting innovations powered by a pragmatic use of Semantic Web technologies. For any given data set, users can design their own REST APIs and republish them on the market, hinting at a forthcoming revenue model. What puts Kasabi one step ahead of the others is the powerful mechanism SPARQL provides to slice, select and remix the data. Once a query has been defined, the user can completely customize the response using an XSLT transformation. From a certain perspective, Kasabi can be seen as an engineered showcase for the potential of the entire Semantic Web technology stack.

Following the 5 stars

Even if those platforms, and other well-known products such as Microsoft Azure or Infochimps, are concretely sustaining the trend toward opening up data, there are still obstacles to harmonious and integrated data consumption. Data publication techniques, for example, vary from simple database exports as CSV files to sophisticated Semantic Web-powered platforms, such as data.gov.uk. This heterogeneity raises development costs and acts as a barrier to enterprise reuse of the data. Those costs come from all the implementation tasks needed to access the data, make them coherent with a specific application domain, curate them and, finally, generate business value from their use.

Even if some markets are facing the need to standardize their internal representations, there's still a lack of Web-wide integration among different data sets. It's nearly impossible to link and access different data sets published on different markets.

In addition, even when data are published directly on the Web, they do not really benefit from the Web model. This shortcoming was recently pointed out by Tim Berners-Lee, who proposed the 5 stars of Linked Open Data: a handy way to judge data quality with regard to the license and the syndication technology used to expose them. The Linked Data paradigm is seen as a key to tackling the main issues related to data integration. The "webby way to link data" envisions unique URI-referenced entities linked together, machine-readable representations and open licenses as the foundational ingredients for achieving web-scale open data consumption.
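
As a rough illustration, the five stars are cumulative: a data set earns the next star only if it already meets all the earlier criteria (open license, machine-readable structure, non-proprietary format, URIs to identify things, links to other data). The sketch below encodes that rule; the `Dataset` class and its field names are hypothetical shorthand for the criteria, not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per criterion of the 5-star scheme, in order.
    open_license: bool            # 1 star: on the Web under an open license
    machine_readable: bool        # 2 stars: structured data, not a scanned image
    non_proprietary_format: bool  # 3 stars: e.g. CSV rather than Excel
    uses_uris: bool               # 4 stars: URIs identify things (RDF and friends)
    links_to_other_data: bool     # 5 stars: links out to other people's data


def star_rating(ds):
    """Count stars cumulatively, stopping at the first unmet criterion."""
    criteria = [
        ds.open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV export: structured and non-proprietary, but no URIs or links.
csv_export = Dataset(True, True, True, False, False)
print(star_rating(csv_export))
```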

Datamarkets: quarterbacks in the emerging data economy

We can reasonably expect that data markets, or whatever we'd like to call them, will play a prominent role in the emerging data economy. Once the revenue models for data publishers are clearly defined and accepted, and once a critical mass of 5-star quality interlinked data sets becomes available, a new wealth of opportunities for developers will emerge. In some sense, a virtuous revenue model should encourage big owners to open their data sets, consumers to offer their APIs with flexible pay-by-use fees and the various markets to compete on the value-added services they will be able to provide. The mission is all about building an ecosystem, rather than merely developing vertical search engines.

http://www.readwriteweb.com/archives/on_data_markets_and_their_evolution.php http://www.readwriteweb.com/archives/on_data_markets_and_their_evolution.php Data Services Wed, 20 Jul 2011 12:30:00 -0800 Davide Palmisano
HistoryPin Links Past, Present, Place, & Photos in a Powerful New Location App

There's an exhibit on display at the Museum of the City of New York currently, a series of photographs that chronicle some of the history of food carts in the Big Apple. It's an interesting retrospective, a way to think about the "then and now" - the immigrant experience, our changing (and unchanging) dietary habits, the history of New York.

The exhibit made for a great backdrop this evening for the official launch party for HistoryPin, a website that aims to link our personal family histories and photographic records with a larger story and to pin those photos and stories to Google Maps.

Who was here, what was here before us? Where were our families? What did their world look like? What artifacts remain, and how can we connect these cultural remnants to people and places today? These questions can uncover a wealth of information of both personal and cultural value.

HistoryPin has been in beta for a year, a creation of the U.K. non-profit We Are What We Do. HistoryPin officially launches today with the release of its Android app. (An iPhone app is on its way.) The site and the app let you view the history of a particular location, by taking historical photos and pinning them, as the name suggests, to Google Maps. You can also contribute your own photos - both present-day and family heritage photos - to the site.

The app takes full advantage of many of the features of Google Maps, including not just pins but Street View and timelines. Using HistoryPin, you can search for photos, as well as for video and audio content, by place or by time. Historic images can be overlaid onto contemporary views of a particular location so you can see what happened there and what has changed.

But "it isn't just about the tech," says Jesse Friedman, product manager for Google Earth and Google Maps, speaking at tonight's launch event. "It's about the stories."

Pinning Family History, Community History, National History

Stories were what motivated HistoryPin co-founder Nick Stanhope to undertake the project, as he described the time spent with his Gran going through old family photos and listening to her stories. "How can we use history to start these sorts of conversations?" wondered Stanhope, to build relationships and to encourage "little bits of understanding."

That understanding is important on a personal level, on a neighborhood level, and on a national level, as another speaker at this evening's launch event pointed out. Community activist Martin Luther King III, son of the civil rights leader, spoke about the importance of sharing family history "as a universal experience."

Sharing Photos, Sharing Places: The Importance of Open Data

The link between culture, memory, and justice was reiterated by the final speaker at tonight's event, Harvard professor Lawrence Lessig, who spoke about the importance of sharing openly licensed photos rather than locking down content under regulations and licenses that deny the sorts of insights about the past that HistoryPin can uncover.

For his part, Stanhope stresses the importance of open and linked data as part of HistoryPin's project, making it possible for people and archives to share their photographs and for others in turn to reuse and remix that content for non-commercial purposes. (I've written about linked open data, location, and archival photographs before in my coverage of LookBackMaps, whose founder Jon Voss has joined HistoryPin.) Stanhope contends that this openness will enable the project to expand beyond just the "history buffs" to include the stories and the photos that we all have tucked away in our personal and family archives.

Indeed, the open licensing might just do that. But to echo what Google's Friedman said tonight: it's not just the tech or the licensing that's most powerful about HistoryPin. It's the stories.

http://www.readwriteweb.com/archives/historypin_links_past_present_place_photos_in_a_po.php http://www.readwriteweb.com/archives/historypin_links_past_present_place_photos_in_a_po.php Location Mon, 11 Jul 2011 19:16:12 -0800 Audrey Watters
Search & Display Over 10 Million Historical Government Records, Thanks to the National Archives

The National Archives and Records Administration launches an Online Public Access prototype today, making available to the public millions of digitized government records. The effort is part of the National Archives' plan to provide better online services and better access to historical government documents.

The Online Public Access prototype provides access to and information about the National Archives' records. It is a centralized means to search and display information from multiple National Archives resources. "People have asked us for a Google-like search," says the National Archives' Pam Wright, "which I think this really provides."

Currently, the prototype contains all the data from the Archival Research Catalog and several series from the Access to Archival Databases - around 10.9 million electronic records. In addition, the new search engine provides access to 1 million records from the Electronic Records Archives, which aren't available elsewhere online.

It isn't simply the breadth of the collection that makes this a great tool; it's the presentation of the information. The digital copy of the item is large and central, and all the pertinent catalog information is also easy to read.

The National Archives says it plans to add functionality in the coming year, including the ability to zoom into images and pan through the archives' holdings.

http://www.readwriteweb.com/archives/search_display_over_10_million_historical_governme.php http://www.readwriteweb.com/archives/search_display_over_10_million_historical_governme.php Government Mon, 27 Dec 2010 12:01:27 -0800 Audrey Watters
U.S. Congress Comes to Android

A mobile application which connects Android phone owners to their representatives in the U.S. Congress has just been released by the non-profit, non-partisan organization Sunlight Labs, a group dedicated to government transparency. After months of public beta testing, the newly finished application is now a comprehensive toolset that helps you stay on top of congressional activity, voting records, new bills and laws, and more. It even provides one-touch access to your Congressional representatives, allowing you to call their office directly from within the application, watch their YouTube videos or read their latest updates on the microblogging social network, Twitter.

Different from the iPhone Version

The Android application is similar in some ways to its iPhone counterpart, Real Time Congress, released at the beginning of the year. Like the Apple version, the Android app makes it easy to see what's happening inside Congress in a timely fashion.

However, unlike the iPhone app, the Android version offers a greater focus on your representatives and their activity. This is something which iPhone users already had access to, explained Sunlight Labs' Clay Johnson back in January: there are "at least a half-dozen" third-party applications for iPhone that do the same, he said. But in the Android Market, there's only the one: Congress.

Congress: App Details

From the app's main screen, Android users can enter their location, either by tapping into the phone's GPS or by manually entering a state or ZIP code. Search functions for finding a particular representative or committee are also present and, at the top, there are sections for tracking votes and nominations.

Each representative has an easy-to-use profile page where their office's phone number is prominently featured. Here, you're also one tap away from voting records, sponsored bills, committee details, news articles, Twitter updates and YouTube videos, assuming your rep participates on social media. The rep's own webpage is also linked by way of an icon found next to their profile picture.

Government in Your Pocket

For mainstream users who don't try software in beta (aka "we're still testing it") format, Congress for Android may be their first peek into the power of mobile combined with the power of open data, specifically open governmental data. The application was built using the Sunlight Congress API and GovTrack.us, the former a tool to programmatically access basic information on members of Congress, and the latter a civic project for tracking Congressional activity.

Like all Sunlight projects, Congress is open source software, meaning other developers can view and reuse the code, which is stored on GitHub.

Since the app's launch into public beta late last year, over 250,000 Android owners have downloaded it. Now that the app has officially and publicly launched, that number is sure to rise.

In the future, the app will be updated to support real-time notifications and other "exciting features," says Sunlight Labs. Those interested in downloading the app can do so now from the Android Market: just search for "Congress."

http://www.readwriteweb.com/archives/us_congress_comes_to_android.php http://www.readwriteweb.com/archives/us_congress_comes_to_android.php Government Thu, 29 Jul 2010 07:01:27 -0800 Sarah Perez
The Modigliani Test for Linked Data: Results

In a recent post, I outlined a kind of layman's test for the Semantic Web. I wrote that the tipping point for the Semantic Web may be when anyone can query a set of data about a historical figure and get a long list of structured results in return. I called this 'The Modigliani Test,' after my favorite artist Amedeo Modigliani. To pass this test, you must deliver - using Linked Data - a comprehensive list of locations of original Modigliani art works around the world.

A developer named Atanas Kiryakov gave the test a good crack. In doing so, he illustrated the core issues facing the Semantic Web currently.

The challenge of this test is that there isn't currently enough linked data on the Web about Modigliani. Also, the key data in this test is the locations of art works, which probably isn't one of the main data fields for art data when it's uploaded to the Web (artist name and art work title would be the two key data fields).

Kiryakov wasn't the only person who attempted to pass the test, and in fact his results mirror what can already be found on the popular open database Freebase. However, Kiryakov, who is the Executive Director of Bulgarian Semantic Technology company Ontotext AD, did a great job of explaining his methodology and noting the issues he faced.

The Current State of Linked Data Queries

The result of Kiryakov's attempt is a relatively short list of locations of Modigliani paintings around the world. He admits that the list isn't long enough, but says that it's the closest he could get - not just because of the limited amount of data in the Linked Data Web, but because it's "hard to query and use today."

Essentially Kiryakov created code to query a few known Linked Data sets, with custom manipulations to output location data. This is what he came up with:

PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>

SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
  # Artworks by Modigliani, each with an owner, plus the labels of both
  ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
     fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
     ot:preferredLabel ?painting_l .
  ?ow ot:preferredLabel ?owner_l .
  # The owner's city, looked up three ways: Freebase containment, then two DBpedia properties
  OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-prop:location ?loc . ?loc rdf:type umbel-sc:City ; ot:preferredLabel ?city_db_loc }
  OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}

That query was executed in a tool called LDSR, a "Linked Data Semantic Repository" created by Kiryakov's company Ontotext. He calls LDSR a "search engine for part of the linked data web." Ontotext's LDSR includes data from existing Linked Data repositories such as DBPedia, Freebase, Geonames, UMBEL and Wordnet.
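
For developers who would rather send a query like this from code than from a web form, the standard SPARQL protocol accepts the query text as a `query` parameter on an HTTP GET request. Below is a minimal sketch that only builds the request URL and does not contact any server. The endpoint address is a placeholder, not LDSR's real address, and the toy query stands in for the longer one above.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint, used only for illustration; substitute a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# A deliberately generic query standing in for the longer Modigliani query.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
```

Fetching that URL with any HTTP client would return the result set, assuming the endpoint exists and follows the protocol.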

Here is a screenshot of Atanas Kiryakov's attempt to pass the Modigliani Test. He spent over an hour formulating the code used to generate this result.

As you can see, the resulting list was just 8 items long and most of the locations are in major U.S. cities. This falls well short of a comprehensive list of Modigliani art work locations. For example, there's no data about Modigliani paintings in Europe - where Modigliani lived all his life.

Other Sources of Modiglidata

Kiryakov wrote that most of the data returned in the Modigliani example came from Freebase. Indeed, as RWW commenter Brian Karlak pointed out in our original post, you can get much the same result within Freebase itself. Another commenter, Michael, pointed to a non-technical results page. Kiryakov's result has a little more data, but not much more.

However, the point of Kiryakov's attempt and blog post was to point out the difficulty of passing the Modigliani Test right now. He noted that "getting useful information from LOD [Linked Open Data] quite often requires a lot of efforts to analyze and post-process them in order to get reasonable answers to structured queries." In other words, it's much more than just inputting a natural language query (note that the Freebase example was provided by a user there named masouras, so it's not something an average user could do).

I should also mention that in the comments to the previous post, Bruce Wayne pointed to his company Factoetum's effort to pass the test - which had 7 results, including some different from those of Ontotext/Freebase. Like Kiryakov, Wayne noted that it's "nearly impossible" for non-technical people to use the current solutions.

Finally, to address an issue that some commenters raised in the previous post: yes it would be possible to pass the Modigliani Test with some manual human effort to track down location data. But that's cheating - we want to see this done using Linked Data. And not just for Modigliani works, but for any other artist.

Much Work to Be Done

Atanas Kiryakov concluded that "there is still a lot of work to be done, because we cannot expect wide usage and interest in the Semantic Web if writing such a query takes more than an hour and a lot of technical knowledge."

While that's true, I thank Atanas for giving the Modigliani Test a crack. At least now I know to visit the Museum of Modern Art when I next go to New York!

Let us know your thoughts on the Modigliani Test in the comments. Or perhaps you're a developer willing to take on this challenge?

http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php Structured Data Mon, 26 Apr 2010 01:56:34 -0800 Richard MacManus
The Modigliani Test: The Semantic Web's Tipping Point

In our recent posts about Structured Data, we've emphasized that most of the current initiatives have been around uploading new data to the Web - whatever the format. The U.S. and U.K. governments have led the way with their 'open data' websites, but much of that data isn't 'linked' yet. In other words, it's online - but siloed. So how do we get to the next stage of the Semantic Web, linking disparate data sets together so that people can begin to use that data?

The tipping point for the long-awaited Semantic Web may be when you can query a set of data about someone not too famous, and get a long list of structured results in return. I've decided to term this 'The Modigliani Test.'

Amedeo Modigliani is one of my favorite artists. He was moderately famous during the early 20th century and has something of a cult following nowadays. But he's not Da Vinci or Picasso famous. What I'd like to do in a Semantic Web is type the following query into a search engine and get back a large list of results: tell me the locations of all the original paintings of Modigliani.

As of today, there's no place to type that query in and get a list of structured data. The closest I can find to doing that is the Artcyclopedia entry for Modigliani, which has a list of locations for Modigliani artworks. It's great that they have the location data listed on one web page. However it's not structured data, so we can't query it. There's also not much order to the data, we have no idea if this is a comprehensive list, it's not verified data, and so on.

In summary, there's a lot of data on the Web about the location of original art works - but much of it is in traditional 'document' web pages. What we're after is a giant database of art works, which anybody can query and re-use.

Here's an early, overly geeky view of what Linked Data about painting locations would look like (hat-tip @dakoller):

The above is a far from comprehensive list of art works by Hieronymus Bosch (a search for Modigliani, by the way, brought up zero results). Plus of course we need a much more intuitive UI, so that non-geeks can use it too.

What do you think, when will The Modigliani Test be passed on the Web?

http://www.readwriteweb.com/archives/the_modigliani_test_semantic_web_tipping_point.php http://www.readwriteweb.com/archives/the_modigliani_test_semantic_web_tipping_point.php Structured Data Fri, 16 Apr 2010 00:06:00 -0800 Richard MacManus
10 Ideas For Web of Data Apps

At the end of last week, we posted an open thread asking what application you'd build (or would like someone else to build) using linked data or open data. The thread was inspired by Georgi Kobilarov. In this post, we list 10 of the best ideas we received.

A number of the suggested apps were for social good, for example apps for improving sustainability and finding missing persons. Other apps were more lifestyle-oriented, for example for cooking and genealogy. A few were business focused, such as a brand marketing app and a point-of-sale system. Of course a couple were just plain ol' geeky, which we love too! You can find all 10 ideas below.

Firstly, a quick refresher course on the terminology. Linked data is data that has been uploaded to the Web and linked to other sources, but is not necessarily open for other developers to re-use. Often when people use the term "linked data," they mean data that has been uploaded in a structured format, for example RDF. Open data is data that has been uploaded to the Web and is freely available to use, but isn't necessarily linked to other data sources. The term "open data" is often used for unstructured data, for example CSV files (spreadsheets). The ideal, of course, is data that is both linked and open. We should note, however, that these definitions are not universally agreed on, but they're good enough for the purposes of this post.
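
To see the difference in practice, here is a small sketch that takes one row of a CSV file and rewrites it as RDF-style statements in the N-Triples notation, where every fact is a subject, a property and a value. The sample row, the `example.org` addresses and the property names are all made up for illustration; a real data set would use an agreed vocabulary, and a fully "linked" version would point the values at URIs in other data sets.

```python
import csv
import io

# Invented sample data: one row of a spreadsheet-style CSV file.
CSV_TEXT = """id,name,city
1,Example Museum,Exampleville
"""

# Hypothetical base address for the URIs we mint.
BASE = "http://example.org/"


def row_to_triples(row):
    """Turn one CSV row into N-Triples lines, one per non-id column.

    Simplification: values are assumed to contain no quotes or backslashes,
    so no literal escaping is done here.
    """
    subject = f"<{BASE}place/{row['id']}>"
    triples = []
    for column, value in row.items():
        if column == "id":
            continue
        triples.append(f'{subject} <{BASE}property/{column}> "{value}" .')
    return triples


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
triples = row_to_triples(rows[0])
for line in triples:
    print(line)
```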

Missing Persons

Juan Sequeda, co-founder of Semantic Web Austin, has an idea for using linked data "to integrate data from displaced populations, specifically in Colombia." He references a BBC report from September 2009, about using semantic Web technology to enable people to search currently incompatible databases of missing persons in Colombia.

Sustainability

Bernard Vatant from Guillestre, France, wants to see "the Web of Data enable people anywhere in the world to find out smart, sustainable and low-cost solutions to their local development issues." For example, success stories in farming, water supply, energy, education and health "in environments similar to mine, anywhere in the world."

In short, Bernard wants a linked data equivalent to WiserEarth - an online community for people interested in sustainability.

A Better World

Aldo Bucchi from Chile wants an app to tackle "negligence, corruption and lack of accountability." Specifically, he mentioned a recent 8.8 magnitude earthquake in Chile, which resulted in hundreds of deaths. Aldo believes that some of those deaths were avoidable, because of what he claims was "corruption and malpractice in the construction business." He thinks that a Web of data would help identify such things, as well as help "rebuild the country faster and in a more agile manner [with] the 'loose-coupled coordination' that is naturally derived from a shared data substrate and a single world view."

Genealogy App

Sherry Main from Orange County, California would like an app for genealogy. Wrote Sherry:

"It would be amazing to be able to map and locate where your family is from, has been, and what notable events happened. If there was an application on a mobile device that pinged you when you are within a particular radius of say, my great-grandmother's birthplace, as I walked around a town, that would make real-world experiences more meaningful [...] As photos become geo-tagged going forward, imagine being able to get a push notification that showed an important family or historical photo to you as you stood or walked by that location."

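The "ping me within a particular radius" part of Sherry's idea comes down to a distance check between the phone's position and a list of known family places. Here is a minimal sketch using the haversine great-circle formula. The place names, coordinates and the 2 km radius are made up for illustration; a real app would get them from geo-tagged genealogy data and the device's location service.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a proximity alert


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def places_in_range(current, places, radius_km):
    """Return the names of all places within radius_km of the current position."""
    lat, lon = current
    return [
        name
        for name, (place_lat, place_lon) in places.items()
        if haversine_km(lat, lon, place_lat, place_lon) <= radius_km
    ]


# Hypothetical family locations (name -> (latitude, longitude)).
family_places = {
    "great-grandmother's birthplace": (48.8566, 2.3522),
    "grandfather's school": (51.5074, -0.1278),
}

# Hypothetical current position of the walker.
here = (48.8606, 2.3376)

for name in places_in_range(here, family_places, radius_km=2.0):
    print(f"You are near: {name}")
```

The same check would work for geo-tagged photos: store each photo's coordinates and run it through `places_in_range` as the user moves.
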
Cooking App

Bart Stevens wants to be able to "select a (difficult) recipe and submit this to a service." He wants the following information back:

1. Where can I find the ingredients.

2. Place an order/make a reservation (@bakery, butcher or fish shop) for certain ingredients.

3. A route (street) map, per store.

4. Maybe a payment system.

Point-of-Sale & Inventory System

Daniel O'Connor would like to see a point-of-sale system and inventory system, for example for a small office supplies store.

He beckons us to imagine this: "I receive a new product, scan the barcode of it. My system queries the web for the supplier name, product data, etc [...] recognizes the supplier and hits their URI for the product. It assimilates all of the recommended price information (ie: good relations); depictions and populates my system." You can read the full scenario in his extended comment.

Brand Marketing

John Davidson suggests that linked data can be used to assist brand marketers, specifically to find out more about their customers. He offers this example:

"A customer becomes a fan of a popular hair care brand on Facebook. She separately opted-in on the brand site to receive email alerts for new products, promotional offers that she can redeem in stores, etc. Are these distinct, separate events or are they somehow connected? By integrating these streams from the "Web of Data" the brand marketer can understand that she is an advocate for their brand. She also has several dozen or more friends she regularly interacts with in social channels. The marketer can engage her with special offers to promote their cool new products with friends in her network. The subsequent buzz and chatter sends friends to their stores to buy the new hair care products and the cycle repeats."

Research Assistant

A comment on Georgi's blog suggests an app to review literature. "Professor Aloha" wrote that he/she would create an application that could "take any research topic and backtrace (through articles, dissertations, presentations, and their accompanying reference lists) all published research articles on that topic, sorting them by year of publication, author, country of origin, journal and major findings."

Enriched People Profiles

Atif Latif from Austria would like to build an aggregator for all of the possible resources related to a person on the Web. The end result, said Atif, "will be [a] highly semantified and enriched profile of a person." Atif is working on this as we speak, with a beta app named CAF-SIAL. Good luck Atif!

In a separate comment, Kingsley Idehen of semantic Web company OpenLink Software mentioned "Verifiable Identity," noting that "all databases (including the Web of Linked Data) need verifiable identity."

Website-less websites

Nathan suggested a number of things, our favorite being "Website-less websites". Nathan wrote that "when all the data is typed and in a single format (let's say rdf) then the need for websites and webpages can completely be disposed off, rather we can view the information in an array of clients side applications each with there own benefits (like we do currently with twitter clients), The entire web can theoretically and quite easily just be one big API."

Bruce Wayne of Factoetum wrote in a separate comment that he is developing "services that will have in impact in bringing about a Website-less web." He gives an example of a list of book titles.

Those are 10 suggestions from the ReadWriteWeb community. Perhaps some enterprising entrepreneurs or developers will pick up a few of these ideas for their next startup!

http://www.readwriteweb.com/archives/10_ideas_for_web_of_data_apps.php Structured Data Thu, 15 Apr 2010 00:05:02 -0800 Richard MacManus
Open Thread: What Would You Build With a Web of Data?

Recently we looked at the state of Linked Data in 2010, noting developments such as governments putting public data online and Thomson Reuters putting structure around commercial data using OpenCalais. In a follow-up post, we explained the distinction between Linked Data, Open Data and the Semantic Web.

Georgi Kobilarov, who runs a Linked Data startup from Germany called Uberblic Labs, recently issued an interesting challenge on his blog. He asked: if we had a Web of Data, what would you build? Not to steal Georgi's thunder, but we think this is a great question to put to ReadWriteWeb readers too.

Here's Georgi's idea:

"If we had a Web of Data, I would built an application for painless travel planning. It would integrate flight plans, train timetables, bus routes, car rental offers, etc. And the user would be able to just say: I want to go from A to B: Find me the best/cheapest/fastest routes. [...] With a Web of Data, an application could do all that combining for me, the same way flight booking sites do that today for just flights."

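To make Georgi's "find me the cheapest route" concrete, here is a rough sketch of what such an app could do once flight, train and bus data are published in a form that can be combined. It merges the legs from each source into one graph and runs Dijkstra's shortest-path algorithm over it. The cities, modes and prices are invented for illustration; no real timetable data or API is assumed.

```python
import heapq


def cheapest_route(legs, start, goal):
    """Find the cheapest sequence of legs from start to goal.

    legs is a list of (origin, destination, mode, cost) tuples, treated as one-way.
    Returns (total_cost, [(origin, destination, mode), ...]) or None if unreachable.
    """
    graph = {}
    for origin, dest, mode, cost in legs:
        graph.setdefault(origin, []).append((dest, mode, cost))

    queue = [(0, start, [])]  # (cost so far, current city, legs taken)
    settled = {}              # cheapest known cost per city
    while queue:
        cost, city, path = heapq.heappop(queue)
        if city == goal:
            return cost, path
        if city in settled and settled[city] <= cost:
            continue
        settled[city] = cost
        for dest, mode, leg_cost in graph.get(city, []):
            heapq.heappush(queue, (cost + leg_cost, dest, path + [(city, dest, mode)]))
    return None


# Hypothetical data sets, one per transport provider.
flights = [("Berlin", "Rome", "flight", 120)]
trains = [("Berlin", "Munich", "train", 40), ("Munich", "Rome", "train", 60)]
buses = [("Berlin", "Munich", "bus", 25)]

# "Combining" the data is just concatenation once the formats agree.
all_legs = flights + trains + buses

print(cheapest_route(all_legs, "Berlin", "Rome"))
```

With these made-up prices, the bus to Munich followed by the train to Rome (85 in total) beats the direct flight (120). Swapping the cost field for travel time would give the "fastest" variant of the same query.
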
Here's my idea for an app that uses the Web of Data. I'd like a web site or app that allows me to discover the locations of original art works by my favorite artists, and then create travel itineraries for me to see some or all of those art works (most famous artists have their art works scattered around the world, in various museums and galleries). It's possible that there is a web directory of artists somewhere that has some or even all of this data already, but if so I haven't found it.

I ask for this because every now and then I search the Web for a painting that I saw in a book. A recent example was a Modigliani painting that I was attempting to copy for my beginners' acrylic painting class. The original painting was called "Portrait of Madame Hanka Zborowska." One of the results from Google told me that the original painting is located at the National Gallery of Modern Art, Rome, Italy.

I could potentially spend hours hunting down the locations of Modigliani's paintings, using Google - and it's likely that some of the data isn't currently online. So it would be great if I could query one web site or app: tell me where all the originals of Modigliani's paintings are in the world, and draw me an itinerary for visiting all or some of them. Heck, maybe even book my flights and hotels!

That's my example of what I'd build from a Web of Data. Now tell us what site or app you would like built, if the data was available on the Web.

http://www.readwriteweb.com/archives/web_of_data_what_would_you_build.php Open Thread Thu, 08 Apr 2010 22:15:43 -0800 Richard MacManus
It's All Semantics: Open Data, Linked Data & The Semantic Web

Yesterday we summarized some of the main developments in the Linked Data world over the past year. Linked Data is a W3C-backed movement that is all about connecting data sets across the Web. It can be viewed as a subset of the wider Semantic Web movement, which is about adding meaning to the Web. However, there is some confusion in the Semantic Web community about the crossover. To add to the confusion, there is a term called 'Open Data' that is being bandied around too. This commonly describes data that has been uploaded to the Web and is accessible to all, but isn't necessarily "linked" to other data sets.

So what's the beef with all of these terms? In this post we seek clarity!

The Difference Between Open Data and Linked Data

In the discussion over yesterday's post, a few people tweeted that the U.K. government's public data website Data.gov.uk is mostly populated with 'Open Data' and not 'Linked Data.' But what does that mean? It means that much of the data on the site is available to the public, but it doesn't link to other data sources on the Web. It could be data that has been uploaded in CSV format (i.e. spreadsheet data), which Sir Tim Berners-Lee said in an interview with me last year is a common occurrence with government departments. Or it could be data in another non-Web format.


Screen from a Tim Berners-Lee presentation on Linked Data, circa 2008

Titti Cimmino put it nicely: Open Data is simply 'data on the web,' whereas Linked Data is a 'web of data.'

However, the idea of Open Data is to turn it into Linked Data. As John S. Erickson pointed out, the first priority of Data.gov.uk (and its U.S. counterpart) is to publish lots of Open Data. The next step is to work towards linking it all up. This is already starting to happen. Answering a question I posed on Twitter, Kingsley Idehen confirmed that Data.gov.uk is currently a combination of Open Data and Linked Data.

Linked Data and The Semantic Web

So may we then suggest that the idea of Linked Data is to turn it into a Semantic Web? Or are they the same thing already?

Lorna Campbell from the University of Strathclyde in Scotland tackled those and other questions in an excellent post earlier this month. She started by warning of the potential for another "holy war" about terminology. I won't delve into that in this post; however, this excerpt from Campbell's post gives you a flavor of the terminology angst:

"Some argue that RDF is integral to Linked Data, other suggest that while it may be desirable, use of RDF is optional rather than mandatory. Some reserve the capitalized term Linked Data for data that is based on RDF and SPARQL, preferring lower case "linked data", or "linkable data", for data that uses other technologies."


Even Wikipedia can't define Semantic Web...

Campbell quotes from a number of other articles, in trying to come to a conclusion about how Linked Data and the Semantic Web relate. Perhaps the best definition she found was this one by Paul Walk:

  1. "data can be open, while not being linked
  2. data can be linked, while not being open
  3. data which is both open and linked is increasingly viable
  4. the Semantic Web can only function with data which is both open and linked"

Why This Matters

So there you have it: Linked Data is NOT the same as the Semantic Web. It's also not necessarily open, in other words accessible to developers.

Whatever the definitions, the key points about all of Open Data, Linked Data and the Semantic Web, are:

  1. data is being uploaded to the Web that wasn't online before (e.g. much of the data on Data.gov.uk).
  2. structure is being added to the data using Linked Data and/or Semantic Web technologies.

The bottom line is that the more data we have on the Web that is linked and has defined meaning, the smarter our web applications will be. This is why these activities are so exciting, despite the terminology confusion!

Image credit: Semantic Web Rubik's Cube, dullhunk

http://www.readwriteweb.com/archives/open_data_linked_data_semantic_web.php Structured Data Wed, 31 Mar 2010 23:00:00 -0800 Richard MacManus
UK Launches Open Data Site; Puts Data.gov to Shame

A new website dedicated to making non-personal data held by the U.K. government available to software developers launched today with the help of Sir Tim Berners-Lee, the inventor of the World Wide Web. Data.gov.uk is being slammed with traffic, but six months after the U.S. government opened its Data.gov site, the U.K. site already has more than three times as much data as the U.S. site offers today.

At launch, Data.gov.uk has nearly 3,000 data sets available for developers to build mashups with. The U.S. site, Data.gov, has fewer than 1,000 data sets today.


The UK government has been a big supporter of innovation built on top of public data. It sponsored a contest called Show Us a Better Way, giving cash prizes to people who came up with the best ideas for mashups they would like to create if they had access to the right government data. Charles Arthur at the Guardian has good coverage of the U.K.'s open data work (the Guardian has been working hard to open public data as well).

The U.S. government, on the other hand, has been lackluster in its move to open data to facilitate outside innovation. If Twitter is the poster child for building a thriving ecosystem around a streaming set of data, then the Obama administration has earned about 140 characters worth of praise for its fledgling efforts so far. The U.S. government's efforts to advance agencies' use of cloud computing may work in conjunction with opening data to the public and thus may improve the state of things, but time will tell.

Congress didn't even ask U.S. CTO Aneesh Chopra any questions about President Obama's Open Government initiative during his confirmation hearings. When the U.S. government's Data.gov site launched, critics pointed out that it was filled with relatively non-controversial data sets; plenty of USGS data but no DOJ or military data, for example. The U.K.'s data site, in contrast, includes 22 military data sets at launch, including one called Suicide and Open Verdict Deaths in the U.K. Regular Armed Forces.

One request that users of both sites still have is for data to be made available in standardized formats. The U.K. site does include a prominent promotion of the Semantic Web, no doubt a tribute to Berners-Lee's focus on the paradigm as the next step for the future of the web. More standardized, structured data is expected to be the direction that the program tries to get government agencies to move toward in the future.

http://www.readwriteweb.com/archives/uk_launches_open_data_site_puts_datagov_to_shame.php News Wed, 20 Jan 2010 11:57:00 -0800 Marshall Kirkpatrick
Factual Makes Publishing Open Data Easy

Factual, a new open data project founded by Gilad Elbaz, just launched its public beta today. Elbaz's last company, Applied Semantics, was acquired by Google in 2003 and became one of the core components of the search giant's AdSense contextual advertising product. Factual, which is mostly geared towards developers, is somewhat similar to Freebase, though Factual allows for a more free-form approach to building a database than Freebase. Factual provides users and developers with tools to create, contribute and mash up open data on any subject.

Factual also announced that Esther Dyson has joined the company's board of advisors.

For now, Factual offers only a relatively small repository of databases; the company's current focus is on getting more developers to use its service and on bringing as much data as possible into the system.

Getting Data into Factual

To enter data, users can type it in field by field, which is tedious, or upload spreadsheets in most of the standard formats. The service also provides a number of easier ways to import data. You can, for example, give Factual the URL of any website or Wikipedia page that includes tables and the service will automatically create a new table based on this data. We tried this with tables from a number of sites and it generally worked well and only required a few edits. For advanced users, Factual also includes a number of more sophisticated extraction tools.

Once the data is available on Factual, developers can use the API to read, write and mash up this data in any form they like. Users can also edit tables directly on the site or through an embedded table. In addition, users can mash up and combine existing tables.

Currently, Factual only offers one relatively basic embeddable widget that can only display the table without any graphical embellishments. The company plans to rely on developers to create other ways to access and display the data available on the service.

Not a Wiki

While Factual allows any user to make changes to the database, Factual's model is slightly different from the standard wiki approach where only the last edit is generally visible to the public. Changes made to a fact in a Factual database are more like votes for a certain entry. If three users or data sources say a restaurant doesn't offer vegetarian food, for example, and one user says it does, then the table will display the fact that the majority of users entered. Factual, however, will also display a question mark next to this disputed entry. Users can click on this question mark to see all the editors and data sources.
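
The voting model described above can be sketched in a few lines. This is only an illustration of the idea, not Factual's actual implementation: it assumes every user or data source counts as one equal vote, shows the most common value, and flags the entry as disputed (the "question mark") whenever any vote disagrees. How Factual weights sources or breaks ties is not covered here.

```python
from collections import Counter


def resolve_fact(votes):
    """Pick the value to display for one table cell.

    votes is a list of values submitted by users or data sources.
    Returns (displayed_value, disputed), where disputed is True if
    the submissions do not all agree.
    """
    if not votes:
        return None, False
    counts = Counter(votes)
    displayed_value, _ = counts.most_common(1)[0]
    disputed = len(counts) > 1
    return displayed_value, disputed


# The restaurant example: three sources say "no vegetarian food", one says "yes".
offers_vegetarian_votes = [False, False, False, True]

value, disputed = resolve_fact(offers_vegetarian_votes)
marker = " (?)" if disputed else ""
print(f"Offers vegetarian food: {value}{marker}")
```

Unlike a wiki, where the last edit wins, the dissenting vote here stays on record and simply loses to the majority until more sources agree with it.
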

Factual will obviously try to weed out spam here as well, though given how new the service is, it's hard to evaluate how effective Factual's spam filters are.

License

Users who enter data into a Factual database do not automatically give up their copyright - though given that Factual focuses on facts, which typically can't be copyrighted anyway, this shouldn't be too much of a problem. Users can, however, choose an open license for their work, which might be necessary if the table they used to seed their database was licensed under a Creative Commons license, for example. Factual's FAQ explains this issue in greater detail.

Would You Use an Open Data Service?

As to why businesses would open up their data, Gilad Elbaz told us yesterday that he believes open data could eventually go the way of open source, which also had a hard time gaining acceptance among businesses. While open source software is a tool that a lot of companies now use, data is usually what is at the heart of a company's products and it remains to be seen how many companies would really want to put their data into an open database. For now, we mostly expect non-profits and government organizations to make use of this service.

http://www.readwriteweb.com/archives/factual_makes_publishing_open_data_easy.php News Tue, 13 Oct 2009 05:00:00 -0800 Frederic Lardinois
City of Portland, Oregon Officially Backs Open, Structured Data

The City Council of Portland, Oregon unanimously approved a resolution today that directs the city government to open data to outside developers and encourages adoption of open source solutions in technology procurement.

Just as the creation of railroads and highways fostered economic development in the past, giving software developers access to a landscape of municipal data could lay the foundation for a new era of innovation.

"This [resolution] will increase efficiency in local government... democratize public data itself... and it will foster innovation among Portland's world class software community," said Skip Newberry of the Mayor's Office in his testimony, according to a report on the local tech blog Silicon Florist.

The full text of the resolution has been posted as text (from a PDF) on the same blog.

Portland joins San Francisco, Chicago and Vancouver, British Columbia as cities with major initiatives to offer municipal data in formats that will enable independent developers to build new applications leveraging that data. Making municipal data openly available for developers could be the contemporary economic equivalent of paving roads and installing electricity that can be used to open new businesses and better serve the people living in that city.

Portland, Oregon isn't new to tech innovation, of course. It's a place where the city bus system has its own app store, it's home to red-hot mobile development shops like Small Society (built iPhone apps for Starbucks, WholeFoods etc.) and Urban Airship (iPhone push infrastructure) and it's the home of Linux creator Linus Torvalds, wiki inventor Ward Cunningham and one third of the staff of ReadWriteWeb - amongst other geekery.

What could come next? How about more cities getting on board, a national or international standard for municipal data and delivery of that data in real time? One Prefecture in Japan has announced that it will promote the mobile Augmented Reality app Sekai Camera to display historical data about locations in the area. Seeing individual cities move in this direction is a great start.

What US city will move in favor of open source and open, structured data next? Seattle? New York? Someplace in the Midwest? Place your bets now, as these are unlikely to be isolated developments.

Photo: "Max" Creative Commons by Stu Seeger.

http://www.readwriteweb.com/archives/city_of_portland_oregon_officially_backs_open_stru.php NYT Wed, 30 Sep 2009 12:54:12 -0800 Marshall Kirkpatrick
Weekly Wrapup, 7-11 April 2008

Here are some of the highlights from the week's Web Tech action on ReadWriteWeb. The big news was Google App Engine - we provided extensive coverage and analysis. Also this week we looked into further use cases for Twitter, we analyzed the pros and cons of offline access to web apps, as well as why we need web apps on the desktop. We gave you seven tips to make the best use of your RSS Reader, we advised on the best places to find open data, and we looked at business development 2.0 and marketing 2.0 trends.

For those of you reading this via our website, note that you can subscribe to the Weekly Wrapups, either via the special RSS feed or by email.

Web Apps

Google App Engine: History's Next Step or Monopolistic Boondoggle?

The big news this week was the launch of Google App Engine, "a developer tool that enables you to run your web applications on Google's infrastructure." This will allow startups to build web apps on top of Google's web servers, APIs, and other developer tools. Google clearly has the scale and smarts to provide this platform service to developers. However, it raises the question: why would a startup want to hand over that much control and dependence to a big Internet company? Check out Marshall's analysis.

A new feature at ReadWriteWeb is an interactive game helping you to understand and contextualize web tech news in a fun way. This week we posted, via a new app from Impact Games, an interactive game that will let you model Google App Engine's impact in the marketplace. You can play the game here.

Our other coverage: Google App Engine: Cloud Control to Major Tom; HuddleChat: Did Google Just Rip Off 37Signals? and Google Takes Down HuddleChat After Complaints About 37Signals Ripoff

Related: Red Dog: Microsoft's Answer to App Engine and AWS?

How to Get Customer Service via Twitter

There has been a lot of talk lately of companies monitoring social media, be it Twitter, blogs, or social networking sites, for mentions of their company name and responding to customer service issues. Some of this interaction has been in the Twitter community, with Comcast being one of the more active participants as of late. Although in some cases, customers twittered their frustration after failing to receive the support they needed through traditional methods, in many cases, Twitter was the first place the customers vented their frustration, and then were surprised when they received a response from a support rep or company spokesperson.

Related: 5 Ways to Find More Friends on Twitter and Twittermethis Is A Brilliant Marketing Experiment

Seven Tips for Making the Most of Your RSS Reader

RSS is a big deal, as anyone who's subscribed to even a few feeds probably knows. Once you get past just a few feeds, though, it can quickly get overwhelming. RSS can leave you feeling inadequate, brain-dead and uninspired. Trying out new things will help you discover new, magical experiences, though. Letting go of the stress caused by any obligation to read everything will go a long way.

Here are seven tips for making the most of your RSS reader, from simple to more complex.

SEE MORE WEB APPS COVERAGE IN OUR WEB APPS CATEGORY

Web Trends

How Important is Offline Access, Anyway?

In today's world, you're never too far from an internet connection. In developed countries, broadband access is available in more places than ever, and even poorer countries have internet cafes sprouting up left and right. Modern web workers and business travelers even take extra precautions to maintain always-on connectivity - packing air cards in their laptop bags or buying laptops that already have built-in EVDO access.

Despite the broad availability of internet access, it's the dead spots that have been pushing forward the need for offline access to web apps. For how can a web office suite like Google Docs or Zoho compete with desktop software if they become unusable when the internet connection disappears?

Why We Need Web Apps on the Desktop

Sarah Perez concluded in the above post that offline access is important now, but not as important as it once was, and that with the increasing ubiquity of Internet access, it is growing less important every day. However, Josh Catone thinks there is an important distinction to be made between offline access to web apps (as Google Gears provides) and desktop access to web apps (as Mozilla's Prism and Adobe's AIR provide). The latter is a very important step in the evolution of web apps.

Where to Find Open Data on the Web

This week there was a story on Techmeme entitled "We Need a Wikipedia for data." The article, written by ex-Googler Bret Taylor, discussed the difficulty of finding open data sets on the internet, something which could spur innovation, allowing programmers to build new applications the likes of which have never been seen before. What was interesting about this story, in addition to the concept of a Data Wiki itself, was the insightful commentary around this concept, not just on the blog but all over the net, which led to the discovery of some pretty good data sources that are already available.

A Guide to Business Development 2.0

At least once each day I get a call from someone trying to sell me outsourced development services. It's difficult to not be frustrated with these calls and it is increasingly hard to be polite, because they come so frequently. Yet, more than frustrated, I am just puzzled. Does this tactic still work? Who in this day and age would give business based on a cold call? These companies could definitely use a dose of business development 2.0.

Related: Marketing 2.0: Can Meebo Make it Real?

SEE MORE WEB TRENDS COVERAGE IN OUR TRENDS CATEGORY

That's a wrap for another week! Enjoy your weekend everyone.

http://www.readwriteweb.com/archives/weekly_wrapup_7-11_april_2008.php Weekly Wrap-ups Sat, 12 Apr 2008 12:30:00 -0800 Richard MacManus
Where to Find Open Data on the Web

Today, a story on Techmeme caught our eye. It was entitled "We Need a Wikipedia for data," and the article, written by ex-Googler Bret Taylor, discussed the difficulty of finding open data sets on the internet, something which could spur innovation, allowing programmers to build new applications the likes of which have never been seen before. What was interesting about this story, in addition to the concept of a Data Wiki itself, was the insightful commentary around this concept, not just on the blog but all over the net, which led to the discovery of some pretty good data sources that are already available.

In Bret's story, he mentioned some of the common data sources currently available, like the US Census Bureau's map data and the Reuters corpus, but his commenters came up with a few more. (See? This is why blog comments matter).

In addition, as CNet and Ryan Stewart's blog spread the story, more people chimed in with suggestions. And of course, the Hacker News guys had some more ideas themselves.

So what did everyone come up with? A lot of data sources are already freely available on the net, as it turns out, if you just know where to look. Here's a summary. Do you have anything to add?

CKAN (Comprehensive Knowledge Archive Network)

The CKAN site is a registry of open knowledge packages and projects. Here, you can find open knowledge resources or register one of your own. What kind of stuff can you find at CKAN? They mention a set of Shakespeare's works, a global population density database, the voting records of MPs, or 30 years of US patents as some examples, but they also point you to some useful URLs, like flickr's Creative Commons page, where photos can be searched by license type.

CKAN

Infochimps.org

This project is attempting to assemble and interconnect the world's best repository for raw data - like a giant, free, open almanac. The best way to describe it comes from MetaFilter, where the project was spotted recently: "Just as Wikipedia will help you find out something about everything, infochimps.org will help you find out everything about something." What can you find there? Every Wikipedia infobox, each infobox type in its own table, 50 years of global hourly weather data, all the tables from the US Census Statistical Abstract, oh and 100,000 official crossword words, too.

Infochimps.org

OpenStreetMap

Not a data set in the traditional sense, but definitely a useful tool, OpenStreetMap is a free, editable map of the world where you can view, edit, and use your own geographical data. The project was started because most maps actually have legal or technical restrictions on their use.

OpenStreetMap

MusicBrainz

A user-maintained community metadatabase site which collects music "metadata" like artist name, release title, list of tracks, etc. You can browse through the site or you can use a client program, like their own taggers, to help identify music collections. 

Musicbrainz

Jigsaw

Dismissed by the blogosphere as a bad idea, if not downright evil, Jigsaw, the marketplace that pays you to give up other people's contact info, now boasts 7 million complete contacts for the taking.

DBpedia

This site is a community effort to extract structured info from Wikipedia and make that data publicly available on the web, essentially turning Wikipedia into a database you can query. Is this the beginning of a semantic web? Check out their downloads section for the datasets and then scroll to the bottom for even more links to data sources on the web.

flickr wrappr

Where DBpedia takes Wikipedia and makes it semantic, flickr wrappr extends DBpedia with RDF links to photos posted on flickr. Here's an example. Here's another. This is pure geek hotness.

Freebase

Freebase, an open, shared database of the world's knowledge, received a lot of mentions in the comments, so this must be a good one. Community built and maintained, it pulls from open data sources like Wikipedia, MusicBrainz, and the SEC archives to create structured information on many topics, including more popular ones like movies, music, people, and locations. The site, unlike some of the others in this list, is also easy to navigate and well designed, which makes it that much better to use.

Opentick

Opentick is perhaps one of the less interesting items here because of its dry subject matter (financial data), but it's certainly worth a mention: a free database of real-time and historical market data for trading systems and platforms is the kind of thing that really floats some people's boats.

ThingISBN

ThingISBN is LibraryThing's first API, and even though its competitor became a paid service, ThingISBN is still free for non-commercial use. The API doesn't just return the usual book data. It also does something called "edition disambiguation," meaning it returns a list of "related" ISBNs: other editions, other media, and translations.
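As a rough illustration of working with an "edition disambiguation" result, the Python sketch below pulls related ISBNs out of an XML response. The response shape (an `idlist` root with one `isbn` element per edition) is an assumption, and the sample ISBNs are placeholders rather than real API output.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body: the element names and the ISBNs below are
# placeholders for illustration, not captured from the real service.
SAMPLE_RESPONSE = """<idlist>
  <isbn>0000000001</isbn>
  <isbn>0000000002</isbn>
  <isbn>0000000003</isbn>
</idlist>"""


def related_isbns(xml_text, exclude=None):
    """Return every ISBN in the response, optionally skipping the one queried."""
    root = ET.fromstring(xml_text)
    isbns = []
    for element in root.iter("isbn"):
        value = (element.text or "").strip()
        # Skip empty elements and the ISBN the caller already knows about.
        if value and value != exclude:
            isbns.append(value)
    return isbns


if __name__ == "__main__":
    # Treat the first ISBN as the one we asked about and list the rest.
    for isbn in related_isbns(SAMPLE_RESPONSE, exclude="0000000001"):
        print(isbn)
```

Passing the queried ISBN as `exclude` leaves only the other editions, media, and translations.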

Numbrary

As the name suggests, Numbrary is a library for numbers. This free service helps you find, use, and share numbers from public record data sets, like census data or the CIA World Factbook.

theinfo.org

This site isn't just a place to build or collect data sets, of which they have quite a nice list, but a place where you can interact with other number-lovin' folks like yourself.

The Data Wrangling blog

This blog post lists a bunch, and I mean a bunch, of open datasets on the web, which just goes to show how much of a cursory list my post really is.

http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php Product Reviews Wed, 09 Apr 2008 09:46:53 -0800 Sarah Perez