data management - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/data management
Copyright 2009 Richard MacManus, readwriteweb@gmail.com
Tue, 24 Nov 2009 12:40:23 -0800

The Web of Data: Creating Machine-Accessible Information

In the coming years, we will see a revolution in the ability of machines to access, process, and apply information. This revolution will emerge from three distinct areas of activity connected to the Semantic Web: the Web of Data, the Web of Services, and the Web of Identity providers. These webs aim to make semantic knowledge of data accessible, semantic services available and connectable, and semantic knowledge of individuals processable, respectively. In this post, we will look at the first of these Webs (of Data) and see how making information accessible to machines will transform how we find information.

The amount of information and services available is growing exponentially. Every day, it is getting harder to find the information we are actually looking for. Still, we have to learn how to tell machines what we want. Why can't a machine understand which website, recent tweet, Flickr photo, Facebook message, or restaurant we are currently looking for?

Because it can't. It does not understand. It has no access to most sources. It lacks the semantic understanding and common sense to build bridges between information.

It is critical that machines gain a new level of understanding. Instead of statistically computing how well a search term matches a document, a machine must literally be able to understand. Therefore, knowledge bases are needed to look things up. Examples of these knowledge bases include:

  • an encyclopedia containing knowledge to look up the semantic meaning and context of a particular term (e.g. to understand that Berlin is a city, how many people live there, and where it is),
  • Yellow Pages or a service pool to query often-changing and more complex information (e.g. a route from Berlin to Porto by car, or the current temperature of Porto in Celsius),
  • a people database to look up profile information, with user permissions, which could improve personalization and recommendations.

The Web of Data

The idea of the Web of Data originated with the Semantic Web. People tried to solve the problem of the inherent inability of machines to understand web pages. Initially, the aim of the Semantic Web was to invisibly annotate web pages with a set of meta-attributes and categories to enable machines to interpret text and put it in some kind of context. This approach did not succeed because the annotation was too complicated for people who had no technical background. Similar approaches, like microformats, simplify the markup process and thus help to overcome this chicken-and-egg problem.

What these approaches have in common is the effort to improve the machine-accessibility of knowledge on web pages that were designed to be consumed by humans. Furthermore, these sites contain a lot of information that is irrelevant to machines and that needs to be filtered. What is needed is a knowledge base for machines to look up "noiseless" information. But wait! Who said that machines and we humans need to share one web anyway?

The idea of the Web of Data came about as a result of both this limitation and the existence of countless structured data sets distributed all over the world and containing all kinds of information. These data sets are the property of companies that tend to make them accessible. Typically, a data set contains knowledge about a particular domain, like books, music, encyclopedic data, companies, you name it. If these data sets were interconnected (i.e. linked to each other like websites), a machine could traverse this independent web of noiseless, structured information to gather semantic knowledge of arbitrary entities and domains. The result would be a massive, freely accessible knowledge base forming the foundation of a new generation of applications and services.
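To make the idea of traversing interconnected data sets a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three data sets, their entries, and the `gather` function are hand-made stand-ins, not real sources or a real API. It only shows how a machine could follow links from one data set to the next and collect facts along the way.

```python
from collections import deque

# Hypothetical, hand-made data sets. Names, keys, and facts are illustrative only.
DATASETS = {
    "encyclopedia": {
        "Berlin": {"facts": {"type": "city"},
                   "links": [("geo", "berlin-de")]},
    },
    "geo": {
        "berlin-de": {"facts": {"country": "Germany"},
                      "links": [("books", "berlin-guide")]},
    },
    "books": {
        "berlin-guide": {"facts": {"title": "A Guide to Berlin"},
                         "links": [("encyclopedia", "Berlin")]},
    },
}


def gather(start):
    """Breadth-first walk over cross-data-set links, collecting facts per entity."""
    seen = {start}
    queue = deque([start])
    collected = {}
    while queue:
        dataset, key = queue.popleft()
        entry = DATASETS[dataset][key]
        collected[(dataset, key)] = entry["facts"]
        for target in entry["links"]:
            # Links may point back to entities we already visited, so skip those.
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return collected


knowledge = gather(("encyclopedia", "Berlin"))
for (dataset, key), facts in knowledge.items():
    print(dataset, key, facts)
```

Starting from a single entry, the walk ends up with facts from all three data sets, which is the point of linking them in the first place.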

Linking Open Data

One promising approach is W3C's Linking Open Data (LOD) project. The above image illustrates participating data sets. The data sets themselves are set up to re-use existing ontologies such as WordNet, FOAF, and SKOS and interconnect them.

The data sets all grant access to their knowledge bases and link to items of other data sets. The project follows basic design principles of the World Wide Web: simplicity, tolerance, modular design, and decentralization. The LOD project currently counts more than 2 billion RDF triples, which is a lot of knowledge. (A triple is a piece of information that consists of a subject, predicate, and object to express a particular subject's property or relationship to another subject.) Also, the number of participating data sets is rapidly growing. The data sets currently can be accessed in heterogeneous ways; for example, through a semantic web browser or by being crawled by a semantic search engine.
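For readers who prefer code to definitions, a triple can be sketched as a plain three-field record. The snippet below is an illustration under stated assumptions, not how the LOD data sets actually store or serve their data: the `Triple` type, the predicate names (`is_a`, `located_in`), and the `match` helper are all made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # the "object" of the triple


# Hypothetical facts, written as subject / predicate / object.
TRIPLES = [
    Triple("Berlin", "is_a", "City"),
    Triple("Berlin", "located_in", "Germany"),
    Triple("Porto", "is_a", "City"),
    Triple("Porto", "located_in", "Portugal"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which subjects are cities?"
cities = [t.subject for t in match(TRIPLES, predicate="is_a", obj="City")]
print(cities)
```

Asking a question then amounts to filling in the parts of a triple you know and leaving the rest open.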

To get a feeling for what this machine Web of Data is like, you may want to look up:

With every fact available on the Web of Data, more general and specific knowledge is made accessible to machines that will enable a whole new generation of services to be created. Highly sophisticated queries become machine-processable and accessible to the next generation of, say, search services.

Check out Tim Berners-Lee's talk at TED about the Web of Data. What do you think about it? Do you encounter the same issues of information overload or too much noise?

(Photo by zorro-art. Graph by the Linking Open Data project.)

http://www.readwriteweb.com/archives/web_of_data_machine_accessible_information.php
Semantic Web
Sat, 18 Apr 2009 10:00:00 -0800
Alexander Korth

XBRL: Mashing up Financial Statements

Amid the dark days on Wall Street and in global markets, it seems to be up to technology to step up and deliver solid analysis and rational scrutiny. The US market regulator, the Securities and Exchange Commission (SEC), ratified a proposal on Wednesday for public companies and mutual fund companies to file their financial statements in XBRL (eXtensible Business Reporting Language). The XML-based language is also known as "Interactive Data" in financial circles and promises faster analysis with wider coverage. All things being equal, it will mitigate the poor analysis and regulation that's been contributing to stupendously bad financial decisions.

This is a guest post by Derek Abdinor.

Companies have traditionally filed on paper, in ASCII, or in HTML: all essentially lifeless formats for conducting any meaningful comparison or analysis. With XBRL, every line item is given a tag that identifies it and its role in the financial statement. Imagine that a line item -- say, "Net Income" -- is tagged like a migrating goose (which frequently happens, I'm told). That goose is part of a flight of geese, which may change their course mid-flight, fly over national borders, have babies, and even join another flight. But thanks to the vital information on the goose's tag, we never lose the original information, and we are even able to see it in the context of other information.

Financial accounts are the same. Figures get re-purposed all over the place, which leads to input errors, or worse. It's easy to cover up information or fail to notice business risks when the analysis is relegated to a footnote somewhere, and you're reading the annual report like a "Choose Your Own Adventure" book.

When financial data is tagged, it's begging to get mashed up. Take a look at this comparison of executive pay, this dynamic charting, and the SEC's own repository and viewer. Software exists that can be first used upstream with the creation of management accounts and go all the way through to taxonomy design, document tagging, and viewing. One would be able to call up income statements of two or more companies in different sectors and different countries and compare line items in seconds.
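Here is a rough idea of why tagged figures are so easy to compare. The sketch below assumes the tagging has already been done and that both companies report against the same tags and in the same currency, which real filings across countries would not guarantee. The company figures, the tag names, and the two helper functions are invented for illustration.

```python
# Hypothetical tagged line items for two companies (same tags, same currency).
company_a = {"Revenue": 1200.0, "NetIncome": 150.0}
company_b = {"Revenue": 800.0, "NetIncome": 120.0, "ResearchExpense": 60.0}


def compare(a, b):
    """Pair up the line items both companies report, keyed by tag."""
    return {tag: (a[tag], b[tag]) for tag in sorted(a.keys() & b.keys())}


def net_margin(facts):
    """Net income as a share of revenue, computed from the tagged values."""
    return facts["NetIncome"] / facts["Revenue"]


side_by_side = compare(company_a, company_b)
for tag, (value_a, value_b) in side_by_side.items():
    print(f"{tag}: {value_a} vs {value_b}")
```

Because each figure carries its tag, lining up two statements is a dictionary lookup rather than a manual re-keying exercise.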

But to see XBRL simply as a means of marking up financial statements at the end of a financial reporting period is to miss the rest of the iceberg. If financial items are automatically tagged upon their creation using a system like SAP, the rich analysis can be filtered through the enterprise and to suppliers. Triggers and reports can be generated on the fly. Knowledge workers will be manipulating XBRL without knowing it by its accurate, albeit consonant-heavy, name.

XBRL, by its nature, has largely escaped the wave of Enterprise 2.0 functionality. But the openness of the data, its ability to be mashed up and displayed in previously unthought-of ways, will impress itself upon a public disillusioned with poor financial management -- management that has itself partly relied on poor data. It's time for some developer rock stars to step in and make those spreadsheets sing.

More about XBRL

Often described as being simply complex, XBRL should be approached from a technological, as well as an accounting, perspective. XBRL is simply a flavor of XML. Financial line items, totals, text, and metadata are XML elements that are mapped to a predefined schema (called a "taxonomy" in XBRL). In all cases, these taxonomies encode the accounting rules of the jurisdiction in question. Throw in XPath, XLink, and more, and you have a mature language for tagging and submitting your financials.
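To show what "XML elements mapped to a taxonomy" can look like in practice, here is a deliberately simplified sketch using Python's standard `xml.etree.ElementTree` module. The fragment is not a valid XBRL instance document: the namespace URI, the element names, the `unit` attribute, and the tiny `TAXONOMY` set are all assumptions made for illustration, and a real filing carries far more structure.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment. Not a real XBRL instance document.
DOC = """
<report xmlns:ex="http://example.com/taxonomy">
  <ex:Revenue unit="USD">1200</ex:Revenue>
  <ex:NetIncome unit="USD">150</ex:NetIncome>
  <note>Untagged text that the taxonomy does not cover.</note>
</report>
"""

NS = "http://example.com/taxonomy"   # assumed namespace of the toy taxonomy
TAXONOMY = {"Revenue", "NetIncome"}  # assumed set of known line-item tags


def read_facts(xml_text):
    """Collect (value, unit) for every child element defined in the taxonomy."""
    root = ET.fromstring(xml_text.strip())
    facts = {}
    for element in root:
        # ElementTree spells namespaced tags as "{namespace-uri}LocalName".
        if not element.tag.startswith("{"):
            continue
        namespace, _, local = element.tag[1:].partition("}")
        if namespace == NS and local in TAXONOMY:
            facts[local] = (float(element.text), element.get("unit"))
    return facts


facts = read_facts(DOC)
print(facts)
```

The untagged `note` element is skipped, while the two tagged line items come out as named values that other software can pick up without guessing what each number means.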

A good introductory resource is Wikipedia, which links to the various regulatory bodies, IT initiatives, and current issues. The "XBRL in Plain English" video is specific to executive summaries.

This was a guest post by Derek Abdinor, a divisional director at motiv - the Investor and Branding agency of Ince, a large communications concern from South Africa.

http://www.readwriteweb.com/archives/xbrl_mashing_up_financial_statements.php
Financial
Mon, 22 Dec 2008 19:00:00 -0800
Guest Author