data mining - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/data mining Copyright 2009 Richard MacManus readwriteweb@gmail.com Sat, 21 Nov 2009 05:00:00 -0800

Facebook Data Mining: Truth in Association?

With a product as ubiquitous as Facebook, it is no surprise that the public has raised a number of privacy-related concerns, including optional settings, privacy policies and data mining. In the past, ReadWriteWeb covered Facebook's plans to sell user data for market research purposes. However, today's article in the Boston Globe suggests that user information can be mined for more than just advertising purposes.

An MIT experiment dubbed "Gaydar" by its creators, Carter Jernigan and Behram Mistree, has employed computational analysis to infer users' traits from information listed by their Facebook friends. Through friend profiles, the program predicts the likelihood of your religious affiliations, political leanings and even your sexual orientation. Essentially, the idea is that friends are likely to share traits. So if you're in the closet but you've got loads of vocal friends, a program of this nature could potentially out you.
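The article does not describe how the MIT program actually works, so the following is only a minimal sketch of the general idea that friends tend to share traits: a simple majority vote over whatever friends disclose. The function name `infer_trait`, the `min_friends` threshold and the sample labels are all hypothetical.

```python
from collections import Counter

def infer_trait(friend_traits, min_friends=3):
    """Guess a user's undisclosed trait from the traits their friends list.

    friend_traits: list of trait labels, with None for friends who list nothing.
    Returns (label, share) for the most common label, or None when too few
    friends disclose anything to support a guess.
    """
    disclosed = [t for t in friend_traits if t is not None]
    if len(disclosed) < min_friends:
        return None
    label, count = Counter(disclosed).most_common(1)[0]
    return label, count / len(disclosed)

# Made-up data: four of five friends disclose a political leaning.
friends = ["liberal", "liberal", None, "conservative", "liberal"]
guess = infer_trait(friends)
print(guess)  # ('liberal', 0.75)
```

Even this crude vote shows why hiding your own profile field offers little protection: the guess depends only on what your friends publish.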

Said Hal Abelson, a professor who co-taught the course, "[It] pulls the rug out from a whole policy and technology perspective that the point is to give you control over your information - because you don't have control over your information."

With Facebook already being used to catch tax evaders, and with a conspiracy theory citing CIA ties in circulation, it'll be interesting to see how the public reacts to this latest show of Facebook data mining capabilities. While it's unlikely that terrorist suspects are friending each other on Facebook, there are a number of associations that need not be publicized to corporate partners or governments.

Photo Credit: Steve Jurvetson

http://www.readwriteweb.com/archives/facebook_data_mining_truth_in_association.php Facebook Sun, 20 Sep 2009 19:41:26 -0800 Dana Oshiro
Status.net Could Point to the Future of Business Intelligence

Few companies have captured the world's attention online in recent years as much as Twitter has. Rapid, structured, public communication between groups of people is not only a personal paradigm changer for many who have seriously explored the service - it's also an incredible opportunity to analyze a rich and dynamic set of data about interpersonal conversation.

First the Web, then email, then instant messaging and SMS all helped speed up the world we live in. Twitter made that rapid communication public and easier than ever for machines to mine for connections. Just as Facebook will never be Twitter because of the lack of clear access it offers outsiders to social data, so too does Twitter have its own limitations. A service called Status.net will launch in May that could overcome some of Twitter's limitations and make a significant impact on the world we work in.

Laconica, the Canadian company offering the most popular Open Source alternative to Twitter, announced plans today to begin selling subscriptions to hosted microblogging installations for businesses. The default address of these new sites will be yourname.status.net. We suspect that this could be a very big deal. (We found out about it from coverage on Microblink on Techmeme.)

Step One, People Will Want It

Laconica already allows anyone to install its software on their own servers, for free (see Leo Laporte's Twit Army for example), but the easy paid offering from Status.net could catch on much faster. The service provider will be responsible for maintenance, upgrades will come automatically and the URL is clear and dignified. The fact that the software is open source could also enable a plug-in and extension community to grow around the architecture as soon as it gets large enough for that to be viable.

Companies will pay to have either public or private microblogging installations hosted and branded for them. They will do so because, if they do not, their employees will have no group of allied professionals to securely cry out to for help with work problems. Their departments will remain out of touch and unfamiliar with the people and work being done around their own company. Companies without a microblogging system will seem as silly and disadvantaged in the future as companies do today that say "we don't need Instant Messaging, we have email," or "we don't need email, we have a fax machine."

Step Two, People Will Build on It

Some companies will use the hosted Status.net platform, others will decide to put Laconica on their own servers and others still will decide to use some other provider's business-oriented but developer-friendly microblogging service.

Once that fundamentally structured layer of social conversation has spread throughout a substantial portion of the business world, hopefully as interoperable Open Source software, here's what will happen.

We discussed one of the most potent applications analyzing Twitter social connection data in a recent post titled The Inner Circles of 10 Geek Heroes on Twitter.

These are the kinds of bird's-eye views through data parsing that an Open Source microblogging platform for businesses will enable. All of the following is based on nothing more than cross referencing user profiles, friend connections and public replies between users. Any parts of this vision that aren't simple will be simpler for someone to build once there's adoption and Open Source code.

In private networks, a company will be able to receive automatic notification when one of its employees has begun conversing with another particular employee more than they had before. Perhaps they'll consider putting them in the same work group.
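As a rough illustration of that notification, here is a minimal sketch that compares reply counts between two time windows and flags pairs whose conversation has grown. The reply tuples, the `factor` and `min_replies` thresholds and the function name are assumptions made for this example, not part of Laconica or Status.net.

```python
from collections import Counter

def rising_pairs(previous, current, factor=2.0, min_replies=5):
    """Return pairs whose reply count grew by at least `factor` between periods.

    previous, current: lists of (sender, recipient) reply tuples for two
    equal-length time windows. Direction is ignored, so (a, b) and (b, a)
    count toward the same pair.
    """
    def count(replies):
        return Counter(tuple(sorted(pair)) for pair in replies)

    before, now = count(previous), count(current)
    flagged = []
    for pair, n in now.items():
        # Treat a previously silent pair as having one reply to avoid dividing by zero.
        baseline = max(before.get(pair, 0), 1)
        if n >= min_replies and n / baseline >= factor:
            flagged.append((pair, before.get(pair, 0), n))
    return sorted(flagged)

# Made-up data: ana and raj go from 2 replies to 7; lee and raj stay flat.
last_week = [("ana", "raj")] * 2 + [("lee", "raj")] * 6
this_week = [("raj", "ana")] * 7 + [("lee", "raj")] * 6
print(rising_pairs(last_week, this_week))  # [(('ana', 'raj'), 2, 7)]
```

The `min_replies` floor keeps a jump from one reply to two from triggering a notification.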

If one salesperson doesn't converse with the technical team as often as other salespeople do, a company might wonder whether that salesperson is less comfortable explaining technical matters to customers. It will be trivial to determine which technical staff are friendliest and most appropriate to introduce a salesperson to, because those kinds of connections will be fully graphable.

In public business networks, community managers will be able to identify, with a snap of the fingers, the customers most engaged in conversation with diverse groups of other customers. Those are the kinds of community members that companies hire. Companies will be able to see if groups of people with similar traits in their profiles are asking for customer service more often than other groups. When they seek to engage with those communities in order to improve product usability for them, the contours of that community will be easier than ever to understand.

People say that the phrase Social Graph is too vague, but when it comes to structured, open microblogging - social connections through conversation and content are literally graphable. Here are the users, here are their friends, here are their public messages and here are their replies to one another - just draw a line from one column to one row and a narrative will be formed by the data. Repeat that process and you'll be able to build stories around trends.
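To make "literally graphable" concrete, the sketch below builds a weighted graph from nothing but reply pairs and then asks the salesperson question from earlier: what share of a person's replies cross over to the technical team? The data layout, the `teams` lookup and both function names are hypothetical.

```python
from collections import defaultdict

def build_reply_graph(replies):
    """Build an undirected weighted graph from (sender, recipient) reply pairs.

    Returns a dict mapping each user to a dict of {partner: reply_count}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sender, recipient in replies:
        graph[sender][recipient] += 1
        graph[recipient][sender] += 1
    return {user: dict(partners) for user, partners in graph.items()}

def cross_team_share(graph, teams, user, other_team):
    """Fraction of a user's replies exchanged with members of `other_team`."""
    partners = graph.get(user, {})
    total = sum(partners.values())
    if total == 0:
        return 0.0
    across = sum(n for p, n in partners.items() if teams.get(p) == other_team)
    return across / total

# Made-up data: two salespeople, two technical staff.
replies = [("sam", "tia"), ("tia", "sam"), ("sam", "kai"), ("joy", "kai"),
           ("joy", "sam"), ("joy", "sam"), ("joy", "sam")]
teams = {"sam": "sales", "joy": "sales", "tia": "tech", "kai": "tech"}
graph = build_reply_graph(replies)
print(cross_team_share(graph, teams, "sam", "tech"))  # 0.5
print(cross_team_share(graph, teams, "joy", "tech"))  # 0.25
```

In this toy data, joy talks to the technical team half as often as sam does, which is the kind of gap the article imagines a company noticing.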

Is this creepy? It doesn't have to be. There's a whole lot of exciting potential here and if an increasingly open technology world can help the business world understand the value of open over control (as it is) then this kind of analysis could be democratized and used for good.

Let's look at this from the perspective of Twitter right now. When I'm away from my computer and think of a question I need answered, I can send that question out to my Twitter network by SMS. Three people might post a public reply answering my question. When I get back to Twitter, I see those three replies and I publicly thank one of those people in particular for providing such a good answer.

Now repeat. Again and again, throughout an organization, across multiple organizations. Knowledge sharing paths get worn in the virtual grass of the public field of microblogging. Smart companies want their people creating those paths and only a fool would neglect an opportunity to illuminate these connections in the eyes of management.

It won't happen on Twitter alone, though. It's too public; the company is too bound by its own limitations on how much data it really wants anyone else to pull out of the river of Tweets; and relatively small groups are a very important part of the future of microblogging.

We expect that hosted or free company-specific microblogging installations will become huge sources of Business Intelligence data and we hope that happens through interoperable, Open Source software. We're excited to see what Laconica can do with Status.net.

http://www.readwriteweb.com/archives/statusnet_could_point_to_the_future_of_business_intelligence.php Analysis Mon, 30 Mar 2009 18:09:21 -0800 Marshall Kirkpatrick
ShareThis.com Aims to Become A Big Data Platform in the Next Web

Have you noticed those little links next to blog posts and news stories that say "Share This"? Click on that link and you get a pop-up with options to share an article on Delicious, Facebook, StumbleUpon or other services. Did you know that ShareThis.com has raised $21 million from venture capitalists for its version of that service?

If you think that's crazy - you're wrong. ShareThis is a great example of the kind of company that could become a key foundation for innovation in the next era of the web, if it doesn't sell out to advertisers too quickly or too completely. The company released a new version of its widget today and I took the opportunity to talk to CEO Tim Schigel about where the company is headed in the future.

The Sexy New Widget

Little changes can mean tens of millions of click-throughs won or lost for a company like ShareThis. The new widget seems like a real improvement. I especially like the one-click buttons to share items with frequent contacts - I use a similar feature on the StumbleUpon toolbar to email things to my wife sometimes, because it's so much faster than email.

Context

Ok, it's a sharing widget with a fat bankroll. What does it really mean though?

Here's how I see it. If the current iteration of the web is based on everyday people creating and distributing content, many people believe that the next iteration will be based on the use of machine learning to build new layers of value on top of that content. What's hot, with what audiences and what kinds of data parsing magic can we work with that information? Few companies are as well positioned to do interesting things with that kind of data as ShareThis. The company says its service is now live on more than 80,000 sites, from scores of small blogs to some very big brands on the web, like ESPN.com, FoxNews, AccessHollywood and Boston.com.

The company has a lot of opportunity for data-centric innovation and CEO Tim Schigel says that's the direction he's looking to take things. You have to hope that companies like this can pull it off and turn into the platforms they say they want to be - and not just advertising platforms, either.

Schigel says that he's watching OpenID closely and that he was pushing Facebook for something like Connect before the service existed. He also told us that there would "soon" be a way for users to easily export their history of shared items, especially now that ShareThis is putting a new emphasis on bookmarking for later retrieval and not just sharing items. I hope that's all true, but when there are tens if not hundreds of millions of dollars on the table I never hold my breath about such high-minded statements being anything more than PR.

In the meantime, though, it's fascinating to think about what ShareThis is going to do with a big pile of user data and a big pile of money.

The Business Plan For Data

ShareThis gets to see a whole lot of interesting things about the ways we share content online. In August the company published a report about the most common tools people use for sharing. The big takeaway? Email still totally dominates online sharing, even through the ShareThis widget. The second most popular method of sharing was for people to publish content into their Facebook news streams. That data told content producers everywhere that if they want to help readers share their content with larger numbers of people, it's important to make email and Facebook as easy to access as possible.

Beyond different methods of sharing, though, ShareThis has obviously got a lot of data about what kind of content is being shared. I asked Schigel whether ShareThis would be sharing this kind of data it collects, in aggregate, with marketers. "That's ultimately where we go with the business model," he said. The company is talking with selected marketers about sharing access to market insights now, but Schigel emphasized that a few conditions needed to be respected. "We need to make sure that publishers can build trust with their readers," he said, "and we need something unique that marketers can't get elsewhere."

What does ShareThis have that Facebook, for example, doesn't have? Schigel says his company can offer platform independence and a much lower price point. By doing nothing but facilitating sharing, ShareThis simply doesn't have the kind of overhead that Facebook requires to run its entire social networking site.

Obviously ShareThis, even with the success it's had in spreading its service so far, is going to need to be in a whole lot more places. Part of that increased reach, the company hopes, will come from its developer platform.

ShareThis for Developers

ShareThis already has a developer Application Programming Interface (API) but Schigel says there will be multiple APIs made available soon. The current offering already allows developers to rewrite attributes, like the title, of shared content objects. Hopefully future APIs will give maximum freedom to developers to do things with shared content data that can't even be imagined yet.

Both marketers and developers will soon be getting access to much more sophisticated data streams than mere bulk popularity. Schigel says that ShareThis is filling its Mountain View office with data wonks and PhDs who are aimed at taking ShareThis data beyond the most immediately obvious opportunities, like content recommendation. The company's Principal Scientist, Huitao Luo, has worked as a data scientist at LinkedIn, Yahoo! and at the innovative HP Labs. At HP, Luo published research on algorithms developed for cascading classification systems. Recently hired research architect Gordon Rios came from Inktomi/Yahoo, innovative white label calendaring company ZVents and a list of other companies. Rios has a background in data classification, determination of content's international relevance and spam detection.

These are heavy hitters who should offer up some really innovative APIs for the developer community to process user "attention data" and for marketers to monitor trends in interesting and granular ways.

Hopefully it won't all be done in crass service to the interests of advertisers alone. In order to build that trust that Schigel says he wants with publishers and with developers, ShareThis is going to have to offer some of the network effects it's capturing to its non-advertiser partners - not just a handy little widget for distribution. That's not unique enough.

Could ShareThis end up turning its little widget into a big company? I wouldn't bet against it. Will Schigel and his crew of scientists also take advantage of the opportunity to facilitate value creation by a larger web of data-centric content and development innovators, thus growing the total pie that the ad market wants a piece of? We can only hope.

http://www.readwriteweb.com/archives/sharethiscom_aims_to_become_a_paltform.php Market Analysis Wed, 18 Feb 2009 15:04:42 -0800 Marshall Kirkpatrick
How a Facebook "Sentiment Engine" Could Be Huge

Rumors of a Facebook "sentiment engine" analyzing aggregate user data, or a new form of the company's Engagement Ads that offer rapid polling to advertisers, have been flying around the web since the start of this week's World Economic Forum in Switzerland. The reality behind these rumors seems to be much less exciting (or creepy, depending on your perspective) than many people claimed - but we'd like to entertain some thoughts on just what Facebook could do with such a system.

Remember when Google published the most popular searches being performed during the Presidential debates? That represented a sea change in real-time awareness of what people care about. A Facebook "sentiment engine" has that same kind of potential, and we'd love to see some of this data be put to use in service of such innovation.

Best Case: A Living History

Photo by Flickr user OiMax.

Facebook is a place where 150 million people are conversing in real time, with people that they know, about the world around them. Those conversations are going on primarily in text. While most of the discussions around a Facebook sentiment engine have referenced the collection of data through active engagement in polls, there's also a whole lot of passive discourse that could be mined in interesting ways.

This author would gladly opt in, especially through the kind of granular privacy controls that Facebook is so good about offering, to allow the company to add what I say on Facebook to a big collection of data for analysis. Our individual identities need not be tied to our words, but the demographic data associated with our confirmed identities is invaluable.

Think of the non-commercial, public interest kind of data that could be acquired. When the economic stimulus plan of 2009 was first announced on national television - what was the reaction of people in their mid-twenties who lived in the Midwest of the US? Was that collective reaction substantially different from the reaction of self-identified queer people of color living in the Northeast US? How did the public reaction to the proposed plan change one hour, one day or one week after the announcement? This is all very interesting and potentially valuable data that could be, for the first time in history, available in near real time, just by listening to what people are talking about in status updates and comments.
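A toy version of that kind of aggregation might look like the sketch below. Everything here is an assumption made for illustration: the word lists stand in for a real sentiment model, the group labels are invented, and nothing is known about how Facebook would actually build such a system. Note that only a demographic group and a time offset are kept, not individual identities.

```python
from collections import defaultdict

# Tiny hand-picked word lists; a real system would need a proper sentiment model.
POSITIVE = {"good", "great", "hope", "relief"}
NEGATIVE = {"bad", "waste", "worried", "angry"}

def score(text):
    """Toy sentiment score: positive words minus negative words."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_by_group(updates):
    """Average the toy score per (group, hours_after_event) bucket.

    updates: iterable of (group, hours_after_event, text) tuples from users
    who opted in; no individual identifiers are kept.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [score_sum, count]
    for group, hours, text in updates:
        bucket = totals[(group, hours)]
        bucket[0] += score(text)
        bucket[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

# Made-up status updates, one hour and one day after an announcement.
updates = [
    ("midwest_20s", 1, "worried this is a waste"),
    ("midwest_20s", 1, "some hope at last"),
    ("midwest_20s", 24, "great news and real relief"),
    ("northeast", 1, "good plan"),
]
result = sentiment_by_group(updates)
print(result[("midwest_20s", 1)], result[("midwest_20s", 24)])  # -0.5 2.0
```

Comparing the same group across time buckets gives the "one hour, one day or one week" view; comparing different groups within one bucket gives the demographic view.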

Privacy concerns are very important, but with enough opt-in required, there could be a system set up that could funnel data straight from Facebook into the US Library of Congress or historical archives.

Facebook is one of very few companies we can think of with the access to public sentiment to make something like this happen - plus the imagination and willingness to survive backlash for a good idea.

More Likely: Polls and Product Feedback

Photo by Flickr user Kthypryn.

The above scenario is a pretty far-out one, unfortunately. Far more likely is that Facebook will perform political polls and sell access to product feedback survey systems to advertisers. Those paths could be interesting as well, though they fail to solve the problem of public discussion data running unused down the drain.

We'd love to see even these approaches used seriously. Polls could be performed quickly, even as part of corporate crisis management processes, if companies have the guts and Facebook can handle the sales requirements. The age of quieting down a controversy is over - why not buy some polls on Facebook to see how real people are actually responding to bad news about your company?

Likewise, both active and passive consumer sentiment about products could lead to some very cool features. RWW reader Scott Aikin says he thinks a sentiment engine would work well in powering a consumer goods search engine. I need a pair of headphones: which ones are most liked by my friends, people in my own demographic group or people who are older and wiser? I'll do the search on Facebook and they can have affiliate revenue when I make a purchase.
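The headphone search could be as simple as counting likes within a chosen audience. This is a minimal sketch with invented product names and user ids; the `rank_products` function is hypothetical and does not correspond to any Facebook API.

```python
def rank_products(likes, audience):
    """Rank products by how many people in `audience` liked them.

    likes: dict mapping product name -> set of user ids who liked it.
    audience: set of user ids to count (friends, a demographic group, ...).
    Ties are broken alphabetically so the order is deterministic.
    """
    counts = {product: len(users & audience) for product, users in likes.items()}
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(product, n) for product, n in ranked if n > 0]

# Made-up likes for three headphone models.
likes = {
    "Model A": {"ann", "bob", "cy"},
    "Model B": {"bob", "dee"},
    "Model C": {"eve"},
}
my_friends = {"ann", "bob", "dee"}
print(rank_products(likes, my_friends))  # [('Model A', 2), ('Model B', 2)]
```

Swapping `my_friends` for a set of users in the same demographic group, or an older one, answers the other two versions of the question with the same function.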

That sounds like a great idea. There are a number of great ideas that could be actualized by a Facebook "sentiment engine." Most promising, though least likely, is a system for monitoring public reaction to historic events. We have huge concerns about Facebook's being a proprietary set of technologies, but we're fascinated by the company and its potential at the same time. A Facebook sentiment engine would be like a real-time, hyper detailed census - and as such Facebook should make freely available as much of the data it gathers as it can.

Will the company do anything like this? Only time will tell, but if 3 years ago we had told you that the Facebook Newsfeed would become the way that 150 million people found out when a friend's relationship status changed, when a photo of a friend was posted online or when two people you knew were having a conversation - would you have believed it?

http://www.readwriteweb.com/archives/facebook_sentiment_engine.php Analysis Mon, 02 Feb 2009 16:21:54 -0800 Marshall Kirkpatrick
Zoetrope: New Web Crawler Allows For Searching, Analyzing The Ever-Changing Web

Does Adobe think they can out-Google Google? Perhaps. The company is involved with Zoetrope, a joint project with researchers at the University of Washington. What they're building is a tool that allows for manipulating the web over time. Instead of the snapshot of the web you see today when googling, Zoetrope will let anyone use keyword searches to discover archived web information and look for patterns in the data found.

About Zoetrope

As with the Internet Archive, the data in Zoetrope's database is a backup of the entire web, including those pages which have changed over time. But this archive won't be limited to the somewhat inconsistent periodic snapshots of the web's content like the Internet Archive offers. It will encompass everything.

Using the intuitive Zoetrope interface, a user could compare historical changes of various data through time by comparing snapshots of different pages on the web. Analyzing different, changing elements on web pages, side-by-side and over a period of time is downright difficult today - if not impossible. But Zoetrope makes it happen.

The process is done using Zoetrope "lenses" to draw boxes around elements, connect data from one site to another, and pull up charts of relevant data, all while manipulating a slider to scroll back and forth through time. That may sound hard, but if you watch this video, you'll see that it looks surprisingly easy.

For Everyone, Not Just The Computer Savvy

In a way, this project is similar to Google's new visualization API, which lets developers use historical web data to build charts, graphs, gadgets, and the like. However, where Google's tool is aimed at the technically savvy programmer, Zoetrope is for the average user. Says Dan Weld, a UW computer science and engineering professor who worked on the project, "Zoetrope is aimed at the casual researcher. It's really for anyone who has a question."

As noted in the University of Washington article on the project, example uses of Zoetrope could range from the basic (checking historical rankings of favorite players on a sports team) to the advanced (comparing daily air pollution levels in Beijing to the number of world records broken each day in the 2008 Olympics).

"Your browser is really just a window into the Web as it exists today," said Eytan Adar, University of Washington computer science and engineering doctoral student who's also a co-author of the research paper on the project.

"When you search for something online, you're only getting today's results...This is really a new way to think about storing information on the Web."

The researchers hope to offer Zoetrope for free as early as next summer.

Image credits: Color, Torley; Others, University of Washington

http://www.readwriteweb.com/archives/zoetrope_new_web_crawler_searches_analyzes_ever_changing_web.php Products Fri, 21 Nov 2008 07:47:01 -0800 Sarah Perez
5 Ways To Visualize The U.S. Elections

The U.S. presidential elections are right around the corner and it seems that just about everyone is looking for news, poll results, and other political coverage both online and off. For those of you who are still eagerly devouring anything related to the elections, you'll want to check out these five tools for visualizing election data. From earmarks to electoral votes, there's a lot you can learn from the apps listed here.

1) Visualize Political Contributions By Industry

The Sunlight Foundation, a non-profit organization whose mission is to use the Internet to make information about the U.S. government more accessible, just released a visualization of campaign contributions from 1990-2008, broken down by industry sectors and party lines. From this app, profiled on Programmable Web, you can see how the finance, insurance, and real estate industries spend more than others. The visualization is interactive - just push the play button after configuring the settings. It was built using Google Motion Chart and data from OpenSecrets.

2) Visualizing Earmarks

Earmarks are a hot topic in the current U.S. Presidential election. You can visit the web site earmarkwatch.org to investigate those spending measures inserted by members of Congress into bills that direct taxpayer dollars to their pet projects. But for an even easier way to track which states are the worst for using earmarks, this visualization over on ManyEyes is useful. Wow, look at Alaska!

3) Visualizing Election Polls

University of Utah computer scientists have written software they hope will eventually allow anyone to interactively and visually analyze election results, political opinion polls or other surveys. The software displays data in the form of "radial" charts that are doughnut-shaped and include features of traditional pie charts and bar graphs. The charts are interactive and animated, too. You can watch a video demonstration over here, but unfortunately, the poll-analysis software isn't quite ready for prime time. What a tease!

4) Electoral College Prediction Tracker

This interactive visualization widget provides an overview of the predicted outcome of the U.S. presidential election. The rows depict the results from different news agencies (The Washington Post, The New York Times, CNN, etc.) and the columns represent the different U.S. states. Each state's width is based on the number of electoral votes it has available. Political bloggers will really like this one, too - it's embeddable!

5) The 2008 Presidential Election In The Blogosphere

This next visualization, perspctv.com, is an informational dashboard that summarizes and graphs the Internet activity relating to the 2008 presidential elections. The charts compare the similarities as well as the differences between the mainstream media and user-generated content, such as that found on political blogs. Currently, the graphs include CNN polls, news mentions, blogosphere mentions, Twitter mentions, a U.S. electoral map, and Google Trends-based timelines. (via information aesthetics)

http://www.readwriteweb.com/archives/5_ways_to_visualize_the_us_elections.php Products Wed, 08 Oct 2008 06:00:00 -0800 Sarah Perez
Government Report Finds Data Mining an Ineffective Way To Smoke Out Terrorists

Remember the "pre-cog" cop-things in Minority Report, able to figure out who was going to commit a crime before they committed it? If that's ever going to happen it looks like it's going to have to be something supernatural - because at least these days, technology is a long way from being able to predict who's going to commit a crime.

A new 350 page report released today, written by heavyweights like former US Secretary of Defense William Perry, National Academy of Engineering President Charles Vest and sponsored by the Department of Homeland Security, argues that large scale data mining of consumer and other records is of "limited effectiveness" in finding suspects preparing to commit acts of terrorism.

The report was published by the National Research Council and was titled All Counterterrorism Programs That Collect and Mine Data Should Be Evaluated for Effectiveness, Privacy Impacts; Congress Should Consider New Privacy Safeguards. CNet's Declan McCullagh says the report offers a retort to the aims of the office of Total Information Awareness, whose duties were dispersed throughout the Federal government after extensive controversy several years ago.

The report notes that while credit agencies have been able to use data mining to find fraudulent financial activities, the tactic is of limited effectiveness in finding would-be terrorists for two reasons: first, because so little about the psychology and behavior of terrorists is known, and second, because the resulting data is so rife with false positives that it's of very low quality.
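The false-positive problem is easiest to see with some arithmetic. The numbers below are hypothetical and chosen only for illustration; none of them come from the report. Even a screening system that catches 99% of real targets and wrongly flags just 0.1% of everyone else ends up with a flagged list that is almost entirely innocent people, because real targets are so rare.

```python
def precision(population, true_targets, sensitivity, false_positive_rate):
    """Share of flagged people who are real targets (positive predictive value)."""
    true_hits = true_targets * sensitivity
    false_alarms = (population - true_targets) * false_positive_rate
    return true_hits / (true_hits + false_alarms)

# Hypothetical numbers chosen only for illustration, not taken from the report.
flag_precision = precision(
    population=300_000_000,
    true_targets=1_000,
    sensitivity=0.99,
    false_positive_rate=0.001,
)
print(f"{flag_precision:.2%} of flagged people are real targets")
```

Under these assumptions roughly one flagged person in 300 is a real target, while about 300,000 people are flagged in error. Credit card fraud is far more common than terrorism, which is part of why the same technique performs acceptably there.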

The report argued that it was much more effective to use data mining to track known terrorists or to find people exhibiting very specific behavior. It warned against using such tactics as tracking emotional or psychological states as those are things the authors believe individuals should not be called to account for. Apparently that doesn't go without saying anymore.

Much of the report's summary, and clearly its title, focused on the privacy implications of these false positives in particular and of this kind of data mining in general. Presumably the report was written in a different era, before it became appropriate to try out for the Vice Presidency of this country with words like "Al Qaida terrorists still plot to inflict catastrophic harm on America, and [Barack Obama] he's worried that someone won't read them their rights." (Palin acceptance speech) Evidently we live in a post-rights world now.

Thus what's most significant in today's report is the finding that pre-emptive data mining just doesn't work. Surely the ineffectiveness of pre-emptive actions is significant, isn't it? The report warned against using anti-terrorism data mining as an opportunity to find other actionable information.

The report offers a series of recommendations that include close monitoring of any such programs, possibly even including subjecting data-mining activities to regular data-mining based assessments of their effectiveness. The report said that "legislation to clarify private-sector rights, responsibilities, and liability in turning over data to the government" was an area "ripe for congressional activity." At a time when neither party running for the US Presidency is willing to mention anything like this, such recommendations might seem either refreshing or insane.

http://www.readwriteweb.com/archives/government_report_finds_data_m.php Analysis Tue, 07 Oct 2008 13:23:07 -0800 Marshall Kirkpatrick
Mememoir: A Better Wiki For Science

Thanks to successful projects like Wikipedia or Wikitravel, wikis have quickly become a standard tool on the Internet, but in academia, the anonymity often associated with publishing in wikis is a key factor that works against them. Tracking down the exact history of changes in a wiki entry can be a convoluted process, yet being able to exactly attribute a certain statement to one writer is at the heart of the academic enterprise. Mememoir aims to provide a wiki that is heavily focused on authorship and can help to dispel the prejudices scientists have against publishing in a wiki-like format.

Wikis in Science

Mememoir is a completely new development and as of now, its only deployment is in the form of the WikiGenes wiki. Both Mememoir and Wikigenes, a database of literature about genetic information, were created by Robert Hoffmann, a fellow at Society in Science in Switzerland and a visiting scientist at MIT.

For scientists in academia, publications are the lifeblood of their careers. Having published in a wiki is not going to persuade a tenure track committee anytime soon, but the systems that Mememoir puts in place might just make those contributions stand out a bit more. Besides attribution, Mememoir also gives its users the ability to rate authors and their contributions.

The developers are still looking at their options for possibly open-sourcing the code behind Mememoir. As Robert Hoffmann pointed out to us, the project will look at its options at a later time and is mostly focused on running the Wikigenes project for now.


WikiGenes

The information in WikiGenes itself was based on iHop, another project by Hoffmann (and not the infamous chain of pancake houses). The idea behind iHop is that information about a single gene can often be dispersed over hundreds of different academic papers, which makes finding and synthesizing all this data extremely hard. iHop used algorithms to parse all this information and bring it together in one database, which was then used to seed WikiGenes.

According to Hoffmann, the idea behind WikiGenes is that it will combat this dispersal of information in the first place, as scientists can enter their research results into the database directly.

Trust and Authorship

WikiTrust, which rates authors on Wikipedia according to an algorithm, is trying to do something similar for all of Wikipedia, but Mememoir takes this to a more personal level. Both systems are, of course, potentially fraught with problems, but it will be interesting to see if scientists will warm up to the wiki model.

We would really like to see Hoffmann and his team open up the code to Mememoir, as the wiki itself is a highly capable piece of code that looks flexible enough to power any kind of wiki - academic or not. In our testing, it turned out to be one of the easiest-to-use wikis we have seen so far, and it could surely benefit a lot of different projects in the long run. If you would like to see it in action, the project has created a short screencast that you can see here.

http://www.readwriteweb.com/archives/mememoir_a_better_wiki_for_sci_1.php http://www.readwriteweb.com/archives/mememoir_a_better_wiki_for_sci_1.php Products Fri, 05 Sep 2008 11:10:45 -0800 Frederic Lardinois
The Semantic Desktop? SDS Brings Semantics To Excel When you hear the word "semantic" you likely think of the semantic web - the supposed next iteration of the World Wide Web that features structured data and specific protocols that aim to bring about an "intelligent" web. But the concept of semantics doesn't necessarily apply just to the web - it can apply to other things as well, like your desktop...or even your Excel spreadsheets, according to Ian Goldsmid, founder of Semantic Business Intelligence, whose new app, SDS, brings a semantic system to spreadsheets.

Semantic Spreadsheets

The problem with spreadsheets that their system is trying to address has to do with those who need to derive data from multiple spreadsheets (two or more). Although it's easy enough to perform sorts, build macros, and create formulas within one spreadsheet, when needing to compare values in multiple spreadsheets the process becomes more difficult.

The company's app, The Semantic Discovery System for Excel, or just SDS for short, looks for similar columns or rows between the sheets and then "semantically" connects them. They don't appear to just be throwing that term around either - the app uses the same W3C Semantic Web technologies (RDF, OWL, SPARQL) to help you capture "meaning, intelligence, and knowledge" from the data saved in your spreadsheets.
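To make the "look for similar columns between the sheets" step more concrete, here is a minimal Python sketch. It is not how SDS works internally (the article only says it relies on RDF, OWL, and SPARQL); it simply assumes that column headers can be compared by string similarity. The function names, the sample headers, and the 0.8 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(header):
    """Lowercase a header and drop spaces, underscores, and punctuation."""
    return "".join(ch for ch in header.lower() if ch.isalnum())


def match_columns(headers_a, headers_b, threshold=0.8):
    """Pair each header in sheet A with its most similar header in sheet B.

    A pair is kept only if the similarity ratio reaches `threshold`
    (an arbitrary value chosen for this sketch).
    """
    matches = []
    for a in headers_a:
        best, best_score = None, 0.0
        for b in headers_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            matches.append((a, best))
    return matches


# Hypothetical headers from two separate spreadsheets.
sheet_a = ["Customer ID", "Order Total", "Region"]
sheet_b = ["customer_id", "Ship Date", "region"]
pairs = match_columns(sheet_a, sheet_b)
print(pairs)
```

A real semantic approach would compare what the columns mean rather than how their labels are spelled, which is presumably where the RDF and OWL vocabulary comes in.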

Do We Need Semantic Desktop Apps?

Does SDS solve a business problem that is not yet being addressed through current technologies? In my experience, the short answer to this question is "no." (But wait, there's more...)

Typically, when a business has need of comparing and analyzing large amounts of data, the solution is to turn to a database product that can then be queried and from which custom reports can be pulled. And a business doesn't need to spend a lot of money on a robust solution to do so - even a smaller business can create a database by using inexpensive desktop software.

However, the difference between using a database technology and "semantically connecting" some spreadsheets comes down to for whom this product is being built. In the past, databases and other business intelligence apps were built as if the creators knew that the only person using them would be an I.T. guy or gal. SDS, instead, aims to satisfy the needs of the non-technical end user.

Is this another example of tech populism at work? It certainly looks like it. Yet, in this case their market is small - a non-technical user who's also a power user with Excel? There's usually some overlap there. Not to mention, by the time you've achieved "power user" status, you've often also figured out how to do more complicated things in Excel...like, say, formulas that work across spreadsheets, for example - the very pain points this app is trying to address.

Still, it's an interesting concept to think of taking the semantic web capabilities and integrating them into everyday programs to add a layer of intelligence to these programs as well. Done correctly, it could improve the capabilities of our favorite software apps without making the programs overly complex, which is what typically happens when you add more features.

What do you think? Is the Semantic Desktop (that is, semantically-enabled desktop apps) right around the corner? Or is this product and those like it too niche to find an audience? Let us know what you think in the comments.

http://www.readwriteweb.com/archives/the_semantic_desktop_sds_brings_semantics_to_excel.php http://www.readwriteweb.com/archives/the_semantic_desktop_sds_brings_semantics_to_excel.php Products Wed, 13 Aug 2008 06:30:00 -0800 Sarah Perez
Do You Trust Google to Resist Data Mining Across Services? Google's breadth of services is truly awesome and the amount of information the company touches concerning our lives and world can sometimes feel downright frightening. While almost no one takes the old phrase "Don't Be Evil" seriously anymore now that there are billions of dollars on the table and Chinese autocrats to satisfy - regular evaluations of Google's ethical positions still seem advisable.

One of the big questions being asked with increasing frequency is this: Is Google taking data it collects through particular services and using it for its benefit in other services? We know the company scans our Gmail and uses the text there to sell ads, but is this a tactic being employed across services? Some people appear to believe it is.

The Fears

When enterprise wiki service Socialtext announced this morning that they were folding Dan Bricklin's SocialCalc spreadsheet into their offerings (Bricklin is the co-creator of VisiCalc), the announcement included this interesting customer quote:

"The timing of SocialCalc is perfect - we were in need of a wikified spreadsheet that had all of the utility of Google Docs without the datamining," remarks Brandon Stafford, Principal Engineer at GreenMountain Engineering.

We found it very interesting that a new application would specifically aim at Google's data mining as a weakness. That kind of tactic is likely to become increasingly frequent.

Similarly, when Google's Mark Lucovsky was a guest on last week's Gillmor Gang podcast, he was pressed on the question of data mining concerning the free javascript libraries that Google hosts and offers to developers. Is Google monitoring everything that goes on at the sites that use the libraries and using those observations for market intelligence such as ad sales?

You might remember that was a question people asked about MyBlogLog when Yahoo! bought the widely embedded service. Was Yahoo! using MyBlogLog to spy on AdSense and other activity unrelated to their own technology?

In Google's Defense

The information available cross-application is probably too seductive for Google, or almost any company, to pass up. The search and ad giant's saving grace may be that it has so much information in each silo already that it's uniquely satisfied not cross-pollinating.

Google's Lucovsky told Gillmor that "the Slashdot crowd" might think there's some kind of conspiracy, but that there really isn't. He assured listeners that Google only uses the information it collects from his javascript libraries to improve the service of the javascript library service. "The Slashdot crowd" is old school lingo for nonprofessional writers who post on the web but don't have a vested interest in respecting power - so they point at alleged conspiracies more often than the tamer professional press does.

Behind every alleged conspiracy at a giant company though is just a bunch of people doing their jobs. Only occasionally, we presume, do some of them come up with what would be a great idea as long as they don't get caught.

Data Portability

Some cross pollination of data from one service to another might in fact be great - if users had control over it and could use the same tactic for their own direct benefit. Until that kind of data portability policy and technology are in place, though, many of us would prefer that data remains right where it is and keeps its hands in plain sight.

Perspective

One of the first posts I wrote in my time at TechCrunch was about a Google experiment that would use your computer's microphone to track the ambient audio in a room, determine what TV shows you were watching and then serve up related ads in your browser. Presumably that program hasn't gone anywhere; snooping-obsessed researcher Shumeet Baluja has moved on to other research like monitoring video game players' behavior and psychology for ad targeting and watching how much porn people look at on their mobile phones.

Outside of Google's actions - data integrity (privacy) in hosted services has long been a concern and is now being responded to by some enterprise sales teams with boxes carrying applications locally behind customers' firewalls. As recently as the end of last year, SalesForce.com admitted that one of its employees fell for a phishing scam and handed over the key to that company's customer email accounts.

What if it wasn't an accident or an outside party though? What if data that was collected in "anonymous aggregate" proved just too juicy for personalization-hungry ad sales teams or security-obsessed government agencies? Do you trust Google to resist mining your data across the various Google services you use? Is avoiding "Google data mining" an effective selling point that would increase your consideration of products from another vendor? We expect that the answers to these questions will change over time and we think it would be wise to revisit them periodically.

http://www.readwriteweb.com/archives/do_you_trust_google_to_resist_data_mining_across_services.php http://www.readwriteweb.com/archives/do_you_trust_google_to_resist_data_mining_across_services.php Analysis Tue, 10 Jun 2008 11:05:05 -0800 Marshall Kirkpatrick
i360 Adds Semantics to Everything Tony Sukiennik believes the power of the people trumps the power of the algorithm when it comes to the development of semantic technology. His company, infoGenome, a startup that has been in stealth mode for about four and a half years, wants to harness that power by making semantics easy via its innovative drag-and-drop functionality. The i360 software he's developed is essentially the "Mahalo of semantic apps," relying on human knowledge to add meaningful layers of metadata to the information we work with every day. With i360, you can add semantics to everything.

People-Powered Semantics

When you're doing a web search, you instantly know what information is relevant and which isn't. At i360, they call this flash of understanding an "instant of information insight." In a split second you can identify something as being useful, but the problem in today's world is that there are too many ways to store that information - you can tag it, bookmark it, save it to file, email it, blog about it, share it with others, and so on. Overwhelmed by choices, busy people often choose to "just remember it," a decision that leads to the inevitable: forgetting. The human mind is already overloaded with input, so it isn't the ideal repository for storing all the complexities of our information-filled lives.

Instead, software should be doing the remembering for us. That's where i360 comes in. The application itself is really just a prototype of this conceptual idea, but one that Tony hopes Google might be interested in. Or maybe Microsoft. (He plans on proposing his ideas to both companies to see who bites.)

What the i360 software does is provide a way to quickly mark up the data you're working with - be it a link on the web, an email, a file, or anything - and add meaning to it with semantics. This process is done via a quick drag-and-drop into the app.

That isn't to say that this technology is using semantics in the technical sense of the word - it's not about converting everything into machine-readable formats for use on the semantic web; what it is doing, though, is adding semantics to everything by assigning meaning to that email, that PDF, that link, that note, that spreadsheet, etc. Meaning that only you, and not a computer or an algorithm, could know. In doing so, the technology is not focused on a semantic web per se, but a semantic database of your own, made up of not only web links, but also files, contacts, emails, keywords, and more, and knowing how they all are associated with each other.

Although Tony believes that we shouldn't give up on the algorithm - by all means, research should continue in that area - he feels strongly that his technology, which taps into the power of the human brain, gives people the ability to organize and assign value to information in a way that a machine cannot.

How It Works

What i360 does is complex and sort of hard to understand if you're not working with it directly. In fact, it's easier to understand if you work backwards from the end result of using the technology.

For example, imagine you do a Google Desktop Search or a Google Enterprise Search, and, instead of just links to items that match keywords, you get something a little more like this:

Augmented Search Results

You can see that by using the software, you've managed to associate people, documents, notes, and more with the original file.

The process of making these associations is via a "fire and move on" drag-and-drop methodology. See a useful link? Drag-and-drop it into i360. Highlight some text and drag and drop that as the item's description. Click a button and a screenshot is added automatically. Now associate that link with a person. That person with a Word document. That document with a search and an email...and so forth, and so on.
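The chain of associations described above (link to person, person to document, document to email) can be pictured as a small undirected graph. The Python sketch below is only an illustration of that idea, not i360's actual data model, which the article does not describe; the class name `AssociationStore`, its methods, and the sample items are all hypothetical.

```python
from collections import defaultdict


class AssociationStore:
    """Toy store of two-way associations between arbitrary items."""

    def __init__(self):
        # Maps each item to the set of items it is directly linked to.
        self._links = defaultdict(set)

    def associate(self, a, b):
        """Record that items `a` and `b` are related (in both directions)."""
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, item, depth=1):
        """Return every item reachable from `item` within `depth` hops."""
        seen = {item}
        frontier = {item}
        for _ in range(depth):
            neighbors = set()
            for node in frontier:
                neighbors |= self._links.get(node, set())
            frontier = neighbors - seen
            seen |= frontier
        seen.discard(item)
        return seen


# Items are (kind, name) tuples; the names are invented for the example.
link = ("link", "http://example.com/report")
person = ("person", "Alice")
document = ("document", "plan.doc")
email = ("email", "Re: project kickoff")

store = AssociationStore()
store.associate(link, person)
store.associate(person, document)
store.associate(document, email)

print(store.related(link, depth=2))
```

Searching for the link and following two hops surfaces the person and the document, which is roughly what the augmented search results are meant to show.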

Saving a Web Page

Within a company, the i360 technology can also be used to work with internally running applications, like Microsoft's SharePoint, for example...or any other application to which you have the cooperation of the vendor or access to the app's code base. With 100 lines of code, information from these applications can pass data from the app itself back to the i360 environment as just another informational nugget that can be associated with a person, a file, or anything else.

There's more this application can do, too. For example, searches themselves could begin in a more structured format - focusing on just what you're interested in finding (see example below). Each item you're researching can be available with one click from a sidebar - no saving to del.icio.us required.

Focused Searching

The results of your searches can then be transformed into a new file with links (see below), retaining the same structure of your own headings and listed items, and that file can then be emailed to someone else or published as a page available publicly on the web. If you find something new to add to it, be it another link or a file or anything else, you can just drag-and-drop that new item to i360 to update the results on the fly.

Formatted Results Can Be Shared With Others

A project team in the workplace could use the application together, associating people and emails and files and searches with each other, creating a database of content surrounding their project. A year later, an employee in another department could search via their company's enterprise search and find all the information in that project and how it all interrelates, even if all the original team members had moved on to other jobs in other companies. No more would "everything is stored in that one guy's head" be the norm. Employees could move on, but the data they created or found, and the way that data relates to other data, would remain.

Where It Needs Improvement

As a concept - simple drag-and-drop semantics - the technology is fascinating. In practice though, it's still very rough. You couldn't install i360 and be off and running in minutes - you would still need training to know how to use it as it exists in its present form. In today's world of bubbly web apps, anything that isn't immediately intuitive isn't going to be adopted by the majority of users. The whole Enterprise 2.0 trend is about bringing the simplicity of consumer applications into the corporate world, and, although that is this software's goal, unfortunately, I can't say that it achieves it.

The UI itself is confusing. They've made some interesting choices - the address bar is at the bottom, for example; buttons are labeled with things like "E+" - a reference to the name of a portion of the software suite, but one that is meaningless to the new user. The graphics and fonts used look ancient.

The UI

Conclusion

However, that being said, if you can look past the UI to the underlying idea, there's something about this concept - human-powered semantics - semantics over everything - that could be great, if someone could just make it pretty. It could even be the future.

http://www.readwriteweb.com/archives/i360_adds_semantics_to_everything.php http://www.readwriteweb.com/archives/i360_adds_semantics_to_everything.php Products Mon, 05 May 2008 12:55:27 -0800 Sarah Perez
Track Blog Trends with Trendpedia From Brussels-based company Attentio comes a new blog search engine and trend-tracking tool called Trendpedia. The service, now out of beta, lets you scan the blogosphere for trends to see what's getting buzz. Trendpedia also lets you compose visualizations of those trends as charts and graphs, which can then be shared on the social web.

To use Trendpedia, you need only enter the keywords or phrases you wish to search for in the boxes provided. Enter one keyword, like "Twitter" for example, and Trendpedia will return a simple chart showing the ups and downs of that word over time, determined by counting the number of blog posts where the word was mentioned.

Enter in two or more keywords, like "Clinton vs. Obama vs. McCain," and the graph will display a comparison of those terms using a different colored line for each. A pie chart will also display showing the percentages of mentions for each term throughout the blogosphere.

Tracking the Political Candidates

Beneath the charts are the blog search results for the items, with each term as a separate tab. The graphs themselves are interactive, too - you can click anywhere on the chart's lines to see the articles from that particular date.
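The counting behind these charts can be sketched in a few lines of Python. This is an assumption about the general approach (count posts per day that mention a term, then compare totals for the pie chart), not Trendpedia's actual code; the function names and the sample posts are invented for the example.

```python
from collections import Counter
from datetime import date


def mentions_per_day(posts, keyword):
    """Count, for each publication date, the posts that mention `keyword`.

    `posts` is an iterable of (published_date, text) pairs.
    Matching is a simple case-insensitive substring test.
    """
    counts = Counter()
    needle = keyword.lower()
    for published, text in posts:
        if needle in text.lower():
            counts[published] += 1
    return dict(sorted(counts.items()))


def share_of_mentions(posts, keywords):
    """Return each keyword's fraction of all keyword mentions (pie chart data)."""
    totals = {k: sum(mentions_per_day(posts, k).values()) for k in keywords}
    grand_total = sum(totals.values())
    return {k: (v / grand_total if grand_total else 0.0) for k, v in totals.items()}


# Hypothetical blog posts.
posts = [
    (date(2008, 4, 23), "Trying out Twitter today"),
    (date(2008, 4, 23), "Jaiku versus Twitter"),
    (date(2008, 4, 24), "New WordPress theme"),
    (date(2008, 4, 24), "twitter is down again"),
]

trend = mentions_per_day(posts, "twitter")
shares = share_of_mentions(posts, ["twitter", "jaiku"])
print(trend)
print(shares)
```

The per-day dictionary is what a line chart would plot, and the per-day grouping is also what makes it possible to click a point and list the posts from that date.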

Trendpedia also offers advanced search tools on a separate page that let you use the word "AND" to combine multiple terms in one search and compare them to multiple terms in a separate search. (Example: "twitter and jaiku" vs. "wordpress and typepad and blogger") You can also make your own label for each search, which will appear on the chart that displays. (Ex: "microblogging" vs. "blogging"). The advanced search page also lets you specify which language to cull the search results from, if desired.

Blogging vs Microblogging

After performing the search, you can use the provided social media buttons to share the trend on del.icio.us, reddit, Digg, Facebook, StumbleUpon, or via email. However, a glaring omission is the absence of an embed code for pasting the chart onto your blog or web site, forcing you to do screen grabs instead.

Trendpedia is clearly meant to be a competitor to Nielsen Media's Blogpulse, a site which Peter Kim points out appears to be on "auto-pilot." The Blogpulse homepage still features a section called "2005 Year in Review" and the latest news section's last update is from April 2007. Trendpedia's homepage, on the other hand, shows featured trends, popular trends last month, and popular trends last week. So, perhaps now with Trendpedia's offering, we'll start seeing some movement and innovation in this space once again.

Trendpedia homepage

http://www.readwriteweb.com/archives/track_blog_trends_with_trendpedia.php http://www.readwriteweb.com/archives/track_blog_trends_with_trendpedia.php Products Fri, 25 Apr 2008 08:46:37 -0800 Sarah Perez
Where to Find Open Data on the Web Today, a story on Techmeme caught our eye. It was entitled "We Need a Wikipedia for data," and the article, written by ex-Googler Bret Taylor, discussed the difficulty of finding open data sets on the internet, something which could spur innovation, allowing programmers to build new applications the likes of which have never been seen before. What was interesting about this story, in addition to, obviously, the concept of a Data Wiki itself, was the amazing and insightful commentary around this concept, not just on the blog, but all over the net, something which led to the discovery of some pretty good data sources that are already available.

In Bret's story, he mentioned some of the common data sources currently available, like the US Census Bureau's map data and the Reuters corpus, but his commenters came up with a few more. (See? This is why blog comments matter).

In addition, as CNet and Ryan Stewart's blog spread the story, more people chimed in with suggestions. And of course, the Hacker News guys had some more ideas themselves.

So what did everyone come up with? A lot of data sources are already freely available on the net, as it turns out, if you just know where to look. Here's a summary. Do you have anything to add?

CKAN (Comprehensive Knowledge Archive Network)

The CKAN site is a registry of open knowledge packages and projects. Here, you can find open knowledge resources or register one of your own. What kind of stuff can you find at CKAN? They mention a set of Shakespeare's works, a global population density database, the voting records of MPs, or 30 years of US patents as some examples, but they also point you to some useful URLs, like flickr's Creative Commons page, where photos can be searched by license type.

CKAN

Infochimps.org

This project is attempting to assemble and interconnect the world's best repository for raw data - like a giant, free, open almanac. The best way to describe it comes from MetaFilter, where the project was spotted recently: "Just as Wikipedia will help you find out something about everything, infochimps.org will help you find out everything about something." What can you find there? Every Wikipedia infobox, each infobox type in its own table, 50 years of global hourly weather data, all the tables from the US Census Statistical Abstract, oh and 100,000 official crossword words, too.

Infochimps.org

OpenStreetMap

Not a data set in the traditional sense, but definitely a useful tool, OpenStreetMap is a free, editable map of the world where you can view, edit, and use your own geographical data. The project was started because most maps actually have legal or technical restrictions on their use.

OpenStreetMap

MusicBrainz

A user-maintained community metadatabase site which collects music "metadata" like artist name, release title, list of tracks, etc. You can browse through the site or you can use a client program, like their own taggers, to help identify music collections. 

Musicbrainz

Jigsaw

Dismissed by the blogosphere as a bad idea, if not downright evil, Jigsaw, the marketplace that pays you to give up other people's contact info, now boasts 7 million complete contacts for the taking.

DBpedia

This site is a community effort to extract structured info from Wikipedia and make that data publicly available on the web, essentially turning Wikipedia into a database you can query. Is this the beginning of a semantic web? Check out their downloads section for the datasets and then scroll to the bottom for even more links to data sources on the web.

DBpedia

flickr wrappr

Where DBpedia takes Wikipedia and makes it semantic, flickr wrappr extends DBpedia with RDF links to photos posted on flickr. Here's an example. Here's another. This is pure geek hotness.

Freebase

Freebase, an open, shared database of the world's knowledge, received a lot of mentions in the comments, so this must be a good one. Community built and maintained, it pulls from open data sources like Wikipedia, MusicBrainz, and the SEC archives to create structured information on many topics, including more popular ones like movies, music, people, and locations. The site, unlike some of the others in this list, is also easy to navigate and well-designed, which makes it that much better to use.

Freebase

Opentick

Perhaps one of the less interesting items due to its dry subject matter - financial data - it's certainly worth a mention because a free database of real-time and historical market data for trading systems and platforms is the kind of thing that really floats some people's boats.

ThingISBN

Thanks to LibraryThing, ThingISBN is the site's first API, and even though its competitor became a paid service, ThingISBN is still free for non-commercial use. The API doesn't just return the usual book data, but also something called "edition disambiguation," meaning it also returns a list of "related" ISBNs—other editions, other media, and translations.

Numbrary

Like the title suggests, Numbrary is a library for numbers. This free service helps you find, use, and share numbers from public record data sets, like census data or the CIA World Factbook.

Numbrary

theinfo.org

This site isn't just a place to build or collect data sets, of which they have quite a nice list, but a place where you can interact with other number-lovin' folks like yourself.

theinfo.org

The Data Wrangling blog

This blog post lists a bunch, and I mean a bunch, of open datasets on the web, which just goes to show how much of a cursory list my post really is.

http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php Products Wed, 09 Apr 2008 09:46:53 -0800 Sarah Perez
Calculate Your "Digital Footprint" with New Tool from EMC Earlier this month, EMC released their findings regarding the digital universe in a publication entitled "The Diverse and Exploding Digital Universe." Some of the research focused on mind-blowing figures - like the 281 billion GB size of the digital universe or the predicted size of the digital universe by 2011, nearly 1.8 zettabytes (1,800 exabytes). However, what really piqued our interest was information provided on your "Digital Shadow," that is, all the digital information generated about an average person on a daily basis.

You may already be familiar with the term "digital footprint," which you probably take to mean your online data trail. If asked to describe what would comprise this "footprint," likely responses would include things like your social network profiles, your web site or blog, your photos shared on an online service, videos you uploaded to YouTube, perhaps even mentions of you in the local paper or your school's web site. You may even go so far as to include information about you or your businesses that are public record.

Certainly those things are contributing factors to your digitally encoded self; however, this recent EMC-sponsored study discovered that your digital footprint includes far more than just the data related to individual actions.

Out of the 281,000,000,000 GB digital universe, each person's contribution is about 45 GB, and out of that 45 GB, only about half of the digital footprint would be related to these "active" individual actions - taking pictures, making VoIP calls, uploading videos, downloading content, etc.
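The figures quoted so far are easier to compare once they are in the same unit. The short Python sketch below redoes the arithmetic, assuming decimal (SI) units where 1 exabyte is one billion gigabytes; that unit convention and the variable names are assumptions made for this example, and the "implied people" figure is only a rough consistency check, not a number from the report.

```python
# Assumption: decimal (SI) units, so 1 EB = 10**9 GB.
GB_PER_EB = 10**9

universe_gb = 281_000_000_000   # size of the digital universe, from the article
forecast_eb = 1_800             # predicted size by 2011 (1.8 ZB), from the article
per_person_gb = 45              # average contribution per person, from the article

universe_eb = universe_gb / GB_PER_EB         # the same figure expressed in exabytes
growth_factor = forecast_eb / universe_eb     # how many times larger the forecast is
active_gb = per_person_gb / 2                 # "about half" comes from active actions
implied_people = universe_gb / per_person_gb  # population implied by the two figures

print(f"Digital universe: {universe_eb:.0f} EB")
print(f"Forecast growth: about {growth_factor:.1f}x")
print(f"Active share per person: about {active_gb} GB")
print(f"Implied number of people: about {implied_people / 1e9:.1f} billion")
```

Dividing the total by the per-person figure lands in the neighborhood of the world's population, which suggests the 45 GB is an average over everyone rather than over internet users only.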

Awareness of those sorts of self-created data trails has been steadily increasing according to a recent Pew Internet report (Dec. 2007), with nearly half of all internet users (47%) having searched for information about themselves, up from 22% in 2002.

But this new research shows that we need to be aware of much more than just online mentions. What we need to concern ourselves with now is the other half of our digital footprint. This "ambient content," the research team concluded, consists of passive contributions, something termed your "digital shadow."

Your shadow includes things like images of you on a surveillance camera, your bank records, your retail and airline purchase records, your telephone records, your medical database entries, copies of hospital scans, information about your web searches, general backup data, information about credit card purchases, etc.

John Gantz, Chief Research Officer and Senior Vice President of IDC explains the digital shadow as simply "information about you," but what's surprising about this shadow, he explains, is that "for the first time your digital shadow is larger than the digital information you actively create about yourself."

While for you this means being aware of the numerous places your information is stored to protect yourself from identity theft, for businesses, especially enterprise IT organizations that gather this information, it means a tremendous responsibility for the security, privacy protection, reliability and legal compliance of this information.

"Society is already feeling the early effects of the world’s digital information explosion. Organizations need to plan for the limitless opportunities to use information in new ways and for the challenges of information governance," said Joe Tucci, EMC Chairman, President and CEO. "As people’s digital footprints continue growing, so too will the responsibility of organizations for the privacy, protection, availability and reliability of that information. The burden is on IT departments within organizations to address the risks and compliance rules around information misuse, data leakage and safeguarding against security breaches."

If you're interested in the current size of your own digital footprint, you can download a copy of the Personal Digital Footprint Calculator. This tool walks you through a questionnaire that calculates your impact based on the responses to questions about your computer usage, email usage, digital camera/camcorder usage, web downloading habits, potential surveillance areas, and geographical information, among other things. The questions do make you think about your online activities, but they may be hard to answer if you're not really aware of your online activities or good at coming up with averages for things like "number of emails sent per week," for example.

Digital Footprint Calculator

However, if you take the time to fill out the Digital Footprint Calculator correctly, you'll be presented with your current "daily digital footprint" in megabytes. You can then click "Start Ticker" to launch your own personal ticker, which increments over time according to the rate at which you create digital information. You can even upload this, along with the .swf file, to your own web site and share your results with others.
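The article doesn't say how the ticker computes its running total, so the following is only a minimal sketch, assuming the ticker spreads the daily figure evenly across the day. The function names and the sample value of 864 MB per day are hypothetical and chosen purely for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def footprint_rate_mb_per_second(daily_mb: float) -> float:
    """Convert a daily footprint in megabytes into a per-second rate.

    Assumption: information is created at a constant rate all day.
    """
    if daily_mb < 0:
        raise ValueError("daily_mb must be non-negative")
    return daily_mb / SECONDS_PER_DAY


def ticker_value_mb(daily_mb: float, elapsed_seconds: float) -> float:
    """Megabytes accumulated since the ticker was started."""
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    return footprint_rate_mb_per_second(daily_mb) * elapsed_seconds


if __name__ == "__main__":
    daily = 864.0  # hypothetical daily footprint, in MB
    for seconds in (1, 60, 3600):
        print(f"after {seconds:>4} s: {ticker_value_mb(daily, seconds):.2f} MB")
```

Under that even-rate assumption, a full day of elapsed time returns exactly the daily footprint you started with, which is a quick sanity check on any ticker of this kind.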

Example Ticker (taking wizard defaults)

Having a digital shadow is not necessarily a bad thing, the study points out. It's what allows Amazon to make recommendations for you, or eBay to display your "trustworthiness" as a seller. The downside is that, in many cases, erasing that shadow is still difficult or impossible: think about the Facebook user rebellion that took place when people discovered how hard it was to remove a profile from the service.

But there are other examples where people have even less choice in the matter, like government-mandated traffic light cameras or citywide surveillance. And of course, your safety is at the mercy of credit card companies and the like - if they aren't taking security seriously, your digital shadow can be snatched away from you while an identity thief goes on a rampage with your good name.

In the long run, it will be up to businesses to adapt to these changes and protect their customers' data. Those that don't will pay as their clients take their business to safer, more protective companies elsewhere. And for us, just being aware of our impact on the digital universe is a good place to start.

http://www.readwriteweb.com/archives/new_tool_calculates_your_digital_footprint.php Trends Mon, 24 Mar 2008 13:08:12 -0800 Sarah Perez
Earthmine: Building a 3D Datamine of the Urban Environment

Earthmine, the Best Technology Innovation/Achievement category winner at tonight's Crunchies, is a company that might seem uninteresting at first glance. When I first saw earthmine I assumed that it was just a Google Maps Streetview knock-off. I was wrong.

This startup is doing something far more interesting than that. While Google Maps and related consumer products have whetted the public's appetite for visualization of specific places on a map, earthmine is making those places machine readable.

How it works

The company uses a proprietary array of still-image cameras to take photos in stereo at regular spatial intervals while driving through city streets. The resulting 3D images can be measured with an accuracy that corresponds to measurements of the physical objects and distances they represent.
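earthmine's own processing is proprietary and isn't described here, so the following is only a sketch of the textbook relationship that makes stereo photos measurable, not the company's method. For two rectified cameras a known distance apart, the depth of a point is the focal length times the baseline divided by the disparity, meaning the horizontal shift of that point between the two images. The function name and the sample numbers are hypothetical.

```python
def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Depth in meters of a point seen by a rectified stereo camera pair.

    Uses the standard pinhole relation Z = f * B / d, where
      f = focal length in pixels,
      B = distance between the two cameras in meters,
      d = horizontal pixel shift of the point between the two images.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity_px must be positive")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical rig: 1000 px focal length, cameras 0.5 m apart.
    for disparity in (100.0, 25.0, 5.0):
        depth = depth_from_disparity(1000.0, 0.5, disparity)
        print(f"disparity {disparity:>5.1f} px -> depth {depth:.1f} m")
```

The smaller the shift between the two images, the farther away the point is, which is why distant objects are harder to measure precisely than nearby ones.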

The company says it covered San Francisco in just three weeks. Each day's data is processed automatically and is available before the next day begins.

The initially self-funded company recently took an investment from Caltech and secured an exclusive license to use 3D image processing technology developed by NASA's Jet Propulsion Laboratory (JPL). Generating dense, accurate 3D data from wide-angle images is a serious technology challenge, but one that the JPL worked on to process data returned by the Mars Rover.

What it means

Just as we here at ReadWriteWeb are excited about the potential offered by a machine-readable, or semantic, web, so too are the possibilities countless for a data-rich, accurate and machine-readable 3D representation of the urban environment. earthmine offers a usable-looking web interface, but that's just the friendly wrapper around a dataset of far greater consequence.

From urban planning to mobile services to security applications, this kind of data and interface has a lot of potential. If the value of mapping and of GIS are clear, the value of a geospatial 3D dataset about urban environments should be clear as well. Combine all three and you'll be able to assemble some very interesting resources on almost any topic.

It is important to me to say that I don't care for the way the company talks about the technology, as "reality mining" and "indexing reality." To call that tasteless would be an understatement. I'm concerned that such reductionism could have substantial adverse political consequences. Maybe I'm just old-fashioned to believe that there's far more that's important in "reality" than the things that can be digitized - and that much of it ought not be mined. I should probably stop, though, before a corporate exit puts me in thumbscrews listening to a well-fed Dr. Evil laugh. This technology itself could be put to use for good or ill, I'm sure.

Either way, this is fascinating stuff and worth some thought no matter how you relate to it. In addition to the very well-made company-produced video below, this interview with the young earthmine CEO and this one of his time on stage at the DEMO Fall conference are worth a watch.


http://www.readwriteweb.com/archives/earthmine_datamine.php Products Fri, 18 Jan 2008 19:29:59 -0800 Marshall Kirkpatrick