data - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/data en Copyright 2009 Richard MacManus readwriteweb@gmail.com Mon, 23 Nov 2009 10:24:13 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss Gmail Users Better-Connected, More Likely to Tweet than Members of other Webmail Services The social media data company Rapleaf has just released the final parts of their 3-part study involving the demographics and online behavior of webmail users. In the first part of the study, gender and age data was examined and revealed some interesting findings...like the fact that Gmail has more female users than male, for example. In the final sections of the study, the company has turned its attention to social networking data to discover more details about webmail users' social media profiles, memberships and network preferences.

Social Network Membership Data

In the latter parts of the study, the company looked specifically at social network membership data for users of the AOL, Gmail, Hotmail and Yahoo webmail services. Not surprisingly, the study found that Facebook was the most popular network across the board. What's more interesting is how well MySpace fared in some cases. On both the Hotmail and Yahoo webmail services, Facebook only had a small lead. Here, around 20% of all Hotmail and Yahoo webmail users were found to be on Facebook and MySpace. What does this reveal about the Hotmail and Yahoo user base? That they're a little more behind the times? Or that they've been around on the net longer and at one time had created (and possibly now abandoned) their MySpace pages? Unfortunately, the study can't provide us with these sorts of answers.

The study also showed that Twitter is far more popular among Gmail users than anyone else. In fact, on the other services, it's 4-5 times less popular than Facebook. We would like to think that's because Gmail users are just more web-savvy and cool, but it's possible that it's because they're just younger than everyone else.

Not surprisingly, LinkedIn is the least popular social network, but as Rapleaf points out, many LinkedIn users may have registered with their business email instead.

Participation Levels - Hotmail Users have Most Profiles, Gmail Users Better-Connected

When it comes to how the webmail users participate on social networks, Rapleaf found that the majority of the users have only one social media profile. But the service where the average number of profiles is the highest might surprise you - it's Hotmail. There the average is 2.5 profiles per user. Hotmail is followed by Yahoo, then AOL, and it's Gmail users who have the fewest social media profiles. That finding seems odd considering that Gmail users are younger and more likely to use Twitter in addition to Facebook. In fact, it almost seems like this data doesn't even fit with the rest of the study.

However, the discovery that Gmail users are better-connected than the other users makes more sense. On average, Gmail users have the most friends on social networks with 46.2 friends while Yahoo users have the fewest with 40.0.

Since, again, Gmail users tend to be younger than the rest, it stands to reason that they would be in a demographic where their peers are more likely to have social membership profiles. Older webmail users, meanwhile, are still signing up for these sites. Although baby boomers and other middle-aged folks are joining sites like Facebook in droves these days, social networks are still dominated by the young.

Methodology

For the Rapleaf study, the company sampled 120,000 webmail accounts from users with @aol.com, @gmail.com, @hotmail.com and @yahoo.com email addresses. They then looked into the users' age, gender and social networking data by collecting information from public social media profiles. Obviously, in doing so, they've skewed their findings a bit, as the company notes in their original blog post. However, the sample size is large enough to form some conclusions about the members of these services, even if it relied on a particular subset of users.

http://www.readwriteweb.com/archives/gmail_users_better-connected_more_likely_to_tweet.php http://www.readwriteweb.com/archives/gmail_users_better-connected_more_likely_to_tweet.php Trends Thu, 19 Nov 2009 07:22:49 -0800 Sarah Perez
The Web of Services: Machine-Accessible Services In the last two posts in this series, we discussed the Web of data, which makes structured interlinked data sets machine-accessible, and the Web of identities, which makes data about people machine-accessible while addressing privacy and data volatility.

This time, we'll focus on the Web of services, which makes services accessible to and processable for machines. These Webs all have a semantic architecture in common and follow basic Web principles, such as being decentralized, modular, simple, addressable via URIs, and built for machines.

The services sector has become the world's biggest business sector, accounting for 64% of the worldwide gross domestic product. The sector is under pressure to make its services easier to use and more widely accessible, as well as to adapt quickly to ever faster changes in the market environment.

The effort to standardize such things as service-oriented architectures (SOA) and Web services has taken years, but still we have no clear definition of what constitutes a service at a conceptual level. The interface, which is the format of what goes in and out of the service, is often described formally, but what the service is actually doing, semantically speaking, is not. While there are a number of different approaches to semantically describing Web services, such as OWL-S, WSMO and WSDL-S, none so far has managed to break out of its academic confines.

Today, there are already all kinds of services with different levels of complexity, and their number is expected to grow exponentially. The services follow different standards, and a lot of them are proprietary, uni-directional and designed to be used by humans to mash up something new. Editorial catalogs such as ProgrammableWeb and search engines for Web services such as seekda are designed for humans who are searching for a particular service for that reason. For tasks that are unsolvable for machines, there are even Web services such as Amazon's Mechanical Turk, which have humans in the back end answering tricky queries.

The problem with all of this is that each of the tens of thousands of services is accessible but not findable by a machine without a machine-understandable description. Thus, every service nowadays has to be wired to a machine by hand. So, what would machines be capable of if services were annotated with semantic descriptions?

  • Service discovery
    Given an index of Web services, a machine charged with finding the right service for a particular problem could choose one among those that have been indexed.
  • Contracting and execution
    Once a service has been selected, a machine could look up its terms and decide on contracting and execution details. How often would the service be needed? And what would be the cheapest contract then?
  • Billing or revenue sharing
    Depending on the autonomy of the machine, one could imagine something like an Autonomous Agent, which automatically makes the best deal with the service provider on such things as billing or revenue sharing for service usage.
  • Replacement on failure, based on experience
    Of course, the machine would be able to replace a failing service with an equivalent one. It could also rate a service and publish it.
  • Service orchestration
    A machine could, given enough intelligence, split a task into sub-tasks and then discover, contract and orchestrate services to solve these sub-tasks. Once the sub-tasks have been addressed, the main task would be solved. Such orchestration could involve running tasks in parallel, for speed or redundancy, or chaining services (whereby the output of one service is fed into the next).
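As a rough illustration of the capabilities listed above, here is a minimal Python sketch of discovery, replacement on failure, and chaining. Everything in it is hypothetical: the `Service` record, the capability labels, and the toy services are invented for this example and do not correspond to any real registry or to OWL-S, WSMO or WSDL-S.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Service:
    """A hypothetical machine-readable service description."""
    name: str
    capability: str             # semantic label for what the service does
    call: Callable[[str], str]  # the service endpoint, modeled as a function


def discover(index: List[Service], capability: str) -> List[Service]:
    """Service discovery: pick indexed services that match a capability."""
    return [s for s in index if s.capability == capability]


def call_with_fallback(candidates: List[Service], payload: str) -> str:
    """Replacement on failure: try equivalent services in order."""
    for service in candidates:
        try:
            return service.call(payload)
        except Exception:
            continue  # this service failed; try the next equivalent one
    raise RuntimeError("no service could handle the request")


def orchestrate(index: List[Service], capabilities: List[str], payload: str) -> str:
    """Chaining: feed the output of each step into the next."""
    for capability in capabilities:
        payload = call_with_fallback(discover(index, capability), payload)
    return payload


def broken_translator(text: str) -> str:
    """A stand-in for a service that is currently down."""
    raise ConnectionError("service unavailable")


# A toy index; a real one would be built from published semantic descriptions.
index = [
    Service("flaky-translate", "translate", broken_translator),
    Service("upper-translate", "translate", lambda text: text.upper()),
    Service("exclaim", "emphasize", lambda text: text + "!"),
]

result = orchestrate(index, ["translate", "emphasize"], "hello")
print(result)  # HELLO!
```

The first "translate" service fails, so the sketch silently falls back to the second one before passing its output to the "emphasize" step. Contracting, billing and rating are left out, since they depend on terms that only a real service description could supply.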

Research projects such as TripCom, SUPER, SHAPE and SOA4All are dealing with these ideas and scenarios.

Future scenarios are limited only by our imagination: machines could autonomously pursue goals on behalf of their master user or company, according to a specified level of freedom. These agents could solve increasingly complex problems and be granted increasingly more autonomy (finally ending up as Skynet).

In the next and final post in this series, we will discuss how all of these scenarios could become a reality with the arrival of all three Webs: a revolution in the ability of machines to access, process and apply information.

Do you also count the Web of services as a third Web? Where do you see its limits?

(Photo by zorro-art.)

http://www.readwriteweb.com/archives/web_of_services_machine-accessible_services.php http://www.readwriteweb.com/archives/web_of_services_machine-accessible_services.php Semantic Web Fri, 16 Oct 2009 12:00:39 -0800 Alexander Korth
Infochimps: Share and Sell Your Raw Data Invite code available at bottom of the article!

Although the data repository Infochimps has been open for a year or so now, the company is making a big announcement at the DEMOfall 09 conference today. Now, in addition to simply being one of the best sources for finding raw data online, you're able to share your data - or even list it for sale - through new site features which the company hopes will encourage businesses to open up their commercial datasets to the world.

Infochimps originally caught our eye back in April of 2008 when it made our list of the best places to find open data on the web. The beauty of this site, which is essentially a specialized search engine for raw data, is that it makes finding relevant data much easier than if you tried to do so using a traditional search engine like Google. For example, a query for "music" returned, among other things, a listing for Last.fm Artist Tags from 2007. A similar search on Google wouldn't pull in that link until you hit the 42nd page of search results. In other words, you would never find it on Google.

At Infochimps, data can either be hosted on site in a standard format like CSV, XML, or YAML, or it can simply point to an external source. While the data itself cannot be manipulated on the site, the metadata like the description and tags can be edited by anyone who creates an account at Infochimps.org. The site founders have seeded the site with some data already - like the data from the Comprehensive Knowledge Archive Network - but the majority of the data is user-submitted. They've also partnered with Amazon to share Infochimps data with Amazon's Public Data Sets service. To date, one-third of Amazon's Public Data Sets were contributed by Infochimps.

New Announcements: Sell and Share Your Data

Today, the company is opening up and allowing anyone to upload their own datasets. The data can be any raw data that has an open license. To get started, users just upload it to the site and Infochimps will then handle the storage and distribution. Also, for any truly valuable data, be that commercial data a company wants to share or some sort of data manipulation - like editing awful census data into a format for use with MySQL databases - users can now charge for downloads, too. The price for the dataset can be set to any amount; Infochimps makes its money by taking a 20% cut of all data sold.

There really isn't another company doing anything like Infochimps. Amazon's Public Data Sets comes close, but isn't nearly as extensive. And although other repositories of data exist (Archive.org and the newly launched Data.gov come to mind), these resources focus on one particular type of data as opposed to providing a search engine for all data.

Those who have data to share or sell can now do so as of today: just visit Infochimps.org to get started. ReadWriteWeb readers who use this service can use the code RWWrocks to get in.

http://www.readwriteweb.com/archives/infochimps_share_and_sell_your_raw_data.php http://www.readwriteweb.com/archives/infochimps_share_and_sell_your_raw_data.php Products Tue, 22 Sep 2009 16:30:00 -0800 Sarah Perez
Who Has the Right VC Numbers and Who Cares? We started tracking VC funding in October 2008, as the financial markets were melting. What caught our eye in those dark and gloomy days was True Ventures' announcement of its Series A investment in Syncplicity. The more we looked, the more we found that the headlines were wrong. It was not all doom and gloom, not in our corner of the universe: early-stage Web tech ventures. So we figured that getting (and passing on to you) good reliable data on a timely basis would be a good idea. Searching for that turned out to be harder than we thought, and herein lies a tale.

A Billion Here, a Billion There

For the quarter ending this past June, we compared the findings of three research firms that reported on the money invested in Q2:

  • July 21, MoneyTree (PricewaterhouseCoopers, with data from the National Venture Capital Association and Thomson Reuters): $3.7 billion, with 612 deals,
  • July 18, VentureSource (DowJones): $5.27 billion, with 595 deals,
  • July 14, ChubbyBrain (a New York City-based startup partnering with ReadWriteWeb): $5.329 billion, with 613 deals.

VentureSource and ChubbyBrain seem to agree on the top-line number. But MoneyTree's number is the one most people report, and it is about $1.5 billion lower.
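For readers who want to check the gap themselves, the short Python sketch below subtracts MoneyTree's total from each of the other two. It uses only the three totals listed above; the variable names are ours, and the rounding to three decimals is just for display.

```python
# Q2 totals in billions of US dollars, as listed above.
totals = {
    "MoneyTree": 3.7,
    "VentureSource": 5.27,
    "ChubbyBrain": 5.329,
}

baseline = totals["MoneyTree"]

# Gap between each other source and MoneyTree, rounded for display.
gaps = {
    name: round(value - baseline, 3)
    for name, value in totals.items()
    if name != "MoneyTree"
}

for name, gap in gaps.items():
    print(f"{name} minus MoneyTree: ${gap:.3f} billion")
```

Run as written, this puts both gaps between roughly $1.5 billion and $1.7 billion, while the two higher totals differ from each other by less than $0.1 billion.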

As the old saying goes, "A billion here, a billion there. Sooner or later it adds up."

Disclosure: Our VC Funding Report

ReadWriteWeb has an interest in this. We sell a report for $299 that has details on the 240 deals done this quarter in the Internet, mobile, and SaaS space (not clean tech or bio tech), and this is powered by data from ChubbyBrain. So we are biased. But it also means that we are engaged and have been looking at this fairly deeply.

Who Cares?

We also think that accuracy matters, and we are trying to figure out to whom it matters. We see three main types of participants in the industry:

  1. VCs. They need accurate data for their own fund-raising. They have to be able to benchmark their own funds relative to the broader market.
  2. Entrepreneurs. Data on what funding deals are being made, and why, helps them figure out how much to raise, when, and from which VC.
  3. The startup "community." This is a catch-all for everyone else, who tend to align with either VCs or entrepreneurs. Journalists, the non-aligned fourth estate, want reliable data to key off interesting stories.

Why does this matter? The startup community matters to the health of the overall economy. As the National Venture Capital Association (NVCA, the trade association of VCs) likes to point out:

"Originally, venture-backed companies have created companies that accounted for 10.4 million jobs and over $2.3 trillion in revenue (based on 2006 data)."

So a headline like "VC Investments Falling Off Cliff in the US" really impacts a lot of people. That is the kind of headline that most journalists/bloggers wrote in April 2009, based on data reported by those trusted sources.

We wrote a really boring headline:

"VC Investment in Internet Deals Did Not Fall Off a Cliff."

That's a lousy headline for generating page views. It's a story about "the dog that did not bark."

The point is that headlines drive business behavior to wild excesses on both the down-cycle bust and the up-cycle boom.

Just good reliable data would help.

Innovation Is Global, But It Keys Off US Data

At ReadWriteWeb, we love to track innovation from far-flung corners of the world, and we see the globalization of innovation as a critical trend.

So we want to be able to report on financing trends for early-stage Web technology startups across Europe and Asia, in addition to the US. And we expect any research process to be able to scale to that challenge.

But the reality today is that, globally, entrepreneurs and VCs key off US data. If they were to key off bad data, that would matter to everyone.

Why This Matters

Driving with one's eyes in the rear-view mirror is dangerous. We take action based on what authoritative sources tell us is happening today, and we base our assumptions on what that means will happen next and plan accordingly.

In reality, these sources tell us what has happened in the past, and they may not even tell us that accurately.

When we at ReadWriteWeb look at the macro picture, we favor a contrarian view simply because the reality we see today is often not what the headlines trumpet. When the markets were in the late stage of a boom, we were sounding the warning signals.

When the markets were melting, we began to see surprising signs of life in the early-stage Web technology world we live in.

Whether you are an entrepreneur or an investor, knowing what the crowd is thinking -- and what the headlines are trumpeting -- is valuable. Even more valuable are the underlying facts and trends that may be missing from those headlines. In the disconnect between the two often lies a lot of opportunity.

We hope to ignite a debate that leads to greater accuracy and transparency of these numbers.

http://www.readwriteweb.com/archives/who_has_the_right_vc_numbers_and_who_cares.php http://www.readwriteweb.com/archives/who_has_the_right_vc_numbers_and_who_cares.php NYT Sun, 09 Aug 2009 14:00:38 -0800 Bernard Lunn
EFF Calls on Companies to Encrypt Location-Based Data The reason Steven Seagal's '80s movies lack relevance for modern-day audiences is that if a group of creepy, rogue mercenaries were to abduct us now, we'd be able to ping 10 nearby friends for backup. If you're like us, you're using one or more location-based services that rely on GPS data, phone signal strength or visibility in relation to nearby wireless networks. In other words, through Twitter, Loopt, Brightkite, Foursquare or Google Latitude, your location is sitting in a database. Nonetheless, according to a recent report by the Electronic Frontier Foundation, you shouldn't have to forgo your locational privacy to find nearby friends or restaurants.

Locational privacy refers to the expectation that as regular citizens our whereabouts are not being monitored. We've all heard the horror stories about illegal wiretapping and citizen surveillance, but what about the services we opt into? According to the report "On Locational Privacy, and How to Avoid Losing it Forever", it's fairly easy to use cryptographic techniques to ensure your anonymity. Rather than revealing a mobile device's owner to service providers, one way to ensure anonymity is for a mobile device to ping services using a cryptographic proof-of-identity. A University of Waterloo report titled "Louis, Lester and Pierre: Three Protocols for Location Privacy" provides a deeper look at identity-masking techniques.

This is an important subject for companies looking to enter the geo-locational space. Groups that encrypt their data are taking pains to reduce the threat of identity theft, illegal surveillance or a court subpoena of that data. These companies will be rewarded with customer loyalty when the unfortunate time comes for one or all three of the above scenarios.

Those critical of encryption might suggest that law-abiding citizens have nothing to hide, but that simply isn't true. What if you're in Alcoholics Anonymous? Or you've simply spent the night at a person's house? And honestly, do you really want your running club to see how often you eat at Arby's? Encryption allows us to ping our friends while maintaining an air of mystique, and at the end of the day, the companies that care about their customers keep them.

http://www.readwriteweb.com/archives/eff_calls_on_companies_to_encrypt_location-based_d.php http://www.readwriteweb.com/archives/eff_calls_on_companies_to_encrypt_location-based_d.php Lifestreaming Wed, 05 Aug 2009 20:00:42 -0800 Dana Oshiro
Journalism Needs Data in 21st Century Journalism has always been about reporting facts and assertions and making sense of world affairs. No news there. But as we move further into the 21st century, we will have to increasingly rely on "data" to feed our stories, to the point that "data-driven reporting" becomes second nature to journalists.

The shift from facts to data is subtle and makes perfect sense. You could say that data are facts, with the difference that they can be computed, analyzed, and made use of in a more abstract way, especially by a computer.

With this mindset, finding mainstream data-driven stories doesn't take long at all. A quick scan of the Guardian's home page tells us that swine flu cases are up by 50%, according to "fresh figures...[that] will be released this afternoon." The story here is that we're in danger because swine flu is on the rise. Reporting the current figures available for swine flu alone wouldn't be all that interesting. The news comes from comparing the current figures to last week's, which is a very simple form of data analysis. By making use of published data and running one's own analysis (and building on the analysis of others), we get something very news-worthy indeed. It moves the definition ever so slightly, from "saying and asserting" to "analyzing and publishing." But it obviously works only for data that is accessible.
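The week-over-week comparison described above is simple enough to sketch in a few lines of Python. The case counts below are invented purely for illustration and are not the Guardian's figures; only the idea of comparing this week's number with last week's comes from the story.

```python
def percent_change(previous: float, current: float) -> float:
    """Return the change from previous to current as a percentage."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100


# Hypothetical weekly case counts, invented purely for illustration.
last_week = 1000
this_week = 1500

change = percent_change(last_week, this_week)
print(f"Cases changed by {change:.0f}% week over week")
```

Neither number is a headline on its own; the story only appears once the two are compared, which is the point of the paragraph above.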

There is nothing new about pointing out the importance of public data being made available. Sir Tim Berners-Lee has discussed at length the importance of governments and institutions putting their data online, making it accessible and useful. His TED talk and interviews with ReadWriteWeb and Talis (disclosure: I am a blogger at Talis) all explain his belief that by publishing linked data we can begin to solve many of the problems the world faces. Innovations in medicine, science, and development could all be achieved if only currently hidden data were made available. Data-driven journalism could be the first step in realizing this dream. The best stories would then come from innovators who read about trends reported in news media and are then able to draw new conclusions and solve bigger problems. In his recent discussion with BBC, Berners-Lee said that the next step is to go for low-hanging fruit by just getting the data out there.

Thus far, this has made a lot of sense to me, and I have been tracking the publication of linked data and increasing access to public knowledge as emerging trends over at Talis. But my perspective has shifted a bit in the past few weeks.

First, there was data.gov and President Obama's call for more access to government data. A sitting head of state (and one of some significance) was clearly calling for public access to government data: this was news! But the idea has been discussed, praised, and debated for a while since then and may have lost some of its luster.

Then about a month ago, UK Prime Minister Gordon Brown made it part of his digital strategy to prioritize the publication of government information. He asked Sir Tim personally "to help us drive the opening up of access to Government data in the web over the coming months" and appointed Berners-Lee an official governmental adviser. By now, neither of these stories is news and comparisons between the initiatives have been made.

The Guardian newspaper recently launched its own Data Blog, with the intention of letting readers access, mash up, and reuse much of its information in the form of data, which could in turn drive stories.

What is perhaps not as explicitly recognized is the voracious appetite for data that has been apparent for months. It is less about turning good ideas into stories and more about seeing how data informs our understanding of events happening right now. Each new initiative is another piece of low-hanging fruit picked.

Access to data is important: it drives innovation and even social change. Governments that publish their data have to become more transparent. Humanitarian organizations that make their findings known could spark bigger projects and source innovative solutions from their communities. Scientific findings and raw information could be used to solve bigger problems than the result of a single experiment or trial could ever manage. Even the simple comparison of two or more facts can lead to new insight, and all of these things happen only when the walls around an institution become porous.

2009 could become known as the year of data, the year of open access, or the year of the semantic Web (see links above for how this relates), and it may also be the first year when it becomes news that data wasn't published in a story when it should have been. That a government body isn't being transparent or is blocking access by publishing its findings in PDF or other non-linking formats would make a very interesting story indeed. We can expect to see more and more organizations and public bodies remove their own barriers through initiatives and legislation. Examples have been set, and seeing excuses die along with barriers is not far-fetched.

Do you know of other data-driven stories? We'd love to hear about any insights that were made through publicly accessible data or where this data might come from next.

Guest author: Zach Beauvais is a Platform Evangelist for Talis and editor of Nodalities Magazine.

http://www.readwriteweb.com/archives/journalism_needs_data_in_21st_century.php http://www.readwriteweb.com/archives/journalism_needs_data_in_21st_century.php Trends Wed, 05 Aug 2009 02:00:37 -0800 Guest Author
The Web of Identities: Making Machine-Accessible People Data In a previous article, we discussed the Web of data, which is about inter-linking open data sets and, thus, turning them into machine-accessible structured data. In this post, we'll draw a picture of how the emerging social Web could serve as a Web of identities, which is essentially a people-data version of the Web of data.

W3C's Linking Open Data (LOD) project has gotten quite a bit of attention for the good job it does with the Web of data. Currently, all participating data sets are accessible free of charge and can be used without constraints. The project focuses on growth for now. In an email, Chris Bizer hinted that a payment model to charge for particular content may come in the future.

The LOD approach is very good for static and encyclopedic knowledge, but what about accessing our personal data? Technically, modeling our identity, profile data, social graph, groups, activity stream, assets, and other kinds of personal data is straightforward. But empowering machines to access this data could present challenges to the LOD approach, because it comes with all sorts of constraints and peculiarities, such as privacy and data volatility. People want control over who has access to their data or parts of their data and want to be able to block access for any reason. And issues such as rapidly changing and outdated data remain unaddressed.

This is where the social Web can help.

The Emerging Social Web

There was a time when we had to create a new digital identity for each social application we wanted to use. A social application provides features based on social attributes. Every application provider implemented its own proprietary ID management to authorize users to log on and implemented its own proprietary user profile system to manage information about its users. Application providers were judged by the size of their user and content base and so erected endless walled gardens to protect their properties.

The most significant issues people had were:

  1. Low conversion rate for user registration,
  2. Users had to register for many accounts,
  3. Users had to re-enter and synchronize profile data,
  4. Privacy, data ownership, and inability to export.

Not much has changed, unfortunately. Most remarkable, perhaps, is the growing number of single sign-on (SSO) solutions that address the first issue for application providers and the second issue for users. New application providers can now outsource this functionality to a third-party SSO provider. Some of the biggest application providers became ID providers themselves to allow their users to log on to third-party applications with the same ID, and this has gained traction beyond these few providers. This has led us to an era of identity wars between the big providers.

Many ID providers, such as Google, Yahoo!, MySpace, and Facebook, have added the OpenID SSO to their own proprietary mechanisms over time. Because of the open nature of OpenID, many third-party providers have found it easy to integrate with the bigger providers, giving them more traction because users are able to access their services so easily using their OpenID credentials. Now, these ID providers can offer read-only access to fragments of profile data that users can look up or copy to third-party applications. Like SSO and OpenID, this began with proprietary solutions, but now exchange formats and protocols are emerging whose open language allows applications to easily exchange and synchronize data. These include:

In the future, ID providers will loosen their connection to social applications and start taking over management of users' social attributes. Users will be able to log in to applications using credentials hosted by their ID providers of choice and grant permissions to these applications to read or even sync selected fragments of their profile data. The borders of these walled gardens will thus blur, and the social Web will become more of a weave than a patchwork quilt.

The Web of Identities

The Web of data is a distributed web of interconnected sets of semantically annotated data. A connection is achieved as a result of data pointing to data contained in another set through a URI, just as websites point to each other with URIs. This way, machines can crawl the sets to read the data. ID providers will most likely refer to their users via URIs in the future as well. A social connection will consist of one user's URI pointing to another user's URI or ID provider. If permitted by users, a machine may very well accomplish its tasks by jumping through the Web of identities from user to user, the way it does through the Web of data.
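To make the idea of a machine jumping from user to user more concrete, here is a minimal Python sketch of a breadth-first walk over user URIs. It is only an illustration under stated assumptions: the URIs, the two example hosts standing in for different ID providers, and the `CRAWLABLE` permission set are all hypothetical, and a real crawler would fetch and parse profile documents rather than read an in-memory dictionary.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical social graph: each user URI maps to the user URIs it points to.
# The example.org and example.net hosts stand in for two different ID providers.
GRAPH: Dict[str, List[str]] = {
    "https://id.example.org/alice": [
        "https://id.example.net/bob",
        "https://id.example.org/carol",
    ],
    "https://id.example.net/bob": ["https://id.example.org/dave"],
    "https://id.example.org/carol": ["https://id.example.org/dave"],
    "https://id.example.org/dave": [],
}

# Users who have permitted machines to follow their connections (an assumption).
CRAWLABLE: Set[str] = {
    "https://id.example.org/alice",
    "https://id.example.net/bob",
    "https://id.example.org/dave",
}


def crawl(start: str, max_hops: int) -> List[str]:
    """Breadth-first walk from user to user, honoring each user's permission."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        if hops == max_hops or uri not in CRAWLABLE:
            continue  # do not expand beyond the hop limit or without permission
        for neighbor in GRAPH.get(uri, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, hops + 1))
    return order


reached = crawl("https://id.example.org/alice", max_hops=2)
print(reached)
```

In this toy graph, carol is discovered through alice's connections but her own connections are never followed, because she has not granted permission. Whether a user who withholds permission should even be listed is a policy choice the sketch does not try to settle.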

Why is this needed? The Web of identities is actually a super-social graph that spans multiple ID providers. If we come across walled gardens, this infrastructure would be needed for all of the social-related search functions we perform. The following examples are thus far provided only (if at all) within individual applications:

  • "What is the best book read by friends in my circle?"
    This query might retrieve book purchases and book-related status updates that your friends have made accessible through their privacy settings and then rank the books in a set.
  • "Notify me if a close friend visits Berlin."
    This permanent task repeatedly looks up your friends' geo-locations. You may have granted your close friends access to this data, too. This task could even be combined with the Web of data to look up the meaning and location of Berlin.
  • "Sync my address book."
    This permanent task continually synchronizes my friends' addresses and numbers with my personal address book.
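The first query above, ranking books read by friends, could be sketched roughly as follows in Python. The friend names, the placeholder book titles, and the convention that `None` means "not shared" are all assumptions made for this example; no real provider's API or data format is implied.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical profile fragments; None means the friend has not shared reading data.
FRIEND_BOOKS: Dict[str, Optional[List[str]]] = {
    "bob": ["Book A", "Book B"],
    "carol": None,
    "dave": ["Book A"],
    "erin": ["Book B", "Book A", "Book C"],
}


def rank_books(friend_books: Dict[str, Optional[List[str]]]) -> List[str]:
    """Rank books by how many friends with visible reading data mention them."""
    counts: Counter = Counter()
    for books in friend_books.values():
        if books is None:
            continue  # respect the friend's privacy setting
        counts.update(set(books))  # count each friend once per book
    # Sort by descending count, then alphabetically for a stable result.
    return sorted(counts, key=lambda title: (-counts[title], title))


ranking = rank_books(FRIEND_BOOKS)
print(ranking)  # ['Book A', 'Book B', 'Book C']
```

Counting mentions is only one possible notion of "best"; ratings or recency could be substituted if friends chose to share them.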

Now it's your turn. In what ways do you think the social Web and Web of identities are evolving?

(Diagrams by alexkorth)

http://www.readwriteweb.com/archives/web_of_identities_making_machine-accessible_people_data.php http://www.readwriteweb.com/archives/web_of_identities_making_machine-accessible_people_data.php Web Future Sat, 11 Jul 2009 14:04:57 -0800 Alexander Korth
Is Twitter Really That Big? Web security SaaS company Purewire evaluated the profiles of millions of Twitter users to show the depth of a new tool it has created called Tweet Grade. While the tool itself is not unlike numerous other Twitter grading services, the company has uncovered some very interesting user statistics. It seems as though far fewer people are actually using and contributing to the site than Twitter's recent hype and massive growth would suggest. In fact, the data shows that a large percentage of Twitter users have not "tweeted" since the first day they joined the service and at least a quarter of its users don't have any followers at all.

Twitter won't give out its own numbers (and apparently won't follow or listen to you either), but Purewire was able to pull together profile data from 7 million user profiles and this is what it found:

First, many Twitter users "have abandoned their accounts shortly after creating them, and a significant percentage are not showing signs of account activity".

* 40 percent of Twitter users have not tweeted since their first day on Twitter (i.e., the account was most likely created and subsequently forgotten about).

* Approximately 25 percent of Twitter users are not following anyone, while two-thirds are following less than 10 people (i.e., the account was created but is not actually being used regularly).

Second, the data shows that "Twitter is used more as a mass medium for receiving information, rather than as a way to interact with others. Proof is shown by evaluating the followers and friends of Twitter users".

* More than 1/3 of Twitter users have not posted a single tweet, and almost 80 percent of the users have less than 10 tweets (i.e., while Twitter is billed as a great collaboration tool, a large number of users are there to consume content, not distribute it).

* Approximately 30 percent of Twitter users do not have any followers, and 80 percent of Twitter users have less than 10 followers (i.e., for many users, their posts are not being widely tracked or read).

* 50 percent of Twitter users are following more people than they have as followers, and another 30 percent of Twitter users are following the same number of people that are following them (i.e. users are aggressively trying to attract followers by hoping they will "follow back" but have been unsuccessful).
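
Percentages like the ones above come down to counting profiles that match a condition. Here is a minimal sketch of that arithmetic. Purewire has not published its method, so the field names, the `share` and `summarize` helpers, and the four sample profiles are all assumptions made up for illustration.

```python
def share(profiles, predicate):
    """Return the percentage of profiles for which predicate is true."""
    if not profiles:
        return 0.0
    return 100.0 * sum(1 for p in profiles if predicate(p)) / len(profiles)


# Hypothetical sample; a real analysis would load millions of profiles.
SAMPLE = [
    {"tweets": 0, "followers": 0, "following": 3},
    {"tweets": 2, "followers": 1, "following": 12},
    {"tweets": 150, "followers": 400, "following": 90},
    {"tweets": 0, "followers": 5, "following": 5},
]


def summarize(profiles):
    """Compute a few of the activity measures discussed in the article."""
    return {
        "never_tweeted": share(profiles, lambda p: p["tweets"] == 0),
        "no_followers": share(profiles, lambda p: p["followers"] == 0),
        "under_10_followers": share(profiles, lambda p: p["followers"] < 10),
        "following_more_than_followers": share(
            profiles, lambda p: p["following"] > p["followers"]
        ),
    }
```

Keeping each measure as a separate predicate makes it easy to add new cuts of the data without touching the counting logic.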

It's clear that celebrities like Oprah and Ashton Kutcher have made Twitter the "flavor of the month", but there are many people out there who will never quite get it. That's alright with us. There are bound to be folks who try out different Web apps and don't end up using them; we do it all the time. We still think Twitter has become a valuable mainstream communication platform and its usage will continue to evolve and grow.

Be sure to check out Purewire's new Twitter grading tool Tweet Grade to see if you and your followers pass the test.

You can find ReadWriteWeb on Twitter, as well as the entire RWW Team: Marshall Kirkpatrick, Bernard Lunn, Alex Iskold, Sarah Perez, Frederic Lardinois, Sean Ammirati, Doug Coleman, Dana Oshiro, Steven Walling and Lidija Davis.

http://www.readwriteweb.com/archives/is_twitter_really_that_big.php Products Sat, 06 Jun 2009 11:51:07 -0800 Doug Coleman
Study: 1 in 3 Smartphone Owners Use Location Based Services

According to a new report from web analytics firm Compete, 1 in 3 smartphone users use a location based service at least once a month. Weather and navigation apps are currently the most popular location based services, followed by apps that provide store locations, movie showtimes, and local news. Interestingly, there also seem to be a number of highly underserved markets. According to Compete's research, users also want to be able to receive local alerts about topics like traffic jams and gas sales.

According to Compete, smartphone owners who use location based services are also likely to have a higher monthly cell phone bill ($75-$125) than users who don't use these services. Chances are, though, that these users also tend to have data plans, so these numbers are not exactly surprising.

Currently, there are still a number of technical and privacy issues that are holding back some of the most interesting services. Due to the absence of background processing, the current generation iPhone, for example, can't regularly ping a server with a user's location and then send alerts to the phone based on this information. Alerts you have to actively pull up are, after all, not nearly as compelling as automated messages that tell you that you are heading right for a major traffic jam.

Underserved Markets: Local Alerts, Special Offers

Advertisers will also be happy to hear that a large number of consumers would like to receive special offers tailored to their current location, but only a very small number of current smartphone users are actually aware of or able to use these services.

According to Compete's Andy deGaravilla, this means that companies that manage to provide users with more compelling and relevant ads based on their location will "likely see higher clickthrough rates and subsequent engagement." At the same time, though, we can't help but wonder if at least some users would also like to simply receive a text message or another kind of alert on their phones if, for example, a nearby store has an offer for them.

User Initiated vs. Background Services

The current generation of location based apps mostly relies on users to initiate the process. It would be interesting to see how consumers would react to a background service that actively monitors a person's location and sends out alerts when a user enters a certain location, for example. Of course, this could get highly annoying quickly, but there is no reason to believe that it couldn't be done right.
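
To make the idea of a background service concrete, here is a rough sketch of the check such a service might run each time it receives a new position. Everything in it is an assumption for illustration: the zone format, the `entered_zones` function, and the choice of the haversine formula for distance. It alerts only when a user crosses into a zone, not on every update while inside it, which is one way to keep alerts from becoming annoying.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def entered_zones(prev, curr, zones):
    """Return names of zones that `curr` is inside and `prev` was not.

    `prev` and `curr` are (lat, lon) tuples; `prev` may be None on the
    first update. Each zone is a dict with name, lat, lon and radius_km.
    """
    def inside(point, zone):
        distance = haversine_km(point[0], point[1], zone["lat"], zone["lon"])
        return distance <= zone["radius_km"]

    return [
        zone["name"]
        for zone in zones
        if inside(curr, zone) and (prev is None or not inside(prev, zone))
    ]
```

A caller would keep the last known position per user and pass it as `prev`, so a user who lingers in a store's zone gets one alert on arrival and no more.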

http://www.readwriteweb.com/archives/study_1_in_3_smartphone_owners_use_location_based.php News Tue, 02 Jun 2009 10:40:29 -0800 Frederic Lardinois
StatPlot: Create Beautiful Sports Charts in Minutes

StatPlot is the newest project of sports statistic aggregator StatSheet and you're likely to enjoy it whether you're a sports fan or not. The site makes it easy to assemble attractive, dynamic charts for sports statistics in minutes. Navigate through the long list of options by point and click, autocomplete, cut and paste and you're done. Loads of data is already there and available for your use at no charge.

It's a fun site to use. Basketball, football and NASCAR are supported initially - hopefully baseball and hockey will be next. There's OpenID integration, the image selection is really nice and it's just great. It's still a little rough around the edges, but given that the service just launched today, we're impressed. This is the kind of democratized data visualization that any field could benefit from with enough open data and a good user interface.

Check out this chart of points made by LeBron James throughout the basketball season. I made it in 2 minutes and I hardly know who LeBron James is! (I saw him on TV at a bar during a playoff game and it was pretty clear he's incredible.)

The Adobe Flash charts that StatPlot produces are even nicer, but they aren't easy to scale down to the size I needed.

The whole StatSheet franchise is an interesting one to watch. See TechCrunch's recent coverage of the company's tussle with Twitter over innovation on that platform.

Bring on the huge data sets and easy charting interfaces! We'd love to see a little of that action over at Data.gov, for example. Heck, let's see these kinds of options put on top of the forthcoming ClearSpring API tracking hundreds of millions of peoples' sharing activities online.

As StatSheet said on its blog today: "There are other services (Swivel, iCharts.net, Many-Eyes) that allow you to upload data and create a variety of visualizations, but these all suffer from the same issue. The average sports fan does not have access to quality sports stats to upload. With StatPlot, you don't need to bring your own data because you can use the expansive StatSheet database!"

It's not just sports fans that could use a hand with data sets. We all could. Thanks for leading the way in truly democratizing data visualization, StatSheet.

http://www.readwriteweb.com/archives/statplot_create_beautiful_sports_charts_in_minutes.php Data Services Wed, 27 May 2009 12:49:58 -0800 Marshall Kirkpatrick
Data.gov Now Live; Looks Nice But Short on Data

The long-awaited catalog of public data from the US government launched this morning at Data.gov. Developers, watchdogs and data nerds around the world rejoiced - but the initial offering is a bit of a letdown.

New federal CIO Vivek Kundra is in charge of the site, which will act as a central repository for government data, including XML, CSV, KML files and more. At launch a mere 47 data sets are included and they appear to lean towards the least controversial matters. Nonetheless, it's exciting to see the effort happening. Hopefully some awesome mashups are on the way!

There are many, many sets of data available from the federal government but the Data.gov site says it was selective about quality and standards when choosing what to include. It's hard not to compare other sources of government data and feel disappointed, though. The privately built USGovXML.com contains far more data and was built by one independent developer over four months. That site lists ten Department of Interior XML feeds, for example, none of which appear on Data.gov. You can find a feed of food recalls there, but not on Data.gov.

Twenty-six government agencies are represented in the catalog, though not all are offering raw data. The FBI is listed as a source but only offers a widget that can be placed on websites, not access to raw data.

New York Times data wonk Derek Willis pointed out that the initial offerings are non-controversial. "Most are from USGS, EPA and National Weather Service," Willis observed this morning. "No [data from] Department of Homeland Security, State or DOJ."

Likewise, searches of the data sets for keywords like food, prisons and drug all bring up zero results. Those are examples of particularly important topics because they are matters of justice and injustice - shedding light into dark corners where injustices are being perpetrated is one of the most important things that government data and the subsequent computer assisted reporting can accomplish.

There are no RSS feeds available for the whole catalog or search queries, something that would be very useful for tracking additions of new data. We expect that will change soon.

People will no doubt argue that some data is much better than no data, and that's true. Still, for a new federal office to engage with such an important topic, with the weight of history and the whole administration behind it, and then come up with something this limited is disappointing.

API and mashup watcher John Musser of ProgrammableWeb was more generous than we are about the initial offerings:

"They're off to an excellent start. It's a big step in accessibility of government data. As we've been seeing with other v1 gov-data efforts, like the recently available data on senate votes: step one is give people structured data like xml, step two (or later) is to make it available via an API. They have a healthy amount of metadata. The number of data sets is not that large, but of course it's just the beginning."

It is just the beginning and we applaud the launch of this effort. We hope that the initial launch will pale in comparison to the long term value of this collection of data.

The folks at Sunlight Labs, Google, O'Reilly/TechWeb and Craig Newmark just launched a new part of their Apps for America contest to build the best mashups and data visualization tools for data in the new Data.gov site. Check it out!

See also the newly launched Whitehouse.gov/open - launches today just keep popping up.

http://www.readwriteweb.com/archives/datagov_finally_launches_looks_nice_but_short_on_d.php Mashups Thu, 21 May 2009 09:02:33 -0800 Marshall Kirkpatrick
Google Begins to Make Public Data Searchable

Google just announced its first foray into making public data searchable and viewable in graph form. The company is starting with population and unemployment data from around the US but promises to make far more data sets searchable in the future. The potential significance of making aggregate data about our world easy to visualize, cross reference and compare can't be overstated.

Most of us understand the world based on stories we've put together from our own lived experience. Another way to understand things is by finding patterns drawn from everyone's experience in aggregate. Journalists often find big patterns and then zoom in to particular life stories that exemplify those general trends but make them easier for us to relate to as individuals. Those stories then help move public opinion in favor of policies that aim to change the general trends. That's just one way that easily searchable public data can be very, very important.

These first data sets come from the U.S. Bureau of Labor Statistics and the U.S. Census Bureau's Population Division, but as Google explains in its announcement there are far more sources of information that could be included. Those two government agencies alone have a lot more to offer as well.

The visualization technology is called Trendalyzer, which Google acquired from a company called Gapminder two years ago.

We hope that Google will index as many public data sets as possible. We'd like to see demographic data like race and income made available for cross referencing; infant mortality, education levels, toxic waste reporting and crime statistics are other logical factors that would be great to see included.

It may not be a coincidence that the new Google Public Data search option was announced on the same day that the much-anticipated Wolfram|Alpha data-centric "expert knowledge" engine was first demonstrated to the public.

The coming era of the web is based on data, on drawing patterns and meaning out of a far larger body of data than the human mind alone could ever comprehend. The explosion of data (much of which is now created by the people formerly known as the audience), combined with commodity level storage and processing power, makes technology like what Google began to unveil today possible and important.

Google made its reputation by showing people the most important web pages on any topic. In the future, search engines will grow in importance as they become more capable of showing us what is most important across all web pages and all other available data, about any given topic. That's why we find the wide open conversations and social connections on Twitter so interesting, why we argue that the real motherlode of value in Facebook is not just individual streams of data but open access to all the data for analysis, and why we're so intrigued to see Google enter this space.

The availability of census and other public data has helped illuminate a wide variety of issues through "computer assisted reporting" - from the redlining of housing loans along racial lines to very current studies of ongoing urban segregation.

Just as blogging democratized publishing, we hope that Google and other services will make enough data sets available to cross reference and visualize that analysis of public data becomes something anyone can do. That means that a whole lot more of it will be done.

http://www.readwriteweb.com/archives/google_begins_to_make_public_data_searchable.php Analysis Tue, 28 Apr 2009 13:28:37 -0800 Marshall Kirkpatrick
Can Digg Keep Up With Facebook?

Looking at a regular graph of traffic data from Digg and Facebook, it would be easy to assume that Digg is lagging far behind Facebook's staggering growth. However, Compete just produced a very different graph that compares traffic at Digg and Facebook since their respective launches, and according to this data, Digg is actually doing better than Facebook. Facebook is obviously older than Digg, so while it has more traffic now, Digg's growth since its inception has actually been faster than Facebook's.

As you can see from the graphs, Digg and Facebook had very similar growth curves for the first four years of their existence, and according to Compete's historical data, Digg's traffic was actually greater than Facebook's for 33 out of 51 months.

It needs to be said, though, that Facebook's user base has exploded over the last year, while Digg's traffic 'only' grew by about 50% according to Compete. During its fifth year, Facebook's traffic more than doubled from about 28 million visitors to over 73 million.
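
The comparison rests on two simple calculations: percentage growth over a period, and a month-by-month comparison of two series lined up by time since launch rather than by calendar date. The sketch below shows both. The function names are hypothetical, and Compete's actual method is not described in its post, so treat this as an illustration only.

```python
def growth_pct(start, end):
    """Percentage growth from start to end (100.0 means traffic doubled)."""
    return 100.0 * (end - start) / start


def months_ahead(series_a, series_b):
    """Count the months in which series_a exceeds series_b.

    Both series hold monthly visitor counts indexed by months since each
    site's own launch, so index 0 is the first month of either site. The
    comparison stops at the end of the shorter series.
    """
    return sum(1 for a, b in zip(series_a, series_b) if a > b)
```

With the figures quoted above, `growth_pct(28, 73)` comes to roughly 161 percent, which is what "more than doubled" means here.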

As Jay Meattle points out in his guest post for Compete, Digg will have to come up with something very special if it wants to continue to match Facebook's growth.

Can Digg Become Mainstream?

In a way, though, comparing Digg to Facebook isn't even necessarily fair, as they provide two completely different services. In terms of the users they want to reach, however, both have very similar aspirations. So far, Digg hasn't been able to break into the mainstream (even though Kevin Rose and Alex Albrecht made an appearance on Jimmy Fallon last week), while there is a good chance that even your mother is now joining Facebook. If Digg wants to continue its growth, it will have to find a way to attract more mainstream users without alienating its base.

http://www.readwriteweb.com/archives/digg_facebook_traffic_comparison.php News Fri, 20 Mar 2009 09:32:51 -0800 Frederic Lardinois
Finally, A Practical Use for Second Life

When you think of virtual worlds, the first one that probably pops into your head is Second Life, but in reality, there are a number of different virtual worlds out there. There are worlds for socializing, worlds for gaming, even worlds for e-learning. But one thing that most virtual worlds have in common is that they are places for play, not practicality. (Yes, even the e-learning worlds are designed with elements of "fun" in mind). Outside of some reports that virtual worlds will replace web conferencing in the enterprise, we haven't seen a lot of innovation in this space which would make businesses sit up and take notice. However, that may be about to change thanks to new software that lets you perform data visualization and manipulation techniques within the virtual world environment.

About Glasshouse

The software, Glasshouse by Green Phosphor, lets you take data from either a spreadsheet or database query and place a 3D representation of it into a virtual world environment where it can then be explored interactively. Users are inserted into the virtual world as an avatar which can then manipulate the visualization of the data by drilling down into it, re-sorting it, or even just spinning it around to see it from all angles.

The benefits to working with data in this way don't really need to be touted too much - many businesses already perform data visualization, often using expensive software and powerful computers to do so. What makes what Green Phosphor does so interesting is not that they've come up with a way to visualize data - it's that they've come up with a way to leverage the platforms of virtual worlds to do so.

How it Works: CICP (Think HTTP for Virtual Worlds)

Some of the company's solutions involve using a proprietary virtual world, "Glasshouse," for data visualization, but for Second Life, Sun's Wonderland, and other virtual world users, they've developed adapters that project graphs from Glasshouse into whichever virtual world you're using. The only requirement is that the virtual world be CICP-enabled.

CICP, or Content Injection and Control Protocol, was developed in-house by Green Phosphor CEO Ben Linquist and released to the public domain. The standard, cross-platform protocol essentially serves as HTTP for virtual worlds where it works as a communication mechanism that the Glasshouse gateway can use to generate temporary artifacts in the worlds. Already it has been added to Sun Wonderland and released under the GPL license there. It has also been implemented in Second Life with the help of a Java servlet and released under a BSD license. The company is currently working to add it to other virtual worlds, too.

Data Viz for Anyone: From Spreadsheets to Biotech

Depending on company size, there are three different levels of service available. First, a spreadsheet world lets you upload Excel spreadsheets that can then be visualized in a web interface. Next, there's a workgroup appliance that meets the data visualization and virtual conferencing needs of small or medium-sized businesses. And finally, more customized enterprise solutions have been developed for vertical markets like biotechnology.

As Linquist explains in this YouTube video, the technology is even advanced enough to produce a virtual laboratory where researchers can perform model-based drug development.

If you have Java installed, you can test their web-based virtual world demo by clicking here (launches Java window). For more information about their solutions, visit GreenPhosphor.com.

http://www.readwriteweb.com/archives/finally_a_practical_use_for_second_life.php Products Thu, 19 Mar 2009 10:42:33 -0800 Sarah Perez
Hitwise: Twitter Drives Traffic to Blogs and Social Networks, But Not to Retail Sites

According to the latest data from Hitwise, Twitter sends most of its traffic to Google, Facebook, TwitPic, and MySpace. Overall, Twitter sends about 1 in 5 users to social networks and another 1 in 5 to entertainment sites like TwitPic, YouTube, or Flickr. Even though some people think that Twitter is just a 'poor man's email system,' Twitter's clickstream profile is very different from that of most email services.

There are a number of interesting results in Hitwise's study. Among others, Hitwise notes that a higher share of downstream clicks from Twitter.com go to blogs and personal websites than from search sites, social networks, or email services. A larger number of Twitter users are also being sent to news and media sites, which points towards Twitter's growing role as a medium for sharing and breaking news stories.

Another interesting fact about the downstream clicks from Twitter is that very few users go from Twitter.com to retail, business, or finance sites.

Here are a few other interesting findings:

  • after visiting Twitter.com, more users visit Etsy.com, the marketplace for buying all things handmade, than Amazon
  • in terms of downstream clicks, CNN.com is the most popular news service on Twitter
  • Yahoo Mail gets more downstream clicks than Gmail or Windows Live Mail

One caveat about this data that Hitwise does not mention, however, is that a large number of Twitter users never even visit Twitter.com because they use more fully-featured desktop or mobile clients like Twhirl, TweetDeck, or Tweetie. Hitwise obviously doesn't have access to this data, but it would be interesting to see if those Twitter users who use a Twitter client exhibit a different behavior compared to those who use the web site.

http://www.readwriteweb.com/archives/hitwise_twitter_downstream_traffic.php News Thu, 12 Mar 2009 10:30:14 -0800 Frederic Lardinois