MySpace has taken a bold step and allowed a large quantity of bulk user data to be put up for sale on startup data marketplace InfoChimps. Data offered includes user playlists, mood updates, mobile updates, photos, events, reviews, blog posts, names and zipcodes. Friend lists are not included. Remember, Facebook and Twitter may be the name of the game these days in tech circles, but MySpace still sees 1 billion user status updates posted every month. Those updates will now be available for bulk analysis.
This user data is intended for crunching by everyone from academic researchers to music industry information scientists. Will people buy the data and make interesting use of it? Will MySpace users be ok with that? Is this something Facebook and Twitter ought to do? The MySpace announcement raises a number of interesting questions.

The 22 sets of data being made available are cheap. Prices range from $10 for raw dumps from the MySpace API to $300 for everything broken out by latitude and longitude. Subsequently derived data sets can be put on sale by InfoChimps users as well, with a revenue split.
Analysis coming from the data could include things like music trends per zipcode, popular URLs being shared, etc.
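As a rough illustration of the "music trends per zipcode" idea, here is a minimal Python sketch. The record layout (zipcode and artist pairs) and the sample values are assumptions made up for this example; the actual InfoChimps data sets may be structured quite differently.

```python
from collections import Counter, defaultdict

# Hypothetical records: (zipcode, artist) pairs, as might be pulled out of
# playlist data. These values are invented for illustration only.
plays = [
    ("97201", "Artist A"),
    ("97201", "Artist B"),
    ("97201", "Artist A"),
    ("78701", "Artist C"),
    ("78701", "Artist C"),
    ("78701", "Artist A"),
]


def top_artists_by_zip(records, n=1):
    """Return the n most frequently listed artists for each zipcode."""
    counts = defaultdict(Counter)
    for zipcode, artist in records:
        counts[zipcode][artist] += 1
    return {zipcode: c.most_common(n) for zipcode, c in counts.items()}


trends = top_artists_by_zip(plays)
print(trends)
```

The same grouping pattern would apply to shared URLs: swap the artist field for a URL and count per region or per day.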
MySpace is generally thought of as a social network on the decline, but if it can still position itself as the place for music, its hundreds of millions of users could remain engaged. Will data scientists want this data, though? Time will tell, but MySpace has long done cooler things with data than competitors Facebook and Twitter, and people haven't gotten terribly excited about it yet.
Related: See today's coverage of the cancelation of the Netflix Challenge due to privacy concerns.
Bulk user data has tremendous analytical potential, and both Facebook and Twitter have hit the brakes on third parties offering up their user data more than once. We covered InfoChimps' offering of bulk Twitter data in depth this Fall, but the marketplace quietly removed that data after Twitter asked them to "wait" for a second time.
In February we profiled Pete Warden (The Man Who Looked Into Facebook's Soul), a developer who planned on putting a huge pile of Facebook user data online for academic analysis. As we wrote in that article:
If what people call Web 2.0 was all about creating new technologies that made it easy for everyday people to publish their thoughts, social connections and activities, then the next stage of innovation online may be services like recommendations, self and group awareness, and other features made possible by software developers building on top of the huge mass of data that Web 2.0 made public.
Days later Facebook contacted Warden and asked him to hold off on release of that data as well. Last week Warden posted open source code for harvesting the same type of bulk user data from Google Profiles, so the game's not up yet, not by a long shot.
Why is this kind of big data interesting? This rationale may be less applicable in the case of MySpace given its focus on music, or it may be more applicable given the allegedly poorer user demographics on the site compared to Facebook, but here's how I explained my interest in big social network data analysis in general, as part of a discussion about an excellent special report on big data in the Economist this month.
I think in big data there lie a lot of hidden patterns that represent opportunities both for action and for reflection. At RWW we're working on trying to find ways to mine data to find news first (we've got some interesting methods employed already) and personally, I think the world is an awfully unfair mess and I'm hoping that data analysis will help illuminate some of the hows and the whys. Like the way that real-estate redlining was exposed back in the day by cross-referencing census data around racial demographics and housing loan data. That illuminated systematic discrimination against black families in applying for home loans in certain parts of town. So too I think we'll find a lot of undeniable proof of injustices and clues for how we might deal with them in big data today.
What will we see come out of MySpace's bulk data? What could we see come from Facebook and Twitter data if only they would let people get their hands on it? Time will tell.
Comments
As far as "Will MySpace users be ok with that?" I would suggest that an opt in/out in the user's profiles stating "I wish to make my data public" OR "I wish to donate my data for scientific research" will be fair (pretty much like the organ donation on our insurances).
Also what is sold and what remains private should be extremely well explained to the MySpace users if they do not want to scare them and make them flee.
MySpace have been having a tough time, but they deserve credit for taking a lead in moving the industry forward in relation to opening up some of this social network data, in the same way that a lot of scientific, government, financial and sensor data is now being made public, and starting to unleash innovation and creativity around analytics on big data.
Next week we'll be launching Cloudcel (www.cloudcel.com) at DEMO (www.demo.com) - the first realtime, massively parallel cloud computing platform for big data. The world's 100 million Excel power users will then overnight have access to an analytics tool that can be used seamlessly from inside Excel, but is as powerful and scalable as anything used by the top programmers at Google or on Wall Street, as powerful as anything on the planet.
Just keep participating world. So we can all make money.
Well, not all of us I guess.
....Were the myspace members informed that their data was going to be sold ?
...Have the members given myspace permission to sell their data ?
....Who at myspace is responsible for this initiative and is anyone going to ask them if they have the members permission to sell their data...
....Does infochimps check to see if company/organization uploading data into their system has permission to sell the data ?
Is myspace going to share any of the revenue that they receive from the sale of the data with the myspace members that own the data.....
How come the members of myspace that generate millions of dollars for them do not have free access to the data ?
http://www.factoetum.com/factoetum/Juanita_Goggins_Civil_Rights_Activist
Bruce, well put. I should do a write-up just about bulk user data and privacy. What I'd start it off with would be a conversation I had with the EFF when Infochimps went public with Twitter data. A rep there told me that since this data was published publicly, they believe it is within these firms' rights to make use of it. There never was an expectation of privacy around this stuff and these various service providers have added value in aggregating, hosting and serving up the data. Thoughts?
.....Can we say there never was an expectation of privacy without asking the members their opinion of having their data sold ?.....
....For me it is not about what the EFF thinks is public data or not...It's about respect for the members that generate millions and receive very little in return....If what myspace is doing by selling members' content is above reproach, why not let members know that their data will be sold ? Why not give members the option to opt out...?
Marshall, I think you should definitely write about that. You understand the issues as well as anyone in the industry. Your comment on EFF/Infochimps seems right, but Bruce's comments are also well put, and reflect the standard view of most social network users who care about such things, i.e. if the social network is making money on "my data" how do I get some benefit.
At Cloudcel, we're in the big-data-analytics-platform business, not in the selling-data business, so we have a different perspective. We want as much data as possible to be opened up, and opened up in realtime wherever possible, because that way we can move towards smarter businesses, a smarter web, smarter government, better science, less risky finance, smarter grids etc.
Given that social networks, search engine companies and other large data gatherers will almost certainly need to provide users with at least "their own data" free of charge or risk losing those users to a competitor, we're pretty confident that you will soon see new ways in which innovative business models can be built that are more of a win for the ODPs ("original data providers"). A side effect of this will be that the walls of the walled gardens will be more porous over time.
I could also go on about the related issue of how the PubSubHubbub web will also radically shift the relationships between web site owner ODPs, old-school search engine crawlers, and new data platform companies that have a quite different goal from search engines. Again I think we'll see business innovation that can offer more to an ODP than just inclusion in the Google index.
There are a couple issues at play here.
One is the question of whether MySpace users own their data - they don't. By using MySpace's service, they are granting MySpace a royalty-free license to do whatever MySpace wants with that data.
Second is the issue of profiting on the sale of this data. Look at Radian6, Rapleaf, and the whole ecosystem of listening platforms and real-time search. All of these business models are built on top of giving paying customers information out of social network data. Infochimps is just a marketplace for data and we happen to be in the lucky position that one of the social networks agreed to take this step with us. The idea of a move like this is that these are products which are going to serve this ecosystem, and many others.
Third is the privacy issue. Many of the datasets that went up today are derived datasets that lack any user information whatsoever. Datasets like number of application adds for a 3 month period or total word count are only aggregates, not containing usernames or IDs. The only user information is in the daily raw dumps, and every piece of data in those dumps on Infochimps is public and also available on http://myspace.collecta.com/ through MySpace's real-time feed.
Wow! I need to think about this. For that matter, after something I learned today about publishing Twitter rate statistics, I need to think about a *lot* of things.
I'm glad I'm not at SXSW - between "big data" and "location-based services", there are plenty of opportunities to fall into some pretty scary traps and fall prey to wishful thinking. Keep plugging away at this stuff, Marshall.
Meanwhile, I think I'll spend the next week playing with something safe, like algorithmic composition and synthesis of music. ;-)
I completely concur with Bruce Wayne and I feel that my privacy and information should be available to be viewed by me and me alone. If they were to use my information, I would like them to make sure that the public and I are aware of this and that we have given our wholehearted consent. This form of monetization and violation of our rights will continue to grow unless we stop them.
The Netflix Prize II cancellation is another example of why we need a lot more discussion around these issues. Here we have a great example (Netflix Prize I) of how the simple availability of data had a huge impact on the science and the business of computational/algorithmic recommendation and machine learning. It seems that for a tiny sum, $50K, Netflix and all the others who want to help create a world in which advertising and recommendation are helpful rather than an annoyance, could have continued this outstanding work with a bit of standard automatic data masking. Crazy!
I am absolutely thrilled.
In an interesting article about the sale of user data, you added a bit of a personal (and even humanizing) touch by talking about data and injustice.
Thumbs UP, Mr. Kirkpatrick!
@MELTHEL
Also - this data was not manually put on Infochimps by MySpace, but retrieved and packaged from their API by Infochimps and placed on our site. The data is always free from MySpace's API, though we are able to redistribute.
The trouble with large data sets is that you may see patterns that aren't actually there. The idea of research is to form a hypothesis and then do research to confirm or deny that hypothesis. When you have this mass amount of data and no particular ideas about it, then you can go about proving whatever you like because you can chop up the data in whatever way you want.
bulk user data sales is a good move for them
This information is going to be available to not only the US government, but all governments willing to pay for it. Makes watching 1 billion people even easier.
This is so very interesting at so many levels.
First of all, it is, I guess, inevitable that this data is now becoming available and that social networks are keen to cash in on it. Most social networks trade at a value (most for the purposes of private funding) way ahead of what an accountant would call logical based on a reading of their balance sheets precisely because they provide access to enormously rich data on the nature of relationships, trends, likes, dislikes... just the sort of stuff that market researchers and aggressive marketers would love to get their hands on. The assumption has always been that this data would be available to marketers who would pay handsomely for it one way or the other. I guess the surprise is that the raw data is available for download, rather than MySpace putting together clever marketing packages to enable advertisers to exploit this stuff on-site. But then as analyst Allen Bonde says, "Mass marketing is dead" -- perhaps the very conversational elements of social media mean the data HAS to be used in ways other than to exploit for advertising, because mass promotion, no matter how well qualified with target data, is a thing of the past and has no place in the future Social Web.
The second hugely interesting angle on this is whether this will prompt other social networks to do the same to realise some income from the wealth of information they also hold. Facebook has many times the number of active users (probably 5 times what MySpace has in user numbers, many times that again in volume of user data); Twitter already sells tweets to Yahoo, Google, Microsoft and others. What if LinkedIn started selling details of the career history of its members? Or who they know; what Groups members follow, detailing their professional interests and affiliations?
Social networks have been around for some years now and most make a pittance from their huge user bases. Stats are hard to get as most are private companies surrounded by hype, but Facebook is rumoured to be heading for $1B of revenue from its 400 million users. This sounds a lot, but it's (easy maths) just $2.50 earned each year from each user. LinkedIn has 60 million members but it's believed only 2% pay for premium membership from $25 a month -- in reality the bulk of subscription revenue comes from a small number of recruiters paying many times that for access to 60M resumes (so in reality they are already selling bulk data but on-site). Is the temptation to earn money selling the data of the 98% who don't pay too much to resist? Do they have a choice anyway when investors have put so much capital into these companies and need to see a return at some point?
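The back-of-envelope figures above can be checked in a few lines of Python. This is only a sketch of the commenter's arithmetic: the revenue and user counts are the rumoured numbers quoted in the comment, LinkedIn's membership is assumed to be 60 million (matching the "60M resumes" mentioned), and the annual premium figure is a purely illustrative number that assumes every premium member pays exactly the $25 entry price.

```python
# Rumoured figures quoted in the comment above, not verified numbers.
facebook_revenue = 1_000_000_000   # dollars per year
facebook_users = 400_000_000

revenue_per_user = facebook_revenue / facebook_users
print(f"Facebook revenue per user per year: ${revenue_per_user:.2f}")

# Assumption: LinkedIn membership is 60 million, per the "60M resumes" remark.
linkedin_members = 60_000_000
premium_percent = 2                # share believed to pay for premium
monthly_fee = 25                   # entry-level price in dollars

# Integer arithmetic avoids floating-point rounding in the head count.
paying_members = linkedin_members * premium_percent // 100
annual_premium_at_entry_price = paying_members * monthly_fee * 12

print(f"Paying LinkedIn members: {paying_members:,}")
print(f"Illustrative annual premium revenue: ${annual_premium_at_entry_price:,}")
```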
All these considerations on top of whether users of such networks are actually happy with their data being sold in this way and how they might likely react when they discover...
Ian Hendry
CEO, WeCanDo.BIZ
http://www.wecando.biz
Marshall - for some interesting stats on the growth of "big data," see "How much data is generated on internet every year/month/day?" at http://bit.ly/bLKzPm. The emergence and rapid growth of these data suggest that needs, opportunities and issues addressed in your article will continue to grow.
Phil Hendrix, Ph.D., immr
Another good example of how the DaaS (data-as-a-service) world is being driven by API adoption. It's exciting to see the business models surrounding DaaS emerge and we'll be highlighting additional examples @ http://bit.ly/9fucSR
I find a couple of things about this idea disturbing:
For one, users' non-ownership of their own personal data isn't necessarily made clear by Myspace -- especially since you need a password to login and have access to your own profile, anyone else's photos, and any "private" profiles. The assumption most people make is "I'm using a protected service, so my pictures, name and personal info must be private." For Myspace to start selling this information without the users' consent, though it is probably legal, is still a pretty brazen betrayal of their trust. If I were a Myspace user (I got annoyed with the service and quit years ago), I would be up in arms.
Second, though I understand the usefulness of mass data (I'm in marketing), all the excitement over something like this is kinda giving me the chills. Even if the notion of personal data selling continues to grow as a research tool, there always needs to be an acknowledgment of privacy involved -- even if it's just to keep the slope from getting too slippery. No one needs social networks to become a watch-what-you-post-because-you-might-see-it-on-a-billboard sort of environment...
Hi folks.
This is not actually that new. Contrary to what the article mentions, both Facebook and Twitter datasets have already been made publicly available. I'm a professor at UC Santa Barbara, and my group did a study in 2008-2009 of 10.8 million Facebook user profiles and activity patterns based on full profile crawls. The resulting paper was published at Eurosys 2009: User Interactions in Social Networks and their Implications
Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao
Proceedings of ACM EuroSys 2009.
We've made subsets of anonymized data available online (http://current.cs.ucsb.edu/facebook). Another group studying Twitter (Sue Moon@KAIST) has made their crawled data of Twitter publicly available. Just google her name.
The difference here is that a) MySpace is making $$ off of its user information, and b) the information is incredibly detailed and more fine grained than what people can get from parallel crawlers...
@Joseph Kelly #8:
I'm sure you're aware that ownership of data is a contentious issue, one which will continue to spur court battles and political fights. Just because the feds made the wrong decision a few years back doesn't mean that it won't be overturned.
At some point, companies will have to recognize the individual's right to access and control of his or her personal data. We're just not there yet.
However, I think we need to be aware that "anonymized" aggregate user data is easily de-anonymized. Think back to AOL's data release a few years ago and you'll recall that it was possible to take the bulk data and match it with available personally-identifying data to gain frighteningly pervasive insight into a person's preferences and activities online.
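To make the matching idea concrete, here is a toy Python sketch of a linkage attack: joining a release with names stripped against a separate public list on shared attributes. Every record, field name and the `link` helper are hypothetical, and this illustrates the general technique only, not how the AOL data was actually analyzed.

```python
# Hypothetical "anonymized" release: names removed, but zipcode and
# birth year (so-called quasi-identifiers) left in place.
anonymized = [
    {"user": "u1", "zipcode": "97201", "birth_year": 1980, "favorite": "Artist A"},
    {"user": "u2", "zipcode": "97201", "birth_year": 1975, "favorite": "Artist B"},
    {"user": "u3", "zipcode": "78701", "birth_year": 1980, "favorite": "Artist C"},
]

# Hypothetical public directory that carries names plus the same attributes.
public = [
    {"name": "Pat Example", "zipcode": "97201", "birth_year": 1980},
    {"name": "Sam Sample", "zipcode": "78701", "birth_year": 1980},
]


def link(anon_rows, public_rows, keys=("zipcode", "birth_year")):
    """Map anonymized user ids to names when exactly one public row matches."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = {}
    for row in anon_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # only a single candidate counts as a match
            matches[row["user"]] = names[0]
    return matches


reidentified = link(anonymized, public)
print(reidentified)
```

The fewer people who share a combination of attributes, the more likely a match like this points at one real person, which is why fine-grained fields such as zipcode deserve care in any bulk release.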
I received an advert in the mail from a company I haven't done business with for a smart phone that was a near-clone of one I owned / used a year ago. They may have thought they were being cute, but I'll never do business with them now.
This will be a major problem for both of the people who still use MySpace.
Seriously, though, is this company actively trying to commit suicide? Or are there really actual people who are hired to be real executives of real companies who are this stupid?
Posted by: plankhead.com | March 16, 2010 6:45 PM
In today's world of identity theft, privacy of data is extremely important. The idea that my data is being sold by My Space to further bring me useless and annoying ads that I block, skip or ignore anyway is appalling. If they must sell this data for whatever reason then there absolutely needs to be an opt in question. If My Space does not need to sell the data then it needs to protect its users. It is bad enough that prospective employers and bosses use the social network to essentially spy on people but now big business will be using it to make even more money off of consumers. My first gut reaction to this article was to pull my My Space and delete my account but I have held off to see if My Space will react as Twitter and Facebook did or give the opt in clause. In the mean time, I plan to e-mail My Space to let them know my feelings.
Haha, here we go.
Is selling user data right or wrong? And I can't figure out why anyone would buy that kind of user data.