aggregate data - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/aggregate data
Copyright 2012 Richard MacManus (readwriteweb@gmail.com), Tue, 14 Feb 2012 18:04:00 -0800

Website Lends an Ear to Student Woes - Then Reports Trends to Schools

Heidi Allstop was a junior psychology student when she launched her online business Student Spill, a website where students can anonymously submit descriptions of their personal problems and receive responses within 24 hours from trained student supporters.

Now available on 10 campuses around the United States, Student Spill offers a simple way both to support students and to gather information about what kinds of support a school's students really need. "Usually universities are wrong in their assumptions," Allstop says. "They have no way to get insight into what is bothering students, to know what students are crying on their pillows about." Schools that use Spill can apply the data the service provides to student retention, risk mitigation and suicide prevention. They can also use it to develop recommendations for services they should consider. It's an excellent example of the value that can be created by analyzing aggregate user data from a social app.

[Screenshot: one school's Student Spill report for one semester.]

Allstop says that sales to schools are driven by students who discover the service and ask their schools to budget for it. Users are required to have an .edu email address, and their "spills" are answered by student volunteers. Those volunteers are trained in effective listening and in writing empathetic responses. Four to six volunteers send a response to each Spill. "This gets the student supporters more engaged," Allstop says, "and offers multiple perspectives for the Spiller to deal with their problem."

Allstop's alma mater sees the heaviest use: generally between 6 and 12 spills submitted per day. That seems very small by the standards of general-interest consumer websites, but Allstop says that volume makes schools happy. "If you're changing 6 to 12 people's lives a day, that's all it takes," she says.

Spill tells schools how their students' lives compare to the lives of students at other schools and Allstop says she thinks those analytics will become all the more valuable as her service scales. She hopes to move into corporate and other markets as well.

"Fifteen to twenty percent of students feel comfortable enough with a counselor to seek out that kind of help," Allstop says. "Schools are getting all their data from that small group. 82% of Student Spill's users indicated embarrassment or fear as the number one factor that had prevented them from utilizing campus counseling facilities in the past." Allstop argues that her service is a non-threatening way to gather more representative data about actual student concerns, while also providing direct support to students in distress.

That model, in which analytics gathered through a service power recommendations to institutions, is likely to become far more common in the future.

http://www.readwriteweb.com/archives/website_lends_an_ear_to_student_woes_-_then_report.php
Data Services, Wed, 17 Nov 2010 14:18:05 -0800, Marshall Kirkpatrick
The Man Who Looked Into Facebook's Soul

Youth social networking researcher danah boyd has observed that many people presume the way they use social networks is the way everyone uses them. "I interviewed gay men who thought Friendster was a gay dating site because all they saw were other gay men," she says. "I interviewed teens who believed that everyone on MySpace was Christian because all of the profiles they saw contained biblical quotes. We all live in our own worlds with people who share our values and, with networked media, it's often hard to see beyond that."

Now picture our perspective leaving our own experiences and zooming out and up until we can see how all the different groups interact on a worldwide social network. That bird's-eye view could be both beautiful and horrible if the resolution were clear enough. It is the kind of view that a ramen-eating ex-Apple engineer named Pete Warden is about to open up to the public this week.

This Wednesday, Warden will make Friend, Fan page and name data from hundreds of millions of Facebook users available to the academic research community. Facebook has to have seen this move coming. Many in the data-centric community have been calling on the company itself to do exactly this for years. The event has also been complicated by Facebook's recent privacy policy changes, which have muddied the waters of right and wrong but rendered even more data available for outside analysis.

If what people call Web 2.0 was all about creating new technologies that made it easy for everyday people to publish their thoughts, social connections and activities, then the next stage of innovation online may be services like recommendations, self and group awareness, and other features made possible by software developers building on top of the huge mass of data that Web 2.0 made public. It's a very exciting future, and Warden is about to fire one of the earliest big shots in that direction.

Nerds in Space: Social Graph Analysis For Solving Large-Group Problems

Warden studied computer vision in college in the U.K., then got into game development. After moving to L.A., he spent six years building graphics drivers for the original PlayStation and the Xbox. Then he started his own independent business, where, thankfully, he open-sourced much of his work (something he still does today).

When he found out that starting his own business wasn't going to work with his immigration status, he was fortunate to have already caught Apple's eye with the software he had been releasing to the public. Apple bought his company in order to bring him on board. Now that he is independent again, the proceeds of that small sale are sustaining his next project.

Warden spent five years at Apple struggling to navigate the maze of people, connections and types of expertise in order to get the information he needed. "I can't think of a better big company to work for, but it was still a big company," he says. "It was hard to find the right people to talk to, whether for particular expertise or for contacts at external companies." So Warden left Apple to build a company that would use social graph analysis to solve exactly that kind of problem. He called the company Mailana, a play on "mail analysis," since he was initially focused on email social graph analysis.

We've written here a number of times about Mailana's tool that analyzes the social graph of any Twitter user. Enter the username of someone on Twitter and Mailana will show you the 20 people with whom that user has exchanged the most reciprocal public @ replies. Find someone interesting or important? Mailana's Twitter analyzer will tell you who they interact with most regularly. See, for example, "The Inner Circles of 10 Geek Rockstars on Twitter."
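The post doesn't describe how Mailana scores reciprocity, so what follows is only a rough sketch of the idea. It assumes public @ replies are already available as hypothetical (sender, recipient) pairs. It counts replies in each direction and scores each contact by the smaller of the two counts, so one-sided chatter doesn't rank. The function name and the min-of-both-directions score are illustrative assumptions, not Mailana's method.

```python
from collections import Counter


def inner_circle(replies, user, top_n=20):
    """Rank the accounts `user` has exchanged the most reciprocal @ replies with.

    `replies` is an iterable of (sender, recipient) pairs, one per public
    @ reply (a hypothetical input format). Scoring each contact by the
    smaller of the two directional counts is an illustrative assumption.
    """
    sent = Counter()      # replies from `user` to each other account
    received = Counter()  # replies from each other account to `user`
    for sender, recipient in replies:
        if sender == user and recipient != user:
            sent[recipient] += 1
        elif recipient == user and sender != user:
            received[sender] += 1
    # Only accounts that replied in both directions count as reciprocal.
    scores = {
        other: min(sent[other], received[other])
        for other in sent.keys() & received.keys()
    }
    # Highest score first; ties broken by name so the output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

Take `replies = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]`. Here `inner_circle(replies, "alice")` returns `[("bob", 1)]`. Carol never replied, so she is left out.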

Pulling Down the Facebook Social Graph

Now Warden is about to unveil a much larger project in the same vein. For the past six months he's been crawling public profile pages on Facebook. He now has more than 215 million of them indexed and updated about once a month. When he began, he used the Web crawling service 80legs, but over time he had to build his own crawling infrastructure.

When I talked to him this afternoon, he had already begun uploading 100 GB of user data onto his server to make it available for academic research starting on Wednesday. Warden says he's removed identifying profile URLs but kept names, locations, Fan page lists and partial Friends lists. All those fields of data are just waiting to be analyzed and cross-referenced. That's one very rich resource.

Yesterday Warden posted some of his own initial observations from the data on his personal blog. Those included:

  • In almost every state in the Southern U.S., God is the most popular Fan page among Facebook users. Among people in the L.A., San Francisco and Nevada regions? "God hardly makes an appearance on the fan pages, but sports aren't that popular either," Warden writes. "Michael Jackson is a particular favorite, and San Francisco puts Barack Obama in the top spot." In the Oregon and Idaho region? Starbucks is number one.
  • In the Mormon-influenced areas of Utah and Eastern Idaho, the most popular Fan pages are The Book of Mormon, Glenn Beck and the vampire book Twilight, which was authored by a Mormon.
  • The bulk of Warden's posted analysis yesterday was about location networks. People in the western U.S. tend to have Facebook friends all over the country; people in the southern U.S. tend to mostly be friends with people who have remained in the same area.
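Both kinds of observation in the list above reduce to simple aggregations once the crawl is parsed. The first is a per-region ranking of Fan pages; the second is a measure of how local people's Friends networks are. Here is a minimal sketch of each. The record layouts (`profiles`, `friend_pairs`, `region_of`) are hypothetical and not the format of Warden's data. Warden's own pipeline may well work differently.

```python
from collections import Counter, defaultdict


def top_fan_page_by_region(profiles):
    """Return the most-fanned page in each region.

    `profiles` is an iterable of (region, fan_pages) tuples, a hypothetical
    record layout. Each page counts at most once per profile; ties are broken
    alphabetically so the result is deterministic.
    """
    counts = defaultdict(Counter)
    for region, fan_pages in profiles:
        counts[region].update(set(fan_pages))
    return {
        region: min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for region, c in counts.items()
        if c  # skip regions whose profiles listed no pages
    }


def local_friend_share(friend_pairs, region_of):
    """Per region, the share of friendships whose other end is in the same region.

    `friend_pairs` is an iterable of (user_a, user_b) friendships and
    `region_of` maps each user to a region; both are hypothetical inputs.
    Each friendship is counted once from each end.
    """
    local = Counter()
    total = Counter()
    for a, b in friend_pairs:
        for me, friend in ((a, b), (b, a)):
            region = region_of[me]
            total[region] += 1
            if region_of[friend] == region:
                local[region] += 1
    return {region: local[region] / total[region] for region in total}
```

A region where `local_friend_share` is close to 1 matches the "friends with people who stayed in the same area" pattern the post describes for the South. A lower value would match the more spread-out Western networks.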

Taking a Deeper Look

These observations are interesting, but they are only the beginning of what's possible. Name, location, friends and interests are great data points to analyze. Warden has also written a program that estimates gender from names. All these data points can be cross-referenced with outside data, too. Members of Facebook's own staff did this kind of analysis when they compared user last names to U.S. Census data. That allowed them to estimate changes in Facebook's racial composition over time, based on how likely people with particular last names are to report a particular racial background.
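In its simplest form, the surname technique described above averages per-surname probabilities over a population. The estimated share of group g is (1/N) · Σ P(g | surname_i), taken over the N users whose surnames appear in the reference table. Here is a minimal sketch, assuming a hypothetical `surname_table` built from census surname data. The post doesn't say exactly how Facebook's staff did it, and the group labels and probabilities in the example are made up.

```python
def estimate_composition(surnames, surname_table):
    """Estimate group shares in a population from surnames alone.

    `surname_table` maps a lower-cased surname to {group: probability},
    e.g. derived from census surname tables (a hypothetical input here).
    Surnames missing from the table are skipped.
    """
    totals = {}
    matched = 0
    for name in surnames:
        probs = surname_table.get(name.lower())
        if probs is None:
            continue
        matched += 1
        for group, p in probs.items():
            totals[group] = totals.get(group, 0.0) + p
    if matched == 0:
        return {}
    # Averaging P(group | surname) gives the expected share of each group.
    return {group: s / matched for group, s in totals.items()}
```

Running the same function on snapshots taken at different times gives the change-over-time estimate mentioned above. It is only as good as the surname table and the share of names it covers.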

"I'm mostly thinking 'What do I try first?'," Warden says. "There's so many interesting ways to slice the data - especially as I'm starting to get changes over time. I'm also trying to map out political networks in aggregate; how polarized the fans of particular politicians are - so how likely a Sarah Palin fan is to have any friends who are fans of Obama, and how that varies with location too. One of my favorite results is that Texans are more likely to be fans of the Dallas Cowboys than God."
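The polarization question in Warden's quote can be stated as a rate: among fans of page A, what fraction have at least one friend who is a fan of page B? The location angle is the same rate computed separately per region. Here is a minimal sketch under hypothetical inputs (sets of user ids plus a friends map). It is not Warden's actual method.

```python
def cross_fan_rate(fans_a, fans_b, friends_of):
    """Fraction of page-A fans with at least one friend who fans page B.

    `fans_a` and `fans_b` are sets of user ids; `friends_of` maps a user id
    to a set of friend ids. All are hypothetical inputs. A low value suggests
    the two fan bases rarely mix.
    """
    if not fans_a:
        return 0.0
    crossing = sum(1 for u in fans_a if friends_of.get(u, set()) & fans_b)
    return crossing / len(fans_a)


def cross_fan_rate_by_region(fans_a, fans_b, friends_of, region_of):
    """The same rate, computed separately for the page-A fans in each region."""
    by_region = {}
    for u in fans_a:
        by_region.setdefault(region_of[u], set()).add(u)
    return {
        region: cross_fan_rate(members, fans_b, friends_of)
        for region, members in by_region.items()
    }
```

The crawled data only has partial Friends lists. A fan whose page-B friend is missing from the list gets counted as non-crossing, so any rate measured this way is a lower bound on the true one.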

Warden says he hasn't talked to anyone from Facebook since he started crawling the site. He did, however, get an email from someone on the security team asking him to take down instructions he'd posted exposing a security hole that made it easy to harvest people's email addresses. So the company is paying attention. "I'd love to see them put me out of business by putting decent data out there," Warden says. He says his Amazon Web Services bill was over $5,000 last month.

Why is he indexing all this content and why is he going to hand it over to the academic world later this week? "I am fascinated by how we can build tools to understand our world and connect people based on all the data we're just littering the Internet with," Warden says.

"Nobody thinks about how much valuable information they're generating just by friending people and fanning pages. It's like we're constantly voting in a hundred different ways every day. And I'm a starry-eyed believer that we'll be able to change the world for the better using that neglected information. It's like an x-ray for the whole country - we can see all sorts of hidden details of who we're friends with, where we live, what we like."

For a great example of the kind of social impact that data analysis can make, Warden points to some of the fascinating ways that GIS data is illuminating the intersection of race and public services. Data has shed light on social injustices for decades, and measurable information about the interactions of hundreds of millions of people every day on Facebook offers opportunities to discover both good and bad news about the contemporary human condition.

Warden says he's not yet been able to interest any investors in his ideas for businesses based on this data, so his girlfriend Liz Baumann, a former insurance actuary, stepped in to help and is now running much of the crawling. He says he's now focused on "working on ways of presenting all this information in a form that answers questions for people willing to pay." His first experiment along those lines is the very interesting FanPageAnalytics.com.

What does Pete Warden hope for from this week's public release of all this Facebook data? "Hopefully I'll get to see a bunch of interesting [academic research] papers come out of it, worst case. And I'd like to be the guy people turn to when they need stuff like this."

Warden is already well respected among a fringe group of bleeding-edge geeks. We hope his work on social graph analysis will end up affecting far more people than may ever know his name.

http://www.readwriteweb.com/archives/facebook_user_data_analysis.php
Analysis, Mon, 08 Feb 2010 21:15:35 -0800, Marshall Kirkpatrick
Google Analytics Benchmarks and the Future of Portable Data

Google announced a new feature for its web analytics product this week that illustrates well the potential in anonymous aggregate data analysis. This siloed product announcement points to an even more exciting future if data portability dreams come true.

Google Analytics Industry Benchmarking will let users opt in to share, and get access to, aggregate traffic information for websites in their industry vertical and at other points in their supply chain. (See sample screenshot below.)

The idea is to let companies compare their website performance over time and put their experiences in context with those of other related businesses. If an action you took seemed to cause a big traffic spike, it would be good to confirm that the spike was not just part of an industry-wide traffic increase. Likewise, if traffic growth for your business correlates particularly strongly with growth in a related business sector, then some biz dev time might be warranted there.
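Both checks in the paragraph above are simple arithmetic once you have your own numbers and a benchmark series. The first is growth relative to the industry over the same period; the second is the correlation of your traffic with a related sector's. Here is a minimal sketch with plain numbers as input. Google Analytics computes its benchmarks in its own way, and the function names and inputs here are hypothetical.

```python
from math import sqrt


def excess_growth(site_before, site_after, industry_before, industry_after):
    """How far a site's traffic change exceeds the industry's over the same period.

    Returns the site's growth ratio divided by the industry's, minus 1.
    A value near 0 means the "spike" roughly matches an industry-wide increase.
    """
    site_ratio = site_after / site_before
    industry_ratio = industry_after / industry_before
    return site_ratio / industry_ratio - 1


def pearson(xs, ys):
    """Pearson correlation of two equal-length traffic series.

    Raises ZeroDivisionError if either series is completely flat.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Take a site that goes from 1,000 to 1,500 daily visits while its benchmark vertical goes from 100,000 to 150,000. `excess_growth` returns 0.0 in that case: the apparent spike is entirely industry-wide. A `pearson` value near 1 against a related sector is the kind of signal that might justify the biz dev time mentioned above.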

Online invoicing service FreshBooks has been doing the same kind of thing for individual contractors for some time ("other consultants in your field are getting their invoices paid on average 2 weeks faster than you are"). Personal finance service Mint compares your spending habits to those of other users, NetWorthIQ uses aggregate financial data for wealth benchmarks and Yahoo!'s MyBlogLog displays aggregate traffic trends for users with similar web browsing interests.

In most cases, this kind of data-driven added value is enabled by the network effect of a successful app, and also by the world of web services. If recommendation engines are often the result of aggregate information analyzed and pointed at an individual, then industry benchmarks may be the flip side: aggregate information aimed at organizations.

Just add data portability to change the game

The new Google Analytics Benchmarks are a peek into an exciting future and a hint of how data portability could yield even more innovation. Today a huge business like Google is best placed to scale these kinds of data sets in-house, but imagine a future when secure data portability is a reality.

If users could port their commercial or behavioral data from service to service, then analysis of significant aggregate data could take forms limited only by an innovator's imagination and ability to persuade users to bring their data to the party. That kind of added value could become the core of any number of future services. It's very exciting.

Standards-based data portability is clearly not required for startups to quickly scale services based on analysis of anonymous aggregate data. It would, however, be a game changer, making this kind of innovation much, much easier. For now we'll have to enjoy innovation in the big data silos and imagine a future when this kind of access to data is blown wide open for vendors.

[Screenshot: sample Google Analytics Industry Benchmarking report.]

http://www.readwriteweb.com/archives/google_analytics_benchmarks.php
Product Reviews, Thu, 06 Mar 2008 14:24:26 -0800, Marshall Kirkpatrick