ReadWriteWeb

MIT Researcher Collecting Passive Social Graph Data From Cellphone Activity, Bluetooth

Written by Marshall Kirkpatrick / December 21, 2007 6:39 PM / 5 Comments

Sandy Pentland, a researcher at MIT whose work has received funding from Nokia, is working on processing more than 350,000 hours of data collected from people's cell phones. More than just who calls whom, Pentland is also studying proximity, location and activity data using information like interactions recorded between Bluetooth devices.

The result is a field Pentland has given the obnoxious name "reality mining."

In an interview yesterday with MIT's Technology Review (found via author Nick Carr), Pentland says that self-reporting of social connections and roles is far inferior to the kinds of analysis that can be done using data passively collected via mobile devices. While calling this data "reality" denies the importance of our hearts, minds and other parts of reality as yet imperceptible to our cell phones - it is very interesting research nonetheless.

This is where discussions about things like OpenID, OAuth and OpenSocial are likely to be played out. Passive mobile data will be a huge part of your Social Graph and will leverage it. Once this kind of data becomes readily accessible in sophisticated ways, that could be when we'll see Telcos pressuring web services to produce standards-compliant data - so they can make use of it for mobile marketing and services. Some of those services will be awesome and I anticipate them with both eagerness and caution.

Pentland predicts a future when he'll be able to use frequency of calls, physical proximity and interruptions in conversations to determine, for example, who among your Facebook friends is a real-life friend, who you've never met in person and who is your superior in a workplace hierarchy. I see different ring tones for these different groups of people some time in the future!


Pentland also says that the data mobile devices can capture will be good for early alerting of things like epidemics (15% of the residents of an apartment building didn't go to work today - could be a problem). Using special software and already available hardware, there's a whole lot of data that can be collected - it's just a matter of figuring out how best to crunch that data.
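The apartment-building example can be sketched in a few lines of Python. This is only a rough illustration: the function name, the data layout (a building name mapped to one flag per resident) and the use of 15% as a fixed cutoff are assumptions made here, not anything described in Pentland's interview.

```python
def flag_buildings(stayed_home, threshold=0.15):
    """Return (building, rate) pairs where the share of residents who
    stayed home meets or exceeds the threshold.

    stayed_home is a hypothetical layout: building name -> list of
    booleans, one per resident (True = did not leave for work today).
    """
    flagged = []
    for building, flags in stayed_home.items():
        if not flags:
            continue  # no residents observed, so no rate to compute
        rate = sum(flags) / len(flags)
        if rate >= threshold:
            flagged.append((building, rate))
    return flagged


# Made-up sample data for illustration only.
sample = {
    "Building A": [True] * 4 + [False] * 16,  # 4 of 20 stayed home
    "Building B": [True] * 1 + [False] * 19,  # 1 of 20 stayed home
}
alerts = flag_buildings(sample)
print(alerts)
```

A real early-warning system would presumably compare each building against its own usual absence rate rather than a single fixed number, which is part of the "how best to crunch that data" problem.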

Just Imagine the Shopping Opportunities!

Some people seem dead set on making the movie Minority Report a reality, Pentland among them. (Can we just have the interface without the mind reading, please?) Obviously the marketing opportunities that will arise from this kind of data are huge. Big, big money.

When your phone and Facebook put their heads together with your boss's Amazon wishlist, the only question that will remain is whether the birthday presents will be purchased via your phone or via your web-enabled brain implant (11% of US respondents say they are somewhat or very likely to get one).

What Will the Rules Be?

Data mining is not bad. In fact, it's quite an exciting idea with a whole lot of potential. As long as it's not used to catch me thinking subversive thoughts - then let's go with it. That's not even an "if" - that's pretty much a deal breaker. Let's ignore that for just a moment, though.

Pentland articulates two good rules in his interview. First, there has to be an opt-out (or opt-in) option. Second, aggregate data needs to be anonymized and your individual data needs to be viewable by no human eyes but your own. When he says we need a "new deal" for privacy, I think that's probably a good choice of phrases.
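To make the second rule a little more concrete, here is a minimal Python sketch of one simple way to publish only aggregate data: count distinct people per place and suppress any place with fewer than k people. The function name, the (user_id, place) input format and the value of k are assumptions for illustration; this is not a method Pentland describes, and small-count suppression alone is not full anonymization.

```python
from collections import defaultdict


def aggregate_counts(sightings, k=5):
    """Return place -> number of distinct users, dropping small groups.

    sightings is a hypothetical list of (user_id, place) pairs.
    User IDs never appear in the output, and any place seen by fewer
    than k distinct users is suppressed entirely.
    """
    users_by_place = defaultdict(set)
    for user_id, place in sightings:
        users_by_place[place].add(user_id)
    return {
        place: len(users)
        for place, users in users_by_place.items()
        if len(users) >= k
    }


# Made-up sample data for illustration only.
sightings = [(f"u{i}", "cafe") for i in range(6)]
sightings += [("u1", "clinic"), ("u2", "clinic")]
published = aggregate_counts(sightings)
print(published)
```

The clinic is left out of the published result because only two people were seen there, which is the kind of small group where an aggregate count could still point at an individual.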

Mobile devices are wonderful, life and world changing things. They are also the hardware for projects like Pentland's, for better or for worse.

Comments


  1. "it's quite an exciting idea with a whole lot of potential" ... something about that seems immediately self-evident, no? That makes me suspicious.

    Call me perverse (go ahead ... call me) but I find myself having to ask (ok, so call me compulsive): given the set of "wonderful ultimate applications", if we remove the subset "marketing", are we not left with an empty set?

    Unless stalking is your personal hobby, what's the pragmatic foundation / rationale for tracking people that way? Ok fine, support ... as an extension of, ayup, marketing.

    10 minutes ago I twittered "if I had been mercenary / without conscience I'd be designing UI for mobile devices" and that's how I feel about this bastard child of CS and social-psych.

    --bentrem

    Posted by: Ben Tremblay | December 21, 2007 7:50 PM



  2. Ben, you are compulsive and perverse - but you do bring up a good question. In addition to marketing, other possibilities that are more user-centric include: easier access to your own data and more sophisticated services from your mobile device, personalization of services and content/user recommendations. In other words, you could go to a party and have as an option meeting the person your phone told you had the most overlap with your interests and modus operandi. I wouldn't rush over to that person and speak to no one else, but I'd make sure that I did meet them.

    Those are some of the possibilities off the top of my head.

    Posted by: Marshall Kirkpatrick | December 22, 2007 12:05 AM



  3. What he is using is called Data-mining. It is a combination of statistics and machine learning.

    Posted by: Falafulu Fisi | December 23, 2007 10:07 PM



  4. "It is a combination of ..." Very easy to say "machine learning", but there's many a slip twixt cup and lip.

    In an R&D setting I almost bought into AI for built-in-test-equipment (BITE) ... elegant. Or intractable. We needed at least ball-park estimates of required effort. In the end I sat down and with the others went for expert-system. (Rules based ... not easy, but with FMECA under your belt you can know what you're facing.)

    What worries me is /any number/ of success scenarios; Heisenberg's uncertainty principle at work in marketing? Puh-leeze!
    When something works 80% for 80% of the people, you've got a "success".

    I hope we can debate this somewhere / sometime. Given the number of replies I'll opt to de-voice myself now. (Insidious, ehh whot? *grin*)

    and to all: the very best good fortune in '08
    bdt

    Posted by: Ben Tremblay | December 29, 2007 11:03 PM



  5. Ben said...
    We needed at least ball-park estimates of required effort. In the end I sat down and with the others went for expert-system.

    Ben, I specialize in scientific computing (numerical computing and anything that involves differential calculus is my domain) and that includes machine learning, data-mining, computational intelligence, blah, blah, blah... The difference between a symbolic expert system and an adaptive one is that the symbolic ones are inefficient when the data they apply to is continually changing. A symbolic expert system is good for something like an accounting system, but is useless in an environment such as automated online product recommendation at Amazon (eg: Customers who bought item A also bought item B,...). When data are continually changing, symbolic expert systems are almost useless. With an adaptive expert system using machine learning (I don't like using the word AI, since it is a misleading term), when the data changes, the rules captured by the engine change too; a good example is the Amazon product recommendation engine mentioned above. The buying behavior of online customers changes frequently, every hour, and an expert system designed with yesterday's rules would be almost useless after a day of trading at Amazon (as I said, the data from yesterday has completely or almost completely changed today, ie, the rules applied yesterday when the system was designed aren't valid anymore today). A symbolic expert system is deductive (query-based) while an adaptive expert system is inductive (discovery-based).

    One of the most popular adaptive expert systems today (I have developed a number of applications using this methodology) is ANFIS (adaptive neuro-fuzzy inference system). ANFIS is a hybrid of an artificial neural network and fuzzy logic. You give your expert system an initial set of rules. When the real-world data changes, the neural network adapts itself to the changing data. This is done without any human re-programming the rules (since the data has changed from what the original rules assumed). ANFIS has outperformed stand-alone neural networks and stand-alone fuzzy logic applications. Outperformed means that it has less classification error than its two stand-alone counterparts.

    ANFIS has been applied in bio-informatics (Neuro Fuzzy Classification and Detection Technique for Bioinformatics Problems). Interestingly, ANFIS is the algorithm that the NASA Shuttle space-craft uses for its space-station docking mode. When the shuttle approaches the space station, the space-craft is switched to automatic mode (adaptive guidance system using ANFIS). ANFIS, with its sensors feeding live data into it, is able to learn on the fly how best to manoeuvre the vehicle into the docking port without the slightest mis-alignment. Any mis-alignment (4 or more inches), even by a slight margin, endangers the space-craft itself and also the space-station. So an appreciable amount of mis-alignment could be fatal. If the shuttle used a symbolic expert system to guide it to the docking port, it would be useless, since the rules for how to manoeuvre the craft to the port would be different each time. I read about how ANFIS is used for the Shuttle docking guidance system in a 1999 issue of IEEE Transactions on Control Systems Technology.


    Posted by: Falafulu Fisi | December 30, 2007 1:35 AM
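The contrast drawn in the last comment, between rules written once and rules re-derived whenever the data changes, can be sketched with a toy "customers who bought A also bought B" example in Python. Everything here is an assumption for illustration: the function name, the order format and the sample data are made up, and simple co-occurrence counting stands in for the far more elaborate methods (such as ANFIS) the comment mentions. It is not a description of how Amazon's engine works.

```python
from collections import Counter
from itertools import combinations


def also_bought(orders):
    """Derive item -> most frequently co-purchased item from raw orders.

    orders is a hypothetical list of orders, each a list of item names.
    The "rules" are recomputed from whatever data is passed in, so they
    change whenever the data does.
    """
    pair_counts = Counter()
    for order in orders:
        # Count each unordered pair of distinct items in both directions.
        for a, b in combinations(sorted(set(order)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    best = {}
    for (a, b), n in pair_counts.items():
        # Keep the partner with the highest count; ties keep the first seen.
        if a not in best or n > best[a][1]:
            best[a] = (b, n)
    return {a: partner for a, (partner, _) in best.items()}


# Made-up sample data for illustration only.
yesterday = [["A", "B"], ["A", "B"], ["A", "C"]]
today = [["A", "C"], ["A", "C"], ["A", "B"]]

rules_yesterday = also_bought(yesterday)
rules_today = also_bought(today)
print(rules_yesterday["A"], rules_today["A"])
```

A hand-written rule such as "recommend B to buyers of A" would still say B today; the derived rule follows the data and switches to C.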


