data mining - ReadWriteWeb. Copyright 2012 Richard MacManus (readwriteweb@gmail.com). Feed updated Tue, 14 Feb 2012 18:04:00 -0800.

Government Report Finds Data Mining an Ineffective Way To Smoke Out Terrorists

Remember the "precogs" in Minority Report, the psychics who helped police figure out who was going to commit a crime before it happened? If that's ever going to happen, it looks like it will take something supernatural, because today's technology is a long way from being able to predict who's going to commit a crime.

A new 350-page report released today, written by heavyweights such as former US Secretary of Defense William Perry and National Academy of Engineering President Charles Vest, and sponsored by the Department of Homeland Security, argues that large-scale data mining of consumer and other records is of "limited effectiveness" in finding suspects preparing to commit acts of terrorism.

The report was published by the National Research Council under the title "All Counterterrorism Programs That Collect and Mine Data Should Be Evaluated for Effectiveness, Privacy Impacts; Congress Should Consider New Privacy Safeguards." CNET's Declan McCullagh says the report offers a retort to the aims of the Total Information Awareness office, whose duties were dispersed throughout the federal government after extensive controversy several years ago.

The report notes that while credit agencies have been able to use data mining to detect fraudulent financial activity, the tactic is of limited effectiveness in finding would-be terrorists for two reasons: first, so little is known about the psychology and behavior of terrorists, and second, the results are so rife with false positives that they are of very low quality.
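To see why false positives swamp this kind of search, consider a rough base-rate calculation. The numbers below are invented for illustration and do not come from the report: suppose one person in a million is actually planning an attack, and a screening model catches 99% of them while wrongly flagging just 1% of everyone else. Bayes' rule gives the share of flagged people who are real suspects:

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Share of flagged people who are true positives, via Bayes' rule.

    base_rate:   fraction of the population that really is a target
    sensitivity: fraction of real targets the screen flags
    specificity: fraction of non-targets the screen correctly clears
    """
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, chosen for illustration only.
ppv = positive_predictive_value(base_rate=1e-6, sensitivity=0.99, specificity=0.99)
print(f"{ppv:.2e}")  # 9.90e-05
```

Under those assumptions only about one flag in 10,000 points at a real suspect, which shows how a very rare target can drown even an accurate screen in false alarms.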

The report argued that data mining is much more effective for tracking known terrorists or finding people who exhibit very specific behavior. It warned against tactics such as tracking emotional or psychological states, which the authors believe individuals should not be called to account for. Apparently that doesn't go without saying anymore.

Much of the report's summary, and clearly its title, focused on the privacy implications of these false positives in particular and of this kind of data mining in general. Presumably the report was written in a different era, before it became appropriate to try out for the Vice Presidency of this country with words like "Al Qaida terrorists still plot to inflict catastrophic harm on America, and [Barack Obama] he's worried that someone won't read them their rights." (Palin acceptance speech) Evidently we live in a post-rights world now.

So perhaps what's most significant in today's report is the finding that pre-emptive data mining just doesn't work. Surely the ineffectiveness of pre-emptive measures still counts for something? The report also warned against using anti-terrorism data mining as an opportunity to find other actionable information.

The report offers a series of recommendations, including close monitoring of any such programs and possibly even subjecting data-mining programs to regular data-mining-based assessments of their effectiveness. The report said that "legislation to clarify private-sector rights, responsibilities, and liability in turning over data to the government" was an area "ripe for congressional activity." At a time when neither party running for the US Presidency is willing to mention anything like this, such recommendations might seem either refreshing or insane.

By Marshall Kirkpatrick (Analysis), Tue, 07 Oct 2008 13:23:07 -0800. http://www.readwriteweb.com/archives/government_report_finds_data_m.php
Facebook Data Mining: Truth in Association?

With a product as ubiquitous as Facebook, the public has raised a number of privacy-related concerns, including optional settings, privacy policies and data mining. In the past, ReadWriteWeb covered Facebook's plans to sell user data for market research purposes. However, today's article in the Boston Globe suggests that user information can be mined for more than just advertising purposes.

An MIT experiment dubbed "Gaydar" by its creators, Carter Jernigan and Behram Mistree, has used computational analysis to identify user traits based on information listed by their Facebook friends. Using friends' profiles, the program predicts the likelihood of your religious affiliation, political leanings and even sexual orientation. The idea is essentially that friends are likely to share traits. So if you're in the closet but have loads of vocal friends, a program of this nature could potentially out you.
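The underlying idea, that friends tend to share traits, can be sketched as a simple majority vote over what a user's friends disclose. This is a hypothetical illustration, not the students' actual model, whose details aren't given here; the labels "A" and "B" stand in for any trait, and the `min_friends` cutoff is an arbitrary assumption.

```python
from collections import Counter


def predict_from_friends(friend_labels, min_friends=3):
    """Guess a user's undisclosed trait from the traits their friends list publicly.

    friend_labels: one entry per friend; None means that friend disclosed nothing.
    Returns (most common label, share of disclosing friends who have it),
    or None when too few friends disclose anything to make a guess.
    """
    disclosed = [label for label in friend_labels if label is not None]
    if len(disclosed) < min_friends:
        return None
    label, count = Counter(disclosed).most_common(1)[0]
    return label, count / len(disclosed)


friends = ["A", "A", None, "B", "A", None, "A"]
print(predict_from_friends(friends))  # ('A', 0.8)
```

Even this crude vote shows the privacy problem Abelson describes: the user in the example disclosed nothing, yet the inference rests entirely on what other people chose to share.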

Said Hal Abelson, a professor who co-taught the course, "[It] pulls the rug out from a whole policy and technology perspective that the point is to give you control over your information - because you don't have control over your information."

With Facebook already being used to catch tax evaders, and with a conspiracy theory circulating about CIA ties, it'll be interesting to see how the public reacts to this latest demonstration of Facebook data mining. While it's unlikely that terrorist suspects are friending each other on Facebook, there are many associations that need not be publicized to corporate partners or governments.

Photo Credit: Steve Jurvetson

By Dana Oshiro (Facebook), Sun, 20 Sep 2009 19:41:26 -0800. http://www.readwriteweb.com/archives/facebook_data_mining_truth_in_association.php
How Data Will Impact the Future of Healthcare (Infographic)

IBM staff storyteller Chris Luongo has created a great infographic explaining the different ways that healthcare could become data driven in the future. The IBM Smarter Planet blog calls it Smarter Healthcare.

We've embedded the infographic below in Microsoft's new web page viewer Zoom.it.


The data mining part of this story is one of the most interesting to me. As one online resource has explained:

Medical (or clinical) databases have accumulated large amounts of data on patients and their medical conditions. This kind of information, stored along with that of other patients, make up an ideal place to look for new analysis and patterns, or to validate proposed hypotheses. To exploit such large volumes of medical data, numerous inductive data analysis techniques derived from Machine Learning (ML) study have been successfully applied to medical data to discover useful and new knowledge. However, medical data mining is considered by many machine learning communities as the most complex and problematic domain yet to be overcome.

Where there is data, there is opportunity for analysis and building added value. The increasing instrumentation of the healthcare experience could become a major platform for innovation and improved service for consumers.

See also: How Location Services Could Impact Healthcare

By Marshall Kirkpatrick (Analysis), Fri, 06 Aug 2010 09:10:02 -0800. http://www.readwriteweb.com/archives/how_data_will_impact_the_future_of_healthcare_info.php
Digital Urdu: New Software Improves Data Analysis of Pakistan's National Language

The extent to which social media sites like Twitter and Facebook play a role in the recent political uprisings in Tunisia, Egypt, Bahrain and elsewhere continues to be a source of debate. What is clearer, however, is that the major languages of these regions are not well served by the electronic resources that make text analysis of documents and data possible.

But now computer scientists have developed the first software system that will allow for the processing of documents in Urdu, the national language of Pakistan and one of the five most-spoken languages in the world.

The software will help lay the foundation for data mining in Urdu and provide more accurate transliteration, as well as open the door for projects in similar languages. "This is the first comprehensive, natural language processing system for Urdu," says Rohini Srihari, University at Buffalo associate professor of computer science and engineering.

The work is a joint project between her department and Janya, an Amherst-based company she founded that provides information extraction technology in multiple languages, including Chinese, Arabic, Pashto, and Russian.

Natural Language Processing and Urdu

The problem with data mining and sentiment analysis in other languages is that they don't often have the same sort of "established electronic infrastructures" that we have for English and European languages. "If you are trying to do sentiment analysis - to find out what are the main topics people are talking about in a country, is there intensity building up over something and who is swaying opinion - then you must have an information extraction system," Srihari says.

That is what she and her fellow researchers have been developing: a system that can perform word segmentation, part-of-speech tagging and entity tagging (recognizing the names of people, places and organizations) on a raw, untranslated Urdu document.


"Voice of the Citizen" Through Social Media Data Mining

Srihari says she's focused on the "voice of the citizen" in this project. "Some of the information is political and some of it is not," she says, and despite the turmoil in the region, a lot of the social media chatter in Pakistan is about cricket.

Srihari presented her findings at a recent conference, "Blogs & Bullets: Social Media and the Struggle for Political Change," at Stanford University. She says she became interested in Urdu because her group was looking at blogs from different cultures. Noting that the advent of the Web has caused an explosion of online content in a variety of languages, Srihari says, "When you start looking at blogs in different cultures, you can really start to understand public sentiment and opinions."

By Audrey Watters (News), Mon, 07 Mar 2011 11:16:06 -0800. http://www.readwriteweb.com/archives/digital_urdu_new_software_improves_data_analysis_o.php
Yahoo! Experiments in Reality Mining with Bluetooth

Yahoo!-owned MyBlogLog is stepping into dangerous waters with a new experiment in mobile presence tracking through Bluetooth.

Demonstrated at the eTech conference today, m.mybloglog.com says it allows users to: "Bind your Bluetooth address to your MyBlogLog account and discover others nearby and [sic] find out if you have any shared interests. Meetspace keeps track of time spent with others so you have a running log of people to meet and things to talk about."

The new Mobile MyBlogLog uses a Java applet to tie your Bluetooth device to your MyBlogLog account, then polls for new activity every two minutes. In some ways it's not that different from Google's Dodgeball or other mobile presence trackers. MyBlogLog is closely tied to your online behavior, though, and it most recently relaunched with an emphasis on online lifestreaming. This new feature will let you, and Microhoo, view the recent online activities of the (participating) people you've been near lately.
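MyBlogLog hasn't published how Meetspace computes its running log of time spent with others, but a minimal sketch, assuming each two-minute poll returns the set of nearby Bluetooth addresses, would simply credit every sighting with one polling interval. The function name and the sample addresses are hypothetical.

```python
from collections import defaultdict

POLL_INTERVAL_MINUTES = 2  # the article says the applet polls every two minutes


def time_spent_with(scans, poll_interval=POLL_INTERVAL_MINUTES):
    """Estimate minutes spent near each device from periodic Bluetooth scans.

    scans: list of sets, one per poll, holding the addresses seen nearby
    during that poll. Each sighting is credited with one full polling interval,
    so the estimate is only as fine-grained as the polling rate.
    """
    minutes = defaultdict(int)
    for seen in scans:
        for address in seen:
            minutes[address] += poll_interval
    return dict(minutes)


scans = [{"aa:01", "bb:02"}, {"aa:01"}, {"aa:01", "cc:03"}]
print(sorted(time_spent_with(scans).items()))  # [('aa:01', 6), ('bb:02', 2), ('cc:03', 2)]
```

Sorting the totals by minutes would give the kind of "running log of people to meet" the feature describes, which is also why this data is so revealing: it is a passive record of who you spend time with.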

Reality Mining

"Reality mining" is a phrase coined by MIT researcher Sandy Pentland, whose work we wrote about in December. Pentland is working on processing more than 350,000 hours of data collected from people's cell phones. His Nokia-funded work studies proximity, location and activity data, using information that includes interactions recorded between Bluetooth devices.

Previous coverage of what Pentland is up to is worth a read on its own. Obviously he's not the only one working on passive collection of presence and activity data through the interaction of mobile devices.

The Privacy Lab That is MyBlogLog

MyBlogLog is a great laboratory for Yahoo! to experiment with behavioral tracking and personal information among early adopter crowds. There's a lot of fascinating work being done there. It sometimes borders on creepy, though, and this is one of those times.

If you've signed up for a MyBlogLog account, you've probably experienced the ambivalence that comes from, on one hand, being interested to see the faces of other people who read your blog or the blogs you like and, on the other, feeling a little uneasy that your own blog reading is so public. The MyBlogLog cookie is very persistent, too. Of course this is opt-in, but how far down the rabbit hole are we going to go before that's no longer sufficient justification for new levels of tracking?

Data portability and lifestreaming online have huge potential, but once experiments like this start creeping into reality mining territory there are some gigantic privacy questions that come up. I don't know why MyBlogLog thinks it can get away with introducing this kind of service when it knows it has a shaky public image on privacy.

My first thought upon seeing this was: the internet brain implant creeps closer every day. Maybe I'm overreacting, but how often do you see people who never take their Bluetooth headsets off? This kind of tracking needs to stay as far away from the inside of my head as possible.

I have said several times that Yahoo! is pushing the envelope on data portability with MyBlogLog while the standards community sits too far toward the sidelines, having a different discussion. The web, and the data portability effort itself, need a serious conversation about the privacy side of data portability. To keep track of these important discussions, here's an RSS feed you can subscribe to that contains DataPortability.org discussions that include the word "privacy" and Ask.com blogsearch results for the query: privacy AND "data portability" OR authorization. Enjoy.


By Marshall Kirkpatrick, Mon, 03 Mar 2008 21:02:09 -0800. http://www.readwriteweb.com/archives/yahoo_reality_mining.php
eBay Bets $80 Million on Personalization, Acquires Recommendation Technology Hunch

Every ecommerce site needs to customize and personalize products for fast-moving Internet consumers, and eBay is no stranger to this. In a quest to further personalize its recommendations, eBay today acquired Hunch.com. It will use the new technology to ramp up its ecommerce recommendations, including predictive merchandising, interpreting unstructured data and creating merchant insights. Personalization is a hot trend on the Internet, found on sites ranging from daily deal services like Google Offers and Groupon to social reading apps like Zite and Flipboard.

Hunch focuses on machine learning, data mining and predictive modeling to make suggestions. It will enhance the eBay tool Discover, which attempts to make serendipity a regular occurrence on the site by mining shoppers' actions on eBay and social networks.

This "patented prediction technology" will be incorporated into eBay's search function, as well as its advertising and marketing.

Hunch launched in 2009 as a platform for recommending things it believed its members would like based on what they shared online. It relaunched in 2010 as an Internet personalization service with a taste-graph-driven engine that delivers highly targeted recommendations to users based on their answers to 20 quick questions. Hunch is now officially part of eBay, but will keep its New York-based office and continue to operate as its own entity.
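Hunch's "patented prediction technology" isn't described in detail, so the following is only a generic user-based collaborative-filtering sketch of the taste-graph idea: users who answer the quick questions similarly are assumed to like similar things. It assumes yes/no answers coded as +1/-1 and uses cosine similarity, a common but here swapped-in choice; all user IDs and items are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two users' answers (dicts: question -> +1/-1)."""
    dot = sum(a[q] * b[q] for q in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(my_answers, my_likes, others, top_n=3):
    """User-based collaborative filtering.

    others maps a user id to (answers, liked_items). Items liked by users whose
    answers resemble mine are scored by summed similarity; dissimilar users
    (similarity <= 0) are ignored, as are items I already like.
    """
    scores = {}
    for answers, liked in others.values():
        similarity = cosine(my_answers, answers)
        if similarity <= 0:
            continue
        for item in liked - my_likes:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


me = {"q1": 1, "q2": -1, "q3": 1}
others = {
    "u1": ({"q1": 1, "q2": -1, "q3": 1}, {"camera", "tripod"}),
    "u2": ({"q1": 1, "q2": -1, "q3": -1}, {"tripod", "lens"}),
    "u3": ({"q1": -1, "q2": 1, "q3": -1}, {"golf clubs"}),
}
print(recommend(me, {"camera"}, others))  # ['tripod', 'lens']
```

The tripod ranks first because two like-minded users liked it; the golf clubs never appear because their only fan answered every question the opposite way.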

By Alicia Eler (E-Commerce), Mon, 21 Nov 2011 10:45:00 -0800. http://www.readwriteweb.com/archives/ebay_bets_80_million_on_personalization_acquires_r.php
Web as Platform For Research on Oceans, Galaxies

The University of Washington has announced two new research projects that will utilize cloud computing platforms from Internet companies such as Google, Microsoft, Amazon and IBM. According to the press release published on Genetic Engineering News, the University of Washington has won grants from the National Science Foundation to fund projects examining ocean climate simulations and analyzing astronomical images. Both of these projects will utilize cloud computing to examine and interact with "the massive datasets that are becoming more and more common in science."

The University of Washington projects tie into a couple of major trends in the current era of the Web: there's now much more data being created for the Web, or being transported to the Web; and we're seeing Web technologies being used to analyze and make sense of that data.

It's not only in scientific realms. We're seeing this on the Consumer Web too, as Marshall Kirkpatrick explained this morning in an article about social media monitoring tools. He wrote that data mining tools are being democratized and used more nowadays, similar to how online publishing tools were democratized in Web 2.0. The cloud computing servers that the University of Washington will utilize are relatively cheap, easy-to-use Web platforms that will enable data mining on a scale not seen before. These projects will access a cloud datacenter established for educational use in 2007, through a partnership between Google, IBM and six academic institutions (including the University of Washington).

Oceans and Galaxies of Data

Bill Howe, a researcher at the UW's eScience Institute, explained the impact of cloud computing on his ocean climate simulation project. Instead of running a simulation to test a single hypothesis, he said, climate scientists are now running long-term simulations and then sifting through tens of thousands of gigabytes of resulting data to discover trends.

Andrew Connolly, a UW associate professor of astronomy, explained that for his project analyzing astronomical images, cloud computing makes it easier to store and process information and make it available over the Web. He said that whereas scientists once competed for time on telescopes, recorded data and then studied the individual images in detail, now "telescopes continuously record high-resolution images that are available to all, providing millions of times more information." The shift, then, is that data gathering has been automated, and far more data is available for scientists to analyze than ever before.

Data Rich - And Useful

This current era of the Web, which some are calling 'Web 3.0' (but we frankly don't know what it's called yet) is increasingly data rich. The same thing could have been said about the Web 2.0 era, when oceans of 'User Generated Content' were created. However the world of sensors is rapidly pouring even more data onto the Web. Ed Lazowska, a UW professor of computer science and engineering, noted that "the rapid evolution of sensors is transforming all sciences from data-poor to data-rich." He said that "the challenge is to use modern cloud computing resources, such as Amazon Web Services, and modern computer science advances, such as data mining and machine learning, to explore these massive volumes of data." He claimed that this new computational science will be pervasive and will have enormous impact.

We're always pleased when the Web has a meaningful impact on the 'real world' - and particularly on science projects such as this, where the findings could be profound.

By Richard MacManus (Real World), Wed, 15 Apr 2009 18:45:43 -0800. http://www.readwriteweb.com/archives/web_as_platform_for_research_on_oceans_galaxies.php
Do You Trust Google to Resist Data Mining Across Services?

Google's breadth of services is truly awesome and the amount of information the company touches concerning our lives and world can sometimes feel downright frightening. While almost no one takes the old phrase "Don't Be Evil" seriously anymore now that there are billions of dollars on the table and Chinese autocrats to satisfy, regular evaluations of Google's ethical positions still seem advisable.

One of the big questions being asked with increasing frequency is this: Is Google taking data it collects through particular services and using it to its benefit in other services? We know the company scans our Gmail and uses the text there to sell ads, but is this a tactic being employed across services? Some people appear to believe it is.

The Fears

When enterprise wiki service Socialtext announced this morning that it was folding SocialCalc, a spreadsheet from VisiCalc co-creator Dan Bricklin, into its offerings, the announcement included this interesting customer quote:

"The timing of SocialCalc is perfect - we were in need of a wikified spreadsheet that had all of the utility of Google Docs without the datamining," remarks Brandon Stafford, Principal Engineer at GreenMountain Engineering.

We found it very interesting that a new application would specifically aim at Google's data mining as a weakness. That kind of tactic is likely to become increasingly frequent.

Similarly, when Google's Mark Lucovsky was a guest on last week's Gillmor Gang podcast, he was pressed on the question of data mining concerning the free JavaScript libraries that Google hosts and offers to developers. Is Google monitoring everything that goes on at the sites that use the libraries and using those observations for market intelligence, such as ad sales?

You might remember that was a question people asked about MyBlogLog when Yahoo! bought the widely embedded service. Was Yahoo! using MyBlogLog to spy on AdSense and other activity unrelated to their own technology?

In Google's Defense

The information available across applications is probably too seductive for Google, or almost any company, to pass up. The search and ad giant's saving grace may be that it already has so much information in each silo that it has little need to cross-pollinate.

Google's Lucovsky told Gillmor that "the Slashdot crowd" might think there's some kind of conspiracy, but that there really isn't. He assured listeners that Google uses the information it collects from its hosted JavaScript libraries only to improve the library-hosting service itself. "The Slashdot crowd" is old-school shorthand for nonprofessional writers who post on the web but have no vested interest in respecting power, so they point at alleged conspiracies more often than the tamer professional press does.

Behind every alleged conspiracy at a giant company though is just a bunch of people doing their jobs. Only occasionally, we presume, do some of them come up with what would be a great idea as long as they don't get caught.

Data Portability

Some cross-pollination of data from one service to another might in fact be great, if we users had control over it and could use the same tactic for our own direct benefit. Until that kind of data portability policy and technology are in place, though, many of us would prefer that data remain right where it is and keep its hands in plain sight.

Perspective

One of the first posts I wrote in my time at TechCrunch was about a Google experiment that would use your computer's microphone to track the ambient audio in a room, determine what TV shows you were watching and then serve up related ads in your browser. Presumably that program hasn't gone anywhere; snooping-obsessed researcher Shumeet Baluja has moved on to other research, such as monitoring video game players' behavior and psychology for ad targeting and watching how much porn people look at on their mobile phones.

Beyond Google's actions, data privacy in hosted services has long been a concern, and some enterprise sales teams are now responding to it with boxes that run applications locally behind customers' firewalls. As recently as the end of last year, Salesforce.com admitted that one of its employees fell for a phishing scam and handed over the key to the company's customer email accounts.

What if it wasn't an accident or an outside party, though? What if data that was collected in "anonymous aggregate" proved just too juicy for personalization-hungry ad sales teams or security-obsessed government agencies? Do you trust Google to resist mining your data across the various Google services you use? Is avoiding "Google data mining" an effective selling point that would increase your consideration of products from another vendor? We expect that the answers to these questions will change over time and we think it would be wise to revisit them periodically.

By Marshall Kirkpatrick (Analysis), Tue, 10 Jun 2008 11:05:05 -0800. http://www.readwriteweb.com/archives/do_you_trust_google_to_resist_data_mining_across_services.php
Sickweather Analyzes Social Data to Map Illness Outbreaks

There have probably been times when a cursory glance at your Facebook feed or Twitter stream reminded you that it's flu season, with plenty of your friends' status updates referring to some sort of sneezy, snuffly, achy, barfy condition. Thanks to mobile technology, posting to your various social networks is something you can still do while sick in bed.

For the healthy among us, these sorts of status updates serve as a good reminder of whom to steer clear of. But at a larger scale, this social data can also warn of where disease clusters are occurring. And unlike the statistics released by the Centers for Disease Control and Prevention, this social data can be tracked in real time.

That's the aim of a new startup called Sickweather. The company, which is still in private beta, wants to track the signs of sickness via social networks and generate maps so that people can determine who and where to avoid.

Data Mining Every Sneezy Status Update

Sickweather wants to build a social network around this sort of information, but currently the startup is utilizing publicly available social data. By mining Facebook and Twitter for certain keywords, the company can ascertain where there are disease outbreaks.

Sickweather isn't the only company thinking about the ways in which our online data can be utilized to track illness. Earlier this week, Google noted that it was monitoring search patterns around dengue fever in order to track the spread of the virus.

Google says that it wants to build an "early warning system" of sorts, and Sickweather's aims are similar, but it uses social rather than search data. The startup insists that the publicly available data it's using right now is anonymized, and the company promises privacy protection as well. Tweeting that you're staying home from work because of a wicked cough doesn't mean that Sickweather will point to you as the vector or offer any particular diagnosis of your illness. But taken together with other people's updates complaining about similar symptoms, the company's algorithm will be able to pinpoint places to avoid.
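Sickweather hasn't disclosed its algorithm; a minimal keyword-counting sketch, assuming posts already arrive tagged with a location, might look like this. The symptom term list and the `min_reports` threshold are illustrative assumptions; the threshold mirrors the stated aim of mapping places rather than singling out individuals.

```python
import re
from collections import Counter

# Hypothetical symptom vocabulary, for illustration only.
SYMPTOM_TERMS = re.compile(
    r"\b(flu|fever|cough(?:ing)?|sore throat|sneez(?:e|ing))\b", re.IGNORECASE
)


def outbreak_counts(posts, min_reports=2):
    """Count symptom-mentioning posts per location.

    posts: iterable of (location, text) pairs from public updates.
    Only locations with at least min_reports matching posts are returned,
    so a single post never puts one person on the map.
    """
    counts = Counter(loc for loc, text in posts if SYMPTOM_TERMS.search(text))
    return {loc: n for loc, n in counts.items() if n >= min_reports}


posts = [
    ("Baltimore", "Staying home with a wicked cough"),
    ("Baltimore", "Flu season is here, ugh"),
    ("Boston", "Great game last night"),
    ("Boston", "Sneezing all day"),
]
print(outbreak_counts(posts))  # {'Baltimore': 2}
```

Plain keyword matching would also count posts like "my fever for the new album", so any real system would need more than a word list to separate sick people from figures of speech.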

The company plans to build out a number of apps for Facebook and Twitter as well as for mobile devices. Through these the company plans to offer different levels of access to data, from just being able to view generalized maps of flu outbreaks, for example, to more details about specifics and, as is the case with most social networks, to be able to share these with only certain friends and followers.

Privacy Concerns?

On one hand, Sickweather might raise some questions about medical privacy - do people want to be able to share this sort of personal information? But as the frequent Twitter and Facebook updates about illness demonstrate, people are already doing this. Sickweather hopes to be able to make better use of this information - at both an aggregate level, for certain cities for example, but also for people's own social networks.

By Audrey Watters (Health), Thu, 02 Jun 2011 17:00:17 -0800. http://www.readwriteweb.com/archives/sickweather_analyzes_social_data_to_map_illness_ou.php
Foursquare Searching for Data Scientist - A Sign of Things to Come?

Foursquare has an open position for a data scientist. Specifically, the company is looking for someone with "experience with prediction or recommender systems, search and ranking algorithms, and classification algorithms." In September, Foursquare co-founder Dennis Crowley told the audience at Picnic that the company is building a recommendation engine. The blog About Foursquare thinks this may hint at things to come from Foursquare.

Data scientists are statisticians and/or computer scientists who specialize in working with large datasets. As one explainer puts it, the job of the data scientist is to obtain, scrub, explore, model and interpret data.

It's likely that Foursquare is looking for someone to turn its massive datasets culled from all those check-ins into something useful and, of course, monetizable.

Alistair Goodman wrote at Business Insider that he expects Facebook Places to win the check-in wars, but:

Marc Andreessen, an investor in Foursquare and board member of Facebook, will most likely still lead Foursquare into new areas that won't be touched by Facebook with the hopes of helping it pivot beyond the check-in. Gowalla won't be so lucky.

Getting into big data in a big way would be one way for Foursquare to build value and keep from becoming just another check-in service. As we've noted before, it won't have a whole lot of competition in the food recommendation space.

Marshall noted that, in addition to a recommendation engine, Foursquare has talked about incentivizing behavior:

In addition to recommendations, the company has long talked about incentivization of real-world behavior. Today, for example, Foursquare announced a partnership with CNN, which will give a "healthy eater" badge to anyone who checks in at one of ten thousand farmers markets. It's unclear whether a dorky apple badge with CNN emblazoned on it is going to incentivize anyone to do anything - but it's a start and an interesting idea.

Imagine checking in at a farmers market, then later receiving recommendations for restaurants that cook with locally sourced food when you check in nearby. It's got to be just a matter of time before big companies like McDonald's start incentivizing fun and Happy Meals lest we all get too many farmers market recommendations.

We've asked before what value there may be in the massive datasets generated by geotracking. If anyone can think of some novel uses for this data, please let us know in the comments (or found a start-up).

Interested in applying for the job? Here are Foursquare's requirements:

  • MS or PhD in CS/Machine Learning or Statistics or a BS with extensive experience in the field
  • 5+ years experience as a data scientist/analyst on large datasets, or research in this area
  • Ability to work with big datasets with minimal engineering support
  • Comfortable in a small, intense and high-growth start-up environment

If you want to learn more about data science, you might want to check out the free e-book Mining of Massive Datasets from Stanford professors Anand Rajaraman and Jeffrey Ullman.

By Klint Finley (Location), Mon, 13 Dec 2010 16:35:00 -0800. http://www.readwriteweb.com/archives/foursquare_searching_for_data_scientist_-_a_sign_o.php
MIT Researcher Collecting Passive Social Graph Data From Cellphone Activity, Bluetooth
Sandy Pentland, a researcher at MIT whose work has received funding from Nokia, is working on processing more than 350,000 hours of data collected from people's cell phones. More than just who calls whom, Pentland is also studying proximity, location and activity data, using information like interactions recorded between Bluetooth devices.

The result is a field Pentland has given the obnoxious name "reality mining."

]]>

In an interview yesterday with MIT's Technology Review (found via author Nick Carr), Pentland says that self-reporting of social connections and roles is far inferior to the kinds of analysis that can be done using data collected passively by mobile devices. While calling this data "reality" denies the importance of our hearts, minds and other parts of reality as yet imperceptible to our cell phones, it is very interesting research nonetheless.

This is where discussions about things like OpenID, OAuth and OpenSocial are likely to play out. Passive mobile data will be a huge part of, and will leverage, your Social Graph. Once this kind of data becomes readily accessible in sophisticated ways, that could be when we see telcos pressuring web services to produce standards-compliant data so they can use it for mobile marketing and services. Some of those services will be awesome, and I anticipate them with both eagerness and caution.

Pentland predicts a future when he'll be able to use frequency of calls, physical proximity and interruptions in conversations to determine, for example, who among your Facebook friends is a real-life friend, who you've never met in person and who is your superior in a workplace hierarchy. I see different ring tones for these different groups of people sometime in the future!


Pentland also says that the data mobile devices can capture will be good for early alerting of things like epidemics (15% of the residents of an apartment building didn't go to work today - could be a problem). Using special software and already available hardware, there's a whole lot of data that can be collected - it's just a matter of figuring out how best to crunch that data.
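Pentland's apartment-building example boils down to comparing today's activity against what is normal for that place. Here is a minimal Python sketch of that idea, offered only as an illustration: the `BuildingDay` record, the 15-point excess threshold and the `flag_unusual_absence` function are hypothetical, not anything taken from Pentland's research.

```python
from dataclasses import dataclass


@dataclass
class BuildingDay:
    """Hypothetical daily summary for one building, derived from phone location data."""
    building_id: str
    residents: int              # phones normally seen at this building overnight
    stayed_home: int            # phones that never left the building during work hours
    baseline_home_rate: float   # typical share of residents who stay home on a weekday


def flag_unusual_absence(day: BuildingDay, excess_threshold: float = 0.15) -> bool:
    """Flag a building when the share of residents staying home exceeds its
    usual rate by more than excess_threshold (an assumed cutoff)."""
    if day.residents == 0:
        return False
    home_rate = day.stayed_home / day.residents
    return home_rate - day.baseline_home_rate > excess_threshold


days = [
    BuildingDay("A", residents=200, stayed_home=24, baseline_home_rate=0.10),
    BuildingDay("B", residents=200, stayed_home=64, baseline_home_rate=0.10),
]
flagged = [d.building_id for d in days if flag_unusual_absence(d)]
print(flagged)  # ['B']
```

Building A is only 2 points above its usual rate, while B is 22 points above, so only B would be worth a closer look. A real system would need a far more careful baseline (weekends, holidays, weather) than a single fixed rate.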

Just Imagine the Shopping Opportunities!

Some people seem dead set on making the movie Minority Report a reality, Pentland among them. (Can we just have the interface without the mind reading, please?) Obviously the marketing opportunities that will arise from this kind of data are huge. Big, big money.

When your phone and Facebook put their heads together with your boss's Amazon wishlist, the only question that will remain is whether the birthday presents will be purchased via your phone or via your web-enabled brain implant (11% of US respondents say they are somewhat or very likely to get one).

What Will the Rules Be?

Data mining is not bad. In fact, it's quite an exciting idea with a whole lot of potential. As long as it's not used to catch me thinking subversive thoughts - then let's go with it. That's not even an "if" - that's pretty much a deal breaker. Let's ignore that for just a moment, though.

Pentland articulates two good rules in his interview. First, there has to be an opt-out (or opt-in) option. Second, aggregate data needs to be anonymized and your individual data needs to be viewable by no human eyes but your own. When he says we need a "new deal" for privacy, I think that's probably a good choice of phrase.
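One common way to publish aggregate data without exposing individuals is to release only group counts and to suppress any group too small to hide in. The Python sketch below assumes a minimum group size of 5 and a simple (user, area) record format; both are illustrative assumptions, not rules Pentland proposes, and this is only one of several anonymization techniques.

```python
K_MIN = 5  # assumed minimum group size; smaller groups are suppressed


def aggregate_by_area(records: list[tuple[str, str]], k_min: int = K_MIN) -> dict[str, int]:
    """Turn (user_id, area) records into per-area counts of distinct users,
    dropping any area with fewer than k_min users so no one is singled out.
    User IDs never appear in the output."""
    users_per_area: dict[str, set[str]] = {}
    for user_id, area in records:
        users_per_area.setdefault(area, set()).add(user_id)
    return {
        area: len(users)
        for area, users in users_per_area.items()
        if len(users) >= k_min
    }


records = [(f"user{i}", "downtown") for i in range(8)]
records += [("user98", "suburb"), ("user99", "suburb")]
result = aggregate_by_area(records)
print(result)  # {'downtown': 8}
```

The two people in "suburb" disappear from the published numbers entirely, because reporting a count of 2 for a small area could point straight at them.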

Mobile devices are wonderful, life and world changing things. They are also the hardware for projects like Pentland's, for better or for worse.

]]> Discuss]]>
http://www.readwriteweb.com/archives/reality_mining.php http://www.readwriteweb.com/archives/reality_mining.php Mobile Fri, 21 Dec 2007 18:39:43 -0800 Marshall Kirkpatrick
Mobile Security With a Data Mining Solution: Lookout Releases API for App Stores
Lookout Security wants to eradicate mobile malware before it gets a chance to really take flight. That is not an easy thing to do, but unlike the malware that plagues PCs, malicious mobile programs are still in their nascent stages. That means security companies can stay a step ahead, and today Lookout is releasing a mobile security API designed to cut off mobile malware where it originates - at the point of purchase.

Lookout has built a security API for app stores. The new product is called the Mobile Security API, and the first store on board is Verizon's V Cast store for Android apps. Lookout runs the API and its processes in the cloud, scanning every app that comes into the store against a worldwide database of 700,000 iOS and Android apps to look for abnormalities. End users benefit from the API through the Mobile Threat Network that Lookout has created, which checks apps on a back-end server before users download them. The idea is to squeeze the mobile malware ecosystem so tightly that malicious programs have nowhere to gain access. In doing so, Lookout is staying ahead of the threats, a trait not often seen in security companies.
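Lookout hasn't published how its scanning works, so what follows is only a simplified illustration of the general idea: fingerprint each submitted app and check it against a reputation database before it goes on sale. The hash-lookup approach, the `KNOWN_BAD` and `KNOWN_GOOD` sets and the `review_submission` function are all hypothetical; real systems also analyze app code and behavior, which this sketch does not attempt.

```python
import hashlib

# Hypothetical reputation data: SHA-256 fingerprints of apps already classified.
KNOWN_BAD: set[str] = set()
KNOWN_GOOD: set[str] = set()


def fingerprint(app_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of an app package."""
    return hashlib.sha256(app_bytes).hexdigest()


def review_submission(app_bytes: bytes) -> str:
    """Classify a submitted app as 'reject', 'approve' or 'needs_analysis'."""
    digest = fingerprint(app_bytes)
    if digest in KNOWN_BAD:
        return "reject"
    if digest in KNOWN_GOOD:
        return "approve"
    # Apps never seen before would go on to deeper analysis (not shown here).
    return "needs_analysis"


malware = b"fake malicious package"
clean = b"fake clean package"
KNOWN_BAD.add(fingerprint(malware))
KNOWN_GOOD.add(fingerprint(clean))
print(review_submission(malware))           # reject
print(review_submission(clean))             # approve
print(review_submission(b"brand new app"))  # needs_analysis
```

An exact-hash lookup is cheap and scales to very large databases, but any repackaged variant gets a new fingerprint, which is why the unknown bucket matters most in practice.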

]]> Lookout believes that security should come in many layers. It is not just on a device or in the cloud, in the browser or the app store; it is layered into all of them. While the Mobile Threat Network brings security to app stores through the API, on the other end, apps are scanned when users download them. Lookout Premium for Android protects users in the browser with its "safe browsing mode."

"Our vision is to eliminate mobile malware across the world," said Kevin Mahaffey, Lookout co-founder and CTO. "There is no silver bullet to security. The vision that we have is that it is important to build security anywhere ... the API allows app stores to be proactive in stopping mobile malware."

To Lookout, mobile malware is not a matter of analyzing malicious programs. It is, in essence, a data problem. Hence, the security API for app stores is essentially a data-mining tool used by Lookout and shared with partners such as Verizon.

"We think more like Google than a security analysis company," Mahaffey said of Lookout's approach to mobile malware. "We are building a newer type of security company that can scale with the threats."

Scale was the initial problem that security companies found with PCs. The amount of spam leading to malicious downloads became too great too fast for the security companies to keep up. Scale remains a problem in the PC ecosystem. Companies will almost always be a step behind the criminal hackers because there are too many exploits and too many botnets to keep track of all at once. Mobile security is different. Lookout recognizes that it now has an advantage and to keep that advantage it must be able to grow as the problem grows. That is what the API is about.

Yes, zero-day exploits (hacks that take advantage of an unknown vulnerability) will happen, and rootkits, bootkits, premium subscription launchers and the like will wash over the mobile ecosystem from time to time. Mahaffey claims that Lookout saw an 85% increase in mobile malware from the first week of the second quarter to the last. The third quarter will probably be worse. Yet the only serious mobile malware threats in existence now are GGTracker and lingering variants of DroidDream, which are often caught as soon as they surface.

Can Mahaffey and Lookout, along with other security companies such as AVG and Symantec, rid the world of mobile malware? The short answer is no. Where there is a rich target, there is a motivation to hack it, and smartphones are richer targets every day. But, for once in the history of networked devices, the security companies have the upper hand.

And they plan to keep it.

]]> Discuss]]>
http://www.readwriteweb.com/archives/mobile_security_with_a_data_mining_solution_lookou.php http://www.readwriteweb.com/archives/mobile_security_with_a_data_mining_solution_lookou.php Security Wed, 20 Jul 2011 06:01:00 -0800 Dan Rowinski
Computers Double the Number of Americans Involved in the Arts
A new National Endowment for the Arts study has looked back into the data from the 2008 Survey of Public Participation in the Arts. Expanding the definition of participation from "benchmark" activities (like going to the opera) to include the creation and viewing of art or art-related content digitally has yielded a radically different picture of Americans' relationship to the arts.

The new definition shows a three-fold increase in the number of Americans taking part in art: from one in four to three in four.

]]>
  • The highest rates of participation via electronic media--including mobile devices and the Internet--were reported for classical music (18%), Latin music (15%), and programs about the visual and literary arts (15% each).
  • An American adult who creates or performs art is almost six times more likely to attend arts events than one who does not create or perform art.
  • In addition to reporting higher arts-attendance rates, those who receive arts education as a child are more likely to create or perform art, engage with the arts via media, and take art classes as an adult.
  • In 1982, nearly two-thirds of 18-year-olds reported taking art classes in their childhood. By 2008, that share had dropped below one-half (2.6 million), a decline of 23%.
  • Declines in childhood arts education from 1982 to 2008 are much steeper among African American and Hispanic children than among white children. In that timeframe, there was a 49% drop for African American children and a 40% drop for Hispanic children, compared with a statistically insignificant decline for white children.
  • There are patterns related to age and generation that are significant. For example, older adults (born in 1955 or earlier) are more likely than younger Americans to be "cultural omnivores," people who attend a variety of arts events, in different art forms and settings. As these generations have aged, there have been fewer cultural omnivores; furthermore, they are now attending arts events less frequently. It is estimated that 82% of the decline in total benchmark arts activities attended between 2002 and 2008 stems from this combination.
  • Age and generation may be less important in audience outreach than previously thought.
  • This data mining has resulted in three reports: "Arts Education in America: What the declines mean for arts participation" by Nick Rabkin and E.C. Hedberg; "Beyond Attendance: A multi-modal understanding of arts participation" by Jennifer L. Novak-Leonard and Alan S. Brown; and "Age and Arts Participation: A case against demographic destiny" by Mark J. Stern.

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/computers_double_the_number_of_americans_involved.php http://www.readwriteweb.com/archives/computers_double_the_number_of_americans_involved.php Art Thu, 24 Feb 2011 15:31:00 -0800 Curt Hopkins
    Read/WriteWeb Daily
    The Daily is back, now that I'm over my jet lag :-)

    - Scoble: I’m not an edge case (If you listen carefully, you'll hear me whoop near the end of Scoble's excellent outburst. I've never whooped in my entire life - yet here I am carrying on like I'm on the Oprah Show...)

    - Alex Barnett's 'Edge Case' series on Flickr (caption to pic on left: "If someone calls me an edge case....")

    - Dion Hinchcliffe on Live Labs (Microsoft's think tank and incubator is indeed an interesting project -- the best part for me is that they're going to invite external people, and not just scientists either, to play a part)

    - Product Development: TV Guide will roll their own (cool - I did some analysis work on this...)

    - Rumors of a Google homepage makeover (here's a screenshot c/- Flickr... I like the look of it)

    - Google misses Street targets, shares tumble ("[this] ended the uninterrupted winning streak Google has had since its August 2004 public offering.")

    - Apple analyst predicts big things (sees "potential for new iBooks by April [...], a potential "media hub" product (and more services), new iPods into year-end (including a new media player) and even a new cell phone within a year.")

    - Ben has details of Aussie 2.0 action (Yahoo7, NewsCorp's truelocal.com.au, Fairfax - all ramping up for a Web media battle)

    - The Online Storage Gang (TechCrunch has an excellent reference and analysis piece on online storage solutions, sure to be one of the key products on the Web by the end of this year. Great to see Aussie company OmniDrive as their #1 pick!)

    - Mining the Two Types of User-Supplied Content (Josh ponders the data mining efforts of Yahoo and Google)

    - Internet Explorer 7 Beta 2 Preview released (Dave Winer says it's significant because it's "the first Microsoft release that includes comprehensive support for RSS not only on the producing side, but also on the consuming side.")

    Flickr pic by Alex Barnett

    ]]>
    http://www.readwriteweb.com/archives/readwriteweb_da_2.php http://www.readwriteweb.com/archives/readwriteweb_da_2.php Lists Tue, 31 Jan 2006 20:58:10 -0800 Richard MacManus
    [Update] Nokia Publishes Policy on Conflict Minerals
    "Conflict minerals," those mined to support groups conducting armed conflict or engaging in human rights abuses, have been an issue since long before we first wrote about them in July 2010. The mineral equivalent of blood diamonds, they include tantalum, tungsten, tin and gold, all of which are used to manufacture our electronics.

    Nokia, the world's largest manufacturer of mobile phones, today published its policy on conflict minerals.

    Update after the jump.

    ]]> "Nokia Policy Against Illegal Trade of Natural Resources"

    In a post on Nokia's "Conversations" blog, Ian Delaney lays out the company's public policy (PDF), which augments its supplier requirements.

    Delaney boils the policy down to these four elements.

    • We prohibit human rights abuses associated with the extraction, transport or trade of minerals.
    • We also prohibit any direct or indirect support to non-state armed groups or security forces that illegally control or tax mine sites, transport routes, trade points, or any upstream actors in the supply chain.
    • We have no tolerance with regard to corruption, money-laundering and bribery.
    • We require the parties in our supply chain to agree to follow the same principles.

    The policy delves at some length into Nokia's commitment to human rights "in accordance with accepted international conventions and practices, such as those of the United Nations' Universal Declaration of Human Rights, ILO Core Conventions on Labor Standards, UN Global Compact, and OECD Guidelines for Multinational Enterprises."

    Under the sub-heading, "Implementation of the Policy with Regards to Conflict Minerals," the document reads:

    "We prohibit human rights abuses associated with the extraction, transport or trade of minerals. We also prohibit any direct or indirect support to non-state armed groups or security forces that illegally control or tax mine sites, transport routes, trade points, or any upstream actors in the supply chain. Similarly, Nokia has a no tolerance policy with respect to corruption, money-laundering and bribery. We require the parties in our supply chain to agree to follow the same principles."

    The document outlines some of the company's process for oversight of suppliers, including the EICC-GeSI Conflict Minerals Reporting Template. It would be interesting to know how the suppliers will be reviewed, how often and what will happen to errant suppliers who use conflict minerals. We have asked Mr. Delaney exactly that and will update should we receive a response.

    Update: We received a note from Nokia's Anna Bask.

    "Nokia follows up the effectiveness of corrective actions and conducts on-site assessments as necessary. However, as stated in the article, the reality is that problems often lie upstream and not with our first tier suppliers. So as well as demanding proper due diligence from our direct suppliers to ensure that the material flows are conflict-free, we ask them to set policies and supplier requirements of their own and pass those on into the supply chain. Continued non-conformance and refusal to address issues of concern will lead to termination of business relationship." (Our bold.)

    Conflict Minerals

    Although conflict minerals could theoretically crop up anywhere, in practice East Africa is ground zero. The Democratic Republic of the Congo is certainly the country worst affected by conflict mineral mining. There, the Congolese National Army vies against three different rebel groups to extract and refine the valuable ores.

    Here is how the various minerals are used in our electronics, including mobile phones, computers and music players.

    • Tantalum: stores electricity in cell phones
    • Tungsten: creates vibrations in phones
    • Tin: circuit boards
    • Gold: used to coat wiring

    Photos courtesy of Shutterstock

    ]]> Discuss]]>
    http://www.readwriteweb.com/archives/nokia_publishes_policy_on_conflict_minerals.php http://www.readwriteweb.com/archives/nokia_publishes_policy_on_conflict_minerals.php Electronics Manufacture Fri, 03 Feb 2012 12:05:00 -0800 Curt Hopkins