ReadWriteWeb

data mining

10 result(s) displayed (11 - 20 of 38):

Searching for Sadness in New York: Is the Foursquare API Living Up to Its Potential?

By Klint Finley / February 28, 2011 6:35 PM

As explained in this blog post, Foursquare needed a way for its business staff to run reports based on its data without slowing down production servers and without learning technologies such as Scala and MongoDB. The company decided to make its data available to business staff through a Hadoop cluster hosted by Amazon Web Services. Foursquare's data miners could then query it using Hive, which provides a SQL-like query language for Hadoop.

As a proof of concept, the company has produced a report on the rudest cities in the world, based on the number of tips that contain profanity. That's pretty cool (apart from the assumption that profanity use = rudeness), but it makes me realize just how under-utilized geolocation APIs are.
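To make the idea concrete, here is a rough Python sketch of that kind of report. It is not Foursquare's Hive query. The word list, the sample tips and the function name rude_city_ranking are all made up for illustration, and ranking by the share of flagged tips (rather than the raw count) is my own assumption.

```python
from collections import defaultdict

# Hypothetical word list, for illustration only.
PROFANITY = {"damn", "hell"}


def rude_city_ranking(tips, min_tips=1):
    """Rank cities by the share of tips containing a listed word.

    `tips` is an iterable of (city, tip_text) pairs. Normalizing by the
    number of tips per city is an assumption, not the published method.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for city, text in tips:
        totals[city] += 1
        # Lowercase each word and trim surrounding punctuation before matching.
        words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
        if words & PROFANITY:
            flagged[city] += 1
    shares = {
        city: flagged[city] / totals[city]
        for city in totals
        if totals[city] >= min_tips
    }
    # Highest share of flagged tips first.
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample data, not real Foursquare tips.
SAMPLE_TIPS = [
    ("New York", "The line is damn long."),
    ("New York", "Great bagels."),
    ("Portland", "Lovely coffee."),
]

ranking = rude_city_ranking(SAMPLE_TIPS)
```

A real report would need a far better word list and a minimum number of tips per city (the min_tips parameter) so that one foul-mouthed tipper doesn't put a small town at the top.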

Pattern: A Bundle of Data Mining Modules for Python

By Klint Finley / February 24, 2011 6:15 PM

Pattern is a collection of open source (BSD license) web mining modules for Python from the Computational Linguistics and Psycholinguistics Research Center. It contains tools for data retrieval, text analysis and data visualization and comes with over 30 sample scripts.

Google, Eyebeam and What We Pay For Team Up to Sponsor Data Visualization Contest

By Klint Finley / February 22, 2011 8:00 PM

What We Pay For is a simple website that lets you enter your income and filing status and find out how your tax dollars are being spent. It breaks down the amount you're likely to pay in taxes by various spending categories, such as Social Security, national defense and Medicare.

The developers behind What We Pay For have released an API for the service, and Google and the non-profit organization Eyebeam are sponsoring a contest for visualizations based on the site's data.

Enterprise Startup Spotlight: Revolution Analytics, Taking on SAS, SPSS

By Klint Finley / February 17, 2011 5:30 PM

Revolution Analytics is a company that provides commercial support for the open source statistical programming language R. Its flagship product is Revolution R for Enterprise, a distribution of R that competes with other commercial statistical products such as SAS and SPSS. Revolution CEO Norman H. Nie was the co-inventor of SPSS.

How Apple Uses Hadoop: UX Analysis, iAds and More

By Klint Finley / February 14, 2011 11:05 AM

A job posting from Apple reveals the company is using or will use Hadoop for its iAds system. The job listing is for a "Senior Software Engineer - Hadoop" with experience in MapReduce, Hive and either HBase or Cassandra. Oozie and Flume are also mentioned. The ad was first spotted by The Register, and has since been removed from Apple's website. However, searching through Apple's job listings reveals other places that Hadoop may be in use, including improving the iOS experience.

Social? Really? Folks - It's About the Data

By Alex Williams / February 9, 2011 10:45 AM

The question I always come back to when I hear the term Enterprise 2.0 is one that I think my buddy Dennis Howlett would ask. I mean, who gives a flying trombone? That's not really how Dennis would say it. I will let him express himself in his own words about why anyone in their right mind would pay for anything with a high price tag that has a big fat social label on it.

Here's what gets me. We get so wrapped up in collaboration concepts and the nuances of social. In this upside-down world, social is a term that is more commonly used to describe enterprise architecture than sharing a beer with your mates.

API of the Week: Data Source Handbook

By Klint Finley / January 31, 2011 8:00 PM

This week, instead of a single API we're spotlighting ReadWriteWeb contributor Pete Warden's new e-book Data Source Handbook, which was released today. Pete covers a slew of data sources including, of course, many APIs.

"These are hand-picked services that I've actually spent time using during my own work," Pete writes. "And I chose them because they add insights and information to data you're already likely to be dealing with."

He's made a list of services and a couple of excerpts available here.

How FluidDB Built an API for Boing Boing in an Evening

By Klint Finley / January 27, 2011 7:45 PM

This week, Boing Boing posted its entire 11-year archive (63,999 posts) in XML format. But Nicholas H. Tollervey from FluidDB wanted more. "XML is good, but having a searchable database of posts is better," he writes on the FluidDB blog. So he ported Boing Boing's XML archive into FluidDB.

"Because of FluidDB's open nature anyone can now make use of boingboing's data via a few simple and easy to construct RESTful calls to FluidDB," he writes. In other words, FluidDB is hosting a Boing Boing API. For free.

The cool thing - apart from being able to use FluidDB to mine BB for interesting data - is that you can do this yourself with your own blog.

Conduct Social Media Sentiment Analysis Research with SMART@znmeb

By Klint Finley / January 26, 2011 11:00 AM

Yesterday SMART@znmeb (SMART stands for "social media analytics research toolkit"), a SUSE Linux appliance created by Ed Borasky, added sentiment analysis to its set of features. The toolkit now includes textir, a sentiment analysis package written in the statistical programming language R. SMART@znmeb also includes other open source tools for data mining, dashboarding and data visualization.

Borasky says textir is the first open source sentiment analysis library he's found that he thinks may actually work. "Most of the vendors sell a sentiment analysis tool of some kind or another, and the customers that have tested multiple tools spend a lot of time trying to figure out why they give different answers," he says. He also cautions that sentiment analysis is vulnerable to spam and other gaming tactics and requires a large investment in hardware.

Data Mining and Taco Bell Programming

By Klint Finley / January 22, 2011 2:00 PM

Programmer Ted Dziuba suggests an alternative to traditional programming that he calls "Taco Bell Programming." The Taco Bell chain creates multiple menu items from about eight different ingredients. Dziuba wants to be able to create many applications with combinations of about eight different shell commands.
