You may not realize it, but the data analytics market is buzzing. New vendors are emerging, new products are popping up, new deals are being done, and several new strategies are being pursued. Vendors are predominantly chasing big data, with battle lines being drawn by solution providers that cater to data sets of roughly 100 TB to 10 PB. The battle was inevitable because the world is producing data at a phenomenal rate, and we have an increasing need to analyze it within shorter time frames. In this post we analyze one of these vendors, Kickfire.
Yet while the big names in town are capturing the headlines, only a small percentage of businesses today need to be able to analyze petabytes of data. The rest of us are more likely to deal with analytic data sets in the 50 GB to 3 TB range.
Kickfire is interesting because it has decided to let the other vendors fight it out for the massive data volumes. Instead, it has focused on a relatively untapped segment: the MySQL database market or, more correctly, the market that MySQL serves.
The bulk of MySQL installs are for Web 2.0 and web-related applications (i.e. applications based on the LAMP stack), and these applications usually aren't set up to manage industrial-sized data sets. Instead, they often have gigabytes or a few terabytes of data, but analyzing that data is just as important to their owners. However, like many transaction-oriented databases, MySQL doesn't perform very well when you run analytics-style queries, even on mid-sized data sets. Customers often find that running complex ad-hoc queries that aggregate data across many rows is very time-consuming, and the lack of certain features, such as query parallelism, diminishes MySQL's appeal.
Kickfire's solution is to use MySQL as the base, which makes it easy for existing MySQL customers to migrate, but to replace MySQL's storage engine with its own column store engine. Under the covers, the column store organizes data by the columns in a table, rather than by rows as in the traditional method. This structure has been found to achieve better compression and better ad-hoc query performance because only the columns being queried, not every column of every row, need to be scanned. The column store approach is also used by Vertica and was popularized by its founder, the well-known database researcher Michael Stonebraker.
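To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is a toy illustration only and says nothing about how Kickfire's engine is actually implemented; the table, the column names, and every function name are assumptions made up for this example.

```python
# Toy comparison of a row-oriented and a column-oriented layout.
# The "orders" data and all names here are hypothetical.

rows = [
    {"order_id": 1, "customer": "alice", "region": "west", "amount": 120.0},
    {"order_id": 2, "customer": "bob", "region": "east", "amount": 75.5},
    {"order_id": 3, "customer": "carol", "region": "west", "amount": 200.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_row_store(rows):
    """Row layout: each whole row is read even though one field is needed."""
    total = 0.0
    values_touched = 0
    for row in rows:
        values_touched += len(row)  # every field of the row is loaded
        total += row["amount"]
    return total, values_touched


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amount = columns["amount"]
    return sum(amount), len(amount)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)
row_total, row_touched = sum_amount_row_store(rows)
col_total, col_touched = sum_amount_column_store(columns)
sorted_regions = run_length_encode(sorted(columns["region"]))
```

Both functions return the same total, but the row version touches all twelve stored values while the column version touches only three. The run-length encoding step hints at why columns tend to compress well: values of the same type and domain sit next to each other, so repeated values collapse into short runs.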
But Kickfire doesn't end there. It goes one step further by adding a proprietary "SQL Chip" co-processor to further enhance its product's performance. Kickfire has replaced the MySQL query optimizer (the component that takes an SQL statement and splits it into a series of operators for processing) with one that produces operators that can be sent directly to its SQL Chip. So, rather than running these operators on a general-purpose CPU, which has to convert them into a series of regular CPU instructions and then muck around loading the data into registers from memory, the optimizer sends them to the SQL Chip, which natively understands them and processes them on data streamed directly from memory.
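The idea of splitting an SQL statement into a series of operators can be sketched in a few lines of Python. This is a hypothetical illustration of the general technique, running on an ordinary CPU; the operator names, the query, and the data are assumptions, and none of it reflects Kickfire's actual operator set or the SQL Chip's interface.

```python
# Toy operator pipeline for the query:
#   SELECT SUM(amount) FROM orders WHERE region = 'west'
# All operator names and data are hypothetical.

orders = {
    "region": ["west", "east", "west"],
    "amount": [120.0, 75.5, 200.0],
}


def scan(columns, names):
    """Scan operator: stream tuples holding only the requested columns."""
    return zip(*(columns[name] for name in names))


def filter_op(stream, predicate):
    """Filter operator: pass through only the tuples matching the predicate."""
    return (row for row in stream if predicate(row))


def sum_op(stream, index):
    """Aggregate operator: sum one field across the incoming stream."""
    return sum(row[index] for row in stream)


def execute(columns):
    """A hand-written 'plan': scan -> filter -> aggregate."""
    stream = scan(columns, ["region", "amount"])
    stream = filter_op(stream, lambda row: row[0] == "west")
    return sum_op(stream, 1)


result = execute(orders)
```

Each operator consumes a stream and hands its output to the next, so data flows through the plan without being fully materialized in between. In the design described above, it is operators of this general kind that would be handed to dedicated hardware rather than translated into general-purpose CPU instructions.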
Kickfire's solution is bundled as a data warehouse "appliance," which is made up of two physical servers: a conventional server running MySQL 5.1, and a second server, connected to the first via PCIe, that is used to offload processing to the SQL Chip. The capabilities that Kickfire adds remain largely transparent to users who interact with the system through SQL, because Kickfire hasn't changed the MySQL syntax that its customers are already familiar with.