<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/database_analytics_startup_aster.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:,2009:/1/tag:www.readwriteweb.com,2008://1.6350-</id>
  <updated>2009-10-30T14:11:59Z</updated>
  <title>Comments for Database Analytics Startup Aster Data Launches, Analyzes MySpace</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.23-en</generator>
  <entry>
    <id>tag:www.readwriteweb.com,2008://1.6350</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/database_analytics_startup_aster.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=6350" title="Database Analytics Startup Aster Data Launches, Analyzes MySpace" />
    <published>2008-05-20T07:01:01Z</published>
    <updated>2008-05-20T04:15:43Z</updated>
    <title>Database Analytics Startup Aster Data Launches, Analyzes MySpace</title>
    <summary>Grid-computing startup Aster Data Systems will officially launch today, three years after it was founded. Aster, which began in the Ph.D. program at Stanford, is a provider of &quot;massively parallel processing databases&quot; for organizations that have mammoth quantities of data that need to be stored and analyzed quickly. The Redwood City, California-based company is backed...</summary>
    <author>
      <name>Josh Catone</name>
      <uri>http://www.readwriteweb.com/</uri>
    </author>
    
    <category term="Products" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><img border="0" src="http://www.readwriteweb.com/images/aster-logo.jpg" width="95" height="80" />Grid-computing startup <a href="http://www.asterdata.com/">Aster Data Systems</a> will officially launch today, three years after it was founded.  Aster, which began in the Ph.D. program at Stanford, is a provider of "massively parallel processing databases" for organizations that have mammoth quantities of data that need to be stored and analyzed quickly.  The Redwood City, California-based company is backed by Sequoia Capital, Cambrian Ventures, and First Round Capital.</p>]]>
      <![CDATA[<p>Aster's nCluster software allows companies with large amounts of data to store it on commodity hardware and scale with one click, adding new servers as the data set grows.  The company's first major client is MySpace, which generates hundreds of terabytes of traffic data from its 110 million monthly unique users.  Mining that data to understand how customers use and interact with the site requires some pretty robust architecture.</p>

<p>Aster's solution for MySpace uses a 100-node cluster of off-the-shelf commodity servers that can capture and load 100% of the data and run complex queries quickly.  "MySpace needed to analyze complete datasets - not just samples or summaries. Sampling would completely miss infrequently occurring but highly profitable patterns," according to Aster, which says that nCluster has allowed MySpace to work with all of its terabytes of data and avoid the need to sample.</p>
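
<p>To see why sampling can miss rare patterns, here is a minimal sketch. It assumes a simple row-level sample in which each row is kept independently with a fixed probability; the data set, the row labels, and all function names are hypothetical and do not come from Aster or MySpace.</p>

```python
import random


def miss_probability(sample_rate: float, occurrences: int) -> float:
    """Chance that an independent row-level sample keeps none of the rare rows."""
    return (1.0 - sample_rate) ** occurrences


def count_matches(rows, predicate):
    """Full scan: count every row that matches the predicate."""
    return sum(1 for row in rows if predicate(row))


def bernoulli_sample(rows, sample_rate, seed=0):
    """Keep each row independently with probability sample_rate."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < sample_rate]


if __name__ == "__main__":
    # Hypothetical data: 100,000 events, 5 of which carry the rare pattern.
    rows = ["common"] * 100_000
    for position in (7, 4_211, 39_998, 60_003, 88_888):
        rows[position] = "rare"

    def is_rare(row):
        return row == "rare"

    print("full scan finds:", count_matches(rows, is_rare))
    print("1% sample finds:", count_matches(bernoulli_sample(rows, 0.01), is_rare))
    print("P(1% sample misses all 5):", round(miss_probability(0.01, 5), 3))
```

<p>Under these assumptions, a 1% sample has roughly a 95% chance of containing none of five rare rows, while a full scan always sees all of them.</p>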

<p><img border="0" src="http://www.readwriteweb.com/images/aster-servers.jpg" width="600" height="304" /></p>

<p>nCluster works by splitting up the cloud into smaller bits that each have a specific task.  "Loader" nodes load data from external sources (and export to them), while "worker" boxes keep data stored on local disks. A "queen" layer directs the entire operation, intelligently routing queries to the proper node.  The "loader" tier can scale independently as needed, says Aster.  "This enables query load-balancing to eliminate hot-spots and increase performance, returning results in seconds or minutes versus hours or 'did not finish,'" writes the company in a case study.</p>
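
<p>The three tiers can be pictured with a toy sketch like the one below. This is not Aster's implementation: the class names, the hash-based placement rule, and the key format are all assumptions made for illustration, since the article does not say how nCluster actually places data or routes queries.</p>

```python
from collections import defaultdict
from zlib import crc32


class Worker:
    """Keeps rows on its own 'local disk' (an in-memory dict in this sketch)."""

    def __init__(self):
        self.rows = defaultdict(list)

    def store(self, key, row):
        self.rows[key].append(row)

    def lookup(self, key):
        return list(self.rows.get(key, []))


class Queen:
    """Decides which worker owns a key and routes queries to that worker."""

    def __init__(self, workers):
        self.workers = workers

    def node_for(self, key):
        # Assumed placement rule: a stable hash of the key, modulo the worker count.
        return crc32(key.encode("utf-8")) % len(self.workers)

    def query(self, key):
        return self.workers[self.node_for(key)].lookup(key)


class Loader:
    """Pulls records from an external source and hands each to the owning worker."""

    def __init__(self, queen):
        self.queen = queen

    def load(self, records):
        for key, row in records:
            owner = self.queen.workers[self.queen.node_for(key)]
            owner.store(key, row)
```

<p>Because loaders in this sketch only ask the queen where a key belongs, more loaders could be added without touching the workers, which is one way to read the claim that the loader tier scales independently.</p>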

<p>The software reminds me of 3Tera's <a href="http://www.3tera.com/AppLogic/">AppLogic</a> (<a href="http://www.readwriteweb.com/archives/3tera_utility_computing.php">our coverage</a>), which is a grid computing operating system that makes it easier for companies to deploy their own compute cloud on commodity hardware.  nCluster is essentially the same idea, but with an eye specifically toward managing and querying massive databases.</p>]]>
    </content>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2008://1.6350-comment:55422</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2008://1.6350" type="text/html" href="http://www.readwriteweb.com/archives/database_analytics_startup_aster.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/database_analytics_startup_aster.php#c55422" />
    <title>Comment from Falafulu Fisi on 2008-05-20</title>
    <author>
        <name>Falafulu Fisi</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Excellent product, and well done to the founders for developing it. Massive data analytics today is mostly done off-line rather than on-line, i.e., to find or extract important <a href="http://en.wikipedia.org/wiki/Feature_extraction" rel="nofollow">features</a> from the data. Parallel processing has an advantage for live (on-line) queries of the stored <i>features</i> (which were calculated off-line) in the database(s). This is what Google is doing. Feature extraction is done off-line and indexed, which is the slowest part. Live queries run against the feature database, not the original database, but the features point to IDs in the original database for retrieval. </p>
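
<p>A minimal sketch of that off-line/on-line split, assuming the "features" are simply lowercase words and the original database is a dictionary keyed by ID. Both are simplifications chosen for illustration, and every name here is hypothetical.</p>

```python
import re
from collections import defaultdict


def extract_features(text):
    """Off-line step: reduce a record to a set of features (here, lowercase words)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(originals):
    """Off-line step: map each feature to the IDs of the records that contain it."""
    index = defaultdict(set)
    for record_id, text in originals.items():
        for feature in extract_features(text):
            index[feature].add(record_id)
    return index


def live_query(index, originals, feature):
    """On-line step: look up IDs in the feature index, then fetch the originals."""
    ids = sorted(index.get(feature.lower(), set()))
    return [(record_id, originals[record_id]) for record_id in ids]
```

<p>The slow pass over every record happens once in <code>build_index</code>; <code>live_query</code> touches only the index and the few records it points to.</p>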

<p>I would say that the top analytics vendors targeting enterprises today are SPSS, SAS, Oracle, and KXEN. Aster Data Systems must battle these established data analytics vendors in that market. </p>

<p>I know of (not personally) a few people at the top of the ladder in analytics R&D (i.e., machine learning, data mining, etc.) who consult for the vendors mentioned above, such as <a href="http://math.cofc.edu/langvillea/" rel="nofollow">Prof. Amy Langville</a> (she consults for SAS), whose algorithm for solving PageRank was adopted by Google in 2003. PageRank can be solved using a variety of linear algebra algorithms, but Langville's algorithm was faster than Google's own PageRank solver in 2003. </p>
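
<p>For readers who have not seen PageRank written out, here is a minimal sketch of the textbook power-iteration method, one of the linear algebra approaches alluded to above. It is neither Langville's algorithm nor Google's solver; the damping factor of 0.85, the tolerance, and the tiny link graph in the usage note are assumptions for illustration.</p>

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a link graph given as {page: [pages it links to]}.

    Assumes every link target also appears as a key in `links`.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            out_links = links[node]
            for target in out_links:
                new_rank[target] += damping * rank[node] / len(out_links)
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank
```

<p>Calling <code>pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})</code> returns scores that sum to 1, with page <code>c</code> ranked highest because three pages link to it.</p>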

<p>Another is <a href="http://en.wikipedia.org/wiki/Vladimir_Vapnik" rel="nofollow">Prof. Vladimir Vapnik</a> (consulting for KXEN), a pioneer in the development of the popular <a href="http://en.wikipedia.org/wiki/Support_vector_machine" rel="nofollow">SVM</a> (Support Vector Machine), which has been adopted in most of today's state-of-the-art software, from speech processing, image processing, and online recommendation engines to DNA sequencing, text mining, and more. There are more top-notch researchers in analytics than the two I have mentioned, and they consult for the leading vendors in enterprise analytics. </p>

<p>I have read some of the publications of those researchers, and I think that Aster Data Systems must step up in order to compete with these well-established analytics vendors. </p>]]>
    </content>
    <published>2008-05-20T12:17:28Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2008://1.6350-comment:56030</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2008://1.6350" type="text/html" href="http://www.readwriteweb.com/archives/database_analytics_startup_aster.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/database_analytics_startup_aster.php#c56030" />
    <title>Comment from kaz on 2008-05-26</title>
    <author>
        <name>kaz</name>
        <uri></uri>
    </author>
    <content type="html" xml:lang="en" xml:base="">
        <![CDATA[<p>Good article and good comment.</p>]]>
    </content>
    <published>2008-05-26T07:19:14Z</published>
  </entry>

</feed>