<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
      xmlns:thr="http://purl.org/syndication/thread/1.0">
  <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php" />
  <link rel="self" type="application/atom+xml" href="http://www.readwriteweb.com/atom.xml" />
  <id>tag:,2009:/1/tag:www.readwriteweb.com,2008://1.12692-</id>
  <updated>2009-10-30T13:11:30Z</updated>
  <title>Comments for Amazon Web Services Seeks Public Data Sets</title>
  
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.23-en</generator>
  <entry>
    <id>tag:www.readwriteweb.com,2008://1.12692</id>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.readwriteweb.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=1/entry_id=12692" title="Amazon Web Services Seeks Public Data Sets" />
    <published>2008-11-24T07:19:57Z</published>
    <updated>2008-11-24T20:46:09Z</updated>
    <title>Amazon Web Services Seeks Public Data Sets</title>
    <summary>Amazon Web Services Seeks Public Data Sets</summary>
    <author>
      <name>Lidija Davis</name>
      
    </author>
    
    <category term="Amazon" />
    
    <category term="Features" />
    
    <category term="NYT" />
    
    <content type="html" xml:lang="en" xml:base="http://www.readwriteweb.com/">
      <![CDATA[<p><img alt="aws" src="http://www.readwriteweb.com/aws_nov_08.jpg" width="201" height="95" />Amazon is turning to the public for help, asking for public data sets in an attempt to create a cloud data service that provides what they describe as a "convenient way to share, access, and use public data."</p>

<p>Called <a href="http://aws.amazon.com/publicdatasets/">AWS Hosted Public Data Sets</a>, the service will enable you to use public data within your Amazon EC2 environment.  Select public data sets will be hosted on AWS for free as an Amazon EBS snapshot.</p>]]>
      <![CDATA[<p>While there are publicly available data sets, accessing them can be expensive and tedious. For instance, <a href="http://www.gutenberg.org">Project Gutenberg</a> offers its <a href="http://www.gutenberg.org/wiki/Gutenberg:Information_About_Robot_Access_to_our_Pages#Getting_All_EBook_Files">eBook files as a download</a>, but to get a copy you can expect to wait 48 hours for the download to complete (based on a 1 Mbit/s DSL line and a 14.5 GB zip file). If you want the MP3 files, you'll have a nine-day wait to download the 91.5 GB file.</p>

<p>However, as there is no indication that Project Gutenberg will be added to AWS, we've calculated how long it would take to download and upload the 80 GB UGI Virtual Conformer Library, one of the listed data sets AWS plans to host.</p>

<p>Over a residential cable connection in California, it would take 22 hours 36 minutes to download, and 3 days 36 minutes to upload to a server in the same state.  However, if the server were in New York and we accessed it from California, it would take 3 days 42 minutes to download, and 7 days 14 hours to upload.  Clearly inefficient.</p>

<p><a href="http://groups.google.com/group/get-theinfo/browse_thread/thread/79e5b1159e533d52?pli=1">People have been searching for better ways to access public data sets</a> for some time, and AWS Hosted Public Data Sets may be just the answer they've been looking for, allowing anyone to do the type of computing that in the past has been limited to large organizations with lots of money.</p>

<p>Data sets that Amazon is currently working on include annotated Human Genome data, the PubChem and UGI Virtual Conformer libraries, the U.S. Census, various labor statistics, and various economic and transportation databases.<br />
 <br />
AWS will continue to add to the collection over time, and this is where you come in.</p>

<p>If you have a public data set and hold the rights to distribute it, you can submit a request on the <a href="http://aws.amazon.com/publicdatasets/">AWS Hosted Public Data Sets site</a> to have it included.</p>

<p>This is huge.</p>]]>
    </content>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2008://1.12692-comment:117790</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2008://1.12692" type="text/html" href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php#c117790" />
    <title>Comment from Walker Hamilton on 2008-11-24</title>
    <author>
        <name>Walker Hamilton</name>
        <uri>http://walkerhamilton.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://walkerhamilton.com">
        <![CDATA[<p><a href="http://manybooks.net" rel="nofollow">http://manybooks.net</a> is essentially a nicer interface for the Gutenberg Project (plus other open books the webmaster manages to dig up) and, from what I understand, he's caching any of the format-specific generated books to S3.</p>

<p>His service offers a whole slew of formats for each book, but he only generates a given format the first time someone actually requests that book in that specific format. He caches that format to S3 and serves it from there from then on.</p>]]>
    </content>
    <published>2008-11-24T14:34:05Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2008://1.12692-comment:117812</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2008://1.12692" type="text/html" href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php#c117812" />
    <title>Comment from michael.chelen.myopenid.com on 2008-11-24</title>
    <author>
        <name>michael.chelen.myopenid.com</name>
        <uri>http://friendfeed.com/ffs</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://friendfeed.com/ffs">
        <![CDATA[<p>It is a wonderful new resource, especially for fans and users of EC2. If Amazon could develop a good process for submitting data sets, that would open the service to even broader use.</p>]]>
    </content>
    <published>2008-11-24T15:44:04Z</published>
  </entry>

  <entry>
    <id>tag:www.readwriteweb.com,2008://1.12692-comment:117841</id>
    <thr:in-reply-to ref="tag:www.readwriteweb.com,2008://1.12692" type="text/html" href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php"/>
    <link rel="alternate" type="text/html" href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php#c117841" />
    <title>Comment from Maureen Flynn-Burhoe on 2008-11-24</title>
    <author>
        <name>Maureen Flynn-Burhoe</name>
        <uri>http://oceanflynn.wordpress.com</uri>
    </author>
    <content type="html" xml:lang="en" xml:base="http://oceanflynn.wordpress.com">
        <![CDATA[<p>Downloading electronic copies of complete digital books through the Gutenberg Project is instantaneous (it never takes more than a few seconds) and is completely free. You can choose from formats including a basic text file, which makes searches within the text simple and efficient.</p>

<p>So obviously you are describing something much more elaborate here regarding smarter technologies to search, access and distribute data that is now not easily available in deep web databases? </p>

<p>In the past Amazon on-line services had limitations based on whether or not users made on-line Amazon purchases. Is this similar? </p>]]>
    </content>
    <published>2008-11-24T18:07:17Z</published>
  </entry>

</feed>