A number of web service vendors now offer multi-tenanted key/value databases on a pay-as-you-go basis. Most of them meet the criteria discussed to this point, but each has unique features and varies from the general standards described thus far. Let's take a look now at particular databases, namely SimpleDB, Google AppEngine Datastore, and SQL Data Services.
Amazon: SimpleDB
SimpleDB is an attribute-oriented key/value database available on the Amazon Web Services platform. SimpleDB is still in public beta; in the meantime, users can sign up online for a "free" version -- free, that is, until you exceed your usage limits.
SimpleDB has several limitations. First, a query can only execute for a maximum of 5 seconds. Secondly, there are no data types apart from strings. Everything is stored, retrieved, and compared as a string, so date comparisons won't work unless you convert all dates to a format that sorts correctly as text, such as ISO 8601. Thirdly, the maximum size of any string is limited to 1024 bytes, which limits how much text (e.g. product descriptions) you can store in a single attribute. But because the schema is dynamic and flexible, you can get around the limit by adding "ProductDescription1," "ProductDescription2," etc. The catch is that an item is limited to 256 attributes. While SimpleDB is in beta, domains can't be larger than 10 GB, and entire databases cannot exceed 1 TB.
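To make the string-only workarounds concrete, here is a minimal Python sketch. It does not call SimpleDB; it only prepares values the way the paragraph above describes. The helper names (`encode_date`, `encode_int`, `split_text`) are hypothetical, and the zero-padding of integers is an added assumption that follows from the same "everything compares as a string" rule.

```python
from datetime import datetime, timezone

MAX_ATTR_BYTES = 1024  # per-value size limit cited above
MAX_ATTRS = 256        # per-item attribute limit cited above


def encode_date(dt: datetime) -> str:
    """Render a timezone-aware datetime as a fixed-width ISO 8601 UTC string,
    so that plain string comparison matches chronological order."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def encode_int(n: int, width: int = 10) -> str:
    """Zero-pad a non-negative integer so that "9" sorts before "10"."""
    if n < 0:
        raise ValueError("negative numbers need an offset scheme, not shown here")
    return str(n).zfill(width)


def split_text(name: str, text: str, limit: int = MAX_ATTR_BYTES) -> dict:
    """Split text into numbered attributes (name1, name2, ...), each holding
    at most `limit` bytes once encoded as UTF-8."""
    chunks = []
    current, current_bytes = "", 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if current_bytes + size > limit:
            chunks.append(current)
            current, current_bytes = "", 0
        current += ch
        current_bytes += size
    if current:
        chunks.append(current)
    if len(chunks) > MAX_ATTRS:
        raise ValueError("text needs more attributes than one item allows")
    return {f"{name}{i}": chunk for i, chunk in enumerate(chunks, start=1)}


earlier = encode_date(datetime(2009, 2, 9, 12, 0, tzinfo=timezone.utc))
later = encode_date(datetime(2009, 11, 1, 8, 30, tzinfo=timezone.utc))
attrs = split_text("ProductDescription", "x" * 2500)
```

Note that the numbered description attributes count against the same 256-attribute budget as every other attribute on the item, so in practice the usable ceiling is lower than the check in `split_text` suggests.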
One key feature of SimpleDB is that it uses an eventual consistency model. This consistency model is good for concurrency, but means that after you have changed an attribute for an item, those changes may not be reflected in read operations that immediately follow. While the chances of this actually happening are low, you should account for such situations. For example, you don't want to sell the last concert ticket in your event booking system to five people because your data wasn't consistent at the time of sale.
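The ticket scenario can be reproduced with a toy model. The sketch below is not SimpleDB and uses none of its API; `ToyReplicatedStore` is a hypothetical class in which writes land on a primary copy while reads are served from a replica that only catches up when `sync()` is called. It uses integers for brevity, whereas SimpleDB would store the counter as a string.

```python
class ToyReplicatedStore:
    """Toy model of eventual consistency: writes go to a primary copy,
    reads come from a replica that lags until sync() is called."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        self.primary[key] = value

    def get(self, key):
        # Reads may return stale data because the replica can lag behind.
        return self.replica.get(key)

    def sync(self):
        # Replication catches up: the replica now mirrors the primary.
        self.replica = dict(self.primary)


def sell_ticket(store, buyer, sold_to):
    """Naive read-then-write sale that trusts whatever the read returns."""
    remaining = store.get("tickets_left")
    if remaining is not None and remaining > 0:
        store.put("tickets_left", remaining - 1)
        sold_to.append(buyer)
        return True
    return False


store = ToyReplicatedStore()
store.put("tickets_left", 1)
store.sync()

sold_to = []
for buyer in ["ann", "bob", "cat", "dan", "eve"]:
    # Every buyer reads the stale value 1 before replication catches up.
    sell_ticket(store, buyer, sold_to)

store.sync()
late_sale = sell_ticket(store, "fay", sold_to)  # replica now shows 0 tickets
```

In this model all five buyers are sold the single ticket, and only the sale attempted after replication has caught up is refused. An application on an eventually consistent store has to guard against this itself, for example by routing inventory-critical writes through a single serialized component.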
Google: AppEngine Datastore
Google's AppEngine Datastore is built on BigTable, Google's internal storage system for handling structured data. In and of itself, the AppEngine Datastore is not a direct access mechanism to BigTable, but can be thought of as a simplified interface on top of BigTable.
The AppEngine Datastore supports much richer data types within items than SimpleDB, including list types, which contain collections within a single item.
You will almost certainly use this data store if you plan on building applications within the Google AppEngine. However, unlike with SimpleDB, you cannot currently interface with the AppEngine Datastore (or with BigTable) using an application outside of Google's web service platform.
Microsoft: SQL Data Services
SQL Data Services (SDS) is part of the Microsoft Azure Web Services platform. SDS is also in beta, so it is free but limits the size of databases. SQL Data Services is actually an application itself that sits on top of many SQL servers, which make up the underlying data storage for the SDS platform. While the underlying data stores may be relational, you don't have access to these; SDS is a key/value store, like the other platforms discussed thus far.
Microsoft seems to be alone among these three vendors in acknowledging that while key/value stores are great for scalability, they come at the great expense of data management, when compared to RDBMS. Microsoft's approach seems to be to strip to the bare bones to get the scaling and distribution mechanisms right, and then over time build up, adding features that help bridge the gap between the key/value store and relational database platform.
Outside the cloud, a number of key/value database software products exist that can be installed in-house. Almost all of these products are still young, either in alpha or beta, but most are also open source; having access to the code, you can perhaps be more aware of potential issues and limitations than you would with closed-source vendors.
CouchDB
CouchDB is a free, open-source, document-oriented database. Derived from the key/value store, it uses JSON to define an item's schema. CouchDB is meant to bridge the gap between document-oriented and relational databases by allowing "views" to be dynamically created using JavaScript. These views map the document data onto a table-like structure that can be indexed and queried.
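The idea behind views can be sketched in a few lines. In CouchDB itself a view is a JavaScript function that emits key/value rows for each document; the Python below only emulates that idea in memory and is not CouchDB's API. The document fields and the names `by_author`, `build_view`, and `query` are made up for illustration.

```python
# Schema-less documents: each one carries whatever fields it needs.
docs = [
    {"_id": "p1", "type": "post", "author": "maya", "title": "Hello"},
    {"_id": "c1", "type": "comment", "post": "p1", "text": "Nice"},
    {"_id": "p2", "type": "post", "author": "alex", "title": "Views", "tags": ["couchdb"]},
    {"_id": "p3", "type": "post", "author": "maya", "title": "Again"},
]


def by_author(doc):
    """Map function: emit one (key, value) row per blog post, skip the rest."""
    if doc.get("type") == "post":
        yield doc["author"], doc["title"]


def build_view(documents, map_fn):
    """Run the map function over every document and keep the emitted rows
    sorted by key, giving a table-like, index-friendly structure."""
    rows = []
    for doc in documents:
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    rows.sort(key=lambda row: (row["key"], row["id"]))
    return rows


def query(view_rows, key):
    """Look up all rows with a given key (a linear scan, for simplicity)."""
    return [row for row in view_rows if row["key"] == key]


view = build_view(docs, by_author)
maya_titles = [row["value"] for row in query(view, "maya")]
```

The comment document has no `author` field and is simply skipped, which is how a view imposes a regular, queryable shape on documents that do not share a schema.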
At the moment, CouchDB isn't really a distributed database. It has replication functions that allow data to be synchronized across servers, but this isn't the kind of distribution needed to build highly scalable environments. The CouchDB community, though, is no doubt working on this.
Project Voldemort
Project Voldemort is a distributed key/value database that is intended to scale horizontally across a large number of servers. It spawned from work done at LinkedIn and is reportedly used there for a few systems that have very high scalability requirements. Project Voldemort also uses an eventual consistency model, based on Amazon's.
Project Voldemort is very new; its website went up only in the last few weeks.
Mongo
Mongo is the database system being developed at 10gen by Geir Magnusson and Dwight Merriman (whom you may remember from DoubleClick). Like CouchDB, Mongo is a document-oriented JSON database, except that it is designed to be a true object database, rather than a pure key/value store. Originally, 10gen focused on putting together a complete web services stack; more recently, though, it has refocused mainly on the Mongo database. The beta release is scheduled for mid-February.
Drizzle
Drizzle can be thought of as a counter-approach to the problems that key/value stores are meant to solve. Drizzle began life as a spin-off of the MySQL (6.0) relational database. Over the last few months, its developers have removed a host of non-core features (including views, triggers, prepared statements, stored procedures, query cache, ACL, and a number of data types), with the aim of creating a leaner, simpler, faster database system. Drizzle can still store relational data; as Brian Aker of MySQL/Sun puts it, "There is no reason to throw out the baby with the bath water." The aim is to build a semi-relational database platform tailored to web- and cloud-based apps running on systems with 16 cores or more.
Ultimately, there are four reasons why you would choose a non-relational key/value database platform for your application:
But in making your decision, remember the database's limitations and the risks you face by branching off the relational path.
For all other requirements, you are probably best off with the good old RDBMS. So, is the relational database doomed? Clearly not. Well, not yet at least.