Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, "if you want vast, on-demand scalability, you need a non-relational database".
If that is true, then is this a sign that the once mighty relational database finally has a chink in its armor? Is this a sign that relational databases have had their day and will decline over time? In this post, we'll look at the current trend of moving away from relational databases in certain situations and what this means for the future of the relational database.
Relational databases have been around for over 30 years. During this time, several so-called revolutions flared up briefly, all of which were supposed to spell the end of the relational database. All of those revolutions fizzled out, of course, and none even made a dent in the dominance of relational databases.
A relational database is essentially a group of tables (relations), each typically modeling an entity. Tables are made up of columns (attributes) and rows (tuples). Those tables have constraints, and relationships are defined between them. Relational databases are queried using SQL, and result sets are produced from queries that access data from one or more tables. Multiple tables accessed in a single query are "joined" together, typically on the columns that define the relationship between them. Normalization is a data-structuring technique used with relational databases that helps ensure data consistency and reduces data duplication.
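To make that description concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a full RDBMS. The customers and orders tables, their columns, and the sample rows are all made up for illustration; the point is only to show constraints, a relationship between two tables, and a join on the relationship column.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total       REAL NOT NULL
        );
        INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders (id, customer_id, total)
        VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    """)
    return conn


def totals_by_customer(conn):
    """Join the two tables on the relationship column and sum each customer's orders."""
    return conn.execute("""
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """).fetchall()


if __name__ == "__main__":
    print(totals_by_customer(build_demo_db()))
```

Run as a script, this prints [('Ada', 65.0), ('Grace', 15.0)]. Because the customer's name lives only in the customers table, the orders table never duplicates it, which is the kind of duplication normalization is meant to avoid.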
Relational databases are implemented by Relational Database Management Systems (RDBMS). Almost all database systems we use today are RDBMS, including Oracle, SQL Server, MySQL, Sybase, DB2, Teradata, and so on.
The reasons for the dominance of relational databases are not trivial. They have continually offered the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility in managing generic data.
However, to offer all of this, relational databases have to be incredibly complex internally. For example, a relatively simple SELECT statement could have hundreds of potential query execution paths, which the optimizer evaluates at run time. All of this is hidden from us as users, but under the covers, the RDBMS determines the "execution plan" that best answers our request, using techniques such as cost-based optimization.
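Most systems will let you peek at the plan the optimizer picked. As a rough illustration, the sketch below again uses SQLite through Python's sqlite3 module and asks for the plan of the same query before and after an index exists. The orders table, the idx_orders_customer index, and the sample data are assumptions made up for this example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps of the plan SQLite chose for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The description of each step is the last column of every returned row.
    return [row[-1] for row in rows]


def plans_before_and_after_index():
    """Show how the chosen plan changes once a useful index is available."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 50, float(i)) for i in range(1000)],
    )
    sql = "SELECT total FROM orders WHERE customer_id = ?"

    before = query_plan(conn, sql, (7,))  # no index yet: expect a full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (7,))   # the optimizer can now search the index
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("before:", before)
    print("after: ", after)
```

The SQL text never changes between the two calls. Only the optimizer's choice does, which is exactly the kind of decision that stays hidden from the user.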
Even though RDBMS have provided database users with the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility, their performance in each of these areas is not necessarily better than that of an alternative solution pursuing one of these benefits in isolation. This has not been much of a problem so far because the universal dominance of RDBMS has outweighed the need to push any of these boundaries. Nonetheless, if you really had a need that couldn't be met by a generic relational database, alternatives have always been around to fill those niches.
Today, we are in a slightly different situation. For a growing number of applications, one of these benefits is becoming more and more critical. While still considered a niche requirement, it is rapidly becoming mainstream, to the point that for an increasing number of database users it is beginning to eclipse the others in importance. That benefit is scalability. As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware? The second scenario can be too difficult to manage with a relational database in general.
Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute that load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale. Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming. At that point, the very characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.
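One way to get a feel for why scaling out is hard is to sketch the simplest possible distribution scheme. The example below assumes hash partitioning, where each row is placed on a node according to a hash of one column. That is only one of several possible schemes and is not described in this post, and the ShardedTable class and shard_for function are hypothetical names invented for the illustration. A lookup on the partitioning column touches a single node, while a lookup on any other column has to ask every node.

```python
import hashlib


def shard_for(key, num_nodes):
    """Map a key to a node index with a stable hash (hash partitioning)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedTable:
    """Toy table whose rows (plain dicts) are spread across in-memory 'nodes'."""

    def __init__(self, num_nodes, shard_key):
        self.num_nodes = num_nodes
        self.shard_key = shard_key
        self.nodes = [[] for _ in range(num_nodes)]

    def insert(self, row):
        node = shard_for(row[self.shard_key], self.num_nodes)
        self.nodes[node].append(row)

    def lookup(self, column, value):
        """Return (matching rows, number of nodes that had to be contacted)."""
        if column == self.shard_key:
            # The hash says exactly which node can hold the row.
            targets = [shard_for(value, self.num_nodes)]
        else:
            # No way to know where matches live: ask every node.
            targets = list(range(self.num_nodes))
        rows = [r for n in targets for r in self.nodes[n] if r[column] == value]
        return rows, len(targets)


if __name__ == "__main__":
    orders = ShardedTable(num_nodes=4, shard_key="order_id")
    for i in range(100):
        orders.insert({"order_id": i, "customer_id": i % 10})
    print(orders.lookup("order_id", 42)[1])     # nodes contacted for a key lookup
    print(orders.lookup("customer_id", 3)[1])   # nodes contacted for a non-key lookup
```

With four nodes the second lookup contacts all four. With thousands of nodes, every query that filters or joins on a non-partitioning column fans out to all of them, which hints at why joins, constraints, and transactions become so expensive in a large distributed system.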
For cloud services to be viable, vendors have had to address this limitation, because a cloud platform without a scalable data store is not much of a platform at all. So, to provide customers with a scalable place to store application data, vendors had only one real option. They had to implement a new type of database system that focuses on scalability, at the expense of the other benefits that come with relational databases.
These efforts, combined with those of existing niche vendors, have led to the rise of a new breed of database management system.