ReadWriteEnterprise

Is the Relational Database Doomed?



Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, "if you want vast, on-demand scalability, you need a non-relational database".

If that is true, then is this a sign that the once mighty relational database finally has a chink in its armor? Is this a sign that relational databases have had their day and will decline over time? In this post, we'll look at the current trend of moving away from relational databases in certain situations and what this means for the future of the relational database.

Relational databases have been around for over 30 years. During this time, several so-called revolutions flared up briefly, all of which were supposed to spell the end of the relational database. All of those revolutions fizzled out, of course, and none even made a dent in the dominance of relational databases.

First, Some Background

A relational database is essentially a group of tables (entities). Tables are made up of columns and rows (tuples). Those tables have constraints, and relationships are defined between them. Relational databases are queried using SQL, and result sets are produced from queries that access data from one or more tables. Multiple tables accessed in a single query are "joined" together, typically by a criterion defined on the columns that relate the tables. Normalization is a data-structuring technique used with relational databases that helps ensure data consistency and reduces data duplication.
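To make these terms concrete, here is a minimal sketch using Python's built-in sqlite3 module. The make and car tables, their columns, and the sample rows are our own illustrative assumptions rather than any real schema. It shows a constraint rejecting an inconsistent row and two tables being joined on the column that relates them.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Two hypothetical tables. car.make_id is the relationship column
# that points at make.make_id.
conn.executescript("""
CREATE TABLE make (
    make_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE car (
    car_id  INTEGER PRIMARY KEY,
    make_id INTEGER NOT NULL REFERENCES make(make_id),
    model   TEXT NOT NULL
);
INSERT INTO make (make_id, name) VALUES (1, 'Nissan'), (2, 'Toyota');
INSERT INTO car (car_id, make_id, model) VALUES
    (1, 1, 'Pathfinder'), (2, 2, 'Camry');
""")

# The constraint rejects a car whose make does not exist.
try:
    conn.execute(
        "INSERT INTO car (car_id, make_id, model) VALUES (3, 99, 'Ghost')"
    )
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True

# Join the two tables on the relationship column.
rows = conn.execute("""
    SELECT make.name, car.model
    FROM car
    JOIN make ON make.make_id = car.make_id
    ORDER BY car.car_id
""").fetchall()

print(constraint_enforced)
print(rows)
```

Because the make name is stored once and referenced by key, renaming a make means changing a single row, which is the kind of duplication normalization is meant to avoid.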

Relational databases are facilitated through Relational Database Management Systems (RDBMS). Almost all database systems we use today are RDBMS, including Oracle, SQL Server, MySQL, Sybase, DB2, Teradata, and so on.

The reasons for the dominance of relational databases are not trivial. They have continually offered the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility in managing generic data.

However, to offer all of this, relational databases have to be incredibly complex internally. For example, a relatively simple SELECT statement could have hundreds of potential query execution paths, which the optimizer evaluates at run time. All of this is hidden from us as users, but under the covers, the RDBMS determines the "execution plan" that best answers our requests by using things like cost-based algorithms.
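As a rough illustration of an optimizer at work, the sketch below asks SQLite (through Python's sqlite3 module) how it plans to run the same query before and after an index is added. SQLite is a far simpler system than the products listed above, and the table, the index name idx_car_model, and the query are hypothetical. The exact plan text also varies between SQLite versions, so treat the printed output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (car_id INTEGER PRIMARY KEY, model TEXT, color TEXT)"
)
conn.executemany(
    "INSERT INTO car (model, color) VALUES (?, ?)",
    [("Pathfinder", "green"), ("Camry", "blue"), ("Pathfinder", "blue")],
)
conn.commit()

QUERY = "SELECT color FROM car WHERE model = 'Pathfinder'"


def plan_for(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in plan_rows]


# With no index on model, the only option is to scan the whole table.
plan_before = plan_for(QUERY)

# Adding an index gives the planner a second execution path to choose from.
conn.execute("CREATE INDEX idx_car_model ON car (model)")
plan_after = plan_for(QUERY)

print(plan_before)
print(plan_after)
```

The SQL text never changed between the two calls. Only the plan chosen underneath it did, which is the hidden work the paragraph above describes.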

The Problem with Relational Databases

Even though RDBMS have provided database users with the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility, their performance in each of these areas is not necessarily better than that of an alternate solution pursuing one of these benefits in isolation. This has not been much of a problem so far because the universal dominance of RDBMS has outweighed the need to push any of these boundaries. Nonetheless, if you really had a need that couldn't be answered by a generic relational database, alternatives have always been around to fill those niches.

Today, we are in a slightly different situation. For an increasing number of applications, one of these benefits is becoming more and more critical; and while still considered a niche, it is rapidly becoming mainstream, so much so that for an increasing number of database users this requirement is beginning to eclipse others in importance. That benefit is scalability. As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware? The second scenario can be too difficult to manage with a relational database in general.

Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute that load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale. Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming, and the characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.
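A toy sketch can show where the friction comes from. It assumes a hypothetical four-node cluster, modeled as plain Python dictionaries, with rows assigned to nodes by hashing their key. It is not how any particular RDBMS or cloud data store is implemented. Looking up a row by its key touches one node, while a query on any other attribute has to ask every node, and that cost grows with the size of the cluster.

```python
import hashlib

NODE_COUNT = 4  # hypothetical cluster size
nodes = [dict() for _ in range(NODE_COUNT)]  # each dict stands in for a server


def node_for(key):
    """Pick a node by hashing the key (simple hash partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


def put(key, value):
    nodes[node_for(key)][key] = value


def get(key):
    """A key lookup is routed to exactly one node."""
    return nodes[node_for(key)].get(key)


def find_by_attribute(attr, wanted):
    """A query on a non-key attribute must be sent to every node."""
    nodes_contacted = 0
    matches = []
    for shard in nodes:
        nodes_contacted += 1
        matches.extend(
            key for key, row in shard.items() if row.get(attr) == wanted
        )
    return sorted(matches), nodes_contacted


put("car:1", {"make": "Nissan", "model": "Pathfinder"})
put("car:2", {"make": "Toyota", "model": "Camry"})
put("car:3", {"make": "Nissan", "model": "Sentra"})

print(get("car:2"))
print(find_by_attribute("make", "Nissan"))
```

Joins, constraints, and transactions that span rows on different nodes all need this kind of cross-node coordination, which is why they become expensive as the node count climbs.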

For cloud services to be viable, vendors have had to address this limitation, because a cloud platform without a scalable data store is not much of a platform at all. So, to provide customers with a scalable place to store application data, vendors had only one real option. They had to implement a new type of database system that focuses on scalability, at the expense of the other benefits that come with relational databases.

These efforts, combined with those of existing niche vendors, have led to the rise of a new breed of database management system.






Comments


  1. First Post!
    Anyways, if you ask me, which you didn't, so I'll tell you anyways, traditional relational databases are doomed because they can't compete with the more flexible and aesthetically pleasing alternatives like Hierarchical models. In the future it won't be surprising if we see the explosion of wiki-based Hierarchical Systems governing the data from sources like, would you believe it, Wikipedia.
    If any of this doesn't make sense, don't worry, it shouldn't. I just BSed all of it :D

    Posted by: br0ck0bama | February 12, 2009 3:28 PM



  2. There is also a new crop of databases called "graph databases" gaining traction (with a model based on nodes, relationships and properties), one of them being the open-source neo4j (http://neo4j.org).

    Using graphs to structure information is very powerful and intuitive.

    Posted by: Marc | February 12, 2009 3:41 PM



  3. While I don't believe relational databases are dead, their use in places where a big key/value pair table is more efficient is nearing its end.

    Relational databases are excellent at storing and querying structured, heavily relational data. They are likely to remain present to fill that need for the foreseeable future.

    But many applications tend to rely on structured or semi-structured data, things that developers have typically held in relational databases because they understood them. As we move more toward a rapid development model, the structure enforced by SQL and issues with scaling have made people look for alternatives... And rightly so.

    If you want to do data mining, a distributed cache won't get you there. But for random access data, the semi-structured, distributed model is easily preferred. Plus, data can always be pulled out into SQL or OLAP for those data mining purposes.

    Posted by: Andre de Cavaignac | February 12, 2009 3:45 PM



  4. I think it is based on the problem you are trying to solve. An address book does not need something like BigTable. But for loosely structured data like stock market data or search data, something like BigTable makes a lot of sense.

    I do not think RDBMSs are done for, but certainly their prominence will decline as they are not suitable solutions to all of the problems we are trying to solve.

    Posted by: Fabian Schonholz | February 12, 2009 4:07 PM



  5. ummmm....

    kinda not smart; and this meme has been played out in the tech industry before...

    here's the deal: if you use a "key" in any way to access data, it's a relationship, aka Relational Database.

    even if it resembles more of a pure hash table with the database holding abstract objects (hashtables/hashmaps/hashsets/etc) it's the same thing.

    isn't this like some newbie's brain going off and saying, "Wouldn't it be cool if table names were like, domain names!!!"

    Posted by: lemon obrien | February 12, 2009 4:26 PM



  6. The RDBMS isn't doomed, it's just getting a primacy downgrade within the "data access & data management" value pyramid.

    You list a number of alternatives to RDBMS technology, but none of your alternatives offer a net gain relative to the RDBMS because they don't home in on the Achilles heel of the RDBMS.

    Bottom line, this is the Logical Model based RDBMS vs Conceptual Model based alternatives (all of which are Entity-Attribute-Value + Relationships & Classes (EAV/CR) graph model based).

    To usurp the RDBMS you need a scalable EAV/CR solution that leverages platform independent pointers, so that you can point to Entities, Attributes, Values, Relationships, and Classes across DBMS, Application, Operating System, and Network boundaries (the "holy grail" of data access and data management).

    The only technologies in the DBMS realm that offer that today are those you will encounter in the "Linked Data" realm, i.e. RDF (an EAV/CR) Quad Stores that support dereferenceable HTTP based Entity IDs (pointers).

    If a DBMS technology possesses what I describe above, it is then able to deliver the following:

    1. Holistically mesh disparate data across a variety of line-of-business applications (i.e., solve age-old data integration challenges)

    2. Smart ETL (data is always dirty)

    3. Construct private, public, or hybrid Linked Data Spaces
    that are navigable using Conceptual level languages such as SPARQL or Entity-SQL.

    Again, you cannot offer net value over an RDBMS if you don't address the problems above without compromising performance, scalability, and security. Object and Object-Relational DBMS technologies of yore, tried and failed.

    For live examples of EAV/CR + HTTP based Identifiers, see:

    1. http://dbpedia.org/resource/DBpedia

    2. http://lod.openlinksw.com/fct/facet.vsp - More than a Billion EAV/CR records (and counting) all endowed with HTTP based Identifiers (URIs)

    Anyway, I wrote a detailed post about this issue two weeks ago, so this reply is fundamentally an extension of the developing discourse around this vital matter :-)

    Links:
    1. http://bit.ly/euz2O - The Time for RDBMS Primacy Downgrade is Nigh!

    Posted by: Kingsley Idehen | February 12, 2009 4:27 PM



  7. Great to bring visibility to the non-RDBMS space, thanks.

    Like everything, general solutions break down in specific narrow cases. RDBMS' are great, but if your narrow case is important enough (scale, performance, management) then you might want to roll your own solution.

    We did.

    I once posited that only 5% of webapps needed an RDBMS. I don't know if that's true, but I just don't see sites requiring so many ad hoc queries. Who gives their users SQL queries? And the larger sites who shard? I don't understand at all: they lose so many relational-db benefits, but gain a world of hurt.

    Lots of new opportunities out there, folks. Keep typing.


    Posted by: Israel L'Heureux | February 12, 2009 4:27 PM



  8. When you are at close to 100%, there is no place to go but down. Some applications need ACID so they will stay relational. Natural inertia will keep most apps relational for the next 10+ yrs.

    The non-relational competitors have a long way to go before they match the feature set of relational DBMSs. However, the size of the non-relational apps will make their importance soar far sooner. You don't need 100K Facebooks but there are well over 100K hotel property management systems.

    Posted by: Alvin Wang | February 12, 2009 4:31 PM



  9. Hi, thanks for your comments. I agree with what I have seen so far. While both are forms of database, relational databases still have their place; however, for a growing list of requirements they can be unnecessary or overkill. And if you don’t need many of the benefits that a traditional relational DB provides, then you open yourself up to a bunch of cost-effective, highly scalable alternatives. The concern, which I think is key and valid, is that while you may be able to do away with many of the more complex query operators (joins, aggregations, functions, range scans, etc.) initially, you also have to be pretty sure that this won’t change over time, as it is very hard to re-create this type of functionality.

    Posted by: Tony Bain | February 12, 2009 4:34 PM



  10. We have created a monster, not only in the storage engines that underlie the relational databases, but in the very development model and architecture of modern web apps.

    The industry has invested heavily in a model that almost always results in a scaling problem:

    1) create a simple web app apache / mysql on a cheap host

    2) get written up on RWW or TechCrunch

    3) Crash

    4) try to scale, crash

    5) write blog posts about scaling

    It's the storage engines that have made partitioning a devil's deal. The tables are not the essential evil. Almost all open source RDBMS have a RAID volume as a central spinning storage, maybe with separate log and transaction volumes.

    The next step to clustering or parallelizing these monsters is a black art within a black art, and almost all of the solutions are prone to write saturation. Each set of multi-machine solutions to the typical MySQL problems creates a monster set of problems in other areas.

    I am a keen student of the limits of Web scaling, as I used to craft data storage engines for fast laboratory data capture - of course that was a long time ago, but I remain an informed, fascinated outsider.

    I only hope that some smart folks will provide our contemporary workforce (weaned on n-tier architectures with an RDBMS at the ragged end) with a new storage manager that allows the illusion of columns and rows, but transparently or simply allows partitioning of data across broader, simpler, fault-tolerant systems that do not have to be RAID.

    my 2 cents.

    Posted by: Alan Wilensky | February 12, 2009 5:03 PM



  11. IMHO (technical considerations aside), most companies would be shooting themselves in the foot by implementing a key/value database system. People who understand how to structure data, store it, and query it -- are a different breed than most programmers developing current web applications.

    Even with the structure that an RDBMS provides, developers are often reluctant to understand and take the time necessary to organize/query data properly within a database. Now, some of these companies would be trying to drive a car without traction control, on ice... good luck!

    Posted by: omar | February 12, 2009 5:33 PM



  12. The issue to me is simple: general purpose relational databases are good for general purposes. But for special purposes (e.g., warehousing, OLTP, XML, streams, scientific data, metadata) a special-purpose database will beat the general purpose one every time.

    Which in turn begs the question: are there any general purposes, or, after you've optimized all the special ones, is there any need left for a general-purpose DBMS?

    Posted by: Dave Kellogg | February 12, 2009 5:37 PM



  13. If you don't design a system to scale, it won't.

    Blaming scalability issues on the relational database model makes about as much sense as blaming the rock solid mathematics underlying it - relational algebra.

    I have been a DBA for more than a decade and I have seen my share of horror stories. Not one of these circumstances was the result of technology.

    Usually they were PEBKAC issues or ID-10-T errors...

    Posted by: Chuck Boyce | February 12, 2009 7:26 PM



  14. What is being discussed is scaling to volume. Yes, RDBMS scales badly. However, there is another kind of scaling that is a much greater problem: scaling to complexity. Here RDBMS is hopeless: a vast number of tables have to be created, and this is solidified up front. Another problem with RDBMS is that everybody has to put in their 2c worth and has a different way of storing the same things; RDBMS does not help with standardization. Thus millions of people toil developing their DBs, all different, and no science of information emerges.

    The future is in semantic databases and entirely different architectures. We at thoughtexpress have spent 10 years researching the problem of complexity and are convinced that RDBMS is in fact the root cause of unnecessary complexity found in modern IT systems.

    Posted by: pawel lubczonok | February 12, 2009 7:59 PM



  15. I have to disagree. I don't think that RDB is going away any time soon. Sure, we will find better alternatives, but not for a while...

    Posted by: regac | February 12, 2009 10:18 PM



  16. This is all very fascinating.
    At this time the disadvantages of the key/value DB seem to overtake its advantages. I would be very hesitant to develop any application based on it in such a non-mature stage of the methodology.
    However, I think a good way to start may be to implement the key/value methodology on an existing RDBMS infrastructure. Then you get all the RDBMS benefits (transactions, replication, fail-over, etc.) and you get to test what it is like to let the application handle joins and data integrity. This process may make the migration more gradual, and whenever you feel secure enough, at least you have the data structured in a way that enables exporting out of the RDBMS and importing into the "real" key/value system.

    Posted by: Shahar Solomianik | February 13, 2009 3:00 AM



  17. The best database currently available
    is the human brain (well.. most of the time :o)
    It is associative/ relational. It is amazing..
    Regards
    Ted.

    Posted by: tedvg | February 13, 2009 3:54 AM



  18. Funny to see that we're going back to the future... Hierarchical databases as a replacement for relational databases.
    In my opinion the database implementation is getting less and less important. What matters more is the ability to view loosely coupled distributed data as a consistent whole. We need to be able to treat the internet as a database, something REST and query engines like YQL are enabling. More here: http://www.andrejkoelewijn.com/wp/2009/02/12/rest-is-a-distributed-data-model/ (REST is a distributed data model)

    Posted by: andk | February 13, 2009 5:20 AM



  19. This makes it seem like non-relational databases are something new. Products like Caché have been around for a long time, but for some reason they don't get the buzz of something a lot less mature like CouchDB.

    Posted by: Marc | February 13, 2009 6:08 AM



  20. The print button at the bottom of your post does not work, it only prints the first 1/4 part of the article.

    Posted by: Erik Terpstra | February 13, 2009 6:22 AM



  21. Personally, due to the fact that there are so many RDB based applications in the world, it would take years and years to phase them out or convert them to a new model.

    I think the key/value trend may attract a small number of early adopters, and some of those may be established corporations, but many businesses will be hesitant to change their entire data model over to the new method, due to expense and manpower.

    I feel that the RDB model will be here to stay for the foreseeable future.

    Posted by: Jason Bartholme | February 13, 2009 6:54 AM



  22. One of the big problems with key/value databases is that people (especially but not exclusively non-IT people) struggle to understand the concepts, let alone the considerable advantages that these solutions provide.

    If it weren't for this little problem, SQL/RDBMS & the like would have been dead & buried by the late 1980s.

    Posted by: andy_pagin | February 13, 2009 7:51 AM



  23. To usurp the RDBMS you need a scalable EAV/CR solution that leverages platform independent pointers, so that you can point to Entities, Attributes, Values, Relationships, and Classes across DBMS, Application, Operating System, and Network boundaries (the "holy grail" of data access and data management).


    Posted by: jan | February 13, 2009 9:02 AM



  24. Can anybody give an actual breakdown of a query on one of these key/value databases, using page 2 of this article as an example?

    If I search for Model == Pathfinder, what the hell is coming back? A ragged collection of Items that match my query? And then what, do I iterate through this collection and try to find the “TypeOf” item I want? Or does one object come back that is basically a merge of every single item that was considered a match in the domain? And if that is the case, then what happens when one item says the color is green, and the other says it is blue? Or is it when I make the query, I must define the structure of the returning object?

    Posted by: JFD | February 13, 2009 9:27 AM



  25. I don't think the standard application is facing the RDBMS vs. distributed hybrid question yet. Typical applications and websites only deal with hundreds or thousands of 'rows', and an RDBMS can take care of this. Let's say you are an exception and are dealing with tens of millions: you can still push the envelope with an RDBMS. Once you start hitting the hundred million or billion point, then you have a problem which an RDBMS can't solve. Look at companies like Autonomy: they are making a killing because they understand this need, and enterprises are handing them cash by the fistful because their database technology is years ahead of the competition. It really comes down to 2 things, vastness of data and $$$. 95% of the projects out there don't fit into either category, so they are going to stick with RDBMS. The exceptional problems which have enormous data scale and proper funding will find distributed database solutions waiting at their doorstep.

    Posted by: kevin green bud | February 13, 2009 9:37 AM



  26. Hah, no the relational database is *NOT* dying. This rumor is akin to the "Apple is going out of business" one that was so prevalent in the 90s.

    Most other solutions are either completely impractical for what relational databases handle, are too large and overly-complex for rapid development, or are theoretical at best and silly at worst.

    Each application should be looked at and the best DB solution should be applied.

    Posted by: jeff | February 13, 2009 9:37 AM



  27. It seems that an article like this comes out about every two months.

    I beg all software practitioners to please read C. J. Date's Introduction to Relational Database Systems. It's entirely possible to go through an entire career in software, never knowing what the Relational Model really is; doing so is a shame and does a disservice to yourself and others.

    Most of the confusion on the subject comes from mixing the model with the implementation. Think of the Relational Model as being analogous to arithmetic, and the implementation as a calculator. The calculator could be an old, room-sized, gear and lever machine that takes minutes to produce a single answer. Does the clunkiness of such a calculator mean that arithmetic is "doomed"?! Obviously not. The Relational Model is basically the Mathematics of Data and as such is likely to outlive us all. The database systems of our time, however, really are a lot like old awkward calculators in more ways than we probably realize. The clunkiness of the SQL language, the lack of logical/physical separation in systems, and the silo nature of today's DBMSes are by no means the fault of the Relational Model. Fortunately, where there are problems, there is much opportunity.

    Posted by: Nathan Allan | February 13, 2009 9:51 AM



  28. I wonder, where is Oracle? Are they out of the game?

    Posted by: Konstantin | February 13, 2009 10:48 AM



  29. I'm amused to see Hierarchical databases presented as a 'new' alternative to relational databases, simply because I remember when RDBMS were the new alternative to the problems of Hierarchical systems - and indeed RDBMS had the reputation for being much slower / power hungry / etc, but what they offered in terms of relative simplicity led to their rapid uptake.

    The problem, to me, seems to be people just picking architectural stacks off the shelf and building on top of them without understanding them. If you're just using a database for object persistence, and there is no risk of any clash between users, or need to query data across them, then it is probably the wrong solution.

    On the other hand, there is a lot of mathematical theory (primarily set theory) underlying relational databases, that many other forms of data modeling lack.

    Also, there is the notion of the object-relational mismatch, which largely stems from the technical implementation of most existing RDBMS and SQL - a pattern was largely set by DB2 based on limitations of the time.

    The primary one is the fixed set of types - that means that we have to define our Car as having a foreign key to a Make and Model table, then joining to them to construct the full object (projection).

    But there is no reason why you can't have a database where instead our 'Make' attribute on the Car was literally defined as a Make rather than foreign key to a table. Equally, there is no reason why you shouldn't be able to define 'Car' as a sub-class of 'Vehicle', rather than defining tables and then creating a view for 'Car'.

    A lot of the problems are around the fact that we end up coding to the physical model, rather than being able to work with the higher relational model

    Posted by: JulesLt | February 13, 2009 10:58 AM



  30. Oracle is servicing the vast portion of the market where RDBMS still works as a solution.

    Nathan, you bring up a good point, but the relational model point being made is a bit more complex than arithmetic. Can you simulate a nuclear explosion with a calculator? Sure, it will just take you a lifetime to do. The same goes for RDBMS. It's not too hard to transform complex data requirements to fit into the parameters that an RDBMS can handle. But the point is that the hybrid distributed database can easily handle the data out of the box, blow away the RDBMS in performance, and have much less data complexity.

    Imagine a single command which can return real time results across billions of 'rows' of data in seconds without the constraints of tables, rows, and SQL. Basically the point here is that sometimes you have to break the mold to see the light. Given the establishment of RDBMS, you are obviously going to cause a stir with such statements.

    Posted by: kevin green bud | February 13, 2009 11:50 AM



  31. When claiming that cloud based data stores have better performance than relational DBs, it would be useful to also answer two important questions: which relational DBs, and for what kinds of queries? Most Web programmers' impression of relational DBs is drawn from DBMSs like MySQL that were designed to handle just a few million rows and were not designed to scale (i.e. to work on a shared-nothing architecture). There are relational databases implemented on the right architecture and designed to scale linearly that are handling billions of rows. Ebay has 5 petabytes of user data on a Teradata relational database, and others like Netezza are also crunching billions of rows. For decades, these relational databases have been using scale-out techniques similar to those used in cloud computing. There are not many new ideas in cloud computing that cannot be found in a shared-nothing RDBMS. I think there is no technology based argument against relational databases in terms of performance (even for key/value data). The argument is affordability and availability for the masses. And I believe this is a valid argument.


    Posted by: Dawit | February 13, 2009 12:48 PM



  32. Thanks for sharing..

    Posted by: izmir evden eve | February 13, 2009 2:13 PM



  33. fabulous post!

    Gnip bucked the relational DB trend for scalability reasons (node/cluster replication simplicity) and instead went with Distributed Shared Memory via TerraCotta. ( http://one.valeski.org/2008/05/gnips-head-is-in-clouds.html )

    We've hit some app ceilings as a result however; at some point you have to get data down onto disk to keep things moving along (there's only so much memory in the world, while disk is infinite). That has us exploring a hybrid/custom Berkeley DB persistence model.

    Relationships between our data aren't all that interesting to our real-time, in memory, app, yet they're valuable when it comes to metrics/reporting which can be done offline in a traditional "DB" that has a SQL interface.

    This is all well and good when you own the internal structure/use of your data model, but when you want to interface with others (contractors for example) they generally speak SQL and expect status quo when it comes to access to the underlying data. While technically annoying, it's an operational/marketplace reality; everyone speaks SQL. Too much customization in the data access layer can silo your project.

    If you're using a relational DB to query/explore/define/structure the relationships between your data, that's one thing. If you're using it as a way to persist data, that's another, and generally a modern relational DB is way overkill for this use case and most developers use it because they're lazy (shocker).

    Posted by: Jud Valeski | February 13, 2009 2:23 PM



  34. Apples and oranges. The purposes of the two models presented are so fundamentally different that it's rather silly to compare them.

    A data store that provides no integrity management is a simple heap. Performance and scalability are easily undermined by bad data.

    The main reason why relational databases are so effective and why programmers hate them so much is that they are data-centric. Programmers tend to see data as secondary or peripheral to code. This programmer bias is the main fuel in the quest for something "better" than an RDBMS, resulting in reinventing wheels that were partially or completely rejected in the 1970s (such as the hierarchical model).

    In general, the average practitioner knows neither enough theory nor enough history to evaluate a database management system effectively. Moreover, they are often constrained by outside forces (boss's or client's choice, operating environment, etc). So, they will continue to choose whatever they are told to use, relational or otherwise, while complaining that it does not do what they want it to do.

    Posted by: Senny | February 13, 2009 2:43 PM



  35. Hi Nathan, thanks for your comment. In my article I try to convey that the relational database is clearly the most universal and indeed it is not going away. However, current technical issues make it difficult to easily scale an RDB across thousands, hundreds or even dozens of nodes. Key/value stores are by nature more compatible with distributed scalability. If you only need the capability of the key/value store then that’s great (and clearly there are requirements where an RDB is less suitable and a key/value store more suitable), but if you have a “relational” requirement that you want to dynamically scale out, your options are much more limited. In fact, if your scalability demands are so important, you might choose to retrofit your database into something that is compatible with a key/value store even if this is far from ideal from a pure database architecture perspective.

    Some key/value stores or document orientated databases are designed to meet a specific need, are achieving that, and don’t aim to be anything different. However, what I think is happening with some other vendors, and certainly what Microsoft seems to be doing, is simplifying everything in terms of architecture and functionality to the point where the distributed scalability problem can be solved, then working backwards, re-adding core capabilities until you regain some of the key concepts of a relational architecture. For example, SDS has a (very) limited join operator.

    Posted by: Tony Bain | February 13, 2009 3:02 PM



  36. Senny, agreed about apples and oranges. But some distributed scalability demands and the cost-effective cloud data service offerings are forcing/prompting people to try to make cider from orange juice (is that taking the metaphor too far?).

    Posted by: Senny | February 13, 2009 3:07 PM



  37. Thank you very much for sharing.

    Posted by: söve | February 13, 2009 3:18 PM



  38. Hi Tony, great post.

    As an RDBMS person I'm still trying to get my mind around non-RDBMS systems and when we should be using them.

    As a follow up post I'd love you to connect this to the big websites like Flickr, FaceBook, Twitter etc. What do they do?

    Cheers

    Rod

    Posted by: Rod Drury | February 13, 2009 3:38 PM



  39. I apologize in advance for my complete ignorance of the back end of databases. I am a simple consumer.

    I work in education and have data in more than half a dozen different DBs - student data, assessment, special ed, finance, HR, etc. No matter how I look at it, bringing all that data into one DB product does not seem feasible. I am faced with two problems: keeping key biographical data synchronized across DBs, and running reports using data from multiple DBs.

    We're not likely to invest in a SIF solution any time soon for a number of reasons, including money and upkeep. But there are a few new products on the market that will pull selected data from all of those sources into a data cloud and then run both pre-programmed and on-the-fly reports. This doesn't solve the data sync problem, but there are some ODBC routines that can address the basics of that.

    As long as you have one of a few key fields in every data set (like Student ID, Teacher ID, etc.) it can compare related data. Most of the data will continue to reside in traditional DBs, but we might also get an assessment report from the state in CSV text format and just add it into the data cloud rather than worrying about which DB we want to import it into and having all of the fields correctly created and mapped.
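
    The shared-key idea above can be sketched roughly as follows, assuming one source is already loaded as a list of records and the state report arrives as CSV text. The field names (`student_id`, `name`, `score`) and the sample rows are made up for illustration.

```python
import csv
import io


def index_by(rows, key):
    """Build a lookup from the shared key field to the row that carries it."""
    return {row[key]: row for row in rows}


def merge_on_key(primary, extra, key):
    """Attach fields from `extra` to each `primary` row with the same key."""
    extra_index = index_by(extra, key)
    merged = []
    for row in primary:
        combined = dict(row)
        # Rows without a match in `extra` are kept unchanged.
        combined.update(extra_index.get(row[key], {}))
        merged.append(combined)
    return merged


# Hypothetical rows pulled from a traditional student database.
students = [
    {"student_id": "1001", "name": "Ana"},
    {"student_id": "1002", "name": "Ben"},
]

# Hypothetical state assessment report delivered as CSV text.
state_csv = "student_id,score\n1001,87\n"
assessments = list(csv.DictReader(io.StringIO(state_csv)))

report = merge_on_key(students, assessments, "student_id")
for row in report:
    print(row)
```

    This only works when the key values agree exactly across sources, which is the synchronization problem mentioned earlier.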

    So, am I correct that this is a solution that makes best use of both approaches? Or do I not know what I am talking about? If anybody has a different approach, I would love to hear it.

    Posted by: Michael Patron | February 13, 2009 3:39 PM



  40. > The responsibility of ensuring data integrity
    > falls entirely to the application

    Hahahahahaha! Priceless.

    Posted by: Mike | February 13, 2009 4:36 PM



  41. Great article. Great discussion.

    I'm going to agree with the comments that limitations of RDBMS are typically due to a lack of application developer/web developer knowledge of proper modeling and of database expertise in general.

    However, scalability is certainly an issue, and as we push out into more hybrid/cloud warehousing models, relational simply is not going to do it. Netezza is on the right track and Oracle is dabbling as well. Discussions of spatial/graphical databases are on the rise (Franz). Metadata is just not as easily harnessed in a traditional relational model, and smarter referencing and relationship building are necessary.

    I don't see RDBMS going away, but for heavy analytical and less process oriented applications I think the trend will be that RDBMS will phase out.

    Posted by: Michele | February 13, 2009 5:16 PM



  42. Don't forget DabbleDB. It's pretty awesome.

    Posted by: AppBeacon | February 13, 2009 5:39 PM



  43. thanks! fun article.

    i'm surprised you mentioned sql data services instead of azure tables. of the two, tables is definitely a better representative of the new breed.

    details: http://snarfed.org/space/windows+azure+details#Tables

    Posted by: Ryan Barrett | February 13, 2009 5:42 PM



  44. Speaking of key-value stores: Tokyo Cabinet. Best part is, it goes beyond a key value store and offers query support sans fixed schema.

    Posted by: Ilya Grigorik | February 13, 2009 5:51 PM



  45. Hi Rod, great idea for a follow-on post. FYI, most of these key/value stores have risen out of the larger Web companies' own internal requirements for massive scalability (and cost-effectiveness): Amazon - SimpleDB, Google - BigTable, LinkedIn - Project Voldemort, Facebook - Cassandra, Yahoo - PNUTS, and I believe work from DoubleClick inspired 10gen in creating Mongo. From what I understand, these were all created to serve a need and have all since taken on a direction of their own.

    Posted by: Tony Bain | February 13, 2009 11:09 PM



  46. Nice read... for the first third. There's no "all in one page" option so I'm not spending more effort on this site as a matter of policy. Too bad, it had some promise.

    Posted by: Joe Public | February 14, 2009 1:16 AM



  47. For the most part these discussions are stuck in a very narrow paradigm, and of course they are true within that paradigm. Simply consider the following deficiencies of relational databases:

    1. Why is it that every time a new piece of data is to be stored, you need a DBA or some tech guy to add it? There is no automation.
    2. As things change constantly in real life, one has to rewire how one thinks about reality. In the RDBMS world this means massive rewrites or constantly adding on bits, ending in spaghetti. In other words, an RDBMS solidifies structures and is very bad at adapting its internal structure to changes in the real world.
    3. The word "relational" should be replaced with "slightly relational". The relations reflected are only of the most trivial nature: key lookup. All other relations are embedded in programs that read some data and write some data. Vaunted transactional processing.
    4. Very complex problems, which in a relational scheme would need thousands of tables, are just not tractable, even with small volumes.
    5. Current RDBMSs are not temporal or time-aware, which means one has to create this temporality from scratch and in a non-standard way.
    6. The biggest failure is that everybody's DB is non-standard and everything is redone over and over manually, keeping millions of techies in jobs.

    Sorry, this is a bit blunt. We believe there is a way of automating this stuff by elevating it to a semantic, knowledge plane.

    P.

    Posted by: pawel lubczonok | February 14, 2009 3:04 AM



  48. But why does your Car table have a MakeKey column? Somebody should normalize these articles.

    Posted by: Jason | February 14, 2009 6:19 AM



  49. @ Ted: "The best database currently available
    is the human brain (well.. most of the time :o)
    It is associative/ relational. It is amazing.."

    The human brain scales well too ;)

    Seriously though, if you look at the brain, specifically at a storage level, you'll see that the way memory is stored is via a distributed, decentralized, message passing model. That's why we can't run accurate reports on memories, et cetera.

    So the brain is not a Relational model: not in the sense that we are talking about anyways. It's very much a message passing model - the messages being a hash table of sorts - with very loose relationships by accidental overlap. Do we really want a computer system to be as undependable as our mind?

    On the other hand, the mind is blazing fast because it is not constrained. So the results we do get are usable immediately. The mind also prioritizes based on a density of messages with like overlap. It then takes those messages and looks for more overlap in other messages on the parts that don't overlap already. In this way it gives the illusion of a cohesive unit.

    But really all we have are individual cells playing the phone game, and passing messages they've received as quickly as possible before they die off. This method is fantastic for an organic system that HAS to be scalable first and foremost due to the unpredictability of life and death.

    If we start to treat servers (the cloud) in the same way, we will have to treat data like this as well. There will no longer be a single storage place for any one piece of data. That piece of data will need to be stored in SEVERAL places at once. The more places it is stored, the more likely it is to survive. Then the query implementation has to take on a polymorphic approach to relating data. It has to assume that the application understands the data more than the data storage itself does.
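
    A rough sketch of the "several places at once" idea, assuming each key is written to a fixed number of nodes and a read tries those nodes in turn until a live copy answers. The `ReplicatedStore` class, the CRC-based placement rule and the `fail` method are illustrative assumptions, not a description of any real system.

```python
import zlib


class ReplicatedStore:
    """Toy store that keeps `replicas` copies of every key on different nodes."""

    def __init__(self, node_count, replicas):
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        # A node is a dict while alive and None once it has failed.
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas

    def _targets(self, key):
        """Return the indexes of the nodes that should hold this key."""
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        # Write the value to every live target node.
        for index in self._targets(key):
            if self.nodes[index] is not None:
                self.nodes[index][key] = value

    def fail(self, index):
        """Simulate losing a node together with everything stored on it."""
        self.nodes[index] = None

    def get(self, key):
        # Any surviving copy is good enough to answer the read.
        for index in self._targets(key):
            node = self.nodes[index]
            if node is not None and key in node:
                return node[key]
        raise KeyError(key)


store = ReplicatedStore(node_count=5, replicas=3)
store.put("student:1001", "Ana")
store.fail(store._targets("student:1001")[0])
print(store.get("student:1001"))
```

    The sketch ignores what happens when copies disagree after a partial write; deciding which copy wins is left to the caller, in line with the point that the application has to understand the data.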

    As a veteran in the trenches I can say that data is the most misunderstood field of all. I would say I spend the bulk of my time as lead teaching the others how to form a relational table to the expected model, how to normalize, and when to store data as key/value metadata instead. The main reason for this is that people don't naturally think in terms of data retrieval. They look at things on a document level for the most part. It is a unique and rare skill to naturally take to data and know how to make it work for you. As stated above, this is a specialized task that not many are equipped for.

    Do I think RDBMSs are dying? I hope so. It would save me the hassle of teaching them ;) . Even then, though, I hope people realize that we will always have relationships. It's just the implementation that is wrong. Key/value data will relate somewhere, whether in the app or in the database. Otherwise the data is useless in the long run for any sort of reuse application. I hope that we can move forward and not backwards with our database engines. The solution is not there yet.

    Posted by: Michael Christenson II | February 14, 2009 7:41 AM



  50. I could not agree with you more, Michael (Christenson II). The brain has had considerably more time to develop than our database systems and has to deal with a constantly reconfigurable "relational database scheme" ( :-) ) (otherwise we would die). And in fact that is where the traditional RDBMS is hopeless: the effort to reconfigure, on the fly, just the data that goes with the problem is so arduous. The reality is that the world is moving away from intensively changing data to intensively changing knowledge. The problem of quickly processing bank transactions has long since been solved.
    On a small point, regarding your question "Do we really want a computer system to be as undependable as our mind?": it is not necessarily the case that systems that are perfect are the most dependable and/or efficient. We have some evidence that systems that err and have the capacity to self-correct may be more reliable :-)

    P.

    Posted by: Pawel Lubczonok | February 14, 2009 7:54 AM


