If you want to start an intense and emotional discussion with colleagues, casually suggest that relational databases as we know them are dead. Even better if your colleagues are DBAs or otherwise work with databases. To most people the idea is ludicrous, if not plainly stupid. But think again. Nothing lasts forever, although technologies do not die easily. Relational databases may not be going away soon, but their role might be changing.
Challenging the role of the relational database system is sure to create a lively and heated debate. I’m not saying relational databases are done for; they will keep a place and a role in IT systems, but that place and role are changing. I will consider three trends.
First, consider the traditional role of databases and where they came from. That is, why do we need a database to begin with? Traditionally, databases organize secondary storage. Since memory is limited and expensive, it makes sense to put data in secondary storage, which is cheaper, albeit slower. Databases are therefore the main store of data in enterprise IT systems. Their advantage is that they provide a central repository for massive amounts of data. Surely there is value in that centrality and in the vast storage capacity. It’s not like we can put all our data in memory.
Or can we? Let’s imagine you have a computer or device with 1 terabyte of solid-state memory. Better yet, let’s make that 10 TB. Solid-state memory technologies are advancing toward the terabyte scale, and prices are going down. If you thought it was revolutionary to have all your songs on an iPod, imagine a device with every song ever recorded! Hard disks stop being an interesting issue: they’re gone. With vast amounts of non-volatile memory, why not organize your data as in-memory objects, using whatever data structures you want? No hard disk, plenty of fast memory: do we need a database?
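To make the idea concrete, here is a minimal Python sketch of data kept entirely in memory and organized with ordinary data structures: one dictionary plays the role of a primary key and a second acts as a secondary index. The `Song` and `SongLibrary` names are hypothetical, and the sketch assumes the memory itself is non-volatile, so no persistence code is shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Song:
    song_id: int
    title: str
    artist: str
    year: int


class SongLibrary:
    """All records live in memory, organized with ordinary data structures."""

    def __init__(self) -> None:
        self._by_id: dict[int, Song] = {}  # plays the role of a primary key
        self._by_artist: dict[str, set[int]] = defaultdict(set)  # a secondary index

    def add(self, song: Song) -> None:
        old = self._by_id.get(song.song_id)
        if old is not None:
            # Replacing a song: keep the secondary index consistent.
            self._by_artist[old.artist].discard(old.song_id)
        self._by_id[song.song_id] = song
        self._by_artist[song.artist].add(song.song_id)

    def get(self, song_id: int) -> Optional[Song]:
        return self._by_id.get(song_id)

    def by_artist(self, artist: str) -> list[Song]:
        ids = self._by_artist.get(artist, set())
        return sorted((self._by_id[i] for i in ids), key=lambda s: s.year)


library = SongLibrary()
library.add(Song(1, "Song A", "Artist X", 1969))
library.add(Song(2, "Song B", "Artist X", 1965))
print([s.title for s in library.by_artist("Artist X")])  # ['Song B', 'Song A']
```

Lookups here are dictionary operations in the same process. There is no query language, no disk and no network hop, which is the trade-off the question above is really asking about.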
Of course, you can still keep a database in memory, since databases are good at organizing data and provide fast retrieval. After all, the people in Redwood City and Redmond make pretty good products. What does this mean? For one thing, database latency becomes less of a problem, which suggests different and more efficient ways to use databases. For example, many object-oriented enterprise design patterns aim to reduce network latency, sometimes at the cost of good object-oriented practice; with fast in-memory data, some of those compromises may become less necessary.
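One familiar example of such a pattern is the Data Transfer Object (DTO). The sketch below uses hypothetical names and counts simulated round trips instead of actually calling over a network. Reading three fields through fine-grained, object-style getters costs three round trips. A single coarse-grained call that returns a behavior-less DTO costs one. The DTO saves latency but gives up the encapsulated, behavior-rich objects good object-oriented design favors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerDTO:
    """Plain data bag: no behavior, just the fields a client needs."""
    name: str
    city: str
    email: str


class RemoteCustomerService:
    """Pretend remote service; counts round trips instead of sleeping."""

    def __init__(self) -> None:
        self.round_trips = 0
        self._rows = {7: ("Ada Lovelace", "London", "ada@example.com")}

    def _fetch(self, customer_id: int) -> tuple[str, str, str]:
        self.round_trips += 1  # each call here stands for one network hop
        return self._rows[customer_id]

    # Fine-grained, object-style interface: one round trip per field.
    def get_name(self, customer_id: int) -> str:
        return self._fetch(customer_id)[0]

    def get_city(self, customer_id: int) -> str:
        return self._fetch(customer_id)[1]

    def get_email(self, customer_id: int) -> str:
        return self._fetch(customer_id)[2]

    # Coarse-grained interface: every field in a single round trip.
    def get_customer(self, customer_id: int) -> CustomerDTO:
        return CustomerDTO(*self._fetch(customer_id))


svc = RemoteCustomerService()
name, city = svc.get_name(7), svc.get_city(7)  # two round trips
dto = svc.get_customer(7)                      # one more, for all three fields
print(svc.round_trips)  # 3
```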
Sure, for large amounts of data, most of which is “dead” and not likely to be used any time soon, it does not make sense to put everything in memory. This brings me to my second trend: the database as an archive.
Consider a data center for a web-based application that receives requests from outside clients. Requests arrive by the thousands per second, and latency must be very low. Think of a major bank where stocks are traded. How do you scale this type of application? Buying the fastest supercomputer is not the answer, since the problem is one of scale, not processing power. The solution is to use hundreds of servers, each handling a share of the requests, and to add more servers as the load grows. In this environment, the database is simply too slow and becomes a bottleneck, even if it is clustered. The solution is to have lots of memory and do all transactions in memory. For reliability, transactions are replicated to more than one node.
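Here is a minimal Python sketch of in-memory transactions replicated across nodes. It rests on simplifying assumptions: the nodes are plain objects in one process, a method call stands in for the network, and replication is synchronous, so a trade is acknowledged only after every copy has applied it. `Trade`, `MemoryNode` and `ReplicatedStore` are hypothetical names. A real system would also have to deal with node failures and message ordering.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    trade_id: int
    account: str
    quantity: int  # signed: positive buys, negative sells (hypothetical units)


class MemoryNode:
    """One server that keeps positions purely in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.positions: dict[str, int] = {}
        self._applied: set[int] = set()
        self._lock = threading.Lock()  # requests may arrive on many threads

    def apply(self, trade: Trade) -> None:
        with self._lock:
            if trade.trade_id in self._applied:
                return  # idempotent: a re-delivered trade has no extra effect
            self.positions[trade.account] = (
                self.positions.get(trade.account, 0) + trade.quantity
            )
            self._applied.add(trade.trade_id)


class ReplicatedStore:
    """Applies each trade on a primary and all backups before acknowledging."""

    def __init__(self, primary: MemoryNode, backups: list[MemoryNode]) -> None:
        self.primary = primary
        self.backups = backups

    def execute(self, trade: Trade) -> int:
        self.primary.apply(trade)
        for node in self.backups:
            node.apply(trade)  # in a real cluster this is a network call that can fail
        return 1 + len(self.backups)  # number of copies holding the trade
```

If the primary is lost, any backup already holds every acknowledged trade and can take over. The duplicate check keeps a replayed trade from being counted twice.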
In this case the database becomes an archive instead of handling real-time transactions. All transactions are sent to the database through a messaging system. The data will get there, but we don’t have to wait for it to be stored.
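This write-behind arrangement can be sketched in a few lines of Python. The sketch assumes a standard-library `queue.Queue` stands in for the messaging system and SQLite stands in for the archive database. `ArchivingStore`, `transact` and the `txlog` table are hypothetical names. The request path only touches memory and the queue. A background thread drains the queue into the database.

```python
import queue
import sqlite3
import threading


class ArchivingStore:
    """Serves transactions from memory and archives them asynchronously."""

    _STOP = object()  # sentinel message: tells the archiver to finish

    def __init__(self, archive: sqlite3.Connection) -> None:
        self.balances: dict[str, int] = {}
        self._archive = archive
        self._archive.execute(
            "CREATE TABLE IF NOT EXISTS txlog (account TEXT, amount INTEGER)"
        )
        # A standard-library queue stands in for the messaging system.
        self._outbox: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._archiver, daemon=True)
        self._worker.start()

    def transact(self, account: str, amount: int) -> int:
        # Real-time path: update memory, publish the transaction, return at once.
        # (Assumes a single caller thread; concurrent callers would need a lock.)
        self.balances[account] = self.balances.get(account, 0) + amount
        self._outbox.put((account, amount))
        return self.balances[account]

    def _archiver(self) -> None:
        # Background consumer: the data reaches the database eventually.
        while True:
            message = self._outbox.get()
            if message is self._STOP:
                break
            self._archive.execute("INSERT INTO txlog VALUES (?, ?)", message)
            self._archive.commit()

    def close(self) -> None:
        # Drain everything queued so far, then stop the archiver.
        self._outbox.put(self._STOP)
        self._worker.join()


# check_same_thread=False lets the archiver thread use a connection created here.
db = sqlite3.connect(":memory:", check_same_thread=False)
store = ArchivingStore(db)
store.transact("alice", 100)
store.transact("alice", -30)
store.close()
print(db.execute("SELECT SUM(amount) FROM txlog").fetchone())  # (70,)
```

`transact` returns as soon as the message is queued. The cost is that the archive may lag memory by a few messages until the archiver catches up.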
The third trend is less obvious and has to do with the nature of relational databases. These databases are organized into tables and rows, with indexes and relationships. This was, of course, the right way to structure data for data processing systems. A customer record is a customer record with customer fields, and the same goes for orders, invoices and all other business-related data.
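For illustration, here is a minimal sketch of that table-and-row structure using Python's built-in `sqlite3` module. It has a fixed `customer` table, an `invoice` table related to it through a foreign key, and an index on that key. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Every customer row has exactly these columns.
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT
    );
    -- Each invoice points at its customer. SQLite enforces REFERENCES
    -- only when PRAGMA foreign_keys = ON is set.
    CREATE TABLE invoice (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total_cents INTEGER NOT NULL
    );
    -- Index supporting the customer-to-invoice relationship.
    CREATE INDEX invoice_by_customer ON invoice(customer_id);
    """
)
conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Acme Ltd', 'Oslo')")
conn.executemany(
    "INSERT INTO invoice (customer_id, total_cents) VALUES (?, ?)",
    [(1, 12000), (1, 3050)],
)
rows = conn.execute(
    """
    SELECT c.name, COUNT(i.id), SUM(i.total_cents)
    FROM customer AS c JOIN invoice AS i ON i.customer_id = c.id
    GROUP BY c.id
    """
).fetchall()
print(rows)  # [('Acme Ltd', 2, 15050)]
```

Storing a new kind of field, for example a hypothetical social-network handle, means changing the schema with something like `ALTER TABLE customer ADD COLUMN handle TEXT`. That is exactly the kind of change that fast-moving applications need constantly.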
While this works fine for business data processing, the problem is that the world is not organized into tables and rows. As we see new types of applications with user-generated content and social networks, new types of data are required, and the schema needs to change all the time. For relational databases, changing the schema is enough to make database programmers jump out of their seats in horror. But in the new world of fast-moving websites, designing an elaborate schema up front and expecting it never to change is not acceptable. The requirement is that you store data in a data store and get it back, with no schema needed: the application programmer decides the structure. We can already see this in Amazon’s SimpleDB. Just PUT something in the store and GET it back. The store is more like a multi-dimensional spreadsheet than a relational database. No data types, just strings.
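Here is a minimal Python sketch of the schemaless idea. Each item is a bag of attribute names mapped to sets of string values, no schema is declared, and every value is stored as a string. It is loosely inspired by the item-and-attribute model described above. It is not SimpleDB's actual API, and `SchemalessStore`, `put`, `get` and `select` are hypothetical names.

```python
from collections import defaultdict


class SchemalessStore:
    """Items are bags of attribute -> string values; no schema is declared."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, set[str]]] = {}

    def put(self, item: str, **attributes: object) -> None:
        # Any attribute name is accepted; every value is stored as a string.
        attrs = self._items.setdefault(item, defaultdict(set))
        for name, value in attributes.items():
            attrs[name].add(str(value))

    def get(self, item: str) -> dict[str, set[str]]:
        # Return a copy so callers cannot mutate the stored sets.
        return {name: set(values) for name, values in self._items.get(item, {}).items()}

    def select(self, name: str, value: str) -> list[str]:
        # Items having the given attribute value, whatever else they contain.
        return sorted(i for i, attrs in self._items.items() if value in attrs.get(name, ()))


store = SchemalessStore()
store.put("user:1", name="Ada", city="London")
store.put("user:2", name="Bob", followers=42)  # a new attribute, no migration
store.put("user:1", tag="admin")
store.put("user:1", tag="editor")              # an attribute can hold several values
print(store.get("user:2"))  # {'name': {'Bob'}, 'followers': {'42'}}
print(store.select("tag", "admin"))  # ['user:1']
```

Note that `followers=42` comes back as the string `'42'`. Interpreting it as a number is left to the application, which is the price of having no data types.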
Whatever the future holds for database technologies, we must acknowledge that things are changing. And with change come new opportunities.