From time to time the concept of ‘Database Abstraction’ comes up, lately a lot in PHP circles because it often runs along side the ‘Using PHP Templates’ (like Smarty) argument. One of the unfortunate side affects of these discussions are arguments over what is a ‘Database Abstraction Layer’ and what isn’t it and what is a ‘Template System’ and what isn’t. I think most people get these terms confused with other concepts. So let’s tackle just one of these for now, the question of ‘What Is A Database Abstraction Layer?’.
Definition 1a for abstract from Merriam Webster applies most to what we want out of database abstraction: ‘disassociated from any specific instance’. In the context of our discussion we might word this as ‘disassociated from any specific database server’. Meaning that no matter what database back end is being used (presuming that it is supported by your abstraction software), everything runs fine, we are completely ‘disassociated’ from the database server abilities. The result of such a thing would be the ability to swap MySQL for Sybase, then move to Oracle, back to MySQL and then to PostgreSQL, all with out changing a single line of code (again presuming that MySQL, Sybase, Oracle and PostgreSQL are all supported by you abstraction layer). Certainly sounds impressive. One might wonder if such a thing even exists. There are certainly projects that are trying to do just that.
Getting all of this portability isn’t for free though. For starters it is never going to be as fast as the ‘native’ database functions. That isn’t to say that it won’t scale though. But I’m not going to go down that road right now, I simply want to state out front that adding extra code will add some execution time. One of the biggest jobs of the abstraction layer is dealing with the same feature that is implemented in many different ways across database servers. As examples of this take a look at ADOdb‘s page on writing portable SQL in PHP. Even something that seems simple enough, like limiting the result set to the first ten rows, is done five different ways. As a result instead of writing plain SQL you end up learning ADOdb functions + SQL. To truly abstract away the database differences your queries must be done this way, otherwise you might accidentally write SQL that only works with your current database. Even that limitation isn’t the biggest hold up with abstraction layers though, it is the idea that your feature set must be constrained to the lowest common feature set among all of the database systems your software will support. One example is a feature that I enjoy in both PostgreSQL and MySQL, regular expression support. If those were the only two databases that I wanted to support, that would be fine, but as far as I know MS SQL and DB2 don’t support it and Oracle only started supporting it in recent versions. Another example is support for Sub-Selects. Most of the big databases support it, but MySQL doesn’t support in their current production version (although their beta version 4.1 does), so making use of that feature is out if you want to be able to support MySQL as a current database back end.
The summary version: a true database abstraction layer will take a speed hit (ok, livable), require you to not write SQL but a mix of functions plus partial SQL (I don’t really like that, but still livable in some cases) and restrict you to the lowest common feature set among all of the database systems you want to support (this can be the real killer). My feeling is that for many projects (not all) using a true database abstraction layer simply isn’t worth it. You might ask then, under what situations is it worth it. An example of when using an abstraction could be the right decision is something like PHPLens, a tool that derives a lot of it’s usefulness by being able to be used against several different database servers. On the flip side of that I think that most ‘in house’ projects probably don’t need to use an abstraction layer, mostly because of the limited feature set requirement.
One of the things that many people confuse with an abstraction layer is something that we’ll call a ‘Database Access Layer’. A database access layer is a consistent way to access your database resources, but makes no attempt to abstract away the differences between database systems. This is the approach that I personally have taken on most of my projects (no sense in trying to hide my preferences). This technique usually provides a simple class (or set of functions) that are always the same no matter which database you are connecting to, but leaves it up to you to know how to correctly deal with the features of your particular database. This means that if you switch from Oracle to MySQL you’ll have to change your SQL queries to deal with that, but the functions (or class) that you use will remain the same. The benefit there is only having to learn one set of functions to access your database, even though actually moving from one database to another will require knowledge of each database’s feature set and how to properly use it.
Virtually everyone seems to agree that an access layer is a good idea, but they tend to call it an abstraction layer, which then confuses the whole discussion because now the conversation turns to the differences between an access layer and an abstraction layer. Hopefully this post has cleared up some of that.
This whole thing started with a post from Jeremy complaining about abstraction layers which in turn started with a two year old SitePoint post where the author ranted about: how smart he was, how stupid template systems are, how MySQL is a joke (why do people insist on getting on that wagon again and again?) and how database abstraction layers are the only way to go. Jeremy points out what is generally obvious to most people, moving from one database system to another is not easy. I would add to that that it is not easy unless you are willing to strictly abide by the lowest common feature set rule and even then it isn’t a cake walk.
In the end, do what works for you and your project. At a minimum using an access layer just makes plain sense. Deciding to use an abstraction layer will take a lot more research and evaluation to determine if it will actually meet your goals without out restricting your feature set beyond what you are comfortable with.
Update (9:06 am 9 Jul 2004): Post over at PHP Everywhere in response to Jeremy’s post. This post is a perfect example of confusing an abstraction layer with an access layer. They are not the same thing. Trying to use the term abstraction to mean all things to all people just causes more confusion.
Update (10:46 am 9 Jul 2004): Lambda’s post about Jeremy’s post. Basically falls into the same trap that PHP Everywhere did, labeling everything that provides any sort of layer to database functions an abstraction. One more time; abstraction layer: hides all of the features and uniqueness of your database, access layer: provides a consistent set of functions to access your database without hiding any features.
Update (12:10 pm 9 Jul 2004): RevJim’s response to Jeremy and PHP Everywhere, looks like he gets the idea of what an abstraction layer is and how it is different from the other approaches.