josephscott – Page 467

It seems that most people (myself included) who have dealt with database design long enough learn to become good friends with the concept of sequences. For a brief review take a look at the PostgreSQL docs for CREATE SEQUENCE and sequence functions. This concept is implemented in different ways (MySQL: AUTO_INCREMENT, MS SQL Server: Identity, DB2: Identity and Sequence and Oracle: Sequence), but supports the same general concept: automatically generate a series of numbers, usually for one specific field in one specific table in on specific database. This sort of thing is very handy and is often used as a primary key because it is guaranteed to be unique (at least for the table that it is being used in).

Being able to generate unique numbers (generally sequential, but that isn’t guaranteed to be the case by any means) also has other uses. I’ve seen it in used in some cases for order and invoice numbers. When a new order is created you need some way to uniquely identify in the database, often times a sequence works just fine. Since the database is using it in this way you may as well provide it to your customer so that if they call back later they have a way to quickly reference their previous order. The same idea goes for pretty much anything that you want to go back and uniquely identify later.

But life in sequence land isn’t perfect. Database replication often runs into complications when making use of sequences. Depending on the replication scheme this can vary in severity. Another issue is uniqueness across something broader than a single database. If you have satellite databases that sync up to one master database each night then you have to avoid collisions between all of the satellites. There are ways of solving that problem and still using sequences, but part of the reason for using sequences in the first place is their simplicity and ease of use.

With all of this in mind some have recommended using Universally Unique Identifiers (UUID) instead of sequences. Instead of an integer, a UUID is a 37 character string and is suppose to be unique for everyone for anything. An example of a UUID is: 118edde0-fb3e-11d9-8cd6-0800200c9a66. There are different versions of UUID based on how the UUID is generated, using such things as the MAC address, time stamp, random seed, name, name space, MD5 and SHA-1.

So having a universally unique identifier gets around the collision issues for things like syncing and replication, but it doesn’t really work that well for the other examples. Could you imagine having to provide a 37 character string as an order number to a customer? And you thought tracking numbers for UPS, FedEx and DHL were bad. This isn’t so bad for things like entries in the del.icio.us database and other identifiers that you don’t expect to be written down though.

On a more theoretical note I’ve wondered if there any performance difference in replacing the use of sequences (integers) with UUIDs (char(37)). This may likely vary between database implementations, or even indexing methods. One these days I may devise a few tests of this on PostgreSQL and MySQL.

As I wrote this in a rather brain dump sort of format another thought occurred to me, sequences are database generated data. This makes it different from pretty much all other data (except maybe default values). Maybe that is the pattern that I should be focused on here, data should be generated from outside sources and not the database itself.