Categories
josephscott

UUID vs. Sequences

It seems that most people (myself included) who have dealt with database design long enough learn to become good friends with the concept of sequences. For a brief review take a look at the PostgreSQL docs for CREATE SEQUENCE and sequence functions. This concept is implemented in different ways (MySQL: AUTO_INCREMENT, MS SQL Server: Identity, DB2: Identity and Sequence and Oracle: Sequence), but supports the same general concept: automatically generate a series of numbers, usually for one specific field in one specific table in on specific database. This sort of thing is very handy and is often used as a primary key because it is guaranteed to be unique (at least for the table that it is being used in).

Being able to generate unique numbers (generally sequential, but that isn’t guaranteed to be the case by any means) also has other uses. I’ve seen it in used in some cases for order and invoice numbers. When a new order is created you need some way to uniquely identify in the database, often times a sequence works just fine. Since the database is using it in this way you may as well provide it to your customer so that if they call back later they have a way to quickly reference their previous order. The same idea goes for pretty much anything that you want to go back and uniquely identify later.

But life in sequence land isn’t perfect. Database replication often runs into complications when making use of sequences. Depending on the replication scheme this can vary in severity. Another issue is uniqueness across something broader than a single database. If you have satellite databases that sync up to one master database each night then you have to avoid collisions between all of the satellites. There are ways of solving that problem and still using sequences, but part of the reason for using sequences in the first place is their simplicity and ease of use.

With all of this in mind some have recommended using Universally Unique Identifiers (UUID) instead of sequences. Instead of an integer, a UUID is a 37 character string and is suppose to be unique for everyone for anything. An example of a UUID is: 118edde0-fb3e-11d9-8cd6-0800200c9a66. There are different versions of UUID based on how the UUID is generated, using such things as the MAC address, time stamp, random seed, name, name space, MD5 and SHA-1.

So having a universally unique identifier gets around the collision issues for things like syncing and replication, but it doesn’t really work that well for the other examples. Could you imagine having to provide a 37 character string as an order number to a customer? And you thought tracking numbers for UPS, FedEx and DHL were bad. This isn’t so bad for things like entries in the del.icio.us database and other identifiers that you don’t expect to be written down though.

On a more theoretical note I’ve wondered if there any performance difference in replacing the use of sequences (integers) with UUIDs (char(37)). This may likely vary between database implementations, or even indexing methods. One these days I may devise a few tests of this on PostgreSQL and MySQL.

As I wrote this in a rather brain dump sort of format another thought occurred to me, sequences are database generated data. This makes it different from pretty much all other data (except maybe default values). Maybe that is the pattern that I should be focused on here, data should be generated from outside sources and not the database itself.

5 replies on “UUID vs. Sequences”

I argued here once…

http://www.gadgetopia.com/post/3061

…that databases should generate a sequence numbers across tables. There should be a system table with a unique sequence applied to every new record in any table. By querying this table, I would get the (1) ID, and (2) the table. So a single number could identify a single record anywhere in the database.

The idea being that I could execute…

“SELECT [number]”

…and get back a record from somewhere in the database, without knowing what table it was coming from.

This article doesn’t really say much re. UUID vs sequences as row identifiers. We’re faced with an issue of InnoDB tables locking because of the auto_increment columns and are seriously considering using UUIDs in their place. What about table fragmentation (assuming the UUID is the primary key and is clustered)?

@David –

There are a lot of variables that come into play there. InnoDB does some replication of the primary key on other keys, so the increased over head could be more pain than it’s worth.

Your best bet is to try and it out and see if it’s worth the trouble to change or not.

In most dbs, UUID is a 128 bit datatype. If you were to store it as characters, it would take up 37 bytes, but the binary representation is more compact and takes up 16 bytes. The reason UUID’s are so unique is that there is a 1:3.40282367 × 10^(38) chance of getting of the same UUID if you have a perfectly random random number generator, which you don’t but even 1% of those odds means you’ll never see the same UUID twice, or so they say. Since a lot of websites use UUID’s for pages I tried googling 500K UUIDS in my database and none matched.
I am using UUID’s for my db application because I need uniqueness amongst several independent machines. There is no way to do this without a central server model, so I’m using UUIDs for primary keys. Obviously there is no way to ask for next record or previous record but I don’t need that functionality.

Leave a Reply

Your email address will not be published. Required fields are marked *