(Warning, I tend to use UUID instead of GUID. For purposes of this post consider them interchangeable.)
I’ve already mentioned the announcement of FeedLounge making the move from MySQL to PostgreSQL last month, but the discussion is still on going. If you haven’t yet, go read all of the comments on the announcement. This has to be some of the most productive and level headed (no flames yet) set of blog comments I’ve seen in a long time. For those of you who are coming late to this story let me bring you up to speed:
FeedLounge started with MySQL as their database, with the MyISAM table types. They ran into problems and so moved to the InnoDB table type in MySQL. Still not completely happy they did more research and decided to try out PostgreSQL for their database needs. Their tests showed that MySQL InnoDB database was 34GB, in PostgreSQL it was 9.6GB. Restores in MySQL took more than 24 hours, in PostgreSQL it was less than 5 hours. These numbers resulted in better performance and reduced time.
It has been more than two weeks since the initial announcement and there are still new comments being added. As of this morning there are 34 35 comments. Some of the comments have come from people who are very knowledgable about MySQL and the size of InnoDB indexes became the focus for why InnoDB was so much bigger than PostgreSQL. Using InnoDB with UUIDs as the primary key was identified as a problem, especially in the area of performance.
Things continued to get more interesting when Heikki Tuuri (the creator of InnoDB) left a comment confirming the size problems in InnoDB when using a UUID as a primary key. It seems that using a UUID in this way causes a couple of problems, the first being that InnoDB wants to keep things in index order, which is great for indexes on sequential numbers but bad for seemingly random data (PostgreSQL can do the same thing with cluster). The next problem is that secondary indexes in the same table get copies of the primary key index to making finding things faster. This, at least in part, was what caused the size difference.
Heikki’s solution (and others have mentioned this) is to use an auto_increment field as the primary key and create a unique secondary index on the UUID field. This would make the primary key index size much smaller and still allow InnoDB to do index lookups on the UUID field. That would seem to solve the problem, but there hasn’t been any feedback yet to confirm if it fixed all of the differences between InnoDB and PostgreSQL.
There is something about having to add an extra field to my table to make the database perform better that doesn’t seem quite right though. I don’t have anything concrete that I can point to showing that this isn’t a good idea, but it doesn’t feel right. For those who are running into these same types of problems though, it might be something to try, at least until you get a chance to try out PostgreSQL.