Benchmarks, Not Enough Details

Once again there is another attempt at comparing MySQL vs. PostgreSQL. Like so many others before it, this benchmark falls prey to a classic mistake: not enough details. So let's go through their review, shall we?

Comparison
Operating System: All of the OSs listed for MySQL also apply to PostgreSQL. No explanation is given as to why they only listed two for PostgreSQL. PostgreSQL even has FAQ entries for some of these OSs, like AIX, HP-UX, IRIX, and Solaris.

Performance: I’ll go into more detail on this later; suffice it to say that this is a gross oversimplification.

Other APIs: I’m not sure why they describe MySQL with “Most of languages” yet mention specific languages for PostgreSQL. I suspect that virtually any language with MySQL support also has PostgreSQL support. I can’t say this for sure, of course, because I have not surveyed every single language that supports one or both of these databases, but you get the idea.

For the rest of the items they do a reasonable job, although they do leave out additional features that might be of interest, like domains, inheritance, sequences, etc.
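To give a flavor of what gets left out, here is a quick sketch of a PostgreSQL domain and sequence. The table and names are made up purely for illustration:

    -- A domain enforces a rule everywhere the type is used.
    CREATE DOMAIN us_postal_code AS text
        CHECK (VALUE ~ '^[0-9]{5}$');

    -- A sequence hands out unique ids.
    CREATE SEQUENCE order_id_seq;

    CREATE TABLE orders (
        id  integer DEFAULT nextval('order_id_seq'),
        zip us_postal_code
    );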

Summary
There are some features that are extremely handy even for fairly small databases (like views). Look, MySQL does a pretty good job (aside from some issues) at what it is intended to do; just stop making excuses for it when you discover that it doesn’t fare well feature-wise.

Benchmarks
There is no indication of how much tuning was done for either MySQL or PostgreSQL, if any was done at all.
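For what it’s worth, even minimal tuning usually means changing a couple of well-known settings away from their small defaults. The values below are illustrative placeholders, not recommendations, and the exact unit syntax varies by version:

    # postgresql.conf -- the default buffer cache is tiny
    shared_buffers = 128MB

    # my.cnf (InnoDB) -- the buffer pool is the big one
    innodb_buffer_pool_size = 256M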

Data Set & Results: There is no discussion about how many simultaneous connections/users there are. From the look of things all of these tests were done with a single connection/user. If that is the case then all of the inserts done in this should be thrown out and redone using MySQL’s LOAD DATA INFILE command and PostgreSQL’s COPY command. Doing bulk imports like this is always going to perform poorly using straight inserts. Not to mention that if you are only interested with single connection/user situations then you may well include things like MS Access for your benchmarks.

Another missing component is what type of table is being used in MySQL. Unless you specify a table type, MySQL will use the default, which is MyISAM. If the MySQL side of these tests was run using MyISAM tables, then this whole test needs to be thrown out (inserts, queries, and deletes) and redone using the InnoDB table type.
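The difference is a single clause on the table definition. The table here is made up, and note that older MySQL versions spell the clause TYPE= rather than ENGINE=:

    -- With no explicit type, MySQL creates a MyISAM table:
    CREATE TABLE accounts (id INT, balance INT);

    -- Ask for InnoDB explicitly to get transactions and row-level locking:
    CREATE TABLE accounts2 (id INT, balance INT) ENGINE=InnoDB;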

Until these sorts of issues are at least addressed, all of the results should be ignored; there is simply not enough information to gain anything remotely useful from those numbers at this point.

Conclusion
Again, if you are interested in making bulk imports happen very quickly, at least use the right tool for the job.

Hardware & Software
Kudos for mentioning hardware and software details.

My Conclusion
Doing good and meaningful benchmarks is hard work; taking a vastly oversimplified approach like the one done here is not really helpful to anyone. There were no goals outlined as to what sort of usage they wanted to test against (although it looks like they were interested in single-user, heavy-insert models), and there was insufficient discussion and detail about how they were going to mimic that model as accurately as possible. Of the discussion that was provided, most of it revolved around how to minimize the impact of rapid inserts wrapped in transactions for PostgreSQL. That discussion is wasted because they weren’t using the right tool for the job.

I was pointed to the review by a blog entry at SitePoint, which now has several comments. There is something of a thread revolving around an issue with MySQL where, if you try to insert a 300-character string into a field that only supports 250 characters, MySQL will simply truncate your data without throwing an error. This has been brought up before, and it is simply wrong; the MySQL folks need to just fix this and move on instead of trying to find different ways to justify trashing your data when it is inserted. A counterpoint is brought up that good programmers always validate their data before attempting an insert, in this case making sure that your string is less than or equal to 250 characters. The sad thing about this stance is that there is some truth to it, but not in the sense that it is being used. It’s true that you should check for obvious problems in your data before you insert it so that you can give meaningful errors back to the user; however, that doesn’t change the fact that what MySQL is doing is corrupting data on insert.
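To make the complaint concrete, here is roughly what the behavior looks like (table made up for the example):

    CREATE TABLE t (name VARCHAR(250));
    INSERT INTO t VALUES (REPEAT('x', 300));

    -- MySQL's default behavior: the row goes in with name silently
    -- truncated to 250 characters (a warning at most, never an error).
    -- PostgreSQL instead rejects the insert:
    --   ERROR: value too long for type character varying(250)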

This disagreement reminds me of the folks who added client-side JavaScript error checking for form input, which let them provide meaningful warnings and errors without having to process the form every time. The security folks were quick to point out that the same checks still had to be done on the server side because client-side JavaScript checks are easy to bypass. Client-side checks are a great thing for users, but they are no excuse to skip those same checks on the server side. Checking the length of your strings is a good idea for providing good user feedback, but it is no excuse for letting your database corrupt data.
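The server-side equivalent of that JavaScript check is a limit the database itself enforces. In PostgreSQL a plain VARCHAR(250) already does this, and you can spell the rule out explicitly with a constraint (table made up for the example):

    CREATE TABLE comments (
        body VARCHAR(250)   -- PostgreSQL rejects longer values outright
    );

    -- An explicit constraint, if you want the rule to be visible:
    ALTER TABLE comments
        ADD CONSTRAINT body_length CHECK (char_length(body) <= 250);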

Apply the Microsoft test here: if MS SQL Server did this sort of thing, would you still be saying the same thing?