Benchmarks: love them or hate them, they are here to stay. Though rarely (never?) perfect, they can be helpful but are, more often than not, misleading. For now though, I want to comment on just one aspect of benchmarks: dates. As a general rule, most people expect benchmark results to change over time because technology generally advances and gets better as time goes on. For my rant on this subject I’m going to pick on the SQLite Database Speed Comparison because it is such a great example of why dates are important to a benchmark (and I am still seeing it referenced, as recently as 31 Dec 2004).
For starters, there is no indication as to when this benchmark was performed. Was it December 2004? October 2003? May 2001? It doesn’t say. Now presumably this benchmark would (should?) carry more weight if it was fairly recent and less weight the older it gets. The only date I could find on the page was the last modified line at the bottom, 2004/10/10. This might lead one to believe that this benchmark was done in the very recent past. There are clues that lead me to believe this is not the case though, specifically the version numbers for the various database systems tested. While I haven’t tracked the SQLite releases very well, I’m reasonably familiar with when PostgreSQL and MySQL releases have happened. So I looked around to find out definitively when each version of the tested software was released. Fortunately each one has a changelog available, with dates:
SQLite 2.7.6: 25 Jan 2003
PostgreSQL 7.1.3: 15 Aug 2001
MySQL 3.23.41: 11 Aug 2001
The most obvious issue here is the huge range in dates. While tested versions of PostgreSQL and MySQL were released in the same month, the SQLite version is 17 months newer than either of the other software versions. At this point I would hazard a guess that this benchmark was probably done not long after January 2003, given the dates above. On this merit alone I pretty much ignore the numbers generated by this benchmark because of the huge gap in development time. We’re talking nearly a year and a half advantage given to SQLite over PostgreSQL and MySQL. Who would possibly think that this would provide any sort of meaningful comparison for someone looking into database benchmarks?
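To make the arithmetic concrete, here is a rough Python sketch that works out the gap from the release dates listed above. The helper names (`months_between`, `release_gaps`) are my own for illustration, and "months" here means whole calendar months, rounded down.

```python
from datetime import date

# Release dates as listed in the changelog summary above.
RELEASES = {
    "SQLite 2.7.6": date(2003, 1, 25),
    "PostgreSQL 7.1.3": date(2001, 8, 15),
    "MySQL 3.23.41": date(2001, 8, 11),
}


def months_between(older: date, newer: date) -> int:
    """Whole calendar months from `older` to `newer`, rounded down."""
    months = (newer.year - older.year) * 12 + (newer.month - older.month)
    if newer.day < older.day:
        months -= 1  # the final month is not yet complete
    return months


def release_gaps(releases: dict) -> dict:
    """Map each older release to its age in months relative to the newest one."""
    newest_name = max(releases, key=releases.get)
    newest = releases[newest_name]
    return {
        name: months_between(released, newest)
        for name, released in releases.items()
        if name != newest_name
    }


gaps = release_gaps(RELEASES)
for name, months in gaps.items():
    print(f"{name}: {months} months older than the newest release tested")
```

Both PostgreSQL and MySQL come out 17 months behind the SQLite release, which is the "nearly a year and a half" head start mentioned above.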
Just by way of additional information, by January of 2003, MySQL had released version 4.0.10. The MySQL 4.0.x branch was given the ‘Production’ tag two months later in March. Comparing against 3.23.55 might also have been reasonable, since it also came out in January 2003. PostgreSQL released 7.3.2 on 4 February 2003, so they could have used 7.3.1 from December of 2002.
This benchmark is now somewhere around two years old, making it almost useless for anyone looking for meaningful information today. Of course, without a published date there is no way of knowing this for sure. Unless you did a little bit of homework and looked up when each software version was released, you could be led to believe that this test was done three weeks ago. Another important date feature is indicating when the tested version of each software was released. Normally I might not include that as a requirement because it should be reasonable to assume that the most recent production-level code available was the one being tested. In this case, however, that is clearly not what happened, so that fact needs to be disclosed.
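As a sketch of what that disclosure could look like, here is a small Python example that renders a dated header for a benchmark report. Everything in it is hypothetical: the class names are mine, and the run date of 1 February 2003 is a made-up placeholder, since the real run date was never published. Only the release dates come from the changelogs cited above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TestedSoftware:
    name: str
    version: str
    released: date


@dataclass(frozen=True)
class BenchmarkHeader:
    run_date: date
    software: tuple

    def render(self) -> str:
        """Return a plain-text header stating when the run happened and how old each release was."""
        lines = [f"Benchmark run: {self.run_date.isoformat()}"]
        for item in self.software:
            age_days = (self.run_date - item.released).days
            lines.append(
                f"  {item.name} {item.version}, released "
                f"{item.released.isoformat()} ({age_days} days before the run)"
            )
        return "\n".join(lines)


# Placeholder run date: an assumption for illustration, not the actual date.
header = BenchmarkHeader(
    run_date=date(2003, 2, 1),
    software=(
        TestedSoftware("SQLite", "2.7.6", date(2003, 1, 25)),
        TestedSoftware("PostgreSQL", "7.1.3", date(2001, 8, 15)),
        TestedSoftware("MySQL", "3.23.41", date(2001, 8, 11)),
    ),
)
report = header.render()
print(report)
```

A few lines like these at the top of a benchmark page would let a reader judge its age at a glance instead of digging through changelogs.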
I’m not picking on SQLite as a product, just using their published benchmark as an example of how excluding date information is a fatal flaw in their document. Notice I did not discuss any of the results or numbers, because in this case once I found out the pertinent dates I’m able to throw out pretty much the entire benchmark because of its age.
7 replies on “Benchmarks And Dates”
You’re very right. I looked for a date on that comparison, but couldn’t find one. The reason I still went ahead with the post is because:
(1) The MySQL version is still in heavy, heavy use. 3.23 was the last version before 4.x and most hosts still use it, I hear due to some licensing changes between 3.x and 4.x.
(2) Postgres hasn’t gone far — it’s only up to 7.4.6 as I write this. (8.0 is at RC1)
(3) While SQLite has pushed into 3.x, its numbers are so drastically better than MySQL’s that I was more interested in the difference between them than in actual, measurable figures.
It’s a trade-off. I was more interested in showing how fast open-source, *nix-ish databases were in general, as opposed to the proprietary alternatives. I should have emphasized that more.
Deane
Older versions of MySQL are still in heavy use, but benchmarks are generally for those looking to deploy something new (either for replacement or brand new). If you were doing this today I’d be hard pressed to believe that anyone would avoid MySQL 4.x in favor of MySQL 3.x.
Judging how much something has changed by the numerical difference in version numbers is extremely limited, at best. To be honest I’d consider that a lame excuse not to look at new versions of software, PostgreSQL or otherwise. Do you consider that Linux hasn’t made many changes because it hasn’t hit 3.0 yet over 2.4 or 2.6?
The whole point about dates and benchmarks is to emphasize that the numbers are relative to a certain point in time. Without a date you don’t know when that point in time is. Sure, SQLite has the best numbers in this benchmark, but that wasn’t my point. My point was that they didn’t tell you that they were comparing versions of MySQL and PostgreSQL that were a year and a half older, and that the benchmark is at least two years old.
Don’t you think that in two years it is possible that these numbers may have changed some since then? I’m not claiming that PostgreSQL or MySQL will now beat the pants off of SQLite, simply that the numbers are likely to change (for good or bad) over time and in this industry two years is a very long time.
Actually, MySQL 3.23 will stay in heavy use due to licensing.
My understanding of it (this may be wrong…) is that the 3.23 libraries were LGPL, meaning you could still use them in something and not GPL the whole thing. However, the 4.x libraries are full GPL, meaning you may have to GPL your whole app.
3.23 still gets deployed a lot, believe me. All your other concerns are perfectly valid, however.
Licensing is certainly a valid reason for picking one version over another (unfortunately). I suppose this is one of the reasons I generally prefer the BSD license over GPL/LGPL.
I downloaded the MySQL 3.23.58 (latest 3.23 version from MySQL.com) to see what license it had. The README file starts with
“This is a release of MySQL, a GPL (free) SQL database server…”
The file COPYING is a copy of the GPL. Still wanting to make sure, I looked at the docs that come with 3.23.58. Section 1.4.3, MySQL Licenses, says:
“The MySQL software is released under the GNU General Public License (GPL), which is probably the best known Open Source license.”
Based on this info I’d say that 3.23.x versions of MySQL are licensed under the GPL. MySQL seems to interpret the GPL in ways that not everyone agrees with, so there may still be issues, but they don’t appear to have anything to do with GPL vs. LGPL.
I also downloaded MySQL 4.0.22 (latest 4.0.x version) from MySQL.com. I looked in all of the same places I mentioned above for 3.23.58. The text in each is virtually identical, all indicating the GPL as the license being used.
Given that there appears to be no difference in licensing between 3.23.x and 4.0.x of MySQL I see no reason anyone would stick with 3.23.x for new installs. MySQL may interpret its licensing in strange ways, but it is still claiming GPL in both versions.
Huh. I think it’s in the client libraries where they were LGPLed? I swear I heard this somewhere.
I just went over to Plesk and logged into the demo of “Plesk 7.5 Reloaded” — their latest and greatest. Sure enough, MySQL 3.23.56.
I’m tellin’ you, there’s some reason a lot of companies are staying with 3.23. I’d be interested in remembering what it is…
IIRC the problem with my$ql stems from the change in the client libraries from lgpl to gpl, meaning that (as my$ql once put it) any program connecting to mysql using the libraries has to either be gpl or get a commercial license. Yeah, that tends to scare a lot of people away.
As for PostgreSQL… “Postgres hasn’t gone far – it’s only up to 7.4.6 as I write this.”… egads!! That couldn’t be farther from the truth. With PostgreSQL, each bump of the secondary version number is essentially an order-of-magnitude change, so going from 7.1.x to 7.4.x is like going from linux 2.0 -> 2.6, quite a change indeed. I won’t claim that we beat the pants off sqlite in the sqlite benchmarks by any means, but we certainly beat the pants off the 7.1 results seen in the test (someone actually attempted this a while back; with some extensive googling you might be able to find the information).
You may be missing the point of SQLite a little bit. There are reasons the benchmarks don’t matter (not to mention the date…).
SQLite is not client/server based, so the inner workings of the product are entirely different from the others. If you want to get down to the nitty-gritty, comparing them at all is pointless. For example: where’s the benchmark for network latency? Since SQLite has no network latency, you are going to see a significant speed increase. Also, SQLite runs in a single process while the others need a minimum of 2 processes running, so that is going to be another hit.
The benchmark is there for people who don’t know much about databases. Like a technical manager or something like that ;). So when he’s surfing the net looking to save money, and says “Why don’t we use that?!?!” blah blah blah… We’re still on MySQL.
I guess what I’m trying to say in a nutshell is that comparing SQLite to MySQL is like comparing an XML database API to MySQL. SQLite is getting better but is still not a database yet.
i.e.:
No column type constraints.
Text can be put in an int column.
They are just now getting around to adding ALTER TABLE statements that allow you to adjust columns.
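The “text in an int column” point is easy to check with a minimal sketch using Python’s built-in sqlite3 module. One assumption to flag loudly: that module links against SQLite 3, so this shows how an ordinary (non-STRICT) table behaves in present-day SQLite, not the 2.7.6 release used in the benchmark. The table and column names are made up for the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

# The column is declared INTEGER, but SQLite accepts arbitrary text.
conn.execute("INSERT INTO t VALUES (?)", ("not a number",))
# Text that looks like an integer is converted by the column's type affinity.
conn.execute("INSERT INTO t VALUES (?)", ("42",))

rows = conn.execute("SELECT n, typeof(n) FROM t ORDER BY rowid").fetchall()
for value, storage_type in rows:
    print(repr(value), storage_type)

conn.close()
```

The first row is stored as text despite the INTEGER declaration, while the second is stored as an integer, which is the loose typing the list above is complaining about.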
Don’t get me wrong, I think it’s a slick package, and with time it may mature very well.
I’m just rambling… but that benchmark in the first place is a joke.