As part of a separate conversation Matt suggested something I had not given much thought to before. Say you have a database column that is used to store domain names. Something as simple as `domain` varchar(255) NOT NULL
.
Of course you’ll want to search all of that data. Perhaps you want to lookup all of the sub-domains for example.com. That query might look like
[sourcecode lang=”sql”]
SELECT * FROM table_name WHERE domain LIKE ‘%.example.com’
[/sourcecode]
Having a wildcard ( % ) at the front of a LIKE comparison generally results in poor performance. Depending on the database you may find that either it can’t make use of an index on that column, or it must do a full index scan in order to find the right entry. To make better use of an index the wildcard should be at the end of the string, which we could do if we reverse the domain name ordering:
[sourcecode lang=”sql”]
SELECT * FROM table_name WHERE domain LIKE ‘com.example.%’
[/sourcecode]
So that’s what we do. This of course brings up the question of how to correctly reverse the domain name parts. In PHP this can be done with a combination of explode(), array_reverse(), and implode():
[sourcecode lang=”php”]
$domain = ‘joseph.example.com’;
$reverse_domain = implode( ‘.’, array_reverse( explode( ‘.’, $domain ) ) );
[/sourcecode]
That turns joseph.example.com
into com.example.joseph
. The reversed domain name gets stored in the database and that allows us to use the more efficient database query.
6 replies on “Database Indexes on Domain Names”
That’s a pretty good idea, though it’s an old trick in the book. But tell me, what database would use an index for a ‘%foo’ query?
As far as I know, no index supports that and the only way to do that would be to go with either FULLTEXT search (in which case you’re stuck with MyISAM and the query would look different) or go for a separate solution, such as Solr or sphinx.
I’d need to go back and review query plans for different database systems, but I believe in some they will use the index. Now when I say “use the index” it’s often not in the way you think, they will do a full index scan. That’s not the same as doing an index lookup.
MySQL indexes are fairly primitive. For instance as far as I know there’s no support for functional indexes.
Yeah, either way all would be slow unless you use this solution or go more advanced.
As far as I remember, PostgreSQL supports functional indexes and MySQL definitely doesn’t.
I’d expect a full index scan to be faster than a full table scan in most cases, but yes, it will still be noticeably slower than an index lookup.
PostgreSQL supports functional indexes – http://www.postgresql.org/docs/8.4/static/sql-createindex.html – by allowing you to include an expression in CREATE INDEX.
I believe only the first X characters in a VARCHAR are indexed, which explains why you may find that your query is more efficient reversed like it is. I would recommend doing some testing with very large domain names and many more records. Don’t forget the MySQL server’s index cache size is a factor, if the table has way more VARCHAR indexes than the cache can store then it will still do a full scan.
There definitely more tests that could be done in this area, but I think you’ll find the same general rule of LIKE ‘%blahblahblah’ being slower than LIKE ‘blahblahblah%’ will apply.