Categories
Posts

Database Indexes on Domain Names

As part of a separate conversation Matt suggested something I had not given much thought to before. Say you have a database column that is used to store domain names. Something as simple as `domain` varchar(255) NOT NULL.

Of course you’ll want to search all of that data. Perhaps you want to lookup all of the sub-domains for example.com. That query might look like

[sourcecode lang=”sql”]
SELECT * FROM table_name WHERE domain LIKE ‘%.example.com’
[/sourcecode]

Having a wildcard ( % ) at the front of a LIKE comparison generally results in poor performance. Depending on the database you may find that either it can’t make use of an index on that column, or it must do a full index scan in order to find the right entry. To make better use of an index the wildcard should be at the end of the string, which we could do if we reverse the domain name ordering:

[sourcecode lang=”sql”]
SELECT * FROM table_name WHERE domain LIKE ‘com.example.%’
[/sourcecode]

So that’s what we do. This of course brings up the question of how to correctly reverse the domain name parts. In PHP this can be done with a combination of explode(), array_reverse(), and implode():

[sourcecode lang=”php”]
$domain = ‘joseph.example.com’;
$reverse_domain = implode( ‘.’, array_reverse( explode( ‘.’, $domain ) ) );
[/sourcecode]

That turns joseph.example.com into com.example.joseph. The reversed domain name gets stored in the database and that allows us to use the more efficient database query.

6 replies on “Database Indexes on Domain Names”

That’s a pretty good idea, though it’s an old trick in the book. But tell me, what database would use an index for a ‘%foo’ query?

As far as I know, no index supports that and the only way to do that would be to go with either FULLTEXT search (in which case you’re stuck with MyISAM and the query would look different) or go for a separate solution, such as Solr or sphinx.

I’d need to go back and review query plans for different database systems, but I believe in some they will use the index. Now when I say “use the index” it’s often not in the way you think, they will do a full index scan. That’s not the same as doing an index lookup.

MySQL indexes are fairly primitive. For instance as far as I know there’s no support for functional indexes.

I believe only the first X characters in a VARCHAR are indexed, which explains why you may find that your query is more efficient reversed like it is. I would recommend doing some testing with very large domain names and many more records. Don’t forget the MySQL server’s index cache size is a factor, if the table has way more VARCHAR indexes than the cache can store then it will still do a full scan.

There definitely more tests that could be done in this area, but I think you’ll find the same general rule of LIKE ‘%blahblahblah’ being slower than LIKE ‘blahblahblah%’ will apply.

Leave a Reply

Your email address will not be published. Required fields are marked *