While trying to get my thoughts down on feed searching, a thought came to me about one of the problems with traditional web search engines: figuring out when pages are updated. For dynamic content like feeds, this problem has been addressed by using XML-RPC to ping sites to let them know there is new information available. So why couldn’t this be used for static pages that are updated via FTP, WebDAV or FrontPage? Adding pings to those protocols isn’t realistic, but the web server would be able to determine when content changes pretty easily.

Enter the idea of an Apache module, mod_ping. This module would have a small database (dbm perhaps?) that simply stores the time stamps of the files being served out by Apache. If the time stamp on a file matches the most recent one in the database for that file, mod_ping would do nothing. If the time stamp on the file is newer than the one in the database, mod_ping would record the new time stamp and ping a configured list of services (like Google and Yahoo), letting them know that the page has been updated and that they should come along and reindex it when they get the chance.
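The check described above can be sketched in a few lines. This is only an illustration in Python, not an Apache module: the function name `check_and_ping`, the `ping` callback, and the choice to ping a file the first time it is seen are all assumptions made for the example, and a real module would hook into Apache's request handling instead.

```python
import dbm
import os


def check_and_ping(db_path, file_path, url, ping):
    """Ping once when file_path is newer than its recorded time stamp.

    Hypothetical helper: `ping` is any callable taking the page URL,
    for example a wrapper around an XML-RPC call to a search service.
    Returns True if a ping was sent, False if nothing had changed.
    """
    mtime = os.stat(file_path).st_mtime_ns
    key = file_path.encode("utf-8")

    with dbm.open(db_path, "c") as db:  # "c" creates the database if needed
        stored = db.get(key)
        if stored is not None and int(stored) >= mtime:
            return False  # time stamp matches the database: do nothing
        # Assumption: a file with no entry yet counts as "updated".
        db[key] = str(mtime).encode("ascii")

    ping(url)
    return True
```

Passing the ping in as a callable keeps the time stamp logic separate from whatever protocol the search services would eventually accept, which is still an open question.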
Configuration of such a beast would have to be pretty flexible. Once the module is enabled in Apache, mod_ping could be configured in individual .htaccess files, perhaps even with regexes so that you could easily include or exclude certain types of files or whole directories. It would also need configuration directives to specify which URLs to ping.
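To make the include/exclude idea concrete, here is a small Python sketch of how such rules might be evaluated. The class name `PingRules`, its fields, and the rule that exclusions win over inclusions are assumptions for illustration; no actual mod_ping directives exist to model.

```python
import re


class PingRules:
    """Hypothetical per-directory settings: regex filters plus ping targets."""

    def __init__(self, include=(), exclude=(), ping_urls=()):
        self.include = [re.compile(p) for p in include]
        self.exclude = [re.compile(p) for p in exclude]
        self.ping_urls = list(ping_urls)

    def should_ping(self, path):
        # Assumption: an exclude match always wins over an include match.
        if any(r.search(path) for r in self.exclude):
            return False
        # With no include patterns, every non-excluded path qualifies.
        return not self.include or any(r.search(path) for r in self.include)


rules = PingRules(
    include=[r"\.html?$"],          # only HTML pages
    exclude=[r"^/private/"],        # never announce this directory
    ping_urls=["http://ping.example.com/RPC2"],  # placeholder endpoint
)
```

With these rules, `/index.html` would trigger a ping, while `/private/notes.html` and `/logo.png` would not.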
There are two hurdles to mod_ping right now. The first is that it only exists in my head. The second is that Google and Yahoo would have to be convinced to support ping requests. Perhaps it should be a for-pay service, with the benefit to customers being that their pages are indexed in a timely manner. For that matter, this could be done with blogs too, pinging Google and Yahoo when a brand new page comes into being because of a new post. It’s all about getting web pages indexed faster so that searches are more meaningful.
One reply on “Apache Module Idea mod_ping”
Check out mod_pubsub, sounds similar to what you’re talking about, although it does hold open a socket connection, which can get to be a mess with many subscribers.