
mod_ping, Maybe I Should Have Called It PubSubHubbub

ReadWriteWeb has an article about Google potentially using PuSH to get updates into the search index. The idea sounded familiar, so I went hunting through some of my old posts and found this one from 2004: Apache Module Idea mod_ping.

Back then I had thought a lot about searching blog feeds. There were some services that offered blog feed searching, but they were all pretty bad. I wrote a review of the situation at the time in “Why Hasn’t Anyone Figured Out How To Do Feed Searches?” Keep in mind that Google’s blog search feature wouldn’t be announced for another year.

It seemed odd to me that blog feed search would be so bad given how strongly the blogging software community had embraced the idea of pinging updates. This led me to the idea of some sort of mod_ping for Apache that would send similar pings for any type of website update, not just blogs. Obviously I never took this idea anywhere (besides writing that post), and to be honest I hadn’t really revisited it much since. Search engines took a different route for update frequency, with features like sitemaps in 2006.
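For anyone who never looked at what those blog pings actually were, here is a rough sketch in Python of the XML-RPC payload a weblogUpdates.ping call carries, pointed at an arbitrary page rather than a blog. The function name, site name, and URL are made up for illustration, and nothing is sent over the network; this is not what mod_ping would have looked like in practice, just the shape of the message.

```python
import xmlrpc.client


def build_update_ping(site_name, page_url):
    """Build the XML-RPC request body for a weblogUpdates.ping call.

    build_update_ping is a hypothetical helper; it only serializes the
    call and does not contact any ping server.
    """
    return xmlrpc.client.dumps(
        (site_name, page_url),
        methodname="weblogUpdates.ping",
    )


if __name__ == "__main__":
    # Example values are placeholders, not a real site.
    print(build_update_ping("Example Site", "http://example.com/about.html"))
```

The payload is tiny: it says which site changed and where, and leaves it to the receiver to go fetch the new content.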

Fast forward to 2010 and we’ve got discussion of Google potentially using PubSubHubbub (PuSH) to subscribe to updates from every single web site out there. This brings up an interesting question, though. Since PuSH focuses on feed formats (RSS & Atom) for pings, what will pings from sites that don’t have feeds look like? Will the ping just contain the entire HTML output of the updated page? What about a diff (unified format of course!) between the new HTML and the previous HTML for a given page?
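To make the diff idea concrete, here is a minimal sketch using Python’s standard difflib module to produce a unified diff between two versions of a page. The helper name and the sample HTML are my own assumptions; how such a diff would actually be carried in a PuSH notification is an open question, not something the protocol defines.

```python
import difflib


def html_update_diff(old_html, new_html, url):
    """Return a unified diff between two versions of a page's HTML.

    html_update_diff is a hypothetical helper. An empty string means
    the two versions are identical, so there is nothing to ping about.
    """
    old_lines = old_html.splitlines(keepends=True)
    new_lines = new_html.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=url + " (previous)",
        tofile=url + " (current)",
    )
    return "".join(diff)


if __name__ == "__main__":
    # Sample content is invented for illustration.
    previous = "<h1>About</h1>\n<p>Old text.</p>\n"
    current = "<h1>About</h1>\n<p>New text.</p>\n"
    print(html_update_diff(previous, current, "http://example.com/about.html"))
```

For a small edit the diff is much smaller than the full page, which is the appeal. The catch is that the publisher has to keep the previous version around to compute it.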

Folks like Brett Slatkin have been thinking about this sort of thing on a deeper level than I ever did, so I’m curious to see where this goes.