Spam is an unpleasant problem. It has managed to sink its claws into Usenet and email to the point where I’ve more than once wondered why I still bother. In the last couple of years spam has entered the world of blogs. One of the nice things about blogs is their increased interconnectivity (trackbacks, pingbacks, comments, feeds, etc.), but these very same features are what spammers use to “advertise” their wares. Now that we’ve been through this fight a few times (and pretty much lost every time), there has been a lot of discussion about how to solve the problem before our blogs suffer the same fate as Usenet.
Jeremy feels that comment spam could be fixed by search engines. The idea is that spammers hit blog comments in an effort to make themselves more visible to search engines and rank higher in the results. I suppose on one level this is true; I’m sure spammers are thrilled with that result because it puts more eyeballs on their “advertisements”. But just as filtering spam won’t stop it from being sent to you, fixing PageRank and other search engine calculations won’t stop spammers from hitting blogs, and hitting them hard.
Email spam has already proven this to be true. Even if 75% of all email accounts filtered spam with 100% accuracy (which they don’t, by the way), spam would still be sent to everyone, including those who filter it. All of this brings us back to asking why. I suppose there are many answers to that question, but in the end I believe the simplest answer is: because they can. As long as the ability to spam email accounts exists, there will be those willing to do it. I believe the same will hold true for comment spam: as long as it can be done, it will be, even if it doesn’t help the spammers’ standing in search engines.
In the world of blogs, though, comment spam is only one part of the problem. I subscribe to a PubSub search for PostgreSQL. For the most part this is nice, because whenever PostgreSQL gets mentioned in a feed that PubSub tracks, it shows up in my subscription feed. The service suffers from feed spam, however, because PubSub can’t tell the difference between an entry written by a person who is genuinely writing about PostgreSQL (or at least uses the word in the entry) and one written by a bot (or person) who produces a spam entry and throws in the term PostgreSQL so that it will show up in places like my PubSub search. It’s hard for me to really blame PubSub, since they are doing exactly what I asked of them, but it is annoying nonetheless.
Following this path, if everything above is true, then how do we stop blog spam (whether in comments, feeds, trackbacks, etc.)? For now I believe there are ways we can try to maneuver around it, but as long as it is still possible it will continue. So if you are looking for techniques to fight blog spam, choose methods that prevent spam from ever successfully entering your blog; otherwise you will still have to deal with the stuff.