Spam filters

I notice that Larry MacDonald has had some issues with the Akismet spam filter, which has been over-zealous in filtering his comments on other people’s websites.

On this site, I have had to fetch three of his comments out of the spam filter.

Spam is really all about economics – you can produce so much of it at such a low cost that even with a microscopic payback rate, it is still profitable. It causes a lot of pollution on the internet, but with some Bayesian filtering you can weed out most of it while keeping the false positive rate acceptable.
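
To make the Bayesian filtering idea concrete, here is a minimal sketch of a naive Bayes word filter. It is a toy illustration only, not how Akismet or any particular product works; the class name, the training comments and the whitespace tokenizer are all made up for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesFilter:
    """Toy naive Bayes spam filter (illustrative, not a real product's method)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham".
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        # Assumes at least one spam and one ham example have been trained.
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        # Start from the prior log-odds of spam versus ham.
        log_odds = math.log(self.docs["spam"]) - math.log(self.docs["ham"])
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values())
            for token in tokenize(text):
                # Add-one (Laplace) smoothing so unseen words never give log(0).
                likelihood = (self.counts[label][token] + 1) / (total + vocab_size)
                log_odds += sign * math.log(likelihood)
        # Convert log-odds back to a probability.
        return 1 / (1 + math.exp(-log_odds))


spam_filter = NaiveBayesFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap watches buy now", "spam")
spam_filter.train("thanks for the thoughtful post on index funds", "ham")
spam_filter.train("i disagree with your point about dividends", "ham")

print(spam_filter.spam_probability("cheap pills now"))
print(spam_filter.spam_probability("thoughtful post about dividends"))
```

The first comment should score above 0.5 and the second below it. Where you set the cut-off is what trades missed spam against false positives.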

Spam is also about game theory – a cat-and-mouse type game. The new game is that spammers are trying to increase their “credibility scores” by making innocent-sounding comments and posts in the hopes that human operators will “approve” them and then they will be able to deliver their payloads more effectively elsewhere.

Akismet has been good at picking up comments like “Like your site, keep up the good work” and other such spam comments that were socially engineered to get site owners to keep them on their site. Filtering techniques that look at the IP address the comment came from, and at the name and/or email address submitted with it, are quite effective at weeding out these types of comments.

However, if filtering gets so aggressive that it begins to block legitimate comments (called false positives), it undermines the entire system.

Imagine a cellular phone network taking 5% of your incoming phone calls and/or text messages and not relaying them to you. You would consider this unacceptable. In the comment world, the acceptable false positive rate is likely higher, but for emails, it has to be around one in a thousand or lower in order for the system to be effective.
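
To put rough numbers on those two rates, here is a small sketch. The volume of 2,000 legitimate messages a month is a made-up figure purely for illustration; only the 5% and one-in-a-thousand rates come from the paragraph above.

```python
def expected_lost(messages, false_positive_rate):
    """Expected number of legitimate messages wrongly filtered out."""
    if messages < 0 or not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("messages must be >= 0 and the rate between 0 and 1")
    return messages * false_positive_rate


MONTHLY_MESSAGES = 2000  # hypothetical volume, not a measured figure

# The phone network example: 5% of legitimate messages never arrive.
print(expected_lost(MONTHLY_MESSAGES, 0.05))

# The email target: about one legitimate message in a thousand is lost.
print(expected_lost(MONTHLY_MESSAGES, 1 / 1000))
```

At that volume the difference is roughly a hundred lost messages a month versus about two, which is why a rate that might pass for blog comments would not pass for email.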

If spammers are able to increase the false positive rate, it will also be a victory for them since it undermines confidence in the spam detection system.

Unfortunately for Larry, it is likely that Akismet has somehow flagged his online signature as spam. I am not sure how that happened, but at least on this site, I have taken three of his comments out of the spam bucket.