SpamBayes

Shortly after giving up on the SAProxy anti-spam tool the other week, I found an alternative tool that is doing the job very nicely: SpamBayes. It’s an add-in for MS Outlook that acts as a Bayesian filter for your email. (Big, big thanks to Anders Jacobsen for pointing it out.)

Whereas SpamAssassin (and hence SAProxy) uses a fixed set of rules to weed out spam, SpamBayes uses statistical methods to analyse your incoming mail. (See the article “A Plan For Spam” by Paul Graham for an explanation of the technique.) When you first set it up, you show it a bunch of mail that is good (“ham”) and a bunch of mail that is known to be bad (“spam”). It uses this to construct a profiling database. Each new piece of mail is checked against this database to figure out how likely it is to be spam. If it fits the profile, SpamBayes will automatically toss it into a “Spam” folder for you.
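For the curious, here is a rough Python sketch of the general idea: count which words show up in ham and in spam, then combine the per-word evidence into a single score. This is not SpamBayes' own code or its exact maths. It is a simplified naive-Bayes toy, and the names `TinyBayes`, `tokenize`, `train` and `score`, the smoothing constants and the sample messages are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into a set of lowercase word-like tokens (a crude, assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class TinyBayes:
    """Toy Bayesian filter: a sketch of the technique, not SpamBayes itself."""

    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing each word
        self.ham_counts = Counter()   # number of ham messages containing each word
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Add one example message to the profiling database."""
        words = tokenize(text)
        if is_spam:
            self.spam_counts.update(words)
            self.n_spam += 1
        else:
            self.ham_counts.update(words)
            self.n_ham += 1

    def word_prob(self, word):
        """Estimate how spammy a single word is, smoothed so it is never exactly 0 or 1."""
        p_in_spam = (self.spam_counts[word] + 1) / (self.n_spam + 2)
        p_in_ham = (self.ham_counts[word] + 1) / (self.n_ham + 2)
        return p_in_spam / (p_in_spam + p_in_ham)

    def score(self, text):
        """Combine the per-word estimates into one spam probability between 0 and 1."""
        log_odds = 0.0
        for word in tokenize(text):
            p = self.word_prob(word)
            log_odds += math.log(p) - math.log(1 - p)
        # Clamp so that very long messages cannot overflow math.exp.
        log_odds = max(-30.0, min(30.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    f = TinyBayes()
    f.train("cheap viagra buy now", is_spam=True)
    f.train("meeting notes for tuesday", is_spam=False)
    print(f.score("buy cheap viagra"))
```

With no training at all, every word scores a neutral 0.5, which is why the initial batch of ham and spam matters so much.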

The neatest thing, though, is that the system constantly adapts to new spam as it comes in. If a spam message slips past the filter, you click a button marked “Delete as Spam.” The offending message is added to the profiling database, which reduces the likelihood of similar junk mail getting through in future. Likewise, if an email is falsely tagged as spam, or as “possible spam” when the system is unsure, you click the button marked “Recover from Spam.” This tells the profiling system that it made a mistake and that it should adjust its probability weightings. The system learns. SpamAssassin doesn’t, so you have to keep updating it as time goes by and as spammers learn to circumvent its fixed rules.
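The feedback loop can be sketched in a few lines of Python too. Again, this is a toy under my own assumptions, not how the Outlook add-in is actually written: the class `FeedbackFilter`, its method names, and the two cutoff values are made up for illustration. The point is only that each button press feeds one more labelled example into the word counts, which shifts the score of similar messages.

```python
import math
from collections import Counter


class FeedbackFilter:
    """Toy filter with a three-way verdict and two correction buttons (illustrative only)."""

    def __init__(self, ham_cutoff=0.2, spam_cutoff=0.9):
        # Cutoffs are assumed values chosen for this sketch.
        self.ham_cutoff = ham_cutoff
        self.spam_cutoff = spam_cutoff
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record one message under the given label ("spam" or "ham")."""
        self.counts[label].update(set(text.lower().split()))
        self.totals[label] += 1

    def spam_probability(self, text):
        """Sum smoothed per-word log-odds and squash the total into the range 0..1."""
        log_odds = 0.0
        for word in set(text.lower().split()):
            p_spam = (self.counts["spam"][word] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-30.0, min(30.0, log_odds))  # avoid overflow in exp
        return 1 / (1 + math.exp(-log_odds))

    def classify(self, text):
        """Return "spam", "ham", or "possible spam" when the score falls in between."""
        p = self.spam_probability(text)
        if p >= self.spam_cutoff:
            return "spam"
        if p <= self.ham_cutoff:
            return "ham"
        return "possible spam"

    def delete_as_spam(self, text):
        """The user says the filter missed this one: learn it as spam."""
        self.learn(text, "spam")

    def recover_from_spam(self, text):
        """The user says this was wrongly flagged: learn it as ham."""
        self.learn(text, "ham")
```

A message made entirely of words the filter has never seen scores a neutral 0.5 and lands in “possible spam”. After one `delete_as_spam` call on it, the same message scores higher, and a `recover_from_spam` call pushes a wrongly flagged message the other way.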

So far, it has been pretty good at catching incoming spam, but not quite as successful as SpamAssassin was. That’s probably because I didn’t have many examples of spam lying around in my mail to help it build up its initial profiling database. If you plan to use SpamBayes, I suggest you hang on to some of your recent spam so you can use it for teaching purposes.

An interesting side-effect of using a spam filter on my inbox is that I find incoming spam much more interesting now. Both SpamAssassin and SpamBayes let you see why they flagged a particular piece of email as spam. SpamAssassin shows you which rules a junk message triggered, and SpamBayes shows you which words in the email contributed most to its overall spam probability.
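One plausible way to pick out the words that “contributed most” is to rank them by how far their individual spam probability sits from a neutral 0.5, in either direction. The Python sketch below does just that. The function name `top_clues` and the probabilities in the example are invented for illustration, and I am not claiming this is exactly how SpamBayes chooses what to display.

```python
def top_clues(word_probs, n=5):
    """Return the n (word, probability) pairs furthest from the neutral value 0.5.

    word_probs maps each word in a message to its estimated spam probability.
    Words near 1.0 argue for spam, words near 0.0 argue for ham, and words
    near 0.5 say very little either way.
    """
    ranked = sorted(
        word_probs.items(),
        key=lambda item: abs(item[1] - 0.5),
        reverse=True,
    )
    return ranked[:n]


if __name__ == "__main__":
    # Made-up per-word probabilities, purely for demonstration.
    example = {"viagra": 0.99, "the": 0.50, "meeting": 0.03, "free": 0.90}
    for word, prob in top_clues(example, n=3):
        print(f"{word}: {prob:.2f}")
```

Note that strongly hammy words rank just as highly as strongly spammy ones, since both are strong evidence.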

Ironically, this geeky fun factor means that I read (some of) my spam a lot more closely than I ever would have before. But in a forensic kind of way. Don’t be thinking that I actually spend time considering whether I really need some more Viagra this month. I keep careful track of my own supply, thanks.

Erm…
