These days, the choice of spam filters comes down to Bogofilter and SpamAssassin. Other choices, like DSPAM, are no longer in development. Although a few others (e.g., SpamBayes) are available, when an email reader offers a plugin, it is almost always for either Bogofilter or SpamAssassin.
However, what is less often discussed is which filter is best to use in which circumstances. Instead, most users simply nod solemnly when they read that both involve “Bayesian filtering.” Most of us – including many who use the phrase – have no idea what Bayesian filtering is, but it sounds scientific and reassures us that either choice is acceptable.
In fact, learning that Bogofilter and SpamAssassin are “Bayesian” is useless for choosing between them. To call them Bayesian means nothing more than that their structure is based on the 18th-century work of Thomas Bayes in statistics and probability. More specifically, both apply Bayes’ work by collecting words and assigning a probability that each word indicates spam. The more suspect words an email contains, the greater the chances it is spam. However, making an informed choice between spam filters requires considerably more detail.
Bogofilter has its roots in “A Plan for Spam,” a 2002 essay by English developer Paul Graham. After trying to develop filters based on the identifying characteristics of spam, Graham concluded that beyond a certain point, the more rules he added, the more false positives he obtained – that is, the more email messages were incorrectly identified as spam. Graham’s solution was to parse his samples of spam and non-spam into tokens, or individual words, and use Bayesian tools to assign each token a probability that it indicates spam, biasing the probabilities slightly in favor of not being spam to minimize false positives. By examining the top 15 tokens in the header and body of each new email message, he calculated the probability that the message was spam. If the probability was greater than 0.9, the message was considered spam. According to Graham, the advantage of this statistical approach is that it refers to something real – the probability of being spam – and works with both neutral and spam-indicating words.
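To make the idea of Graham-style filtering more concrete, here is a minimal Python sketch of the approach this article describes: per-token spam probabilities, a bias toward non-spam, the 15 most telling tokens, and a 0.9 cutoff. The function names (`tokenize`, `train`, `spam_probability`, `is_spam`) are hypothetical. The specific constants for the bias (counting non-spam occurrences twice), for clamping probabilities to the range 0.01 to 0.99, and for scoring unknown tokens at 0.4 are assumptions based on a simplified reading of Graham’s essay, not details given here. It is an illustration only, not how Bogofilter or SpamAssassin is implemented.

```python
import re
from collections import Counter
from math import prod


def tokenize(text):
    # Split a message into lowercase word-like tokens (a simplification).
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(spam_msgs, ham_msgs):
    """Assign each token a probability that it indicates spam."""
    bad = Counter(t for m in spam_msgs for t in tokenize(m))
    good = Counter(t for m in ham_msgs for t in tokenize(m))
    nbad = max(len(spam_msgs), 1)
    ngood = max(len(ham_msgs), 1)
    probs = {}
    for tok in set(bad) | set(good):
        # Assumption: count non-spam occurrences twice to bias
        # the result against false positives.
        good_rate = min(1.0, 2 * good[tok] / ngood)
        bad_rate = min(1.0, bad[tok] / nbad)
        p = bad_rate / (good_rate + bad_rate)
        # Assumption: clamp so no single token is ever decisive.
        probs[tok] = max(0.01, min(0.99, p))
    return probs


def spam_probability(message, probs, top_n=15, unknown=0.4):
    """Combine the top_n tokens whose probability is furthest from 0.5."""
    ps = [probs.get(t, unknown) for t in set(tokenize(message))]
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = ps[:top_n]
    spammy = prod(top)
    hammy = prod(1.0 - p for p in top)
    return spammy / (spammy + hammy)


def is_spam(message, probs, threshold=0.9):
    return spam_probability(message, probs) > threshold


if __name__ == "__main__":
    spam = ["buy cheap pills now", "cheap pills cheap offer"]
    ham = ["meeting agenda for monday", "lunch on monday"]
    table = train(spam, ham)
    print(is_spam("cheap pills", table))
    print(is_spam("monday meeting", table))
```

With a realistic corpus, most tokens land near 0.5 and are ignored, so the verdict rests on the handful of tokens that are strongly associated with either spam or legitimate mail.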