I get roughly 100 spam messages. My personal account is actually a combination of two e-mail accounts, since my university e-mail is forwarded to my personal account as well. Another factor is that my name is not a conventional Finnish name, so simple bots with name databases are more likely to hit my address. However, the main reason is probably that my address has been the same for a decade (I have my own domain), and it has been listed on several web pages for years.

So, I get lots of spam. The amount is so high that I can’t do without superb spam filtering software, and Gmail seems to be doing quite well nowadays. It is on par with, maybe even better than, the heavily trained CRM114 filter I used before switching to Gmail.

There’s Only One Complaint.

When checking whether Gmail’s spam filter has produced any false positives (i.e. legitimate e-mail classified as spam), I have to wade through my spam folder, which at the moment contains something like 4000 messages. I don’t want to do that, so I risk losing important e-mail that Gmail thought was spam.

The funny thing is that all the spam filtering software I have used (SpamAssassin, Bogofilter, CRM114) allowed me to see a score denoting the “spamicity” of each message. Bogofilter even allowed me to use tri-state classification, where each message is classified as spam, unsure or non-spam. The net result of those features was that it was easier to pick out false positives, because they either ended up in the unsure category (I had a separate folder for such e-mails), which used to have very few messages, or they were easily found by sorting all e-mail messages by spam score. The messages with a very low non-negative spam score used to contain most of the false positives, if there were any.
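To make the three-way idea concrete, here is a minimal sketch in Python. It is not how Bogofilter or any of the other filters is actually implemented: the `Message` class, the score range of 0 to 1 and both cutoff values are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only for this example.
HAM_CUTOFF = 0.45
SPAM_CUTOFF = 0.99


@dataclass
class Message:
    subject: str
    score: float  # assumed "spamicity" in [0, 1]; higher means more spam-like


def classify(msg, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Return 'spam', 'non-spam' or 'unsure' for a scored message."""
    if msg.score >= spam_cutoff:
        return "spam"
    if msg.score <= ham_cutoff:
        return "non-spam"
    return "unsure"


def sort_into_folders(messages):
    """Group messages into one folder per class, keeping arrival order."""
    folders = {"spam": [], "unsure": [], "non-spam": []}
    for msg in messages:
        folders[classify(msg)].append(msg)
    return folders
```

With cutoffs like these, only messages the filter is nearly certain about go straight to the spam folder, and the small unsure folder is the one worth reading by hand.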

However, even though I’m quite sure Google is incorporating some very bleeding-edge Bayesian, Markov-chaining filtering with inoculation and possibly sparse binary polynomial hashing (I didn’t make that last one up: it’s the algorithm behind CRM114, designed and named by its author), Gmail doesn’t allow me to sort spam by this criterion. Actually, it doesn’t allow me to sort messages by any header, which is rather understandable considering how much more resource-intensive it would be for their servers (for the first month or so, I bet users would sort all their messages by each column in turn, just because they could).

But Google wouldn’t have to offer us sorting for that. A feature that lets the user see, e.g., the 10 least spammy messages that were still classified as spam would be enough. Pretty please?
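For what it’s worth, the feature I’m asking for is tiny once a score exists. Below is a rough sketch in Python, assuming each message in the spam folder comes as a hypothetical `(subject, score)` pair where a higher score means more spam-like. Gmail exposes no such score to users, so the data layout and the function name `least_spammy` are purely my own invention.

```python
import heapq


def least_spammy(spam_folder, n=10):
    """Return the n messages with the lowest spam score, lowest first.

    spam_folder is an iterable of (subject, score) pairs (an assumed
    layout). heapq.nsmallest avoids sorting the whole folder.
    """
    return heapq.nsmallest(n, spam_folder, key=lambda item: item[1])
```

Picking the 10 lowest-scored messages out of 4000 this way takes a single pass over the folder with a small heap, which is far cheaper than the general sorting I speculated about above.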
