One: The anti-spam systems work because they are based on content of emails and ...

edanm · on Aug 28, 2011

FYI: on HN, after a message is posted, there's a delay before anyone can reply. The deeper the message sits in the thread, the longer the delay. The logic is that this delay will prevent uninteresting back-and-forth flamewars. I'm guessing that's the HN feature you were talking about.

yuvadam · on Aug 28, 2011

One makes lots of sense. Two makes none.

By analogy: "the logic for encrypted two-way communication (e.g. RSA) is public, don't you think that makes it much easier for hackers to intercept your credit card details?".

Enough has been said about security - or spam filtering in this case - by obscurity.

billswift · on Aug 28, 2011

You are drawing an invalid analogy between cryptography and filtering. The only reason cryptography works with open algorithms is that the keys can be kept secret. In filtering, to a very large extent, the specific algorithms are as analogous to cryptographic keys as they are to the other parts of cryptosystems. That is, filtering algorithms are like very primitive cryptography, where there was no separation between the system and the keys.

ehsanu1 · on Aug 28, 2011

If you can propose a spam-filtering algorithm which would not be circumvented if its exact implementation were known, I'd seriously love to hear it. That would basically be a magic bullet for all spam.

dredmorbius · on Aug 28, 2011

Spam filtering is a wicked problem. The solutions are contextual, and there's no one single tool that will slay the dragon.

That said, a great many anti-spam solutions work by well-known and publicly available methods. DKIM (header signing) actually uses public-key cryptography. DNSBLs are publicly queryable (in some cases the zonefiles may be downloaded). Bayesian and rules-based filters are also generally available.
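To make the "publicly queryable" point concrete, here is a minimal sketch of how a DNSBL lookup is usually formed: the IPv4 octets are reversed and prepended to the list's zone, and any answer to that DNS query means "listed". The zone `dnsbl.example.org` and the function names are placeholders of mine, not a real list or API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolver=socket.gethostbyname) -> bool:
    """Return True if the list answers for this address, False if the name does not resolve.

    `resolver` is injectable so the logic can be exercised without network access.
    """
    try:
        resolver(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # 192.0.2.99 is a documentation address; dnsbl.example.org is a hypothetical zone.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Anyone, spammers included, can run the same query, which is rather the point: the list's value comes from its data, not from hiding how it is consulted.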

The real challenges are:

1. Spam is cheap. Spam mail outnumbers ham (non-spam mail) by 100:1 or better. There's a lot of it.

2. Distinguishing spam from ham is contextual, and people's contexts differ.

3. False positives are expensive. Wrongly classifying ham as spam carries far worse consequences than wrongly classifying spam as ham (false negatives). Filters must skew to permissive.

4. There's little central agreement on methods, and many old systems are still in use. We've seen a few small advances (DKIM, SPF) in the past decade, but brute-force content filtering is still required.

5. Even well-established strong verification tools are too technically advanced for the vast majority of the userbase, and/or are unappealing to others. PGP email signatures (strong cryptographic identity verification) date to 1991, fer crissakes! Getting even corporate-supported users to employ this properly is at best difficult (though it's becoming ever so slightly more common, largely due to compliance requirements). For others, repudiability is important.

6. It's an arms race. Spammers change methods in response to new anti-spam methods, and many rely on automated tools, which assures rapid, widespread adoption of each new technique.

7. Client and server (MUA/MTA) support for tools which would facilitate whitelisting of users and mail peers is difficult. Centralizing mail gateways can complicate the issue if those core gateways emit proportionately high levels of spam (I see or have seen middlin' amounts of spam from Hotmail, Yahoo, GMail, AOL, and other large email service providers, though generally they're pretty good).
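Point 3 in the list above (false positives cost more than false negatives, so filters must skew permissive) can be put in numbers. A rough sketch, assuming the filter already produces a spam probability and that you can assign relative costs to each kind of mistake; the 9:1 cost ratio is an illustrative assumption, not a measured figure.

```python
def spam_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Spam probability above which filing a message as spam has the lower expected cost.

    Filing as spam costs (1 - p) * cost_false_positive on average; delivering costs
    p * cost_false_negative. Filing wins only when
    p > cost_false_positive / (cost_false_positive + cost_false_negative).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(p_spam: float, cost_fp: float = 9.0, cost_fn: float = 1.0) -> str:
    """Label a message given its estimated spam probability and the two error costs."""
    return "spam" if p_spam > spam_threshold(cost_fp, cost_fn) else "ham"


if __name__ == "__main__":
    # Same score, different cost assumptions, different outcome.
    print(classify(0.8))                             # lost ham assumed 9x worse
    print(classify(0.8, cost_fp=1.0, cost_fn=1.0))   # both errors assumed equal
```

The more a lost legitimate message hurts, the closer the threshold moves to 1.0, and the more spam gets through by design.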

That said: whitelisting, reputation systems (sender, server, DKIM, SPF), authentication (DKIM, PGP), contextual (Bayesian) filtering, and rules-based filtering (e.g. SpamAssassin), properly used, do make the situation tenable. But this requires extensive work, largely by the administrator of an email gateway. End-users may be forgiven for thinking spam is a "solved problem", and at their level it largely is.
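As a sketch of how those layers might stack at a gateway: cheap, high-confidence checks run first and content scoring comes last. Everything here (the `Message` fields, `triage`, the toy scorer, the 0.9 threshold) is a hypothetical illustration, not how SpamAssassin or any particular gateway is built.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Message:
    sender: str
    server_ip: str
    body: str


def triage(
    msg: Message,
    whitelist: Set[str],
    blocked_servers: Set[str],
    content_score: Callable[[str], float],
    threshold: float = 0.9,
) -> str:
    """Run the layers in order: whitelist, then server reputation, then content score."""
    if msg.sender in whitelist:
        return "deliver"      # known correspondent: skip every other check
    if msg.server_ip in blocked_servers:
        return "reject"       # sending server has a bad reputation
    # Last resort: content filtering, with a permissive threshold.
    return "quarantine" if content_score(msg.body) > threshold else "deliver"


def toy_score(body: str) -> float:
    """Stand-in for a Bayesian or rules-based scorer; flags one obvious phrase."""
    return 0.99 if "free pills" in body.lower() else 0.1


if __name__ == "__main__":
    msg = Message("stranger@example.net", "198.51.100.7", "FREE PILLS today")
    print(triage(msg, {"friend@example.org"}, {"203.0.113.5"}, toy_score))
```

Putting the whitelist first is what keeps false positives down for the people you actually correspond with; the content filter only has to judge strangers.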

What ultimately will solve the email spam problem will be for email to be superseded by another communications channel (SMS, weblogs, social sites, etc.) to the extent that spammers focus their energies there. It's an economic problem, and if the economics fail to support spamming, the (smart) spammers will move elsewhere.