
If you can propose a spam-filtering algorithm which would not be circumvented if its exact implementation were known, I'd seriously love to hear it. That would basically be a magic bullet for all spam.



Spam filtering is a wicked problem. The solutions are contextual, and there's no one single tool that will slay the dragon.

That said, a great many anti-spam solutions work by well-known and publicly available methods. DKIM (header signing) uses public-key cryptography, with the signer's public key published in DNS. DNSBLs are publicly queryable (in some cases the zonefiles may be downloaded), and Bayesian and rules-based filters are also generally available.
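To show how public these mechanisms are, here is a minimal Python sketch of the DNS names involved. A DNSBL lookup reverses the IPv4 octets and prepends them to the list's zone, and a DKIM verifier fetches the public key from a TXT record under `_domainkey`. The zone `dnsbl.example.org`, the selector `mail2024`, and the function names are placeholders I made up for illustration, not a real list or a real deployment.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dkim_key_name(selector: str, domain: str) -> str:
    """DNS name of the TXT record that holds a DKIM public key."""
    return f"{selector}._domainkey.{domain}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers for this address.

    Needs network access. A lookup failure is treated as "not listed",
    which also hides genuine DNS errors; a real filter would distinguish
    the two cases.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Hypothetical zone and selector, for illustration only.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
print(dkim_key_name("mail2024", "example.com"))
```

Nothing here is secret, which is the point: a spammer can run the same lookups, and the method still works because the data behind it (who is listed, who holds the signing key) is what matters.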

The real challenges are:

1. Spam is cheap. Spam mail outnumbers ham (non-spam mail) by 100:1 or better. There's a lot of it.

2. Distinguishing spam from ham is contextual, and people's contexts differ.

3. False positives are expensive. Wrongly classifying ham as spam carries far worse consequences than wrongly classifying spam as ham (false negatives). Filters must skew to permissive.

4. There's little central agreement on methods, and there are many old systems still in existence. We've seen a few small advances (DKIM, SPF) in the past decade, but brute-force content filtering is still required.

5. Even well-established strong verification tools are too technically advanced for the vast majority of the userbase, and/or are unappealing to others. PGP-signed email (strong cryptographic identity verification) dates to 1991, fer crissakes! Getting even corporate-supported users to employ it properly is at best difficult (though it's becoming ever so slightly more common, largely due to compliance requirements). For others, repudiability (being able to deny having sent a message) is important, and signatures take that away.

6. It's an arms race. Spammers change their methods in response to new anti-spam methods, and because many of them rely on automated tools, new techniques are adopted rapidly and widely.

7. Client and server (MUA/MTA) support for tools which would facilitate whitelisting of users and mail peers is hard to come by. Centralizing mail gateways can complicate the issue if those core gateways emit proportionately high levels of spam (I see or have seen middlin' amounts of spam from Hotmail, Yahoo, GMail, AOL, and other large email service providers, though generally they're pretty good).
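Point 3 can be made concrete with a little arithmetic. If a filter produces a spam probability p, blocking a message has expected cost (1 - p) x cost of a false positive, and delivering it has expected cost p x cost of a false negative. Blocking is only worthwhile when p exceeds cost_fp / (cost_fp + cost_fn). The sketch below is in Python, and the 99:1 cost ratio is an assumption picked for illustration, not a measured figure.

```python
def spam_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability above which blocking is cheaper (in expectation) than delivering."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_block(p_spam: float, cost_fp: float, cost_fn: float) -> bool:
    """Block only when the filter is confident enough to justify the risk."""
    return p_spam > spam_threshold(cost_fp, cost_fn)


# Assumed costs: losing one real message hurts 99 times more than
# letting one spam through.
print(spam_threshold(99, 1))
print(should_block(0.95, 99, 1))   # fairly sure it's spam, still delivered
print(should_block(0.995, 99, 1))  # very sure, so it is blocked
```

With equal costs the threshold would sit at 0.5. The more a lost message hurts, the closer the threshold moves to 1, which is what "skew to permissive" means in practice.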

That said: whitelisting, reputation systems (sender, server, DKIM, SPF), authentication (DKIM, PGP), contextual filtering (Bayesian), and rules-based filtering (e.g. SpamAssassin), properly used, do make the situation tenable. But this requires extensive effort, largely on the part of the administrator of an email gateway. End-users may be forgiven for thinking spam is a "solved problem", since at their level it largely is.
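As a rough illustration of how two of these layers stack, here is a toy Python sketch: a sender whitelist checked first, then a naive Bayes content score with add-one smoothing. It is a teaching sketch under stated assumptions, not how SpamAssassin or any production filter is implemented. The class, the function names, the training messages, and the 0.9 threshold are all invented for the example, and the model assumes at least one training message of each kind.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Tiny word-count spam model with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def p_spam(self, text: str) -> float:
        # Assumes at least one spam and one ham message were trained.
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        n_spam = sum(self.counts["spam"].values())
        n_ham = sum(self.counts["ham"].values())
        log_odds = math.log(self.docs["spam"]) - math.log(self.docs["ham"])
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # unseen words carry no evidence either way
            p_tok_spam = (self.counts["spam"][tok] + 1) / (n_spam + len(vocab))
            p_tok_ham = (self.counts["ham"][tok] + 1) / (n_ham + len(vocab))
            log_odds += math.log(p_tok_spam) - math.log(p_tok_ham)
        return 1 / (1 + math.exp(-log_odds))


def classify(sender: str, text: str, whitelist: set[str],
             model: NaiveBayes, threshold: float = 0.9) -> str:
    """Whitelist first, content score second; default to delivering."""
    if sender in whitelist:
        return "ham"
    return "spam" if model.p_spam(text) > threshold else "ham"


# Made-up training data, far too small for real use.
model = NaiveBayes()
model.train("cheap pills buy now", "spam")
model.train("buy cheap watches now", "spam")
model.train("meeting notes attached", "ham")
model.train("lunch meeting tomorrow", "ham")

whitelist = {"alice@example.com"}
print(classify("stranger@example.net", "buy cheap pills", whitelist, model))
print(classify("alice@example.com", "buy cheap pills", whitelist, model))
```

The ordering is the design choice worth noting: the whitelist short-circuits the content check, so a known correspondent never becomes a false positive no matter what they write. That is also why the administrative burden lands on whoever maintains the whitelist and the training data.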

What ultimately will solve the email spam problem will be for email to be superseded by another communications channel (SMS, weblogs, social sites, etc.) to the extent that spammers focus their energies there. It's an economic problem, and if the economics fail to support spamming, the (smart) spammers will move elsewhere.



