This guy really should go work for Google and figure out the problems they need ...

praptak · on Aug 28, 2011

Thank you, this post was good and informative. Nevertheless I think you missed his main point and concentrated on something that was merely incidental to it.

Yes, spam fighting is hard. Yes, it's probably easier with huge centralized installations (he actually observed that at this point the centralization offers advantages over the decentralized model.) But his main point was not about spam nor even about e-mail in general. His point was that it is worth putting the additional effort into making decentralized systems work. This is definitely not what Google are doing.

mkr-hn · on Aug 28, 2011

I think Google could make a big difference by licensing an email antispam API like Automattic does with Akismet. You'd get the best of both worlds.

amethyst · on Aug 28, 2011

They already do. It's called Postini, and some of the biggest corporations in the US already use it.

rednaught · on Aug 28, 2011

Just an FYI, Postini was an acquisition. There are many companies that provide filtering and spam detection.

jonknee · on Aug 28, 2011

Yes, they bought best-of-breed spam filtering technology and gave it away for free to the whole world. It was a great acquisition.

rednaught · on Aug 28, 2011

Google charges for Postini's standalone service. Integrating Postini's functionality into Gmail and making it "free" in exchange for some of your freedoms is the entire point of this discussion and the mailing list post.

jonknee · on Aug 28, 2011

The OP said they should license it (the spam filtering portion specifically). They do. If you don't want to pay you can also use it for free. Simply put, I don't see his point.

eitally · on Aug 29, 2011

Gmail doesn't use Postini for security/filtering. That's all Google/Gmail.

mkr-hn · on Aug 28, 2011

That's neat. But it's also the first I've heard of it. Do they promote it outside of the business world? I get the impression that they don't think of it as something a CS department or non-profit would use.

jonknee · on Aug 28, 2011

Have you looked for it? It's pretty tied in with their Google Apps offering which is where I've seen it, despite not being in the market for it.

It's no surprise that they concentrate on marketing their business software to businesses though. That's who's going to pay for it. A CS department is going to use whatever the school uses and a non-profit is better served by Google Apps (which is discounted heavily for non-profits).

sunchild · on Aug 28, 2011

Postini is widely used. I wish Apple would slap it onto me.com. My Apple email is a spamfest.

true_religion · on Aug 28, 2011

If a CS department is using it, they'd have it available through the university wide licence.

Also, 80beans is using it and they only have a team of 6.

mrtron · on Aug 28, 2011

Regulated companies use Postini commonly for compliance reasons. Lots of big financial firms are customers.

stevenbedrick · on Aug 28, 2011

I know of several non-profits using Postini as their anti-spam solution.

shailesh · on Aug 28, 2011

ACM uses Postini.

jasonzemos · on Aug 28, 2011

Your solution to admin incompetence is for a centralized service to eliminate the admin. Why can't the service just provide competence? If dozens of people are working round the clock to eliminate spam for your free mail service, why can't they package that and let you control your own data?

The centralized solution you've proposed, carried to its fullest extent, is basically eliminating email altogether, where a small cabal of whitelisted services are only able to pass messages to each other. If spam detection software must remain secretive and proprietary at these big companies, this is basically a capitulation to the spammers.

DavidMcLaughlin · on Aug 28, 2011

One:

The anti-spam systems work because they are based on the content of emails and on properties across the provider's entire user base. Every time you click "Mark as spam" you are contributing data for all users in the service. In a decentralised service, even if people agreed to submit all their emails and information for the greater good (which they probably wouldn't), the data still needs to be centralised somewhere and secured by experts. The blacklist/whitelist of notorious spammers and servers needs to be maintained somewhere. You end up having a committee to do that, an elected/trusted group of people, and they need to deal with appeals, etc.
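To illustrate the pooling effect described above, here's a toy sketch (not any provider's actual system): every user's "Mark as spam" click lands in shared per-sender counts, and a sender gets blocked once its complaint rate crosses a limit. The class name, the minimum-volume cutoff and the 1% complaint threshold are all hypothetical.

```python
from collections import Counter


class SenderReputation:
    """Toy model of provider-wide reputation: "Mark as spam" clicks from
    all users are pooled into per-sender counts shared by everyone."""

    def __init__(self, min_messages=100, max_complaint_rate=0.01):
        self.delivered = Counter()   # messages delivered, per sender
        self.complaints = Counter()  # "Mark as spam" clicks, per sender
        self.min_messages = min_messages              # hypothetical evidence cutoff
        self.max_complaint_rate = max_complaint_rate  # hypothetical 1% limit

    def record_delivery(self, sender):
        self.delivered[sender] += 1

    def record_complaint(self, sender):
        self.complaints[sender] += 1

    def is_blocked(self, sender):
        sent = self.delivered[sender]  # Counter returns 0 for unseen senders
        if sent < self.min_messages:
            return False  # too little evidence to judge this sender yet
        return self.complaints[sender] / sent > self.max_complaint_rate
```

On a single-user server the counts only ever contain your own clicks, so a sender rarely reaches the evidence cutoff. That is the gap the comment is pointing at.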

Two:

If the logic for blocking spam were public, don't you think that would make it much easier for spammers to circumvent?

Edit: I can't reply to the user below; must be some HN feature. But the logic for accepting an email is essentially a decision tree: it is based on data and evolves over time. It is a very different problem from something like encryption.
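Here's a toy version of "a decision tree based on data" to make the idea concrete. The tree is plain data, so it could be regrown from fresh training data without touching the code. The feature names and thresholds are made up for illustration and don't come from any real filter.

```python
# Hypothetical tree: internal nodes compare one numeric feature to a
# threshold; leaves are "accept" or "reject". Retraining would replace
# this structure, not the code that walks it.
TREE = {
    "feature": "sender_complaint_rate", "threshold": 0.01,
    "below": {
        "feature": "bayes_spam_probability", "threshold": 0.9,
        "below": "accept",
        "above": "reject",
    },
    "above": "reject",
}


def classify(node, features):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(node, dict):
        value = features[node["feature"]]
        node = node["below"] if value < node["threshold"] else node["above"]
    return node
```

For example, `classify(TREE, {"sender_complaint_rate": 0.001, "bayes_spam_probability": 0.2})` walks left twice and returns `"accept"`.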

edanm · on Aug 28, 2011

FYI: on HN, after a message is posted, there's a delay before anyone can reply. The farther "down" the thread the message is, the longer the delay. The logic is that the delay discourages uninteresting back-and-forth flamewars. I'm guessing that's the HN feature you were talking about.

yuvadam · on Aug 28, 2011

One makes lots of sense. Two makes none.

By analogy: "the logic for encrypted two-way communication (e.g. RSA) is public, don't you think that makes it much easier for hackers to intercept your credit card details?".

Enough has been said about security - or spam filtering in this case - by obscurity.

billswift · on Aug 28, 2011

You are drawing an invalid analogy between cryptography and filtering. The only reason cryptography works with open algorithms is that the keys can be kept secret. To a very large extent in filtering the specific algorithms are as analogous to cryptographic keys as they are to the other parts of crypto-systems. That is, filtering algorithms are like very primitive cryptography where there is no separation between the system and the keys.

ehsanu1 · on Aug 28, 2011

If you can propose a spam-filtering algorithm which would not be circumvented if its exact implementation were known, I'd seriously love to hear it. That would basically be a magic bullet for all spam.

dredmorbius · on Aug 28, 2011

Spam filtering is a wicked problem. The solutions are contextual, and there's no one single tool that will slay the dragon.

That said, a great many anti-spam solutions work by well-known and publicly available methods. DKIM (header signing) actually utilizes PKI. DNSBLs are publicly queryable (in some cases the zonefiles may be downloaded), and Bayesian and rules-based filters are also generally available.
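To show how public a DNSBL lookup really is, here's a minimal sketch using only the standard library. The conventional query reverses the IPv4 octets and appends the list's zone. If the resulting name resolves, the IP is listed, and the answer is conventionally an address in 127.0.0.0/8. The function names and the `dnsbl.example` zone are placeholders.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: IPv4 octets reversed, then the zone.
    e.g. 192.0.2.99 with zone "dnsbl.example" -> "99.2.0.192.dnsbl.example"."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip, zone):
    """True if the list answers with a 127.x.x.x A record; a name that does
    not resolve (NXDOMAIN or lookup failure) is treated as "not listed"."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.")
```

RFC 5782 describes the usual convention that lists include 127.0.0.2 as a test entry. Real lists publish their own return codes and usage policies, so a production check should interpret those codes rather than treat every 127.x answer as a hit.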

The real challenges are:

1. Spam is cheap. Spam mail outnumbers ham (non-spam mail) by 100:1 or better. There's a lot of it.

2. Distinguishing spam from ham is contextual, and people's contexts differ.

3. False positives are expensive. Wrongly classifying ham as spam carries far worse consequences than wrongly classifying spam as ham (false negatives). Filters must skew to permissive.

4. There's little central agreement on methods, and there are many old systems still in use. We've seen a few small advances (DKIM, SPF) in the past decade, but brute-force content filtering is still required.

5. Even well-established strong verification tools are too technically advanced for the vast majority of the userbase, and/or are unappealing to others. PGP MIME-encoded email signatures (strong cryptographic identity verification) date to 1991, fer crissakes! Getting even corporate-supported users to employ them properly is at best difficult (though it's becoming ever so slightly more common, largely due to compliance requirements). For others, repudiability is important.

6. It's an arms race. Spammers change methods in response to new anti-spam methods (many spammers rely on automated tools, which ensures rapid, widespread adoption of new techniques).

7. Client and server (MUA/MTA) support for tools that would facilitate whitelisting of users and mail peers is difficult to achieve. Centralizing mail gateways can complicate the issue if those core gateways emit proportionately high levels of spam (I see or have seen middlin' amounts of spam from Hotmail, Yahoo, GMail, AOL, and other large email service providers, though generally they're pretty good).
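Point 3 can be made concrete with a little expected-cost arithmetic. If a message is spam with probability p, delivering it costs p * c_FN on average and blocking it costs (1 - p) * c_FP. Blocking only wins when p > c_FP / (c_FP + c_FN). This is a minimal sketch, and the costs are illustrative assumptions. If losing a real message is 99 times worse than letting one spam through, the filter should block only above 99% confidence, which is what "skew to permissive" means.

```python
def spam_threshold(cost_false_positive, cost_false_negative):
    """Spam probability above which blocking has lower expected cost.

    Delivering costs p * cost_false_negative on average; blocking costs
    (1 - p) * cost_false_positive. Blocking wins when
    p > cost_false_positive / (cost_false_positive + cost_false_negative).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_block(p_spam, cost_false_positive, cost_false_negative):
    """Block only if the message is more likely spam than the threshold."""
    return p_spam > spam_threshold(cost_false_positive, cost_false_negative)
```

With equal costs the threshold is 0.5. With `spam_threshold(99, 1)` it rises to 0.99, so a message the filter is "95% sure" about still gets delivered.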

That said: whitelisting, reputation systems (sender, server, DKIM, SPF), authentication (DKIM, PGP), contextual filtering (Bayesian), and rules-based filtering (e.g. SpamAssassin), properly used, do make the situation tenable. But this requires extensive work, largely on the part of email gateway administrators. End-users may be forgiven for thinking spam is a "solved problem"; at their level, it largely is.
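For the "contextual (Bayesian)" part, here's a minimal token-based naive Bayes sketch with add-one (Laplace) smoothing. It's illustrative only. Real filters (SpamAssassin's Bayes plugin among them) tokenize headers and bodies much more carefully and train on far more mail. This sketch assumes at least one training message per class before scoring.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy naive Bayes spam filter over whitespace-separated tokens."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def train(self, text, label):
        """label is "spam" or "ham"."""
        self.message_counts[label] += 1
        self.token_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """P(spam | text); requires at least one trained message per class."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # log prior plus log likelihood of each token, add-one smoothed
            score = math.log(self.message_counts[label] / total_messages)
            for token in self.tokenize(text):
                score += math.log((counts[token] + 1) / (total_tokens + len(vocab)))
            log_score[label] = score
        # P(spam) = 1 / (1 + exp(ham - spam)), computed without overflow
        diff = log_score["ham"] - log_score["spam"]
        if diff >= 0:
            z = math.exp(-diff)
            return z / (1.0 + z)
        return 1.0 / (1.0 + math.exp(diff))
```

Working in log space keeps long messages from underflowing to zero, and the two-branch sigmoid avoids overflowing `math.exp` when one class dominates.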

What ultimately will solve the email spam problem will be for email to be superseded by another communications channel (SMS, weblogs, social sites, etc.) to the extent that spammers focus their energies there. It's an economic problem, and if the economics fail to support spamming, the (smart) spammers will move elsewhere.

T-hawk · on Aug 28, 2011

> basically eliminating email altogether, where a small cabal of whitelisted services are only able to pass messages to each other.

Arguably, that's exactly what Facebook is. Users whitelist each other and use that channel to communicate, skipping email.

rwmj · on Aug 28, 2011

The real solution is political.

If people are bribing insiders at Yahoo to whitelist email servers in Las Vegas, why aren't the insiders and the spammers all in prison?

true_religion · on Aug 28, 2011

It's not illegal to bribe people inside private institutions.

sk5t · on Aug 29, 2011

One may imagine most bribe recipients don't report their bribes (or value of non-cash consideration) to the IRS...

bhickey · on Aug 28, 2011

They can't provide competence in a box because there's no free lunch. What would motivate a free e-mail provider to hand you the keys to the castle? If you want this product, get ready to pay for it.

I also think you're placing a mistaken emphasis on data. It's address books, not your data, that provide lock-in on these services. As far as I know, any of them will let you wrest your e-mail from their claws via IMAP or POP. The hard part is telling your contacts to mail you at <address>@gmail.com instead of <address>@hotmail.com.
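As a rough illustration of how little it takes to pull your mail out over IMAP, here's a sketch using Python's standard `imaplib` and `mailbox` modules. It copies one folder into a local mbox file. The host, credentials and folder name in the usage comment are placeholders. Providers may also require an app-specific password or OAuth instead of a plain login.

```python
import imaplib
import mailbox


def fetch_raw_messages(imap, folder="INBOX"):
    """Yield every message in `folder` as raw bytes from a logged-in session."""
    status, _ = imap.select(folder, readonly=True)  # read-only: leave \Seen flags alone
    if status != "OK":
        raise RuntimeError(f"cannot select folder {folder!r}")
    status, data = imap.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("IMAP search failed")
    for num in data[0].split():
        status, msg_data = imap.fetch(num, "(RFC822)")
        if status == "OK":
            yield msg_data[0][1]  # (envelope, raw message bytes) tuple


def append_to_mbox(raw_messages, mbox_path):
    """Append raw RFC 822 messages (bytes) to an mbox file; return the count."""
    box = mailbox.mbox(mbox_path)  # created if it does not exist
    box.lock()
    count = 0
    try:
        for raw in raw_messages:
            box.add(raw)
            count += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return count


# Usage (placeholder host and credentials):
# with imaplib.IMAP4_SSL("imap.example.com") as imap:
#     imap.login("you@example.com", "app-password")
#     append_to_mbox(fetch_raw_messages(imap), "backup.mbox")
```

The fetch side is a generator, so messages are written to disk as they arrive instead of being held in memory all at once.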

Full disclosure: I recently accepted a job from Google. My opinions on this matter are mine alone and are not based on any confidential information. I forward e-mail from my own domain to gmail. I also run a mixmaster anonymous remailer.

joeyh · on Aug 28, 2011

"Someone managed to hack in by brute force anyway. I only noticed because of the _millions_ of automated replies that were coming in every day from dead email accounts or people that were out of office."

This is not a description of your email server being cracked. It's a description of someone Joe-jobbing you: pretending to send mail from your domain. Duckgo for mitigation techniques.

spudlyo · on Aug 28, 2011

See also: backscatter

http://en.wikipedia.org/wiki/Backscatter_(e-mail)
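One known mitigation for backscatter is Bounce Address Tag Validation (BATV): sign the envelope sender of every message you really send. Legitimate bounces come back to a tagged address, and bounces provoked by forged mail don't. Here's a simplified sketch of the idea. The `tag-DDD-SIG=local@domain` format is a made-up stand-in rather than the exact BATV syntax, and `SECRET` and `MAX_AGE_DAYS` are hypothetical settings. SPF, DKIM and DMARC attack the same problem from the other side, by helping receivers reject the forged mail in the first place.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical site-wide secret; keep it private
MAX_AGE_DAYS = 7       # hypothetical: reject bounces for tags older than this


def _day(now=None):
    """Day number (days since the Unix epoch) modulo 1000."""
    return int((time.time() if now is None else now) // 86400) % 1000


def _signature(local, domain, day):
    msg = f"{day:03d}:{local}@{domain.lower()}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:8]


def tag_sender(address, now=None):
    """Rewrite an outgoing envelope sender as 'tag-DDD-SIG=local@domain'."""
    local, domain = address.rsplit("@", 1)
    day = _day(now)
    return f"tag-{day:03d}-{_signature(local, domain, day)}={local}@{domain}"


def is_genuine_bounce(recipient, now=None):
    """Accept a bounce only if it is addressed to a recent tag we generated."""
    if "@" not in recipient:
        return False
    local_part, domain = recipient.rsplit("@", 1)
    if not local_part.startswith("tag-") or "=" not in local_part:
        return False  # untagged: we never used this as a return path
    tag, local = local_part.split("=", 1)
    try:
        _, day_str, sig = tag.split("-")
        day = int(day_str)
    except ValueError:
        return False  # malformed tag
    age = (_day(now) - day) % 1000
    expected = _signature(local, domain, day)
    return age <= MAX_AGE_DAYS and hmac.compare_digest(sig.encode(), expected.encode())
```

A mail server using this would tag outgoing envelope senders with `tag_sender` and drop incoming null-sender bounces whose recipient fails `is_genuine_bounce`. The automated replies to forged mail then stop reaching your users.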

dredmorbius · on Aug 28, 2011

In fairness, there's not enough information provided to determine which this is, though my suspicion is that OP wouldn't know the difference regardless.

carbonica · on Aug 28, 2011

It is truly disheartening to see you'd been downvoted when I came into this thread.

The truth is, the OP's domain was probably considered to be in a bad "neighborhood" because his mail server had been compromised for spamming purposes at one point or another. It's dreadfully easy to either misconfigure a mail server or to end up with your mail server compromised.

Regardless, it's easy to hate on Google, especially in a primarily entrepreneurial forum where those posting are often trying to solve tough problems with far fewer resources. But Google is solving tough problems, even when you feel you've been wronged by an algorithm. Gmail has had an unbelievably successful spam filter for years, forcing the competition to rise to the occasion and match it, to the point where people forget how serious a problem spam is. It's not trivial, and it doesn't mean there's a democratic crisis when your e-mails end up in a spam bin. Especially when it's quite likely because your mail server was compromised.

davidw · on Aug 28, 2011

> Regardless, it's easy to hate on Google, especially in a primarily entrepreneurial forum where those posting are often trying to solve tough problems with far fewer resources. But Google is solving tough problems,

I didn't feel the 'hate'. I read that he didn't particularly care for Google's approach. He certainly says nothing about Google not solving tough problems.

I thought it was a pretty fair piece actually, giving Google credit where it's due, and without trying to demonize them; just stating that he doesn't agree with where they're going.

kragen · on Aug 29, 2011

No, our mail server has never been compromised for spamming purposes. I'm well aware of how easy it is to misconfigure a mail server, and it's not that I think we are too smart or paranoid to have done so; it's just that in the years that we've been struggling with that problem, we've never discovered that misconfiguration, or discovered outgoing spam (other than bounces from e.g. kragen-tol-request.)

I hope I didn't come across as "hating on Google."

jff · on Aug 29, 2011

All it takes to be considered a "bad neighborhood" is to have a dynamic, ISP-owned IP, as I found out when I tried to send mail from my personal server. And yes, I'm too cheap to pay Comcast even more money for a static IP.

kragen · on Aug 29, 2011

That problem is serious, but our server has never been on a dynamic IP or an ISP-owned IP, so it's a slightly different problem than our problem.

abecedarius · on Aug 28, 2011

I run my own mailserver too and saw some similar problems from early on, though not AFAIK with gmail in particular. If it ever has been compromised, I doubt it was right away.

mgkimsal · on Aug 28, 2011

So... "it's too hard, so I'll just let Google handle all my email". That works fine until people start blocking Google mail because they don't trust them. This isn't "might happen one day"; it happens to me today already. You're just punting on the real issue, kicking the can a few months down the road.

pyre · on Aug 29, 2011

Who's blocking Google? Do you mean everything from Google servers, or only @gmail.com addresses? If it's only the latter, it's trivial to get a domain and still use Google for your email.

mgkimsal · on Aug 29, 2011

Google mail servers, from what I can tell.

A family member's office is one I know of (along with a few others) that is firmly in "MS Exchange" mode, and they've blocked mail from Gmail and other Google mail servers because "Google's not secure". Of course, they let Hotmail mail through just fine :)

zinkem · on Aug 28, 2011

I never got spam before I used gmail. Now maybe this has more to do with timing, but it seems like putting everyone's emails on the same domain just makes things easier for spammers. Seems to me like spam is a problem caused by centralization, not solved by it.

It also seems like putting everyone's information in one place makes it easier for hackers to harvest, as well. Gmail probably has a security hole somewhere, too. If gmail's hole is discovered, everyone's emails are compromised (or a large number of people). If a private server gets compromised, there isn't as much there. There's not as much motivation to hack 1000 servers to get 1000 people's information as there is to hack 1 server to get 1000 people's information (although I recognize that one server is going to be a lot harder to crack on average).

I'm open to an education on this topic, as I don't know the methods of modern spammers/crackers.

j_baker · on Aug 28, 2011

> It also seems like putting everyone's information in one place makes it easier for hackers to harvest, as well.

Google dreams of being able to handle all that information on one server.

Besides that, it's not incredibly common (albeit not impossible) for people to steal information by hacking directly into servers, especially those of someone like Google. More likely ways to get at someone's email are XSS or phishing attacks.