This guy really should go work for Google and figure out the problems they need to deal with running a service like Gmail. Even for just a little while.
At work we had a researcher from Yahoo Mail come in and give a presentation on the machine learning techniques they use to try and stop spammers abusing their mail servers. It was eye-opening to learn just what kind of hourly battle they face to keep spam out of their systems and the ways they are trying to combat it. It was even more enlightening when the presenter told stories about the problems that machine learning can't solve - like people within the company being bribed to whitelist spam companies based in Vegas.
On the surface it's such a simple problem, and I'm sure anyone who's tried to prevent their web application's outgoing mail being marked as spam by the evil corporations of Yahoo and Google will have had the desire to go write a blog post saying what a crock of shit the whole thing is and how they would never take part in that. But here's the thing - those systems are in place because if they weren't, email would be a completely useless form of communication at this point.
The people sending spam make _millions_ of dollars abusing a system which is popular because it's open and based on trust. That kind of money combined with greed gives people all different levels of drive and incentive to get their emails about bigger penises and Viagra through to your inbox. Every time the providers prevent one form of attack, these guys will create a new one.
To do this they do things like install mail servers on unsuspecting users' machines, specifically targeting Yahoo/Hotmail/Google users because their IPs will obviously need to be trusted by those companies. They will also hack into other people's private mail servers. They will spoof email headers and pretend they're someone else. They will hire people, experts, who will find new ways of breaking into servers they detect as having mail servers running on them. All this just to get past the spam filters and prevention that make email a useful form of communication to begin with.
And let's forget the people who couldn't set up their own mail server for just a second. I like to think I know what I'm doing. After installing Postfix and jumping through all the hoops to get my emails whitelisted by Gmail and making sure I didn't have an open relay on my mail server, you know what happened? Someone managed to hack in by brute force anyway. I only noticed because of the _millions_ of automated replies that were coming in every day from dead email accounts or people that were out of office.
Now, I could have worked hard to fight this. I could have done something other than changing my passwords and hoping they didn't crack them again. But the point is - I only ran a mailserver to get email delivered to me on my personal domain. I didn't want to have to fight and battle and dedicate myself to solving this problem. I wanted to take this thing for granted. I just wanted to send and receive email. Instead bad people could not only sit there and read all my incoming mail - but they could use my server to spam people and get me blacklisted and blocked from so many other services I worked so hard to be trusted by. And they did all this without even specifically targeting me. I was a statistic to them, someone who simply didn't know what they know. In the end, I moved my personal mail account to Google Apps, free of charge. Problem solved.
By using Gmail or Yahoo Mail or Hotmail - you are almost definitely more secure than setting up your own mailserver. You have people paid hundreds of thousands of dollars a year working full time to make sure your data is secure. I mean if privacy is your reason not to use Gmail, then I hope for your sake your mail server is secure. Maybe you think it is. I know I did too.
And all these people complaining about advertisements based on the content of their emails. Yahoo Mail had a team of like 30 people just doing _research_ on how to stop spammers. Then all these other people working on support. How does that service get provided to us _free of charge_ without advertisements or some sort of monetisation? I know in some people's heads they think it's literally just a Bayesian classifier and some hand-coded rules, but it's so beyond that.
And of course, let's not forget the fact that a lot of people would not be able to set up their own mail server anyway. Maybe you don't need them, but Hotmail, Gmail and Yahoo Mail enable hundreds of millions of people to communicate _for free_ with other people around the world that otherwise wouldn't be technically competent enough to buy a domain name and set up a local mail server. It lets you communicate with them too, because they don't get frustrated wading through hundreds of spam emails just to read the good stuff.
And that system only works because we have good guys that are fighting the bad guys who want to ruin it for the rest of us. And this is just the one example of email, which has all the decentralised and open properties that you desire. I am reminded of Diaspora when they released a first beta of their code and it got absolutely torn to shreds for security reasons, and we haven't heard much since.
The real world sucks.
That's why I think it might be a good idea for you to go work for Google.
Thank you, this post was good and informative. Nevertheless I think you missed his main point and concentrated on something that was merely incidental to it.
Yes, spam fighting is hard. Yes, it's probably easier with huge centralized installations (he actually observed that at this point the centralization offers advantages over the decentralized model.) But his main point was not about spam nor even about e-mail in general. His point was that it is worth putting the additional effort into making decentralized systems work. This is definitely not what Google are doing.
Google charges for Postini's standalone service. Their integration of Postini's function into Gmail and it being "free" in exchange for some of your freedoms is the entire point of this discussion and the mailing list post.
The OP said they should license it (the spam filtering portion specifically). They do. If you don't want to pay you can also use it for free. Simply put, I don't see his point.
That's neat. But it's also the first I've heard of it. Do they promote it outside of the business world? I get the impression that they don't think of it as something a CS department or non-profit would use.
Have you looked for it? It's pretty tied in with their Google Apps offering which is where I've seen it, despite not being in the market for it.
It's no surprise that they concentrate on marketing their business software to businesses though. That's who's going to pay for it. A CS department is going to use whatever the school uses and a non-profit is better served by Google Apps (which is discounted heavily for non-profits).
Your solution to admin incompetence is for a centralized service to eliminate the admin. Why can't the service just provide competence? If dozens of people are working round the clock to eliminate spam for your free mail service, why can't they package that and let you control your own data?
The centralized solution you've proposed, carried to its fullest extent, basically eliminates email altogether: only a small cabal of whitelisted services would be able to pass messages to each other. If spam detection software must remain secretive and proprietary at these big companies, this is basically a capitulation to the spammers.
One:
The anti-spam systems work because they are based on the content of emails and properties across the provider's entire user-base. Every time you click "Mark as spam" you are contributing data for all users in the service. In a decentralised service, even if people agreed to submit all their emails and information for the greater good (which they probably wouldn't), the data still needs to be centralised somewhere and secured by experts. The blacklist/whitelist of notorious spammers and servers needs to be maintained somewhere. You end up having a committee to do that, an elected/trusted group of people and they need to deal with appeals, etc.
Two:
If the logic for blocking spam were public, don't you think that would make it much easier for spammers to circumvent?
Edit - I can't reply to the user below. Must be some HN feature. But the logic for accepting an email is essentially a decision tree, it is based on data and evolves over time. It is a very different problem from something like encryption.
Fyi - On HN, after a message is posted, there's a delay before anyone can reply. The farther "down" the message is, the longer the delay. The logic is that this delay will prevent uninteresting back-and-forth flamewars. I'm guessing that's the HN feature you were talking about.
By analogy: "the logic for encrypted two-way communication (e.g. RSA) is public, don't you think that makes it much easier for hackers to intercept your credit card details?".
Enough has been said about security - or spam filtering in this case - by obscurity.
You are drawing an invalid analogy between cryptography and filtering. The only reason cryptography works with open algorithms is that the keys can be kept secret. To a very large extent in filtering the specific algorithms are as analogous to cryptographic keys as they are to the other parts of crypto-systems. That is, filtering algorithms are like very primitive cryptography where there was no separation between the system and the keys.
If you can propose a spam-filtering algorithm which would not be circumvented if its exact implementation were known, I'd seriously love to hear it. That would basically be a magic bullet for all spam.
Spam filtering is a wicked problem. The solutions are contextual, and there's no one single tool that will slay the dragon.
That said, a great many anti-spam solutions work by well-known and publicly available methods. DKIM (header signing) actually utilizes PKI. DNSBLs are publicly queryable (in some cases the zonefiles may be downloaded), Bayesian and rules-based filters are also generally available.
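To make the "publicly queryable" part concrete, here is a minimal sketch of how a DNSBL lookup is conventionally formed: reverse the IPv4 octets, append the list's zone, and do an ordinary DNS A-record query. The zone `dnsbl.example.org` is a placeholder I made up, not a real list, and the function names are mine.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    # 192.0.2.99 in zone dnsbl.example.org becomes 99.2.0.192.dnsbl.example.org
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """True if the zone returns an answer for this IP, i.e. the IP is listed.

    Caveat: any resolver failure (including no network) is treated as
    "not listed" here, which a real mail gateway would want to distinguish.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

Something like `is_listed("192.0.2.99", "dnsbl.example.org")` would be called on the connecting IP before accepting mail; each real list documents its own zone name and usage policy.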
The real challenges are:
1. Spam is cheap. Spam mail outnumbers ham (non-spam mail) by 100:1 or better. There's a lot of it.
2. Distinguishing spam from ham is contextual, and people's contexts differ.
3. False positives are expensive. Wrongly classifying ham as spam carries far worse consequences than wrongly classifying spam as ham (false negatives). Filters must skew to permissive.
4. There's little central agreement on methods, there are many old systems in existence. We've seen a few small advances (DKIM, SPF) in the past decade, but brute-force content filtering is still required.
5. Even well-established strong verification tools are too technically advanced for the vast majority of the userbase, and/or are unappealing to others. PGP MIME-encoded email signatures (strong cryptographic identity verification) date to 1991, fer crissakes! Getting even corporate-supported users to employ this properly is at best difficult (though it's becoming ever so slightly more common largely due to compliance requirements). For others, repudiability is important.
6. It's an arms race. Spammers change methods (many based on automated tools assuring rapid widespread adoption of new methods) based on new anti-spam methods.
7. Client and server (MUA/MTA) support for tools which would facilitate whitelisting of users and mail peers is difficult. Centralizing mail gateways can complicate the issue if those core gateways emit proportionately high levels of spam (I see or have seen middlin' amounts of spam from Hotmail, Yahoo, GMail, AOL, and other large email service providers, though generally they're pretty good).
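Point 3 in the list above (false positives cost more than false negatives) can be put into numbers. A rough sketch, assuming the filter outputs a spam probability and that you can assign a relative cost to each kind of mistake; the cost values below are invented for illustration:

```python
def spam_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability above which quarantining has lower expected cost than delivering.

    Quarantining costs (1 - p) * cost_false_positive on average,
    delivering costs p * cost_false_negative, so quarantine only when
    p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_quarantine(p_spam: float, cost_fp: float, cost_fn: float) -> bool:
    """Decide for one message given the filter's spam probability."""
    return p_spam > spam_threshold(cost_fp, cost_fn)


# Equal costs give the naive 0.5 cut-off; making a lost legitimate mail
# 99 times worse than a delivered spam pushes the cut-off to 0.99.
print(spam_threshold(1, 1))
print(spam_threshold(99, 1))
```

So a message the filter rates at 0.95 gets quarantined under equal costs but delivered once lost ham is weighted 99:1, which is what "skew to permissive" means in practice.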
That said: whitelisting, reputation systems (sender, server, DKIM, SPF), authentication (DKIM, PGP), contextual (Bayesian), and rules-based (e.g.: SpamAssassin) properly used do make the situation tenable. But this requires extensive support largely for the administrator of an email gateway. End-users may be forgiven for thinking spam is a "solved problem", though at their level it largely is.
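For anyone who hasn't seen the "contextual (Bayesian)" piece, a toy word-level naive Bayes scorer looks roughly like this. It is a sketch only: whitespace tokenisation, add-one smoothing, made-up training messages, and it assumes at least one spam and one ham example. It is not how SpamAssassin or any provider actually implements it.

```python
import math
from collections import Counter


def train(messages):
    """messages: iterable of (text, is_spam). Returns token counts and message counts per class."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Naive Bayes estimate of P(spam | text) with add-one smoothing."""
    vocab = set(counts[True]) | set(counts[False])
    n_spam = sum(counts[True].values())
    n_ham = sum(counts[False].values())
    # Start from the prior odds, then add each known token's log likelihood ratio.
    log_odds = math.log(totals[True] / totals[False])
    for token in text.lower().split():
        if token not in vocab:
            continue  # tokens never seen in training carry no evidence
        p_token_spam = (counts[True][token] + 1) / (n_spam + len(vocab))
        p_token_ham = (counts[False][token] + 1) / (n_ham + len(vocab))
        log_odds += math.log(p_token_spam / p_token_ham)
    return 1 / (1 + math.exp(-log_odds))


training = [
    ("cheap viagra now", True),
    ("viagra cheap offer", True),
    ("meeting notes attached", False),
    ("lunch meeting tomorrow", False),
]
counts, totals = train(training)
print(spam_probability("cheap viagra", counts, totals))
print(spam_probability("meeting tomorrow", counts, totals))
```

The "contextual" part is that the counts come from your own mail, so the same word can be spammy for one user and perfectly ordinary for another.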
What ultimately will solve the email spam problem will be for email to be superseded by another communications channel (SMS, weblogs, social sites, etc.) to the extent that spammers focus their energies there. It's an economic problem, and if the economics fail to support spamming, the (smart) spammers will move elsewhere.
They can't provide competence in a box because there's no free lunch. What would motivate a free e-mail provider to hand you the keys to the castle? If you want this product, get ready to pay for it.
I also think you're placing a mistaken emphasis on data. It's address books, not your data, that provide lock-in on these services. As far as I know, any of them will let you wrest your e-mail from their claws via IMAP or POP. The hard part is telling your contacts to mail you at <address>@gmail.com instead of <address>@hotmail.com
Full disclosure: I recently accepted a job from Google. My opinions on this matter are mine alone and are not based on any confidential information. I forward e-mail from my own domain to gmail. I also run a mixmaster anonymous remailer.
"Someone managed to hack in by brute force anyway. I only noticed because of the _millions_ of automated replies that were coming in every day from dead email accounts or people that were out of office."
This is not a description of your email server being cracked. It's a description of someone Joe-jobbing you: sending mail that pretends to come from your domain. Duckgo for mitigation techniques.
In fairness, there's not enough information provided to determine which this is, though my suspicion is that OP wouldn't know the difference regardless.
It is truly disheartening to see you'd been downvoted when I came into this thread.
The truth is, the OP's domain was probably considered to be in a bad "neighborhood" because his mail server had been compromised for spamming purposes at one point or another. It's dreadfully easy to either misconfigure a mail server or to end up with your mail server compromised.
Regardless, it's easy to hate on Google, especially in a primarily entrepreneurial forum where those posting are often trying to solve tough problems with far fewer resources. But Google is solving tough problems, even when you feel you've been wronged by an algorithm. Gmail has had an unbelievably successful spam filter for years, forcing the competition to rise to the occasion and match it, to the point where people forget how serious a problem spam is. It's not trivial, and it doesn't mean there's a democratic crisis when your e-mails end up in a spam bin. Especially when it's quite likely because your mail server was compromised.
> Regardless, it's easy to hate on Google, especially in a primarily entrepreneurial forum where those posting are often trying to solve tough problems with far fewer resources. But Google is solving tough problems,
I didn't feel the 'hate'. I read that he didn't particularly care for Google's approach. He certainly says nothing about Google not solving tough problems.
I thought it was a pretty fair piece actually, giving Google credit where it's due, and without trying to demonize them; just stating that he doesn't agree with where they're going.
No, our mail server has never been compromised for spamming purposes. I'm well aware of how easy it is to misconfigure a mail server, and it's not that I think we are too smart or paranoid to have done so; it's just that in the years that we've been struggling with that problem, we've never discovered that misconfiguration, or discovered outgoing spam (other than bounces from e.g. kragen-tol-request.)
I hope I didn't come across as "hating on Google."
All it takes to be considered a "bad neighborhood" is to have a dynamic, ISP-owned IP, as I found out when I tried to send mail from my personal server. And yes, I'm too cheap to pay Comcast even more money for a static IP.
I run my own mailserver too and saw some similar problems from early on, though not AFAIK with gmail in particular. If it ever has been compromised, I doubt it was right away.
So... "it's too hard so I'll just let google handle all my email". Works fine until people start blocking google mail because they don't trust them. This isn't "might happen one day" - it happens to me today already. You're just punting on the real issue, kicking the can a few months down the road.
Who's blocking Google? Do you mean everything from Google servers or only @gmail.com addresses? If so it's trivial to get a domain and still be using Google for your email.
A family member's office is one I know of (along with a few others) that is firmly in "MS Exchange" mode, and they've blocked mail from gmail and other google mail servers, because "google's not secure". Of course, they let hotmail mail through just fine :)
I never got spam before I used gmail. Now maybe this has more to do with timing, but it seems like putting everyone's emails on the same domain just makes things easier for spammers. Seems to me like spam is a problem caused by centralization, not solved by it.
It also seems like putting everyone's information in one place makes it easier for hackers to harvest, as well. Gmail probably has a security hole somewhere, too. If gmail's hole is discovered, everyone's emails are compromised (or a large number of people). If a private server gets compromised, there isn't as much there. There's not as much motivation to hack 1000 servers to get 1000 people's information as there is to hack 1 server to get 1000 people's information (although I recognize that one server is going to be a lot harder to crack on average).
I'm open to an education on this topic, as I don't know the methods of modern spammers/crackers.
> It also seems like putting everyone's information in one place makes it easier for hackers to harvest, as well.
Google dreams of being able to handle all that information on one server.
Besides that, it's not incredibly common (albeit not impossible) for people to steal information by actually hacking directly into a provider's servers, especially with someone like Google. More likely ways to get at someone's email are XSS or phishing attacks.