Google tips off cops after spotting child abuse images in email

patio11 · on Aug 2, 2014

Google, Microsoft, and Facebook have been reported as having a list of image hashes which they very strongly do not want you sending through their systems. If you attempt to do so, that won't work out well for you.

One writeup of many: http://www.microsoft.com/en-us/news/features/2009/dec09/12-1...

bcoates · on Aug 2, 2014

That's a nice, subtle way of fucking with someone: spoofing messages that trip these pedo-sensors won't result in an obvious mistaken raid, because the police will want something more solid to go on (like the case in the OP) -- the victim will just show up as 'pedophile, we just can't prove it yet' in a bunch of police databases/background checks/opposition research, and won't know why the random searches randomly choose them every time.

Ueland · on Aug 2, 2014

I suspect that the "hash" actually is the following info per file: sha1-hash, md5-hash, crc32-hash and file size. If a file matches the file size, hash1 is checked, then hash2 and so on. I will be pretty impressed if you can fake that somehow.

Edit: this is a normal algorithm for files in general, but it appears to be a image-specific algorithm used in this case. (A la imageDNA)

parkaboy · on Aug 2, 2014

I'm actually guessing not. It's probably something similar algorithm-wise to what Shazam does. Probably not a crypto-hash, but more likely a hashing of robust features (in a relative as opposed to absolute location form) detected from the image to give a "fingerprint." Such features would be robust to scaling and most compression artifact issues. EDIT: Here's a good example: http://laplacian.wordpress.com/2009/01/10/how-shazam-works/

tptacek · on Aug 2, 2014

If this is Microsoft's hash algorithm, the one from the robust image hashing paper, it's not perceptual (like Shazam); it's coding-theoretic. If I understand it correctly, it leaves room for false negatives from drastic (but intelligible) image changes in favor of avoiding false positives from attackers trying to deliberately trip it with non-matching images.

lifeisstillgood · on Aug 2, 2014

If you are the sort of person able and willing to screw with someone's life like that, you will have no qualms using your stolen credit cards to buy child porn images to use in this way.

I remember a episode of some legal drama where our hero the judge had his laptop "hacked" and child poem images installed on it. (Luckily it was make believe so he was able to "delete" them before the other judges arrived with the tipoff but that's not the point)

I think we shall soon enter a legal landscape where mere possession of digital products does not imply legal possession. It's going to be an interesting world.

tomjen3 · on Aug 2, 2014

Alternatively you could put feathers from endangered birds inside peoples cars. Mere possession of such feathers is a felony.

There are plenty of ways to screw with people.

lifeisstillgood · on Aug 2, 2014

That probably comes under the "hmmm, weird" federal statutes.

pooper · on Aug 2, 2014

This scares me. What can we do though? Can we decriminalize/legalize possession of anything? Perhaps just make it illegal to produce illicit materials? How would it work? Would that mean bad guys would be able to legally own weapons-grade plutonium if they didn't produce it?

lifeisstillgood · on Aug 2, 2014

It only applies to digital products. If you are found with next doors stolen TV it's not much of a defence to say "I did not put it there", but since any script kiddie from Russia or China can dump bitcoins, porn or military files on your bot netted PC we shall have to see a higher level of proof from prosecuters beyond "well it's on his PC so he must be a MtGox exec / paedophile / spy"

pooper · on Aug 3, 2014

I like this idea. It should also apply to people when someone else mooches on one's home wireless network to download copyrighted material.

euroclydon · on Aug 2, 2014

The article said the hash was effective across resized versions of the same file. Pick the top 50 most prevalent colors by pixel and create a normalized vector representing the distance of each color's center of mass from the center of the image.

Natsu · on Aug 2, 2014

For a photo, you might have to be a bit fuzzier than that, given that things like jpeg recompression can change a lot of that. I assume there are techniques to get around that sort of thing, though, and major software vendors like Google & Microsoft ought to have the sort of people who can solve that problem.

Xorlev · on Aug 2, 2014

Also potentially p-hashes or similar for fuzzy matching the images. Works pretty well for at least narrowing the space of what you're looking for.

Given that Google Image has reverse search, you can almost guarantee that they have approximate hashes.

joeframbach · on Aug 2, 2014

Perhaps this may change if a certain list of 535 people in DC are added to this list.

tptacek · on Aug 2, 2014

There's no such thing as a "pedo-sensor". This is a hash-based search against a corpus of known images, where the hash function was designed explicitly (a) to survive image transformations and (b) to prevent unauthorized parties from predicting the hash of a given image.

Either of those two goals could fail and it would still be implausible that an attacker could trigger a false positive without using an image derived from actual known child pornography.

true_religion · on Aug 2, 2014

I think he means spoofing the message was sent by a particular person, but not actually spoofing the content. The content has to be real to trip the hash-search.

dredmorbius · on Aug 4, 2014

I'm actually aware of systems which were built to identify porn through neural-network training. I don't know if these were sensitive to distinguish child porn from other, but as a method of finding images likely to contain more skin than desired, they worked fairly reliably.

That said, the Google/Microsoft tool does seem to work based on a known image corpus.

slashdev · on Aug 2, 2014

That makes more sense, the article makes it sound like Google engineers were browsing through people's private photos, at least that's what every non-technical person who reads it will take away. It seems obvious to me that it must have been some kind of automated process that flagged the images.

GeneralMayhem · on Aug 2, 2014

That's also what any non-technical person who thinks about for more than a millisecond would realize too, given the volume of Gmail traffic. Unfortunately, a millisecond's thought isn't conducive to fearmongering.

dllthomas · on Aug 2, 2014

A millisecond's thought will tell you that Google engineers aren't looking at every email (or a substantive fraction). That doesn't mean they aren't looking at any email, which is all that is necessary for that interpretation.

tptacek · on Aug 2, 2014

The latter interpretation, if I understand it correctly, is that Google engineers randomly poke at people's emails and in this case just happened to get lucky. Even stipulating thousands of Google engineers doing that, that doesn't sound statistically plausible; the base rate of child pornography among Google Mail users is extremely low.

slashdev · on Aug 2, 2014

That was my thoughts as well. But I don't think most people think that way, especially given the misleading wording of the article. I hate to contribute to the "rest of the world is stupid" theory, or pretend that I'm immune from such things, but in this case I really do think the vast majority of people won't evaluate things in a probabilistic way like that. Most people just aren't trained to think in terms of odds instead of emotions (and it constantly annoys me.) At least that is my experience, and it may not be representative.

Anecdote, I go swimming in a river known to have Caiman (small alligators) downstream and everyone thinks I'm a huge risk taker. Nobody comments on the 3 hour drive to get there and back in heavy rain and high speeds. It's obvious, mathematically speaking, which is the riskier activity.

dredmorbius · on Aug 4, 2014

The base rate of child porn could be low, but the predictors for who might be likely to be a pornographer could be fairly high. With the social graph being a primary predictor, but also, say, things such as browser profile, IP, search and browsing history, locations, communications, and other information. Which Google has on individuals in spades.

So: should Google be performing predictive analysis to make precrime forecasts? Has Google been doing this?

dllthomas · on Aug 2, 2014

I agree that it still seems unlikely, but it requires deeper thought to deduce that than "Google engineers can't possibly be looking at every email".

Pacabel · on Aug 2, 2014

It's easy to dismiss a one-in-a-million, or even a one-in-a-trillion event, as being unimportant when it happens to somebody else, especially when that person isn't directly known. But one's thinking can change very, very quickly when one is directly affected by such an improbable event, be it in positive or negative way.

The impact may seem irrelevant or minor to you, but to those affected it could very well be far more significant.

Excavator · on Aug 2, 2014

On that topic:

http://articles.latimes.com/2010/sep/16/business/la-fi-googl...

http://gawker.com/5637234/gcreep-google-engineer-stalked-tee...

bowlofpetunias · on Aug 2, 2014

A non-technical person will not consider that any different. After all, said automated process was created and deployed by Google engineers, so it's still Google engineers browsing through people's private photos.

Just because I use a tool doesn't mean I'm no longer responsible for my actions.

It's exactly this complete dissociation of responsibility for the consequences why we will be facing an increasing anti-techie backlash.

It's time we stopped hiding behind semi-autistic semantics. Google engineers went through people's private photos. Period. What tools they used to do it is utterly irrelevant.

DanBC · on Aug 2, 2014

> It's time we stopped hiding behind semi-autistic semantics.

Semi-autistic is pretty fucking offensive.

> Google engineers went through people's private photos. Period. What tools they used to do it is utterly irrelevant.

Ignoring for a moment the fact that the whole point of Google is to inspect your traffic in order to serve you ads (and that's been used by competitors in their own adverts before; see Microsoft) there's a big difference between taking[1] image hashes to then compare to hashes of data passing through your network and some person sat at a desk looking at the images.

[1] Probably being given image hashes by law enforcement. A much more interesting question is how fast they can do this? Imagine a set of say 500,000 hashes of the worst images. And global Google traffic. That's a lot of hashing and comparing going on. And are they doing simplistic hash checking, or is it more complicated so that people can't just change a few pixels to evade the hashes?

morsch · on Aug 2, 2014

Wait wait wait, so it's acceptable now that government/LEO give a list of forbidden hashes to service providers with the expectation that they filter their networks and report anybody who's a match? Because that sounds bat shit insane.

Who's to say protest-plans.ppt won't be the next thing on the list? Or justin-bieber.mp3? Or cia-torture-prisons.txt? With a hash, the ISP doesn't even know what the file is about before they detect a match, and having them look closer at matches and decide whether the government needs to know is just as insane.

meowface · on Aug 2, 2014

>Who's to say protest-plans.ppt won't be the next thing on the list? Or justin-bieber.mp3? Or cia-torture-prisons.txt?

Because possession of those files doesn't break federal law, but possession of child pornography does?

And even if those things were added, obviously Google does some sort of rough verification process of the nature of the content being sent if it matches a hash.

Service providers have been sharing image hashes of CP for 6+ years now. It's definitely not just Google.

Your slippery slope makes no sense. This is probably the absolute least invasive thing that Google does. You are aware they show ads based on the actual text content of your emails, right?

morsch · on Aug 2, 2014

I have zero confidence in the LEO/domestic intelligence services caring whether protest-plans.ppt break federal law. And I don't want Google to have to implement some completely opaque "rough verification process" to determine whether the government requested trigger is valid in one case but not in another.

tptacek · on Aug 2, 2014

I always do a sort of mental double-take when I see comments like this. If you have zero confidence in law enforcement caring about the law, then what does any of this matter? What's to discuss? The government is going to do whatever it wants, and there are no policy implications to Google's detection of child pornography. There's nowhere to go from that rhetorical position, so why even bother picking specific things in opposing comments to respond to?

ObviousScience · on Aug 2, 2014

This is nonsense.

It's perfectly consistent to think that law enforcement is willing to cheat when they can get away with it, but that other domestic forces are capable of holding them accountable for that.

Further, it strawmans the position: they do think that law enforcement cares about the law, just not so much as catching lawbreakers who aren't part of the political or policing class, and is willing to cheat the law in order to "fulfill the deeper mission", or some ascribed view like that.

A natural reply to thinking that law enforcement will cheat if it can is to think that other large, entrenched players must force law-enforcement to disclose its actions for review before lending their power to law enforcement goals - which is exactly the stance in insisting that Google not respond to hashed lists, and instead demand to see the files that they're supposed to be blocking.

It increases the number of players necessary to cheat successfully, and hence removes some of the incentive of law enforcement to cheat.

tptacek · on Aug 2, 2014

Your comment would be stronger without the first sentence, but glass houses, stones, &c. Since the rest of it was a good-faith response to my question:

I'm not sure I understand how it's consistent to believe that law enforcement (to be precise: "LEOs and domestic intelligence") are inclined to ignore the law entirely, and believe that there's accountability. If there was accountability, they could not easily incline towards ignoring the law, because doing that would have consequences. Belief in the former condition seems to equate to disbelief of the latter.

I don't think I responded to a straw man argument. I think you rehabilitated a broken argument and then supposed that I was responding to that better argument instead of the one that was actually posed.

Dylan16807 · on Aug 2, 2014

I have pretty good confidence in law enforcement caring about the law in general.

I have quite low confidence in intelligence agencies using a law for its intended purpose. I expect them to stretch and distort what they are allowed to do, and to abuse mass filtering mechanisms. So they should for the most part not be allowed to have mass filtering mechanisms.

To put it more simply: I do not expect the details of these laws to be followed, and 'type of content blocked' is a detail. But at a macro scale the law will be followed.

tptacek · on Aug 2, 2014

Sure. I think I agree with this. However: what's being described here is not a system that can be directly abused by the CIA or NSA to trawl for documents. It is coupled in at least two specific ways with the search for child pornography: (1) it only works for images (from a known corpus), and (2) the corpus is managed not by the USG writ large but by a cooperative effort specific to child pornography.

I see the slope, but the traction on it is pretty solid.

meowface · on Aug 2, 2014

Well, I agree with that to some extent.

I have a bit of faith in companies like Google and Microsoft though. If suddenly their feed of illicit image hashes began to contain content related to politics or protecting government secrets, I think they'd raise hell. I also don't think the FBI would be dumb enough to start pushing that sort of stuff and harming the relationship they have with tech companies.

Now, I can very much see NSA doing that kind of content inspection without the companies knowing about it. And that's not just because we already know it's been happening for years.

But law enforcement != intelligence, or at least it shouldn't. The FBI does still have a corps of good old fashioned cops, even if they're now getting more and more into the intelligence side of things.

tptacek · on Aug 2, 2014

You'll probably feel marginally better to know that the corpus of hashes is managed not by the FBI, but by NCMEC, an NGO.

meowface · on Aug 2, 2014

Ah, that makes even more sense. And it makes it even more annoying that the original article didn't include any of this information (what Google is actually doing to find these, how hashing works, who manages the hashes) just to make a more sensationalist piece.

tptacek · on Aug 2, 2014

Welcome to Club "Tracking Down Primary Sources And Talking About Them On HN Threads"! We have jackets!

morsch · on Aug 5, 2014

A bit late for a reply, but for the record: That's a double straw man. I always do a mental double-take when that happens because now suddenly I feel like I have to defend a position that's twice removed from my own.

I did not say "LEO doesn't care about the law", I said I have no confidence that LEO and domestic intelligence agencies will care whether the documents they scan for are against the law.

It doesn't take much imagination that they can easily justify watching for protest-plans.ppt because of terrrism, both towards themselves and against outside supervision, if there is any. Of course they're not going to charge you for protest-plans.ppt, but it sure is a useful document to have and might even come in handy as character evidence.

Furthermore, even if I had said they don't care about the law preventing them from monitoring communication in this way[0] it's unreasonable to assume that I also think they don't care about any law at all and hence "the government is going to do whatever it wants".

[0] in fact, if LEO is monitoring all mail in this manner, I would hope that they are breaking the law (domestic intelligence can probably do whatever the hell they want); but again, that was not what I had in mind

DanBC · on Aug 2, 2014

> Because possession of those files doesn't break federal law

Depends what country you're in.

> Your slippery slope makes no sense.

I agree. There is a very big difference between a pirated Justin Bieber MP3 file and images of children being raped.

greggman · on Aug 2, 2014

His slippery slope does make sense. Your argument seems to suggest you would think it's okay for the police to show up at your door everyday and go though your files looking for child porn as long as they promise not to care about anything else like that weed you've got or those taxes you didn't file perfectly.

I'm assuming you wouldn't be happy with that .

Google shouldn't be looking in my email in the first place period for anything . The end. I'll give them permission to automate scanning to place ads but that's it. Anything past that is out full stop

meowface · on Aug 2, 2014

You do realize that automation of scanning to place ads is FAR more invasive to your privacy than what's being described in this article?

Scanning if any of the images you sent, when hashed, meet a certain hash is about as blind and automated as you can possibly get. They have no clue of what images you send or what they look like, they just know if you sent an image that is an exact match to an image of child pornography. No one is actually reading through emails and "looking" at the pictures; a program is hashing them all.

DanBC · on Aug 2, 2014

> Google shouldn't be looking in my email in the fiery place period for anything . The end. I'll give them permission to automate scanning to place ads but that's it. Anything past that is out full stop

All services have clauses saying what you're not allowed to do on those services; and that they monitor to ensure that you're not doing that. I'd be amazed if the user agreement you signed when you started using Google (assuming you use Google) didn't have that.

martey · on Aug 2, 2014

> I'll give them permission to automate scanning to place ads but that's it. Anything past that is out full stop

If you look at the relevant section of Google's Privacy Policy - https://www.google.com/intl/en/policies/privacy/#nosharing - you have already given them permission to access your email to "meet any applicable law, regulation, legal process or enforceable governmental request".

waterlesscloud · on Aug 2, 2014

Transmission of classified documents to unauthorized persons does in fact break federal law.

Why wouldn't that also be included, then?

tptacek · on Aug 2, 2014

One reason is that it's extraordinarily expensive to enlist Google and Microsoft in the pursuit of government secrets compared to the alternatives.

The USG (like most F-500 companies) invests a lot of money in patrolling the borders of its networks for breaches of sensitive information. For commercial entities, there's a whole class of product ("data loss prevention" systems) that do this out of the box.

Since (a) the information the USG is watching for is generated from within their network borders, and (b) USG isn't comfortable sharing that information or artifacts of that information with third parties, it doesn't make a lot of sense for them to invest a huge amount of effort (both technically and legally) to come up with some cockamamie scheme to have Google look for those secrets without knowing what the secret is.

Instead of engaging with the intractable problem of having Google dragnet GMail for documents Google is not allowed to read, the USG is much more likely to just subpoena the accounts of people it suspects to be trafficking in those documents.

waterlesscloud · on Aug 2, 2014

Not that I have much faith in the idea that something being extraordinarily expensive prevents the government from pursuing a particular course, but what you say would cover most cases.

But let's say instead of classified documents, we're talking about al-qaeda-training-manual.pdf. One can easily imagine such a document violating federal laws.

Say there are various documents the government catches during a raid, on say Bin Laden's compound for example. Say the government would like Google to look around and see who has trafficked in these documents. They don't have to tell Google what the documents are, exactly, just provide hashes/file sizes/ etc.

Why wouldn't this happen?

tptacek · on Aug 2, 2014

I don't know. Maybe it would. And, my comfort level with that happening depends a whole lot on the circumstances.

im3w1l · on Aug 2, 2014

But it is a lot easier to abuse existing infrastructure, than to build abusive infrastructure from scratch.

tptacek · on Aug 2, 2014

Well, for one thing, no law exists that would criminalize "protest-plans.ppt", and it's unlikely that any law proposed to do that would survive in court. So there's that difference.

ObviousScience · on Aug 2, 2014

Yes, the US government is full of respectable agencies like the CIA, who obviously would never do something like spy on the team investigating their likely criminal actions, and attempt to dynamically reclassify materials so as to stall out the investigation team.

How could a citizen possibly think that government agencies aren't infallibly interested in the citizen's good and question secret block lists?

tptacek · on Aug 2, 2014

The CIA doesn't have the authority to criminalize "protest-plans.ppt". They're part of the executive branch. To criminalize protest plans, Congress would need to pass a law, one that at the very least left room for executive branch rulemaking about which slide decks were or weren't criminal. And, of course, that law would not survive in court.

Of course, you aren't talking about outlawing documents. You're suggesting a different case, one in which hashes of documents are used not for criminal investigations but to keep tabs on citizens. But that wasn't the case I was responding to.

ruloo · on Aug 2, 2014

We don't need the courts to make it illegal. You can damage political adversaries in court of public opinion by leaking other damaging information. protest-plans.ppt could just be a way to identify political threats in order to flag and search other traffic records. we know that the intelligence agencies spy on politicians. A dirty picture or chat affair leaked to the press destroys a user before they even get into an office of power. or detects and removes someone in position to be a whistleblower.

DanBC · on Aug 2, 2014

It only sounds batshit insane if you're used to oppressive misuse of law enforcement. Living in the UK it sounds reasonable to me that when a child abuser is caught with images of child sexual abuse that law enforcment:

1: tries to identify and help the victim

2: tries to identify and punish the perpetrator

3: tries to find and punish other people using these same images

Whilst I disagree with drug laws I think it is sensible (within the context of thise laws) that customs officers inspect packages passing the borders to find contraband.

JoshTriplett · on Aug 2, 2014

This is not 4chan; "autistic" is not an appropriate description for anything other than the actual psychological disorder.

ColinCera · on Aug 2, 2014

What about the word "insane"? Is it okay to use that? Also, what's the rule on using "paranoid" in a non-clinical context? Can I use "schizophrenic" and "narcissistic" and "borderline" colloquially, or is that also offensive?

I just want to make sure I'm following the rules and not offending anybody.

tomjen3 · on Aug 2, 2014

This is the internet. If you are going to be offended over the use of words you are going to have a bad time and should log of right now.

AndrewDucker · on Aug 2, 2014

This is a site on the internet.

Which doesn't mean that it can't have standards of conduct that make it possible to have civilised discussion.

Pacabel · on Aug 2, 2014

When read without putting political correctness above all else, the comment from bowlofpetunias is quite civilized.

I think it brings up some very good points about how non-technical people and technical people can interpret situations involving technology in very different ways, for example. And I think it's actually trying to hold the technical community to a much higher standard when it comes to justifying actions that society at large will very likely disagree with.

It's far more harmful and disruptive to see the perfectly legitimate argument of bowlofpetunias being totally ignored and even obscured, merely because some people find a single word in the comment to potentially be politically incorrect or potentially offensive.

Discussion is much better when the focus is on the argument or topic itself, rather than going off on political correctness tangents over a single word in somebody's comment.

27182818284 · on Aug 2, 2014

As a community, a lot of us want to keep the site to higher standards than other websites. Straight from the guidelines:

"When disagreeing, please reply to the argument instead of calling names. E.g. "That is an idiotic thing to say; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3.""

With respect to the "autistic" comment, the "autistic" could have similarly been dropped and the point made much stronger. Instead most people are going to ignore any points that could have been made±—the comment is already greyed out.

https://news.ycombinator.com/newsguidelines.html

Pacabel · on Aug 2, 2014

When I read that earlier comment, I don't see "semi-autistic semantics" being used as an insult. I think it's actually quite relevant given the context, and appropriate given the argument that was being made.

Defining characteristics of autism (that is, the medical condition) include a deficiency when it comes to social understanding, a tendency to have an extreme focus on minor details, and an inherent inability to grasp the "big picture".

The earlier comment describes how certain attempts to justify what happened in a particular situation end up exhibiting similar traits. These justifications do not correspond well at all with how society at large interprets the situation, and they have a very narrow focus that ignores the larger social aspects of the situation.

I could see it being insulting if the earlier comment labelled somebody as being autistic in an attempt to discredit them, but in this case it's describing certain traits that correlate quite well with the medical condition and how those who suffer from said condition often behave.

If a person who suffers from a poor ability to understand the society around them, and who also tends to be unrelentingly focused on minute details, can be diagnosed as having "autism", then I think a similar diagnosis is perfectly reasonable when applied to an argument or justification that exhibits the same traits. It's a perfectly legitimate way of describing such arguments/justifications that don't mesh well with the social reality.

DanBC · on Aug 2, 2014

It's pure and simple stigma.

Your ignorance of ASD does not excuse piss-poor metaphor.

Pacabel · on Aug 3, 2014

Putting your nonsense accusations and political correctness hypersensitivity aside, can you please provide us with a better single-word term that precisely describes something (such as an argument or a justification) that's so narrowly and uncompromisingly focused so as to totally miss how society at large will interpret it?

And, please, be serious about this. Suggesting words like "intolerant" or "wrong" wouldn't be helpful, obviously. If you have such a problem with the term used previously, provide us with one that offers just as much meaning, but isn't "offensive" to people who are overly sensitive.

DanBC · on Aug 3, 2014

For this example: autistic misses the mark because people with autism will be described as being too literal. Hence books like "It's Raining Cats and Dogs: An Autism Spectrum Guide to the Confusing World of Idioms, Metaphors and Everyday Expressions"

http://www.amazon.co.uk/Its-Raining-Cats-Dogs-Expressions/dp...

> The earlier comment describes how certain attempts to justify what happened in a particular situation end up exhibiting similar traits.

Here's what that comment said: "A non-technical person will not consider that any different. After all, said automated process was created and deployed by Google engineers, so it's still Google engineers browsing through people's private photos."

But this thread is full of people saying that comparing hashes is not anything like engineers looking at your photos, and most of those people do not have ASDisorders, thus claiming only a person with autism would claim such is wrong.

> Putting your nonsense accusations and political correctness hypersensitivity aside

Great way to hold a constructive discussion.

Pacabel · on Aug 4, 2014

Like I said, please put your political correctness to the side so that we can discuss this like adults.

What is an alternative term that conveys the exact same concept that was discussed earlier, but without being deemed "offensive" by people who are particularly sensitive?

If you have a problem with the perfectly legitimate term that was being used earlier, then the constructive thing to do would be for you to provide an equivalent or better alternative.

Wohui · on Aug 4, 2014

In the spirit of HN; this post and your previous one could be shortened to just this one ;)

tomjen3 · on Aug 2, 2014

The only reason it was greyed out was that people got upset and downvoted it.

And while my comment now has -4 it had previously +4 so evidently not everybody agrees with your community standards.

Pacabel · on Aug 2, 2014

This site needs some kind of meta-moderation system, or some other way to moderate the moderation.

What you describe seems to be happening more and more often these days, and I think it's harmful. Comments will receive many upvotes, and then suddenly start getting many downvotes. Or it'll happen the other way around.

When that kind of fluctuation is happening, I think the only appropriate thing to do, especially when downvotes or downvoted comments are involved, is to show the comment normally, instead of obscuring it with harder to read grey text. Clearly there's some agreement, and there's some disagreement, so the only sane thing to do is to show the comment so readers can make their own analyses of it.

krapp · on Aug 2, 2014

I believe the site's answer to that is to have the downvotes by higher karma users count more.

It's not fair, but the staff have mentioned that the site isn't intended to be fair, it's intended to preserve intellectual quality and signal over noise at all costs.

Given the choice between showing a potentially controversial comment by default and censoring it by default, they would probably opt for the latter, because it makes them easier to ignore.

You can still select the text to read it if you want to. Which I guess is slightly better than just automatically deleting them or something.

tptacek · on Aug 3, 2014

Quis meta-arbitrat ipsos meta-arbitri?

icebraining · on Aug 2, 2014

A non-technical person will not consider that any different.

That's not my experience at all. I stopped using Gmail in part due to privacy concerns, and when I explain it to non-techies, they're OK with it specifically because it's software and not people looking at the photos.

People, in my experience, aren't worried about potential false positives or abuses of power. They just don't like the idea of other persons looking at their private pictures.

psykovsky · on Aug 2, 2014

PhotoDNA[0], for the ones interested in knowing a little bit more

[0] http://en.wikipedia.org/wiki/PhotoDNA

onewaystreet · on Aug 2, 2014

Anyone know of a similar technology (generating a hash from an image that is resistant to alterations) that is open source?

krapp · on Aug 2, 2014

If you don't mind php, I wrote this ages ago specifically to catch image spam on forums... the project is more or less dead now because i'm not technically capable of improving it, but it does work (albeit with false positives in some instances): https://github.com/kennethrapp/phasher

You may find the actual algorithm used more interesting (http://phash.org/)

troels · on Aug 2, 2014

I'm fairly sure [OpenCV](http://opencv.org) can do that for you.

dredmorbius · on Aug 4, 2014

The question of what Google might search for, using what hashes, on whom, why and when, and how, comes to mind.

Google didn't "look at pictures". It had a database of known child porn images for which it computed hashes or signatures, which then could be checked, presumably of attachments in emails. This lets Google avoid the matter of "we looked at your specific images". It also means that your random personal photos aren't going to be examined (though their hashes are computed, and possibly retained).

The hashing approach could apply to other aspects of email as well, though. Google could, perhaps, be encouraged to maintain a set of hashes corresponding to musical performances (it uses a similar technology AFAIU for YouTube copyright determination). Or of books. It could even take tuples of word sequences within email, compute hashes, and compare these with known word strings, forwarding relevant information to interested parties should certain sequences occur.

All without "reading" your email.

Where do you draw the line?

Is it "think of the children"?

Do we extend this to "think of the terrorists"? How about seeking out white-collar crime -- a set of account numbers, or key phrases, or transaction sequences?

Or do we look for drugs crimes? Or gang activity? Or political insurrection?

Can different national jurisdictions come up with their own sets of keywords? Say, those associated with specific religious or ethnic groups and activities? Uighurs, or Tibetan independence movement? Boko Haram? Aum Shinrikyo? Provisional IRA? Shining Path? ELF and PETA? Bangsamoro Islamic Freedom Fighters?

Because those are all possible with slight variations on the methods described here.

Or how should (or must) Google respond if users employ encryption to defeat such attempted detection?

lifeisstillgood · on Aug 2, 2014

From the write up

>> Since 2003, NCMEC has reviewed and analyzed almost 30 million images and videos of child pornography

I deeply deeply hope that is not 30M distinct photos ... thats an industrial level output not a few ugly people trafficking in horror.

I was struck by a recent podcast from BBCs More Or Less which tried to estimate number of paedophiles in population - and came up with 1%-5%. The problem was the definition - sexual thoughts, sexual actions with pre-pubescent of pubescent children upto the age of consent. That lumped people who thought Kate Moss' early nude photos were sexy together with people who rape 4 year olds which I don't think helps the debate.

Anyway - I think the prevalence is greater than is generally recognised - but even so - 30 M images seems enormous.

dm2 · on Aug 2, 2014

Here is a very simple overview of how it works: http://www.microsoft.com/global/en-us/news/publishingimages/...

doe88 · on Aug 2, 2014

And what happens if someone (who wish you bad) anonymously send you such images as if he was your longtime buddy? Do they also report you?

higherpurpose · on Aug 2, 2014

This is why we need end-to-end encrypted communications to become mainstream. Why the hell are our service providers becoming Internet police these days? We didn't join Gmail or Facebook to "protect us from the bad guys", nor to have them turn people in for whatever arbitrary "serious crime" they decide they should support. It could be "child abuse" today, and "terrorism" (with whatever vague definition accompanying it) tomorrow.

It's bad enough that the US government and Courts now think they have access to anyone's information in the world, if they use American services. But US service providers acting like they work for the government makes it even worse.

Just stay out of our private communications. How hard can that be, really?

tptacek · on Aug 2, 2014

Really? Detection of known instances of child pornography is the reason we need end-to-end encryption? As an advocate for end-to-end security, I find this disquieting, in the "I think you're doing more harm than good with that argument" sense.

DanBC · on Aug 2, 2014

"These days" - this is nothing new. I'm genuinely confused by the number of people who think that any form of online communication is secure. I understand the outrage about government snooping. But this is a company checking its users do not break the law on the company's service and it is not in any way new.

And having seen what idiot politicians would want I'm glad that services are doing this instead.

DannyBee · on Aug 2, 2014

" Why the hell are our service providers becoming Internet police these days?"

Because laws keep getting passed that ask them to?

reuwsaat · on Aug 2, 2014

Google's actions were legal. The problem is, we, as a country, have still not come to an agreement on what 'privacy' should entail in this brave new world. Given that Google was not breaking the law in reporting this person, I would argue it would have been immoral for them to have sat on this information.

TehCorwiz · on Aug 2, 2014

I do have one question. What if it wasn't child porn? Would it be right for them to go looking through your email for evidence of tax fraud? How about smoking weed? Where do you draw the line? The question isn't "do the ends justify the means?" the question should be "What if they're mistaken?"

EDIT: While I stand by my questions I acknowledge the use of PhotoDNA as opposed to simply rifling through someone's inbox.

lclarkmichalek · on Aug 2, 2014

Your question doesn't seem to be "What if they're mistaken?"; it seems you are more asking "will they stop at child porn?". Anyway, I'd agree with jfoutz, in saying that society has dealt with these problems before, and has managed to create a fairly good line as to where something must be reported, and where a person's privacy becomes more important. That's not to say they will manage to create that line again, but it's also not to say they won't. Regardless, I don't think it's an unavoidable slippery slope.

As for "What if they're mistaken?", well I highly doubt anyone will be convicted off the basis of an automated image recognition tool alone. If they are mistaken from time to time, then the person will probably have a warrant (ha) issued for their email, and any investigation would follow normally. If they're mistaken very often (i.e. the PhotoDNA implementation turns out to be shit), then the police (or whoever is receiving these reports) will probably stop caring, and nothing will have changed.

PhasmaFelis · on Aug 2, 2014

> As for "What if they're mistaken?", well I highly doubt anyone will be convicted off the basis of an automated image recognition tool alone.

"Convicted" isn't the fear, post-9/11. The fear is getting put on a secret government watch list and being harassed for the rest of your life with no chance to ever clear your name, or even to be told why you're being screwed.

deong · on Aug 2, 2014

That's a legitimate concern, but the root problem is the existence of secret government watch lists free of oversight and lacking any transparent or available process for being inadvertently put on one.

If we grant that such a list is wise to have, then sure, let's focus as much effort as possible on making sure that we don't put innocent people on those lists. But I'd much rather focus my efforts on not having the lists in the first place.

lclarkmichalek · on Aug 2, 2014

Isn't that mainly terrorism, not paedophilia? Anyway, at least in this case, the tip was sent to the National Center for Missing and Exploited Children, which, being a private non profit, doesn't seem too closely associated with the NSA/DHS/whatever government organisation you believe maintains these lists.

throwaway0010 · on Aug 2, 2014

"Isn't that mainly terrorism, not paedophilia?"

It's nothing at all. Plenty of innocent people are on those lists having done nothing.

That the pretext is terrorism is irrelevant.

DanBC · on Aug 2, 2014

The UK version - the Internet Watch Foundation - is also a charity but you'd be a foolish UK ISP to ignore what they say.

https://www.iwf.org.uk/

lclarkmichalek · on Aug 2, 2014

The UK equivalent would probably be the NSPCC. If Google had a tip of an individual person distributing these images they would not contact the IWF. That would be for a website that was distributing these images.

DanBC · on Aug 2, 2014

In the UK the IWF provide information about what to block and take reports about hosted images.

CEOP are the group that Google would report people to.

http://ceop.police.uk/

PhasmaFelis · on Aug 2, 2014

> Isn't that mainly terrorism, not paedophilia?

"Terrorism" was just how they learned they could get away with it. Having established that, it's now any damn thing they want.

TehCorwiz · on Aug 2, 2014

I'm not claiming a slippery slope. They already are scanning all emails, we know this for certain.

And as for "What if they're mistaken?" There's a fair amount of fallout from simply being accused of a crime. Between arrest, jail-time waiting for arraignment, bail, legal fees, news coverage which will forever attach your name to whatever it was...etc. The question squarely is "What if they're wrong?"

lclarkmichalek · on Aug 2, 2014

You are claiming slippery slope. They've reported child porn cases, and you are worrying about the possibility they could begin reporting tax fraud and other more minor crimes.

at-fates-hands · on Aug 2, 2014

You mean like the football coach in Minnesota that had some videos of his kids playing after a bath in his cellphone and got arrested for child porn?

http://news.yahoo.com/ex-minnesota-state-mankato-coach-retur...

I'm not sure if there is a ton of oversight for what Google does, but I agree with your point there needs to some kind of vetting to determine guilt beyond a reasonable doubt.

In the case here, the man had quite a history of previous behavior that would make it a pretty clear choice as to what Google needed to do. In the case of the football coach who only had the innocent video of his kids playing, there was some obvious signs that were ignored in the process of steamrolling a mans career, his reputation and his standing in the community where he's lived his whole life.

IanCal · on Aug 2, 2014

Isn't what's happening here quite different? They'd be matching against a know list of images.

tptacek · on Aug 2, 2014

Yes. Robust hashing wouldn't flag a parent's personal pictures of their young kids at bathtime. We aren't discussing nudity detection algorithms.

tptacek · on Aug 2, 2014

They can't use the same technique, or any technique with similar privacy tradeoffs, to look for tax fraud.

The scheme they use here only works with documents known to authorities, whose possession is criminalized. The technique (searches for collisions in a corpus of robust hashes) can't generate new information for authorities about documents they haven't seen. And there is no case in which a person could possess those documents where the government wouldn't have a reasonable interest in knowing that; in other words, there's no valid privacy interest intrinsic in possessing one of the specific documents they're looking for.

None of those conditions exists for tax fraud, or for that matter terrorism.

The slippery slope you're invoking doesn't really exist.

potatosareok · on Aug 2, 2014

Searching for collisions in a corpus of robust hashes seems to me like the post office drug sniffing packages, and people seem to be ok with that. The same way the drug sniff dog won't give away anything other then drug/no drugs (is that how they work? I thought so?), this scheme shouldn't give anything away more then CP/no CP.

At the same time I think that to eliminate CP entirely you need to get rid of some of the freedoms we enjoy. I'm sure you can 100% get rid of CP if you track what everyone is looking at on their computers, but is that a tradeoff you want to make? Even if the filter really only can ever report looking at CP/not looking at CP, would you be comfortable with that running on everything you own?

I could be arguing to a nonsensical extreme, but the NSA tracking all data is following this to some perverted extreme - if we can track EVERYTHING that is going on, and eventually actually make actionable data out of it, we can catch all the criminals/stop crime. But I think we accept the possibility of a bit more crime in exchange for preserving some of our freedoms.

TheOtherHobbes · on Aug 2, 2014

>They can't use the same technique

It's not the technique, it's the precedent. The technology is really not the point here.

>None of those conditions exists for tax fraud, or for that matter terrorism.

Actually they exist for both. Google is only one possible access point for monitoring.

The question is whether we want Internet services of all kinds of to be part of a culture of automated state surveillance.

I'd suggest there are good reasons for answering that question with a firm 'No'.

tptacek · on Aug 2, 2014

I don't think it's about the technique. The "precedent" (set at least 3 years ago, when this was all announced publicly) involves the tradeoffs.

The technique comes into the picture because there is no technique for detecting tax fraud that makes the same tradeoffs.

I don't have any trouble believing simultaneously that we shouldn't have a "culture of automated state surveillance" and that it's OK to sweep image uploads for matches against known child pornography. Just like I had no problem with metal detectors, but do have a big problem with millimeter wave imaging.

krapp · on Aug 2, 2014

Is Google legally liable if it's discovered that they're hosting emails with child porn attached? If so, then does that suggest that Google owns the email hosted on their servers, and has the right to examine them if they want?

icebraining · on Aug 2, 2014

I think Google's only liable if they know about it and didn't report.

According to paragraph (f) of the 18 U.S. Code § 2258A,

Protection of Privacy.— Nothing in this section shall be construed to require an electronic communication service provider or a remote computing service provider to—

(1) monitor any user, subscriber, or customer of that provider;

(2) monitor the content of any communication of any person described in paragraph (1); or

(3) affirmatively seek facts or circumstances described in sections (a) and (b).

http://www.law.cornell.edu/uscode/text/18/2258A?quicktabs_8=...

DannyBee · on Aug 2, 2014

FWIW: Child abuse of most forms is subject to mandatory reporting.

It is highly likely this is not as voluntary as you think. While in the US, they are not required to scan, this is not always true in every other country.

DannyBee · on Aug 2, 2014

FWIW: Child abuse of most forms is subject to mandatory reporting.

It is highly likely this is not as voluntary as you think.

Google would likely have liability if they discovered this and didn't report.

ruloo · on Aug 2, 2014

"Google would likely have liability if they discovered this and didn't report."

I agree. However, they have to look for it to discover it. The provider is safe in ignorance by default. Google has chosen to pursue this activity like an investigative agency and is using a system that seeks out specific material in an automated fashion.

But i agree that now the ignorance is gone they must report it.

Further i think subject nature of the content is distracting to the conversation. The problem IMO is not that google reported content. Its that google is looking for it.

chrismcb · on Aug 2, 2014

Just because something is legal doesn't mean it is right nor that someone should do it. Yes, if Google knew about it they should do something. But Google shouldn't have known.

psykovsky · on Aug 2, 2014

Google won't be breaking the law either when they ask them to report some other stuff that might make a target out of you. I say it's moral for them to report on you when the time comes. You've got nothing to hide, amirite?

pooper · on Aug 2, 2014

How can Google and Microsoft comply with these requests and still claim that the emails they data held in Ireland is exempt?

Moving on, what if blasphemy is illegal in another country and they (Google, Microsoft) spots someone who sent a a private message to someone else where they are making jokes about the prophet of Islam? What about countries ass-bent like India where you can't say anything bad about politicians? Will Google volunteer data about people who make bad jokes about politicians in private?

I know it sounds ridiculous but you're right. Where does the slippery slope end?

Problem is that on the other end... if they don't do this then they will have a PR nightmare. If people can spin their services as a safe haven for pedophiles, they will have a hard time.

reuwsaat · on Aug 2, 2014

Did not know about those laws in India. That is mind-blowing! Not good.

I personally agree with this line of logic. I'm working on a personal solution for moving my data off the cloud. But, that isn't a substitute for having this discussion as a country. Until we draw lines that slippery slope exists and these things are legal.

EDIT: Regarding the safe-haven issue, I think that is a major issue with services like SpiderOak. They play up the ability to hide data. I personally believe we should be aiming for a solution that is closer to the locks on the doors of our homes. I feel secure at night, but I also know the police can knock it down by force if given a warrant. SpiderOak and other such services try to be more of a Fort Knox than a dead bolt. This is the cloud after all. Not a thumb drive in my safe in my home office.

pooper · on Aug 2, 2014

About the safe haven issue, I'd imagine that if there is probable cause, they can just serve me a warrant rather than serving the guard who is on duty at Fort Knox. From what I know, it is wrong to bypass me to get to my data. If my data happens to reside outside the country, well tough luck. Transfer the case over to the other country and stop being a bully.

kumarm · on Aug 2, 2014

Not sure why you are being down voted.

Laws vary by country (Even if they are democratic for 50 year and more) and laws in some developing country appear stupid to any one in a developed country.

US has Freedom of Speech. India has exact opposite. You can get into trouble because no matter what you do, you will offend someone.

true_religion · on Aug 2, 2014

I don't think they're doing this to follow a particular law. They're doing it to follow their own internal ethics about child abuse.

If they start reporting sedition and blasphemy, it'd mean that Google's ethics as a group has fundamentally changed, and we'd see far greater effects than their occasionally reporting someone.

jfoutz · on Aug 2, 2014

Doctors and Lawyers are allowed and even obligated to break confidentiality in some special cases. We'll probably come up with a collection of special protections and special exceptions for remote storage of our data.

Although, it's pretty scary that some random person could send an email that would then send me to jail. I would hope the standard for possession is higher than "WTF is this? [delete]"

icebraining · on Aug 2, 2014

I would hope the standard for possession is higher than "WTF is this? [delete]"

Well, according to the linked article, the guy was sending the picture, not receiving it.

saalweachter · on Aug 2, 2014

I would still hope that "their email account sent an image" is only sufficient to start an investigation and not end it. Email accounts can be broken into, many people have access to the same computer, etc etc.

glynjackson · on Aug 2, 2014

When you use Google (Gmail) you signup to their terms of service. It states that incoming and outgoing emails are analysed by automated software. The article makes it seems like staff at Google are reading peoples emails or spying on its customers. They don't!

lcedp · on Aug 2, 2014

Usually they don't. [1]

[1] http://gawker.com/5637234/gcreep-google-engineer-stalked-tee...

DanBC · on Aug 2, 2014

That case was very bad. It gets mentioned a lot. Is that because it happens all the time, but that's the only case that was reported? Or is it because it's very rare, and that's the only case there is to report?

(I doubt very much that Google has only had one employee who accesses user data inappropriately and I really hope they take severe action when they find it. I would really like to know the rates from similar large companies - FB, MS, etc.)

lcedp · on Aug 2, 2014

I'm not attacking Google in particular. It's just that the statement "Staff at [X] aren't reading peoples emails or spying on its customers" is simply logically false. Because sometimes they do it and we know it. Also, spying by the means of automatic analysis is still spying. (Also, I believe the reason it's done automatically is simply because it's more efficient, not ethics issues)

s_q_b · on Aug 3, 2014

Wait, what? Gmail is reading the content of email messages, and giving them to law enforcement? There are a bevy of constitutional issues with this.

Now, I will preface this by saying that child molesters are despicable, no one will argue that. But because they are so vile, they make great test cases for pushing the boundaries of search and seizure law.

But the problem is that once the courts create a legal rationale to justify Google reading the child molester's email, that rationale can be applied to anyone.

So there's a whole host of defenses that could be raised:

1. The Wiretap Act - prohibits interception of electronic communications by private entity. Even Google's automated scanners, if applied to Gmail, could fall afoul of this law.

2. The Stored Communications Act - The Fourth Amendment creates very limited protections for stored electronic communications like email under the "third party doctrine." The third party doctrine essentially states that you have no privacy rights under the Fourth for any information you voluntarily send to another company or individual. The rationale being that, since you knowingly transmitted it, there is no reasonable expectation of privacy. However, The Stored Communications Act, part of ECPA, creates several statutory quasi-Fourth Amendment rights for stored electronic communications, especially email.

3. Agent of the State This is the weakest argument, but essentially the defendant could argue that Google was acting at the behest of the government, and thus full constitutional protections would attach. The rationale here is that Google and law enforcement were working so hand-in-glove that Google was a de facto state agent. This would make it very messy, as more powerful Constitutional protections would attach.

Now, the bottom line here is that Google is reading content and transmitting it to law enforcement. I'm sure the right to read your email is in their ToS, but that could be successfully challenged if the language is either too narrow or overly broad as to be vague and thus void (two mistakes highly-paid attorneys make quite often.)

EDIT: Forgot the exclusionary rule doesn't apply to evidence obtained illegally by private parties. You'd need to show the public-private nexus to have a chance in the criminal prosecution, but if you can brush Google back off the plate a bit using the private causes of action in #1 and #2, you may be able to get the discovery you need to prove the state-agent argument. For example, I guarantee Google has government Powerpoints floating around about activities certain agencies would like reported.