Hacker News
Google: Transparency for copyright removals in search (googleblog.blogspot.com)
80 points by sjbach on May 24, 2012 | 22 comments



What's amusing is that the takedown notices are a matter of public record, so it's not like the URLs are actually disappearing from the Internet.

Here's a recent example:

http://www.chillingeffects.org/dmca512c/notice.cgi?NoticeID=...

Someone should create a searchable database of these URLs, so that you can search for pirated content that's already been validated as authentic by RIAA/MPAA lawyers. Perhaps that would put a "chilling effect" on Internet censorship.


> Perhaps that would put a "chilling effect" on Internet censorship.

Actually, it would most likely invite new theories of vicarious liability for infringement and make such transparency more legally risky.

That said, reading through the ChillingEffects list is a great way to gather news. For example, you find out about leaked materials rather quickly.


What a fascinating window into this hidden yet enormous neverending game of whack-a-mole.

I was initially surprised that Microsoft dwarfs the RIAA in requests, but of course their software sells at much higher prices than albums. I wonder how popular an artist has to be for the RIAA to consider it worth paying someone to find links and submit requests...

It also sounds like an enormous burden (read: barrier to entry) for search engines. I'm sure Google is constantly optimizing how much of the process it can automate (e.g., if a submitter has had X requests approved for domain Y, remove new ones automatically?). I love how they let webmasters know about it, though, to ease fears of false positives.
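The automation speculated about above could look something like this minimal sketch. It is purely hypothetical: the `TrustTracker` class, its thresholds, and the "auto"/"manual" labels are invented for illustration and are not Google's actual process. A submitter with a long, clean record for a given domain gets routed to automatic removal; everything else goes to manual review.

```python
from dataclasses import dataclass


@dataclass
class History:
    approved: int = 0
    rejected: int = 0


class TrustTracker:
    """Hypothetical routing of removal requests by (submitter, domain) history."""

    def __init__(self, min_approved: int = 50, max_reject_rate: float = 0.01):
        self.min_approved = min_approved          # the commenter's "X"
        self.max_reject_rate = max_reject_rate    # tolerated share of bad requests
        self.history: dict = {}                   # (submitter, domain) -> History

    def record(self, submitter: str, domain: str, approved: bool) -> None:
        """Log the outcome of a manually reviewed request."""
        h = self.history.setdefault((submitter, domain), History())
        if approved:
            h.approved += 1
        else:
            h.rejected += 1

    def route(self, submitter: str, domain: str) -> str:
        """Return "auto" for a trusted track record, otherwise "manual"."""
        h = self.history.get((submitter, domain))
        if h is None or h.approved + h.rejected == 0:
            return "manual"
        reject_rate = h.rejected / (h.approved + h.rejected)
        if h.approved >= self.min_approved and reject_rate <= self.max_reject_rate:
            return "auto"
        return "manual"
```

Even an auto-routed request would still warrant the webmaster notification, since that is what catches the false positives a trust score misses.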


"As a percentage of site's URLs" is a disingenuous statistic. Google has a notion of which URLs on a site are germane; for instance, how likely they are to come up in searches. Reported URLs on TORRENTZ.EU as a percentage of germane URLs will probably tell a different tale than "<0.1%".

Put differently: you'd have to be made of stupid to have a ratio of infringing URLs to overall URLs that looked unfavorable; all you have to do to minimize that metric is to spray crap all over some portion of your site that nobody but Googlebot cares about.
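To make the denominator argument concrete, here is a small calculation with entirely made-up numbers (`reported`, `germane`, and `filler` are hypothetical, not figures from Google's report). The same 1,000 reported URLs are 20% of a site measured against the pages that surface in searches, but well under 0.1% once ten million crawler-only pages are added to the denominator.

```python
def percent(reported: int, denominator: int) -> float:
    """Reported URLs as a percentage of `denominator` URLs."""
    return 100 * reported / denominator


# Hypothetical site: every number below is invented for illustration.
reported = 1_000        # URLs named in removal requests
germane = 5_000         # pages that actually come up in searches
filler = 10_000_000     # generated pages only a crawler ever visits
total = germane + filler

print(f"vs. germane URLs: {percent(reported, germane):.1f}%")
print(f"vs. all URLs:     {percent(reported, total):.4f}%")
```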


I think I misunderstood you, because it sounds like you're implying that these sites have "spray[ed] crap" all over their sites to look better in this newly released report?

You may know more than I do about the existence of pages of crap on torrentz.eu because I have never been there, but it's rather difficult for me to believe that anyone is intentionally gaming a report they had no way of knowing the existence of until Google's announcement.


http://torrentz.eu/i

You have to dig not to find copyright-infringing links at the root of TORRENTZ.EU's index.

And yet the statistic on Google's summary page suggests that TORRENTZ.EU is primarily (no, overwhelmingly) a non-infringing site. That is the reason the statistic is there: to put forward that argument.

We don't have to agree on the policy debate here, but let's at least call spades spades.


I never said there wasn't infringing content there. I had never been there, after all, though I just took a quick peek after you linked to it just now. Nothing loaded. That's probably due to noscript, which I wouldn't dare turn off on a site like that.

I still think it unlikely that anyone is gaming that metric. It might be a bad one for whatever reason, but I can't imagine that anyone has been gaming it given that nobody knew of it until just now.


A metric that says the overwhelming majority of TORRENTZ.EU is noninfringing is a bogus metric. My point is, I think, pretty clear. We can debate whether infringement is worth caring about, but no reasonable debate suggests that TORRENTZ.EU is mostly rightsholder-neutral.


I've never argued for it being a good metric, only against it being intentionally gamed by any of the sites.


http://torrentz.eu/675ebffc3f6a94c8571b2a2f4e2fb19d93a863


I had to dig into the FAQ [1] to find what I wanted:

"We removed 97% of search results specified in requests that we received between July and December 2011."

It doesn't say anything about how the requests must be formatted or whether they are legally enforceable. That is: can just anyone submit a request? Does it have to include any kind of evidence?

[1] http://www.google.com/transparencyreport/removals/copyright/...


The form is linked to multiple times in the same FAQ. If you want the DMCA requirements, you can find them here:

http://www.law.cornell.edu/uscode/text/17/512#c_3

edit: I'm not exactly sure what you're asking with "legally enforceable", but again, if you mean in terms of the DMCA, my understanding is that if a notification satisfies those requirements, you must comply. That's why the counter-notification process is so important.


Small correction: You must comply if you intend to keep yourself protected by the safe-harbor provisions specified in the DMCA. If you do not comply, you then become exposed to liability, though not necessarily guilty of infringement.


I don't know about the enforcement/evidence, but the end of the original link sheds some light; it sounds like they have a pretty mature process for trying to handle illegitimate requests:

> we try to catch erroneous or abusive removal requests.... [examples of bad requests]... We try to catch these ourselves, but we also notify webmasters in our Webmaster Tools when pages on their website have been targeted by a copyright removal request, so that they can submit a counter-notice if they believe the removal request was inaccurate.


Google has a form you fill out that makes you put in all of the proper info. The DMCA itself spells out the requirements for what you must include and it does have some provisions that describe what happens if the notice is only partially defective.

Formatting isn't really an issue, the real issues are whether the notice contains the proper information and whether it's sent to the right place.
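As a rough illustration of checking "whether the notice contains the proper information," the sketch below tests a notice for the six elements listed in 17 U.S.C. 512(c)(3)(A), paraphrased. The `TakedownNotice` fields and the `missing_elements` helper are hypothetical names; a real intake form should follow the statute's exact wording.

```python
from dataclasses import dataclass, field


@dataclass
class TakedownNotice:
    # Hypothetical field names, one per element of 17 U.S.C. 512(c)(3)(A).
    signature: str = ""                  # (i) physical or electronic signature
    copyrighted_work: str = ""           # (ii) work claimed to be infringed
    infringing_urls: list = field(default_factory=list)  # (iii) where to find the material
    contact_info: str = ""               # (iv) address, phone number, or email
    good_faith_statement: bool = False   # (v) good-faith belief the use is unauthorized
    accuracy_and_authority_statement: bool = False  # (vi) accuracy; authority under penalty of perjury


def missing_elements(notice: TakedownNotice) -> list:
    """Return the names of required elements that are empty or absent."""
    checks = {
        "signature": bool(notice.signature.strip()),
        "copyrighted_work": bool(notice.copyrighted_work.strip()),
        "infringing_urls": bool(notice.infringing_urls),
        "contact_info": bool(notice.contact_info.strip()),
        "good_faith_statement": notice.good_faith_statement,
        "accuracy_and_authority_statement": notice.accuracy_and_authority_statement,
    }
    return [name for name, present in checks.items() if not present]
```

A non-empty result is a reason to go back to the sender, not a legal conclusion; the DMCA has its own provisions for notices that are only partially defective.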

As always, get proper legal advice if any of this information is of more than academic interest to you.


Link for the lazy like me, who couldn't find the page, initially:

http://www.google.com/transparencyreport/removals/copyright/


I'm building an app that will have a lot of user generated content. How much should I worry about implementing something similar to this on our system before going on air? This sounds extremely cumbersome and expensive to operate. I don't think we could afford this initially.

Should we only bother with it after it becomes a problem (hopefully by then we'll have enough revenue to afford this)? Or is it too risky to launch without such a system and get brought down ourselves?


Megaupload is a good cautionary tale. IP 'holders' have enough power to do bad things if they can cast you as a bad actor, so it's far better to have some kind of takedown support that works for the country you operate in.

Manual processing of requests is probably fine for many launching apps (unless there is the possibility of automated content creation on the part of your users), which also lets you be really careful with complying with applicable laws while (crucially) maintaining the rights of your users.

But, yes, if you're in the US and want to maintain safe harbor protections, you will have to handle any takedown requests you receive.


If you're launching a site (or ISP), you will need to register a designated DMCA agent with the US Copyright Office. You'll then be listed here: http://www.copyright.gov/onlinesp/list/a_agents.html The fee is $35 and there is a form you'll fill out on copyright.gov. They don't allow P.O. boxes; you will need an address that is available for service of process.

If you don't register an agent, you are not protected by the DMCA safe harbor provisions.

Once you've got a registered agent, you'll need to create an internal procedure to follow when you receive DMCA notifications. The system is very simple and extremely fair to ISPs/sites. Talk to a lawyer, but most don't know the DMCA well, and you can generally find this stuff out on your own.
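One way to model such an internal procedure is a small state machine per piece of user content. This is a minimal sketch under stated assumptions: the `Item` class and `Status` values are hypothetical, business days are counted by skipping weekends only (public holidays are ignored), and the timing follows the 512(g) counter-notice rule that disputed material goes back up no sooner than 10 and no later than 14 business days after the counter-notice arrives, unless the complainant reports having filed a court action.

```python
from datetime import date, timedelta
from enum import Enum, auto


class Status(Enum):
    LIVE = auto()
    DISABLED = auto()          # taken down after a valid notice
    COUNTER_NOTICED = auto()   # uploader disputed; restore window running
    RESTORED = auto()


def add_business_days(start: date, days: int) -> date:
    """Step forward `days` weekdays from `start` (holidays are ignored)."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days -= 1
    return current


class Item:
    """Hypothetical record for one piece of user-generated content."""

    def __init__(self, item_id: str):
        self.item_id = item_id
        self.status = Status.LIVE
        self.restore_window = None  # (earliest, latest) restore dates once disputed

    def on_valid_notice(self) -> None:
        # Disable access promptly; notifying the uploader is left to the caller.
        self.status = Status.DISABLED

    def on_counter_notice(self, received: date) -> None:
        if self.status is not Status.DISABLED:
            raise ValueError("counter-notice for content that was not taken down")
        self.status = Status.COUNTER_NOTICED
        self.restore_window = (add_business_days(received, 10),
                               add_business_days(received, 14))

    def check_restore(self, today: date, lawsuit_filed: bool) -> Status:
        # Restore once the 10-business-day minimum has passed, unless the
        # complainant has reported filing a court action.
        if (self.status is Status.COUNTER_NOTICED and not lawsuit_filed
                and today >= self.restore_window[0]):
            self.status = Status.RESTORED
        return self.status
```

Running `check_restore` once per day (for example from a cron job) keeps restorations inside the window; the design choice here is to restore at the earliest permitted date rather than the latest.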


It's riskier to hold off launching until you've built this.

It's far more likely that no one will even notice your site than that spammers will start piling on. Implement basic logging & monitoring and you'll be fine. It may suck when/if it comes, but scaling & abuse defense are usually good problems to have. Worst case, you've got a completely idle server!


That's something you should get proper legal advice about. I'm in the same boat, actually, and plan to talk to someone once my friend gives me a referral.


The number of "user data requests" by governments is scary. What does the government do with users' data?



