
1. Create a fake URL endpoint, then request that endpoint through the adversary's website. When your server gets the request, flag the IP. Do this nonstop with a script. (A minimal sketch follows at the end of this comment.)

2. Create fake HTML elements with unique strings inside. You can then search for those strings in search engines to find similar fake sites on other domains.

3. Create a fake HTML element containing all the request details in encrypted form. Visit the adversary's website, look for that element, and flag the IP OR flag the headers.

4. Buy proxy databases, and when any user requests your webpage, check whether it's a proxy.

5. Instead of banning them, return fake content (fake titles, fake images, etc.) if a proxy is detected OR the IP is flagged.

6. Don't ban the flagged IPs; they'll just find another one. Make them and their users angry so they give up on you.

7. Maybe write some bad words in random places in the HTML when you detect a flagged IP :D The users will leave the site, which will hurt the adversary's SEO and get them downranked.

8. Enable image hotlinking protection. Increase the cost of proxying for them.

9. Use a CSS @document rule to hide everything when the URL is different.

10. Send an abuse report to the hosting provider.

11. Send an abuse report to the domain registrar.

12. Look at the flagged IPs and try to find the proxy provider. If you find one, send mail to them too.

Edit: More ideas sparked in my mind while I was on the toilet:

1. Create big fake CSS files (10 MB etc.) and repeatedly download them through the adversary's website. This should cost them a lot of money on proxies.

2. When you detect the proxy, return huge fake HTML files (10 GB etc.). That could crash their server if they load the whole HTML into memory when parsing.
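Here's a minimal sketch of idea #1, assuming a Flask app; the endpoint path, the mirror URL, and the flagged_ips store are all made-up names:

    import time

    import requests
    from flask import Flask, request

    app = Flask(__name__)
    flagged_ips = set()  # use a real store in practice

    # Honeypot: this path is linked nowhere, so the only hits it gets
    # are the ones we trigger through the mirror below.
    @app.route("/totally-real-endpoint-8f3a")
    def honeypot():
        flagged_ips.add(request.remote_addr)  # the proxy's egress IP
        return "ok"

    def poke_mirror(mirror_base="https://MIRROR.example"):
        # Requesting the honeypot path through the adversary's mirror
        # forces their proxy to fetch it from our server.
        while True:
            requests.get(mirror_base + "/totally-real-endpoint-8f3a",
                         timeout=10)
            time.sleep(60)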




I like how you think. These are all great ideas!

Reminds me of a time some real estate website hotlinked a ton of images from my website. After I asked them to stop and they ignored me, I added an nginx rewrite rule to send them a bunch of pictures of houses that were on fire.

For some reason they stopped using my website as their image host after that.


What's the primary motivator to do this?

I'm curious if they are stealing anything else, e.g. are they selling ads/tracking, do they replace order forms with their own...


Because I asked them to stop doing it, and they didn't. Technically, they were stealing my bandwidth.

Also to teach them an important lesson about the internet.


haha, they're just lucky you didn't introduce them to Goatse


well actually...

There was another time a site hotlinked to a JS file. After asking them to stop, I found that they had a contact form with a homebrew captcha which created the letters image like http://evilsite.com/cgi-bin/captcha.jpg?q=ansr

A little while later, their captcha form had a hidden input appended with the correct answer value, and the word to solve was changed to a new 4 letter word from a dictionary of interesting 4 letter words. The form still worked because of the hidden input. I might have changed the name on the "real" input also.


Signal boosting suggestion #1 here. Great idea.

Additionally, if they decide to blackhole the fake/honeypot URL: since you mentioned they pass along the user agent, you could mix a token into the randomized user-agent string your scraper uses, so you can duck-type the request on your end and know when to capture the egress IP.
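A rough sketch of that, assuming a Flask backend (the token format and UA string are made up):

    import secrets

    from flask import Flask, request

    app = Flask(__name__)
    flagged_ips = set()
    UA_TOKEN = secrets.token_hex(8)

    # Scraper side: embed the token in an otherwise ordinary UA and
    # fetch any page through the mirror, which passes the UA along:
    #   requests.get("https://MIRROR.example/", headers={
    #       "User-Agent": f"Mozilla/5.0 (X11; rv:{UA_TOKEN})"})

    @app.before_request
    def duck_type_mirror():
        # Any request carrying our token came through the mirror,
        # so its source address is the proxy's egress IP.
        if UA_TOKEN in request.headers.get("User-Agent", ""):
            flagged_ips.add(request.remote_addr)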


#5 and #6 are key. Don't try to block them directly, just get them delisted. When you've worked out a way to identify which requests belong to the scammer, feed them content that the search engines and their ad partners will penalize them for.


Bummed that I can upvote this only once. Excellent work.


LOL! Thank you for the laugh. This is great.


What a sure-fire way to toast them! Kudos!


In my search for this I found @document isn't well supported [0], so I'd suggest something like:

    a[href*= "sukuns.us.to"] {
     display:none; 
    }
Then use SRI to enforce that CSS.

[0]: https://caniuse.com/mdn-css_at-rules_document


How about something like...

    body[href*= "<OFFENDING URL>"] {
        background-image: url("http://goatse..."); 
    }
À la: http://ascii.textfiles.com/archives/1011


Or just make the whole page rotate

    body[href*= "<OFFENDING URL>"] {
      animation: rotation 20s infinite linear;
    }

    @keyframes rotation {
      from {
        transform: rotate(0deg);
      }
      to {
        transform: rotate(359deg);
      }
    }


We're trying to punish the people running the proxy mirror, not the users who stumble upon them just trying to use the site.


You could look at it as trying to get them blocked by search engines. Can you detect when they're proxying a search bot as opposed to a user? As for punishment, you don't have to make it eye-bleach; just enough to make it firmly NSFW, so nobody can get any business value from it or even use it safely at work.

A little soft NSFW would also greatly accelerate them being added to a block list, especially if you were to submit their site to the blocklists as soon as you started including it. You can include literally anything that won't get you arrested: terrorist manifestos, The Anarchist Cookbook, insane hentai porn... Use all those block categories - gore/extreme, terrorist, adult, etc.


In that case, write some JS that wanders around the Hubble site, randomly downloading full-res TIFF images for the background, or that randomly displays Disney images.


Seems like it would be fairly easy to use this attribute selector and apply it to every element on the page, making everything show up as empty to the user.


You could add a data attribute to the html tag of the document with the current URL, e.g.

  <html data-path="https://www.saashub.com/about">
then hide the full page with:

  html { display: none; }
  html[data-path*="saashub.com"] { display: block; }


This seems quite elegant and easy. Obviously in addition to other measures, but I like it.


Honestly, this is my favorite HN post in a while. I've had a lot of fun thinking over this challenge.


I'm with you, too!


I know this is just a game that never ends, but if they're already rewriting the HTTP requests, what's stopping them from rewriting the page contents in the response?

SRI is for the situation where a CDN has been poisoned, not this.


It might not explicitly be what SRI is meant for, but it'll narrow the proxy's options to:

A. Blank page

B. Let their find-and-replace update the CSS and generate new hashes in the HTML.

C. Find someone new to pick on.

B is time-consuming and potentially computationally expensive, so it makes C a better option.


A won't happen, because nothing prevents the attacker from regexing out the hash altogether and changing the domain name in the tags to their own.


If they're rewriting HTML, I guess sanitizing CSS won't be beyond them.


Shadow nefarious techniques are the best. Don't give them clear indications that there is a problem.

For example, I had an app developer start stealing API content, so once I worked out which request details to key on, instead of blocking them I simply randomized the API content returned to their users' apps.

Hey, the API calls look good, the app looks like it's working, no problem, right? Well, the users of the app were pissed and the negative reviews rolled in. It was glorious.
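A rough sketch of the approach, assuming a Flask JSON API; every name here (load_listings, the flagged set) is hypothetical:

    import random

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    flagged_ips = {"203.0.113.7"}  # IPs keyed to the thieving app

    def load_listings():
        # stand-in for the real data source
        return [{"title": "Widget", "price": 19,
                 "photos": ["a.jpg", "b.jpg"]}]

    @app.route("/api/listings")
    def listings():
        items = load_listings()
        if request.remote_addr in flagged_ips:
            # Same shape, garbage details: the response still parses,
            # so the stolen app "works" while showing nonsense.
            for item in items:
                item["price"] = random.randint(1, 10**6)
                random.shuffle(item["photos"])
        return jsonify(items)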


Serious question: is there a way to defend against this "stealing the API" thing? E.g., building in authentication of some sort and then shipping a key with your app?


Of course HN doesn't like anything that's reminiscent of DRM, but Apple's App Attest and Google's Play Integrity API can help dispense online services to valid clients only.


These are the best ideas, especially SEO poisoning and alternate images. If their point is to steal content and rankings then poisoning the well should discourage this in the future. I suspect their actual goal is to have a low-effort high SEO site to abuse as a watering hole for phishing attacks.

As a side note, their domain is linked in this thread so they are seeing HN in their access logs and probably reading this. It should make for an interesting arms race. Or red/blue team event.


They said the attacker was passing through the client's user agent. If they get a user agent claiming to be Googlebot, they could check whether the requesting IP actually belongs to Google (there is a published list of IPs). If the IP is not Google's, they could return a blank page, causing Google to index nothing through the mirrored site.
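A sketch of the reverse-then-forward DNS check Google documents for verifying Googlebot:

    import socket

    def is_real_googlebot(ip):
        # Reverse-DNS the IP; genuine Googlebot hosts end in
        # googlebot.com or google.com.
        try:
            host = socket.gethostbyaddr(ip)[0]
        except OSError:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the claimed host must resolve back to the IP.
        try:
            return ip in {ai[4][0] for ai in socket.getaddrinfo(host, None)}
        except OSError:
            return False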


This is a good idea, though it may be short-lived since the attackers are likely reading this due to the referrers in the logs. They may add an ACL to counter this, but it might be interesting to see how long that works.


Seems like a good use case for a zip bomb. Return some tiny gzipped content that expands to 1 GB.


Yeah. Their proxy is parsing the HTML and stripping it / modifying it, so they're obviously unzipping the responses on their servers. Create the honeypot endpoint, and if you get a request from that endpoint, reply with a zip bomb.

Then, write a little script that repeatedly hits that honeypot URL. I quite like this idea.
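A minimal sketch, assuming Flask (the route is a made-up honeypot path): pre-compress a big run of zeros once, then serve it with Content-Encoding: gzip so the ~1 MB you send inflates to ~1 GiB on their side.

    import gzip
    import io

    from flask import Flask, Response

    app = Flask(__name__)

    def build_gzip_bomb(gib=1):
        # 1 GiB of zeros gzips down to roughly 1 MB.
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
            chunk = b"\0" * (1024 * 1024)
            for _ in range(gib * 1024):
                gz.write(chunk)
        return buf.getvalue()

    BOMB = build_gzip_bomb()

    @app.route("/honeypot-page")
    def bomb():
        # They inflate responses in order to rewrite them, so this
        # costs them ~1 GiB of memory or disk per request.
        return Response(BOMB, headers={"Content-Encoding": "gzip",
                                       "Content-Type": "text/html"})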


Awesome, do post a follow-up on HN, I want to hear how this war with the proxy asshats plays out.


> 5. Instead of banning them, return fake content (fake titles, fake images, etc.) if a proxy is detected OR the IP is flagged.

> 6. Don't ban the flagged IPs; they'll just find another one. Make them and their users angry so they give up on you.

There's a popular blog that no longer gets linked on HN.

The author didn't like the discussions HN had around his writing, so any visitors with HN as the referer are shown goatse, a notoriously upsetting image, instead of the blog content.


Goatse? I assume you're referring to jwz - that blog shows a testicle in an egg cup if it sees an HN referrer.


Yeah, jwz. Looks like I got mixed up - goatse has been a popular choice for this kind of thing, but jwz went with a different image.

Fortunately, there are many upsetting images for the OP to choose from!


Out of curiosity, which blog are you talking about?



Does anyone not have their referer header suppressed or faked?


I generally strip the referrer via https://wiki.mozilla.org/Security/Referrer. Unfortunately it breaks a small number of sites very badly, such as web.archive.org and a few others, some of them claiming it was done to combat scraping.


Breaking is only part of the problem. The pages that rely on the referer header take it for granted and do not implement any meaningful error handling. They just die a horrible death, instead of responding with an error message stating that they need a referer.

One bad example relies on the referer only for log-out; everything else works. That site also runs massive JS on log-out, as if it really needs an explicit log-out rather than the user just disappearing.


I have never considered faking or suppressing my referer header. I don't know why I would care. I suspect I'm in the company of well over 99% of all internet users.


Why return big files when you can return small files at excruciatingly slow speeds? Modems are hot again!


That's probably the best advice. Instead of denying the proxy, just make it shitty for the end user.


> Maybe write some bad words in random places in the HTML

> Create big fake CSS files (10 MB etc.) and repeatedly download them through the adversary's website. This should cost them a lot of money on proxies.

Be careful when doing things like this, including the shock image option mentioned in other comments, as it could become an arsehole race with them trying to DoS your site in retribution. Then again, going through more official channels could also get the same reaction, so…

> When you detect the proxy, return huge fake HTML files (10 GB etc.). That could crash their server if they load the whole HTML into memory when parsing.

Make sure you are set up to always compress outgoing content, so that you can send GBs of mostly single-token content with MBs of bandwidth.
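A quick illustration of the ratio:

    import gzip

    # ~100 MB of one repeated token compresses to roughly 100 kB,
    # so you can serve GBs of page while paying for MBs.
    data = b"<td>spam</td>" * (8 * 1024 * 1024)
    print(len(data), "->", len(gzip.compress(data)))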


> Create big fake CSS files (10 MB etc.) and repeatedly download them through the adversary's website. This should cost them a lot of money on proxies.

Doesn't that also cost you an equal amount? You'll be serving everything they proxy to the end user.

It's not even necessarily a cost for them; you're assuming that the host is owned and paid for by the abuser. If it's simply been hijacked (quite possible), you're just racking up costs for another victim.


I remember years ago there was a way to DoS a server by opening a connection and sending data REALLY slowly, like 1 byte a second. I wonder if there is a way to do the opposite of that, where every request is handed off to a worker which responds just slowly enough to keep the connection alive. I doubt this scales well, but just a thought.



The “opposite” thing you’re describing sounds like a tarpit: https://en.m.wikipedia.org/wiki/Tarpit_(networking)
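A minimal sketch of a tarpit response, assuming Flask (the byte rate and cap are arbitrary):

    import time

    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/tarpit")
    def tarpit():
        def drip():
            # Dribble out one byte every few seconds to hold the
            # proxy's connection (and a worker on their end) open.
            for _ in range(10_000):
                yield b" "
                time.sleep(5)
        return Response(drip(), mimetype="text/html")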


You can have some fun with nginx if you can identify on your backend whether the request is coming from a malicious source, e.g. with X-Accel-Limit-Rate.
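For instance, a backend behind nginx's proxy module can set the X-Accel-Limit-Rate response header (bytes per second) to throttle a single response; a sketch assuming Flask, with a made-up flagged set:

    from flask import Flask, request

    app = Flask(__name__)
    flagged_ips = {"203.0.113.7"}  # hypothetical

    @app.after_request
    def throttle_flagged(resp):
        # nginx honors X-Accel-Limit-Rate from the upstream response,
        # so flagged clients get a dial-up experience.
        if request.remote_addr in flagged_ips:
            resp.headers["X-Accel-Limit-Rate"] = "512"
        return resp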


I once read a suggestion to serve gzipped responses which, compressed, are tiny, but decompressed are enormous. Like GBs of zeros.

Not sure how you'd actually do it, or if it serves your purpose, but it sounded neat.


It's called a "zip bomb" (popularized by Silicon Valley [1]), and there is a good guide (and pre-generated 42kB .zip file to blow up most web clients) at https://www.bamsoftware.com/hacks/zipbomb/

[1] https://www.youtube.com/watch?v=jnDk8BcqoR0


Any recommendations on proxy database providers?


http://iplists.firehol.org/ looks free and very comprehensive. It has a whole bunch of sub-lists of IPs that are likely to be sources of abuse, including datacenters and VPNs, and it gets updated frequently. GitHub: https://github.com/firehol/firehol
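A sketch of checking a client IP against one of those lists; the netset filename here is an assumption, so pick whichever sub-list fits:

    import ipaddress
    import urllib.request

    NETSET_URL = "https://iplists.firehol.org/files/firehol_proxies.netset"

    def load_netset(url=NETSET_URL):
        lines = urllib.request.urlopen(url).read().decode().splitlines()
        return [ipaddress.ip_network(line.strip(), strict=False)
                for line in lines
                if line.strip() and not line.startswith("#")]

    NETS = load_netset()

    def is_listed(ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in NETS)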


> 1. Create big fake CSS files (10 MB etc.) and repeatedly download them through the adversary's website. This should cost them a lot of money on proxies.

Nope; anybody doing this with at least minimal intelligence is using residential botnets as proxies.


Going DEFCON 3 on proxies:

You can also write some obfuscated inline JavaScript that checks the current hostname against the expected one and redirects when they don't match.


They are stripping all JS.


Passive Aggressive FTW. These are all fantastic ideas.


I really like #9; this seems like a simple way to make your site unusable except via the methods you desire.


Oh, I love these. I will use some of them. Many thanks!


Could the fake 10 GB HTML be a zip bomb?


Point no. 1 will do. That's the solution.



