The fastest de-referer service... with AWS Global Accelerator (lifeofguenter.de)
129 points by lifeofguenter on Nov 9, 2021 | 77 comments



Full disclosure, I work at Fly.io now.

This exact setup is easier on Fly.io - our proxy layer runs in 20 regions worldwide with anycast, so your requests hit the nearest region and quickly terminate TLS there.

You can also run any Docker container, and either choose regions to run them in, or just set the min/max and ask us to start and stop containers in whichever region has demand, so your deployment follows the sun.


Correct me if I am wrong: fly's anycast has its limitations compared to Global Accelerator (GA) though:

On occasion, it breaks UDP protocols that are "connection oriented" (like QUIC and WireGuard, though both have built-in capabilities to recover).

There is no way to pin traffic to VMs (route / client affinities) or shape traffic.

100+ locations with GA, and two Anycast IPs (in two distinct "zones").

---

Alternatives to Fly on AWS that I know of:

Anycast on AWS without Global Accelerator: S3 buckets with transfer acceleration; (edge-optimized) API Gateway to Lambda / Fargate; S3 + CloudFront.

AWS AppRunner + Copilot (which are comparable to Fly + Flyctl) can be geo-routed to nearest instance by DNS-based load balancing with Route53 (not anycast specifically).
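
As a rough sketch of the Route53 piece with boto3 (the zone ID, hostname, and IP below are hypothetical), you'd create one latency-based record set per region:

```python
# Hypothetical sketch: one latency-based Route 53 record per region, so
# clients resolve to whichever regional endpoint answers fastest for them.
import boto3

route53 = boto3.client("route53")
route53.change_resource_record_sets(
    HostedZoneId="Z0EXAMPLE",  # hypothetical hosted zone
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": "eu-west-1",   # one record set per region
                    "Region": "eu-west-1",          # enables latency-based routing
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                },
            }
        ]
    },
)
```

Repeat the UPSERT for each region's endpoint and Route53 answers queries with the lowest-latency record for that resolver; it's DNS-based, so nothing like anycast's per-packet routing.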

---

Fly's killer feature (and why we are transitioning to it) is its cost-effectiveness and almost 'zero-devops' setup.

- Super cheap bandwidth ($2 per 100GB!)

- Free deploys (AppRunner charges for deploys)

- Free monitoring (versus expensive but comprehensive CloudWatch)

- Free orchestration

- Free anycast transit (expensive on aws)

- Cheaper, zero-touch, global/cross-region private-network across VMs in the same fly org (zero-muck: transit gateway, nat gateway, internet gateway, vpc private endpoints, iam policies...).

- Super cheap and fast disks ($0.15 per GB/disk!)

- Easier HA Postgres and HA Redis setups.

- GA's TCP proxy does not preserve source/client ip-ports (Fly can).


AWS engineer from the container services team here. One small point of clarification on "AppRunner charges for deploys": We only charge for deploys if you are using App Runner's built-in integration to watch your repo, and automatically build/rebuild your container image from source code.

This is not a required feature for App Runner to function though. For example if you are using Copilot with App Runner you can drive the build and release to App Runner from your local dev machine, so there is no extra deployment charge beyond what it costs you for electricity to build your container on your own laptop. You only get charged for App Runner deployment automation when you are using AppRunner as a Github Actions / Jenkins replacement to do the Docker image build on the server side.


For the most part, yes, everything makes sense. There are some things worth noting though:

> On occasion, it breaks UDP protocols that are "connection oriented" (like QUIC and WireGuard, though both have built-in capabilities to recover).

Yes and no, in that QUIC and WireGuard do work consistently; it's not that they break. But Fly doesn't currently offer UDP flow hashing, sessions, or pinning.

> There is no way to pin traffic to VMs (route / client affinities) or shape traffic.

No, but the system is built to obviate the need for this — you can choose which regions your app runs in and Fly will balance them for you based on the strategy you choose. I'm not sure what benefit is being missed out on by not having it — if there is a clear benefit that's not achievable under the current design we can make a case for building it.

> 100+ locations with GA, and two Anycast IPs (in two distinct "zones").

Fly lets you allocate and buy more IPs under Anycast, so more than two should be possible. Regarding the 100+ locations, that's technically true but irrelevant — GA doesn't serve requests, so they still need to hit apps deployed on one of the AWS regions (usually provisioned and managed separately). With Fly your app is running on the edge regions pretty much automatically.

The closest alternative to Fly on AWS would be (1) Global Accelerator pointing at (20) Application Load Balancers in every region, each backed with (1+) Fargate containers maybe? Would also need Lambda triggers to turn off traffic to a region if it wasn't worth running containers there, and turn them back on again.


Could one run caddy and serve their own TLS termination?


Yes, it’s possible to ask for the raw connection to be passed to the application and self-manage TLS.


>Super cheap bandwidth ($2 per 100GB!)

That’s only “super cheap” if you’re comparing to AWS’s outright highway robbery bandwidth pricing.


Sure, there are lots of VPS and dedicated server providers that offer lots of bandwidth, but they're not playing the same game as the big cloud providers, or even fly.io, when it comes to auto-scaling, self-healing, multi-data-center deployments.


I really like Fly and would love to move some side project workloads to it, the only thing holding me back is the Postgres product which seems to be a little bit 'not ready for production'. I'm referring to point-in-time recovery and ease of backup restoration mostly.

The product looks too good to be true, and when you dig into a little deeper it seems like it isn't totally 100%.

Amazon RDS is something that I really trust, but I didn't get the same vibe looking at Fly Postgres.


Our Postgres is not an RDS replacement. Lots of devs use RDS with Fly. In fact, Postgres on Fly is just a normal Fly app that you can run yourself: https://github.com/fly-apps/postgres-ha

Ultimately, we think devs are better off if managed database services come from companies who specialize in those DBs. First party managed DBs trend towards mediocre, all the interesting Postgres features come from Heroku/Crunchy/Timescale/Supabase.

So we're "saving" managed Postgres for one of those folks. For the most part, they're more interested in giving AWS money because very large potential customers do. At some point, though, we'll be big enough to be attractive to DB companies.


I mentioned a very similar thing to them on this community post. (May 18th) https://community.fly.io/t/fly-with-a-managed-database/1493

Their response was this:

> Our goal in life is to solve managed postgres in the next few months. Either us managing, or with someone like Crunchy Data running on top of Fly. So “hold tight” might also be good advice.


I have a few toy apps on Fly and while I do like the service, it has been flaky. E.g. 12 hours ago my app raised a number of errors as it got disconnected from the Postgres database.

This isn't a show stopper for me as they're toys, but I would be somewhat wary of moving critical production apps to it just yet. (Also, everything else aside from PG has been rock solid for me)


Also way cheaper, if your container is efficient you’ll probably pay under the cost of Global Accelerator alone, and bandwidth is way cheaper as well.


Do you have your own servers, or do you build your service on top of AWS/GCP/Azure?


We have our own servers.


If I'm not mistaken, you were using Equinix Metal (formerly Packet) at some point. Did that change?


Kind of! We still use Equinix Metal in some regions. Leasing servers and buying colo isn't all that different from Equinix Metal so I'd class that as "our own hardware".


Fair enough. And using multiple upstream providers is of course a good idea.


It is 2021, and AWS Global Accelerator still does not support IPv6. Google Cloud has supported this on their global anycast load balancers since September 2017.


I get the distinct impression that big, core chunks of AWS don't support IPv6, and where they do, it's very much a bolted-on second class citizen for which one should keep the expectations low.


Google is by far the worst cloud provider when it comes to IPv6 support. They only support IPv6 on VPC networks in asia-east1, asia-south1, europe-west2, and us-west2, you cannot access the metadata service over IPv6, and IPv6 on load balancers just left Alpha last year, I believe.


I do not care about internal network IPv6 at all. What I do care about is external network accessibility for IPv6. For incoming IPv6 GCP is doing great, outgoing not so much (I agree!).


Then you'd be happy to hear AWS and Azure had support for IPv6 on inbound load balancers years before GCP.


Google doesn't own nearly as many IPs as Amazon, so they have to do better at IPv6. AWS is incentivized to keep the world IPv4 as long as possible.


I find it hilarious that they're avoiding sending the referer to a 3rd party, by sending the referer to a 3rd party..


In the context of the problem being solved – removing Referer headers – isn't this optimising the wrong thing?

Almost every browser, except for IE, supports the `Referrer-Policy` header. We should be aiming to avoid additional redirects, not to make them faster.
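
For instance, a minimal sketch of the header approach (Flask is used purely as an example framework here, nothing from the article):

```python
# Minimal sketch: set Referrer-Policy on every response instead of bouncing
# outbound links through a redirect service.
from flask import Flask

app = Flask(__name__)

@app.after_request
def set_referrer_policy(response):
    # "no-referrer" strips the header entirely; "same-origin" would keep it
    # for navigation within your own site only.
    response.headers["Referrer-Policy"] = "no-referrer"
    return response

@app.route("/")
def index():
    # Links on this page no longer leak a Referer to example.org.
    return '<a href="https://example.org/">outbound link</a>'
```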


I would say the vast majority of solutions do it properly, and it's hard to fathom why someone would ever use a service like this. Quite aside from the added latency: improving privacy/security by looping in another party? That does not follow.

Set the header and call it a day, and at this point browsers should default to same-origin. The only outlier is IE 11, with 0.5% usage, and it is so grossly out of date it's pretty reasonable to just dump.

I remember in the very early Internet trying to raise an alarm that a lot of people didn't realize the privacy implications of referral headers (run a website and you could find all sorts of crazy niche discussion forums, see the resources people were referencing, etc). I certainly am not claiming prescience, but it was amazing how little anyone cared. Mind you, I also once promoted the notion that browsers should only ever send strongly, expensively domain-specific hashed passwords from password elements, and that too was pooh poohed. Several billion pwned passwords later...


Won't a JS-based fallback work for 95% of that 0.5%? So there are many, many options on the table.


But what are the costs and scalability?

EC2 instances in many regions + Global Accelerator is a completely different pricing model from pay-as-you-go serverless (and, depending on your use case and usage patterns, more or less expensive), and scaling is entirely up to you. It's cool to optimise for latency, but it would have been nice to see the resulting costs and downsides (scaling).


Global Accelerator $0.025/hr = $18.27/month.

3x t3a.small @ $13.72/month = $41.16/month.

A 3 region single node web app could be done for under $60, dropping to around $43 with instance reservations. With a config this small there is no need for regional LBs, and even if you have multiple instances per region, GA has configurable internal flow hashing.
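
Spelling out the arithmetic (assuming roughly 731 billable hours per month):

```python
# Back-of-the-envelope check of the figures above (~731 hours/month assumed).
HOURS_PER_MONTH = 731

ga_fixed = 0.025 * HOURS_PER_MONTH      # Global Accelerator fixed fee, about $18.27
instances = 3 * 13.72                   # three on-demand t3a.small, about $41.16
total = ga_fixed + instances            # comes out just under $60

print(f"GA: ${ga_fixed:.2f}  EC2: ${instances:.2f}  total: ${total:.2f}")
```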

The dominant cost for any real site will likely continue to be bandwidth. GA pushes egress pricing up substantially in many cases, from 1.5c/GiB to as much as 10.5c/GiB, but this is unlikely to be a concern unless you're serving heavy images or video.

Autoscaling works the same as anywhere else. You can use spot instances, or even front ALBs with GA and build your whole app out of Lambda functions (although this would assuredly completely negate any latency benefit gained by using GA in the first place!).


I read the first six paragraphs and still really had no idea what this is talking about. Can anyone help me understand?


The author likes AWS Global Accelerator. The author likes GA mainly because it uses BGP anycast, which means that the packets drop into AWS' network at the closest AWS point-of-presence, instead of traversing the internet. The author's experience is that this feature of GA, when combined with always-on VMs that do not need to cold start, provides him with low latency for a particular service he is running.

The author also mentions Cloudflare and Google App Engine, but rejects these because on those services he chose to use the lambda-like compute functionality and wasn't prepared for the cold starts. He doesn't appear to have tried using Lambda on AWS, or dedicated VMs with Google. Thus it is a slightly apples-to-oranges comparison.


Lambda@Edge unfortunately does not support provisioned concurrency, so you are bound to cold-starts as well (I tried, measured, and reverted before even going live with it).


I opened a request on GitHub that Global Accelerator support AWS AppRunner (persistent containers in Fargate VMs with HTTP/S gateway) as an endpoint. That'd cover all of OP's usecases (and the one that I also have): https://github.com/aws/apprunner-roadmap/issues/83


I guess the blog is using referrer stripping as an example to demo/play with AWS global accelerator.

If it were me, I would set a Referrer-Policy: origin (or no-referrer) header[1] and avoid all the overhead. Supported in all current browsers[2], and no additional costs or code to worry about.

Also, does it seem a bit insecure to anyone else to outsource referrer stripping services if your goal is to secure the content of the referrer from third parties? How well do you trust the middlemen in this case?

1. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Re...

2. https://caniuse.com/referrer-policy


Looks to be a service to remove the HTTP referer header when linking to other sites.

Say you're on example.com and click a link to foo.com: the browser will send the HTTP header `Referer: example.com` in the GET request to foo.com, which means foo.com can then track how you came to their site.
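
A quick way to see this yourself, simulated with Python's requests library (the URLs are made up):

```python
# Simulates what the browser does when following a link from example.com:
# it attaches the previous page's URL as the Referer header.
import requests

resp = requests.get(
    "https://foo.com/landing",  # hypothetical link target
    headers={"Referer": "https://example.com/some/private/thread"},
)
# On foo.com's side, the server can log request.headers["Referer"] and see
# exactly which page on example.com the visitor came from.
print(resp.status_code)
```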


The only place I've ever seen this is torrent sites linking to IMDB. Does it have a legitimate use?


Links in web email clients


LOL, why do you need an "accelerator" for this?


It’s explained in detail in the article.


It's about accelerating the networking performance from the user to a server doing the work. You don't need to do it but it's always going to be a limit on performance if you don't, no matter how fast you make the application server itself.


Using a Rwandan TLD negates pretty much all the effort in finding the fastest provider. No glue records and an unresponsive TLD nameserver translates into 100+ms for the initial DNS lookup (likely the only one that matters for such a service).


I'm not sure it's _so_ bad, in practice?

If you dig +trace url.rw, you can see that the NS record for url.rw is held on ns-rw.afrinic.net, pch.ricta.org.rw, ns1.ricta.org.rw, ns3.ricta.org.rw and fork.sth.dnsnode.net. It's true that some of those servers are slower than others (for me, the AfriNIC server is 500ms whilst the dnsnode.net server is 50ms), but that shouldn't really matter because the TTL on the record is 86400. So the chances are that all the big DNS services (8.8.8.8 etc) should have the correct nameservers for url.rw in the cache already. Yes, if you're running a local-ish resolver, things are different... but most folks are dependent on Google, Cloudflare or their (large) ISP.

The actual A record for url.rw is held on AWS's DNS servers, with a TTL of 300. But AWS's DNS servers are fast.
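
If you want to reproduce the per-nameserver timings, here's a rough sketch using the dnspython package (the numbers will obviously depend on where you measure from):

```python
# Rough sketch: time an A-record query against each authoritative nameserver
# for the zone, similar to inspecting dig +trace output by hand.
import time
import dns.exception
import dns.message
import dns.query
import dns.resolver

def time_authoritative_servers(zone="url.rw"):
    for ns in dns.resolver.resolve(zone, "NS"):
        ns_host = str(ns.target)
        ns_ip = str(dns.resolver.resolve(ns_host, "A")[0])
        query = dns.message.make_query(zone, "A")
        start = time.perf_counter()
        try:
            dns.query.udp(query, ns_ip, timeout=2.0)
            print(f"{ns_host} ({ns_ip}): {(time.perf_counter() - start) * 1000:.0f} ms")
        except dns.exception.Timeout:
            print(f"{ns_host} ({ns_ip}): timed out")

if __name__ == "__main__":
    time_authoritative_servers()
```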


If you don't know: DNS records are cached at multiple layers, including at the ISP.


This is technically correct but don’t rely on caching to solve this problem. Unless you’re getting a ton of widespread traffic you’ll probably be getting more cache misses than you expect - every time I did client-side monitoring, the DNS 90th percentile was quite notably higher than the 50th.


The trend has definitely been towards lower and lower TTLs with cloud deployments and such. What used to be a 1 day TTL with a static host is now 5 minutes in the cloud.

Servers that use geo DNS (EDNS-Client-Subnet) also cause considerable cache misses since the caching becomes very granular.


Good points - also people started using more TLDs and hostnames as the average page started loading different service endpoints directly in the client, and I got the impression that a fair number of places were slow to increase their DNS cache sizes.


Well the homepage of the site opened pretty much instantly for me, and I've never visited it before to have cached its pages or IP address.


> ”Cloudflare will not serve traffic from all edge locations, especially Australia is hit hard.”

I’ve read a few (dated) posts about bandwidth costs being extreme in Australia and Cloudflare would route around it, causing higher latency in that region.

Is that still the case though?


Yes, this is still true in 2021 (it's why I switched), and it's also true that you will not always get the closest PoP, even in bandwidth-cheap countries.

For example, from Sweden I was usually routed via Denmark. I was a Pro user; more PoPs are available to Business or Enterprise customers, however.


Neat, I didn't know AWS had an AnyCast service like that. The Cloudflare workers issue is a bit disappointing, I remember their blog about eliminating cold starts and how clever it seemed, but I guess it's still not there? On another note have you evaluated fly.io at all for this? This looks right in their wheelhouse of many regions + AnyCast + IPv6, and I could definitely see configuration being simpler with them than AWS. Not sure if they'd meet all your requirements though/how they'd compare on price etc.


The Cloudflare Workers issue should be resolved now; they eliminated cold starts in 2020 (https://blog.cloudflare.com/eliminating-cold-starts-with-clo...), and it seems like he tested it in 2019.


Ah thanks, that was the blog I was thinking about but I didn't realise the timing there. That makes sense, I wonder how that would be if they re-tested it now.


The latency that this individual was experiencing with Workers wayyyyy back in 2019 was not caused by cold starts, but rather a bug in the workers runtime which caused rare but long pauses. Specifically, the bug involved a hash map with a broken hash function. Here's me publicly facepalming over the issue at the time: https://twitter.com/KentonVarda/status/1189966124688953344

But that was two years ago, Workers has grown ~an order of magnitude since then (both traffic and team size), and such a bug would be immediately noticed today.

The article author says that after the fix he still wasn't happy with performance, but I don't know what he was experiencing exactly. We've done a lot of other things to improve performance since then; it may very well be better today.


I was still using Cloudflare Workers this year and only switched a couple of months ago: https://securitytrails.com/domain/url.rw/history/a

The performance was still very bad from a 95th-percentile perspective.


This isn't the feedback we get from most customers, so I guess there was something unusual going on. Sorry we weren't able to dig into this while it was still relevant to you.


> https://twitter.com/KentonVarda/status/1189966124688953344

What exactly was the bug though? bool effectively reducing hash(key) to 0 or 1?


Yes. So almost all keys had a hash code of 1, forcing linear search lookups. Insidious since the code still works, it's just slow -- and only really slow in production, after seeming fine in tests that only put a few items in the map.
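
Not the actual runtime code (that bug was in C++), but the effect is easy to reproduce; a hypothetical Python illustration:

```python
# Hypothetical illustration of the failure mode: if the hash collapses to
# 0 or 1, nearly every key lands in the same bucket and lookups degrade
# from O(1) to a linear scan. Fine in small tests, crawls at real sizes.
import time

class BadKey:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
    def __hash__(self):
        return int(bool(self.value))  # the bug: effectively 0 or 1

class GoodKey(BadKey):
    def __hash__(self):
        return hash(self.value)

def bench(key_cls, n=2000):
    table = {key_cls(i): i for i in range(n)}
    start = time.perf_counter()
    for i in range(n):
        _ = table[key_cls(i)]
    return time.perf_counter() - start

print(f"good hash: {bench(GoodKey) * 1000:.1f} ms")
print(f"bad hash:  {bench(BadKey) * 1000:.1f} ms")  # orders of magnitude slower
```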


This sounds like the Azure cross-region load balancer: https://docs.microsoft.com/en-us/azure/load-balancer/cross-r...

"Cross-region IPv6 frontend IP configurations aren't supported." -- sigh.

It would be an interesting experiment to see what is the cheapest set of VM-like workloads that can be spun up in every region...


If you need to access the client IPs for several reasons (throttling, analytics, security etc), be aware that AWS Global Accelerator still only supports IP preservation in some specific setups: https://docs.aws.amazon.com/global-accelerator/latest/dg/pre...


Another option would be using Fastly and writing something as simple as this (literally under 10 lines): https://fiddle.fastlydemo.net/fiddle/9069876a It probably doesn't handle url encoded strings because I came up with it in a few minutes, but it's zero hassle, is extremely fast, and won't ever break.


I think you could run this service on CloudFront Functions, if you didn't mind using a Refresh header instead of the meta tag.

(CloudFront Functions can't generate body content, but it looks like this service works by responding with a refresh meta tag. So if you change that to a Refresh header you wouldn't need to write the body.)

CloudFront functions run on the 200+ edge locations of CloudFront and, according to the docs, have sub-millisecond startup times. So might be a viable option?
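
CloudFront Functions are written in JavaScript, so this is only a language-neutral sketch of the response shape being proposed here (the query-parameter name is made up, and a real version should validate the target URL):

```python
# Hypothetical sketch of the dereferer response shape described above: instead
# of a body containing a <meta http-equiv="refresh"> tag, send a Refresh
# header and an empty body.
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

def dereferer(environ, start_response):
    params = parse_qs(environ.get("QUERY_STRING", ""))
    target = params.get("url", ["https://example.com/"])[0]  # hypothetical param
    start_response("200 OK", [
        ("Refresh", f"0; url={target}"),  # browser navigates without a Referer to pass on
        ("Content-Length", "0"),
    ])
    return [b""]

if __name__ == "__main__":
    make_server("", 8000, dereferer).serve_forever()
```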


So I have found that if you put your origin behind CloudFront and set it to not cache, you can get similar if not better performance, since you have low latency at the edge and the benefit of a shared TCP connection for all assets with the origin… the last part means using HTTP/2 and serving up your assets on the same domain as the origin….


Isn’t AWS Global Accelerator more like CloudFlare Argo, rather than CloudFlare Workers?


No, the equivalent Cloudflare service is Spectrum, which at $1/GB (!) is tellingly more expensive than AWS Ubiquity / Global Accelerator.

Cloudflare Argo is "more like" AWS Edge-optimized API Gateway.


These terms are (or the lack of terms is) pretty horrible: Ubiquity, Argo, Spectrum, Accelerator. Sounds like names given to ironic sci-fi story characters.


If we are nitpicking, then I am compelled to point out that "ironic" doesn't mean what you may think it does: https://medium.com/@frithahookway/the-ironic-misuse-of-irony...


I think we could debate this.

> irony (ˈaɪənɪ) adj: of, resembling, or containing iron.

In all seriousness, if we want to get nitpicking let's go back to the fact that the word comes from the Greek eirōneia, which means feigned ignorance, which in turn comes from the Greek eirōn, dissembler. I'd then argue that naming things silly names can be called ironic if the story world feigns ignorance of them being silly.


Point taken (:

> I think we could debate this.

Wouldn't be worth both our time though.


Yes, the whole point is to absorb user traffic into their own backbone in the first-ish mile rather than let it travel across public Internet up to last-ish mile.


I didn't know this! How is this different from AWS CloudFront?


Cloudfront caches content at edge locations near users whilst Accelerator routes requests to the nearest endpoint to the user.


If I am understanding it correctly, Cloudfront can use Accelerator to find the nearest edge location to cache, right?


Not quite. Cloudfront predates Global Accelerator and has all its own infrastructure/routing/etc to answer client queries, get them to a nearby AWS pop, and return local content or fetch content from your single origin.

GA is similar, but more like a global anycast routing layer/load balancer. You can have multiple backends, different regions etc, use a single public GA endpoint, and GA will route the request to your nearest backend.

You could use a GA endpoint as an origin for Cloudfront, I guess, to keep all traffic as near to the client as possible.



On AWS with Global Accelerator as well.


Did you kill it?



