
It's very much down to your workload and how you want it to work.

Short answer: k8s/Fargate/ECS/Batch will do what most people want. Personally I'd steer clear of K8s until you 100% need that overhead. Managed services are OK.

Long answer:

K8s has a whole bunch of scheduling algorithms, but it's a jack of all trades, and it only really deals with very shallow dependencies (there are plugins, but I've not used them). For < 100 machines it works well enough (by machine I mean either a big VM or a physical machine). Once you get beyond that, K8s becomes a bit of a liability: it's chatty, slow and noisy. It's also very opinionated. And let's not even start on the networking.
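To make the "shallow dependencies" point concrete, here's a minimal sketch of the knobs K8s actually gives you: an init container that blocks until one upstream service answers, plus affinity and resource requests that the default scheduler uses for placement. The manifest is written as a Python dict (names like `frontend`, the `node-class` label and the priority class are made up for illustration); you'd normally write the equivalent YAML and `kubectl apply` it.

```python
import json

# Hypothetical pod showing K8s' dependency/placement knobs. There is no
# built-in notion of a deep dependency graph across many services; an
# init container polling one upstream is about as far as it goes.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "frontend", "labels": {"app": "frontend"}},
    "spec": {
        "initContainers": [{
            "name": "wait-for-api",
            "image": "busybox:1.36",
            "command": ["sh", "-c",
                        "until wget -qO- http://api:8080/healthz; do sleep 2; done"],
        }],
        "containers": [{
            "name": "frontend",
            "image": "example/frontend:1.0",  # placeholder image
            "resources": {
                "requests": {"cpu": "500m", "memory": "256Mi"},
                "limits": {"cpu": "1", "memory": "512Mi"},
            },
        }],
        # Placement hints the default scheduler understands.
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "node-class",      # made-up node label
                            "operator": "In",
                            "values": ["general"],
                        }],
                    }],
                },
            },
        },
        "priorityClassName": "normal-priority",  # assumes this class exists
    },
}

if __name__ == "__main__":
    # Pipe the output into `kubectl apply -f -` if you want to try it.
    print(json.dumps(pod, indent=2))
```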

Like most things, there are tradeoffs. Do you want to prioritise resiliency/uptime over efficiency? Do you want batch processing? Do you want it to manage complex dependencies (i.e. service x needs 15 other services to run first, which then need 40 other services)? Are you running on unreliable infra (i.e. spot instances to save money)? Do you need to partition services based on security? Are you running over many datacentres and need the concept of affinity?

More detail:

The scheduler/dispatcher is almost always designed to run the specific types of workload that you as a company run. The caveat being that this only applies if you are running multiple datacentres/regions with thousands of machines. Google have a need to run both realtime and batch processing, but as they have millions of servers, making sure that all machines are running at 100% utilisation (or as close to it as practical) is worth hundreds of millions. It's the same with Facebook.
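To illustrate why utilisation is the whole game at that scale, here's a toy first-fit-decreasing bin-packing sketch, the crudest form of what these schedulers do when placing tasks onto machines. Real schedulers like Borg also juggle priorities, preemption, constraints and failure domains; all the names and numbers here are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    name: str
    cpu_free: float                 # cores left
    mem_free: float                 # GiB left
    tasks: list = field(default_factory=list)

def first_fit_decreasing(tasks, machines):
    """Place the biggest tasks first; each goes on the first machine it fits on.
    Returns the tasks that could not be placed."""
    unplaced = []
    for name, cpu, mem in sorted(tasks, key=lambda t: (t[1], t[2]), reverse=True):
        for m in machines:
            if m.cpu_free >= cpu and m.mem_free >= mem:
                m.cpu_free -= cpu
                m.mem_free -= mem
                m.tasks.append(name)
                break
        else:
            unplaced.append(name)
    return unplaced

# Made-up workload: (name, cores, GiB)
tasks = [("web", 2, 4), ("batch-1", 8, 16), ("cache", 4, 32), ("cron", 0.5, 1)]
machines = [Machine("m1", 16, 64), Machine("m2", 8, 32)]
leftover = first_fit_decreasing(tasks, machines)
for m in machines:
    print(m.name, m.tasks, "cpu free:", m.cpu_free, "mem free:", m.mem_free)
print("unplaced:", leftover)
```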

Netflix I guess has a slightly different setup as they are network-IO orientated, so they are all about data affinity and cache modelling. For them, making sure that the edge serves as much as possible is a real cost saving, as bulk bandwidth transfer is expensive. The rest is all machine learning and transcoding, I suspect (but that's a guess).
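For a flavour of what "data affinity" means in practice, here's a minimal consistent-hashing sketch: requests for the same title keep landing on the same edge cache, so each cache's working set stays hot. This is a generic illustration (cache and title names invented), not Netflix's actual Open Connect placement logic.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to cache nodes so each key sticks to one node, and
    adding/removing a node only moves a small fraction of keys."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []                       # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):           # virtual nodes smooth the load
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self._hashes, h) % len(self._ring)
        return self._ring[idx][1]

# Made-up edge caches and titles.
ring = ConsistentHashRing(["edge-ams-1", "edge-ams-2", "edge-lhr-1"])
for title in ["title-123", "title-456", "title-789"]:
    print(title, "->", ring.node_for(title))
```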




Thanks for the reply!

Not really an answer to the scheduler question, but at least it mirrors some of my experience.

That K8s is something to avoid, and that it doesn't scale, was already known (at least to me).

But that doesn't answer what people would put on the metal when building DCs…

I was not asking from the perspective of an end-user. I was asking about (large) DC-scale infra. (As a dev I know the end-user stuff.)

As I see it: you can build your own stuff from scratch (which is not realistic in most cases, I guess), or you can use OpenStack or Mesos. There are no other alternatives at the moment, I think, and it's unlikely that someone will come up with something new. OTOH that's OK. A lot of people will never need to build their own DC(s). For smaller setups (say one or two racks) there are more options of course. (You could run, for example, Proxmox, and maybe something on top.)


Sorry yeah, I didn't really answer your question.

Here is a non-exhaustive list of schedulers for differing use cases:

https://slurm.schedmd.com/documentation.html << mostly batch, not entirely convinced it actually scales that well

https://www.altair.com/grid-engine << originally this was Sun's Grid Engine. There are a number of offshoots tuned for different scenarios.

https://rmanwiki.pixar.com/display/TRA/Tractor+2 << that's for the VFX crowd

https://www.opencue.io/ << open source which is vaguely related to the above

https://abelay.github.io/6828seminar/papers/hazelwood:ml.pdf << Facebook's version of Airflow. It sits on top of two schedulers, so it's not really a good fit. I can't find what they publicly call their version of Borg.
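For context on what "version of Airflow" means here: these workflow engines schedule DAGs of dependent tasks and hand the actual execution to a cluster scheduler underneath. A minimal plain-Airflow sketch (task names and callables are made up, and it assumes a recent Airflow 2.x; this is not Facebook's internal tooling):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; real pipelines would pull data, train models, etc.
def extract_features():
    print("extracting features")

def train_model():
    print("training model")

def publish_model():
    print("publishing model")

with DAG(
    dag_id="toy_ml_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,      # triggered manually in this sketch
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    publish = PythonOperator(task_id="publish_model", python_callable=publish_model)

    # The workflow engine only knows about task-level dependencies;
    # placing the work onto machines is the cluster scheduler's job.
    extract >> train >> publish
```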

I'm assuming you've read about Borg.

As you've pointed out, Mesos is there as well.


Cool! Thanks! That's a lot of stuff I hadn't heard about until now.

It's really nice that one can meet experts here on HN and get free, valuable answers from them. Thank you.





