If a CVE is being actively exploited on your network, you should preempt running processes to deal with it, and you absolutely must take the boot+nuke approach, because it could already be affecting any host that has not yet been boot+nuked.
If there's no CVE, AWS has plenty of room to manage the lifecycle of its machines: it can keep ~5% of them "unschedulable" at any one time, waiting for existing processes to complete so that each host gets an orderly restart before the boot+nuke. An SLA of "Tasks may never run longer than X days" (X = 10-30) is what makes those orderly restarts possible.
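To make the non-CVE case concrete, here's a rough sketch of the drain policy I have in mind. This is not how AWS or Fargate actually does it; `Host`, `pick_hosts_to_cordon`, `ready_to_reimage`, `worst_case_drain_days`, and the 14-day cap are all names and numbers I made up for illustration.

```python
from dataclasses import dataclass, field

MAX_TASK_DAYS = 14      # assumed SLA cap, somewhere in the 10-30 day range
DRAIN_FRACTION = 0.05   # roughly 5% of the fleet cordoned at any one time


@dataclass
class Host:
    name: str
    task_days_left: list = field(default_factory=list)  # remaining days per task
    cordoned: bool = False


def pick_hosts_to_cordon(hosts, fraction=DRAIN_FRACTION):
    """Cordon hosts until about `fraction` of the fleet is unschedulable.

    Hosts whose longest remaining task finishes soonest go first,
    so each cordoned host frees up as quickly as possible.
    """
    target = max(1, int(len(hosts) * fraction))
    already = sum(h.cordoned for h in hosts)
    candidates = sorted(
        (h for h in hosts if not h.cordoned),
        key=lambda h: max(h.task_days_left, default=0),
    )
    picked = candidates[: max(0, target - already)]
    for h in picked:
        h.cordoned = True  # no new tasks land here; existing ones keep running
    return picked


def ready_to_reimage(host):
    """A cordoned host can be rebooted and wiped once nothing is running on it."""
    return host.cordoned and not host.task_days_left


def worst_case_drain_days(host, max_task_days=MAX_TASK_DAYS):
    """Upper bound on the wait: under the SLA no task outlives the cap."""
    return min(max(host.task_days_left, default=0), max_task_days)
```

The point of the runtime cap is the last function: without it, a cordoned host can be held hostage indefinitely by a single long-running task, and the only way out is the forced kill.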
I don't know your background but the way you respond makes me think you have not been responsible for systems that multiple tenants rely on for varying workloads.
These assumptions you're making are dangerous because the variety of workloads across tenants is extreme. If you're going to do something like "kill compute no matter what", you'd better have a good reason for it.
You may want to look at my resume. I've seen what happens when you don't "kill compute no matter what": when compute does get killed no matter what (hardware problems happen quite often at scale), you have problems. I've also seen it done right. Clearly, Fargate has not. I could also tell you that from having used the service.