I've done pretty extensive work in all three major cloud providers. If you were to ask me which one I'd use for a net new project, it would be GCP -- no question. Nearly all of the services I've used have been great and feel purposefully engineered (BigQuery, GKE, GCE, Cloud Build, Cloud Run, Firebase, GCR, Dataflow, Pub/Sub, Dataproc, Cloud SQL, and the list goes on). Not to mention almost every service has a Cloud API, which really goes a long way towards eliminating the firewall and helps you embrace the Zero Trust/BeyondCorp model. And BigQuery. I can't express enough how amazing BigQuery is. If you're not using GCP, it's worth going multi-cloud for BigQuery alone.
But there is something to be said for AWS. Their SDKs are complete and predictable, their APIs are very fast and consistent, and AWS IAM, while having a steep learning curve, never leaves you guessing about what your principals have access to. For me, the real challenge with AWS has been introducing multiple AWS accounts. Governance just flat out sucks when you begin to scale past a handful of accounts (but it is getting better).
Azure, on the other hand, has terrible consistency issues between its APIs, its SDKs are awful, and it just feels like the entire product is an extension of the MCP System Administrator persona of old, where it's expected that someone's job will be sitting in front of a UI and clicking around to get things done (the whole blade thing with their portal has to be one of the worst user experiences I've ever seen). However, I do like their Logic Apps, and Azure Policy with auto remediation (when it works as advertised -- see the API consistency problems and how long it takes for things to propagate through their system) has tons of potential. But they still have a ways to go before I'd consider Azure for my workloads.
Pangloss, who was as inquisitive as he was argumentative, asked the old man what the name of the strangled Mufti was. ‘I don’t know,’ answered the worthy man, ‘and I have never known the name of any Mufti, nor of any Vizier. I have no idea what you’re talking about; my general view is that people who meddle with politics usually meet a miserable end, and indeed they deserve to. I never bother with what is going on in Constantinople; I only worry about sending the fruits of the garden which I cultivate off to be sold there.’ Having said these words, he invited the strangers into his house; his two sons and two daughters presented them with several sorts of sherbet, which they had made themselves, with kaimak enriched with the candied-peel of citrons, with oranges, lemons, pine-apples, pistachio-nuts, and Mocha coffee… – after which the two daughters of the honest Muslim perfumed the strangers’ beards. ‘You must have a vast and magnificent estate,’ said Candide to the Turk. ‘I have only twenty acres,’ replied the old man; ‘I and my children cultivate them; and our labour preserves us from three great evils: weariness, vice, and want.’ Candide, on his way home, reflected deeply on what the old man had said. ‘This honest Turk,’ he said to Pangloss and Martin, ‘seems to be in a far better place than kings…. I also know,’ said Candide, ‘that we must cultivate our garden.’
"Serverless" is basically equivalent to a supercomputer in that context, but then it goes on to exhibit latency characteristics that would be considered a non-starter for a supercomputer.
Latency is one of the most important aspects of IO and is the ultimate resource underlying all of this. The lower your latency, the faster you can get the work done. When you shard your work in a latency domain measured in milliseconds-to-seconds, you have to operate with far different semantics than when you are working in a domain where a direct method call can be expected to return within nanoseconds-to-microseconds. We are talking 6 orders of magnitude or more difference in latency between local execution and an AWS Lambda. It is literally more than a million times faster to run a method that lives in warm L1 than it is to politely ask Amazon's computer to run the same method over the internet.
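To put rough numbers on that gap, here is a minimal sketch in Python. The 10 ms remote round trip and the 10 ns native call are assumed round figures for illustration, not measurements of AWS Lambda or of any particular CPU, and `add_one`, `local_call_seconds` and `orders_of_magnitude` are hypothetical names. Python's interpreter overhead also makes the measured local call much slower than a native call that stays in L1, so the measured gap will understate the real one.

```python
import math
import timeit


def add_one(x: int) -> int:
    """Trivial stand-in for a small method whose code and data stay in cache."""
    return x + 1


def local_call_seconds(calls: int = 1_000_000) -> float:
    """Average wall-clock seconds per local call, including the lambda wrapper."""
    total = timeit.timeit(lambda: add_one(1), number=calls)
    return total / calls


def orders_of_magnitude(slow_seconds: float, fast_seconds: float) -> float:
    """Base-10 logarithm of how many times slower the slow path is."""
    return math.log10(slow_seconds / fast_seconds)


# Assumed round figures, not measurements.
ASSUMED_REMOTE_SECONDS = 10e-3       # 10 ms for a remote function invocation
ASSUMED_NATIVE_CALL_SECONDS = 10e-9  # 10 ns for a native call in warm cache

if __name__ == "__main__":
    measured = local_call_seconds()
    print(f"measured local Python call: {measured * 1e9:.0f} ns")
    print(
        "assumed remote vs measured local: "
        f"{orders_of_magnitude(ASSUMED_REMOTE_SECONDS, measured):.1f} orders of magnitude"
    )
    print(
        "assumed remote vs assumed native: "
        f"{orders_of_magnitude(ASSUMED_REMOTE_SECONDS, ASSUMED_NATIVE_CALL_SECONDS):.1f} orders of magnitude"
    )
```

With those assumed figures the ratio works out to 10 ms / 10 ns = 10^6. Swap in your own measured round-trip time to see where your workload actually lands.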
This stuff really matters and I feel like no one is paying attention to it anymore. Your CPU can do an incredible amount of work if you stop abusing it and treating it like some worthless thing that is incapable of handling any sizeable work effort. Pay attention to the NUMA model and how cache works. Even high level languages can leverage these aspects if you focus on them. You can process tens of millions of client transactions per second on a single x86 thread if you are careful.
Furthermore, the various cloud vendors have done an exceptional job at making their vanilla compute facility seem like a piece of shit too. These days, a $200/month EC2 instance feels like a bag of sand compared to a very low-end Ryzen 3300G desktop I recently built for basic lab duty. I'm not quite sure how they accomplished this, but something about cloud instances has always felt off to me. I can see how others would develop a perception that simply hosting things on one big EC2 instance would mean their application runs like shit. I am unsurprised that everyone is reaching for other options now. On-prem might be the best option if you have already optimized your stack and are now struggling with the cloud vendors' various layers of hardware indirection. Simply going from EC2 to on-prem could buy you an order of magnitude or more in speedup just by virtue of having current-gen bare metal 100% dedicated to the task at hand. Obviously, this brings with it other operational and capital costs which must be justified by the business.