
I agree, except about this statement: "it wouldn’t be too difficult to port most models to Jax"

--> We tried such ports at https://kwatch.io (the company I work for), and it turned out to be much harder than expected (at least for us). I don't think many people are capable of porting a PyTorch + GPU LLM to Jax + TPU.
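
A minimal sketch (mine, not an actual kwatch.io snippet) of the kind of friction involved: jax.jit wants pure functions, static shapes, and explicit PRNG keys, so common PyTorch idioms like data-dependent branching, boolean-mask slicing, and a global RNG don't port line-by-line. The step function below is hypothetical.

    import jax
    import jax.numpy as jnp

    # PyTorch-style original (for comparison):
    #   if mask.sum() > 0:        # data-dependent Python branch
    #       x = x[mask]           # dynamic output shape
    #   x += torch.randn_like(x)  # implicit global RNG, in-place update

    @jax.jit
    def step(x, mask, key):
        # Shapes must stay static under jit, so zero out masked rows
        # instead of slicing them away.
        x = jnp.where(mask[:, None], x, 0.0)
        # Randomness needs an explicit key; there is no global RNG state.
        noise = jax.random.normal(key, x.shape)
        return x + noise

    key = jax.random.PRNGKey(0)
    x = jnp.ones((4, 8))
    mask = jnp.array([True, False, True, True])
    print(step(x, mask, key).shape)  # (4, 8)

Multiply that by every custom kernel, KV-cache update, and sampling loop in an LLM serving stack, and the port stops being mechanical.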



Well, I should have said “it wouldn’t be too difficult for me” then. I keep forgetting why I get paid so much.


I would love for you to expound. I found it interesting that you qualified your "should you bother? no" with "unless you are doing inference at scale", yet in the previous paragraph you explained why you can get better performance with GPUs.

So is there some advantage to TPUs, assuming SWE/API parity with GPUs?


Could be cheaper, depending on the workload, and if you're large that could justify the cost of the additional SWE time required to port and support it. Triton/CUDA requires people who know both DL and low-level programming. Whether you get better performance _per dollar_ really depends on the workload and on its size. Here I don't just mean the cost of buying compute in the cloud; I mean the broader definition: total cost of doing business, all in, including SWE cost. If you're huge (e.g. Anthropic), SWE cost at scale is a lot easier to justify. If you're on the smaller side, SWE cost matters a lot more. It's way easier to hire PyTorch people (market share 60%) than, say, Jax people (market share 3%). And yeah, I know there's Torch XLA, but it's basically the same thing with a different frontend.
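
To make the "same thing with a different frontend" point concrete, here's a hedged sketch of a single Torch XLA training step (classic torch_xla API; names like xla_device / mark_step / optimizer_step may differ across versions, and this assumes torch_xla is installed on a TPU host). You keep PyTorch syntax, but you're still feeding a lazily traced XLA graph to the TPU, so the porting and debugging skill set looks a lot like the Jax one.

    import torch
    import torch.nn as nn
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()                 # the TPU core, via XLA
    model = nn.Linear(128, 10).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    xm.optimizer_step(opt)                   # reduces grads and steps the optimizer
    xm.mark_step()                           # cuts and executes the lazy XLA graph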



