
I agree, except about this statement: "it wouldn’t be too difficult to port most models to Jax"

--> We tried such ports at https://kwatch.io (the company I work for), and it turned out to be much harder than expected (at least for us). I don't think many people are capable of porting a PyTorch + GPU LLM to Jax + TPU.
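
A minimal sketch (mine, not an actual kwatch.io snippet) of the kind of friction involved: jax.jit wants pure functions, static shapes, and explicit PRNG keys, so common PyTorch idioms like data-dependent branching, boolean-mask slicing, and a global RNG don't port line-by-line. The step function below is hypothetical.

    import jax
    import jax.numpy as jnp

    # PyTorch-style original (for comparison):
    #   if mask.sum() > 0:        # data-dependent Python branch
    #       x = x[mask]           # dynamic output shape
    #   x += torch.randn_like(x)  # implicit global RNG, in-place update

    @jax.jit
    def step(x, mask, key):
        # Shapes must stay static under jit, so zero out masked rows
        # instead of slicing them away.
        x = jnp.where(mask[:, None], x, 0.0)
        # Randomness needs an explicit key; there is no global RNG state.
        noise = jax.random.normal(key, x.shape)
        return x + noise

    key = jax.random.PRNGKey(0)
    x = jnp.ones((4, 8))
    mask = jnp.array([True, False, True, True])
    print(step(x, mask, key).shape)  # (4, 8)

Multiply that by every custom kernel, KV-cache update, and sampling loop in an LLM serving stack, and the port stops being mechanical.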



Well, I should have said “it wouldn’t be too difficult for me” then. I keep forgetting why I get paid so much.


I would love for you to expound. I found it interesting that you qualified your "should you bother? no" with "unless you are doing inference at scale", yet in the previous paragraph you explained why you can get better performance with GPUs.

So is there some advantage to TPUs, assuming SWE/API parity with GPUs?


Could be cheaper, depending on the workload, and if you're large that could justify the cost of the additional SWE time required to port and support it. Triton/CUDA requires people who know both DL and low-level programming. Whether you get better performance _per dollar_ really depends on the workload and on its size. Here I don't just mean the cost of buying compute in the cloud; I mean the broader definition: total cost of doing business, all in, including SWE cost. If you're huge (e.g. Anthropic), SWE cost at scale is a lot easier to justify. If you're on the smaller side, SWE cost matters a lot more. It's way easier to hire PyTorch people (market share 60%) than, say, Jax people (market share 3%). And yeah, I know there's Torch XLA, but it's basically the same thing with a different frontend.
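
To make the "same thing with a different frontend" point concrete, here's a hedged sketch of a single Torch XLA training step (classic torch_xla API; names like xla_device / mark_step / optimizer_step may differ across versions, and this assumes torch_xla is installed on a TPU host). You keep PyTorch syntax, but you're still feeding a lazily traced XLA graph to the TPU, so the porting and debugging skill set looks a lot like the Jax one.

    import torch
    import torch.nn as nn
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()                 # the TPU core, via XLA
    model = nn.Linear(128, 10).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    xm.optimizer_step(opt)                   # reduces grads and steps the optimizer
    xm.mark_step()                           # cuts and executes the lazy XLA graph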



