Hacker News

The main problem with the TPU Research Cloud is that you get dragged down a LOT by the buggy TPU APIs: not just the Google Cloud API being awful, but the TensorFlow/JAX/PyTorch support being awful too. You also basically must use Google Cloud Storage, which is slow and can be really expensive for getting data in and out.

The Googlers maintaining the TPU GitHub repo also basically don't care about your PR unless it's somehow going to help them in their own perf review.

By contrast, with a GPU-based cluster you can not only run the latest and greatest out of the box but also do a lot of local testing, which saves tons of time.

Finally, the OP here appears to be offering real customer engagement, which has been totally absent from my own Google Cloud experience across several companies.



Could you share a few technical details about the issues you've encountered with TF / JAX / PyTorch on Cloud TPUs? The overall Cloud TPU user experience improved a whole lot when we enabled direct access to TPU VMs, and I believe the newer JAX and PyTorch integrations are improving very rapidly. I'd love to know which issues are currently causing the most friction.



