We tried hard to move some of our inference workloads to TPUs at NLP Cloud, but finally gave up (at least for the moment), essentially for the reasons you mention. We now only run our fine-tuning on TPUs, using JAX (see https://nlpcloud.com/how-to-fine-tune-llama-openllama-xgen-w...), and we are happy with that setup.
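For what it's worth, the core of a JAX fine-tuning loop on TPU is pretty small once XLA handles the backend. A minimal sketch (the toy linear model and names here are hypothetical stand-ins, not our actual fine-tuning code):

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model standing in for an LLM forward pass.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # XLA-compiles for whatever backend is available (TPU, GPU, or CPU).
def train_step(params, x, y, lr=0.1):
    grads = jax.grad(loss_fn)(params, x, y)
    # Plain SGD update applied leaf-wise over the parameter pytree.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

params = {"w": jnp.zeros((3,)), "b": jnp.zeros(())}
x = jnp.ones((4, 3))
y = jnp.ones((4,))
params = train_step(params, x, y)  # one gradient step; loss drops from 1.0
```

The nice part is that the same code runs unmodified on TPU, GPU, or CPU; the lock-in is in the hardware access, not the code.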
It seems to me that Google does not really want to sell TPUs, but only to showcase their AI work and perhaps get feedback from some early adopters. It must be quite a challenge for them to build a dynamic community around JAX and TPUs if TPUs remain a vendor-locked product...