Nice, that's a big cost reduction! Is this using BudgetML from the post? Have you tried optimizing the model (e.g. quantizing it and converting it to something like ONNX)? I know this can bring big speed gains on CPU for T5, another generative model (around 5x faster). More info here: https://discuss.huggingface.co/t/speeding-up-t5-inference/18...
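
In case it helps to see what quantization does under the hood, here is a toy sketch in plain Python. It is not the PyTorch or ONNX Runtime pipeline, just the basic idea of mapping float weights to int8 with a single scale factor. The function names and the sample weights are made up for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only to make the rounding easy to follow.
weights = [0.5, -1.27, 0.031, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight ends up stored in one byte instead of four, at the cost of a small rounding error, and the matrix multiplications can then run on integer kernels. That is broadly where the CPU speedup comes from, though how much you gain depends on the model and the runtime.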