
Nice big cost reduction! Is this using BudgetML from the post? Have you tried optimizing the model (e.g. quantizing it and converting it to something like ONNX)? I know this can bring big speed gains on CPU for T5, another generative model (around 5x faster). More info here: https://discuss.huggingface.co/t/speeding-up-t5-inference/18...
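
For anyone wondering what quantization actually does to the weights, here's a rough sketch in plain Python. It is only an illustration of the idea (symmetric int8 with a single scale per tensor), not what ONNX Runtime or any other library does internally, and the function names `quantize_int8` and `dequantize` are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer values."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The point is that each weight is stored in 8 bits instead of 32, at the cost of a small rounding error (at most about half of `scale`). How much of a speedup that gives in practice depends on the model and the runtime.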


