
What's the average inference time like for your model on DO?



65 seconds


Inference is now 240 seconds on a Google Cloud N2D instance. I successfully recreated my inference worker on a Google preemptible instance, which cut my monthly cost from $40 on DigitalOcean to ~$16. It's much slower though.


Nice big cost reduction! Is this using BudgetML from the post? Have you tried optimizing the model (e.g. quantization and converting to something like ONNX)? I know this can bring big speed gains (5x faster) on CPU for T5, another generative model. More info here: https://discuss.huggingface.co/t/speeding-up-t5-inference/18...
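For anyone curious what the quantization half of that suggestion looks like: here's a minimal sketch of PyTorch dynamic quantization, which converts Linear weights to int8 with no retraining and is one of the easier CPU speedups to try. The `TinyFFN` module is a made-up stand-in; with a real generative model like T5 you'd pass the loaded model itself.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a transformer feed-forward block;
# in practice you'd quantize the full loaded model (e.g. a T5).
class TinyFFN(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim * 4)
        self.fc2 = nn.Linear(dim * 4, dim)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyFFN().eval()

# Dynamic quantization: weights are stored as int8 and activations
# are quantized on the fly at runtime. CPU-only, no retraining.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
with torch.no_grad():
    out = qmodel(x)
print(out.shape)  # torch.Size([1, 64])
```

The ONNX conversion step from the linked thread is a separate export on top of this; quantization alone is often enough to see a measurable latency drop on CPU instances like the ones discussed above.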




