Most OpenAI-compatible servers built on llama.cpp stay pretty close to vanilla llama.cpp, albeit without batching support.
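
Since these servers expose the standard OpenAI chat-completions endpoint, you can talk to them with the official `openai` Python client. A minimal sketch, assuming a local server on llama.cpp's default port 8080; the base URL, API key, and model name here are placeholders (many such servers ignore the model field entirely):

```python
from openai import OpenAI

# Point the client at the local llama.cpp-based server instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local server address
    api_key="not-needed",                 # most local servers accept any key
)

resp = client.chat.completions.create(
    model="local-model",  # placeholder; often ignored by the server
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

The upshot is that client code written against OpenAI's API usually works unchanged; the main thing you give up relative to a batching-aware setup is throughput under concurrent requests.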