If anyone is interested in avoiding bloat, llama.cpp already includes an OpenAI ... | Hacker News

Hacker News new | past | comments | ask | show | jobs | submit

login

Art9681 on Jan 6, 2024 | parent | context | favorite | on: Nitro: A fast, lightweight inference server with O...

If anyone is interested in avoiding bloat, llama.cpp already includes an OpenAI compatible API:

https://github.com/ggerganov/llama.cpp/blob/master/examples/...

brucethemoose2 on Jan 6, 2024 [–]

There's also a native C++ implementation now.

...Unfortunately they have issues. The C++ version straight up ignores parameters like temperature, the python implementation does not support batching.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact