Hacker News new | past | comments | ask | show | jobs | submit login

If anyone is interested in avoiding bloat, llama.cpp already includes an OpenAI compatible API:

https://github.com/ggerganov/llama.cpp/blob/master/examples/...




There's also a native C++ implementation now.

...Unfortunately they have issues. The C++ version straight up ignores parameters like temperature, the python implementation does not support batching.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: