https://github.com/ggerganov/llama.cpp/blob/master/examples/...
...Unfortunately, they have issues. The C++ version straight up ignores parameters like temperature, and the Python implementation does not support batching.
https://github.com/ggerganov/llama.cpp/blob/master/examples/...