It’s sustainable. llama.cpp is orders of magnitude more efficient than the original Llama implementation because it’s in Rust.

OpenAI hasn’t even touched low-level languages yet.




C++, not Rust. And most ML libraries called from Python or whatever are themselves C++ underneath, but of course there's always room for improvement, especially if you reimplement the entire codebase in C++ and specialize the inference program for your architecture. Specialization often opens the door to more efficient code: you are not running a generic neural net, you know the architecture beforehand, so you can design its memory layout etc. for efficiency.
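To make the specialization point concrete, here is a minimal sketch (not llama.cpp's actual code, and the 4096 dimension below is an assumption matching Llama-7B's hidden size): fixing the shapes at compile time lets the compiler unroll and vectorize the kernel in a way a generic runtime-shaped tensor implementation cannot.

    // Sketch only: a matrix-vector product whose dimensions are
    // template parameters. The trip counts become compile-time
    // constants, so the optimizer can unroll and vectorize; a
    // generic kernel only learns the shapes at runtime.
    #include <cstddef>

    template <std::size_t Rows, std::size_t Cols>
    void matvec(const float* w, const float* x, float* out) {
        for (std::size_t r = 0; r < Rows; ++r) {
            float acc = 0.0f;
            for (std::size_t c = 0; c < Cols; ++c)
                acc += w[r * Cols + c] * x[c];  // row-major weight layout
            out[r] = acc;
        }
    }

    // 4096 is an assumed hidden size (as in Llama-7B); baking it in
    // via matvec<4096, 4096> compiles a kernel specific to this model.

The same idea extends to memory layout: if you know every tensor shape up front, you can lay all the weights out in one contiguous, mmap-friendly block instead of allocating generic tensors one by one.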


Do you seriously not know what the .cpp in llama.cpp stands for?



