
Yeah, there is a clear bottleneck somewhere in llama.cpp. Even high-end hardware is struggling to get good numbers. Measured performance is still well below what the hardware should theoretically allow.

Benchmarks: https://github.com/ggerganov/llama.cpp/issues/11474#issuecom...


