deoxykev 5 months ago, on: How to Run DeepSeek R1 671B Locally on a $2000 EPY...

Yeah, there is a clear bottleneck somewhere in llama.cpp. Even high-end hardware is struggling to get good numbers. The theoretical limit should be higher, but measured performance isn't there yet. Benchmarks: https://github.com/ggerganov/llama.cpp/issues/11474#issuecom...