I think the size itself is not very important, but it can be a good proxy for the dependencies. If something needs PyTorch and a stack of other Python packages, it's a more complex install than something that's standalone.
That said, llama.cpp (which this is built around) still needs the CUDA toolkit (a ~4 GB install?) to run on an NVIDIA GPU. On a Mac I'm not so sure. So it's a bit of a misdirection, unless we're only talking about CPU.
(Someone correct me if I'm wrong; I'm only familiar with building llama.cpp. Can it run from just the binary, without CUDA?)
The original llama.cpp did not have CUDA support at all. It was a pure CPU binary that used SIMD vector instructions for acceleration, and IIRC it used Apple's "Accelerate" framework for faster computation on the M-series CPUs.
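For what it's worth, a CPU-only build still works today without any CUDA toolkit installed. Roughly something like the below; treat it as a sketch, since the repo URL and flag names are from recent versions (older releases used LLAMA_CUBLAS instead of GGML_CUDA):

```shell
# Sketch of a CPU-only llama.cpp build -- no CUDA toolkit involved.
# (Clone/build steps are commented out so this snippet is side-effect free;
# repo URL and flag names are from recent versions and may change.)
#
#   git clone https://github.com/ggml-org/llama.cpp
#   cd llama.cpp
#   cmake -B build                        # plain CPU build; on macOS this
#                                         # picks up Accelerate automatically
#   cmake --build build --config Release
#
# CUDA is strictly opt-in -- it only gets pulled in if you ask for it:
#
#   cmake -B build -DGGML_CUDA=ON         # this is the path that needs the CUDA toolkit
#
echo "CPU-only: cmake -B build && cmake --build build --config Release"
```

So the GPU story (and the CUDA toolkit's disk footprint) only applies if you opt into the CUDA backend at build time.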