I think the size itself is not very important, but it can be a good proxy for the dependencies. If something needs PyTorch and a stack of other Python packages, it's a more complex install than something that's standalone.
That said, llama.cpp (which this is built around) still needs the CUDA toolkit (a ~4 GB install?) to run on an NVIDIA GPU. On a Mac I'm not so sure. So it's a bit of a misdirection, unless we're only talking about CPU.
(Someone correct me if I'm wrong; I'm only familiar with building llama.cpp. Can it run from just the binary, without CUDA?)
The original llama.cpp did not have CUDA support at all. It was a pure CPU binary that used SIMD vector instructions for acceleration, and IIRC it used Apple's "Accelerate" framework for faster computation on the M-series CPUs.
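For what it's worth, a CPU-only build still works today without any CUDA toolkit installed. Roughly something like the below; treat it as a sketch, since the repo URL and flag names are from recent versions (older releases used LLAMA_CUBLAS instead of GGML_CUDA):

```shell
# Sketch of a CPU-only llama.cpp build -- no CUDA toolkit involved.
# (Clone/build steps are commented out so this snippet is side-effect free;
# repo URL and flag names are from recent versions and may change.)
#
#   git clone https://github.com/ggml-org/llama.cpp
#   cd llama.cpp
#   cmake -B build                        # plain CPU build; on macOS this
#                                         # picks up Accelerate automatically
#   cmake --build build --config Release
#
# CUDA is strictly opt-in -- it only gets pulled in if you ask for it:
#
#   cmake -B build -DGGML_CUDA=ON         # this is the path that needs the CUDA toolkit
#
echo "CPU-only: cmake -B build && cmake --build build --config Release"
```

So the GPU story (and the CUDA toolkit's disk footprint) only applies if you opt into the CUDA backend at build time.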