
The code is a basically irrelevant fraction of the model weights. The raw FP16 is like 17GB.

In practice your priority would be fancy quantization, plus any library that compiles down to an executable (like this one, MLC-LLM, or llama.cpp).
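To make the "17GB at FP16" figure concrete, here's a back-of-envelope sketch of how quantization shrinks a checkpoint. The bits-per-weight numbers for the quantized formats are rough averages I'm assuming for llama.cpp-style formats, not exact values:

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Parameter count implied by a 17 GB FP16 checkpoint (2 bytes/weight).
N = 17e9 / 2  # ~8.5B parameters

# Assumed average bits per weight for each format (rough figures).
for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name:7s} ~{model_size_gb(N, bpw):5.1f} GB")
```

At roughly 4.85 bits per weight, the same model lands around 5 GB instead of 17, which is why quantization is the first lever people reach for.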




17GB looks like a lot. Thanks, I will wait until people figure out how to make these smaller before trying to use one to make something standalone.


It's always going to be a huge quantity of data. Even as efficiency improves, storage and bandwidth are so cheap now that the incentive will be to convert that efficiency towards performance (models with more parameters, ensembles of models, etc) rather than chasing some micro-model that doesn't do as well. It might not always be 17GB, but don't expect some lesser order of magnitude for anything competitive.

As maturity arrives, we'll likely see a handful of competing local models shipped as part of the OS or as redistributable third-party bundles (a la the .NET or Java runtimes) so that individual applications don't all need to be massive.

You'll either need to wait for that or bite the bullet and make something chonky. It's never going to get that small.


These won't get smaller, I guess, given we keep the number of parameters the same.

In the pre-LLM era (let's say 2020), the hardware used to look decently powerful for most use cases (disks in the hundreds of GBs, a dozen or two GB of RAM, and quad- or hex-core processors), but with the advent of LLMs, even disk drives start to look pretty small, let alone compute and memory.


And cache! The talk of AI hardware is now "how do we fit these darn things inside SRAM?"


The average PS5 game seems to be around 45GB. Cyberpunk was 250GB.

Distributing 17GB isn’t a big deal if you shove it into Cloudflare R2.


In theory quantized weights of smaller models are under a gigabyte.

If you are looking for megabytes, though, yeah, those "chat" LLMs are pretty unusable at that size.
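A quick sanity check on the "under a gigabyte" claim: inverting the size formula shows how many parameters fit in a given budget. The 4 bits/weight figure is an assumed quantization level, not a fixed property of any particular format:

```python
def max_params_billions(budget_gb: float, bits_per_weight: float) -> float:
    """How many parameters (in billions) fit in a storage budget."""
    return budget_gb * 1e9 * 8 / bits_per_weight / 1e9

# Assuming ~4 bits per weight, a 1 GB budget holds about a 2B-parameter model.
print(f"~{max_params_billions(1.0, 4.0):.1f}B params in 1 GB at 4 bpw")
```

So "smaller models under a gigabyte" checks out for the ~1-2B parameter range, but a megabyte-scale budget only fits a few million parameters, which is why chat-capable models can't shrink that far.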



