
The code is a basically irrelevant fraction of the model weights. The raw FP16 is like 17GB.

In practice your priority would be fancy quantization, plus any library that compiles down to an executable (like this one, MLC-LLM, or llama.cpp).
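To make the "17GB at FP16" figure concrete, here's a back-of-envelope sketch of how quantization shrinks a checkpoint. The bits-per-weight numbers for the quantized formats are rough averages I'm assuming for llama.cpp-style formats, not exact values:

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Parameter count implied by a 17 GB FP16 checkpoint (2 bytes/weight).
N = 17e9 / 2  # ~8.5B parameters

# Assumed average bits per weight for each format (rough figures).
for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name:7s} ~{model_size_gb(N, bpw):5.1f} GB")
```

At roughly 4.85 bits per weight, the same model lands around 5 GB instead of 17, which is why quantization is the first lever people reach for.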




17GB looks like a lot. Thanks, I will wait until people figure out how to make these smaller before trying to use one to make something standalone.


It's always going to be a huge quantity of data. Even as efficiency improves, storage and bandwidth are so cheap now that the incentive will be to convert that efficiency towards performance (models with more parameters, ensembles of models, etc) rather than chasing some micro-model that doesn't do as well. It might not always be 17GB, but don't expect some lesser order of magnitude for anything competitive.

As maturity arrives, we'll likely see a handful of competing local models shipped as part of the OS or as redistributable third-party bundles (a la the .NET or Java runtimes) so that individual applications don't all need to be massive.

You'll either need to wait for that or bite the bullet and make something chonky. It's never going to get that small.


These won't get smaller, I guess, given we keep the number of parameters the same.

In the pre-LLM era (let's say 2020), the hardware used to look decently powerful for most use cases (disks in the hundreds of GBs, a dozen or two GB of RAM, and quad- or hex-core processors), but with the advent of LLMs, even disk drives start to look pretty small, let alone compute and memory.


And cache! The talk of AI hardware is now "how do we fit these darn things inside SRAM?"


The average PS5 game seems to be around 45GB. Cyberpunk was 250GB.

Distributing 17GB isn’t a big deal if you shove it into Cloudflare R2.


In theory quantized weights of smaller models are under a gigabyte.

If you are looking for megabytes, though, yeah, those "chat" LLMs are pretty unusable at that size.
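A quick sanity check on the "under a gigabyte" claim: inverting the size formula shows how many parameters fit in a given budget. The 4 bits/weight figure is an assumed quantization level, not a fixed property of any particular format:

```python
def max_params_billions(budget_gb: float, bits_per_weight: float) -> float:
    """How many parameters (in billions) fit in a storage budget."""
    return budget_gb * 1e9 * 8 / bits_per_weight / 1e9

# Assuming ~4 bits per weight, a 1 GB budget holds about a 2B-parameter model.
print(f"~{max_params_billions(1.0, 4.0):.1f}B params in 1 GB at 4 bpw")
```

So "smaller models under a gigabyte" checks out for the ~1-2B parameter range, but a megabyte-scale budget only fits a few million parameters, which is why chat-capable models can't shrink that far.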



