It's always going to be a huge quantity of data. Even as efficiency improves, storage and bandwidth are so cheap now that the incentive will be to spend that efficiency on performance (models with more parameters, ensembles of models, etc.) rather than chasing some micro-model that doesn't do as well. It might not always be 17GB, but don't expect anything an order of magnitude smaller to be competitive.
As the ecosystem matures, we'll likely see a handful of competing local models shipped as part of the OS or as redistributable third-party bundles (à la the .NET or Java runtimes) so that individual applications don't all need to be massive.
You'll either need to wait for that or bite the bullet and make something chonky. It's never going to get that small.
These won't get smaller, I guess, as long as we keep the number of parameters the same.
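A rough back-of-envelope sketch of why: weight-file size is roughly parameter count times bytes per weight, so with a fixed parameter count only the precision moves the number. The 8B parameter count and bit widths below are illustrative assumptions, not a reference to any particular model.

```python
# Back-of-envelope: on-disk size ≈ parameters × bytes per weight
# (ignores small overheads like tokenizer and config files).

def model_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight-file size in GB."""
    return num_params * bits_per_weight / 8 / 1e9

params = 8e9  # hypothetical 8B-parameter model

for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit quant")]:
    print(f"{label:>12}: ~{model_size_gb(params, bits):.1f} GB")

# ~16 GB at fp16, ~8 GB at int8, ~4 GB at 4-bit:
# quantization shrinks the file, but with the parameter count held fixed
# you never fall an order of magnitude below the full-precision size.
```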
In the pre-LLM era (say, 2020), hardware looked decently powerful for most use cases: disks in the hundreds of GBs, a dozen or two GB of RAM, and quad- or hexa-core processors. But with the advent of LLMs, even the disk drives start to look pretty small, let alone the compute and memory.