What's changing rapidly are the quantization algorithms and the hardware features that support them. For example, Blackwell GPUs support dynamic FP4 quantization with group size 16. At that group size it's close to lossless (in terms of accuracy metrics).
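To make the group-size point concrete, here is a rough pure-Python sketch of group-wise FP4 "fake quantization". It assumes the E2M1 value grid for FP4 and keeps each group's scale as an ordinary Python float, whereas hardware formats store the scale in reduced precision. The names `quantize_group`, `fake_quantize`, and `mean_abs_error` are made up for illustration, and this is not a description of what Blackwell does internally.

```python
# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
GROUP_SIZE = 16  # group size mentioned above


def quantize_group(values):
    """Quantize one group; return (scale, signed grid values)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0, [0.0] * len(values)
    # Assumption: the scale is a full-precision float chosen so that the
    # largest magnitude in the group maps to the largest grid value (6.0).
    scale = amax / FP4_GRID[-1]
    codes = []
    for v in values:
        # Round the scaled magnitude to the nearest representable value.
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(-mag if v < 0 else mag)
    return scale, codes


def fake_quantize(values, group_size=GROUP_SIZE):
    """Quantize and immediately dequantize, one group at a time."""
    out = []
    for start in range(0, len(values), group_size):
        scale, codes = quantize_group(values[start:start + group_size])
        out.extend(c * scale for c in codes)
    return out


def mean_abs_error(values, group_size):
    """Average absolute round-trip error for a given group size."""
    deq = fake_quantize(values, group_size)
    return sum(abs(a - b) for a, b in zip(values, deq)) / len(values)


if __name__ == "__main__":
    # One outlier followed by small values: a smaller group confines the damage.
    data = [100.0] + [1.0] * 31
    print("group 32:", mean_abs_error(data, 32))
    print("group 16:", mean_abs_error(data, 16))
```

The scale is recomputed from each group's own maximum, so an outlier only flattens the values that share its group. That is the intuition for why a smaller group size costs less accuracy.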
On the professional side of things, yes. For consumer-grade GPUs, the values have stagnated a bit, even though trends in the gaming market call for more.
I'd assume that, in the context of LLM inference, "recent" generally refers to GPUs from the Ampere generation onward, when demand for on-board memory went through the roof (the first truly usable LLMs were trained on A100s).
We've been stuck with the same general caps on standard GPU memory since then, though, perhaps in part because the generational upgrades have gone into memory bandwidth rather than capacity.