The sweet spot for running local LLMs (from what I'm seeing on forums like r/localLlama) is 2 to 4 3090s, each with 24GB of VRAM. NVidia (or AMD or Intel) would clean up if they offered a card with 3090-level performance but 64GB of VRAM. It doesn't have to be a leading-edge GPU, just a decent GPU with lots of VRAM. This is kind of what Digits will be (though the memory bandwidth is going to be slower because it'll be DDR5) and kind of what AMD's Strix Halo is aiming for - unified memory systems where the CPU & GPU have access to the same large pool of memory.
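To put rough numbers on why ~48-96GB is the sweet spot, here's a back-of-envelope sketch. The layer/head counts are assumed LLaMA-70B-style figures and the 4.5 bits/weight is a typical 4-bit quantization setting, not exact values for any particular build:

```python
# Back-of-envelope VRAM estimate for a local LLM: weights + KV cache.
# Architecture numbers are rough LLaMA-70B-style assumptions (80 layers,
# 8 KV heads via GQA, head_dim 128); treat them as illustrative, not exact.

def weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """VRAM for model weights at a given quantization level."""
    return n_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (K and V) x layers x kv_heads x head_dim x bytes x tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_tokens / 1e9

weights = weight_vram_gb(70e9, bits_per_weight=4.5)  # ~39 GB at ~4-bit quant
kv = kv_cache_gb(32_000)                             # ~10.5 GB at fp16
print(f"weights ~{weights:.0f} GB, KV ~{kv:.1f} GB, total ~{weights + kv:.0f} GB")
# -> roughly 50 GB: too big for one 24GB card, comfortable on 2-4 3090s
#    or a single 64GB+ unified-memory pool.
```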
The issue here is that even with a lot of VRAM you may be able to run the model, but with a large context it will still be too slow. (For example, running LLaMA 70B with a 30k+ token prompt takes minutes just to process the prompt.)
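A rough sketch of why: prompt processing (prefill) is roughly compute-bound, so the time scales with model size times prompt length. The effective-TFLOPS figure below is an assumed ballpark for a consumer GPU setup, not a measured number:

```python
# Rough prefill-time estimate: prefill FLOPs ~ 2 * params * prompt_tokens.
# The effective throughput is a guess at real-world consumer-GPU performance.

def prefill_seconds(n_params: float, prompt_tokens: int, effective_tflops: float) -> float:
    flops = 2 * n_params * prompt_tokens
    return flops / (effective_tflops * 1e12)

# 70B model, 30k-token prompt, assuming ~30 TFLOPS of usable fp16 throughput:
t = prefill_seconds(70e9, 30_000, effective_tflops=30)
print(f"~{t / 60:.1f} minutes just to ingest the prompt")  # -> roughly 2.3 minutes
```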
Because if you don't have infinite money, the decision to buy something comes down to the ratio of price to performance, not just raw performance. If you can get enough performance for your needs out of a cheaper chip, you buy the cheaper chip.
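A toy illustration with made-up numbers (not real chip specs or prices): a slower chip can still win on price-to-performance, as long as it clears your performance bar:

```python
# Hypothetical numbers purely for illustration -- not real chip specs or prices.
chips = {
    "big_chip":   {"price_usd": 40_000, "perf": 100},  # faster, pricier
    "cheap_chip": {"price_usd": 15_000, "perf": 60},   # slower, much cheaper
}
needed_perf = 50  # whatever "enough for your needs" means for your workload

for name, c in chips.items():
    if c["perf"] >= needed_perf:
        print(f"{name}: {c['perf'] / c['price_usd'] * 1000:.1f} perf per $1k")
# cheap_chip delivers more performance per dollar and clears the bar,
# so it's the rational buy even though it's the slower chip.
```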
The AI industry isn't pausing because DeepSeek is good enough. The industry is in an arms race to AGI. Having a more efficient method to train and use LLMs only accelerates progress, leading to more chip demand.
In the long run, yes, they will be cheaper due to more competition and better tech. But next month? They'll be more expensive.