I genuinely recommend considering AMD options. I went with a 7900 XTX because it has the most VRAM of any ~$1000 card (24 GB); NVIDIA cards at that price point top out at 16 GB. Ollama and other inference software work on ROCm these days, usually with nothing more than setting an environment variable. I've even run Ollama on my Steam Deck with GPU inferencing :)
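For anyone wondering, the environment variable in question is ROCm's GFX version override, which lets officially unsupported GPUs (like the Steam Deck's APU) be treated as a supported target. A minimal sketch, assuming an RDNA2-class chip; the exact value depends on your GPU, and a 7900 XTX on a recent ROCm release shouldn't need it at all:

```sh
# HSA_OVERRIDE_GFX_VERSION makes ROCm treat the GPU as the given gfx target.
# 10.3.0 is the value commonly used for RDNA2-class chips -- adjust or omit for your card.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
ollama serve
```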
Thanks, I chose a 3090 instead of a 4070 Ti; it was around $200 cheaper, has 24 GB of VRAM versus 16 GB, and delivers similar performance. The only drawback is the 350 W TDP.
I still struggle with a RAM issue on Ollama, where it uses 128 GB out of 128 GB of RAM for the 24.6 GB Mixtral model, even though the Docker memory limit is set to 90 GB.
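To be concrete about the limit, this is roughly how I cap the container (a sketch using the standard ollama/ollama image; the container name and volume are illustrative, and GPU passthrough flags are omitted):

```sh
# Run Ollama in Docker with a hard 90 GB memory cap on the container.
docker run -d --name ollama --memory=90g \
  -v ollama:/root/.ollama -p 11434:11434 \
  ollama/ollama
```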