
I have an A6000; it's about the most affordable option for 48 GB of VRAM (you can find one for a little under $5k sometimes), which is roughly the minimum to run a quantized 70B (rough math below).

System RAM doesn’t really matter, but I have 128GB anyway as RAM is pretty cheap.
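For anyone who wants the rough math behind the 48 GB figure, a quick back-of-envelope sketch (illustrative numbers only, weights alone, ignoring KV cache and runtime overhead):

    # Back-of-envelope VRAM estimate for a quantized 70B model.
    # Illustrative assumption: 1 GB = 1e9 bytes, weights only.
    def weight_memory_gb(n_params_billion, bits_per_param):
        return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB for weights")

    # 16-bit: ~140 GB  (no chance on 48 GB)
    # 8-bit:  ~70 GB   (still doesn't fit)
    # 4-bit:  ~35 GB   (fits, leaving ~13 GB for KV cache and overhead)

So a 4-bit 70B is about the largest thing that comfortably fits on a single 48 GB card.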




Why not 2x 4090? They'd be cheaper than an A6000 if you can manage to find them at MSRP, and would perform a lot better.


My time is worth a lot of money and 2x 4090 is more work, so it’s net more expensive in real terms.


For both inference and training, I haven't seen any modern LLM stack take meaningfully more setup time for multiple GPUs / tensor parallelism.

I would take one RTX 6000 Ada, but if you mean the pre-Ada A6000, 2x 4090 is faster with minimal hassle for most common use cases.
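To make the "minimal hassle" point concrete, here's roughly what multi-GPU inference looks like in a modern stack. This is just a sketch assuming vLLM and an AWQ-quantized 70B checkpoint (the model name and settings are examples, not a recommendation); tensor parallelism across two cards is a single argument:

    # vLLM sketch: tensor parallelism across 2 GPUs is one argument.
    # Model name and quantization settings are illustrative assumptions.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.3-70B-Instruct",  # example checkpoint
        quantization="awq",           # assumes an AWQ build of the weights
        tensor_parallel_size=2,       # shard the weights across both cards
        gpu_memory_utilization=0.90,
    )

    params = SamplingParams(temperature=0.7, max_tokens=256)
    out = llm.generate(["Explain tensor parallelism in one paragraph."], params)
    print(out[0].outputs[0].text)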


I mean the newest ones. I only do LLM inference, whereas my training load is all DistilBERT models and the A6000 is a beast at cranking those out.

Also, by “time” I mean my time setting up the machine and doing sysadmin work. A single card is less hassle.
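A minimal sketch of that kind of DistilBERT fine-tuning job, assuming Hugging Face Transformers (dataset, label count, and hyperparameters are placeholders):

    # Minimal DistilBERT fine-tuning sketch (Hugging Face Transformers).
    # Dataset, label count, and hyperparameters are placeholder assumptions.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=256)

    ds = load_dataset("imdb").map(tokenize, batched=True)  # example dataset

    args = TrainingArguments(
        output_dir="distilbert-out",
        per_device_train_batch_size=64,  # plenty of headroom on a 48 GB card
        num_train_epochs=3,
        fp16=True,
    )

    Trainer(model=model, args=args, train_dataset=ds["train"]).train()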


The A6000 predates Ada?

There is the RTX 6000 Ada (practically unrelated to the A6000), which has 4090-level performance; is that what you're referring to?



That's an Ampere A6000, one generation older than the RTX 6000 Ada. Nvidia decided that confusing model names are a good way to sell old products at a premium.


Running llama3.3:70b here on a pair of eBay Dell RTX 3090s in an old (2012!) i7-3770 workstation - ollama reports 16.67 tokens/sec.
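If anyone wants to reproduce that number, a rough sketch of pulling tokens/sec out of ollama's generate API (the prompt is arbitrary; assumes a local server with the model already pulled):

    # Rough tokens/sec from ollama's generate API (non-streaming).
    # Assumes a local ollama server with llama3.3:70b already pulled.
    import requests

    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.3:70b",
              "prompt": "Summarize RAID levels in two sentences.",
              "stream": False},
    )
    data = r.json()
    # eval_count = generated tokens, eval_duration = generation time in ns
    print(f"~{data['eval_count'] / (data['eval_duration'] / 1e9):.2f} tokens/sec")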



