
For both inference and training, I haven't seen any modern LLM stack require meaningfully more setup time for multiple GPUs/tensor parallelism.
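A minimal sketch of what that looks like in practice, assuming vLLM as the serving stack (the model name and GPU count are illustrative, not from this thread): tensor parallelism is a single constructor argument.

    # Illustrative sketch: shard one model across two GPUs with vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # example model
        tensor_parallel_size=2,                    # e.g. 2x4090
    )
    params = SamplingParams(max_tokens=64)
    out = llm.generate(["Why is tensor parallelism easy now?"], params)
    print(out[0].outputs[0].text)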

I would take 1 RTX 6000 Ada, but if you mean the pre-Ada A6000, 2x4090 is faster with minimal hassle for most common use cases.




I mean the newest ones. My LLM use is inference-only, whereas my training load is all DistilBERT models, and the A6000 is a beast at cranking those out.

Also, by “time” I mean my own time setting up the machine and doing sysadmin work. A single card is less hassle.
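For context, a rough sketch of that kind of DistilBERT training job, assuming Hugging Face Transformers; the dataset and hyperparameters are illustrative, not the actual setup:

    # Illustrative only: fine-tune DistilBERT for sequence classification.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    ds = load_dataset("imdb")  # example dataset only
    ds = ds.map(lambda b: tok(b["text"], truncation=True,
                              padding="max_length", max_length=256),
                batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments("out", per_device_train_batch_size=32,
                               num_train_epochs=1, fp16=True),
        train_dataset=ds["train"],
    )
    trainer.train()  # comfortably fits on a single A6000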


The A6000 predates Ada?

There is the RTX 6000 Ada (practically unrelated to the A6000), which has 4090-level performance. Is that what you're referring to?



That's the Ampere-generation RTX A6000, one generation older than the RTX 6000 Ada. Nvidia decided that confusing model names are a good way to sell old products at a premium.



