
For both inference and training, I haven't seen any modern LLM stack require meaningfully more setup time for multiple GPUs/tensor parallelism.
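A minimal sketch of what that looks like in practice, assuming vLLM as the serving stack (the model name and GPU count are illustrative, not from this thread): tensor parallelism is a single constructor argument.

    # Illustrative sketch: shard one model across two GPUs with vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # example model
        tensor_parallel_size=2,                    # e.g. 2x4090
    )
    params = SamplingParams(max_tokens=64)
    out = llm.generate(["Why is tensor parallelism easy now?"], params)
    print(out[0].outputs[0].text)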

I would take 1 RTX 6000 Ada, but if you mean the pre-Ada A6000, 2x4090 is faster with minimal hassle for most common use cases.




I mean the newest ones. My LLM use is inference-only, whereas my training load is all DistilBERT models, and the A6000 is a beast at cranking those out.

Also, by “time” I mean my own time setting up the machine and doing sysadmin work. A single card is less hassle.
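For context, a rough sketch of that kind of DistilBERT training job, assuming Hugging Face Transformers; the dataset and hyperparameters are illustrative, not the actual setup:

    # Illustrative only: fine-tune DistilBERT for sequence classification.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    ds = load_dataset("imdb")  # example dataset only
    ds = ds.map(lambda b: tok(b["text"], truncation=True,
                              padding="max_length", max_length=256),
                batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments("out", per_device_train_batch_size=32,
                               num_train_epochs=1, fp16=True),
        train_dataset=ds["train"],
    )
    trainer.train()  # comfortably fits on a single A6000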


The A6000 predates Ada?

There is the RTX 6000 Ada (practically unrelated to the A6000), which has 4090-level performance. Is that what you're referring to?



That's the Ampere-generation RTX A6000, one generation older than the RTX 6000 Ada. Nvidia decided that confusing model names are a good way to sell old products at a premium.



