
It is model-dependent. I've seen NVLink benefits when comparing against a PCIe 3.0 connection, with a small batch size and no gradient accumulation.

Once you have a larger batch size and gradient accumulation, I believe NVLink won't improve DDP much: the all-reduce traffic on gradients will be small compared to the time spent on computation.
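
A rough back-of-envelope sketch of that argument, in Python. Everything here is an assumption for illustration: the function names are made up, the bandwidth and timing numbers are placeholders rather than measurements, gradients are assumed to be synchronized once per optimizer step (for example by wrapping the intermediate micro-batches in DDP's `no_sync()`), and communication is assumed not to overlap with compute, which makes it a pessimistic estimate since DDP normally overlaps all-reduce with the backward pass.

```python
def allreduce_seconds(param_count, bytes_per_param, num_gpus, bandwidth_bytes_per_s):
    """Estimate ring all-reduce time for one gradient sync.

    In a ring all-reduce each GPU sends (and receives) roughly
    2 * (N - 1) / N times the gradient size.
    """
    grad_bytes = param_count * bytes_per_param
    traffic = 2 * (num_gpus - 1) / num_gpus * grad_bytes
    return traffic / bandwidth_bytes_per_s


def comm_fraction(compute_s_per_micro_batch, accum_steps, comm_s):
    """Fraction of one optimizer step spent in all-reduce.

    Assumes one all-reduce per optimizer step and no overlap with compute.
    """
    compute_s = compute_s_per_micro_batch * accum_steps
    return comm_s / (compute_s + comm_s)


if __name__ == "__main__":
    # Hypothetical setup: 100M fp32 parameters on 2 GPUs, 0.1 s per micro-batch.
    params, bytes_per_param, gpus, micro_batch_s = 100_000_000, 4, 2, 0.1

    # Placeholder effective bandwidths in bytes/s, not measured values.
    links = {"slow link": 10e9, "fast link": 50e9}

    for accum_steps in (1, 16):
        for name, bandwidth in links.items():
            comm_s = allreduce_seconds(params, bytes_per_param, gpus, bandwidth)
            frac = comm_fraction(micro_batch_s, accum_steps, comm_s)
            print(f"accum={accum_steps:2d} {name}: {frac:.1%} of step in all-reduce")
```

With these made-up numbers, the share of step time spent in all-reduce shrinks as accumulation steps grow, so the gap between the slow and fast link matters less. Plug in your own model size and measured step time to see where you land.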



