Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You would likely be limited by the communication latency between nodes, unless you come up with some unique model architecture or training method. Most of these large scale models are trained on GPUs using very high speed interconnects.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: