
"To reproduce the results reported in the paper, you would need an NVIDIA DGX1 machine with 8 V100 GPUs."



That line refers to training the model from scratch. You can still run the trained model very quickly with one "cheap" GPU.

That said, I'm not sure why one wouldn't get a similar result training on EC2 or GCE instances that have 8 V100s, or even training with fewer GPUs and accumulating gradients to reach the same effective batch size.
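
To make the gradient accumulation point concrete, here is a minimal sketch in plain Python. It is a toy linear regression, not the paper's model, and the names `grad_mse` and `accumulated_grad` are made up for illustration. It shows that averaging the gradients of several micro-batches, each weighted by its share of the full batch, gives the same gradient as one large batch when the loss is a mean over examples.

```python
def grad_mse(w, b, xs, ys):
    """Gradient of the mean squared error of y = w*x + b over one batch."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb


def accumulated_grad(w, b, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, weighting each by its share of the full batch."""
    n = len(xs)
    gw_acc, gb_acc = 0.0, 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        gw, gb = grad_mse(w, b, mx, my)
        weight = len(mx) / n  # a micro-batch of k examples contributes k/n
        gw_acc += gw * weight
        gb_acc += gb * weight
    return gw_acc, gb_acc


# Hypothetical toy data: 8 examples drawn from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]

full = grad_mse(0.5, 0.0, xs, ys)                                # one batch of 8
accum = accumulated_grad(0.5, 0.0, xs, ys, micro_batch_size=2)   # four micro-batches of 2

# The two gradients agree up to floating-point rounding.
print(full, accum)
```

One caveat: the equivalence holds for losses that are a plain mean over examples. Layers that compute statistics per batch, such as batch normalization, see only the micro-batch, so accumulation is then an approximation rather than an exact match.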



