That line refers to training the model from scratch. You can still run the trained model very quickly with one "cheap" GPU.
That said, I'm not sure why you wouldn't get a similar result by training on the EC2 or GCE instances that have 8 V100s, or even by training with fewer GPUs and accumulating gradients to reach the same effective batch size (rough sketch of that below).
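For concreteness, here's a minimal sketch of gradient accumulation in PyTorch. The model, sizes, and hyperparameters are made up for illustration; the point is just that gradients from several small batches are summed before a single optimizer step, so one GPU can mimic the batch size of several.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)                        # stand-in model (hypothetical)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 8        # e.g. 1 GPU simulating an 8-GPU batch
micro_batch = 32       # per-step batch; effective batch = 8 * 32 = 256

optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(micro_batch, 512)             # dummy data
    y = torch.randint(0, 10, (micro_batch,))
    loss = loss_fn(model(x), y) / accum_steps     # scale so summed grads
    loss.backward()                               # match one big batch
optimizer.step()                                  # one update per effective batch
optimizer.zero_grad()
```

The trade-off is wall-clock time: you take `accum_steps` forward/backward passes per update instead of running them in parallel across GPUs, but the gradient the optimizer sees is (numerically, up to batch-norm-style statistics) the same.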