A large and diverse training set is the best regulariser, but transformers are usually also trained with weight decay and dropout.
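For anyone unsure what those two regularisers actually do, here is a rough plain-Python sketch. It is not taken from any particular transformer codebase: the function names `dropout` and `sgd_step_with_weight_decay` and the hyperparameter values are made up for illustration, and real models would use a framework's built-in versions.

```python
import random


def dropout(xs, p, rng, training=True):
    """Inverted dropout: zero each value with probability p and scale
    the survivors by 1 / (1 - p) so the expected value is unchanged."""
    if not training or p == 0.0:
        return list(xs)  # dropout is switched off at inference time
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One plain SGD step where each weight is also pulled towards zero:
    w <- w - lr * (g + weight_decay * w)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, p=0.5, rng=rng)

# With zero gradients, the only effect of the step is the decay itself.
decayed = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.1)
```

Dropout injects noise into the activations during training only, while weight decay shrinks the parameters a little on every update, so the two act on different parts of the model.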