They don't address that. They just assume random sampling, so there's no equivalent to human curation or quality metrics, which would preserve tails or even, through selective use, create new ones. The contraction they observe is about what you'd expect in the random-sampling setting: with a finite sample you can only lose tails, never gain them. (They also need a very large ratio of synthetic to real/original data.)
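A minimal sketch of the tail-loss point, using repeated resampling with replacement as a stand-in for training on random model samples (the setup and numbers here are hypothetical, not from the paper):

```python
import random

random.seed(0)

# Generation 0: a finite "real" sample.
data = [random.gauss(0, 1) for _ in range(1000)]
orig_max = max(data)

# Each "synthetic" generation draws N points with replacement
# from the previous generation -- random sampling, no curation.
for gen in range(20):
    data = [random.choice(data) for _ in range(1000)]

# Tails can only shrink: the resampled max never exceeds the
# original, and rare values are progressively dropped.
print(max(data) <= orig_max)
print(len(set(data)), "distinct values remain of the original 1000")
```

Without a curation step that re-weights or re-injects rare values, every generation is a one-way ratchet toward the mode.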
So, while interesting for nailing down that phenomenon, the broader implications everyone wants to draw from it don't follow - very few people are training on random GPT-3/4 or Stable Diffusion samples!