If I understand correctly, wouldn't a hash database of <just the training set> be larger than the actual model? (In fact, by 1 or 2 orders of magnitude?)
Yeah, I guess so. The models are only 4 or 8 GB. A giant list of hashes would be bigger, sure. But they're two very different things: the model is for generating new images, while the hash database is for copyright enforcement. If you really want to check for violations, I don't know how else you're going to do it.
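For a rough sense of the size gap, here is a back-of-the-envelope sketch. The image count and the hash width are assumptions picked for illustration, not figures from this thread; only the 4 GB model size comes from the discussion above. The function name `hash_db_size_gb` is made up for this example.

```python
def hash_db_size_gb(num_images: int, bytes_per_hash: int) -> float:
    """Raw size of a flat list of fixed-width hashes, in GB (10**9 bytes)."""
    return num_images * bytes_per_hash / 1e9


# Assumed figures (hypothetical, not from the thread):
NUM_IMAGES = 5_000_000_000  # assumed training-set size
BYTES_PER_HASH = 32         # a SHA-256 digest is 32 bytes

# From the thread: models are "only 4 or 8 GB"; take the lower end.
MODEL_GB = 4

db_gb = hash_db_size_gb(NUM_IMAGES, BYTES_PER_HASH)
ratio = db_gb / MODEL_GB
print(f"hash list: {db_gb:.0f} GB, about {ratio:.0f}x a {MODEL_GB} GB model")
```

With those assumed numbers the hash list comes to 160 GB, about 40x the model, which is in the "1 or 2 orders of magnitude" range. A shorter 8-byte hash would shrink it to roughly 40 GB, so the exact ratio depends heavily on the hash you pick and how big the training set really is.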