Where I work, a tax policy think tank, we purchase from the IRS an anonymized dataset of about 100k sample tax returns, which we use to model the effects of changes to tax policy. We've got an agreement with the IRS that we can never share that dataset, and I'm pretty sure this is why. While I'd love to make our tax model results more transparent, the risk of de-anonymization is too high. I think another tax policy group is trying to create a synthetic dataset that is close to the sample in terms of outputs but is entirely made up, so that third parties could use it to verify results. I hope they succeed.