They can't publish their training databases because that would mean publishing copyrighted material, which is illegal. They can only train, which is potentially fair use.
It would be more accurate to say that they don't publish their training databases (including sanitized pointers to the copyrighted stuff) because they aren't sure that training is fair use.
They are sure, however, that it is a kind of infringement. Citing "fair use" is an admission of infringement - just a specific kind of infringement that is allowed.
A great many, if you care to read a bit more of the biographies, autobiographies, history of music books, interviews, blogs, etc.