Instead of the Reddit corpus, you might just as well use a picture library of human footprints. Your prospects would be no worse.

Human speech is produced from the conscious experience of being a human being. If your dataset contains just the speech, without the experience, there's simply not enough there. Any machine trained on this data is doomed to talk hollow rubbish.