Note that this is exactly how a smart spammer would generate text: by sampling from a language model built on a publicly available data set such as Google Ngrams or Wikipedia. If you want to catch someone doing this, you're much better off building a language model from your own corpus, since a spammer would have to scrape all your data to reconstruct the same thing.

Then, run the model over your data and start playing whack-a-mole (and refining the model).
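As a rough illustration of that workflow, here is a minimal sketch of a bigram language model trained on your own corpus and used to rank messages for review. Everything in it is an assumption made for the example: the names `BigramModel`, `avg_log_prob` and `rank_suspicious` are hypothetical, tokenization is plain whitespace splitting, smoothing is add-one, and "scores poorly under your own model" is only one possible signal to start from.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; use something better in practice).
    return text.lower().split()


class BigramModel:
    """Add-one smoothed bigram model built from your own documents."""

    def __init__(self, documents):
        self.contexts = Counter()  # how often each token appears as a left context
        self.bigrams = Counter()   # how often each (previous, current) pair appears
        vocab = set()
        for doc in documents:
            tokens = ["<s>"] + tokenize(doc)
            vocab.update(tokens[1:])
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # One extra slot stands in for any word never seen in the corpus.
        self.vocab_size = len(vocab) + 1

    def avg_log_prob(self, text):
        """Average per-token log-probability; lower means less like the corpus."""
        tokens = ["<s>"] + tokenize(text)
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            return float("-inf")
        total = 0.0
        for prev, cur in pairs:
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.contexts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / len(pairs)


def rank_suspicious(model, messages):
    """Return messages ordered from least to most corpus-like."""
    return sorted(messages, key=model.avg_log_prob)
```

You would train `BigramModel` on text you trust, call `rank_suspicious` on incoming messages, review the ones at the top of the list by hand, and then adjust the corpus, tokenizer, or cutoff as new spam patterns show up.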