That just means you used a predictor that wasn't good enough (was this at least GPT-4, or 3.5?), not that a GPT-X would need to physically taste recipes to generate novel recipes that taste good.
GPT-2 was mostly an incoherent babbling mess but that didn't mean a better predictor couldn't be coherent.
GPT-3 could not play chess at all, but that didn't mean a better predictor couldn't play chess (3.5-turbo-instruct can).
Taste is implicit in recipes, so a good enough predictor has to model it somehow to succeed — no physical experimentation necessary.