That just means you used a predictor that wasn't good enough (was this at least GPT-4, or 3.5?), not that a GPT-X would need to physically taste recipes to generate novel recipes that taste good.
GPT-2 was mostly an incoherent babbling mess but that didn't mean a better predictor couldn't be coherent.
GPT-3 could not play chess at all, but that didn't mean a better predictor couldn't play chess (3.5-turbo-instruct can).
Taste is implicit in recipes, so a good enough predictor has to model it somehow to succeed — no physical experimentation necessary.