I wonder whether the problem could even become sufficiently well defined to admit any agreed-upon loss function. Something like: "you must debate with the goal of maximising the aggregate wellbeing (definition required) of all living and future humans (and other relatable species)"?
It would require some sort of continuously tuned arbiter, i.e. similar to the reward model in RLHF, combined with an adversarial-style scheme a la GANs. But I really am spitballing here - research could absolutely go in a different direction.
But let's say you reduced it to some sort of "trying to prove a statement" task that can be verified by a discriminator model, then compared two iterations based on how accurately each proves the statement in plain English.
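A minimal toy sketch of that comparison loop might look like the following. Everything here is a hypothetical stand-in - `prove` and `discriminator_score` would be language models in practice, not the crude string heuristics used here - and the scoring rule is purely illustrative:

```python
def prove(statement: str, version: int) -> str:
    # Hypothetical stand-in for a model producing a natural-language proof.
    # Version 2 represents a later, more structured iteration.
    if version == 1:
        return f"Claim: {statement}. It holds because of X."
    return f"Claim: {statement}. Proof: assume the premises; X follows; hence the claim."

def discriminator_score(statement: str, proof: str) -> float:
    # Hypothetical stand-in for a discriminator/verifier model.
    # Here: a crude heuristic that rewards explicit proof structure.
    score = 0.0
    if statement in proof:
        score += 0.5  # proof actually addresses the statement
    if "Proof:" in proof:
        score += 0.3  # explicit proof section
    if "hence" in proof:
        score += 0.2  # explicit conclusion
    return score

def compare_iterations(statement: str) -> int:
    # Compare two prover iterations via the discriminator;
    # return the winning version number.
    s1 = discriminator_score(statement, prove(statement, 1))
    s2 = discriminator_score(statement, prove(statement, 2))
    return 1 if s1 >= s2 else 2

print(compare_iterations("the sum of two even numbers is even"))  # → 2
```

The interesting (and hard) part is of course the discriminator: in the RLHF/GAN analogy it would itself be trained, continuously, against the prover's outputs rather than fixed like the heuristic above.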