huseyinkeles 6 months ago \| on: QwQ-32B: Embracing the Power of Reinforcement Lear...

I read somewhere, though I can't find it now, that the "reasoning" models were trained heavily to keep saying "wait" so that they keep reasoning instead of returning early.