
> This is a large number of long chain-of-thought reasoning examples (600,000 of them). These are very hard to come by and very expensive to label with humans at this scale. Which is why the process to create them is the second special thing to highlight

I didn't know the reasoning traces were part of the training data. I thought we basically just told the LLM to "explain its thinking" as an intermediate step at inference time, but the fact that the 'thinking' is part of the training data itself makes more sense, and I can see how this improves things in a non-trivial way.
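
For concreteness, here's a rough sketch (in Python) of what folding a reasoning trace into a supervised fine-tuning example might look like. The <think> tags and field names are just illustrative assumptions on my part, not the exact format any particular model or dataset uses:

    # Sketch: fold the reasoning trace into the target text so the
    # model learns to emit its "thinking" before the final answer.
    # The <think> delimiters and dict keys here are illustrative,
    # not an actual training format.
    def format_example(question: str, reasoning: str, answer: str) -> dict:
        target = f"<think>\n{reasoning}\n</think>\n{answer}"
        return {"prompt": question, "completion": target}

    example = format_example(
        question="What is 17 * 24?",
        reasoning="17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
        answer="408",
    )
    print(example["completion"])

The point is that the reasoning tokens sit inside the target completion, so the loss is computed over the 'thinking' too, not just the final answer.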

I'm still not sure whether using word tokens as the intermediate "thinking" is the correct or optimal way of doing things. Maybe once everything is compressed into latent space it's essentially the same stuff.


