This is something we're planning on doing - just generate a large bit of text wi...

This is something we're planning on doing - just generate a large bit of text with markdown text and code in the middle. This is actually how the newer models already generate code - with the only difference being there's only one code block.

Via the use of <thinking></thinking> blocks, it's pretty straightforward to get the the model to evaluate it's own work and plan the next steps (basically chain of thought) but then you can filter out the <thinking> block in the final output.

The last trick to making this actually work is to give the AI model evaluation power - make it be able to run certain inspection code to evaluate its decisions so far and feel that evaluation to the next set of steps.

Combining all of this, it's very possible to convert an AI chat into a multi-step markdown + code notebook that actually works.