You’re right that it’s analogous in concept, but strategy distillation operates at a higher level: it encodes and transfers successful latent reasoning patterns as reusable “strategies,” without necessarily requiring direct gradient updates to the original model’s weights.
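
To make that concrete, here is a minimal sketch of one way such a scheme could be realized, borrowing from activation steering: the “strategy” is distilled as a vector from the hidden states of successful versus unsuccessful runs, then reused at inference through a forward hook, with the base weights left frozen throughout. This is purely illustrative, not how any particular system does it; the model, the function names (`distill_strategy`, `apply_strategy`), and the success/failure data are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for a frozen base model; its weights are never updated."""
    def __init__(self, vocab=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layer = nn.Linear(d_model, d_model)  # the layer we steer
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):
        h = self.embed(tokens)
        h = torch.tanh(self.layer(h))
        return self.head(h)

def distill_strategy(model, successful, unsuccessful):
    """Encode a 'strategy' as the mean hidden-state difference between
    successful and unsuccessful latent trajectories (no gradients needed)."""
    acts = []
    def grab(_module, _inputs, out):     # record activations at the steered layer
        acts.append(out.mean(dim=1))     # pool over sequence positions
    handle = model.layer.register_forward_hook(grab)
    with torch.no_grad():
        model(successful)
        model(unsuccessful)
    handle.remove()
    return acts[0].mean(0) - acts[1].mean(0)  # the reusable strategy vector

def apply_strategy(model, strategy, scale=1.0):
    """Reuse the strategy at inference by adding it to the layer's output;
    returning a tensor from a forward hook replaces that output."""
    def steer(_module, _inputs, out):
        return out + scale * strategy
    return model.layer.register_forward_hook(steer)

model = ToyEncoder().eval()
good = torch.randint(0, 100, (8, 16))   # token ids from successful runs (hypothetical)
bad = torch.randint(0, 100, (8, 16))    # token ids from failed runs (hypothetical)
strategy = distill_strategy(model, good, bad)

handle = apply_strategy(model, strategy, scale=0.5)
with torch.no_grad():
    logits = model(torch.randint(0, 100, (1, 16)))  # steered forward pass
handle.remove()  # the strategy is detachable; base weights never changed
```

The point of the sketch is the property you described: the “strategy” lives outside the model as a portable artifact you can attach, detach, or share across checkpoints, rather than being baked in by fine-tuning.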