Yes, it's more of a rule of thumb than napkin math, I suppose. The difference leaves room for the KV cache, which scales with both model size and context length, plus other bits and bobs like multimodal encoders, which aren't always counted in the nameplate model size.
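
If you do want the napkin version for the KV cache part, here's a rough sketch. It assumes a plain transformer with standard (or grouped-query) attention and an unquantized 16-bit cache. The function name and the config numbers are made up for illustration and aren't taken from any particular model card.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_value=2, batch_size=1):
    """Rough KV cache size in bytes (hypothetical helper).

    Assumes every layer stores one key and one value vector per KV head
    per token, with no cache quantization, paging, or sliding window.
    """
    # The leading 2 covers the key tensor plus the value tensor.
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_value * batch_size)


# Illustrative config only: 32 layers, 8 KV heads of dimension 128, fp16 cache.
for context_len in (8192, 32768):
    gib = kv_cache_bytes(32, 8, 128, context_len) / 2**30
    print(f"{context_len} tokens: {gib:.1f} GiB")
# 8192 tokens: 1.0 GiB
# 32768 tokens: 4.0 GiB
```

The cache grows linearly with context length, and with layer count and KV width, which is where the "scales with model size" part comes from. That's why a fixed headroom figure only works as a rule of thumb.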