Image generation models didn't improve. Someone figured out how to use them to do something new.
Anyway, models have been improving for a long time, but when was the last time you saw a news item about progress in machine translation? Probably around 2016-17, when the hype cycle for neural machine translation was at its apex. You don't hear about it in the news anymore because that hype cycle has since sunk to its trough of indifference, and the hype about being able to speak to a machine and communicate with any human on the planet in their native language has been replaced by, for example, the hype about being able to generate human-like art.
My point being: just because performance increased in the past on one particular task doesn't mean it will improve on all other tasks, or, indeed, that it will keep improving on the same task. For all we know, what we've seen so far of image generation is all we'll get for the next ten years or so.
I don't have a crystal ball, but I do have history books, so I don't make predictions. We can't even speak about the past with confidence, let alone the future.
As we have seen with text-to-image, sometimes AI models can appear to improve at an astonishing rate.