Training can be done in under an hour[1], so it's really not that long. And yes, what OP is describing is already possible, which seems to be par for the course in this "new" AI space given how fast it's moving.
It is possible, but right now it still takes quite a bit of time and effort to get it right.
The main challenge is finding the right balance between "make something that looks exactly like this" and "put it in a completely different context": the closer the outputs match the training images, the less flexible the model becomes about placing the subject in new contexts.
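To make that trade-off concrete, here's a minimal sketch using the Hugging Face diffusers library: prompt the fine-tuned model once with something close to the training captions and once in a deliberately different context, then compare. The model path, the "sks" instance token and the prompts are just illustrative placeholders, not anything specific from OP's setup:

    import torch
    from diffusers import StableDiffusionPipeline

    # Hypothetical output directory from a Dreambooth run; "sks" stands in
    # for whatever rare-token placeholder was used for the trained subject.
    pipe = StableDiffusionPipeline.from_pretrained(
        "./dreambooth-output", torch_dtype=torch.float16
    ).to("cuda")

    prompts = {
        # Close to the training data: should look very much like the subject.
        "faithful": "a photo of sks person, studio portrait",
        # Far from the training data: tests whether the model can still move
        # the subject into a new context instead of replaying the photos.
        "flexible": "an oil painting of sks person riding a horse on the moon",
    }

    for name, prompt in prompts.items():
        image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
        image.save(f"{name}.png")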
For now, the most effective combination will be artist + AI, although it does feel a bit like those who incorporate it into their workflow are helping to dig their own grave.
Do you happen to have any screenshots of what you mean? I’m really curious to see dreambooth’s capabilities in the field, and it sounds like you’ve had experience with some of its pitfalls.
Basically, OP is saying that overfitting is a common pitfall: if you overtrain the model, everything will look like your training data, and with too few training steps the subject doesn't come through at all, so it's a bit of a balancing act. If you search for "dreambooth" on the SD subreddit, you'll find plenty of example results, including some posts that show what overfitted and underfitted outputs look like. https://www.reddit.com/r/StableDiffusion/search?q=dreambooth...
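To make that balancing act concrete: one low-tech way to find the sweet spot is to train the same dataset to a few different step counts, then render one fixed out-of-context prompt from each result and see where the subject stops coming through (undertrained) versus where everything collapses back into the training photos (overtrained). A rough sketch with the diffusers library, where the step counts, directory names and the "sks" token are placeholder assumptions:

    import torch
    from diffusers import StableDiffusionPipeline

    # Hypothetical: the same Dreambooth dataset trained to different step
    # counts, each result saved to its own output directory.
    runs = {
        400: "./dreambooth-400-steps",    # likely undertrained
        800: "./dreambooth-800-steps",
        1200: "./dreambooth-1200-steps",
        2000: "./dreambooth-2000-steps",  # likely overtrained
    }

    # One fixed prompt that is deliberately unlike the training photos.
    prompt = "sks person as a medieval knight, oil painting"

    for steps, path in runs.items():
        pipe = StableDiffusionPipeline.from_pretrained(
            path, torch_dtype=torch.float16
        ).to("cuda")
        # Fixed seed so the only thing changing between images is the checkpoint.
        generator = torch.Generator("cuda").manual_seed(42)
        image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
        image.save(f"steps-{steps}.png")
        # Undertrained: the subject barely resembles the training data.
        # Overtrained: every output looks like a training photo, prompt ignored.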
https://textual-inversion.github.io/
https://dreambooth.github.io/