Yeah, I'm having trouble understanding why this isn't already doable with SD + ControlNet. I guess Segment Anything means you don't have to segment the image manually? To me, though, the example images don't look noticeably better than what you could generate without segmenting, using auto-generated canny, scribble, or depth ControlNet inputs.
Segment Anything is orders of magnitude more powerful than anything we've seen before in the realm of image segmentation, and it's the first 'foundation model' in that regard, so this is a real step-change for background removal, editing, etc. Hell, we might not even need green screens anymore!
Segmenting really well, the way a human would do it, has for some reason been really, really hard for a computer. I guess that's because it requires understanding what objects actually are. So it's amazing that this part works so well.
And then the ControlNet part seems more magical just because you're giving it a REALLY accurate input.