Not as much as you'd think. The original paper sizes its models so that Swin-T roughly matches ResNet-50 and Swin-S roughly matches ResNet-101 in compute and memory usage. In my experience they still come out a bit higher, but I can also use them as drop-in replacements for ResNets and get better results on the same tasks and datasets, even when the datasets aren't huge.