That's how I felt at first, but getting deeper into the Swin Transformer paper, it actually makes a fair bit of sense: convolutions can be likened to self-attention ops that can only attend to a local neighborhood around each pixel. That's a sensible assumption for image data, but it also makes sense that more general attention would better capture complex spatial relationships, if you can find a way to make it computationally feasible. Swin Transformers certainly go through some contortions to get there, and I bet we'll see cleaner hierarchical architectures in the future, but the results speak for themselves.
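
To see the analogy concretely, here's a rough PyTorch sketch (my own simplification, not the paper's actual code) of self-attention restricted to non-overlapping local windows, which is the basic trick Swin uses to keep the cost linear in image size:

    import torch
    import torch.nn as nn

    class WindowAttention(nn.Module):
        """Self-attention restricted to non-overlapping local windows.

        Like a convolution, each output position only sees a local
        neighborhood; unlike a convolution, the mixing weights are
        computed dynamically from the content via attention.
        """
        def __init__(self, dim, window_size=7, num_heads=4):
            super().__init__()
            self.window_size = window_size
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, x):
            # x: (B, H, W, C), with H and W divisible by window_size
            B, H, W, C = x.shape
            ws = self.window_size
            # Partition into (ws x ws) windows and flatten each window.
            x = x.view(B, H // ws, ws, W // ws, ws, C)
            x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
            # Full self-attention, but only among the ws*ws tokens of one
            # window, so total cost grows linearly with the number of windows.
            x, _ = self.attn(x, x, x)
            # Reverse the window partition back to the image layout.
            x = x.view(B, H // ws, W // ws, ws, ws, C)
            x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
            return x

    # e.g. a 56x56 feature map with 96 channels, as in a Swin-T-like stage:
    # out = WindowAttention(96)(torch.randn(1, 56, 56, 96))

The real thing also shifts the windows between consecutive blocks so information can flow across window boundaries, but the partition/attend/unpartition loop above is the core of it.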


The Transformer-in-Transformer (TNT) model looks promising: you can set up multiple overlapping domains of attention at arbitrary scales over the input.
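
Roughly, the idea (my own toy PyTorch sketch with made-up dimensions, not the TNT authors' code) is an inner transformer attending within each patch at a fine scale and an outer transformer attending across patches, with the inner output folded back into the patch tokens:

    import torch.nn as nn

    class TinyTNTBlock(nn.Module):
        """Toy nested-attention block in the spirit of Transformer-in-Transformer."""
        def __init__(self, inner_dim=24, outer_dim=192, inner_len=16):
            super().__init__()
            self.inner = nn.TransformerEncoderLayer(inner_dim, nhead=4, batch_first=True)
            self.outer = nn.TransformerEncoderLayer(outer_dim, nhead=4, batch_first=True)
            # Project the flattened inner tokens into the outer embedding space.
            self.proj = nn.Linear(inner_len * inner_dim, outer_dim)

        def forward(self, inner_tokens, outer_tokens):
            # inner_tokens: (B * num_patches, inner_len, inner_dim)  fine-grained tokens
            # outer_tokens: (B, num_patches, outer_dim)              coarse patch tokens
            B, P, _ = outer_tokens.shape
            inner_tokens = self.inner(inner_tokens)
            # Fold the fine-grained information back into the patch tokens.
            outer_tokens = outer_tokens + self.proj(inner_tokens.reshape(B, P, -1))
            outer_tokens = self.outer(outer_tokens)
            return inner_tokens, outer_tokens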


But you pay a price for losing the inductive bias of CNNs.

Swin transformers are still CPU/memory (and data) intensive compared to CNNs, right?


Not as much as you'd think. The original paper sets up its models so that Swin-T ~ ResNet-50 and Swin-S ~ ResNet-101 in compute and memory usage. They're still a bit higher in my experience, but I can also do drop-in replacements for ResNets and get better results on the same tasks and datasets, even when the datasets aren't huge.
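
As a concrete illustration, assuming the timm library (the exact model names there may differ between versions), the swap is often a one-liner:

    import timm

    # ResNet-50 backbone
    backbone = timm.create_model("resnet50", pretrained=True, num_classes=10)

    # Roughly comparable Swin-T backbone (the pairing the Swin paper uses
    # when matching FLOPs and parameter counts against ResNet-50)
    backbone = timm.create_model("swin_tiny_patch4_window7_224",
                                 pretrained=True, num_classes=10)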
