
What we have seen in the past 10 years of machine learning is that it's extremely hard to predict which technique will next prove practical across a vast array of problems. We had CNNs, batch norm, LSTMs, transformers, self-supervised learning, reinforcement learning, and a few other techniques that need to be perfected, plus thousands of ideas to build upon, but nobody knows the next big thing that will work on real-life problems.


Transformers beating convolutions on their own turf (vision) was certainly unexpected.


That's how I felt at first, but digging deeper into the Swin Transformer paper, it actually makes a fair bit of sense: convolutions can be likened to self-attention ops that can only attend to local neighborhoods around pixels. That's a fairly sensible assumption for image data, but it also makes sense that more general attention would better capture complex spatial relationships, if you can find a way to make it computationally feasible. Swin Transformers certainly go through some contortions to get there, and I bet we'll see cleaner hierarchical architectures in the future, but the results speak for themselves.
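To make the "convolution ≈ locally-masked attention" point concrete, here's a minimal 1-D numpy sketch (my own illustration, not code from the Swin paper): the same self-attention op, once unmasked and once masked so each position can only attend to a k-wide neighborhood, i.e. the footprint of a k-tap convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 8, 4, 3          # sequence length, feature dim, window size (odd)
x = rng.standard_normal((N, d))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, mask=None):
    # Single-head attention with identity Q/K/V projections, for clarity.
    scores = x @ x.T / np.sqrt(x.shape[1])        # (N, N) similarities
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # zero weight outside mask
    return softmax(scores) @ x

# Local mask: position i may only attend to positions j with |i - j| <= k // 2,
# the same neighborhood a k-tap convolution reads.
idx = np.arange(N)
local_mask = np.abs(idx[:, None] - idx[None, :]) <= k // 2

y_global = self_attention(x)              # every position sees every other
y_local = self_attention(x, local_mask)   # convolution-like receptive field
```

Each row of the masked score matrix has at most k nonzero weights, so `y_local` has the receptive field of a width-k convolution; the difference is that the weights are input-dependent rather than a fixed learned kernel.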


The Transformer in Transformer (TNT) model looks promising: you can set up multiple overlapping domains of attention at arbitrary scales over the input.
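A rough numpy sketch of that nested idea (my reading of TNT, not its actual code): an inner attention over sub-patches within each patch, then an outer attention over the pooled patch embeddings, giving two scales of attention over the same input.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(x):
    # Plain single-head self-attention with identity projections;
    # works batched over leading dimensions.
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

P, S, d = 4, 9, 8                      # patches, sub-patches per patch, dim
pixels = rng.standard_normal((P, S, d))

inner = attend(pixels)                 # fine scale: within each patch
patch_tokens = inner.mean(axis=1)      # pool sub-patches to one token per patch
outer = attend(patch_tokens)           # coarse scale: across patches
```

The mean-pooling step is a simplification; the real model carries a learned projection from inner tokens to outer tokens, but the two-scale structure is the point.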


But you have to pay the price of losing the inductive bias of CNNs.

Swin models are still CPU/memory (and data) intensive compared to CNNs, right?


Not as much as you'd think. The original paper sizes its models so that Swin-T ≈ ResNet-50 and Swin-S ≈ ResNet-101 in compute and memory usage. They're still a bit higher in my experience, but I can also do drop-in replacements for ResNets and get better results on the same tasks and datasets, even when the datasets aren't huge.
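The reason the compute stays comparable is Swin's windowed attention: global self-attention over n tokens is quadratic in n, while attention restricted to fixed M×M windows is quadratic only in the window and linear in n. A back-of-envelope sketch (my own arithmetic, counting only attention-score FLOPs, not the paper's code):

```python
def global_attn_flops(n_tokens, dim):
    # QK^T and attn@V each cost roughly n^2 * d multiply-adds.
    return 2 * n_tokens**2 * dim

def windowed_attn_flops(n_tokens, dim, window=7):
    # Tokens split into (n / window^2) windows; attention is quadratic
    # only within each window, so the total is linear in n_tokens.
    per_window = 2 * (window**2) ** 2 * dim
    return (n_tokens // window**2) * per_window

n = 3136   # 56x56 patch grid, as in Swin's first stage at 224x224 input
d = 96     # Swin-T stage-1 embedding dim
print(global_attn_flops(n, d) // windowed_attn_flops(n, d))  # prints 64
```

The saving is exactly n / M² (here 3136 / 49 = 64x) on the attention term, which is what lets a hierarchical transformer land in the same compute budget as a ResNet.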


For me it was quite the opposite: after the "Attention Is All You Need" paper, I thought convolutions would become obsolete quite fast. AFAIK that still hasn't happened completely; something is still missing in unifying the two approaches.





