Transformers are GNNs where every node is connected to every node (well, in the decoder you have masks, but you see my point).
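Here's a quick sketch of what I mean (my own toy numpy code, not from the paper; the function and variable names are just for illustration): self-attention is message passing on a fully connected graph over the tokens, and the decoder's causal mask just deletes the "future" edges.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=False):
    """X: (n_nodes, d) node features; returns updated node features."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # edge weights on the dense graph
    if causal:
        # decoder-style mask: drop edges from node i to any "future" node j > i
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -np.inf)
    A = softmax(scores, axis=-1)                   # row i = attention over i's neighbours (all nodes)
    return A @ V                                   # aggregate messages from neighbours

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out_full = self_attention(X, Wq, Wk, Wv)                 # fully connected "encoder" graph
out_masked = self_attention(X, Wq, Wk, Wv, causal=True)  # masked "decoder" graph
```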
If the problem is that the graph is not sparse enough / not a graph at all, adding more connections doesn't help.
edit: Why doesn't it help? They address this in the paper. 1. There is a computational infeasibility problem. 2. The transformer decoder can't be a regular graph if you include masking.
If those connections don't actually help, then the attention weights for them would tend to 0, right? Then it becomes a problem of overfitting, which we have a large arsenal of techniques to combat.
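Toy illustration of what I mean (my own numbers, nothing from the paper): if the model learns a very negative compatibility score for an edge, its softmax attention weight goes to ~0, so the "extra" connection is effectively pruned rather than actively harmful.

```python
import numpy as np

scores = np.array([2.0, 1.5, -8.0])       # last edge judged useless by the model
weights = np.exp(scores) / np.exp(scores).sum()
print(weights)                            # -> roughly [0.62, 0.38, 3e-5]
```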