I suspect it's far more likely for "foo_xs[k] += foo_dxs[k]" to be auto-vec'd than for "foo.x += foo.dx" via templates. The first is straightforward, with a clean and obvious route to auto-vectorization; the latter is not.
I've obviously not benchmarked, but from what I can see all the complexity is in the templates, which only exist at compile time. As far as I can tell, it should all dissolve before the optimizer gets to look at it.
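
For concreteness, here's a rough sketch of the two forms being compared. Every name in it (`Foos`, `FooRef`, `at`, `step_arrays`, `step_proxy`) is made up for illustration and not taken from the code under discussion, and the proxy is only one guess at what "via templates" might look like. Both loops do the same arithmetic; whether either one actually gets vectorized depends on the compiler and flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical struct-of-arrays storage; the names are illustrative only.
struct Foos {
    std::vector<float> xs;
    std::vector<float> dxs;
};

// Hypothetical proxy: reads like a single "foo" but refers into the arrays.
struct FooRef {
    float& x;
    float& dx;
};

// Hypothetical accessor template that builds a proxy for element k.
template <typename Container>
FooRef at(Container& c, std::size_t k) {
    return FooRef{c.xs[k], c.dxs[k]};
}

// Form 1: the plain array loop, foo_xs[k] += foo_dxs[k].
void step_arrays(Foos& foos) {
    const std::size_t n = foos.xs.size();
    for (std::size_t k = 0; k < n; ++k) {
        foos.xs[k] += foos.dxs[k];
    }
}

// Form 2: the same update written as foo.x += foo.dx through the proxy.
void step_proxy(Foos& foos) {
    const std::size_t n = foos.xs.size();
    for (std::size_t k = 0; k < n; ++k) {
        FooRef foo = at(foos, k);
        foo.x += foo.dx;
    }
}
```

The way to settle it would be to compile both with optimization on and look at the generated assembly or a vectorization report (`-fopt-info-vec` on GCC, `-Rpass=loop-vectorize` on Clang) rather than guess.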