Saxe, Ganguli, and McClelland (2013), on deep linear nets and orthogonal initialization. Then read Li, Jiao, Han, and Weissman (2017, maybe a preprint), "Demystifying ResNet", which makes a nice claim: the niceness comes down to the conditioning of the Hessian at init.
TL;DR: it's a good conditioner, but you can do better ab initio.
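A rough sketch of the conditioning point for deep linear nets, assuming square layers and NumPy. It compares the condition number of the end-to-end weight product under orthogonal versus scaled Gaussian init. This only looks at the input-output map, not the Hessian that "Demystifying ResNet" talks about, and the function names, depth, and width are made up for illustration.

```python
import numpy as np

def orthogonal_init(n, rng):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Flip column signs using diag(r) so the sample is Haar-distributed.
    q *= np.sign(np.diag(r))
    return q

def gaussian_init(n, rng):
    """Gaussian weights with variance 1/n, for comparison."""
    return rng.standard_normal((n, n)) / np.sqrt(n)

def product_condition_number(init, depth, n, seed=0):
    """Condition number of W_depth @ ... @ W_1 for a deep linear net."""
    rng = np.random.default_rng(seed)
    w = np.eye(n)
    for _ in range(depth):
        w = init(n, rng) @ w
    s = np.linalg.svd(w, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]

# Hypothetical sizes, chosen only to make the contrast visible.
depth, n = 20, 32
cond_orth = product_condition_number(orthogonal_init, depth, n)
cond_gauss = product_condition_number(gaussian_init, depth, n)
print(f"orthogonal: {cond_orth:.3f}, gaussian: {cond_gauss:.3e}")
```

A product of orthogonal matrices is itself orthogonal, so its singular values are all 1 regardless of depth; the Gaussian product's spectrum spreads out as depth grows.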