Another paper related to AD was posted last month: The Simple Essence of Automat...

YorkshireSeason · on Nov 20, 2018

Also not mentioned in the Baydin et al. paper (which has been knocking about for a while: it was submitted in 2017, so the omissions are not surprising):

* F. Wang, X. Wu, G. Essertel, J. Decker, T. Rompf, Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator, see [1]; [4] is a presentation based on this paper.

* S. Laue, M. Mitterreiter, J. Giesen, Computing Higher Order Derivatives of Matrix and Tensor Expressions, see [2], discussed in [3].

Quite a bit of exciting work on better understanding backpropagation is going on right now.

[1] https://arxiv.org/abs/1803.10228

[2] http://www.matrixcalculus.org/matrixcalculus.pdf

[3] https://news.ycombinator.com/item?id=18464003

[4] https://www.youtube.com/watch?v=igRLKYgpHy0