Another paper related to AD was posted last month: The Simple Essence of Automatic Differentiation [0], with a video presentation [1]. I found it pretty awesome, in case you haven't seen it.
Also not mentioned in the Baydin et al. paper (not surprising, since it has been knocking about for a while: it was submitted in 2017):
* F. Wang, X. Wu, G. Essertel, J. Decker, T. Rompf, Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator, see [1], and [4] for a presentation based on this paper.
* S. Laue, M. Mitterreiter, J. Giesen, Computing Higher Order Derivatives of Matrix and Tensor Expressions, see [2], discussed in [3].
There's quite a bit of exciting work on better understanding backpropagation going on right now.
[0]: https://news.ycombinator.com/item?id=18306860
[1]: https://www.youtube.com/watch?v=ne99laPUxN4