
My impression from your comment is that you don't care that much about "standard" ML users. As a "standard" ML user (pytorch/jax), and a potential Julia user in the future, this is not what I like to hear.



The idea, I imagine, is to differentiate what the Julia ML stack offers from what is already available in Python. If it offers the same thing, but without the funding from Facebook or Google, why bother switching? It has to offer something more.


If the purpose of the AD were simply to handle standard ML workflows, the current AD tools are not the right design for that. They are too complex and solve a harder problem than they would need to. A better approach would be to use the abstract interpretation afforded by Symbolics.jl, mixed with its array symbolics, and use Metatheory.jl to define simplification rules similar to XLA's. You'd essentially get a slightly expanded JAX/TensorFlow with graph simplification rules that could be adjusted/improved directly from the host language. There's a prototype (ReversePropagation.jl), but it needs array support to actually be useful for this application. Or, if you don't need the whole stack to be modifiable from Julia, a better interpreter on XLA.jl would get you there. A sufficiently capable lone coder could get those up and optimized fairly quickly for standard ML applications, if that's the goal.
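To make that concrete, here's a rough sketch (my own illustration, not code from any of those packages' actual pipelines) of what XLA-style simplification rules look like when expressed as Metatheory.jl rewrite rules; the @theory/EGraph/saturate!/extract! calls are the Metatheory.jl e-graph API as I understand it, and the specific rules are made up for the example:

    using Metatheory

    # A few XLA-flavored algebraic simplifications written as rewrite rules.
    # `-->` is a directed rewrite, `==` is a bidirectional equality.
    simplify = @theory A B C begin
        A + 0 --> A
        A * 1 --> A
        (A * B) + (A * C) == A * (B + C)   # factor to expose a single matmul
    end

    expr = :((W * x + W * y) + 0)
    g = EGraph(expr)           # build an e-graph from the expression
    saturate!(g, simplify)     # apply the rules until saturation
    extract!(g, astsize)       # extract the smallest equivalent form
    # ==> :(W * (x + y))

The point of doing it from the host language is that a user could add or tweak rules like these without touching a C++ compiler, which is the "adjusted/improved directly from the host language" part.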

Diffractor.jl has a much loftier goal: optimized differentiable programming of any code from any package in the Julia ecosystem. Because it's building typed IR, it will need a full set of Julia-based analysis tools (escape analysis, loop-invariant code motion, etc.) to approach the amount of optimization XLA can do when XLA optimizations are applicable. While such passes are being developed (for example, this is the PR for putting immutable-array optimizations into the language so that Diffractor-friendly immutable code can generate the optimized mutable form: https://github.com/JuliaLang/julia/pull/42465), it's at least a few years before it will reliably be doing something like combining multiple matrix-vector products into a single matrix-matrix BLAS3 call. Until then, it's on even footing to compete against PyTorch in a kernel-vs-kernel optimization battle, but not against TensorFlow code in cases where XLA's optimizations are doing something more.
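For anyone unfamiliar with the optimization being referenced, the BLAS3 fusion is just the following, done here by hand (the hard part is having the compiler discover this rewrite automatically in typed IR; the sizes and names are arbitrary):

    using LinearAlgebra

    A  = randn(512, 512)
    vs = [randn(512) for _ in 1:64]

    # Unfused: one BLAS2 (gemv) call per matrix-vector product
    ys = [A * v for v in vs]

    # Fused: a single BLAS3 (gemm) call on the concatenated vectors
    Y = A * reduce(hcat, vs)

    @assert all(ys[i] ≈ Y[:, i] for i in 1:length(vs))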

"Just wait 3 years and it will be really cool" is not a good way to start building a robust community, instead those interested in it need to ask how to demonstrate the improvements afforded by the added generality today. That's why what's making the project tick right now is the ARPA-E projects for physics-informed neural networks, the DJ4Earth project to do direct differentiation of the CLIMA climate model without changing any of the model code (https://dj4earth.github.io/), etc. Those kinds of projects are what is keeping a lot of the dev team open to be full time on these AD and compiler optimization projects. But if successful, it will also give an AD that is great for standard ML.



