I believe this is more of an optimization layer to be utilized by libraries like TensorFlow and JAX: a simplification of the interaction with traditional CUDA instructions.
I imagine these libraries and possibly some users would implement libraries on top of this language and reap some of the optimization benefit without having to maintain low-level CUDA specific code.
XLA is a domain-specific compiler for linear algebra. Triton generates and compiles an intermediate representation for tiled computation. That IR allows more general functions and also claims higher performance.
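For context on what "tiled computation" means here, a minimal NumPy sketch of blocking a matmul into tiles. This is purely illustrative of the block-level structure Triton's IR is organized around; the function name, tile size, and triple loop are my own, not Triton's API:

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Multiply a (m x k) by b (k x n) one output tile at a time.

    Illustrative sketch only: Triton programs are written in terms of
    tiles like these, and its compiler then maps each tile onto GPU
    threads and shared memory automatically.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            # Accumulate one output tile from tile-sized slices of a and b.
            for p in range(0, k, tile):
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return c
```

The payoff of expressing the computation this way is that each tile is a small, cache- or shared-memory-sized unit of work the compiler can schedule and optimize independently.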
Without reading the paper, I think you have it a little backwards: the IR doesn't itself allow for more general functions. More general functions are possible (in theory) because the frontend (the Triton language) is decoupled from the backend (CUDA) through the IR as an interface. In this sense the Triton IR is no less domain-specific than XLA, because both are IRs that represent sequences of operators that run on a GPU (or TPU, or whatever). In theory Triton could be eschewing all of, e.g., cuDNN, but most likely it isn't, since NVIDIA's closed-source kernels perform best on their closed-source hardware.
Edit: I should've read the post before commenting. It looks like they are in fact using LLVM's PTX backend (i.e., generating CUDA kernels from scratch). Kudos to them.