Yep, that's it. Different platforms need slightly different assembly and, although I wouldn't choose Perl for it, it's not fundamentally that different from any other text-to-text transform.
Handwritten asm still outperforms compiler output by enough to warrant its use in hot functions like AES-GCM.
(NaCl also uses a text-to-text transform system to output asm, although it's written in C and does register allocation too. The important aspect of NaCl is djb & friends' attention to quality.)
While some of NaCl's kernels are written using qhasm, qhasm itself is not part of the build process of NaCl. This means that those functions cannot be run on, say, Windows, because the calling convention (and the assembly syntax) is different. qhasm also does not have programming capabilities, so one would need a lot of code duplication to write SSE2, SSSE3, AVX2, etc. variants of the same function. While OpenSSL's perlasm is an abomination, it does produce "portable" assembly output, and there is no obviously superior option to accomplish that. This is something sorely missing from our collective toolchains.
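To make the calling-convention point concrete, here is a toy sketch of the text-to-text idea in Python. It is not how perlasm or qhasm actually work; the `ARG_REGS` table, the `TEMPLATE`, the `emit` helper and the `add_u64` routine are all made up for illustration. The only facts it leans on are the integer argument registers of the two x86-64 ABIs (System V passes the first two in rdi and rsi, Windows x64 in rcx and rdx).

```python
# Hypothetical toy generator: one template, two calling conventions.
# This is an illustration only, not OpenSSL's perlasm or qhasm.

# First four integer argument registers for each x86-64 ABI.
ARG_REGS = {
    "sysv": ["rdi", "rsi", "rdx", "rcx"],   # System V AMD64 (Linux, BSD, macOS)
    "win64": ["rcx", "rdx", "r8", "r9"],    # Microsoft x64
}

# The routine is written once against symbolic argument names.
# Intel-style operand order is used here purely for readability.
TEMPLATE = """\
add_u64:
    mov rax, {arg0}
    add rax, {arg1}
    ret
"""


def emit(abi: str) -> str:
    """Return the assembly text for the given ABI by substituting registers."""
    regs = ARG_REGS[abi]
    # Unused names (arg2, arg3) are simply ignored by str.format.
    return TEMPLATE.format(**{f"arg{i}": reg for i, reg in enumerate(regs)})


if __name__ == "__main__":
    for abi in ARG_REGS:
        print(f"; --- {abi} ---")
        print(emit(abi))
```

A real system also has to deal with assembler syntax differences, callee-saved registers, stack layout and so on, which is where most of the ugliness comes from. The same substitution trick, driven by a loop, is what would let one source generate SSE2, SSSE3 and AVX2 variants without duplicating the routine by hand.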
It sounds like this perlasm is the state of the art in maximally efficient high-level languages. (Half joking.)
I feel like if you took something like qhasm and extended it to be a full language rather than sort of a hack, you would end up with something really beautiful, like Forth minus the crazy and plus both standard assembly and high-level syntax. It could be a better C than C.
I don't see how this is any improvement on just writing plain assembler. Apart from the Perl-style comments, is this any cleaner than, say, some code assembled with yasm?
[ed: I guess what I'm missing is that there are probably more similarities between the x86 flavours and the ARM flavours -- but I still think one would need to look at, and understand, the generated assembly for each platform in order to audit the code. So I still wonder a bit about what the gain is.]