This story probably works better with its original title ("Exploiting LaTeX with CVE-2018-17407") than with the editorialized one, since the editorialized title makes it sound like the vulnerability is what's good about this story, when in fact it's the writeup about the exploit that shines --- the vulnerability itself is probably not that urgent for most users.
This is so cool! If the bug originates in dvips code then it's probably decades old.
Confession: A couple of years ago I tried running AFL (as the author here did) on tex, but got nowhere; I couldn't even get it to reach the interesting parts. (It took ages even to figure out how to compile TeX.) Good call starting with dvips, which works with binary formats... and really cool exploitation here.
There is a lot of code in a TeX distribution: there's the “core” code written by Knuth in WEB, and then there's (probably orders of magnitude larger) all the code of the LaTeX (and other) macro packages written “in TeX”, both of which are quite likely harmless. But there's also a lot of other code that gets much less attention... from everything that's been written for TeX (and its extensions) to interface with the system, to common utilities, etc.
What I find frustrating is that there still has to be an exploit for these crashes to be taken seriously or considered blog-worthy. We know that in C/C++ programs, input parsing errors carry a high probability of arbitrary code execution. Instead of just supplying 50 PDFs that crash the program or library in unique ways and having the author/vendor fix their code, researchers have to ‘waste’ time writing exploits to really rub it in.
Any sort of memory corruption is usually considered to be “potential arbitrary code execution” unless proven otherwise, even if the bug finder hasn’t written up a PoC for it. Even the most unlikely corruptions have been shown to be exploitable given enough effort, so usually they’re just lumped in the “we should fix this” bin.
Reading this makes me think the author could skip the fuzzer altogether, grep the C FLOSS universe for the set of old-school, free-wheelin' string handling functions, and then iterate over the results to find the (hopefully) smaller set which can take arbitrary input for at least one of the arguments.
It reads them both and concatenates them together into t1_buf_array with a call to strcat() — but without a bounds check! Oops.
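For reference, the pattern the article describes looks roughly like this (a minimal sketch with a made-up buffer size and function name, not the actual TeX Live source):

    #include <string.h>

    #define T1_BUF_SIZE 64              /* some fixed capacity */
    static char t1_buf_array[T1_BUF_SIZE];

    /* Concatenates two attacker-controlled strings from the font file
       without ever checking that the result fits in t1_buf_array. */
    void build_charstring(const char *part1, const char *part2)
    {
        strcpy(t1_buf_array, part1);    /* no bounds check */
        strcat(t1_buf_array, part2);    /* no bounds check: writes past the end
                                           if the combined length exceeds
                                           T1_BUF_SIZE - 1 */
    }

Since the font file controls both strings, it also controls how far past the end of the buffer the write goes.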
Things like this make me wonder what was going through the mind of the programmer who wrote the code. I learned C less than a decade after it was invented, but the lack of implicit bounds-checking wasn't something I ever forgot. Perhaps it helps that I was using Asm before that. Of course then it was not thought of as a security thing, but just basic correctness.
It's such a simple concept --- make sure there's enough room --- and there is a concrete analogy to it in the real world --- that I continue to be disappointed and amazed at how many times someone manages to get it wrong. Then again, maybe it's just a bias: no one makes the news for doing it right.
Well you have to _never make a mistake_ to not have issues.
I know my phone won’t magically follow me out the door, and I take it with me 99% of the time. I still leave it at home sometimes.
Of course, here the “chain my phone to my pants” solution exists, in the form of linting rules that prevent usage of unsafe APIs, and a safer API that enforces checks (for example a strcat variant that requires the caller to pass the destination buffer's size). Or using checked string libraries instead of raw char*. Not 100% foolproof, but it could help.
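Something along those lines, as a sketch (checked_strcat is a made-up name here; strlcat on the BSDs and strcat_s from C11's optional Annex K are existing variants in the same spirit):

    #include <stddef.h>
    #include <string.h>

    /* Appends src to dst, but only if the result fits in dst_size bytes.
       Returns 0 on success, -1 if it would overflow (dst is left unchanged). */
    int checked_strcat(char *dst, size_t dst_size, const char *src)
    {
        size_t used = strlen(dst);
        size_t need = strlen(src);

        if (used >= dst_size || need >= dst_size - used)
            return -1;                      /* refuse instead of overflowing */

        memcpy(dst + used, src, need + 1);  /* +1 copies the terminating NUL */
        return 0;
    }

A lint rule that bans plain strcat/strcpy/sprintf and pushes callers through something like this at least makes the missing size check impossible to write by accident.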
The biggest difficulty is that C’s abstraction ceiling is so low. It’s hard to do stuff like this without making the code much bigger than it already is.
Until recently, fonts were something that either came from the vendor or were installed like software. So they weren't considered to be an exploitation vector, any more than installing any software is a vulnerability. It's only in the last several years, with loadable web fonts, that font rendering code has needed armoring against malicious font files.
The core pdftex binary is not compiled as PIE (sadly a very common occurrence on Linux). pdftex handles certain TeX functions that invoke external commands, so it has calls to system().
Ah, I should have seen that. I guess I'm too used to macOS, where you have to go out of your way to compile binaries without PIE, so it's basically always enabled.
That makes me wonder... if my binaries are compiled on some Fedora buildbot and distributed to everyone, then wouldn't they all have the same "randomized" layout?
The randomization I’m talking about is of where the binary is loaded into memory, which protects against exactly the issue you’re describing: everyone has the same binary, but it gets loaded at a different base address in each process.
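A toy program makes the difference visible (the compiler flags are the usual gcc/clang ones; exact defaults vary by distro):

    /* aslr_demo.c: build it twice and run each binary a few times.
         cc -no-pie aslr_demo.c -o fixed    (prints the same address every run)
         cc -fPIE -pie aslr_demo.c -o pie   (the address changes between runs)  */
    #include <stdio.h>

    int main(void)
    {
        printf("main() is loaded at %p\n", (void *)main);
        return 0;
    }

With a non-PIE build, everyone who has the same binary also has the same addresses, which is exactly what the article relies on: pdftex's calls to system() sit at known, fixed addresses.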
I do not use LaTeX or dvips or Type1 fonts, so I suppose I am not affected. I wrote my own DVI driver that supports PK fonts and converts directly to PBM without needing PostScript.
Other users who do not add any new fonts probably would not be affected either, I suppose.
That article says "This same vulnerable function is used by other tools in TeX Live: pdflatex, pdftex, dvips and luatex. I only built an exploit for pdflatex, the most widely used of the vulnerable tools." I only use the program "tex", not any of those four.
Still, the article is good and interesting and explains it well, and it is good to fix these bugs for the users who do use these things.