This story probably works better with its original title ("Exploiting LaTeX with CVE-2018-17407") than with the editorialized one, since the editorialized title makes it sound like the vulnerability is what's good about this story, when in fact it's the writeup about the exploit that shines --- the vulnerability itself is probably not that urgent for most users.
This is so cool! If the bug originates in dvips code then it's probably decades old.
Confession: A couple of years ago I tried running AFL (as the author here did) on tex, but got nowhere; I couldn't even get it to reach the interesting parts. (It took ages even to figure out how to compile TeX.) Good call starting with dvips, which works with binary formats... and really cool exploitation here.
There is a lot of code in a TeX distribution: there's the “core” code written by Knuth in WEB, and then there's (probably orders of magnitude larger) all the code of the LaTeX (and other) macro packages written “in TeX”, both of which are quite likely harmless. But there's also a lot of other code that gets much less attention... from everything that's been written for TeX (and its extensions) to interface with the system, to common utilities, etc.
What I find frustrating is that there still has to be an exploit for these crashes to be taken seriously or considered blog-worthy. We know that in C/C++ programs, input parsing errors carry a high probability of arbitrary code execution. Instead of just supplying 50 PDFs that crash the program or library in unique ways and having the author/vendor fix their code, researchers have to ‘waste’ time writing exploits to really rub it in.
Any sort of memory corruption is usually considered to be “potential arbitrary code execution” unless proven otherwise, even if the bug finder hasn’t written up a PoC for it. Even the most unlikely corruptions have been shown to be exploitable given enough effort, so usually they’re just lumped in the “we should fix this” bin.
Reading this makes me think the author could skip the fuzzer altogether, grep the C FLOSS universe for the set of old-school, free-wheelin' string handling functions, and then iterate over the results to find the (hopefully) smaller set which can take arbitrary input for at least one of the arguments.
It reads them both and concatenates them together into t1_buf_array with a call to strcat() — but without a bounds check! Oops.
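For reference, the pattern the article describes looks roughly like this (a minimal sketch with a made-up buffer size and function name, not the actual TeX Live source):

    #include <string.h>

    #define T1_BUF_SIZE 64              /* some fixed capacity */
    static char t1_buf_array[T1_BUF_SIZE];

    /* Concatenates two attacker-controlled strings from the font file
       without ever checking that the result fits in t1_buf_array. */
    void build_charstring(const char *part1, const char *part2)
    {
        strcpy(t1_buf_array, part1);    /* no bounds check */
        strcat(t1_buf_array, part2);    /* no bounds check: writes past the end
                                           if the combined length exceeds
                                           T1_BUF_SIZE - 1 */
    }

Since the font file controls both strings, it also controls how far past the end of the buffer the write goes.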
Things like this make me wonder what was going through the mind of the programmer who wrote the code. I learned C less than a decade after it was invented, but the lack of implicit bounds-checking wasn't something I ever forgot. Perhaps it helps that I was using Asm before that. Of course then it was not thought of as a security thing, but just basic correctness.
It's such a simple concept --- make sure there's enough room --- and there is a concrete analogy to it in the real world --- that I continue to be disappointed and amazed at how many times someone manages to get it wrong. Then again, maybe it's just a bias: no one makes the news for doing it right.
Well you have to _never make a mistake_ to not have issues.
I know my phone won’t magically follow me out the door, and I take it with me 99% of the time. I still leave it at home sometimes.
Of course, here the “chain my phone to my pants” solution exists, in the form of linting rules that prevent usage of unsafe APIs, and a safer API that enforces checks (for example a strcat variant that requires the caller to pass the destination buffer's size). Or using checked string libraries instead of raw char*. Not 100% foolproof, but it could help.
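Something along those lines, as a sketch (checked_strcat is a made-up name here; strlcat on the BSDs and strcat_s from C11's optional Annex K are existing variants in the same spirit):

    #include <stddef.h>
    #include <string.h>

    /* Appends src to dst, but only if the result fits in dst_size bytes.
       Returns 0 on success, -1 if it would overflow (dst is left unchanged). */
    int checked_strcat(char *dst, size_t dst_size, const char *src)
    {
        size_t used = strlen(dst);
        size_t need = strlen(src);

        if (used >= dst_size || need >= dst_size - used)
            return -1;                      /* refuse instead of overflowing */

        memcpy(dst + used, src, need + 1);  /* +1 copies the terminating NUL */
        return 0;
    }

A lint rule that bans plain strcat/strcpy/sprintf and pushes callers through something like this at least makes the missing size check impossible to write by accident.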
The biggest difficulty is that C’s abstraction ceiling is so low. It’s hard to do stuff like this without making the code much bigger than it already is.
Until recently, fonts were something that either came from the vendor or were installed like software. So they weren't considered to be an exploitation vector, any more than installing any software is a vulnerability. It's only in the last several years, with loadable web fonts, that font rendering code has needed armoring against malicious font files.
The core pdftex binary is not compiled as PIE (sadly a very common occurrence on Linux). pdftex handles certain TeX functions that invoke external commands, so it has calls to system().
Ah, I should have seen that. I guess I'm too used to macOS, where you have to go out of your way to compile binaries without PIE, so it's basically always enabled.
That makes me wonder... if my binaries are compiled on some Fedora buildbot and distributed to everyone, then wouldn't they all have the same "randomized" layout?
The randomization I’m talking about is of where the binary is loaded into memory, which protects against exactly the issue you’re describing: everyone has the same binary, but it gets loaded at a different base address in each process.
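A toy program makes the difference visible (the compiler flags are the usual gcc/clang ones; exact defaults vary by distro):

    /* aslr_demo.c: build it twice and run each binary a few times.
         cc -no-pie aslr_demo.c -o fixed    (prints the same address every run)
         cc -fPIE -pie aslr_demo.c -o pie   (the address changes between runs)  */
    #include <stdio.h>

    int main(void)
    {
        printf("main() is loaded at %p\n", (void *)main);
        return 0;
    }

With a non-PIE build, everyone who has the same binary also has the same addresses, which is exactly what the article relies on: pdftex's calls to system() sit at known, fixed addresses.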
I do not use LaTeX or dvips or Type1 fonts, so I suppose I am not affected. I wrote my own DVI driver that supports PK fonts and converts directly to PBM without needing PostScript.
Other users who do not add any new fonts probably would not be affected either, I suppose.
That article says "This same vulnerable function is used by other tools in TeX Live: pdflatex, pdftex, dvips and luatex. I only built an exploit for pdflatex, the most widely used of the vulnerable tools." I only use the program "tex", not any of those four.
Still, the article is good and interesting and explains it well, and it is good to fix these bugs for the users who do use these things.