That depends entirely on what the software needs to do.
For image decoding in particular, you can put the software into an exceptionally restrictive sandbox, or use a language that builds in the same restrictions.
No I/O. No system calls. Just churn internally and fill a preallocated section of memory with RGBA.
The broader system will still have weaknesses, but it won't have this kind, and this kind keeps happening.
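To make that concrete, here is a minimal sketch (all names hypothetical) of the entire surface such a decoder would see: untrusted bytes in, RGBA out into a buffer the host preallocated, and nothing else to call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of everything the sandboxed decoder is allowed to
 * touch: no file descriptors, no allocators, no syscalls -- untrusted
 * bytes in, RGBA pixels out into a buffer the host preallocated. */
typedef struct {
    const uint8_t *encoded;     /* untrusted compressed image data          */
    size_t         encoded_len;
    uint8_t       *rgba_out;    /* host-allocated, width * height * 4 bytes */
    uint32_t       width;       /* dimensions the host already validated    */
    uint32_t       height;
} decode_request;

/* Stub standing in for the real decoder: actual decoding logic would go
 * here, but it would still only ever write into req->rgba_out.
 * Returns 0 on success, nonzero on malformed input. */
int decode_image(decode_request *req)
{
    if (req->encoded == NULL || req->encoded_len == 0)
        return 1;

    size_t frame_bytes = (size_t)req->width * req->height * 4;

    /* placeholder "decode": fill the frame with opaque black */
    memset(req->rgba_out, 0, frame_bytes);
    for (size_t i = 3; i < frame_bytes; i += 4)
        req->rgba_out[i] = 0xFF;   /* alpha channel */
    return 0;
}
```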
That's an awfully big preallocated array you have there. It would be pretty inefficient for that section of memory to be copied around, right? Let's map it into both processes. Also, image decoding is pretty hard, let's offload some of it to dedicated hardware. Of course, that hardware needs to have access to it mediated by the kernel. And the hardware needs to be able to access that shared memory, which was of course allocated correctly and the IOMMU setup was done correctly…
You see how even simple things are difficult to secure when they have to be implemented in practice?
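For concreteness, the "map it into both processes" plumbing being mocked here might look roughly like this on Linux (a hypothetical sketch using memfd_create; the hardware decoder and IOMMU side is left out):

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical sketch: the host creates an anonymous memfd, sizes it for
 * the RGBA frame, maps it, and passes the fd to the sandboxed decoder
 * (e.g. over a unix socket) so the decoder can mmap the same pages.
 * Error handling trimmed for brevity. */
int create_shared_frame(uint32_t width, uint32_t height, uint8_t **out_pixels)
{
    size_t len = (size_t)width * height * 4;   /* RGBA, 4 bytes per pixel */

    int fd = memfd_create("decoded-frame", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* Host-side view of the frame; the decoder process maps the same fd. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    *out_pixels = p;
    return fd;   /* this fd is what gets sent to the sandboxed process */
}
```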
Mapping pure RGBA across processes is safe, and besides, a single extra copy is not a big performance cost for an image decoder in the first place.
Configuring the IOMMU is one of the easiest parts of doing it in hardware. That's not going to make things "difficult to secure". And allocating the chunk of memory is trivial.
If you've seen an exploit caused by a big pre-allocated array of untrusted RGBA data, please explain how.
(If you mean they put evil data through it and then used a separate exploit to run it, that's not a vulnerability, that's just "data transfer exists".)
And you seeing someone screw up an IOMMU doesn't disqualify it from being one of the easiest parts of a hardware decoder.
Code to calculate size of preallocated array is incorrect. Size ends up too small or underflows.
Buffer is reused across calls. Buffer is actually mapped across processes and thus page-aligned. Code to check how much space is needed checks number of pages versus actual number of bytes, and fails to clear leftover data correctly.
Code receives RGBA buffer but expects some other encoding. Accidentally reads out of bounds as a result.
You can definitely say “oh these are stupid and I wouldn’t screw this up” but people do and that’s what really matters.
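To illustrate the first of those failure modes, here is a hypothetical sketch of how a width x height x 4 size calculation goes wrong when it is done in 32-bit arithmetic:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: the size of the preallocated RGBA buffer is
 * computed in 32-bit arithmetic, so large (attacker-supplied) dimensions
 * silently wrap around and the allocation ends up far too small. */
static uint32_t buggy_frame_size(uint32_t width, uint32_t height)
{
    return width * height * 4;            /* wraps modulo 2^32 */
}

static uint64_t careful_frame_size(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * 4;  /* widened before multiplying */
}

int main(void)
{
    uint32_t w = 65536, h = 65536;        /* plausible header values */
    printf("buggy:   %u bytes\n", buggy_frame_size(w, h));          /* 0      */
    printf("careful: %llu bytes\n",
           (unsigned long long)careful_frame_size(w, h));           /* 16 GiB */
    return 0;
}
```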
> Code to calculate size of preallocated array is incorrect. Size ends up too small or underflows.
If you go outside the array you copied/mapped out of the sandbox, that doesn't let the attacker code escape the sandbox; you just put some of your own data onto the screen.
If you mean the sandbox isn't given enough memory, then that will make the sandbox exit when it hits unmapped addresses.
And how did you screw up width x height x 4?
> Buffer is reused across calls. Buffer is actually mapped across processes and thus page-aligned. Code to check how much space is needed checks number of pages versus actual number of bytes, and fails to clear leftover data correctly.
The sandboxed process doesn't have any way to exfiltrate data. At most it can display it back to you, which is not really any worse than innocent code which could also send back the leftover data.
> Code receives RGBA buffer but expects some other encoding. Accidentally reads out of bounds as a result.
Reads out of bounds and does what with it? That doesn't sound like a vulnerability to me. It might display private data or crash, but that's entirely of its own volition. The behavior would be the same between innocent code in the sandbox and malicious code in the sandbox.
There’s a million ways to screw that up. People botch SCM merges. People are hungover. People are distracted. People are tired. People are heartbroken. People are going through divorces. People have parents dying. People forget numbers. People make copy-paste mistakes. All the time.
> The sandboxed process doesn't have any way to exfiltrate data.
You can abuse it to gain a (known-page-offset) write primitive in the other, non-sandboxed process to which the buffer is also mapped.
But if you do that every image looks wrong and it's vanishingly unlikely to get into a release.
> You can abuse it to gain a (known-page-offset) write primitive in the other, non-sandboxed process to which the buffer is also mapped.
There's no reason to have the memory mapped into both processes at once, and you can't exploit the bytes you write without a real vulnerability.
Since it's a one-shot write into the buffer, if your intent is to use it as an exploit step then you might as well encode an actual image with your exploit-assisting bytes.
> But if you do that every image looks wrong and it's vanishingly unlikely to get into a release.
The code doesn't have to be wrong for every input. It may be wrong just for pathological cases that don't occur in the field unless specifically crafted.
> Since it's a one-shot write into the buffer, if your intent is using it as an exploit step then you might as well encode an actual image with your exploit-assisting bytes.
The assumption was that the code tries to clean up the buffer immediately after use.
> The code doesn't have to be wrong for every input. It may be wrong just for pathological cases that don't occur in the field unless specifically crafted.
I could argue this more, but it doesn't matter; that was just a little tangent. Getting the size wrong will not let anything out of the sandbox.
> The assumption was that the code tries to clean up the buffer immediately after use.
Cleaning up would be removing the mmap. How are you going to exploit that? Your scenario is not very clear.
I think you're going for a situation where the sandboxed process can write to data in the host process outside the buffer? In a general sense I can imagine ways for that scenario to occur, but I can't figure out how you could get there via mmapping a buffer badly. A buffer mmap won't overlap anything else. If the mmap is too small then either process could read past the end, but would only see its own data (or a page fault).
> Cleaning up would be removing the mmap. How are you going to exploit that? Your scenario is not very clear.
If a buffer is going to be reused across calls, then cleaning after use is not the same thing as unmapping. One example for cleaning up a buffer after use would be zeroing.
If there's a bug in the calculation for the amount of zeroing needed, then leftover attacker-controlled data can bleed back from the sandboxed into the unsandboxed process and survive beyond the current transaction (because the code failed to zero the buffer correctly after use).
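One plausible shape of that zeroing bug, sketched with hypothetical code: the cleanup path rounds the used size down to whole pages instead of up, so the tail of the last page keeps whatever the previous decode wrote there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical sketch of the zeroing bug described above. The buffer is
 * page-aligned because it is mapped into both processes, but the cleanup
 * path measures the region in whole pages by rounding *down*, so the tail
 * of the last page keeps stale, attacker-influenced pixel data. */
void clear_after_use_buggy(uint8_t *shared_buf, size_t bytes_used)
{
    size_t pages = bytes_used / PAGE_SIZE;       /* BUG: drops the partial page */
    memset(shared_buf, 0, pages * PAGE_SIZE);    /* up to 4095 stale bytes remain */
}

void clear_after_use_correct(uint8_t *shared_buf, size_t bytes_used)
{
    size_t pages = (bytes_used + PAGE_SIZE - 1) / PAGE_SIZE;  /* round up */
    memset(shared_buf, 0, pages * PAGE_SIZE);
}
```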
In other words, the attacker can now write arbitrary data into the unsandboxed process's memory at a semi-known location (known page offset) inside the mapped buffer. That data may not be very useful on its own, because it's still confined to the mmapped buffer. But it's now relatively well protected from reuse (until the next decoding task arrives).
That's plenty of time to do shenanigans. For example, you can combine it with an (unrelated) stack buffer overflow that may exist in the unsandboxed process, harmless on its own but more powerful if combined with an attacker-controlled gadget in a known location.
It's hard to see why the buffer wouldn't be per image. There's no reason to reuse that.
> In other words, the attacker can now write arbitrary data into the unsandboxed process's memory at a semi-known location (known page offset) inside the mapped buffer.
But what is the arbitrary data going to be?
1. If it's gadgets with known lower bits, then you could put that into a plain-old image file, no decoder exploits needed. Also this requires the second dumb mistake of the coder going out of their way to mark the buffer as executable.
2. If it's data you want to exfiltrate, you could just gather that after you trigger your unrelated exploit. This is only useful if everything aligns to drop the private data you want in that specific section of memory, and then the buffer is reused, and then the private data is removed from everywhere else, and then you run an unrelated exploit to actually give you control. This is exceptionally niche.
> It's hard to see why the buffer wouldn't be per image. There's no reason to reuse that.
Premature optimization is a thing. Most software developers are prone to it in one way or another. They may just assume a performance gain, design accordingly, and move on. They may be working under a deadline tight enough that they never even consider checking their assumptions.
Or maybe the developer has actually run the experiment and found that reusing the buffer does yield a few percent of extra performance.
> But what is the arbitrary data going to be?
An internal struct whose purpose is to control the behavior of some unrelated aspect in the unsandboxed process. The struct contains a couple of pointers and, if attacker-controlled, ends up giving them an arbitrary process memory read/write primitive.
It sounds like you picked option 1 then, which means you don't need to take control of the sandbox. "Create an image that puts arbitrary bytes into the buffer that stores its decoded form" simplifies to just "create an image." There is no vulnerability here. This is just image display happening in a normal way. It's something to keep an eye on but not important itself. You have to add a vulnerability to get a vulnerability.
The original problem of preventing image decoding exploits has been solved in this hypothetical.
> Your original request was: “If you've seen an exploit caused by a big pre-allocated array of untrusted RGBA data, please explain how.”
I asked that in a context of whether you can contain vulnerabilities in a sandbox. If something doesn't even require a vulnerability, then it doesn't fit.
Also please note the words "caused by". A few helper bytes sitting somewhere are not the cause.
> Which is exactly how exploit chains work.
> A single vulnerability usually doesn’t achieve something dangerous on its own. But remove it from the chain and you lose your exploit.
Being part of an exploit chain doesn't by itself make something qualify as a vulnerability. (Consider arbitrary gadgets already in the program. You can't remove all bytes.) And I've never seen "you can send it bytes" described as a vulnerability before. Not even if you know the bytes will be stored at the start of a page!
What exactly is "an exceptionally restrictive sandbox"?
There are virtual machines such as the JVM, V8, or even QEMU. These are sandboxes that run either special bytecode or native code, with extreme performance drawbacks. Media decoders are, in the end, performance- and energy-sensitive pieces of software.
And media decoders actually ARE sandboxes of sorts. They are designed to interpret media formats, sometimes even Turing-complete bytecode, in restrictive and isolated environments. And like any sandbox, they too have bugs.
> And media decoders actually ARE sandboxes of sorts. They are designed to interpret media formats, sometimes even Turing-complete bytecode, in restrictive and isolated environments. And like any sandbox, they too have bugs.
It's pretty easy to sandbox a simple bytecode, but that's not the bulk of what a media decoder is doing. A plain old decoder is mostly not sandboxing what it does.