Browser Security Bugs That Aren't: JavaScript in PDF (textslashplain.com)
99 points by todsacerdoti 9 months ago | 54 comments



It's kind of a downer that the article didn't mention Safari which seems to take a different approach to PDFs. Instead of treating them as "active content", PDF documents are merely rendered with Quartz/Core Graphics and so are free of scripts of any kind. This also has the upside that PDFs look exactly the same everywhere on macOS/iOS, even Quick Look previews.

I like Safari's approach much more than having to hunt down some obscure browser setting or trust that it does the right thing.


I agree, some of Safari's features are a little backward, but they nailed it with this one.

Overall Safari and Firefox seem to be the best in terms of privacy and security.



This document is fairly out of date, as clicking through the bug links will illustrate.


Sure, but do you have any reason to believe that Firefox has become more secure relative to Chrome since then?


Yes? A large number of the big ticket bugs, such as win32k lockdown, are now fixed.


So any active content is discarded? That may suit your use case but will be insufficient for others.


For anyone (like me) wondering why PDFs would need to support JavaScript in the first place, the main motivation/use-case appears to be validation and interactivity of embedded forms.


I've seen javascript in PDFs be used for unintended exploits more often than every legitimate use combined. It's kind of like if JPEGs could run arbitrary code by design.


There are restrictions, but SVG is an example of an image file type that can run JavaScript (again, there are legitimate use cases for this).


> again, there are legitimate use cases for this

I'm curious: what legitimate use cases exist for embedding a turing-complete scripting language into an image format?


I guess this was specified in a time when nobody thought it would one day be possible to embed an SVG document in an HTML DOM and add animations and interactivity in a performant way there.

ninja edit: It's also from a time when W3C started to lose focus and authority.

It's amazing that SVG was so successful despite this mess and also the confusion potential of CSS in SVG.

Browsers ignore scripts in external SVG images. Don't know if that is for security reasons (JS sandbox unreliable) or because a full isolated JS context per image would be too expensive...


Wasn't there also a time where you could open a raw socket with SVG? SVG is very much from a time when we didn't know what the web was going to be or how it was going to work.


Every browser engine said no to that nonsense.

The core issue iirc was that one of the major use cases for SVG was map/navigation systems, where a number of environments required fully standardized systems. But they didn’t want to say “implement a full browser stack”, so they just came up with their own “networking api” that was just “sockets!”.

A lot of this work predated HTML5 and the subsequent rationalization of web specs. At the time (for example) the XHR API was not fully specified and was not a separate specification from the rest of the browser stack, so SVG couldn’t just do what it could (in principle) do now.

The SVG WG was not the most functional. I recall that at one point a subset of the committee waited until the end of one person’s work day, rescheduled a meeting to later “that day” (while that person was asleep), and took a vote without them present.

A number of other choices were made to the detriment of the spec for specific use cases (the various performance profiles have fundamentally incompatible rendering behavior rather than gradual decay, etc)


Thanks for the explanations!

Funnily enough we did end up saying "implement a full browser stack" :/


I might be mistaken, but that sounds like a general XML-related security bug (of which there are plenty)


Compression: for some images, you can't use SVG's <use>, but a small script can generate the repetitive bits quite nicely. Also, aperiodic animation (e.g. a double pendulum): SMIL animations can represent a few minutes, but don't try putting a few hours' worth in.

PostScript, the printer file format, is Turing-complete, for different reasons.


That's because SVG is actually a document format, that is mostly for vector graphics. SVG nodes even show up in the DOM and CSSOM.


As a general rule, this would be for generative graphics, user interactivity with the graphical elements, animations, and the superset of all these: games.


I knew a guy who wrote a PostScript document that was a map of the sky at that moment. If you rendered it an hour later it was different again. It used the `file` capabilities of host-based interpreters.


> I'm curious: what legitimate use cases exist for embedding a turing-complete scripting language into an image format?

Competing with flash?

SVG tries to be a lot of things, one of them was to be a full on interactive app.


Signature forgery?


There are "legitimate use cases" for just about everything imaginable on this planet because there will always be a user that goes "I spend all my day in X software wouldn't it be great if it could read my email/monitor my plants/talk to sales/..".

That's how cursed enterprise software develops email clients and chat services. Just say no.


I understand the motivation, but IMHO a PDF should be a static document, hence, something you can trust without worrying.

Since they can contain code, they can carry malicious code. PDFs have, in fact, been used for exploits. Meaning that you shouldn't really trust them. Which is a shame.


iPhones don't support JS in PDFs, and yet an integer overflow in image decompression code led to a zero-click iMessage exploit.[1] So lack of explicit code support doesn't mean you can trust without worrying. Bugs can be anywhere. iPhones have been known to have crash-causing bugs in Unicode-handling code.[2] So even just text could be a problem. Disclosure: I work at Google but not on Project Zero.

[1] https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...

[2] https://techcrunch.com/2018/02/16/iphone-bug-telugu-unicode-...


> Bugs can be anywhere.

Yes. Bugs. Bugs can be fixed.

By-design (mis)features can't be fixed. The only way to fix them is by removing the feature.

Unless you're agreeing that JS-in-PDFs is a bug, you're conflating fundamentally different issues.


JS in PDF might be a misfeature, but any security lapse is indeed a bug in the implementation (made doubly worse by Firefox running the JS in a web context).

Yes, removing JS support would get rid of potential security exploits. It doesn't change the fact that said exploits rely on bugs in the implementation.


That's true, but it misses the point that scripting adds orders of magnitude greater complexity to the attack surface.

Fixing other kinds of bugs is fairly straightforward. Update your toolchain, update your dependencies, use the right dependencies, avoid undefined behavior, etc. Fixing scripting issues means participating in an active arms race.


There does seem to be a mismatch between what PDFs are mostly used for, and their full capabilities.

IMO it’d be nice to define a file format for PDF’s main use (I think?): papers and documentation. No scripting, but maybe the ability to zoom and pan figures?


Such a format exists and is called PDF/A: https://en.wikipedia.org/wiki/PDF/A#Description

PDF viewers can have a matching PDF/A mode where all non-PDF/A features are disabled.


In the engineering world outside software, our CAD tools generate rich interactive functionality into PDFs, including but not limited to 3D models for those doing mechanical work.


I've known about those capabilities for a long time and I've always wondered: How commonly is that used? For what use case(s)? What makes PDF the format of choice for that purpose and not, for example, a CAD file? What PDF apps are popular for creating and using those files?


PDFs are categorically not the appropriate medium for this.


From the article:

> Instead, Firefox offers an individual pdfjs.enableScripting preference that can be configured from the about:flags page.

As a long-time FF user I have never heard of about:flags, and it does not work either. about:config contains the setting, alongside a million others that no ordinary mortal can ever manage.

Just a mistake in TFA or am I missing something?


Chromium has chrome://flags, that's probably where the confusion came from.


Probably an error on a GPT generated article :)


Unless they were using a highly quantized local language model, it seems more like a human error than a GPT error.


But it got both entities wrong (the config address and the setting name). It feels like an LLM error from writing based on short, high-level descriptions of sections.


The article is a bit one-sided: it reviews the topic only from the aspect of rendering PDFs using a copy of pdf.js embedded in a web browser. However, this is not the only copy of pdf.js. It would be interesting to check software like NextCloud or its proprietary workalikes (e.g., PCloud) for their handling of untrusted JavaScript in PDF files shared through these platforms.


They are mostly talking about Chrome, which does not use pdf.js (unless they changed it).

In any case, it's pretty similar in both cases. Even in the client-side rendering case, if there is a sandbox you still have to escape it before your script execution is a real vuln.


Even from sandboxed JS, you can do network probes.

https://github.com/joevennix/lan-js


Just because PDF files can't hijack your domain, doesn't mean they can't spy on you. Unfortunately, there isn't really an open source tool for sanitizing PDF files.


Yes there is: Dangerzone (https://dangerzone.rocks/)

Converts incoming documents into a PDF that is a sequence of filtered/optimised images. Dangerzone will also handle office docs, epub, and a lot of different image formats (e.g. SVG and others that can possibly contain active content).


I am surprised to see the website saying:

> Sandboxes don't have network access, so if a malicious document can compromise one, it can't phone home

With a combo of CSS, HTML, and JS it is networkable from localhost.


You can always open the PDF in a stand-alone PDF viewer that doesn't support JavaScript, like xpdf, mupdf, or atril, or one of possibly many other PDF viewers.

(those are the only ones I have installed right now, I think)

Or there's poppler-utils which contains a bunch of tools for dealing with PDF files. You might be able to write a script which uses them to extract the contents into a safe format (maybe even a different PDF file?).
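A rough sketch of what such a script might look like, using poppler-utils' `pdftoppm` to flatten each page into a PNG. The helper names, the `page` output prefix, and the 150 dpi default are my own choices for illustration. Note that the rasterizer still has to parse the untrusted PDF, so it would ideally run inside some kind of sandbox.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 150) -> list[str]:
    """Build the pdftoppm invocation that renders every page to a PNG."""
    # -r sets the resolution in dpi, -png selects PNG output.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def rasterize_pdf(pdf_path: str, out_dir: str, dpi: int = 150) -> list[Path]:
    """Render pdf_path into out_dir as page-<n>.png files and return them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Resolve to an absolute path so a file name starting with "-" is not
    # mistaken for a command-line option.
    source = str(Path(pdf_path).resolve())
    command = build_pdftoppm_command(source, str(out / "page"), dpi)
    # check=True raises if pdftoppm fails; the timeout bounds hostile inputs.
    subprocess.run(command, check=True, timeout=120)
    return sorted(out.glob("page-*.png"))
```

The resulting images carry no scripts, forms, or links; the trade-off is that text is no longer selectable or searchable unless you run OCR afterwards.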


Anything that converts to PDF/A will strip JavaScript.
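If you want a quick sanity check before or after converting, a naive byte scan for the PDF names usually associated with scripting and automatic actions can be sketched like this. The function name and the list of names are my own picks. This is only a heuristic: names can be hex-escaped and objects can sit inside compressed streams, so an empty result does not prove the file is clean.

```python
import re

# PDF dictionary keys commonly tied to scripts or automatically run actions.
SUSPICIOUS_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")


def find_active_content_markers(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of suspicious names in the raw, undecoded bytes."""
    counts: dict[str, int] = {}
    for name in SUSPICIOUS_NAMES:
        # Require a delimiter (or end of data) after the name, so that /JS
        # does not match a longer name that merely starts with those letters.
        pattern = re.escape(name) + rb"(?=[\s/<>\[\]()]|$)"
        hits = len(re.findall(pattern, pdf_bytes))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts
```

For example, calling it on `open("file.pdf", "rb").read()` returns a dict of marker names to counts, which is empty when none of the names appear in plain form.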


> an open source tool for sanitizing PDF files.

Here you go: https://blog.invisiblethings.org/2013/02/21/converting-untru...


And what about SVGs with JavaScript? I once saw a security company recommending disabling user-uploaded SVGs.


JS in SVGs can be dangerous, but you can mitigate it using a CSP or by sending "Content-Disposition: attachment" so the file will be downloaded instead of being executed in your current browser context.
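A minimal sketch of those headers, independent of any web framework. The function name and the exact policy string are assumptions on my part; adjust the CSP to whatever your site actually needs.

```python
def svg_response_headers(filename: str) -> dict[str, str]:
    """Response headers for serving an untrusted, user-uploaded SVG."""
    # Drop characters that could break out of the quoted filename parameter.
    safe_name = "".join(c for c in filename if c.isalnum() or c in "._-") or "upload.svg"
    return {
        "Content-Type": "image/svg+xml",
        # Ask the browser to download the file rather than render it in the
        # site's origin.
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        # If the file is rendered anyway, this policy disallows scripts and
        # external loads, and sandboxes the document.
        "Content-Security-Policy": "default-src 'none'; style-src 'unsafe-inline'; sandbox",
        # Keep browsers from sniffing the body as a different content type.
        "X-Content-Type-Options": "nosniff",
    }
```

Either header alone helps; sending both covers the case where the file ends up being opened directly instead of downloaded.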


Fortunately SVG and (X)HTML aren't too horrible when it comes to sanitizing out scripts, the use of CSS's access notwithstanding.
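To give an idea, here is a minimal sketch with Python's standard-library XML parser that drops `<script>` and `<foreignObject>` elements, `on*` event attributes, and `javascript:` links. The function names are made up for the example, and it is far from exhaustive: it does not touch CSS at all, and the stdlib parser is not hardened against entity-expansion bombs, so treat it as an illustration rather than a real sanitizer.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
BLOCKED_ELEMENTS = {"script", "foreignobject"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on names."""
    return tag.rsplit("}", 1)[-1].lower()


def strip_svg_scripts(svg_text: str) -> str:
    """Return svg_text with script-bearing elements and attributes removed."""
    # Keep the usual prefixes when serializing instead of ns0/ns1.
    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", XLINK_NS)
    root = ET.fromstring(svg_text)
    for element in list(root.iter()):
        # Remove blocked children (this also drops their subtrees).
        for child in list(element):
            if isinstance(child.tag, str) and local_name(child.tag) in BLOCKED_ELEMENTS:
                element.remove(child)
        for attr in list(element.attrib):
            name = local_name(attr)
            # Remove all whitespace so "java\nscript:" is caught as well.
            value = "".join(element.attrib[attr].split()).lower()
            if name.startswith("on") or (name == "href" and value.startswith("javascript:")):
                del element.attrib[attr]
    return ET.tostring(root, encoding="unicode")
```

An allow-list of known-safe elements and attributes would be the more robust design; the block-list here is just shorter to show.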


PoC || GTFO Volume II has a section on using a crafted polyglot ZIP / PDF to exploit the SNES Super Game Boy. https://www.alchemistowl.org/pocorgtfo/pocorgtfo10.pdf

The issue is that PDF has turned into a form-submission format for some US government agencies and other organizations. Until PDFs are no longer used for form submission and are transitioned to static documents, they are a good exploit entry point.


Sorry if I missed this... Is the user asked: Do you want to run the JavaScript in this PDF?


Unsurprisingly, no.


Works on Chrome and Firefox but not on Safari.



