
qpdf[1], and, in particular, libqpdf, is possibly the most useful PDF tool I've ever used, because it was the first library I found that works at the proper level of abstraction for dealing with the PDF file format on its own terms.

In other words, the library directly exposes the essential PDF object structure (pages, dictionaries, strings, numbers, streams, etc.) for easy editing, while abstracting away as much of the incidental PDF file structure as possible (encryption, compression, object references, the page tree structure, etc.).

Among many other applications, I've used it to

• Automatically repair minor PDF file format problems (of the sort that would be fixed by Open + Save in Acrobat).

• Concatenate multiple PDFs into a single PDF, adding a bookmark to the first page of each, with the bookmark title derived from information not contained within the PDFs or their file names.

• Losslessly reduce the size of PDFs in not-entirely-trivial ways. For example, I was given a ~1 TB set of PDFs that stored 1-bit monochrome scanned images as losslessly-compressed (RLE, LZW, or flate) 24-bit color images with every pixel either 0x000000 or 0xFFFFFF, but also stored color and grayscale images in the same way, and included important non-image data. libqpdf made it easy to loop through each PDF file, extract and analyze the pixel data for each image, and replace relevant images with JBIG2-compressed "true" 1-bit equivalents without otherwise modifying the PDF.

• Lossily recompress large images embedded in PDFs, but only if they matched certain criteria. Specifically, I had a large number of PDFs that contained lots of lossless high-DPI 4K screenshots of a specific application, where even relatively high JPEG compression maintained legibility, interspersed with images where such recompression was undesirable (photos, document scans, 1080p screenshots).

• Create PDFs by overlaying plain text on forms stored as PDFs (that is, "paper" forms laid out as static PDF pages, not interactive PDF forms), without duplicating the form content for each page.

[In the above examples, JPEG and JBIG2 compression was performed with other libraries, as these are out-of-scope for libqpdf itself.]
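
To make the 1-bit image case a bit more concrete, here is a rough Python sketch of just the pixel-analysis step, not the code I actually ran. It assumes the image stream has already been decoded to raw 24-bit RGB bytes (which is the part libqpdf handles), and the function names is_bilevel_rgb and pack_1bit are made up for illustration. It checks that every pixel is pure black or pure white, then packs the data to 1 bit per pixel with rows padded to a byte boundary, high bit first, with 1 meaning white (my assumption for a DeviceGray image with the default decode). The JBIG2 encoding itself would still need a separate library.

```python
BLACK = b"\x00\x00\x00"
WHITE = b"\xff\xff\xff"


def is_bilevel_rgb(pixels: bytes) -> bool:
    """Return True if every 24-bit RGB pixel is 0x000000 or 0xFFFFFF."""
    if len(pixels) % 3 != 0:
        raise ValueError("expected 3 bytes per pixel")
    return all(
        pixels[i:i + 3] in (BLACK, WHITE)
        for i in range(0, len(pixels), 3)
    )


def pack_1bit(pixels: bytes, width: int, height: int) -> bytes:
    """Pack bilevel RGB pixels into 1 bit per pixel (1 = white).

    Each row is padded to a whole number of bytes, most significant
    bit first.
    """
    if len(pixels) != width * height * 3:
        raise ValueError("pixel data does not match width * height * 3")
    row_bytes = (width + 7) // 8
    out = bytearray(row_bytes * height)  # all zero bits, i.e. all black
    for y in range(height):
        for x in range(width):
            i = (y * width + x) * 3
            if pixels[i:i + 3] == WHITE:
                out[y * row_bytes + x // 8] |= 0x80 >> (x % 8)
    return bytes(out)
```

Images that fail the is_bilevel_rgb check (genuine color or grayscale content) are simply left alone, which is what keeps the whole operation lossless.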

[1] https://qpdf.sourceforge.io




Interesting.

I must check out qpdf.

Have you tried MuPDF?

I haven't yet, but it looks interesting too.

mupdf.com

https://artifex.com/company



