Don't forget, "open source" is not enough: we need _lean_ open source and I do i...

tonyedgecombe · 2025-06-25T10:22:06 1750846926

I agree with you that the PDF format is insane (I have had my head buried in the spec for the last month) but it has won in the marketplace. It's unlikely anything can supersede it now.

Microsoft had a technically strong alternative but it was far too late.

eska · 2025-06-25T14:47:11 1750862831

FWIW I distribute HTML with embedded images instead of PDF usually.
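For anyone wanting to try this, here is a minimal sketch of the idea using only the Python standard library: the image bytes are inlined as a base64 `data:` URI so the HTML file has no external dependencies. The helper names `data_uri` and `standalone_page` are made up for illustration, not from any tool mentioned in this thread.

```python
import base64
import html


def data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a data: URI usable in an <img src=...> attribute."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")


def standalone_page(title: str, text: str, image: bytes, mime: str = "image/png") -> str:
    """Return a single self-contained HTML document with one embedded image."""
    # Text is HTML-escaped; the image travels inside the file itself.
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body><p>{html.escape(text)}</p>\n"
        f'<img src="{data_uri(image, mime)}" alt="">\n'
        "</body></html>\n"
    )
```

The trade-off is size: base64 inflates the image by about a third, which is usually acceptable for a document that must travel as one file.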

sylware · 2025-06-25T16:43:23 1750869803

And I think you can do the same with <audio> and <video>...

michalf6 · 2025-06-25T10:46:44 1750848404

What was that Microsoft alternative called?

pjmlp · 2025-06-25T10:51:53 1750848713

sylware · 2025-06-25T10:44:23 1750848263

dude... 'it has won in the market' : with those words, you have already lost to big tech...

tonyedgecombe · 2025-06-25T11:38:41 1750851521

Good luck changing reality.

sylware · 2025-06-25T12:17:06 1750853826

With "people" like you, Linux or any open source alternatives would not have happened.

You are part of the problem dude.

sodimel · 2025-06-25T10:02:57 1750845777

We generate PDF files using WeasyPrint (it converts HTML+CSS into cool PDF files). I think tools like this are very valuable and practical for building higher-level PDF-generation tools.

sylware · 2025-06-25T10:57:50 1750849070

Yep, in-house PDF generators should be some sort of good middle ground, but I dunno whether this 'weasyprint' is open source, let alone _lean_ open source (no C++, Java, etc.).

When dealing with an ultra-complex file format which cannot be dodged, usually a good way to deal with it is to only use a very simple but coherent subset and enforce this usage with validation tools.

For instance, on the web: noscript/basic (x)html (otherwise you are jailed in the 2.5 web engines of the WHATWG cartel).

With PDF, I dunno much about the format (since I did not manage to download the specs easily), but when I have to print some text, I have a very small PDF generator for that (written ~25 years ago, so no utf-8 for me).
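To give an idea of how small such a text-only generator can be, here is a sketch in Python (not the generator mentioned above, which is not shown in this thread). It assumes one A4 page, the built-in Helvetica font, and Latin-1 text only; the function names are hypothetical.

```python
def escape_pdf_text(s: str) -> bytes:
    """Escape backslashes and parentheses for a PDF literal string."""
    raw = s.encode("latin-1", errors="replace")  # no UTF-8: unknown chars become "?"
    return raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")


def text_to_pdf(lines: list[str]) -> bytes:
    """Build a one-page PDF showing the given lines in Helvetica 12 pt."""
    # Content stream: start near the top left, then step down one line per string.
    ops = [b"BT", b"/F1 12 Tf", b"14 TL", b"72 770 Td"]
    for line in lines:
        ops.append(b"(" + escape_pdf_text(line) + b") Tj T*")
    ops.append(b"ET")
    content = b"\n".join(ops)

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
        b"/Encoding /WinAnsiEncoding >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed by the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing `text_to_pdf(["Hello", "world"])` to a file in binary mode should give something a PDF viewer can open. There is no line wrapping or pagination here, which is exactly the kind of restriction a deliberately small subset implies.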

But what's important: such an attempt must go hand in hand with re-assessing the pertinence of how the information systems are used, and yes, it will be annoying and much less comfy, and that MUST be acknowledged before even trying.

And big tech is not the only one trying hard to do vendor and developer lock-in.

xOvni · 2025-06-25T13:15:54 1750857354

Hi, WeasyPrint/pydyf dev here!

> usually a good way to deal with it is to only use a very simple but coherent subset and enforce this usage with validation tools

You’re right, that’s exactly what we do. We support a growing subset of HTML and CSS that’s documented. We also use the W3C testing suite for HTML/CSS, and PDF validators, on top of custom unit tests.

> And big tech is not the only one trying hard to do vendor and developer lock-in.

We "only" follow open specifications and refuse vendor-specific features to avoid lock-ins (equivalent closed-source tools love that). And we even love the other open-source "competitors": ♥ to Paged.js and Vivliostyle, try them, they’re great too!

sylware · 2025-06-25T16:38:20 1750869500

"Open" is not enough anymore: it also has to be lean, stable over time, and able to do a good enough _pertinent_ (can be very subjective) job (and in the case of software, that includes the SDK: for instance, if C++ or something similar is involved, it should be excluded de facto for obvious reasons).

It is _EXTREMELY_ hard to justify an honest and permanent income writing software... REALLY HARD.

pepa65 · 2025-06-25T16:08:59 1750867739

How about typst, do you not consider that competition??

sodimel · 2025-06-25T11:17:39 1750850259

You can learn more about WeasyPrint on their website (https://weasyprint.org/ ). It's an open source Python package that can be launched from the CLI or from Python code. It uses pydyf, which is "a low-level PDF generator written in Python and based on PDF specification 1.7" (from their README at https://github.com/CourtBouillon/pydyf ).

sylware · 2025-06-25T12:36:29 1750854989

Compile a minimal python interpreter with tinycc &| cproc &| scc, run this pydyf and you should be good to go :)

Hopefully, its API has a C API bridge for interop.

But pydyf claims to go up to PDF 1.7: this is kind of arrogant given the complexity of the file format.

That's why such tools are not enough: what's important is to evaluate and assess a subset of the PDF format, in order to significantly reduce the technical cost of ownership and the exit cost, and maybe to use such tools to also write validation tools that enforce the usage of that subset of PDF.
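As a rough illustration of what such a validation tool could look like, here is a deliberately naive Python sketch. The list of forbidden names and the version cap are assumptions chosen for the example, not an agreed-upon subset, and `check_subset` is a hypothetical name. It only scans raw bytes, so anything hidden inside compressed streams would be missed, and binary stream data could trigger false positives.

```python
import re

# Assumed example policy: features this hypothetical subset refuses.
FORBIDDEN_NAMES = {
    b"JavaScript", b"JS", b"Encrypt", b"ObjStm",
    b"EmbeddedFile", b"XFA", b"Launch",
}
NAME_TOKEN = re.compile(rb"/([A-Za-z0-9_.#+-]+)")


def check_subset(pdf: bytes, max_version: tuple[int, int] = (1, 4)) -> list[str]:
    """Return a list of human-readable problems; an empty list means 'accepted'."""
    problems = []

    header = re.match(rb"%PDF-(\d)\.(\d)", pdf)
    if header is None:
        problems.append("missing %PDF-x.y header")
    else:
        version = (int(header.group(1)), int(header.group(2)))
        if version > max_version:
            problems.append(
                "PDF version %d.%d exceeds %d.%d" % (version + max_version)
            )

    # Collect every /Name token in the file and intersect with the policy.
    found = {m.group(1) for m in NAME_TOKEN.finditer(pdf)} & FORBIDDEN_NAMES
    for name in sorted(found):
        problems.append("forbidden name /" + name.decode("ascii"))
    return problems
```

A real validator would parse the object structure properly instead of pattern-matching bytes; the point here is only that the policy itself can stay short enough to audit.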

Very often, complex file formats (open or not) end up being generated and consumed by one program.

A warning: big tech and its minions will fight super hard against everything that is simple, stable over time, and does a good enough job (like noscript/basic (x)html for nearly all online services, as they were working a few years back).

henrebotha · 2025-06-25T14:02:17 1750860137

What on earth is "lean open source"?

vasco · 2025-06-25T10:13:52 1750846432

IBM has some cool AI tools for PDFs that I used for some side project toys: https://github.com/docling-project/docling

pif · 2025-06-25T10:01:53 1750845713

[flagged]

tomhow · 2025-06-25T14:29:18 1750861758

> Your unrelated, idiotic disdain

Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

Please don't fulminate. Please don't sneer, including at the rest of the community.

https://news.ycombinator.com/newsguidelines.html