Don't forget, "open source" is not enough: we need _lean_ open source, and I do include the SDK (hence the programming language).
That goes for software, protocols and file formats (and hardware programming interfaces...).
It is much easier said than done, and when you read that, often it is only to apply pressure on Microsoft pricing, without a real intent to start to "digitally assume themselves".
Keep in mind: there is ZERO, Z-E-R-O, economic competition with big tech, as they are backed by funds with thousands of billions of $ and they have their own billions of $ too. They will spend anybody out of business (usually ~5-10 years, even longer), and "buy" anybody (then throw them away once lock-in is assured).
For instance: libreoffice is horrible (c++ grotesque syntax complexity is the culprit), the PDF file format is insane (I cannot even download the specs with noscript/basic (x)html browsers!). Better write simple utf8 text files along with some PNG images and mkv (AV1/Opus) video if needed.
Basically, you need to programmatically generate the PDF files of the administration, since there is no "reasonable" (as far as I know) open source software to do so (often c++, hence excluded de facto).
I agree with you that the PDF format is insane (I have had my head buried in the spec for the last month) but it has won in the marketplace. It's unlikely anything can supersede it now.
Microsoft had a technically strong alternative but it was far too late.
We generate PDF files using weasyprint (it converts HTML+CSS into cool PDF files). I think tools like this are very valuable and practical for building higher-level PDF generators.
Yep, in-house PDF generators should be some sort of good middle ground, but I dunno if this 'weasyprint' is open source, let alone _lean_ open source (no c++, java, etc).
When dealing with an ultra-complex file format which cannot be dodged, usually a good way to deal with it is to only use a very simple but coherent subset and enforce this usage with validation tools.
For instance, on the web: noscript/basic (x)html (or you are jailed in the 2.5 web engines of the WHATWG cartel).
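To make the "subset plus validation tool" idea concrete for the web case, here is a minimal sketch in Python using only the standard library. The `ALLOWED_TAGS` set and the rule against `on*` attributes are my own assumptions about what "noscript/basic (x)html" could mean, not an agreed profile; adjust them to whatever subset you settle on.

```python
from html.parser import HTMLParser

# Hypothetical allow-list for a "basic (x)html" profile (an assumption, not a standard).
ALLOWED_TAGS = {
    "html", "head", "title", "meta", "body", "h1", "h2", "h3", "p", "br",
    "a", "img", "ul", "ol", "li", "table", "tr", "th", "td", "pre",
    "em", "strong", "form", "input", "label", "select", "option", "textarea",
}


class BasicHtmlChecker(HTMLParser):
    """Collects every tag or attribute that falls outside the subset."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"line {line}: <{tag}> is not in the subset")
        for name, _value in attrs:
            # Inline event handlers (onclick, onload, ...) require scripting.
            if name.startswith("on"):
                self.problems.append(f"line {line}: attribute {name} needs scripting")


def check_basic_html(text):
    """Return a list of problems; an empty list means the page fits the subset."""
    checker = BasicHtmlChecker()
    checker.feed(text)
    checker.close()
    return checker.problems
```

Run it in CI over every page you publish and refuse the build when the list is not empty; that is the "enforce" part.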
With PDF, I dunno much of the format (since I did not manage to easily download the specs), but when I have to print some text, I have a very small PDF generator for that (written ~25 years ago, so no utf-8 for me).
But what's important: such an attempt must go together with re-assessing the pertinence of the usage of the information systems, and yes, it will be annoying and much less comfy, and that MUST be acknowledged before even trying.
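For reference, a very small text-only PDF generator can look like the sketch below. This is not the generator mentioned above, just an illustration in Python with no dependencies; the function name `make_text_pdf`, the A4 page size, the margins and the Helvetica font are all my assumptions. It is ASCII-only on purpose and does no line wrapping or pagination.

```python
def make_text_pdf(lines, font_size=12, leading=14):
    """Build a one-page A4 PDF showing ASCII lines in Helvetica; returns bytes."""

    def esc(s):
        # Backslash and parentheses must be escaped inside PDF literal strings.
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: set font and leading, move to the top-left margin,
    # then show each line and advance to the next one with T*.
    ops = ["BT", f"/F1 {font_size} Tf", f"{leading} TL", "72 770 Td"]
    for line in lines:
        ops.append(f"({esc(line)}) Tj T*")
    ops.append("ET")
    content = "\n".join(ops).encode("ascii")  # raises on non-ASCII input

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Usage would be something like `open("out.pdf", "wb").write(make_text_pdf(["Hello", "world"]))`. Everything it emits is five objects, one standard font and a handful of text operators, which is about as small a subset of PDF as you can get away with.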
And big tech is not the only one trying hard to do vendor and developer lock-in.
> usually a good way to deal with it is to only use a very simple but coherent subset and enforce this usage with validation tools
You’re right, that’s exactly what we do. We support a growing subset of HTML and CSS that’s documented. We also use the W3C testing suite for HTML/CSS, and PDF validators, on top of custom unit tests.
> And big tech is not the only one trying hard to do vendor and developer lock-in.
We "only" follow open specifications and refuse vendor-specific features to avoid lock-ins (equivalent closed-source tools love that). And we even love the other open-source "competitors": ♥ to Paged.js and Vivliostyle, try them, they’re great too!
"Open" is not enough anymore: it also has to be lean, stable in time, and able to do a good enough _pertinent_ (can be very subjective) job (and in the case of software, that includes the SDK: for instance, if some c++ or similar is around, it should be excluded de facto for obvious reasons).
It is _EXTREMELY_ hard to justify an honest and permanent income writing software... REALLY HARD.
You can learn more about weasyprint on their website (https://weasyprint.org/). It's an open source Python package that can be run from the CLI or from Python code.
It uses pydyf, "a low-level PDF generator written in Python and based on PDF specification 1.7" (from their README at https://github.com/CourtBouillon/pydyf ).
Compile a minimal python interpreter with tinycc &| cproc &| scc, run this pydyf and you should be good to go :)
Hopefully, its API has a C API bridge for interop.
But pydyf claims to go up to PDF 1.7: this is kind of arrogant given the file format complexity.
That's why such tools are not enough: what's important is to evaluate and assess a subset of the PDF format, in order to significantly reduce the technical cost of ownership and the exit cost, and maybe use such tools to also write validation tools that enforce the usage of that subset of PDF.
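A validation tool for such a subset can start very small. The sketch below is a naive byte scan in Python, not a real PDF parser: the `FORBIDDEN` list is a hypothetical deny-list I made up for illustration, it cannot see names hidden inside compressed streams or written with `#xx` escapes, and it can give false positives on page text that happens to contain a slash followed by one of these words. A serious validator would parse the object structure first.

```python
import re

# Hypothetical deny-list for an in-house "simple PDF" profile (an assumption).
FORBIDDEN = {
    b"JavaScript": "embedded script",
    b"JS": "embedded script",
    b"Encrypt": "encryption",
    b"ObjStm": "object streams",
    b"XFA": "XFA forms",
    b"EmbeddedFile": "file attachments",
    b"Launch": "launch actions",
}

# A PDF name object is a slash followed by regular characters.
_NAME = re.compile(rb"/([A-Za-z0-9_.#+-]+)")


def check_pdf_subset(data):
    """Return a list of problems; an empty list means no forbidden name was seen."""
    problems = []
    if not data.startswith(b"%PDF-1."):
        problems.append("missing %PDF-1.x header")
    found = {m.group(1) for m in _NAME.finditer(data)}
    for name in sorted(found.intersection(FORBIDDEN)):
        problems.append(f"/{name.decode()}: {FORBIDDEN[name]} not allowed")
    return problems
```

The point is less the exact list than having a written-down profile plus a tool that rejects files outside of it, whoever generated them.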
Very often, complex file formats (open or not) end up being generated and consumed by one program.
A warning: big tech and its minions will fight super hard everything that is simple, stable in time and does a good enough job (like noscript/basic (x)html for nearly all online services as they were working a few years back).