Regarding 2.: Most of these objects do not directly correspond to rendered elements. Basically, every page has (typically) one content stream, which contains all rendered elements. The main rendered things you see outside of that are annotations (link boxes, form fields, actual annotations, ...).
It's a bit different if you are looking at a tagged PDF, where the tagging structure is in there, but if you want to look at that in detail you are probably better served with e.g. ngPDF (https://ngpdf.com/) which will show the tagging structure including the mapping to rendered elements.
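To make the "one content stream per page" point a bit more concrete, here is a minimal sketch in Python. It assumes a simplified, already-decompressed stream: the sample bytes and the `parse_operations` helper are made up for illustration, and real streams are usually compressed and contain more token types (arrays, dictionaries, hex strings, inline images) than this handles. The operators themselves (`BT`, `Tf`, `Td`, `Tj`, `ET`, `g`, `re`, `f`) are standard PDF ones.

```python
import re

# Hypothetical, uncompressed content stream: one text run and one filled rectangle.
SAMPLE_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Hello) Tj
ET
0.5 g
100 100 200 50 re
f
"""

# Simplified tokenizer: literal strings, names, numbers, then alphabetic operators.
TOKEN_RE = re.compile(
    rb"\((?:\\.|[^\\()])*\)|/[^\s/()\[\]<>]+|[-+]?\d*\.?\d+|[A-Za-z*]+"
)


def parse_operations(stream: bytes):
    """Return (operator, operands) pairs; operands precede their operator."""
    operations, operands = [], []
    for match in TOKEN_RE.finditer(stream):
        token = match.group().decode("latin-1")
        if token[0] in "(/+-." or token[0].isdigit():
            # Strings, names, and numbers are operands for the next operator.
            operands.append(token)
        else:
            operations.append((token, operands))
            operands = []
    return operations


if __name__ == "__main__":
    for operator, operands in parse_operations(SAMPLE_STREAM):
        print(operator, operands)
```

Note that the text and the rectangle are just consecutive drawing operations in the same stream. Nothing in there is a separate "text object" or "rectangle object" in the file's object tree, which is why an object browser can't easily point at them.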
I haven't decided if I want to create an open-source version. I originally made it private so I could worry less about my code quality and finish the product faster, before losing interest in it.
It heavily relies on the core part of PDF.js: I've made a fork of the PDF.js project, removed everything not related to the core part, and added an export for low-level primitives [1].
Also, as inspiration, I used the pdf.js.utils [2] project, which does almost the same thing but in a different form.
I wouldn't worry about the quality of the code. You get better by seeing other people's work and seeing alternative solutions to the problems you had.
Also, as I mentioned in another comment, this could easily be built into a quick trouble-checking app for POD work. Posting it would also let people fork it to make more task-specific apps.
I'm curious, what are the differences between T5, Flan-T5, and Flan-UL2 for fine-tuning? Does the instruction tuning matter at all, once you're fine-tuning?
I'm curious about the prompt you used, if you're willing to share. (I'm interested in blend words, which this is somewhat related to, but not quite: these are blended in a semantic sense, not a textual one.)
Go to Perplexity.AI and start asking questions; the entire Vonnegut collection is in there... you can finally figure out what all those little aliens meant (I have)!
I'll take the opportunity to request better editing ergonomics: the ability to connect from Jupyter/notebook-supporting editors and IDEs, and the ability to open/edit .ipynb files from the local disk and/or GitHub without having to first put them on Google Drive.
Colab/Jupyter and friends are reinventing many wheels around editing code, and it would be nicer for them to support tools like Jupytext.
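For anyone unfamiliar with why that matters: an .ipynb file is just JSON, and Jupytext-style tools pair it with a plain script that any editor can handle. The sketch below is a rough stdlib-only approximation of a "percent"-format conversion, not Jupytext's actual API or full output; `notebook_to_percent_script` and the sample notebook are made up for illustration.

```python
import json


def notebook_to_percent_script(notebook_json: str) -> str:
    """Convert notebook JSON into a '# %%'-delimited Python script (simplified)."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            # nbformat allows the source to be stored as a list of lines.
            source = "".join(source)
        cell_type = cell.get("cell_type", "code")
        if cell_type == "code":
            chunks.append("# %%\n" + source)
        else:
            # Non-code cells (markdown, raw) are kept as commented-out text.
            commented = "\n".join(
                ("# " + line).rstrip() for line in source.splitlines()
            )
            chunks.append(f"# %% [{cell_type}]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook used only to exercise the function.
SAMPLE_NOTEBOOK = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some notes."]},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

if __name__ == "__main__":
    print(notebook_to_percent_script(SAMPLE_NOTEBOOK))
```

The real tool also round-trips metadata and keeps the two files in sync, which this sketch does not attempt.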
1. Is the source available anywhere? I'm curious to see how it works.
2. Is there a way to connect the structure displayed here to the rendered version of the PDF, to visually display the subcomponents?