Paged.js: Paginating content and making books in the browser (pagedmedia.org)
179 points by goranmoomin on Nov 10, 2019 | 61 comments



I did a big deep dive into this when I was implementing offline caching of web pages and annotations in Polar.

The issues I had were mostly to do with converting to pagination and I decided that in practice it just wasn't worth it.

The problem is that pagination on web pages just doesn't work. People have figures and so forth that need new page breaks to properly work.

CSS has extensions for this but the problem is that no one uses them because they're not intending their pages to be printed or paginated.
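
For reference, the extensions in question are the break properties from CSS Fragmentation. A minimal print stylesheet sketch, assuming the page marks up figures with `figure` and starts chapters with `h1` (both are assumptions about the markup, which is exactly what most captured pages don't guarantee):

```css
@media print {
  /* Try to keep a figure and its caption on a single page. */
  figure {
    page-break-inside: avoid; /* legacy alias for older engines */
    break-inside: avoid;
  }

  /* Assumption: each h1 starts a chapter, so force a new page before it. */
  h1 {
    page-break-before: always; /* legacy alias */
    break-before: page;
  }

  /* Avoid leaving a subheading stranded at the bottom of a page. */
  h2, h3 {
    break-after: avoid;
  }
}
```

These rules only help when the author ships them, which is the adoption problem described above.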

I have more work on this if you're interested:

https://getpolarized.io/2019/10/06/Whats-Next-For-Web-Captur...

https://getpolarized.io/2019/05/11/portable-web-documents.ht...

... I've decided to just migrate all this work to EPUB for various reasons but I'm also going to migrate to one continuous 'page' due to the CSS pagination issues I mentioned above.


Holy Moly, this Polar thing looks amazing! And it’s open source too!! I was definitely not expecting that when I followed the link nor when I started reading about it.

Document management, and saving web pages is a huge deal and something that I have been looking for a solution for but which until now I have not found anything that would quite fit, and most of the ones I have looked at and seriously considered would require too much integration work and customization to be usable. But Polar looks really really promising!

I have avoided Electron based software until now for the same reasons you often find people echoing in the HN comments section when the topic comes up, but for Polar I would make an exception because it looks to be perfect for me so I am willing to sacrifice some of my precious RAM for Polar and am willing to accept a certain degree of UI and performance problems that I associate with Electron based apps.

Since a web browser extension is in the works though, would you recommend that I wait until that is released or should I start using the Electron based app now and then migrate/switch?


> Holy Moly, this Polar thing looks amazing! And it’s open source too!! I was definitely not expecting that when I followed the link nor when I started reading about it.

Thanks... 2.0 should come out next week too so it's going to be interesting to see how the community responds.

Here's more on the release:

https://www.reddit.com/r/PolarBookshelf/comments/du68gf/plan...

The browser version is almost on par in terms of features with the desktop version. The main features that we're missing in the web version are offline support (which will be resolved soon), support for syncing with Anki, and web capture.

We're working on porting web capture to completely run within the web extension though. I think the web version should be completely on par with the desktop version in the next month or so.

Also, it's really not THAT much of a resource hit. On my machine it burns like 100MB of RAM and say 300MB of disk space.


I remember when WoW ran on 300 mb


It's surprisingly difficult to navigate to a clear example of this project in action.


Found a sample, here are the first 50 pages of Moby Dick: https://s3.amazonaws.com/pagedmedia/pagedjs/examples/index.h...


That link is hilariously broken on my phone (Safari iOS 12)

I see the tiny zoomed out 2x2 layout of the page, but if I try to zoom in, it zooms in on the abyss.

And if I try to pan while zoomed in, it zooms back out to the illegible zoomed out view...

-

Actually the more I play with it, the more interesting the ways it breaks are.

I was able to zoom in when I scrolled down first, and very inefficiently pan around the 2 column view, but if I pan too far, the zoom resets.

And sometimes I just end up being taken to the abyss again.

Which I figured out is way off in the right margin.


It's apparently made for printing online content, not reading in the browser / on your phone.


We have media queries, there’s no need to break one to make the other look good


Maybe I don't understand the project, but the top of the docs shows that it all goes inside the @media print{...} media query.


Sure, just make another @media screen {…}


Which didn't work in Safari on my mac. Looked like it did in FF, but only got the first 16 pages for some reason.


Broken on iOS 13 as well.


I open this link. The fan of my T480 starts blowing audibly and is getting louder. I can feel a delay when marking some text. All it does is render some text. Thankfully, scrolling stays smooth. I close the page. My fan calms down.

I think I will stay away from this for now.


It's weird that "and" is broken up into "an-" and "d" on pages 38 and 39 respectively.



We are still working on a site for this so sadly not much to point to yet, but it might be worth checking out the sneak peeks post with a few codepens running paged.js at https://www.pagedmedia.org/pagedjs-sneak-peeks/


I spent a few minutes looking for a demo and gave up. Did you find one?


Same here. There is an /examples folder in the top repo from the linked gitlab instance, but it just contains a lot of inline js and I’m on mobile so can’t try to run it.


You can have a look at https://paged.design/ for some examples.

Use chromium and try to print to see the whole thing in action.


I think pagination works better in books because books are finished. Websites get updates all the time, so when you mentally organize an idea as being part of page 3, that might just lead to confusion next week.

The embedded links and ads everywhere we look just throw garbage all over the place too, or else published news articles might match pretty well


I think you're thinking about web apps. I imagine there's little purpose in printing those.


It always felt embarrassing to me how incomplete (and still in draft status) the CSS for paged media is...

I would be so happy if this polyfill takes off!


We're working hard for that to happen, and to see browsers implement more of the paged-media specs!


Thanks for your work. I've been re-reading Knuth's _The TeXbook_ as a reminder of fine points in typography. So much can be covered in just provisioning for floating blocks, top-floating blocks, widows and orphans, a few counters, some flexibility in headers and footers, and such. It seems like you and/or CSS are hitting those high points.

I hear you emphasizing this might just be for print, but something like a Kindle in-browser reader (hidden nav; spacebar or < or > to turn a page) can also provide a nice experience (different to scrolling) in reading longer works.


I searched through the site and still couldn't find a demo hosted anywhere. The idea of making paged media a more pleasant in-browser experience sounds neat, but the lack of examples is troubling.


We'll need to make these easier to find, but there are a few examples mixed in with the Readme and more at https://paged.design


Is it possible to use it to convert a web page to a paginated mode in the browser? Like the reader mode but put the content in pages instead of continuous mode.


This might be a hot take, but I think this is unnecessarily complicated and is a solution looking for a problem. My thoughts apply more broadly to the CSS Draft for pagination as well.

In general - why does the Web need this? This is a less-flexible CSS grid, with some auto-increment and @media(print) styles mixed in.

The use case for faithfully representing a textual work is not convincing - this can be done already, more accurately, and in a far less complicated manner by using images or PDF or any other format more suited than hypertext markup and CSS.

We already have pagination on the Web, it's called a webpage.

This just feels so wrong and not what a Web document should be: https://www.w3.org/TR/css-page-3/


I think (hope?) it is for printing, and not for reading in the browser. The example has `@media print{..}`, so I assume that is what the intended end result is for this project.


Julien here, working on paged.js

It's definitely made to print books and pages from the web. You can preview in a browser, but that's not the goal. It's a polyfill for CSS properties made for print.

Although, you can't tell for sure what people want to do with tools and libraries, right? :)


Eh, I'm not so sure. I spent more effort than I should have to find an example of this working and I'm not impressed [0]. Everything about it seems worse than browsers' reader mode for accessibility. It's also very slow, but I'll let that skate since it's probably just the polyfill[1].

I don't get the appeal. This isn't anything like a book, since I'm still scrolling, squinting, and reading words from a backlit screen instead of a physical page. Not to mention, using the browser's zoom functionality just straight up doesn't work (the pages and text stay the same size). So much for accessibility?

I really dislike everything about this.

[0]: https://s3.amazonaws.com/pagedmedia/pagedjs/examples/index.h...

[1]: But actually, JS is supposed to be pretty fast nowadays - so let's not build this bloat into the browser's rendering engine please?


Printing books directly from the browser isn't a solved problem, so it seems like a good project, to me.

I don't see any evidence on their website this is intended for reading in the browser, and tons of clues that it's intended for printing books from the browser. I mean, there are tools for physical page sizes, blank pages, front/back matter, page counters, etc. Things that make zero sense in the browser and are necessary for a nice book printing.
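
To make the book-printing features concrete, here is a rough sketch of what such rules look like as drafted in CSS Paged Media Level 3. The trim size, the margins, and the `.chapter` class are illustrative assumptions, and native browser support for these selectors varies, which is the gap a polyfill is meant to fill:

```css
/* Assumed trim size and base margins for a small paperback. */
@page {
  size: 6in 9in;
  margin: 0.75in;
}

/* Wider inside (binding) margin on left-hand and right-hand pages. */
@page :left {
  margin-right: 1in;
}

@page :right {
  margin-left: 1in;
}

/* Pages left empty by a forced break carry no running header. */
@page :blank {
  @top-center { content: none; }
}

/* Hypothetical class: each chapter opens on a right-hand page,
   inserting a blank page before it when needed. */
.chapter {
  break-before: right;
}
```

None of this has any meaning in a scrolling viewport, which supports the reading that the target is paper.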

So, what makes you think this is intended for reading in web browsers? As you note, it's horribly unpleasant in a browser...seems like they would have noticed that if they were testing in the browser, rather than using it to print books.


The example you showed is previewing in the browser. The intention is to use this with @media print and just to get nicely flowed, filled, and typeset output when sending to a printer.

You would either use this for stuff that is only useful printed (e.g. a boarding pass could be HTML instead of PDF) or as secondary, print-specific styling for web content that is frequently printed to paper.


It's basically a css print file, which includes a lot of css properties that are not built in browsers yet, with a preview you can see on screen.

Print is just another aspect of responsive web design. And paged.js adds the ability to include things that paper needs and the browser doesn't (page numbers, cross-references, running heads, etc.)
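
As a sketch of what page numbers and running heads look like in the draft specs (page-margin boxes from CSS Paged Media, named strings from CSS Generated Content for Paged Media), assuming chapter titles live in `h1` and using `chapter-title` as a made-up string name:

```css
/* Capture the text of each h1 into a named string.
   "chapter-title" is an arbitrary name chosen for this example. */
h1 {
  string-set: chapter-title content(text);
}

@page {
  /* Running head: the most recent chapter title. */
  @top-center {
    content: string(chapter-title);
  }

  /* Page number out of the total page count. */
  @bottom-center {
    content: counter(page) " / " counter(pages);
  }
}
```

Browsers generally do not implement margin boxes or `string-set` natively at the time of this thread, so rules like these rely on a polyfill or a dedicated print formatter.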

A bit more info about css print: https://www.smashingmagazine.com/2018/05/print-stylesheets-i...

We're sending webpages to printing press, and it becomes a book. That's it.


The use case is when I buy a plane ticket it comes with a 3 page receipt and E-ticket, and rather than downloading a PDF, we can have HTML specify how to flow the document into a document. It seems the necessary extensions are almost TeX like, alternating headers, columns, sections, page break.

HTML is already exceedingly good at flow layout, we can flow text into desktop, mobiles. Why not extend the rules so they flow into a piece of A4 document, or two columns?

Another use case is reports, where there are tabular data, where headers should be repeated at the top of every page. Wouldn't it be nice if HTML/CSS could specify this instead of resorting to Crystal Reports? It'd certainly make us doubly productive.
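
For the repeated-header case, a small sketch of the CSS involved, assuming the report is a plain `<table>` with its header row inside `<thead>`. Whether the header actually repeats on every printed page depends on the rendering engine:

```css
@media print {
  /* table-header-group is already the default display for thead;
     engines that support it repeat this group at the top of each page. */
  thead {
    display: table-header-group;
  }

  /* Same idea for a footer row, if the table has one. */
  tfoot {
    display: table-footer-group;
  }

  /* Avoid splitting a single row across two pages. */
  tr {
    break-inside: avoid;
  }
}
```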


In my opinion it's a first-principles kind of thing with respect to "what is the purpose of a web browser."

To me, a web browser's purpose is to render markup into something useful for a user, dependent on variables like screen size, user preferences, browser zoom, accessibility needs, etc. A browser's purpose is not to render-a-document-as-an-A4-printout-specifically,-because-no-other-format-will-do.

"Make it look exactly like this" is not an expectation we can or should have about web browsers, and this proposal is an attempt at that.

Anyway, to specifically address your use cases:

> The use case is when I buy a plane ticket it comes with a 3 page receipt and E-ticket, and rather than downloading a PDF, we can have HTML specify how to flow the document into a document

HTML can already specify that document. Why does the browser care that it would be 3 different physical pages? Why would a user care? We're letting the tail wag the dog - just put the information in HTML.

> Why not extend the rules so they flow into a piece of A4 document, or two columns?

Two-column layouts are a concession of print formats due to constraints of needing to get all the other content onto the physical page. We don't have those constraints in Web, there's no need to introduce them.

> Another use case is reports, where there are tabular data, where headers should be repeated at the top of every page. Wouldn't it be nice if HTML/CSS could specify this instead of resorting to Crystal Reports? It'd certainly make us doubly productive.

If we're spinning up this whole AlmostPostscript.js thing to render a <table> with a sticky header, I really don't want to be a modern web developer anymore.


You're missing the point here. I have created an electron app that renders reports via json and svelte. They are intended for print not the web and the easiest way to get the data printed was using the browser engine which gladly outputs to PDF. However, styling these pages would be a lot easier with better page support in CSS. Or at least better support of those CSS specs. I really look forward to having these polyfills. Using paged media is in a lot of contexts easier than latex. Especially when it comes to dynamic content.


Why shouldn't people be able to print nice looking versions of certain web pages, for example books or reports? I don't understand the objection.


Definitely a hot take. Scrolling is not the best UX for all content across all use cases. I'm sure you can imagine some use cases where pagination is superior.

And sure, paginating fixed content is easy. Paginating dynamic length content? Having poked around at the problem, I'm very curious to give this library a try.


One of the main virtues of HTML/CSS is separation of style and content. This is useful when you want to style your content for both print and the screen.

Currently, this requires you to use something like pandoc to generate LaTeX and HTML from the same source document, then use two different languages for styling. I don't want to do that. I want to have a single HTML file with one stylesheet that applies a few extra rules to print nicely. I can sort of do this now, but there are lots of features missing that make it impractical for many use-cases. For example, it's currently impossible to make a footnote always display at the bottom of a print page in CSS alone.
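
For context, this is roughly what footnotes look like in the CSS Generated Content for Paged Media draft. It is a sketch of draft syntax, not something mainstream browsers implement natively as far as I know, and the `.footnote` class is an assumption about the markup:

```css
/* Assumption: footnote text is written inline as <span class="footnote">. */
.footnote {
  float: footnote; /* move the element into the page's footnote area */
}

/* The reference mark left behind in the running text. */
.footnote::footnote-call {
  content: counter(footnote);
  vertical-align: super;
  font-size: 0.7em;
}

/* The number shown in front of the footnote itself. */
.footnote::footnote-marker {
  content: counter(footnote) ". ";
}

/* Style the footnote area at the bottom of each page. */
@page {
  @footnote {
    border-top: 1px solid black;
    padding-top: 0.5em;
  }
}
```

Support for this varies between print-oriented tools, so it is worth checking the tool's documentation before relying on it.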


Pagedown[1] leverages paged.js to create reproducible scientific documents. It's still a little rough but I prefer it to fiddling with LaTeX templates.

[1] https://github.com/rstudio/pagedown


Is this library needed, if you are using PrinceXML?[1] Looks like a lot of the pagination and margin stuff is already supported by CSS 3

[1] https://www.princexml.com/


Hey folks!

@julie_b @fredjc and myself are working on paged.js and we'll be happy to answer any question you may have :)

We're in https://mattermost.pagedmedia.org/ :)

Come play with us :)


If you want to know more, you can also go to https://www.cabbagetreelabs.org


This looks interesting. I work on a project to help convert articles into printable PDFs. Created a video a few years ago showing how CSS can be used to control columns based on paper size and orientation: https://www.youtube.com/watch?v=854Csokl3QA

We currently have a browser extension you can test here: https://pdf.fivefilters.org/simple-print/


The page printing of this website is somewhat broken on firefox. The two columns are neat I guess, but the subtitle overlaps the text, the top margin is unreasonably small, and decreasing scale (to save paper) makes right margin very big.

I understand that some people may not care about Firefox, but can you at least not break things for it?


Firefox does not support the page size declared in the @page rule when printing. The page is A4 or Letter by default, so you have to set the page size manually under custom sizes in the print dialog. If that's not done, strange things happen. We care a lot about Firefox, and the paged.js team has tried several times to have a discussion on this point. It should be moving in the next few months, I hope.
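
For anyone unfamiliar with the declaration in question, a minimal example. The specific size and margins are illustrative, not what any particular site uses:

```css
/* Ask the print engine for A4 portrait pages with explicit margins. */
@page {
  size: A4 portrait;
  margin: 20mm 15mm;
}
```

An engine that ignores `size` falls back to whatever paper size is selected in the print dialog, so content laid out for one size ends up on another.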


How well does it handle orphans and widows? That's my litmus test for whether it is suitable at all for making books.


Since we use the Blink rendering engine, and columns, we can set widows and orphans however we want :)
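
In plain CSS that control looks like the sketch below. The value 3 is an arbitrary choice for illustration, and engines without support for these properties simply ignore them:

```css
@media print {
  p {
    /* Minimum lines of a paragraph left at the bottom of a page. */
    orphans: 3;
    /* Minimum lines of a paragraph carried to the top of the next page. */
    widows: 3;
  }
}
```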


Any cross pollination with EPUB?


I'd love to be able to export a pre-paginated epub from the HTML we generate, but that is just a long term goal at this point.


FAQ says you can export to ePub, but not via the UI yet.


There is an example where you can upload and view an ePub: https://s3.amazonaws.com/pagedmedia/pagedjs/examples/epub.ht...


when i gave it a go, the demo "moby dick" was completely broken :/


All show something like two page zoomed out overview, firefox for android.


These two concerns seem rather unrelated. I mean, pagination is valid for 2 pages. A book is hundreds of pages, and the concerns there relate to that very long length. (I would settle for a nice in browser solution for merely the former!)


I don't want to sound snarky but this is a sincere question: why?

Why would I want to make a browser act like a book? Or vice versa? Didn't we kinda try this with Flash? And didn't we realize they are two different mediums?


It is not about making the browser act like a book. It is about using HTML (a document markup language) and CSS (styling independent of semantics) to write paged media. I personally need to write the same documents fairly often in both latex and HTML in order to make them pretty for paper and screen. I would love to use HTML/CSS for both.


Oh, great! Some more garbage that only works in Chrome... because hipsters and lazy devs looking for jahbs!


Because the other browsers haven't made the @page size property available, which is the only thing we can't polyfill.

But Firefox is now looking at this, and Safari too. I hope we'll get to support more browsers soon enough (Firefox is my daily driver)



