Ask HN: What do you use for PDF reports these days?
94 points by jguimont on Oct 12, 2014 | 60 comments
I am creating some enterprise software and almost every piece of information needs to be sent by email in PDF format: reports, invoices, etc. I am using wkhtmltopdf right now, but it is far from ideal.

My ideal setup would be to create a template with fixed parts, growable parts (texts that can vary in size), and repeatable parts (lines in a table), and then just feed it the data to have a nice PDF built.
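
To make the fixed / growable / repeatable split concrete, here is a minimal sketch using only Python's standard library. The template text, the field names (`number`, `notes`, `rows`) and the `render_invoice` helper are all made up for illustration; the result is plain HTML that would still need to go through whichever HTML-to-PDF converter you pick.

```python
from html import escape
from string import Template

# Hypothetical template: literal HTML is the fixed part, $notes is a growable
# text field, and $rows is the repeatable part (one <tr> per line item).
PAGE = Template("""<html><body>
<h1>Invoice $number</h1>
<p>$notes</p>
<table>
<tr><th>Item</th><th>Qty</th><th>Price</th></tr>
$rows
</table>
</body></html>""")

ROW = Template("<tr><td>$item</td><td>$qty</td><td>$price</td></tr>")


def render_invoice(number, notes, lines):
    """Return an HTML string ready to hand to an HTML-to-PDF converter."""
    rows = "\n".join(
        ROW.substitute(
            item=escape(line["item"]),        # escape user data for HTML
            qty=line["qty"],
            price=f"{line['price']:.2f}",
        )
        for line in lines
    )
    return PAGE.substitute(number=escape(number), notes=escape(notes), rows=rows)


html_doc = render_invoice(
    "2014-001",
    "Payment due within 30 days.",
    [
        {"item": "Widgets & bolts", "qty": 3, "price": 9.5},
        {"item": "Support", "qty": 1, "price": 120},
    ],
)
print(html_doc)
```

The table grows with the number of line items and the notes paragraph grows with its text; page breaking is left entirely to the converter.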

The closest I found was using LibreOffice to convert a document to pdf (http://railsblog.kieser.net/2013/04/part-ii-creating-beautiful-reports-in.html).

It seems PDF generation should be a solved problem. Maybe Adobe LiveCycle does exactly what I need, but it is not open source and I do not have $50k to find out.

What are you guys using?




I'm using Org-mode which can export to LaTeX -> PDF [0]. I use Org-mode for invoices [1], reports, time-tracking, etc. Using org-babel you can use Gnuplot[2] or R [3] to embed charts and other visuals.

[0]: http://orgmode.org/manual/LaTeX-and-PDF-export.html#LaTeX-an...

[1]: http://orgmode.org/worg/org-contrib/

[2]: http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-g...

[3]: http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-R...


Do you have any idea how much memory/CPU this conversion takes? We use wkhtmltopdf for an enterprise app but it's slow and takes too much memory. LaTeX -> PDF does sound very appealing.


I haven't personally run any memory or cpu performance tests. There are online LaTeX editors out there; ShareLaTeX comes to mind. My point being that if online services like ShareLaTeX exist, then the conversion process shouldn't be horribly slow. As always, YMMV.


I've used PhantomJS with CoffeeScript, and Reportlab with Python.

The PhantomJS solution became a bit painful, since there are issues with the way Qt converts HTML into PDF. I was impressed that even SVG generated with D3 was faithfully converted to PDF. But not being able to precisely control formatting was a problem.

After that I decided to try Reportlab, and it's been amazing. There is a free open source Python API version, and the commercial Report Markup Language product (RML). It's possible to mix Python and RML with a template engine (Preppy), very powerful. We opted for the commercial support, since time was a factor. RML is really baked, it's been around for over a decade. The documentation is extensive, but I still had to find a couple of things via googling. However, their commercial support is awesome.

If you don't want to spend a lot of time handcrafting documents, I recommend RML, despite the cost. It has features for page layout control, fonts, tables, styling and multipage text flow. There's even a charting library, which I haven't used yet. Also important was the ability to include existing PDF pages as backgrounds or chart inserts.


Honestly, I've been using Chrome's print to PDF + Quartz Filter to compress JPEGs for over a year.

These are the most important tricks:

  .A4page {
    width: 793px;
    height: 1120px;
    position: relative;
    padding: 200px 50px 50px;
    box-sizing: border-box;
  }

  @media all {
    .page-break { display: none; margin-bottom: 100px; }
  }

  @media print {
    .page-break { display: block; page-break-before: always; }
  }

Good font rendering, selectable text. Not perfect, but the best I've seen so far.
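
For anyone wondering where 793px comes from: it matches A4 (210 mm x 297 mm) converted at the CSS ratio of 96px per inch. A quick arithmetic sketch in Python; the helper name is made up, and the idea that the 1120px height is rounded down a little to leave slack is only a guess.

```python
MM_PER_INCH = 25.4
CSS_PX_PER_INCH = 96  # CSS defines 1in as exactly 96px


def mm_to_css_px(mm):
    """Convert millimetres to CSS pixels."""
    return mm / MM_PER_INCH * CSS_PX_PER_INCH


a4_width_px = mm_to_css_px(210)   # about 793.7
a4_height_px = mm_to_css_px(297)  # about 1122.5

print(int(a4_width_px), int(a4_height_px))  # prints "793 1122"
```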


That's basically what wkhtmltopdf does.


I use the Prawn gem. I usually have a Rails controller to spit out the PDF files, even if the main project isn't Rails.

Prawn provides very fine control over the rendering, at the expense of having to finely control your render.

You can use your own fonts (and have to, for Asian languages etc.) and it is easy to do layouts where the size is fixed and you shrink the text down to fit it in (or truncate it). On the other hand, it is not good for really rich text layout like you can do in HTML and CSS, where the size might vary.


I'm Prawn's maintainer and I think the "fine control over rendering at the cost of having to finely control your renderer" is spot-on.

We are working on a long term solution to that problem by finally focusing on building an extension and components layer for Prawn, which would hopefully allow all sorts of other gems to fill in these gaps. But it's a ways off into the future.

If going to Asciidoc is an option, Asciidoctor is now Prawn based for PDF output, and might make sense for simple reports:

http://asciidoctor.org/


Strong endorsement of the second paragraph here. I love Prawn, but would probably not suggest it for casual reporting. If getting that right is core to the app, though, it is a great option.


I'll add another vote for prawn. It's a great library that has allowed us a lot of flexibility with our generation of pdfs. The author Gregory Brown, as well as the current maintainers, are very active and have done an amazing job at getting it to 1.0. I highly recommend it!


Another vote for Prawn, which I use for on-the-fly PDFs at cycle.travel (Ruby but not Rails). Easy to use and reliable.


Docraptor, http://docraptor.com

It uses the Prince engine (http://www.princexml.com/), so it supports paging, headers, footers, page numbers etc. in addition to standard HTML+CSS3. It's great overall but quite pricey.


I highly recommend Prince/DocRaptor as well. However, if you don't have complex layout needs and don't need CSS3 features, use wkhtmltopdf. You just have to lay out the page you need dynamically and pass it through wkhtmltopdf. There are many gems if you are using Rails.


I'd also recommend this, with the caveat that the built in javascript rendering engine isn't that great. The workaround is to load the page with phantomjs, and then send the computed DOM (w/ the JS stripped out) to DocRaptor / Prince.
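
The "strip the JS out of the computed DOM" step could look roughly like this. It is a minimal sketch using only Python's standard library, not anything DocRaptor or Prince ships: `ScriptStripper` and `strip_scripts` are hypothetical names, and it assumes you already have the serialized DOM as a string from the headless browser. It drops `<script>` elements (and HTML comments) but leaves inline handler attributes such as `onload` in place.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML as parsed, minus <script> elements and comments."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif not self.in_script:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; a self-closing script is dropped.
        if tag != "script" and not self.in_script:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_script:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.in_script:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_scripts(html_text):
    """Return html_text with every <script> element removed."""
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


page = "<body><p>Total &amp; tax<br/></p><script>render();</script></body>"
clean_page = strip_scripts(page)
print(clean_page)
```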


We're also using Prince, love it. We've previously used PdfCrowd and wkhtmltopdf, but our customers were demanding more control of their reports (detailed footers, repeating headers, table of contents).


For reports with lots of tables and mathematics: generate a LaTeX file -> pdfTeX to PDF using MiKTeX
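
A rough idea of what the "generate LaTeX file" half can look like, sketched in Python with the standard library only. The function names and the bare `article`/`tabular` layout are assumptions for illustration; a real report would use its own class and packages. The code only builds the `.tex` source, and compiling it (for example with `pdflatex report.tex`) is a separate step that is not run here.

```python
# Characters that have special meaning in LaTeX text mode.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(value):
    """Escape a value character by character so nothing is escaped twice."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(value))


def latex_report(title, header, rows):
    """Build a small standalone LaTeX document containing one table."""
    lines = [
        r"\documentclass{article}",
        r"\begin{document}",
        r"\section*{" + tex_escape(title) + "}",
        r"\begin{tabular}{" + "l" * len(header) + "}",
        " & ".join(tex_escape(h) for h in header) + r" \\ \hline",
    ]
    lines += [" & ".join(tex_escape(c) for c in row) + r" \\" for row in rows]
    lines += [r"\end{tabular}", r"\end{document}"]
    return "\n".join(lines) + "\n"


tex = latex_report(
    "Q3 & Q4 spend",
    ["Item", "Share"],
    [["R&D", "40%"], ["Ops_1", "60%"]],
)
print(tex)
```

Escaping matters more than it looks: an unescaped `&` or `%` in the data silently breaks the table or comments out the rest of the line.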


I use this for invoices in my homegrown invoicing system. I also have a script for generating PDF & HTML versions of my resume from the source data, and the PDF version goes through LaTeX. If you're using Ruby, prawn is a nice solution also, but my favorite is going via LaTeX.


Even without math, LaTeX is a pretty nice way to generate documents. Especially with TikZ, you can get really nice autogenerated figures too.


Agreed. LaTeX is the best (well, least-worst) report generating system, mostly because of its extensive ecosystem. It's well-supported in multiple languages, and there are packages for just about everything.


What's wrong with WkHtmlToPdf? It is pretty damn good in a "it just works" way.


To me, no selectable text and full page background images.


I'm in a similar boat to OP, so I'd also be interested to know what didn't go so well with wkhtmltopdf.


We use it here and have found it to be very flexible. We render a DOM in memory which we can then turn into a variety of formats: Markdown, HTML, PDF etc.


I use PrinceXML, which is a few hundred or a few thousand depending on what you need, instead of fifty thousand. Prince converts [html/xml + css] to PDF, and has the most complete CSS3 implementation I'm aware of by a green mile.

There's a free version you can experiment with, to find out.


Basically seems like if you're going to be using docraptor heavily, you might as well buy a license and the underlying technology yourself, right?


I mean, I don't really have an opinion on DocRaptor. It seems convenient and it might be what people want.

I wanted a local tool which I could script. Also I bought Prince years before DocRaptor existed. (That guy is in the Prince forums and he's pretty cool, and I like his product.)

I just wanted a command line tool though, and that's part of what you get from a personal license.


Fun fact: the (Australian) company supporting Prince has the (Norwegian) Håkon Wium Lie as director -- CTO of Opera and the proposer of CSS (see recent article https://news.ycombinator.com/item?id=8436659 ).

Håkon / Haakon being a common name, coincidentally the Norwegian crown prince is also named that. So searching for "haakon prince" has two quite different meanings.


Microsoft SQL Server Reporting Services - does everything you mentioned and exports to PDF

same for Business Objects, but IMHO needlessly more complicated than the former.

obviously neither are open source, and an SSRS license is worth it if your business spits out many reports.


SSRS is a beautiful piece of software.

SQL Server Express with Advanced Services ships with SSRS, is free, comes with all the tooling and generates PDFs too.

Perhaps surprisingly, if you install your ODBC drivers for your database engine (MySQL, Postgres etc) you can create a report from that data source as well so you're not tied to SQL Server as a database engine.

Typical steps: http://www.mssqltips.com/sqlservertip/2615/creating-a-ssrs-r...

And report builder: http://www.youtube.com/watch?v=Iy-bE0yXGlk

Can't beat it for $0.


ReportLab is a good product all right. Also, it has been around for a while and is pretty stable.

For text-only PDF reports (i.e. no styling, graphics or image support), my xtopdf toolkit may be worth a look, since it provides a higher-level and really simple to use API for basic use cases of PDF report generation: lines of text with pagination and page numbering, headers and footers, and setting the font - that's about it. But it turns out that you can generate many kinds of useful reports with just those features. As an aside, xtopdf can also create simple PDF ebooks from either text or XML. xtopdf is built using ReportLab.

xtopdf is open source, under the BSD license, and free.

A good high-level overview of xtopdf:

http://slid.es/vasudevram/xtopdf

The above URL is a presentation with many links in it, to more information about xtopdf, and many programming examples of the use of xtopdf for various applications.

The xtopdf project is at:

https://bitbucket.org/vasudevram/xtopdf

Also, a plug: I'm available for consulting on PDF report generation from Python, using either xtopdf or ReportLab, or to some extent (can be decided on a case by case basis), even for PDF report generation using other libraries and from other languages, at least Ruby, Java and PHP.

I'm interested in feedback on xtopdf, including bugs, suggestions for features, etc.


I should also mention that I found ReportLab to be fairly fast, anecdotally, in my use of it. Not tested it a huge amount or on very large files, though.


I use pdfkit.js: http://pdfkit.org/

You can run it directly on the client (or the server, if you like), which is really nice.


Jasper works pretty well and gives you a designer, but you could also roll your own like Aheinemann says. I've gone down the wkhtmltopdf road before and it was tough to deal with paging.

I'm working on a related problem right now (end-user report design in the browser) so I'll be interested to see what other people say.


LaTeX. Here are some custom LaTeX classes I use for different purposes. Some I created myself, some I got from elsewhere: https://github.com/malloc82/latex_classes


We use reportlab (https://pypi.python.org/pypi/reportlab) for invoice generation. Once you decide upon the template it's very easy to use.


Using it too, you can easily create templates in the "TRML" file format including headers/footers etc. It supports table breaking in the middle with repeating headers, this is really useful when you generate invoices or tables of data.
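
The "table breaks in the middle with repeating headers" behaviour boils down to the following idea, sketched here in plain Python. This is not ReportLab's implementation (as far as I know its platypus `Table` takes a `repeatRows` argument for this); `paginate` and the fixed `rows_per_page` are simplifications made up for illustration, since a real layout engine breaks on measured row heights rather than row counts.

```python
def paginate(header, rows, rows_per_page):
    """Split rows into pages, repeating the header row at the top of each."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    pages = []
    for start in range(0, len(rows), rows_per_page):
        pages.append([header] + rows[start:start + rows_per_page])
    return pages


header = ["Item", "Qty"]
rows = [[f"Line {n}", n] for n in range(1, 6)]  # five invoice lines

pages = paginate(header, rows, rows_per_page=2)
for number, page in enumerate(pages, start=1):
    print(f"page {number}: {page}")
```

With five lines and two rows per page this yields three pages, each starting with the header row.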


I've been using speedata-publisher[1] and pdf-writer[2] with good results.

[1] https://github.com/speedata/publisher

[2] https://github.com/galkahana/PDF-Writer

Some other options I've played with:

- https://github.com/signintech/gopdf

- https://github.com/mstamy2/PyPDF2

- http://www.jagpdf.org/


Our rails app generates dozens of accounting related reports on the fly in HTML and we use Flying Saucer to generate a PDF version -

https://github.com/flyingsaucerproject/flyingsaucer

To help minimize request time, we keep Flying Saucer persisted with Nailgun -

https://github.com/martylamb/nailgun

For generating checks, IRS forms and other PDFs that involve precise formatting we use Prawn -

https://github.com/prawnpdf/prawn


Hey! Flying Saucer! I wrote the first version of that almost a decade ago. I'm so happy to see people are still using it.


Use it on the JVM here, quite the god send, thanks for bringing it into being ;-)

We provide PDF reports generated from various stat pages on our site for NHL and college hockey scouts.

Performance is excellent, no caching required, and the implementation is seamless, just feed html directly to FS and voila, on-the-fly PDF reports.


I ended up wrapping iText with my [clj-pdf](https://github.com/yogthos/clj-pdf) library for doing that sort of thing. I also created a standalone service using it, that accepts JSON and returns a PDF that can be found [here](https://github.com/yogthos/instant-pdf).


I run http://template2pdf.com/ as a side project.

Take a LibreOffice/OpenOffice document as your template and send a simple http request with the values you want to replace (with support for images and what you call 'repeatable parts'). The system will then return a link where you can pick up your generated pdf.


I generate the report as a normal HTML page and then render/export it using PhantomJS. Easiest solution I've found so far and I've done quite a bit of exploring.

https://github.com/ariya/phantomjs/blob/master/examples/rast...


In the one place where I need to create PDFs from code (generating address labels), I use the fpdf port of Python. It is probably too limited for most cases, but if you just need to position some simple images and text it might be enough.

https://code.google.com/p/pyfpdf/



Good reports are hard; I've not found a good open source solution for anything other than basic reports.

Last year I migrated 200+ reports from Crystal Reports to Microsoft ReportViewer Control.

Telerik Reporting is a product that we considered but didn't go with as we had the Microsoft Server.


We use shrimp (https://github.com/adjust/shrimp), which internally uses PhantomJS. It works well, but you will need to handle page breaks yourself, which can become a pain.


We used to do wkhtmltopdf but nowadays we are using weasyprint + django templates. Much better support for css (including CSS Paged Media). See http://weasyprint.org/


In PHP the best one is TCPDF (replaces FPDF) along with FPDI. TCPDF is a library to generate PDF content, FPDI extends TCPDF/FPDF to allow for importing PDF (as templates, etc) for use in your creations.

Works really well.


We use BIRT to design & run our reports. It is flexible and renders in all sorts of ways (including PDF). We have not had major issues with it.


ReportLab.


Thanks for the recommendation. I can't believe I didn't know this existed, especially considering Wikipedia uses it!


This thing is amazing


Generate SVG using an XML DOM library and then convert to PDF using Inkscape on the command-line.
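
A small sketch of the SVG half of that pipeline, using Python's standard-library `xml.etree.ElementTree` as the XML library. The `bar_chart_svg` function, the sizes and the colour are invented for the example, and it assumes at least one positive value. The Inkscape step is not run here: older 0.x releases accepted `inkscape --export-pdf=out.pdf in.svg`, while 1.x releases use `--export-filename`, so check `inkscape --help` for your version.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def bar_chart_svg(values, bar_width=40, height=200):
    """Return an SVG document (as text) with one bar per value."""
    ET.register_namespace("", SVG_NS)  # serialize with a default xmlns
    peak = max(values)  # assumes at least one positive value
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        width=str(bar_width * len(values)),
        height=str(height),
    )
    for i, value in enumerate(values):
        bar_height = height * value / peak
        ET.SubElement(
            svg,
            f"{{{SVG_NS}}}rect",
            x=str(i * bar_width),
            y=str(height - bar_height),  # SVG's y axis points down
            width=str(bar_width - 4),
            height=str(bar_height),
            fill="steelblue",
        )
    return ET.tostring(svg, encoding="unicode")


svg_text = bar_chart_svg([3, 1, 2])
print(svg_text)
```

Writing `svg_text` to a file and handing that file to Inkscape on the command line gives the PDF.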


Somewhat related: what is a good process for turning html-emails into PDFs (or images)?


PDFsharp is a pretty good opensource library for .NET languages.


pandoc is great too. You supply it with e.g. Markdown and it can generate many different output formats. PDF via TeX is one of them.


there is always itext / itextsharp. or pdflib. i think ghostscript does pdf conversions as well.


ReportLab


I use Telerik Reporting.



