Puppeteer vs WkHtmlToPdf and why I created a new module

In one of our recent projects at Coletiv, users had to be able to generate a PDF invoice listing the orders they had placed on the platform we built with Elixir. At the time, I didn't know what the best solution would be, or whether there was a native Elixir implementation I could use.

In this article I want to share the steps I followed, the decisions I made along the way to the final solution, and why I ended up creating our own module. Hopefully it will help you solve a similar problem, and maybe encourage you to contribute to the package we created.

I started by researching existing solutions and quickly concluded that the best approach was to design the document in HTML and then convert it to PDF. Although I'm a backend developer at heart, I have enough HTML and CSS skills to handle this task. pdf_generator appeared to be the top choice for Elixir developers who needed to generate PDF files from code. It is a wrapper around wkhtmltopdf, a command-line tool written in C++ that converts HTML files to PDF.

I also found an alternative, a native PDF generator named gutenex, but quickly realized that producing a polished design with it would be much harder than writing HTML (at least if you know HTML and CSS). The module also hadn't seen any activity in over two years, which was another reason we decided to go with pdf_generator instead.

WkHtmlToPdf

This tool is widely used as one of the main ways to convert HTML to PDF. At first I was very satisfied with the output; the only problem was that the HTML table containing the orders was split badly across pages. I ended up calculating how many table rows would fit on each page, but as the design evolved I had to redo those calculations over and over again. This was very error-prone, especially for the other developers on the project, since we had to remember to recheck the calculations every time we made a change.
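
To show why this was so fragile, here is a minimal sketch of that kind of manual pagination, written in Python purely for illustration (it is not the code from our Elixir project). It assumes every table row has the same height, and all the pixel values are hypothetical: 1123px is roughly an A4 page at 96 CSS pixels per inch, and `reserved_height` stands for everything on the page that isn't a table row (header, totals, margins).

```python
import math


def rows_per_page(page_height: float, reserved_height: float, row_height: float) -> int:
    """How many fixed-height rows fit in the space left over by non-table content."""
    usable = page_height - reserved_height
    if usable <= 0 or row_height <= 0:
        raise ValueError("page has no room for table rows")
    return math.floor(usable / row_height)


def paginate(rows: list, per_page: int) -> list:
    """Split rows into consecutive chunks of at most per_page items."""
    return [rows[i:i + per_page] for i in range(0, len(rows), per_page)]


# Hypothetical values: 800px of usable space and 40px rows.
per_page = rows_per_page(page_height=1123, reserved_height=323, row_height=40)
pages = paginate(list(range(45)), per_page)
print(per_page, [len(p) for p in pages])  # 20 [20, 20, 5]
```

Bumping the row padding so that each row becomes 44px instead of 40px silently changes the answer to 18 rows per page, and a long order description that wraps onto two lines breaks the fixed-height assumption altogether. That is exactly the kind of change we kept forgetting to recheck.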

How an HTML table is split across pages by Puppeteer (left) and by wkhtmltopdf (right)

The second issue I encountered was with Unicode characters, such as Chinese characters. The workaround was to convert each of these characters into an HTML entity first so that it would appear in the PDF.
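
As an illustration of that workaround, here is a minimal Python sketch (again, not the Elixir code we used) that relies on the standard `xmlcharrefreplace` error handler to turn every non-ASCII character into a decimal numeric character reference such as `&#20013;`. It assumes the ASCII part of the text has already been HTML-escaped, since characters like `<` and `&` are left untouched.

```python
def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference.

    ASCII characters pass through unchanged, so escaping <, > and & must
    still be done separately before or after this step.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_html_entities("café 中"))  # caf&#233; &#20013;
```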

The third and most annoying issue was that wkhtmltopdf uses the machine's display to render the PDF file. For instance, on a MacBook with a Retina display (2560x1600px, 13") I had to exaggerate sizes in the CSS (e.g. font-size, padding) to get a properly sized PDF instead of a tiny one. For example, I needed font-size: 42px instead of the font-size: 12px I would normally use.

This alone wasn't hard to work around, but when I deployed to a development server, which isn't connected to a screen, the default screen size was much smaller (1366x768), so the layout I had tuned for my 2560x1600px display came out far too big. Combined with the row calculations I already had to maintain, this became unbearable.

I tried to tweak the display size with the --viewport-size option, without success. You can find several related issues in the wkhtmltopdf repository, which has over 1,000 open issues.

Other possible fixes were the --dpi option, combined with --zoom to reduce the overall size of the PDF generated on the server, but in the end I couldn't replicate the design I had tested on my computer.
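
For reference, this is roughly the shape of the invocation I was experimenting with, sketched in Python. It only builds the argument list for the three wkhtmltopdf options mentioned above; the file names and values are placeholders, not settings that solved the problem.

```python
def wkhtmltopdf_command(src: str, dest: str, dpi: int, zoom: float, viewport: str) -> list:
    """Build a wkhtmltopdf argument list with DPI, zoom and viewport overrides."""
    return [
        "wkhtmltopdf",
        "--dpi", str(dpi),
        "--zoom", str(zoom),
        "--viewport-size", viewport,  # width x height, e.g. "1366x768"
        src,
        dest,
    ]


cmd = wkhtmltopdf_command("invoice.html", "invoice.pdf", dpi=300, zoom=0.8, viewport="1366x768")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Because the output still depends on the machine's display, no combination of these values gave me the same result locally and on the server.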

At this point I decided to take a step back and rethink the solution. That's when Puppeteer came into play.

Puppeteer

Following a suggestion from Daniel Ruf about the display size issues, I ended up exploring Puppeteer, a Node library that lets you take screenshots of web pages and generate PDF files using Google Chrome in headless mode.

With Puppeteer I could finally get the same PDF design on both my laptop and the server. Overall, the rendering was better than wkhtmltopdf's; the only drawback so far is file size, which is about 10 times larger than the file generated by wkhtmltopdf.

Puppeteer also let me implement the header and footer in HTML, and after some tweaking of the margin parameters I could finally generate a PDF without doing the row calculations on the template. wkhtmltopdf offers this feature too, but I only noticed it after exploring Puppeteer's options.
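
As a sketch of what those settings look like, here are Puppeteer `page.pdf()` options built as a Python dictionary, assuming they are serialized to JSON and handed unchanged to a small Node script. `format`, `printBackground`, `displayHeaderFooter`, `headerTemplate`, `footerTemplate` and `margin` are real Puppeteer option names, and Puppeteer fills elements with the `pageNumber` and `totalPages` classes on every page; the template markup and margin values themselves are hypothetical.

```python
import json

# Hypothetical header/footer markup. Puppeteer injects the current page number
# and page count into elements carrying the pageNumber / totalPages classes.
HEADER_TEMPLATE = '<div style="font-size:10px; width:100%; text-align:center;">Invoice</div>'
FOOTER_TEMPLATE = (
    '<div style="font-size:10px; width:100%; text-align:right; padding-right:1cm;">'
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span>'
    "</div>"
)


def pdf_options(header: str = HEADER_TEMPLATE, footer: str = FOOTER_TEMPLATE) -> dict:
    """Options intended to be passed as-is to Puppeteer's page.pdf()."""
    return {
        "format": "A4",
        "printBackground": True,
        "displayHeaderFooter": True,
        "headerTemplate": header,
        "footerTemplate": footer,
        # The header and footer are drawn inside the page margins, so the
        # top and bottom margins must be large enough to hold them.
        "margin": {"top": "2cm", "bottom": "2cm", "left": "1cm", "right": "1cm"},
    }


print(json.dumps(pdf_options(), indent=2))
```

Because the header and footer live in the margins, the browser takes care of repeating them and breaking the table across pages, which is what removed the need for the manual row calculations.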

The next step was, naturally, to create an Elixir wrapper (similar to pdf_generator) so that other developers could use Puppeteer the same way.
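
The general idea behind such a wrapper is to run Node as an external process from the host language. Here is a minimal Python sketch of that pattern (in Elixir the equivalent call would go through `System.cmd/3`). `render_pdf.js` is a hypothetical Node script that loads the HTML file with Puppeteer and writes the PDF; it is not part of any published package, and this is not how puppeteer_pdf is necessarily implemented.

```python
import subprocess
from pathlib import Path

# Hypothetical Node script that renders an HTML file to PDF with Puppeteer.
RENDER_SCRIPT = "render_pdf.js"


def build_command(html_path: Path, pdf_path: Path, script: str = RENDER_SCRIPT) -> list:
    """Argument list for running the Node renderer as an external process."""
    return ["node", script, str(html_path), str(pdf_path)]


def render_pdf(html_path: Path, pdf_path: Path, timeout_s: float = 60.0) -> Path:
    """Run the renderer; raises if Node exits with a non-zero status or times out."""
    subprocess.run(
        build_command(html_path, pdf_path),
        check=True,
        capture_output=True,
        timeout=timeout_s,
    )
    return pdf_path
```

Keeping the browser in a separate process means a crash or hang in Chrome surfaces as an error or timeout in the caller instead of taking the application down with it, at the cost of needing Node.js installed on every machine that generates PDFs.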

The new module, puppeteer_pdf, is available on hex.pm and in our GitHub repository.

Puppeteer vs WkHtmlToPdf

To help you choose between the two, here are some reasons to pick one over the other. In the end, it all comes down to your specific needs.

Advantages of using Puppeteer over wkhtmltopdf

  • Better PDF rendering
  • Easier to use
  • Built on well-maintained software (Puppeteer)
  • Fewer Elixir dependencies

Disadvantages of using Puppeteer over wkhtmltopdf

  • Requires Node.js
  • Larger footprint (Node.js plus a Google Chrome download of ~90MB)
  • Generated files are much larger than those produced by wkhtmltopdf

Final thoughts

If you're currently using the pdf_generator wrapper and are happy with the results, there's no need to move away from it. If you're looking for a PDF generator for your Elixir project, take some time to give puppeteer_pdf a try.

Credits

HTML and PDF icon, created by Dimitry Miroliubov

Written by David Magalhães.