Programmatically generating PDF solutions

Deepak Shakya
Gyana Limited
Published Jul 17, 2017

Some time ago I was looking for open source solutions for creating PDFs. I came across several and thought I would share my analysis. Below are six existing PDF generation solutions. Each one is different, and which one you pick will depend on your priorities, project, framework and so on.

1. XHTML2PDF

xhtml2pdf is an HTML-to-PDF converter built on the ReportLab Toolkit, html5lib and pyPdf. It supports HTML 5 and CSS 2.1 (and some of CSS 3). It is written entirely in pure Python, so it is platform independent.

The main benefit of this tool is that a user with web skills like HTML and CSS can generate PDF templates very quickly without learning new technologies.

Support:

Python 2: if xhtml2pdf raises an error after installation, also install this specific version of html5lib: pip install html5lib==1.0b8

Python 3: pip install xhtml2pdf==0.2b1

Git Source: https://github.com/xhtml2pdf/xhtml2pdf

Documentation: http://xhtml2pdf.readthedocs.io/en/stable/usage.html

Pros: Quick and easy way to create purely text-based PDFs.

Quick example:

from xhtml2pdf import pisa  # import python module

# Define your data
sourceHtml = """<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div>
<h1> Hello, world! </h1>
<p>
The quick red fox jumps over the lazy brown dog.
</p>
</div>
</body>
</html>"""
outputFilename = "test.pdf"

# Utility function
def convertHtmlToPdf(sourceHtml, outputFilename):
    # open output file for writing (truncated binary)
    resultFile = open(outputFilename, "w+b")
    # convert HTML to PDF
    pisaStatus = pisa.CreatePDF(
        sourceHtml,        # the HTML to convert
        dest=resultFile)   # file handle to receive result
    # close output file
    resultFile.close()
    # return True on success and False on errors
    return not pisaStatus.err

# Main program
if __name__ == "__main__":
    pisa.showLogging()
    convertHtmlToPdf(sourceHtml, outputFilename)
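
Since the appeal of xhtml2pdf is that ordinary HTML and CSS act as the PDF template, here is a rough sketch of filling such a template with data before converting it. The template, the placeholder names ($title, $rows) and the helper names build_html and write_pdf are assumptions made up for illustration; only pisa.CreatePDF comes from the example above.

```python
import html
from string import Template

# Hypothetical template; the placeholder names are illustrative only.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head>
<style>
  h1 { color: #333333; }
  td { padding: 4px; }
</style>
</head>
<body>
<h1>$title</h1>
<table>
$rows
</table>
</body>
</html>""")


def build_html(title, items):
    """Fill the template with escaped values; items is a list of (name, price) string pairs."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(name), html.escape(price))
        for name, price in items
    )
    return PAGE_TEMPLATE.substitute(title=html.escape(title), rows=rows)


def write_pdf(source_html, output_filename):
    """Render the HTML with xhtml2pdf; returns True when no errors were reported."""
    from xhtml2pdf import pisa  # imported lazily so build_html works without it

    with open(output_filename, "w+b") as result_file:
        status = pisa.CreatePDF(source_html, dest=result_file)
    return not status.err
```

A call such as write_pdf(build_html("Order summary", [("Widget", "4.00")]), "order.pdf") would then produce the file, assuming xhtml2pdf is installed. Escaping the values keeps characters like & and < in the data from breaking the markup.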

2. REPORTLAB

ReportLab offers solutions to generate rich, attractive and fully bespoke PDF documents at high speed. It can serve high-quality personalised documents in real time and supports all kinds of delivery, from web downloads to digital print, from a single API.

Support: Python 2.7 or 3.3+

You need to register here (http://www.reportlab.com/accounts/register/) before you can download the source code.

Source: http://www.reportlab.com/software/downloads/

Bitbucket: https://bitbucket.org/rptlab/reportlab

Documentation: https://www.reportlab.com/documentation/

(reportlab) $ pip install rlextra -i https://www.reportlab.com/pypi
Collecting rlextra
User for www.reportlab.com: <registered user id>
Password: <your password>
Downloading https://www.reportlab.com/pypi/packages/rlextra-3.4.14.tar.gz (9.2MB)
100% |████████████████████████████████| 9.2MB 13.4MB/s
Collecting reportlab>=3.4.14 (from rlextra)
Downloading https://www.reportlab.com/pypi/packages/reportlab-3.4.14.tar.gz (2.0MB)
100% |████████████████████████████████| 2.0MB 11.6MB/s
Collecting Pmw>=2.0.0 (from rlextra)
Downloading https://www.reportlab.com/pypi/packages/Pmw-2.0.0.tar.gz (847kB)
100% |████████████████████████████████| 849kB 11.9MB/s
Collecting preppy>=2.3.5 (from rlextra)
Downloading https://www.reportlab.com/pypi/packages/preppy-2.4.1-py2.py3-none-any.whl
Collecting pyRXP>=2.1.1 (from rlextra)
Downloading https://www.reportlab.com/pypi/packages/pyRXP-2.1.1.tar.gz (251kB)
100% |████████████████████████████████| 256kB 6.4MB/s
Collecting pillow>=2.4.0 (from reportlab>=3.4.14->rlextra)
Downloading https://www.reportlab.com/pypi/packages/Pillow-3.0.0.tar.gz (9.6MB)
100% |████████████████████████████████| 9.6MB 14.3MB/s
Installing collected packages: pillow, reportlab, Pmw, preppy, pyRXP, rlextra
Running setup.py install for pillow … done
Running setup.py install for reportlab … done
Running setup.py install for Pmw … done
Running setup.py install for pyRXP … done
Running setup.py install for rlextra … done
Successfully installed Pmw-2.0.0 pillow-3.0.0 preppy-2.4.1 pyRXP-2.1.1 reportlab-3.4.14 rlextra-3.4.14

Example:

In the reportlab repository, navigate to the following path: reportlab/demos/gadflypaper/gfe.py

Pros: Provides a wide range of support for different data elements in a PDF.
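
For a feel of the API without reading the demo, here is a minimal sketch that draws a few lines of text on one A4 page. It assumes the open source reportlab package is installed (it is also published on PyPI); the helper names line_positions and write_lines, the margins and the line spacing are illustrative choices, not something taken from the ReportLab demos.

```python
def line_positions(line_count, page_height, top_margin=72, leading=14):
    """Return the y coordinate (in points) for each text line, top to bottom.

    PDF coordinates start at the bottom-left corner, so y decreases down the page.
    """
    first_baseline = page_height - top_margin
    return [first_baseline - i * leading for i in range(line_count)]


def write_lines(lines, output_filename):
    """Draw plain text lines on a single A4 page with ReportLab's canvas."""
    from reportlab.lib.pagesizes import A4  # lazy import: needs reportlab installed
    from reportlab.pdfgen import canvas

    _width, height = A4
    pdf = canvas.Canvas(output_filename, pagesize=A4)
    for text, y in zip(lines, line_positions(len(lines), height)):
        pdf.drawString(72, y, text)  # 72 points = 1 inch left margin
    pdf.showPage()
    pdf.save()
```

Calling write_lines(["Hello, world!", "Second line"], "hello.pdf") should produce a one-page PDF. Unlike the HTML-based tools, you position every element yourself, which is where the flexibility comes from.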

3. PHANTOMJS

PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

Main website: http://phantomjs.org/

Download: http://phantomjs.org/download.html

Example:

https://coderwall.com/p/5vmo1g/use-phantomjs-to-create-pdfs-from-html

phantomjs rasterize.js https://www.google.com google.pdf

Pros: Quickly convert any existing webpage to PDF.

Cons: This may not be a great solution if you need to customise the PDF content on the fly.
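
If the conversion has to be triggered from application code rather than typed into a shell, the same command can be wrapped in a small script. This is only a sketch: it assumes phantomjs is on your PATH and that rasterize.js (the example script used in the command above) is in the working directory. The function names are made up for illustration.

```python
import shutil
import subprocess


def build_rasterize_command(url, output_pdf, script="rasterize.js"):
    """Build the argument list for: phantomjs rasterize.js <url> <output>."""
    return ["phantomjs", script, url, output_pdf]


def url_to_pdf(url, output_pdf, script="rasterize.js"):
    """Run PhantomJS and raise if it is missing or exits with an error."""
    if shutil.which("phantomjs") is None:
        raise RuntimeError("phantomjs was not found on PATH")
    subprocess.run(build_rasterize_command(url, output_pdf, script), check=True)
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when the URL contains characters such as & or ?.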

4. jsPDF — A library to generate PDFs in client-side JavaScript.

This is a JavaScript-based solution. I personally like it because it has a very simple, intuitive and well-documented API, and you can create a PDF from scratch.

Github: https://github.com/MrRio/jsPDF

Git (clone): https://github.com/MrRio/jsPDF.git

NPM: https://www.npmjs.com/package/jspdf

Documentation: http://rawgit.com/MrRio/jsPDF/master/docs/

Good examples to start: https://mrrio.github.io/

Live Demo: http://rawgit.com/MrRio/jsPDF/master/

Quick sample code:

var doc = new jsPDF();
doc.text(20, 20, 'Hello world!');
doc.text(20, 30, 'This is client-side Javascript, pumping out a PDF.');
doc.addPage();
doc.text(20, 20, 'Do you like that?');

// Output as Data URI
doc.output('datauri');
doc.save('test.pdf');

Pros: A very simple, intuitive and well-documented API; you can create a PDF from scratch.

5. WeasyPrint

WeasyPrint is a visual rendering engine for HTML and CSS that can export to PDF. It aims to support web standards for printing. WeasyPrint is free software made available under a BSD license.

Support: Python 2.7 or 3.3+

Github: https://github.com/Kozea/WeasyPrint

Tutorial: http://weasyprint.readthedocs.io/en/stable/tutorial.html

Documentation: https://weasyprint.readthedocs.io/en/stable/

API: http://weasyprint.readthedocs.io/en/stable/api.html

Dependencies (macOS, using Homebrew):

brew install python3 cairo pango gdk-pixbuf libffi

Installation:

(weasyprint) $ pip install weasyprint

After installation, quick check:

(weasyprint) $ weasyprint --help
usage: weasyprint [-h] [--version] [-e ENCODING] [-f {pdf,png}]
                  [-s STYLESHEET] [-m MEDIA_TYPE] [-r RESOLUTION]
                  [--base-url BASE_URL] [-a ATTACHMENT] [-p]
                  input output

Renders web pages to PDF or PNG.

positional arguments:
  input                 URL or filename of the HTML input, or - for stdin
  output                Filename where output is written, or - for stdout

optional arguments:
  -h, --help            show this help message and exit
  --version             Print WeasyPrint's version number and exit.
  -e ENCODING, --encoding ENCODING
                        Character encoding of the input
  -f {pdf,png}, --format {pdf,png}
                        Output format. Can be omitted if `output` ends with a
                        .pdf or .png extension.
  -s STYLESHEET, --stylesheet STYLESHEET
                        URL or filename for a user CSS stylesheet. May be
                        given multiple times.
  -m MEDIA_TYPE, --media-type MEDIA_TYPE
                        Media type to use for @media, defaults to print
  -r RESOLUTION, --resolution RESOLUTION
                        PNG only: the resolution in pixel per CSS inch.
                        Defaults to 96, one PNG pixel per CSS pixel.
  --base-url BASE_URL   Base for relative URLs in the HTML input. Defaults to
                        the input's own filename or URL or the current
                        directory for stdin.
  -a ATTACHMENT, --attachment ATTACHMENT
                        URL or filename of a file to attach to the PDF
                        document
  -p, --presentational-hints
                        Follow HTML presentational hints.

Example to try:

weasyprint http://weasyprint.org ./weasyprint-website.pdf
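
WeasyPrint can also be driven from Python instead of the command line. The sketch below builds a small HTML page with a print stylesheet (an @page rule for paper size and margins) and hands it to WeasyPrint's HTML class. The page content, the CSS values and the helper names build_page and write_pdf are assumptions chosen for illustration.

```python
import html

# Print-oriented CSS: @page controls the paper size and margins of the PDF.
PRINT_CSS = """
@page { size: A4; margin: 2cm; }
h1 { page-break-after: avoid; }
"""


def build_page(title, paragraphs):
    """Return a complete HTML document with the print stylesheet embedded."""
    body = "\n".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<style>{css}</style>\n</head>\n"
        "<body>\n<h1>{title}</h1>\n{body}\n</body>\n</html>"
    ).format(css=PRINT_CSS, title=html.escape(title), body=body)


def write_pdf(page_html, output_filename):
    """Render the HTML string to a PDF file with WeasyPrint."""
    from weasyprint import HTML  # lazy import: needs weasyprint installed

    HTML(string=page_html).write_pdf(output_filename)
```

For example, write_pdf(build_page("Report", ["First paragraph."]), "report.pdf") should write an A4 document with 2cm margins, assuming WeasyPrint and its dependencies are installed.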

6. LaTeX

LaTeX, which is pronounced «Lah-tech» or «Lay-tech» (to rhyme with «blech» or «Bertolt Brecht»), is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents but it can be used for almost any form of publishing.

Website: https://www.latex-project.org/

Download: https://www.latex-project.org/get/

It is supported on all platforms.

You can install the full version (MacTeX, 3.5GB) or a smaller, limited version (BasicTeX), which is approximately 72MB.

Sample latex file:

\documentclass{article}
\usepackage{graphicx}

\begin{document}

\title{Introduction to \LaTeX{}}
\author{Author's Name}

\maketitle

\begin{abstract}
This is abstract text: This simple document shows very basic features of
\LaTeX{}.
\end{abstract}


\section{Introduction}

Here is the text of your introduction. We use some Latin nonsense text to fill
the paragraphs. This way the resulting document will look more like an actual
scientific paper or so. Here is an equation:

\begin{equation}
\label{simple_equation}
\alpha = \sqrt{ \beta }
\end{equation}

Now you don't need to read the text any further because it's just Lorem ipsum.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit
amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore
et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
officia deserunt mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.


\subsection{Subsection Heading Here}

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.


\section{Conclusion}

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur.


\end{document}

Command to compile it:

$ pdflatex sample_latex.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
(./sample_latex.tex
LaTeX2e <2017-04-15>
Babel <3.10> and hyphenation patterns for 22 language(s) loaded.
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/base/article.cls
Document Class: article 2014/09/29 v1.4h Standard LaTeX document class
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/base/size10.clo))
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/graphics/graphicx.sty
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/graphics/keyval.sty)
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/graphics/graphics.sty
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/graphics/trig.sty)
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/graphics-cfg/graphics.cfg)
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/graphics-def/pdftex.def
(/usr/local/texlive/2017basic/texmf-dist/tex/generic/oberdiek/infwarerr.sty)
(/usr/local/texlive/2017basic/texmf-dist/tex/generic/oberdiek/ltxcmds.sty))))
(./sample_latex.aux)
(/usr/local/texlive/2017basic/texmf-dist/tex/context/base/mkii/supp-pdf.mkii
[Loading MPS to PDF converter (version 2006.09.02).]
) (/usr/local/texlive/2017basic/texmf-dist/tex/generic/oberdiek/pdftexcmds.sty
(/usr/local/texlive/2017basic/texmf-dist/tex/generic/oberdiek/ifluatex.sty)
(/usr/local/texlive/2017basic/texmf-dist/tex/generic/oberdiek/ifpdf.sty))
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/oberdiek/epstopdf-base.sty
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/oberdiek/grfext.sty
(/usr/local/texlive/2017basic/texmf-dist/tex/generic/oberdiek/kvdefinekeys.sty)
) (/usr/local/texlive/2017basic/texmf-dist/tex/latex/oberdiek/kvoptions.sty
(/usr/local/texlive/2017basic/texmf-dist/tex/generic/oberdiek/kvsetkeys.sty
(/usr/local/texlive/2017basic/texmf-dist/tex/generic/oberdiek/etexcmds.sty)))
(/usr/local/texlive/2017basic/texmf-dist/tex/latex/latexconfig/epstopdf-sys.cfg
)) [1{/usr/local/texlive/2017basic/texmf-var/fonts/map/pdftex/updmap/pdftex.map
}] [2] (./sample_latex.aux) )</usr/local/texlive/2017basic/texmf-dist/fonts/typ
e1/public/amsfonts/cm/cmbx12.pfb></usr/local/texlive/2017basic/texmf-dist/fonts
/type1/public/amsfonts/cm/cmbx9.pfb></usr/local/texlive/2017basic/texmf-dist/fo
nts/type1/public/amsfonts/cm/cmex10.pfb></usr/local/texlive/2017basic/texmf-dis
t/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/local/texlive/2017basic/texmf
-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb></usr/local/texlive/2017basic/te
xmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb></usr/local/texlive/2017basic
/texmf-dist/fonts/type1/public/amsfonts/cm/cmr17.pfb></usr/local/texlive/2017ba
sic/texmf-dist/fonts/type1/public/amsfonts/cm/cmr6.pfb></usr/local/texlive/2017
basic/texmf-dist/fonts/type1/public/amsfonts/cm/cmr9.pfb>
Output written on sample_latex.pdf (2 pages, 95855 bytes).
Transcript written on sample_latex.log.

Pros: Very fine-grained control over every aspect of your PDF.

Cons: It can take some time to become familiar with LaTeX syntax before you can use its full potential.
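
To generate LaTeX documents programmatically, one option is to build the .tex source from a script and then call pdflatex on it. The sketch below is a simplified illustration, not a complete solution: the helper names and the generated.tex filename are made up, and it assumes pdflatex is on your PATH. Escaping matters because characters such as %, & and _ have special meaning in LaTeX.

```python
import subprocess

# Characters with special meaning in LaTeX and their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def build_document(title, author, body):
    """Return a minimal article-class document with escaped user text."""
    lines = [
        r"\documentclass{article}",
        r"\begin{document}",
        r"\title{" + latex_escape(title) + "}",
        r"\author{" + latex_escape(author) + "}",
        r"\maketitle",
        latex_escape(body),
        r"\end{document}",
    ]
    return "\n".join(lines) + "\n"


def compile_pdf(tex_source, tex_filename="generated.tex"):
    """Write the source to disk and compile it with pdflatex."""
    with open(tex_filename, "w", encoding="utf-8") as handle:
        handle.write(tex_source)
    # nonstopmode keeps pdflatex from waiting for input when it hits an error
    subprocess.run(["pdflatex", "-interaction=nonstopmode", tex_filename], check=True)
```

Escaping character by character, rather than with chained replace calls, avoids escaping the backslashes that earlier replacements introduced.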

I wanted to put all these solutions together on one page, and I hope it is useful for anyone looking for something similar.

Deepak Shakya
Gyana Limited

COO at Samudra Oceans (Climate-tech startup), Hatha Yoga teacher in London, U.K.