Converting HTML into PDF in Java
Using Open-Source Libraries: Jsoup and Flying Saucer
I recently needed to convert an HTML file into a PDF file in Java using free, open-source libraries. In this post, I will walk you through my setup process:
- Installing Maven using Homebrew and configuring $JAVA_HOME
- Setting up a Maven project in IntelliJ and installing jars needed for our code
- Code to convert HTML into PDF
Installing and Configuring Maven
If you haven’t already, install Homebrew. Then we will install Maven with Homebrew.
brew install maven
Add the following line to ~/.bash_profile:
export JAVA_HOME=$(/usr/libexec/java_home)
Then in the Terminal, run:
source ~/.bash_profile
To check that $JAVA_HOME is set correctly:
echo $JAVA_HOME
This should give you something like /Library/Java/JavaVirtualMachines/adoptopenjdk-13.0.1.jdk/Contents/Home
Configuring a Maven project in IntelliJ
Create a new project in IntelliJ and select Maven. From this step on, there are two common errors you may encounter, and I will show you how to resolve them.
Error: release version not supported
Create a Java file in src/main/java and have it print out Hello World!
Run the program. If you are using JDK 8+, you may run into the error Error:java: error: release version 5 not supported. There is a post on Dev.to about resolving this error.
In the pom.xml file, add the following lines. Use 1.8 for JDK 8, but the plain version number for JDK 11 and later (11 for JDK 11, 13 for JDK 13, etc.). I am using JDK 13.
<properties>
    <maven.compiler.source>13</maven.compiler.source>
    <maven.compiler.target>13</maven.compiler.target>
</properties>
Press Shift + Cmd + A (on Mac) or go to Help > Find Action to bring up the Actions menu, then type Reimport All Maven Projects and run it.
Now rerun the Java program to ensure that Hello World is printed correctly.
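If you are unsure which JDK the project is actually running on, a small variation on the Hello World program can tell you. This is only a sketch: the class name JdkVersionCheck and the helper specVersion are made up for illustration. It relies on the standard java.specification.version system property, which is "1.8" on JDK 8 and a plain number such as "11" or "13" on later JDKs, so it happens to match the form the compiler expects in pom.xml.

```java
// Hypothetical helper class; the name is an illustrative choice, not part of any library.
class JdkVersionCheck {

    // Returns "1.8" on JDK 8 and a plain number ("11", "13", ...) on JDK 9 and later.
    static String specVersion() {
        return System.getProperty("java.specification.version");
    }

    public static void main(String[] args) {
        System.out.println("Hello World!");
        // Should point at the same JDK that $JAVA_HOME refers to.
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("java.specification.version = " + specVersion());
    }
}
```

If the printed version is not the one you expected, check the project SDK in IntelliJ before changing pom.xml.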
Next, let’s add the jar files our HTML to PDF code depends on to pom.xml. Add the following lines.
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.xhtmlrenderer/flying-saucer-core -->
    <dependency>
        <groupId>org.xhtmlrenderer</groupId>
        <artifactId>flying-saucer-core</artifactId>
        <version>9.1.20</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.xhtmlrenderer/flying-saucer-pdf-openpdf -->
    <dependency>
        <groupId>org.xhtmlrenderer</groupId>
        <artifactId>flying-saucer-pdf-openpdf</artifactId>
        <version>9.1.20</version>
    </dependency>
</dependencies>
Maven Error: invalid target release
In IntelliJ’s Terminal, run this command to install the dependencies.
mvn install
This may fail with an error such as error: invalid target release: 1.13, which is what the compiler reports when a release after JDK 10 is written in the 1.x form. This blog post has a solution for this.
Add the following plugin to pom.xml and do Reimport All Maven Projects. Note that the source and target should be 1.8 for JDK 8, 1.10 for JDK 10, but 11 for JDK 11, 12 for JDK 12, 13 for JDK 13, etc.
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.1</version>
            <configuration>
                <source>13</source>
                <target>13</target>
            </configuration>
        </plugin>
    </plugins>
</build>
Then mvn install should finish without errors.
Converting HTML to PDF
We need two steps: first, convert HTML to XHTML with Jsoup; second, convert XHTML to PDF with Flying Saucer. XHTML is a syntactically stricter version of HTML: it has to be well-formed XML, so every tag must be closed. For instance, HTML allows an unclosed <img src=''>, but XHTML requires the self-closing form <img src='' />.
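To see what the stricter syntax means in practice, here is a small sketch that uses only the JDK's built-in XML parser, not Jsoup or Flying Saucer. The class XhtmlCheck and the method isWellFormed are hypothetical names for illustration. It assumes that a strict XML parser is a fair stand-in for the kind of input the XHTML step needs to produce.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; it is not part of the conversion code in this post.
class XhtmlCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the markup
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p><img src=''></p>"));    // false: <img> is never closed
        System.out.println(isWellFormed("<p><img src='' /></p>")); // true: self-closing form
    }
}
```

The first snippet is acceptable HTML but fails as XML, which is the gap the Jsoup step is meant to close.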
HTML to XHTML
Add Jsoup as a dependency to pom.xml:
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.13.1</version>
</dependency>
Import Jsoup:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
Create a method to convert HTML to XHTML:
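private static String htmlToXhtml(String html) {
    // Jsoup parses lenient HTML and repairs missing or unclosed tags.
    Document document = Jsoup.parse(html);
    // Serialize with XML syntax so that elements such as <img> come out closed.
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    return document.html();
}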
XHTML to PDF
Flying Saucer should already be in pom.xml from the earlier step. If it is not, add these dependencies:
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>flying-saucer-core</artifactId>
<version>9.1.20</version>
</dependency>
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>flying-saucer-pdf-openpdf</artifactId>
<version>9.1.20</version>
</dependency>
Import Flying Saucer:
import org.xhtmlrenderer.pdf.ITextRenderer;
import java.io.*; // for file I/O
Create a method to convert XHTML to PDF:
private static void xhtmlToPdf(String xhtml, String outFileName) throws IOException {
    File output = new File(outFileName);
    ITextRenderer iTextRenderer = new ITextRenderer();
    iTextRenderer.setDocumentFromString(xhtml);
    iTextRenderer.layout();
    // try-with-resources closes the stream even if createPDF throws.
    try (OutputStream os = new FileOutputStream(output)) {
        iTextRenderer.createPDF(os);
    }
}
Optionally, you can register custom fonts used in the HTML. Right after the line ITextRenderer iTextRenderer = new ITextRenderer(); add the following (the second argument asks for the font to be embedded in the PDF):
iTextRenderer.getFontResolver().addFont("MyFont.ttf", true);
Putting Both Methods Together
public static void main(String[] args) throws IOException {
String html = "<h1>hello</h1>";
String xhtml = htmlToXhtml(html);
xhtmlToPdf(xhtml, "output.pdf");
}
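The main method above converts a hard-coded string. Since the goal was to convert an HTML file, here is a minimal sketch of the file-handling part using only the JDK. The class HtmlFileInput, the helpers readHtml and pdfNameFor, and the default file name input.html are all hypothetical, and the file is assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; none of these names come from Jsoup or Flying Saucer.
class HtmlFileInput {

    // Reads the whole file into a String, assuming UTF-8 encoding.
    static String readHtml(Path htmlFile) throws IOException {
        return new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
    }

    // "report.html" becomes "report.pdf"; a name without an extension just gets ".pdf" appended.
    static String pdfNameFor(String htmlFileName) {
        int dot = htmlFileName.lastIndexOf('.');
        String base = (dot > 0) ? htmlFileName.substring(0, dot) : htmlFileName;
        return base + ".pdf";
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args.length > 0 ? args[0] : "input.html"); // assumed default name
        String html = readHtml(input);
        // Put the PDF next to the input file.
        Path output = input.resolveSibling(pdfNameFor(input.getFileName().toString()));
        System.out.println("Read " + html.length() + " characters; PDF would be written to " + output);
        // In the program from this post, the next two calls would be:
        //   String xhtml = htmlToXhtml(html);
        //   xhtmlToPdf(xhtml, output.toString());
    }
}
```

Keeping the file handling separate from the two conversion methods means they can still be called with a plain string, as in the main method above.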
To try a fuller document, I downloaded a font called Butterfly.ttf and used it in the following HTML:
<head>
<style>
@font-face {
font-family: "Butterfly";
src: url("Butterfly.ttf");
}
.butterfly {
font-family: "Butterfly";
}
</style>
</head>
<body>
<h1>Hello world</h1>
<img src="https://www.w3schools.com/w3css/img_lights.jpg">
<p>Regular text</p>
<p style="color:red">Red text</p>
<b>Bold text</b>
<p class="butterfly">Fancy font</p>
</body>