Books Shelf
Photo by Tom Hermans on Unsplash

The Art of Book Creation with Pandoc and Markdown

Self-Publishing Technical Books

Syed Asif
5 min readNov 23, 2023

--

If you’re looking for a way to create a PDF or EPUB from a markdown file, you might have come across Pandoc, a versatile document converter that can convert files between different formats, including Markdown, HTML, LaTeX, and more.

Pandoc is a valuable tool for writers, researchers, and anyone dealing with diverse document formats. It can seamlessly transform documents, making it a great choice for converting Markdown to PDF or restructuring HTML into a different format.

Markdown is a simple markup language that is widely used for web content, documentation, and e-book creation. It’s known for its readability and is a quick, user-friendly way to structure and style text without the complexities of HTML. This tutorial is aimed at those who want to use Pandoc and Markdown for making PDFs and EPUB, especially for technical books that have code snippets.

Installing Pandoc

The process of installing Pandoc depends on your operating system. Below are instructions for the Linux (Debian/Ubuntu) operating system. Open the Terminal, and run the following commands:

$ sudo apt-get update
$ sudo apt-get install pandoc
$ sudo apt-get install texlive-xetex texlive-fonts-recommended texlive-plain-generic

These instructions cover the basic installation process for the Debian-based operating system. Make sure to check the official Pandoc Installation Guide for any additional details or specific requirements related to your system.

Example — Converting Markdown to PDF

Once Pandoc is working on your system, generate a sample PDF without any customization. Let’s say you have a Markdown file named example.md, and you want to convert it to a PDF using Pandoc.

Here’s a simple example:

# Example Document

This is a simple example of a Markdown document.

- It has a list.
- It supports **bold** and *italic* text.

Use the following command to convert the Markdown file to a PDF:

$ pandoc example.md -o example.pdf

This command tells Pandoc to take example.md as input (-o stands for output), and it will generate a PDF file named example.pdf. If you want to create an EPUB file, you could replace example.pfd with example.epub.

After running the command, you should see a new PDF file example.pdf in the same directory.

It is advised to use markdown headers in order without skipping — for example, H1 for the chapter heading and H2 for the chapter subsection, etc.

Enhancing Document Structure

Custom Chapter Breaks

In Pandoc, you can insert chapter breaks by using specific headers in your Markdown document. Typically, you would use a header level that indicates the beginning of a new chapter.

# Chapter 1

This is the content of Chapter 1.

# Chapter 2

This is the content of Chapter 2.

# Chapter 3

This is the content of Chapter 3.

When you convert this Markdown file to a format like PDF using the previous Pandoc command, it will recognize these headers as chapter breaks but no page breaks. We can also customize the appearance of the chapter breaks with a new page in the output by using a custom LaTeX template.

To force page breaks before each section (chapter breaks if you're using headers like # Chapter 1), you can utilize custom LaTeX templates. Here's how you can achieve it using the -H option in the Pandoc command:

\usepackage{sectsty}
\sectionfont{\clearpage}

Using a custom LaTeX template with Pandoc gives you a lot of flexibility in terms of formatting and styling your document. Your Pandoc command with the -H option and the specified LaTeX template file (chapter_break.tex):

$ pandoc example.md -H chapter_break.tex -o example_chapter_break.pdf

The -V option empowers the customization of variable values, enabling the adjustment of settings such as page size, font, link color, and more.

As the number of settings modified increases, using a Python script to invoke Pandoc becomes a more efficient alternative, eliminating the need to manually input lengthy commands in the terminal.

import os
import pypandoc

input_dir = "chapters"
output_dir = "my_book"
output_filename = "book_output.pdf"

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Build the pandoc options as a string
pandoc_cmd = [
"--include-in-header", "chapter_break.tex",
"-V", "linkcolor:blue",
"-V", "geometry:a4paper",
"-V", "geometry:margin=1.8cm",
"-V", "mainfont=DejaVu Serif",
"-V", "monofont=SauceCodePro Nerd Font",
"--pdf-engine=xelatex",
]

output_path = os.path.join(output_dir, output_filename)
# Convert all markdown files in the chapters/ subdirectory.
pypandoc.convert_file(
os.path.join(input_dir, "*.md"), "pdf",
outputfile=output_path,
extra_args=pandoc_cmd,
)

print(f"Conversion completed. Output saved to: {output_path}")
  • mainfont is for normal text
  • monofont is for code snippets
  • geometry for page size and margins
  • linkcolor to set the hyperlink color

The Pandoc invocation is now through a script:

$ python3 md2pf.py

Syntax highlighting

To customize syntax highlighting for code snippets, one option is to save a Pandoc theme and edit it accordingly.

$ pandoc --print-highlight-style=pygments > pygments.theme

Edit the generated file, in this demonstration, the following settings are modified:

# Background color changed to a shade of gray for easy code-text distinction
"background-color": "#f8f8f8",

# Italic switched to false to avoid issues with comments
# Comment text-color changed to another shade of gray
"Comment": {
"text-color": "#9c9c9c",
"background-color": null,
"bold": false,
"italic": false,
"underline": false
},

For inline code snippets, we are changing the background color, add this snip to your new file (inline_code.tex):

\usepackage{fancyvrb,newverbs,xcolor}

\definecolor{Light}{HTML}{F4F4F4}

\let\oldtexttt\texttt
\renewcommand{\texttt}[1]{
\colorbox{Light}{\oldtexttt{#1}}
}

Include the following options in your Python script and regenerate the PDF again:

# Rest of the script as above...
# Build the pandoc options as a string
pandoc_cmd = [
"--include-in-header", "chapter_break.tex",
"--highlight-style", "pygments.theme",
"--include-in-header", "inline_code.tex"
"-V", "linkcolor:blue",
"-V", "geometry:a4paper",
"-V", "geometry:margin=1.8cm",
"-V", "mainfont=DejaVu Serif",
"-V", "monofont=SauceCodePro Nerd Font",
"--pdf-engine=xelatex",
]

# Rest of the script as above...

Bullet styling

To customize bullet style in Pandoc, you can use a custom LaTeX template for that, rename chapter_break.tex and inline_code.tex to main.tex. Add the below code in your main.tex:

\usepackage{enumitem}
\usepackage{amsfonts}

% level one
\setlist[itemize,1]{label=$\bullet$}
% level two
\setlist[itemize,2]{label=$\circ$}
% level three
\setlist[itemize,3]{label=$\star$}

Table of Contents

Pandoc offers a convenient --toc option to effortlessly append a table of contents at the beginning of your generated PDF.

Customize the depth of the table of contents with the --toc-depth option; the default is set to 3 levels. Additionally, you can personalize the default label Contents to your preference using the -V toc-title option.

There are more options I have added to the script, the script is now:

# Rest of the script as above...
# Build the pandoc options as a string
pandoc_cmd = [
"--toc",
"--number-sections",
"--resource-path=.",
"--metadata-file", "metadata.yml",
"--include-in-header", "main.tex",
"--highlight-style", "pygments.theme",
"-V", "toc-title=Table of contents",
"-V", "linkcolor:blue",
"-V", "geometry:a4paper",
"-V", "geometry:margin=1.8cm",
"-V", "mainfont=DejaVu Serif",
"-V", "monofont=SauceCodePro Nerd Font",
"--pdf-engine=xelatex",
]

# Rest of the script as above...

If you need a template, look at the repository here.

This guide serves as a compass for navigating through the intricate details of Markdown, Pandoc, and the creative customization they provide during the process of book creation with Pandoc. From syntax highlighting to table of contents, unleash your creativity and elevate your writing to new heights.

Reference: learnbyexample

--

--