Building poppler-utils for CentOS 6.5 (really)

Jake Bathman
Dec 29, 2017

For a number of reasons, one of our production machines is firmly fixed at CentOS 6.5.

This same server also handles big scanned PDFs for us, processing them to individual files and working some magic to log data from each page. It’s a fun setup that I won’t go into here.

Recently, the package we were using to convert PDFs to JPEGs (sejda-console, which aside from this is a really nice tool) started producing some strange results: every image came out as a bizarre white-on-black version of the page, as if an edge-detection filter had been applied. This had something to do with the original PDFs, but we couldn’t find a common thread.

Rather than re-process these big PDFs to maybe prevent this, we switched to another package: poppler, specifically the poppler-utils command pdftoppm.

The switch to poppler was great…until we tried to deploy to production.

Time to Troubleshoot

When something works on one machine and not another, the first thing to compare is the installed version of the package.

As expected, we had a problem:

Production (left) had a very, very old version

Version 0.12.4 is from 2010, and is the latest version served from yum. That’s no good. And, as far as we could figure out, there’s not a more recent version built for CentOS 6.5.

Looking at the poppler changelog, we don’t really need to have version 0.41.0 in production, just the one that does the job for us: pdftoppm with JPEG output support. That was added in 0.13.0, so we picked version 0.13.4 as our build target.
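If you want to script that "is this version new enough?" check, here is a minimal sketch. The helper name version_at_least is made up for this example, and it assumes GNU sort with the -V (version sort) option is available.

```shell
# version_at_least HAVE WANT: succeeds when version HAVE is >= version WANT.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare a few versions against the 0.13.0 minimum.
for v in 0.12.4 0.13.4 0.41.0; do
  if version_at_least "$v" 0.13.0; then
    echo "$v: new enough"
  else
    echo "$v: too old"
  fi
done
```

Version sort matters here because a plain string comparison gets multi-digit components wrong (0.9 would sort after 0.13).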

Building from source

If you’re trying to build poppler-utils for CentOS, or something similar, these are the steps we took to get it working. (I’m writing this as our production server churns through 39 backlogged PDFs and the few thousand JPEG files created by this build, so it’s definitely working.)

Step 1: Get the source

From the command line, let’s start by retrieving the source files. You can put them anywhere; they’re only needed during the build and can be safely deleted when we’re done.

$ cd ~
$ wget https://poppler.freedesktop.org/poppler-0.13.4.tar.gz --no-check-certificate

Note: the --no-check-certificate flag is yucky, but wget won’t work without it as of this writing. If you’d like, you can use another method to get the tarball to your machine, such as curl.
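A curl version of the download might look like the sketch below. The function names are hypothetical, and curl on an old system may well run into the same certificate problem, so treat this as a starting point rather than a guaranteed fix.

```shell
# Build the tarball URL for a given poppler version
# (same URL pattern as the wget command above).
poppler_url() {
  printf 'https://poppler.freedesktop.org/poppler-%s.tar.gz\n' "$1"
}

# Hypothetical helper: download a poppler tarball with curl.
# -f fails on HTTP errors, -L follows redirects, -O keeps the remote file name.
fetch_poppler() {
  curl -fLO "$(poppler_url "$1")"
}

# Show the URL that `fetch_poppler 0.13.4` would request.
poppler_url 0.13.4
```

Running fetch_poppler 0.13.4 from the directory you want the tarball in would then take the place of the wget step.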

Step 2: Unpack the source

$ tar xf poppler-0.13.4.tar.gz
$ cd poppler-0.13.4

Step 3: Prepare & configure with libjpeg

As I said earlier, we were most interested in the pdftoppm command creating a bunch of JPEG files for us. This requires us to include a few extra steps when installing.

You’ll need to install libjpeg and libjpeg-devel from yum:

$ sudo yum install -y libjpeg libjpeg-devel

Once that’s done, we can configure the installation of our package:

$ ./configure --enable-libjpeg

Flags of the form --enable-foo (or --disable-foo) tell the configure script which optional features to include or leave out, and there are many other options. Run ./configure --help for the full list, read through the INSTALL file to learn more, or dive right into the script itself with vim configure.
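To see just the feature switches without reading the whole script, you could filter the help text. This is a sketch: feature_flags is a made-up helper name, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
# Hypothetical helper: read configure help text on stdin and print each
# distinct --enable-* / --disable-* switch it mentions, one per line.
feature_flags() {
  grep -Eo -- '--(enable|disable)-[A-Za-z0-9_-]+' | sort -u
}

# Intended use, from inside the unpacked source directory:
#   ./configure --help | feature_flags
```

Piping the result through grep -i jpeg narrows it down to the JPEG-related switches.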

Step 4: Make and install

Once configured, there are two more commands to run:

$ make
...
$ make install

If make install fails with a permissions error, try again using sudo make install instead.

Step 5: Did it work?

Before we’re done, check to make sure everything worked as expected.

The commands should be installed to /usr/local/bin/, and we can check them by running:

$ /usr/local/bin/pdftoppm -v
pdftoppm version 0.13.4
Copyright 2005-2010 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC

You did it!
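If you would rather check the version from a script, for example in a deploy step, a small filter can pull the number out of that output. This is a sketch; pdftoppm_version is a hypothetical helper name, and the sample text fed to it is the first line of the output shown above.

```shell
# Hypothetical helper: extract the version number from `pdftoppm -v`
# output supplied on stdin.
pdftoppm_version() {
  sed -n 's/^pdftoppm version \([0-9][0-9.]*\).*$/\1/p'
}

# Demo using the first line of the output shown above.
printf '%s\n' 'pdftoppm version 0.13.4' | pdftoppm_version

# Against the real binary, redirect stderr as well so the version text
# is captured whichever stream it is written to:
#   /usr/local/bin/pdftoppm -v 2>&1 | pdftoppm_version
```

One thing to watch for: if the old yum package is still installed, a bare pdftoppm may resolve to that copy depending on your PATH order. Running command -v pdftoppm shows which one wins.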

Step 6: Use it!

In the time it took to write this up, our server has processed half of our PDF backlog. This tool will now handle around 200,000 PDF-to-JPEG conversions per year, all thanks to building from source.
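A batch conversion could be sketched like this. The function names, directory names, and the 150 DPI resolution are assumptions made for illustration; -jpeg and -r are regular pdftoppm options, and the binary path is the one from Step 5.

```shell
# Path to the freshly built binary (from Step 5).
PDFTOPPM=/usr/local/bin/pdftoppm

# Hypothetical helper: derive the output prefix for one PDF,
# e.g. scans/report.pdf + jpegs -> jpegs/report
jpeg_prefix() {
  printf '%s/%s\n' "$2" "$(basename "$1" .pdf)"
}

# Hypothetical helper: convert every PDF in directory $1 to JPEGs in $2.
# pdftoppm writes one image per page, named from the prefix it is given.
convert_all() {
  mkdir -p "$2" || return 1
  for pdf in "$1"/*.pdf; do
    [ -e "$pdf" ] || continue   # skip the unexpanded glob when no PDFs exist
    "$PDFTOPPM" -jpeg -r 150 "$pdf" "$(jpeg_prefix "$pdf" "$2")" || return 1
  done
}

# Show the prefix that would be used for a sample path.
jpeg_prefix scans/report.pdf jpegs
```

Calling convert_all scans jpegs would then process everything in scans/ and stop at the first PDF that fails to convert.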

If you run across any issues with the steps above, or have suggestions on how we might have done this a little easier (besides upgrading from CentOS 6.5), you can find me on Twitter @jakebathman.

