Playing spot the difference with PDFs

Alex Rowan
Published in Code & Wild · 4 min read · Aug 10, 2021
Substituting the gift card image with a “spot the difference” picture I found, we can see that the monkey’s tail has moved!

We had a problem at Bloom & Wild: how do we know if we’ve broken our PDF labels? To us, a broken label is pretty serious. It can bring our entire delivery process to a halt, potentially causing long delays for our customers. The label contains all the shipping information, the gift card image and the customer message. If we accidentally start producing labels with misaligned barcodes or illegible customer messages, they have to be reprinted, and this is a no-go.

Originally, this meant manual regression testing, which was very time-consuming. We needed a way to automate this process and, ideally, add it to the build pipeline.

Here I’ll walk you through some of the steps we used to solve this problem.

The tests

The simplest way to automate this process and add it to our build pipeline is to make it part of our tests. This lets developers test locally before they push any changes, and also ensures that any changes that do get pushed are non-breaking.

Our tests require us to check multiple variants of the same label, so we created a variants.json with various config options:

[
  {
    "id": "variant_1",
    "payload": {
      // business logic for building variant 1 pdf
    }
  },
  {
    "id": "variant_2",
    "payload": {
      // business logic for building variant 2 pdf
    }
  }...

This’ll allow us to easily modify what we’re testing in the future.
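Because the whole suite is driven by this file, it can help to fail fast when an entry is malformed. The following is a minimal sketch, not part of our actual suite: load_variants is a hypothetical helper, and the rule that every entry needs a unique id and a payload is an assumption based on the structure shown above.

```ruby
require "json"

# Hypothetical helper (an assumption, not from the original suite):
# parses the variants JSON and checks that every entry has an id and
# a payload, and that no id is used twice.
def load_variants(json_text)
  variants = JSON.parse(json_text, symbolize_names: true)
  raise ArgumentError, "expected a JSON array of variants" unless variants.is_a?(Array)

  variants.each do |variant|
    unless variant.is_a?(Hash) && variant.key?(:id) && variant.key?(:payload)
      raise ArgumentError, "each variant needs an id and a payload: #{variant.inspect}"
    end
  end

  # Duplicate ids would make two variants write to the same PDF path
  ids = variants.map { |variant| variant[:id] }
  duplicates = ids.select { |id| ids.count(id) > 1 }.uniq
  raise ArgumentError, "duplicate variant ids: #{duplicates.join(', ')}" unless duplicates.empty?

  variants
end
```

In a spec, this could stand in for the plain JSON.parse call, for example load_variants(File.read("variants.json")). Note that a real JSON file cannot contain the // comments used in the illustration above.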

Using RSpec, we can easily loop through the variants.json:

require "json"

RSpec.describe "PDF comparison" do
  # Load every label variant we want to check
  variants = JSON.parse(
    File.read("variants.json"),
    symbolize_names: true
  )

  variants.each do |variant|
    context "Variant: #{variant[:id]}" do
      let(:variant_id) { variant[:id] }

      # Generate a fresh test PDF for this variant
      before do
        label_generator.generate_variant_pdf(
          variant[:payload],
          "/tmp/#{variant[:id]}.pdf"
        )
      end

      it "Generates PDF that matches Reference" do
        reference_path = "fixtures/#{variant_id}.pdf"
        test_path = "/tmp/#{variant_id}.pdf"
        result = PdfCompare.new(
          reference_path,
          test_path
        ).match?

        expect(result).to be(true)
      end
    end
  end
end

We start by reading the JSON file and symbolizing all the keys:

variants = JSON.parse(
  File.read("variants.json"),
  symbolize_names: true
)

Next we loop through the variants and generate a test PDF. We’ll make sure to save this test PDF in the /tmp directory for later use:

before do
  label_generator.generate_variant_pdf(
    variant[:payload],
    "/tmp/#{variant_id}.pdf"
  )
end

Here, we pass the reference_path and the newly generated test_path to the PdfCompare class. The expectation of this test is that the test PDF will match the reference PDF.

it "Generates PDF that matches Reference" do
  reference_path = "fixtures/#{variant_id}.pdf"
  test_path = "/tmp/#{variant_id}.pdf"
  result = PdfCompare.new(
    reference_path,
    test_path
  ).match?

  expect(result).to be(true)
end

The PdfCompare class

The aim of this class is to take two PDF files and compare them. However, there are some caveats to consider:

  1. PDF files can’t be directly compared.
  2. PDF files can contain multiple pages.

To address both of these caveats, we used the Grim gem. Here we read the PDF, then save each page as a PNG in a temp directory.

def convert_to_png(pdf_path, key)
  pdf = Grim.reap(pdf_path)
  pdf.each_with_index.map do |page, index|
    temp_path = temp_png_path(key, index)
    page.save(temp_path,
      colorspace: "Gray",
      alpha: "Remove"
    )
    temp_path
  end
end

Note that we also remove the alpha layer and convert the colours to greyscale. This becomes important later on when we compare the images.

Calling this method for both the reference PDF and the test PDF gives us two arrays of paths to the temp page images:

[
  [
    "/tmp/ref_variant_1-0.png",
    "/tmp/ref_variant_1-1.png"
  ],
  [
    "/tmp/test_variant_1-0.png",
    "/tmp/test_variant_1-1.png"
  ]
]

We used the ref and test prefixes to tell reference images apart from test ones. The 0 and 1 at the end of each file name indicate the page index.
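The temp_png_path helper called from convert_to_png is not shown in this post. As a rough sketch of how the naming scheme above could be produced, here is a hypothetical version; the variant_id and dir keyword arguments are assumptions added for illustration and are not part of the original method signature.

```ruby
require "tmpdir"

# Hypothetical version of the temp_png_path helper used by convert_to_png.
# The variant_id and dir keywords are assumptions for illustration only.
#
# key   - "ref" or "test", marks which PDF the page came from
# index - zero-based page index within that PDF
def temp_png_path(key, index, variant_id:, dir: Dir.tmpdir)
  File.join(dir, "#{key}_#{variant_id}-#{index}.png")
end
```

Keeping the prefix and the page index in the file name means a failing page can be traced back to its source PDF just by looking at the temp directory.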

The match? method is called from our tests and is expected to return true if the two PDFs match.

def match?
  page_paths = []
  page_paths << convert_to_png(reference_pdf_path, "ref")
  page_paths << convert_to_png(test_pdf_path, "test")
  compare(*page_paths)
end

Here we’re taking the two starting PDF paths, and resolving all the page image paths. These two arrays then get passed onto the compare method.

def compare(reference_paths, test_paths)
  matcher = Imatcher::Matcher.new(
    mode: :grayscale,
    threshold: 0
  )
  fails = []
  test_paths.each_with_index do |test_path, index|
    reference_path = reference_paths[index]
    result = matcher.compare(reference_path, test_path)
    unless result.match?
      fails << result
      path = temp_fail_path(index)
      result.difference_image.save(path)
    end
  end
  fails.empty?
end

Here we use the Imatcher gem. We found the best comparison mode for this use case was grayscale, as it provides the most accurate fault detection.
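To give a feel for what a threshold of 0 means, here is a simplified, pure-Ruby illustration. It is not Imatcher’s implementation: it assumes each image has already been reduced to a flat array of greyscale values (0 to 255) and that the threshold is the maximum allowed ratio of differing pixels. The method names and the tolerance parameter are hypothetical.

```ruby
# Simplified illustration only, not how Imatcher works internally.
# Each image is assumed to be a flat array of greyscale values (0..255).

# Returns the fraction of pixels whose brightness differs by more
# than the given tolerance.
def difference_ratio(reference_pixels, test_pixels, tolerance: 0)
  unless reference_pixels.length == test_pixels.length
    raise ArgumentError, "images must have the same number of pixels"
  end
  return 0.0 if reference_pixels.empty?

  differing = reference_pixels.zip(test_pixels).count do |reference, test|
    (reference - test).abs > tolerance
  end
  differing.to_f / reference_pixels.length
end

# With threshold 0, a single differing pixel is enough to fail the match.
def pixels_match?(reference_pixels, test_pixels, threshold: 0)
  difference_ratio(reference_pixels, test_pixels) <= threshold
end
```

Under these assumptions, a threshold of 0 is the strictest setting: any pixel that changes between the reference and the test page counts as a failure.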

The aim of the compare method is to take the array of test paths and compare them with their counterpart in the reference paths array. If the matcher decides that the two do not match, we record that in our fails array and save the “difference” image to a temp directory.

Example of “difference” image:

Here you can see that the barcode is missing from our test image: the missing area is highlighted in red.

The compare method then returns false if the fails array contains any items. This results in a failing test, and the build fails.
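One edge case worth considering: compare iterates over the test pages only, so if the test PDF has fewer pages than the reference, the extra reference pages are never compared, and if it has more, the extra test pages have no reference to compare against. A minimal sketch of a guard that could run before the comparison is shown below; unmatched_page_indexes is a hypothetical helper, not part of the original class.

```ruby
# Hypothetical guard (not part of the original PdfCompare class).
# Returns the indexes of pages that exist in only one of the two PDFs.
def unmatched_page_indexes(reference_paths, test_paths)
  page_count = [reference_paths.length, test_paths.length].max
  (0...page_count).select do |index|
    reference_paths[index].nil? || test_paths[index].nil?
  end
end
```

If this returns a non-empty array, match? could return false straight away, treating a changed page count as a mismatch.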

This solution solves our problem, and gives us extra confidence when making minor changes to the code base.

If you’d like to know more about how we write code at Bloom & Wild, or you’d like to join the team, reach out to us on Twitter or take a look at our careers site.
