PDF content verification in Playwright

Published in

BosphorusISS

Dec 19, 2022

Are you looking for a way to verify the content of a PDF file with Playwright and Node.js? If so, you're in the right place! In this blog post, we'll go over how to use Playwright, pdf-parse (a JavaScript library for parsing PDF files and extracting their text) and fs (a built-in Node.js module for working with the file system) to verify the contents of a PDF file.

Verifying PDF files is an important part of web application testing, because PDFs often carry the most important information delivered to users, such as invoices, contracts, and other legal documents.

Playwright tests run in a Node.js environment

First, I would like to introduce Playwright, an open-source automation framework for testing web applications. Like other open-source test frameworks, it has no built-in verification for PDFs or other local files. Although Playwright is powerful compared to other automation tools, PDF testing is one of its weak spots. That's why I've written this blog post.

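Now that we have a basic understanding of what Playwright is, let's go over how to use it to verify the content of a PDF file. First, we need to set up a Playwright project and install pdf-parse via npm (the code in this post uses the pdf-parse 1.x API). fs is built into Node.js, so it does not need to be installed: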


npm init playwright@latest
npm install pdf-parse@1

I will verify that the text content of a simple PDF matches an expected text file, using Playwright together with fs and pdf-parse. First, we need to extract the content once with pdf-parse to create the expected-value file (expected.txt).

The web UI is used to download the PDF file, just as a real user would:

Navigate the browser to the PDF link and download the PDF file into the ExportData directory

Then I print the PDF's text content to the console using pdf-parse:
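The printing step can be sketched as a small standalone Node.js script. This is a minimal sketch, not the author's original code: it assumes the pdf-parse 1.x function-style API (pdf(buffer) resolving to an object with a text property), and the script name print-pdf-text.js is hypothetical. The parser is passed in as a parameter, defaulting to pdf-parse, so it can be swapped for a stub.

```javascript
const fs = require('fs');

// Reads a PDF from disk and resolves to its extracted text.
// Assumption: `parse` follows the pdf-parse 1.x API, parse(buffer) -> Promise<{ text, ... }>.
// The default is only required when no parser is passed in.
async function extractPdfText(pdfPath, parse = require('pdf-parse')) {
  const dataBuffer = fs.readFileSync(pdfPath);
  const data = await parse(dataBuffer);
  return data.text;
}

// Hypothetical usage: node print-pdf-text.js ExportData/sample.pdf
const pdfPathArg = process.argv[2];
if (pdfPathArg) {
  extractPdfText(pdfPathArg)
    .then((text) => console.log(text))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Reading this printed text carefully is the one manual step of the whole approach.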

Output:

 A Simple PDF File 
 This is a small demonstration .pdf file - 
 just for use in the Virtual Mechanics tutorials. More text. And more 
 text. And more text. And more text. And more text. 
 And more text. And more text. And more text. And more text. And more 
 text. And more text. Boring, zzzzz. And more text. And more text. And 
 more text. And more text. And more text. And more text. And more text. 
 And more text. And more text. 
 And more text. And more text. And more text. And more text. And more 
 text. And more text. And more text. Even more. Continued on page 2 ...

 Simple PDF File 2 
 ...continued from page 1. Yet more text. And more text. And more text. 
 And more text. And more text. And more text. And more text. And more 
 text. Oh, how boring typing this stuff. But not as boring as watching 
 paint dry. And more text. And more text. And more text. And more text. 
 Boring.  More, a little more text. The end, and just as well.

This content needs to be verified manually once; it is then saved as expected.txt in the ExportData directory.
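Saving the verified text as the baseline can be sketched in a few lines. This helper is an assumption, not part of the original post: the name saveExpectedBaseline is hypothetical, and it writes with the 'wx' flag so that fs.writeFileSync throws EEXIST instead of silently overwriting a baseline that was already checked by hand.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: saves manually verified PDF text as <exportDir>/expected.txt.
// The 'wx' flag makes the write fail with EEXIST if expected.txt already exists,
// so an approved baseline is never overwritten by accident.
function saveExpectedBaseline(text, exportDir = 'ExportData') {
  fs.mkdirSync(exportDir, { recursive: true });
  const expectedPath = path.join(exportDir, 'expected.txt');
  fs.writeFileSync(expectedPath, text, { encoding: 'utf-8', flag: 'wx' });
  return expectedPath;
}
```

To refresh the baseline on purpose, delete expected.txt first and verify the new text again.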
Now that we have an expected.txt file to compare with the actual data, we can create an automated Playwright test that compares the content of the downloaded PDF with the expected content. By the end of this post, the directory hierarchy will look like this:

- ExportData
  - actual.txt
  - expected.txt
  - sample.pdf
- node_modules
- playwright-report
- tests
  - example.spec.js 
- package-lock.json
- package.json
- playwright.config.js

example.spec.js

In the code snippet below, the first line imports the test and expect functions from Playwright's @playwright/test module, and the second line imports the fs module, which is used to read and write files on the file system:

The code below defines a test case with Playwright's test function. The test navigates with page.goto, then waits for the download event to be triggered while clicking a link that downloads the PDF file:

Create a test block and download the PDF file
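The download step can also be pulled into a small helper. This is a sketch of an alternative structure, not the author's code: downloadPdf and its triggerDownload callback are hypothetical names. It creates the page.waitForEvent('download') promise before triggering the click, then saves the file with download.suggestedFilename() and download.saveAs().

```javascript
const path = require('path');

// Hypothetical helper: waits for a download started by `triggerDownload`
// and saves it under `exportDir` with the browser-suggested filename.
async function downloadPdf(page, triggerDownload, exportDir = 'ExportData') {
  // Start listening before the click so a fast download event is not missed
  const downloadPromise = page.waitForEvent('download');
  await triggerDownload();
  const download = await downloadPromise;

  const filePath = path.join(exportDir, download.suggestedFilename());
  await download.saveAs(filePath); // waits for the download to finish if needed
  return filePath;
}
```

Inside a test it would be called with the link click as the trigger, for example await downloadPdf(page, () => page.getByRole('link', { name: '...' }).click()), and the returned path can then be handed to pdf-parse.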

Once the download has completed, the code saves the PDF file into the ExportData directory under the filename suggested by the download event. It then uses the pdf-parse module to extract the text from the PDF file and saves it to a file called actual.txt in the same directory:

Import pdf-parse and write the text content into the actual.txt file
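A sketch of this step as a standalone helper, assuming the pdf-parse 1.x API (pdf(buffer) resolving to an object with a text property). The name writeActualText is hypothetical; it reads whichever file was actually downloaded and writes actual.txt next to it instead of relying on a hardcoded sample.pdf path. The parser is a parameter so the function can be exercised without a real PDF.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: extracts the text of the downloaded PDF and writes it to
// actual.txt in the same directory as the PDF.
// Assumption: `parse` follows the pdf-parse 1.x API, parse(buffer) -> Promise<{ text, ... }>.
async function writeActualText(pdfPath, parse = require('pdf-parse')) {
  const data = await parse(fs.readFileSync(pdfPath));
  const actualPath = path.join(path.dirname(pdfPath), 'actual.txt');
  fs.writeFileSync(actualPath, data.text, 'utf-8');
  return actualPath;
}
```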

The code snippet below reads the expected and actual values from the files that were saved earlier. It uses the expect function from Playwright to assert that the values match. If they do not match, the test will fail:
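A caveat when writing this assertion: with a string argument, toMatch() performs a substring check, so a check written as expect(expected).toMatch(actual) also passes when actual.txt is empty (every string contains the empty string). Comparing with toBe() avoids that. The helper below is an assumption beyond the original post: normalizePdfText (a hypothetical name) unifies line endings and strips trailing whitespace, which can help if expected.txt was edited on another operating system. It is meant to be applied to both sides before an exact comparison.

```javascript
// Hypothetical helper: normalizes extracted PDF text before an exact comparison.
// - converts Windows (\r\n) and old Mac (\r) line endings to \n
// - removes trailing spaces and tabs at the end of every line
// - removes trailing newlines at the end of the text
function normalizePdfText(text) {
  return text
    .replace(/\r\n?/g, '\n')
    .split('\n')
    .map((line) => line.replace(/[ \t]+$/, ''))
    .join('\n')
    .replace(/\n+$/, '');
}

// Hypothetical usage inside the Playwright test:
// expect(normalizePdfText(actualText)).toBe(normalizePdfText(expectedText));
```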

Finally, the complete example.spec.js file:

const { test, expect } = require('@playwright/test');
const fs = require('fs');
const path = require('path');
// pdf-parse 1.x API: pdf(buffer) resolves to an object whose `text` holds the extracted text
const pdf = require('pdf-parse');

// ExportData lives in the project root, one level above the tests directory
const exportDir = path.join(__dirname, '..', 'ExportData');

// Define a test using Playwright's `test` function
test('verify content', async ({ page }) => {

  // Navigate to the page that contains the PDF link.
  // Note: goto() on a direct PDF URL usually starts a download in headless Chromium
  // and throws, so point it at the page that contains the link clicked below.
  await page.goto('https://www.africau.edu/images/default/sample.pdf');

  // Start waiting for the download event and click the link that downloads the PDF file
  const [download] = await Promise.all([
    page.waitForEvent('download'),
    page.getByRole('link', { name: 'A Simple PDF File https://www.africau.edu › images › default › sample' }).click()
  ]);

  // Save the file under the filename suggested by the download event
  const filePath = path.join(exportDir, download.suggestedFilename());
  await download.saveAs(filePath);

  // Extract the text from the file that was just downloaded and save it as actual.txt
  const data = await pdf(fs.readFileSync(filePath));
  fs.writeFileSync(path.join(exportDir, 'actual.txt'), data.text);

  // Read the expected and actual values from the saved files
  const expectedText = fs.readFileSync(path.join(exportDir, 'expected.txt'), 'utf-8');
  const actualText = fs.readFileSync(path.join(exportDir, 'actual.txt'), 'utf-8');

  // Assert exact equality; toMatch() with a string only checks for a substring
  // (an empty string always matches), so it could hide a failed extraction
  expect(actualText).toBe(expectedText);
});

Conclusion

Neither Playwright nor other open-source test automation tools have built-in PDF content verification. When I needed to verify a PDF file generated by the web application I was testing with Playwright, I could not find any blog post, documentation, or tutorial online. That is why I wrote this post: so that others can learn how to test the content of a PDF file in a Playwright test suite. 🚀

Written by Bünyamin Mete