Automating the validation of PDF document content
Testing requirements grow day by day as features become more complex, so we keep learning how to validate various cases under various conditions.
In this article, I will explain how to automate the validation of the content of a PDF document.
As an example, SIPLah (Sistem Aplikasi Pengadaan Sekolah) is a procurement application in Indonesia that schools use to purchase what they need with government funding. siplah.blibli.com generates several documents that schools may use as legal reports to other stakeholders, including but not limited to the BAST, comparison, invoice, negotiation, and SPK documents.
Given this background, the validity of these documents is essential to the application. As the SDETs of the B2G squad, we therefore explored how to automate the validation of not only the details shown in the application but also the content of the PDF documents that users download.
There is a way to automate the validation of the result, which also minimizes the duration of the test.
In Java, there is already a library for working with PDF documents called Apache PDFBox, which can be used to convert a PDF document into String data. After that, we can add an assertion to verify that the expected data is shown in the PDF document.
Implementation
1. Add the dependency to pom.xml
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.26</version>
</dependency>
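2. Add code to read a PDF document and convert it into a string
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Reads the PDF file and returns all of its text as a single String.
// Returns an empty String when the file cannot be read.
public String convertPDFtoString(File path) {
    String contentString = "";
    // try-with-resources closes the document even if text extraction fails
    try (PDDocument pdDocument = PDDocument.load(path)) {
        contentString = new PDFTextStripper().getText(pdDocument);
    } catch (IOException e) {
        // PDFBox reports missing or unreadable files as IOException
        e.printStackTrace();
    }
    return contentString;
}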
3. Add Assertion
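// assertThat and containsString are assumed to be the Hamcrest static imports
String orderID = "123/221/22";
String pdfContent = convertPDFtoString(new File(filePath));
// Fails with this message when the order ID is not found in the extracted text
assertThat("Order ID is incorrect or does not exist",
        pdfContent,
        containsString(orderID));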
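Text extracted from a PDF usually keeps the line breaks and spacing of the page layout, so an expected phrase that wraps onto a second line may not match a plain substring check. The following is a minimal sketch of one way to handle that, assuming it is acceptable to ignore whitespace differences. The class `PdfTextNormalizer`, its methods, and the sample text are hypothetical and only stand in for the output of convertPDFtoString.

```java
// Hypothetical helper for illustration; not part of the original test suite.
final class PdfTextNormalizer {

    private PdfTextNormalizer() {
    }

    // Collapses every run of whitespace (spaces, tabs, line breaks) into a
    // single space and trims both ends.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // True when the expected text occurs in the content after both sides
    // have been whitespace-normalized.
    static boolean containsNormalized(String content, String expected) {
        return normalize(content).contains(normalize(expected));
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the extracted PDF content.
        String pdfContent = "Order ID :  123/221/22\r\nBuyer : SD Negeri\n   Contoh 01\n";
        System.out.println(containsNormalized(pdfContent, "Order ID : 123/221/22"));
        System.out.println(containsNormalized(pdfContent, "SD Negeri Contoh 01"));
    }
}
```

The result of containsNormalized can be passed to any boolean assertion in the test framework already in use.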
Example
We have a test scenario that validates the grand total of the invoice document to make sure the calculation is correct, because an 11% tax has to be included in the calculation.
After converting the document, the library returns its data as a string.
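The grand total scenario could be checked along the following lines. This is only a sketch under stated assumptions: the class `InvoiceTotalCheck`, the rounding to whole units, the "." thousands separator, and the sample text are all hypothetical and would need to match how the real invoice is rendered.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper for illustration; not the original test code.
final class InvoiceTotalCheck {

    private static final BigDecimal TAX_RATE = new BigDecimal("0.11");

    private InvoiceTotalCheck() {
    }

    // Subtotal plus 11% tax. Rounding the tax to a whole unit with HALF_UP
    // is an assumption; the real invoice rules may differ.
    static BigDecimal grandTotal(BigDecimal subtotal) {
        BigDecimal tax = subtotal.multiply(TAX_RATE).setScale(0, RoundingMode.HALF_UP);
        return subtotal.add(tax);
    }

    // Formats a whole amount with "." as the thousands separator
    // (assumed display format of the document).
    static String formatAmount(BigDecimal amount) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator('.');
        symbols.setDecimalSeparator(',');
        return new DecimalFormat("#,##0", symbols).format(amount);
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the extracted invoice content.
        String pdfContent = "Subtotal 1.000.000\nTax 11% 110.000\nGrand Total 1.110.000\n";
        String expected = "Grand Total " + formatAmount(grandTotal(new BigDecimal("1000000")));
        System.out.println(pdfContent.contains(expected));
    }
}
```

Computing the expected total in the test, rather than hard-coding it, keeps the check tied to the tax rule instead of to one sample order.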
Implementing daily automation
As a reference, our automation validates 49 documents across all scenarios in our regression suite, which runs daily to make sure every document has valid values, calculations, and other notable information.
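When many documents are checked in one run, it can help to collect every missing value per document instead of stopping at the first failure. The sketch below shows one possible shape for that. The class `DocumentContentChecker`, the document names used as keys, and the sample values are hypothetical, and the extracted text is assumed to come from convertPDFtoString.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the original regression code.
final class DocumentContentChecker {

    private DocumentContentChecker() {
    }

    // Returns every expected value that does not occur in the extracted text.
    static List<String> findMissing(String pdfContent, List<String> expectedValues) {
        List<String> missing = new ArrayList<>();
        for (String expected : expectedValues) {
            if (!pdfContent.contains(expected)) {
                missing.add(expected);
            }
        }
        return missing;
    }

    // Checks several documents at once. The result maps a document name to
    // its missing values; documents with nothing missing are left out.
    // A document with no extracted text is treated as empty content.
    static Map<String, List<String>> findMissingPerDocument(
            Map<String, String> contentByDocument,
            Map<String, List<String>> expectedByDocument) {
        Map<String, List<String>> report = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : expectedByDocument.entrySet()) {
            String content = contentByDocument.getOrDefault(entry.getKey(), "");
            List<String> missing = findMissing(content, entry.getValue());
            if (!missing.isEmpty()) {
                report.put(entry.getKey(), missing);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for extracted document content.
        Map<String, String> content = new LinkedHashMap<>();
        content.put("invoice", "Order ID 123/221/22\nGrand Total 1.110.000\n");
        content.put("BAST", "Order ID 123/221/22\n");

        Map<String, List<String>> expected = new LinkedHashMap<>();
        expected.put("invoice", List.of("123/221/22", "1.110.000"));
        expected.put("BAST", List.of("123/221/22", "SD Negeri Contoh 01"));

        System.out.println(findMissingPerDocument(content, expected));
    }
}
```

An empty report means every expected value was found, so a single assertion on the report is enough to cover the whole run.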
Conclusion
There are always two sides to a coin. This approach still has a weakness: we can only validate the content, not its position. Thus, if a value appears in the document but in the wrong field or position, the problem will not be caught as an issue.
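The weakness can be seen with two made-up texts in which the same amount sits next to different labels: a plain substring check accepts both. One partial way to narrow the check, sketched below, is to require the label and the value on the same line. This assumes the label and its value are extracted onto one line, which depends on the document layout, and it does not verify the visual position on the page. The class `LabelledValueCheck` and the sample texts are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the original test code.
final class LabelledValueCheck {

    private LabelledValueCheck() {
    }

    // True when some line consists of the label, an optional ':' and the
    // value, allowing spaces or tabs between them but no line break.
    static boolean lineHasLabelAndValue(String pdfContent, String label, String value) {
        Pattern pattern = Pattern.compile(
                "^[ \\t]*" + Pattern.quote(label) + "[ \\t]*:?[ \\t]*"
                        + Pattern.quote(value) + "[ \\t]*$",
                Pattern.MULTILINE);
        return pattern.matcher(pdfContent).find();
    }

    public static void main(String[] args) {
        // Made-up sample texts: the second one has the two amounts swapped.
        String correct = "Subtotal : 1.000.000\nGrand Total : 1.110.000\n";
        String swapped = "Subtotal : 1.110.000\nGrand Total : 1.000.000\n";

        // The substring check cannot tell the two documents apart.
        System.out.println(correct.contains("1.110.000"));
        System.out.println(swapped.contains("1.110.000"));

        // The label-anchored check accepts only the correct document.
        System.out.println(lineHasLabelAndValue(correct, "Grand Total", "1.110.000"));
        System.out.println(lineHasLabelAndValue(swapped, "Grand Total", "1.110.000"));
    }
}
```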
Reference
https://www.javatpoint.com/pdfbox-tutorial
https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox
Credit
Blibli — B2G Squad — SDET Team
Faizatunnisa
Stella Suharli
Vincent Novanto
Abhipraya Radhityaqso
Dwiki Nugraha