How One Script Can Replace Your Whole Department.

Kieran Maher
May 18, 2019 · 6 min read
Photo by Alex Kotliarskyi on Unsplash

When we think automation, we tend to think more of physical work being replaced and automated by machines. However, a lot of office jobs have components and aspects that it would benefit being automated, because it would free workers up for more high level and time sensitive tasks.

In this article I will show an example of how a small script can automate aspects of finance, and the process of generating invoices to send to clients, with no need to have administrator privileges to access a system back-end or mess around with an API, just simply using the web interface you or I would look at and copy the information out of, by using web scraping.

Before we begin, all code can be found here on my Github.

To Start — The Sample Data.

In order to develop the JavaScript script that will perform the automation, I first had to generate some sample data that could be extracted. I didn’t want a simple HTML page because it would make the extraction too easy, so a small Vue project was created and populated with some sample tables.

The generated interface

I used 5 sample ‘companies’, that each have a name, an amount due, a VAT rate, and a due date.

The Extraction — A Puppeteer.

Now that the sample data is constructed, I began developing the script to pull the data directly out of the browser. To do this, I have used a library called Puppeteer that is made by the Google Chrome team.

Before I could extract anything though, I first had to determine what elements within the website’s code contain the data. To find these elements, a simple query selector was used that finds the table nested within the relatively complex Vue structure. This method shows that regardless of how complex the website holding the data seems, it can only be holding it in HTML code (whether static of generated) and the data you’re after can be targeted within that HTML.

Identifying the data in Chrome

With the relevant data found within the HTML structure, and because all data is contained within the one table, a map function can be used to extract the information into a tidy array.

Extracting the data using Puppeteer

In the above image, we see Puppeteer being used to pull all elements out within the table that we have identified, and storing them into an array. I have made this ever so slightly more complicated by storing everything in the one array, so all records are together. This could be present a problem, so …

A Problem — The Array Splitter.

The data that we can see in the screenshot above is an example of how the data would appear in the real world — as a large list not neatly split up for you into each record. But if we look at the array, we can see the data is structured.

Every fifth entry represents a new entry in the table. With the first being the name, the second being the amount, third the date due, fourth the VAT rate and the fifth marking the start of the next entry.

Now that we know there is actually an order to the array, we can split it up since we know at each index what result to expect. I have split it like so:

With each index within the array denoting what value it is. I have then added all values to their own arrays. So for example for each client, their amount and date and VAT are found at the same index in each corresponding array, making recall easier. There are many ways and more efficient ways to do this operation, but I have found this to be an effective method. Of course in a real world example, much more error catching should be done!

Building It — The PDF Generator.

In this stage, I now have the value of each client stored in individual arrays representing the data they hold. I can therefore loop through each client, and begin generating an invoice based on their data.

To do this I have used the pdfkit library to quickly generate PDFs. Each client has their own PDF, as they would in the real world, which can be named anything but I have opted for the client’s name, and that PDF has the information that is relevant to them. PDFKit makes building PDFs very easy, and images can be embedded as well as fonts and stylings used.

The PDF generation process for each client — this happens in a loop

As a note, the VAT percentage as it appears in the tables has a percent sign after. This was deliberate to show that the data may not always appear as you want it to. In an ideal world, the VAT would just be a number but instead has extra characters attached so it cannot be manipulated immediately and directly.

As part of the PDF generation, the total amount is calculated by adding the VAT on to the amount due, using the below function. This is quick and simple and just uses regular expressions to strip all non alphanumeric values out of the VAT number.

VAT calculation for the final PDF

Testing — The Output PDF.

With the PDFs now being generated and stored in a directory, we can look at a sample PDF that has been output.

Output PDF viewed in Chrome

It might not be the prettiest invoice, but it could be with some simple styling and branding added to it. As we can see it is built using the information that was available to us in the tables at the very beginning, and is now ready to be sent out to a client!

Conclusion

It is easy to see how this process can make so many jobs so much easier; if we imagine an employee who gathers information from many systems and collates them into one consolidated place of information, then this process of web scraping allows them to do it automatically and tremendously quicker, and frees the employee up for much more complex tasks.

We can imagine a situation in which more complex algorithms are utilised, such as Bayesian Classification used to change the wording of a generated output, or a Neural Network used to inform and convey the information in better and previously unconsidered ways.

The complete sample code for the script can be found here on my Github.

The Startup

Get smarter at building your thing. Join The Startup’s +800K followers.

Kieran Maher

Written by

Software Developer & Analyst. General technology enthusiast.

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +800K followers.

Kieran Maher

Written by

Software Developer & Analyst. General technology enthusiast.

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +800K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store