Virtually Possible (a tale of iterative creation and remote teamwork)

Shane Fast
Published in BACIC · 10 min read · Oct 5, 2021

Our team has gone through a lot of learning over the last year of working remotely: how to communicate best, keep internal involvement strong, and tackle technical challenges.

Despite the situation, we have made fantastic progress in many areas and solved many complex problems. This article tells the story of one such problem, and I hope it adds value to your team in this remote world.

To preface, our team was building software to serve paralegals, making their work faster and easier through our tools. One incredibly impactful feature is the ability to generate documents that merge essential client information into precoded templates.

There’s a lot of data and logic that can be coded into document templates, and many workflows that can trigger document generation. Long story short, this feature poses a crucial problem for our team:

“How do we make sure documents are created correctly?!”

The printing machine goes Brrrrrr!!!!

After all, we are creating legal documents here. The stakes are high, and errors can result in catastrophic problems. Our users and their clients depend on us!

Our solution needed to reliably verify that the key workflows, from login to logout, are producing what we expect. It had to be automated, run quickly, provide easy-to-understand, actionable feedback to our QA and document coding teams, and be scalable.

What was our solution, you might ask? Well, you see, it goes like this…

<pushes up glasses to deliver lengthy explanation>

To determine how we were going to tackle this, we began by breaking the problem down into smaller parts:

  1. How do we replicate the paralegal's steps to assemble their data before creating a document?
  2. How do we capture a generated document?
  3. How do we compare the generated document with an expected result?
  4. How do we communicate the results?
  5. How do we enable our team to take action on the results?
  6. How do we run this process repeatedly?

Phew! That’s a lot. Let’s get into it…

How do we replicate our user’s steps?

Our first challenge was to effectively mimic what our users were likely to do while using our application.

We quickly got a glimpse of the vastly different world that paralegals work in. Moving like nimble ninjas, they grasp workflows quickly and perform their tasks with tactical precision.

I am still in awe of the speed and accuracy with which they move between views, often parking their cursor on the spot of their next click while the browser catches up with them!

Watching a paralegal at work (our point of view).

It seemed natural to respond to their vigour by selecting a technology that could, in spirit, perform at a comparable level. After some hearty research and experimentation, our QA team landed on Robot Framework.

Robot Framework is a technology meant for automated testing but, more importantly, it has a bent towards English-readable test code. This is not necessarily to make coding easier, but to help remove the communication barriers between the folks coding the tests and the folks defining which tests to write.
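The original test isn’t reproduced here, but a minimal sketch in the same spirit might look like the following; the app URL, element locators, and credentials are hypothetical, and the keywords come from the standard SeleniumLibrary:

```bash
# A minimal sketch (hypothetical URL, locators, and credentials): write a tiny
# Robot Framework test to a file and run it from the shell.
cat > generate_document.robot <<'ROBOT'
*** Settings ***
Library    SeleniumLibrary

*** Test Cases ***
Paralegal Logs In And Generates A Document
    Open Browser                https://app.example.com/login    chrome
    Input Text                  id:email       qa@example.com
    Input Text                  id:password    ${PASSWORD}
    Click Button                Log In
    Wait Until Page Contains    Client Matters
    Click Button                Generate Document
    Wait Until Page Contains    Document created
    [Teardown]                  Close Browser
ROBOT

# Run it; Robot Framework writes log.html, output.xml, and report.html.
robot --variable PASSWORD:not-a-real-password generate_document.robot
```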

In this example, you can quickly tell what each step is doing. This makes writing tests more descriptive and self-documenting. Our QA team eagerly shared this idea around, and I dare say we were all quite titillated with Robot Framework’s potential!

…ok, so now that we’ve got the tech picked out, how do we figure out what tests to create? With their extensive experience using our app, the doc coding team was very generous in helping us with this (probably due to the previously mentioned titillation).

Over some back and forth, we picked out a series of critical tests and the information that should be contained in their setups. This exchange gave rise to our Golden Data Set, which we now use to seed environments before running tests, making it worth its weight in gold (if the data had any significant weight).
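The exact seeding mechanism matters less than the habit, but as a rough, hypothetical illustration, assuming the golden data set is kept as a database dump in S3 (bucket, file, and database names below are made up):

```bash
# Rough, hypothetical illustration: pull the golden data set and restore it into
# the test environment before the suite runs (bucket, file, and DB names are made up).
aws s3 cp s3://example-golden-data/golden_data_set.dump /tmp/golden_data_set.dump
pg_restore --clean --no-owner --dbname "$TEST_DATABASE_URL" /tmp/golden_data_set.dump

# Then run the tests against the freshly seeded environment.
robot --outputdir results tests/
```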

How do we capture a generated document?

With tests running and documents generating, we next needed a way to grab copies automatically. These documents are stored in AWS S3 once generated. I could just tell you the final solution, but going through each failure is far more entertaining:

Attempt #1: Request the document by name from the bucket after the test has run.

This solution works the first time you run the test, but each subsequent test rerun pulls every document with the same name! We could version the documents in S3 and pull the most recent one, but we’ll try a different route first.
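In shell terms, attempt #1 amounts to something like this sketch (bucket and file names are made up):

```bash
# Attempt #1 (sketch): fetch the generated document from S3 by its known name.
# Works on the first run, but reruns keep producing documents with the same name,
# so we can no longer tell which copy belongs to which run.
aws s3 cp s3://example-generated-docs/Retainer_Agreement.docx results/Retainer_Agreement.docx
```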

Attempt #2: Assign a unique random number to the test file.

We had to change some of the test code to take in the random name as a variable. Afterward, we were able to get the uniquely named file from S3. Our new problem was grabbing multiple files.

Attempt #3: Create a loop to grab all the files after the tests were complete.

After a refactor to create an array of random file names, we gathered all the generated documents and placed them into a temporary directory. Wait a second! What did we name these documents initially?

Attempt #4: Change the random name back to the original file name.

We created a second array with the original file names to reassign to our temporary directory. A bit of a pain now that we have to maintain two lists, though. Such is life sometimes, I suppose.
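Put together, attempts #2 through #4 amount to something like this sketch (bucket, file, and variable names are hypothetical):

```bash
# Sketch of attempts #2-#4: give each generated document a unique random name,
# then loop over the results afterwards to download and rename them.
ORIGINAL_NAMES=("Retainer_Agreement.docx" "Affidavit.docx")
RANDOM_NAMES=()

for ORIGINAL in "${ORIGINAL_NAMES[@]}"; do
    UNIQUE="$(uuidgen).docx"
    RANDOM_NAMES+=("$UNIQUE")
    # The test takes the unique name in as a variable and uses it when generating the doc.
    robot --variable DOC_NAME:"$UNIQUE" --outputdir results tests/
done

# After the tests: grab every uniquely named file and restore its original name.
mkdir -p generated_docs
for i in "${!RANDOM_NAMES[@]}"; do
    aws s3 cp "s3://example-generated-docs/${RANDOM_NAMES[$i]}" \
        "generated_docs/${ORIGINAL_NAMES[$i]}"
done
```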

Attempt #5: Simply trigger an in-app feature to download the file using Robot Framework.

Well… we didn’t consider this at first, but it’s way easier! No after-script is required once the tests finish running, and it covers our app’s download feature as a bonus.
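A sketch of the idea, again with hypothetical URL, locators, and paths, using SeleniumLibrary plus the built-in OperatingSystem library to wait for the download:

```bash
# Sketch of attempt #5: have the test itself trigger the app's download feature,
# then wait for the file to appear in the browser's download directory.
cat > download_document.robot <<'ROBOT'
*** Settings ***
Library    SeleniumLibrary
Library    OperatingSystem

*** Test Cases ***
Download The Generated Document
    Open Browser          https://app.example.com/documents    chrome
    Click Button          Download
    Wait Until Created    /tmp/robot-downloads/Retainer_Agreement.docx    timeout=30s
    [Teardown]            Close Browser
ROBOT

robot download_document.robot
```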

I guess we don’t need the previous code anymore!

Sometimes that happens when coding, but we ended up with the best solution (thanks to our QA team).

How do we compare the generated document with an expected result?

Super! We’ve got all the documents generated during the tests. Now what?

Thanks to our document coding team, we have copies of the documents with the expected results. What we generated ought to match the desired results. How can we automate this?

Fortunately, we’ve been down this road before. Using a technology called “pandoc,” we could generate markdown versions of the .docx files and leverage git to automatically make the comparisons for us. Adjusting a bit of open-source code, we came up with this git hook script:

The magic incantation

If there is a difference between the expected markdown file and the generated one, this script flags it so our team can spot it quickly (it’s like always knowing where Waldo is on every page!).
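We won’t reproduce the hook itself, but a sketch in the same spirit converts the generated .docx files to markdown with pandoc and then lets git report any deviation (directory names are made up):

```bash
# Sketch in the spirit of our hook: convert the generated .docx files to markdown,
# then let git do the comparison against the expected markdown files.
mkdir -p generated_md
for DOC in generated_docs/*.docx; do
    pandoc "$DOC" --to=markdown --output "generated_md/$(basename "${DOC%.docx}").md"
done

# A non-zero exit code means at least one document deviated from the expected result.
git diff --no-index --exit-code expected_md/ generated_md/
```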

We certainly went through our share of learning getting this part of the project to work. For example, running the tooling on Mac vs. Windows caused some chin-scratching due to lower-level OS issues.

Also, we had to find a way to increase our iteration speed while building this tool by reducing the test suite to just the critical tests (waiting around for 10 minutes each time sucked the fun out of it).
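One way to do this kind of trimming is Robot Framework’s tagging; a sketch, with a hypothetical “critical” tag:

```bash
# Sketch: while iterating on the tooling, run only the tests tagged as critical
# instead of the whole suite (the "critical" tag name is hypothetical).
robot --include critical --outputdir results tests/
```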

Finally, understanding our script's pass and failure modes went a long way. We were surprised by the number of times a comparison failed simply because the document's generation date was taken into account. This forced us to limit the variability of the results as a trade-off for more reliable tests.

Overall this comparison tool rocks!

How do we communicate the results?

Ok, we’ve got the tests and the result-comparison system running. Let’s now tackle how we handle communication.

Laying out the puzzle, we have a few key pieces, the first being Robot Framework’s reporting tool. Whenever the tests run, a few artifacts are created (log.html, output.xml, and report.html).

Together, these give us an interactive report that our team can use to quickly identify issues in our app.

Yay! All passing!
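Those artifacts can also be recombined after the fact; for instance, rebot (which ships with Robot Framework) can merge the output.xml files from several runs into a single report. A sketch, with hypothetical paths:

```bash
# Sketch: merge the output.xml files from several runs into one combined report
# using rebot, which ships with Robot Framework (paths are hypothetical).
rebot --name "Document Generation Suite" \
      --outputdir combined_report \
      --merge results/run1/output.xml results/run2/output.xml
```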

The second piece is our Slack workspace. We wanted a way to send notifications to everyone who needed to know the status of our tests. Of the many potential options, we opted to create a dedicated channel and an incoming Slack webhook to post messages to it.

A bit of redaction. We don’t want to tell all of our secrets ;)
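With the webhook URL and exact wording redacted, the posting step boils down to a sketch like this (the variable names are placeholders):

```bash
# Sketch: post the run status to the dedicated Slack channel via an incoming webhook.
# The webhook URL lives in a secured pipeline variable, never in the repo.
STATUS_TEXT="Document generation tests: ${FAILED_COUNT:-0} failure(s). Report: ${REPORT_URL:-n/a}"

curl -X POST "$SLACK_WEBHOOK_URL" \
     -H 'Content-type: application/json' \
     --data "{\"text\": \"${STATUS_TEXT}\"}"
```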

Crafting the message (especially what we would see on failure) was a fascinating exercise. We ran several iterations trying to have git summarize the differences, but we would always end up with either far too much information or far too little.

In the end, we opted to have a branch created automatically on failure, which showed the differences in our repository’s UI.

Oh yeah! That side-by-side makes the work glide!

The last piece was deploying a site where our team can securely view the results anywhere, at any time. To do this, we set up a static website on AWS S3 and served it through AWS CloudFront.

Once this reporting site was working, we quickly added a security layer to prevent unauthorized sleuthing. The result was a simple, easy-to-use site for our team.
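Publishing the latest report after each run comes down to a couple of AWS CLI calls; a sketch, with placeholder bucket and distribution IDs:

```bash
# Sketch: publish the latest Robot Framework report to the static site bucket,
# then invalidate CloudFront so the team sees the new results immediately.
aws s3 sync results/ s3://example-test-reports/ --delete
aws cloudfront create-invalidation --distribution-id EXAMPLEDISTID --paths "/*"
```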

How do we enable our team to take action on the results?

Sometimes (often) tests dig up an issue. Our notifications are an excellent first step in directing us to the problems, but why not add some automation to help?

Since we were using git to determine whether a deviation had occurred, we were also able to use git to create a branch for updating the expected results.

It certainly took a few tries to get this part right, but some magic bash script did the trick:

Yes, we wrote it, but don’t ask us how it works.
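We won’t reproduce the script here, but the core idea is roughly this sketch (the branch naming is illustrative, and the build-number variable comes from Bitbucket Pipelines):

```bash
# Sketch of the core idea: if the comparison failed, commit the newly generated
# markdown to a fresh branch so the diff can be reviewed in the repository UI
# and, if the new output is actually correct, merged as the new expected result.
if ! git diff --no-index --quiet expected_md/ generated_md/; then
    BRANCH="doc-diff/$(date +%Y-%m-%d)-${BITBUCKET_BUILD_NUMBER:-local}"
    git checkout -b "$BRANCH"
    cp generated_md/*.md expected_md/    # propose the new output as the expected result
    git add expected_md/
    git commit -m "Generated documents deviated from expected results"
    git push origin "$BRANCH"
fi
```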

Awesome! We have a working solution and a way to handle failures. However, a failed test is always an attention grabber!

All is well, and then a test breaks.

How do we run this process repeatedly?

At this point, we had a script that ran successfully and posted notifications to Slack. We still needed to figure out how to run it automatically on a schedule (this is where Bitbucket Pipelines comes into play).

There are several tools similar to Bitbucket Pipelines (like CircleCI, Travis CI, etc.). We had already been using Bitbucket’s solution for several other things, so this was a natural choice for our team.

Our first challenge was creating a Docker image with Robot Framework built in (among other tools). After a very iterative struggle, we came up with an image built on top of Python 3.7:

Probably wise to tidy up at some point, but it works!

We liked our Docker image because it allowed us to easily add more dependencies as we matured our solution (particularly the AWS CLI and Pandoc).

Getting the correct drivers and libraries for Robot Framework was tricky, and it took several tries to get it right.
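The Dockerfile itself isn’t shown here, but in terms of what ends up installed it boils down to roughly these commands on top of a python:3.7 base (the Debian and pip package names below are the common ones, not necessarily our exact list):

```bash
# Sketch of what gets installed on top of a python:3.7 base image
# (common Debian and pip package names; not necessarily our exact list).
apt-get update && apt-get install -y pandoc chromium chromium-driver git

pip install \
    robotframework \
    robotframework-seleniumlibrary \
    awscli
```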

Using this image, we set our script up to run periodically, which also allowed us to apply some neat aggregation:

You win some. You lose some.
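Tying the pieces together, the scheduled pipeline ends up driving something like this condensed sketch of the flow (not our exact script; names and paths are placeholders):

```bash
# Condensed sketch of the scheduled run: execute the suite, compare documents,
# publish the report, and notify Slack (names and paths are placeholders).
robot --include critical --outputdir results tests/ || TESTS_FAILED=1

mkdir -p generated_md
for DOC in generated_docs/*.docx; do
    pandoc "$DOC" --to=markdown --output "generated_md/$(basename "${DOC%.docx}").md"
done
git diff --no-index --exit-code expected_md/ generated_md/ || DOCS_DEVIATED=1

aws s3 sync results/ s3://example-test-reports/ --delete

STATUS="All document tests passed."
[ -n "${TESTS_FAILED:-}${DOCS_DEVIATED:-}" ] && STATUS="Document tests need attention. See the report."
curl -X POST "$SLACK_WEBHOOK_URL" -H 'Content-type: application/json' \
     --data "{\"text\": \"${STATUS}\"}"
```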

We went back and forth with different parts of our team to get feedback, and while we have a wish list, we are reasonably satisfied with what we have created!

Problems left to solve

There is always more to add; here is our team’s current wish list:

  • Pipeline optimization and test parallelization
  • Adding more test cases
  • Reaching consensus on how to handle branches generated by diff errors
  • Making the tool more widely available to run in other development environments
  • Applying it to SVGs (for testing app-generated images, ooh la la!)
  • Applying it to PDFs (for testing app-generated PDFs)
Our team is hard at work!

Our biggest learnings

I’d say the most essential part of this journey is what we learned along the way. Here are the highlights:

  1. Motivation and discipline. We found that a great way to get through challenging issues was to book remote working sessions as a team. Even if only one person was coding for part of the session, it was a great way to stay connected and motivated, and to keep focused on the end goal #Focused-creatives.
  2. Tool research. Our initial discovery of what to use turned up a zoo of options (Jenkins, Selenium, etc.) that we had to sort through. We approached the decision by scoring each option on how well it answered each of our initial questions. After lots of reading and searching, we were able to come to a decision #Ambitious-learners.
  3. Identifying checkpoints and audiences. We were fortunate to have several opportunities to show off our work as we built it. Our scrum master started a show-and-tell session that was invaluable for discussing new ideas! #Diligent-builders.
  4. Being highly iterative. We were thrilled that we could produce a proof of concept that was also easy to keep building on. We found it far better in the long run not to over-plan things and to start playing with the tools early.
As they say, “Build the skateboard before building the Tesla.”

We hope you found this useful and entertaining; if you have any thoughts, do send them our way. Our team enjoyed building this tool, and it has served our testing purposes well, allowing us to deliver a more reliable product. Thanks for reading!
