That Software Won’t Check Itself… Or Will It?

Automating QA for over 9,000 Visual Assets with Image Processing

Michal Nir
Powtoon
Jun 25, 2018


Powtoon is a platform that allows you to easily create short videos and presentations. With our roots in animation, we have a massive library of visual assets. That also means we have a massive QA requirement: everything has to work, look right in every instance, support locking and watermarking of assets based on user context, and more. This is a huge drain on human resources, which are always in high demand.

Powtoon is a lean company, and where we can find a technological fix to improve our work and our product, we take it — even if we have to build it ourselves. This is an exploration of the steps we took to create our own quality assurance automation regimen to test Powtoon’s massive library of visual assets with image processing.

Work with other innovators. Check out opportunities at Powtoon.

QA Requirements at Powtoon

The way we look means a lot to us, so we want to test every visual asset in our product before it goes live. To be worthwhile, this testing has to cover both Powtoon’s UI in general and the assets in the studio specifically. That means thousands of manual tests for a huge amount of content (with thousands of opportunities for human error). Automating this process would help us immensely, but we had trouble finding an existing solution that could test our product’s UI as well as our visual assets.

Powtoon has a lot of content in its product. From individual characters to props and other objects, backgrounds, and fully designed scenes, we host over 12,000 visual assets. For any given release, we need to assure the quality of up to 9,000 of those assets.

In addition, these assets may behave in different ways depending on the user’s context. The free version of Powtoon adds a watermark and “lock icons” to specific assets. The assets themselves come in a variety of sizes, which the user can change; they can also be flipped, swapped, and rotated through a full 360°.

Figure A

During our QA process, a QA engineer would look at the final Powtoons produced by the suite and check whether the free versions were getting the promised watermarks and locks. The engineer would have to go over all the assets (including new ones) to verify their look and behavior. Since the assets and logo can be placed in different locations and depend on user context, no off-the-shelf automation tool could handle the job. We considered it a task only humans could perform reliably.

Our QA automation team, however, intended to challenge this assumption, and create our own tool that would validate the presence of the assets and the logo/watermark, regardless of location, orientation, or size scaling.

Choosing Your Model

The choice of image-analysis technique plays a major role in our case: we needed to validate not only the assets’ initial parameters but also, in some cases, their rotation, location, and size, while in other cases we needed to ignore those very aspects.

To begin with, we tried several different methods of comparing the original SVG of an object with the object in the studio: namely, by its color, shape, and pixels.

Template Matching

Since we use vector graphics in the studio, we have a crisp image throughout the entire work process, so we allowed ourselves to start with a simple matched filter. For that, OpenCV is a recommended image-processing library that interfaces with Python, C++, and Java.

The simple approach included the following steps (a code sketch follows the list):

  1. Loop over the input image at multiple scales (i.e. make the input image progressively smaller and smaller).
  2. Apply template matching using cv2.matchTemplate and keep track of the match with the largest correlation coefficient, along with the (x, y)-coordinates of that region. You can choose your preferred comparison method from the OpenCV documentation.
  3. After looping over all scales, compare the largest correlation coefficient with a given threshold. The result in this case should be a binary “yes” or “no.”
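
To make these steps concrete, here’s a minimal sketch of that loop in Python with OpenCV. The file names, scale range, and the 0.8 threshold are illustrative assumptions, not our production values:

```python
import cv2
import numpy as np

# Minimal sketch of multi-scale template matching; file names and the
# threshold below are illustrative assumptions.
image = cv2.imread("studio_screenshot.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("asset_template.png", cv2.IMREAD_GRAYSCALE)
t_h, t_w = template.shape[:2]

best_score, best_loc = -1.0, None
# Step 1: loop over the input image at progressively smaller scales.
for scale in np.linspace(0.2, 1.0, 20)[::-1]:
    new_w, new_h = int(image.shape[1] * scale), int(image.shape[0] * scale)
    if new_h < t_h or new_w < t_w:
        break  # the image has become smaller than the template
    resized = cv2.resize(image, (new_w, new_h))
    # Step 2: template matching; TM_CCOEFF_NORMED is one of several
    # methods described in the OpenCV documentation.
    result = cv2.matchTemplate(resized, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val > best_score:
        best_score, best_loc = max_val, max_loc

# Step 3: compare the largest correlation coefficient with a threshold.
THRESHOLD = 0.8
print("match" if best_score >= THRESHOLD else "no match", best_score, best_loc)
```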

We tried comparing a set of image templates against their originals to find a match. However, this method works best when you already know an object exists on the screen and only need to find its location. For the problem Powtoon faces, the proposed method has two major drawbacks.

First, the threshold mentioned in step 3 above changes from asset to asset: we were unable to select a single threshold that would reliably produce our binary yes/no outcome.

Template matching also assumes the asset’s size and color are normalized. That assumption is harmless when you are merely locating a template you know exists in an image, because you only need to pick the position with the largest correlation; size and color don’t affect that choice.

In our problem, however, “existence” and “non-existence” are scored so differently across templates and target images that we couldn’t find a normalization that worked well for a range of assets and conditions. Context counts, and our users, the ways they use our product, and the circumstances of that use are simply too diverse for this simple approach.

Second, this approach can’t handle rotation or mirroring, as it relies on the scaled image being matched exactly by its filter in step 2 above.

So we couldn’t rely on this approach to properly check all the different Powtoon assets and the many ways they might be used: it simply couldn’t deal with the complexity of visual assets that can rotate, resize, recolor, and more.

A different approach, which proved to be very useful for us, was keypoint matching.

Keypoint Matching

Figure B: Green dots represent keypoints on a cat

Keypoints (Figure B) are distinctive regions of an image that remain stable under transformations. With vector graphics, which stay crisp at every scale, finding keypoints algorithmically is stupid simple. That’s why we chose keypoint matching for our QA automation: no matter the context, assets on stage in the Powtoon studio are easily identified by their keypoints (rather than by matching against a pre-existing asset template).

There’s a variety of keypoint-matching algorithms to choose from: KAZE, SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features, inspired by SIFT), ORB, and others.

For any object in a training image, one can extract points for a feature description of that object (i.e., the collection of keypoints the algorithm identifies). With this description, one can identify the object in a different scene (or test image), regardless of other objects in the scene, changes of scale, noise, and so on. SIFT, for example, collects keypoints and rejects those suspected of having low contrast.
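
As a concrete sketch of what extracting such a feature description can look like with OpenCV’s SIFT implementation (the file name below is an assumption; recent OpenCV releases expose SIFT as cv2.SIFT_create, while older builds required opencv-contrib-python):

```python
import cv2

# Extract keypoints and descriptors from an asset template.
template = cv2.imread("asset_template.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(template, None)
print(f"{len(keypoints)} keypoints, descriptor array shape {descriptors.shape}")

# Visual sanity check: draw the detected keypoints (the green dots of Figure B).
annotated = cv2.drawKeypoints(
    template, keypoints, None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("asset_keypoints.png", annotated)
```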

What’s important here is that the relative positions between those features, or (key)points, in the original scene shouldn’t change from one image to another. So while it would be nearly impossible to match the original template of your image on a ‘noisy’ stage or in a different position, the keypoints’ relative positions within the same image stay recognizable no matter the rotation, scale, or color of our visual asset.

Figure A — with 2 templates of a cat — is a great illustration of the value of keypoint matching. The cat’s ear is a feature that doesn’t change between the different transformations the cat image can undergo. In any configuration, the tip of the ear and its position relative to other keypoints remain constant, allowing easy identification.

Figure C: The Powtoon Studio

We developed the Powtoon automation framework in-house. It is based on Selenium, which imitates human interactions with the website, and one of its most important pieces is our computer-vision component, which applies the keypoint-matching approach.
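
Conceptually, the glue between Selenium and the vision component looks something like the sketch below. The URL and screenshot name are hypothetical placeholders, not our actual framework code:

```python
from selenium import webdriver

# Drive the browser the way a human would, then capture the stage for the
# computer-vision component. The URL here is a hypothetical placeholder.
driver = webdriver.Chrome()
try:
    driver.get("https://example.com/powtoon-studio")
    driver.save_screenshot("studio_screenshot.png")  # input for the CV check
finally:
    driver.quit()
```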

Keypoint Matching in Action: SIFT

1. Scale Space Min-Max Detection

The template is scanned in (x, y) space, and its representation at several scale levels is processed (a Gaussian pyramid). Local extrema are selected as potential keypoints; these locations are chosen both over space and across scale levels.
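
As a toy illustration of the scale-space idea (not SIFT’s internal implementation), one can build a Gaussian pyramid in OpenCV like this:

```python
import cv2

# Progressively blurred and downscaled copies of the template form a
# Gaussian pyramid; SIFT searches for extrema across (x, y) and these levels.
image = cv2.imread("asset_template.png", cv2.IMREAD_GRAYSCALE)
pyramid = [image]
for _ in range(4):
    pyramid.append(cv2.pyrDown(pyramid[-1]))  # blur + halve at each level
```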

2. Keypoint Refinement

Once potential keypoint locations are found, they are refined for more accurate results. A Taylor series expansion of the scale space is used to pin down the location of each extremum more precisely, and a keypoint is kept only if its intensity passes a threshold. These thresholds and settings can be defined and calibrated for each application.
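
In OpenCV’s SIFT implementation, for example, these thresholds are exposed as constructor parameters; the values below are the library’s documented defaults:

```python
import cv2

sift = cv2.SIFT_create(
    nfeatures=0,             # 0 = keep every keypoint that passes the filters
    nOctaveLayers=3,         # scale levels per octave in the pyramid
    contrastThreshold=0.04,  # rejects low-contrast candidate keypoints
    edgeThreshold=10,        # rejects poorly localized, edge-like keypoints
    sigma=1.6,               # Gaussian blur applied at the base octave
)
```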

3. Orientation Assignment

Pixels surrounding the keypoint location are taken, with the neighborhood sized according to the keypoint’s scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram covering 360° is created. The highest peak in the histogram is taken, and any peak above 80% of it is also considered when calculating the orientation. Keypoints are then created with the same location and scale but different directions.

4. Keypoint Descriptor

A 16x16 neighborhood around the keypoint is taken and divided into 4x4 sub-blocks; an 8-bin orientation histogram is computed for each sub-block, and the concatenated histograms form a 128-element vector, the keypoint descriptor.

5. Keypoint Matching

The keypoints of the target and template images are compared to see whether they are close enough for the images to match. Again, thresholds on the number of matching keypoints and on the match intensity determine the final verdict (a sketch follows).
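
Putting it together, here is a minimal sketch of that final step using OpenCV: descriptors from the template and the scene are matched, ambiguous matches are discarded with Lowe’s ratio test, and a threshold on the surviving matches gives the verdict. The file names, the 0.75 ratio, and MIN_MATCHES are assumed, calibratable values, not our production settings:

```python
import cv2

template = cv2.imread("asset_template.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("studio_screenshot.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(template, None)
kp_s, des_s = sift.detectAndCompute(scene, None)

# For each template descriptor, find its two nearest scene descriptors and
# keep the match only if the best is clearly better than the runner-up
# (Lowe's ratio test); this plays the role of the match-intensity threshold.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_t, des_s, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

MIN_MATCHES = 10  # assumed threshold on the number of matching keypoints
print("asset present" if len(good) >= MIN_MATCHES else "asset missing")
```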

Conclusion

Powtoon is an awesome company that isn’t afraid to explore the unending world of applied algorithms. And you can do this too: you can follow in our footsteps and find an element in a noisy environment by identifying that element’s keypoints. But even if you aren’t in the business of QAing visual assets, you can take this same step-by-step approach to improve your own processes, freeing up real-life human beings for the most business-critical work.

Yes, that software can check itself… it just takes a little work to set it up!

Are YOU ready to innovate? Click here.
