Our Journey to In-House Visual Testing

Alex Bichovsky
MyHeritage Engineering
8 min read · Feb 2, 2024

Introduction

In the rapidly evolving landscape of web development and design, maintaining visual integrity across different browsers and devices has become quite a challenge. Imagine navigating to your favorite online store, ready to make a purchase, only to find that the checkout button has mysteriously disappeared, yet somehow it’s still clickable if you know where to look. This isn’t just a hypothetical scenario but a real issue we encountered, one that undermined the user experience. Such hidden discrepancies, often overlooked by traditional testing techniques, can cost companies both user satisfaction and tangible revenue.

In this blog post, we’ll dive into the journey of our team at MyHeritage as we confronted and overcame the complexities of visual discrepancies with our own in-house visual testing framework.

Our quest for a solution began with evaluating the leading commercial tools in the visual testing ecosystem. However, we soon faced significant challenges: poor handling of dynamic pages, high costs, and a rough integration process caused by dependency clashes and other compatibility issues with our existing testing framework. These challenges led us to develop an in-house framework, which we expected to be far more adaptable to our needs and far more cost-effective. And while a framework developed in-house would lack some of the fancy features offered by commercial tools, its eventual, highly positive ROI was undeniable.

Framework Overview

  1. Our visual testing framework is an integrated solution that enhances our existing Selenium automation tests, which run on top of Watir and Cucumber. The foundation of the system we created is similar to that of commercial visual testing tools: the use of baseline and test screenshots for visual comparison. Baseline screenshots serve as the reference point against which we compare the actual screenshots taken during test runs. This comparison is key to identifying any visual discrepancies that might have occurred since the baseline screenshot was set.
  2. The screenshot comparison library we selected compares images pixel by pixel. This thoroughness, while beneficial, can sometimes lead to false positives, such as minor, insignificant pixel changes that aren’t actual bugs. To address this, we’ve introduced a feature allowing test developers to set a pixel tolerance percentage. This adjustment filters out these minor discrepancies, ensuring that only meaningful issues are flagged (a minimal sketch of this kind of tolerance check appears after the result page example below).
  3. Result Page Design
  • We’ve developed a detailed result webpage that displays all the relevant test information in one convenient location. This includes the scenario name, viewport size, browser, and any elements excluded from the test. Most importantly, it features a comparison slider that highlights the differences between the baseline and test screenshots, offering a clear, visual representation of any discrepancies directly on the page.
  • In addition to showcasing discrepancies, our result page offers a convenient feature for updating baselines. If a test fails due to a legitimate change in the UI, framework users can easily set a new baseline directly from the test screenshot with a single click. This process simplifies the task of baseline maintenance, ensuring that future tests are always compared against the most up-to-date references. An example of a visual test result page can be seen below.
Example of Visual test result page
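
To make the tolerance idea concrete, here is a minimal, illustrative Java sketch of a pixel-by-pixel comparison that only fails when the share of differing pixels exceeds a configurable percentage. It is a simplified stand-in for the comparison library we actually use, and all names are hypothetical.

import java.awt.image.BufferedImage;

// Minimal sketch of a tolerance-aware pixel comparison (illustrative only).
public class ToleranceComparator {

    // Returns true when the two images are considered equal within the given tolerance.
    public static boolean matches(BufferedImage baseline, BufferedImage actual, double tolerancePercent) {
        if (baseline.getWidth() != actual.getWidth() || baseline.getHeight() != actual.getHeight()) {
            return false; // different dimensions are always a mismatch
        }
        long total = (long) baseline.getWidth() * baseline.getHeight();
        long diff = 0;
        for (int y = 0; y < baseline.getHeight(); y++) {
            for (int x = 0; x < baseline.getWidth(); x++) {
                if (baseline.getRGB(x, y) != actual.getRGB(x, y)) {
                    diff++;
                }
            }
        }
        double diffPercent = 100.0 * diff / total;
        return diffPercent <= tolerancePercent;
    }
}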

Technical Details

1. Integration and Adaptability of the Framework

The main component of our visual testing framework is a standalone Java service that receives and processes screenshots from any testing framework. This design ensures that it’s not limited to our in-house systems, but can be adapted and utilized by any setup that runs automation tests. In our testing framework, we added a new Cucumber step definition that captures screenshots and sends them to the Java service via an API. These screenshots serve either as new baselines or as test screenshots to be compared against existing baselines. Below is an example of the Cucumber step for sending a screenshot of the MyHeritage home page for comparison against an existing baseline.

When I take a screenshot for testing on the MyheritageHomePage
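
On the receiving end, the Java service exposes an HTTP API that accepts the uploaded screenshot together with its metadata. Purely as an illustration of what such an endpoint could look like, here is a minimal sketch written with Spring Boot; the web framework, paths, and parameter names are assumptions made for this example, not the service’s actual interface.

import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

// Hypothetical sketch of a screenshot-ingestion endpoint; Spring and all names here are illustrative assumptions.
@RestController
@RequestMapping("/visual-tests")
public class ScreenshotController {

    @PostMapping("/{scenario}/screenshots")
    public String receive(@PathVariable String scenario,
                          @RequestParam("mode") String mode,         // "baseline" or "test"
                          @RequestParam("browser") String browser,
                          @RequestParam("viewport") String viewport,
                          @RequestParam("image") MultipartFile image) throws java.io.IOException {
        byte[] png = image.getBytes();
        if ("baseline".equals(mode)) {
            // store png as the new baseline for this scenario/browser/viewport combination
            return "baseline saved";
        }
        // otherwise compare png against the stored baseline and persist the result with its metadata
        return "comparison completed";
    }
}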

2. Leveraging Open Source Libraries

Our approach is rooted in leveraging open-source technologies. We incorporated three key libraries to create our solution:

  • Watir Screenshot Stitch: This library extends the capabilities of Watir by allowing it to capture full-page screenshots, beyond the visible browser viewport. It works by taking a screenshot of the visible viewport, scrolling down to the next part, and taking another screenshot, repeating this until it reaches the end of the webpage. At the end of the process, all screenshots are “stitched” into a single image representing the entire web page. This is key to ensuring that we have a complete view of the webpage (a sketch of the scroll-and-stitch technique appears right after this list).
  • Java Library for Screenshot Comparison: At the heart of visual testing is the need to catch any visual discrepancies. We utilize a Java-based library specifically designed to compare images with precision, highlighting any differences detected.
  • JS Library for Visual Comparison: Understanding visual differences is as crucial as detecting them. To facilitate this, we integrated a JS library that showcases screenshot comparisons on a result webpage through an interactive slider. This visual aid allows testers to easily spot and assess the discrepancies detected.
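
Although our tests use the Ruby-based Watir screenshot stitch library, the scroll-and-stitch idea itself is straightforward. The following is a simplified Java/Selenium sketch of the technique for illustration only; it is not the library’s implementation, and a production version would need to handle the final partial viewport and device pixel ratio more carefully.

import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import javax.imageio.ImageIO;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;

// Illustrative sketch of full-page capture by scrolling and stitching viewport screenshots.
public class FullPageCapture {

    public static BufferedImage capture(WebDriver driver) throws Exception {
        JavascriptExecutor js = (JavascriptExecutor) driver;
        long pageHeight = ((Number) js.executeScript("return document.body.scrollHeight")).longValue();
        long viewportHeight = ((Number) js.executeScript("return window.innerHeight")).longValue();

        // Capture the first viewport to learn the screenshot width and the scale factor.
        BufferedImage first = shoot(driver);
        double scale = (double) first.getHeight() / viewportHeight;
        BufferedImage stitched = new BufferedImage(first.getWidth(), (int) (pageHeight * scale), BufferedImage.TYPE_INT_RGB);

        for (long offset = 0; offset < pageHeight; offset += viewportHeight) {
            js.executeScript("window.scrollTo(0, arguments[0])", offset);
            Thread.sleep(200); // crude wait for the scroll to settle
            BufferedImage part = shoot(driver);
            // Note: a real implementation must avoid duplicated content on the last, partial viewport.
            stitched.getGraphics().drawImage(part, 0, (int) (offset * scale), null);
        }
        return stitched;
    }

    private static BufferedImage shoot(WebDriver driver) throws Exception {
        byte[] png = ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES);
        return ImageIO.read(new ByteArrayInputStream(png));
    }
}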

3. Data Management and Result Presentation

In addition to capturing and comparing screenshots, our framework is designed to store test results together with relevant metadata in a database. This allows us to create comprehensive result pages that present all the information our framework users need, which in turn enables quick decision-making and efficient issue resolution. By integrating this data management feature, we ensure that framework users have a clear, accessible view of their visual testing results whenever they need it.
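
As a rough illustration, the metadata persisted for each comparison could be shaped along these lines; the field names here are hypothetical, chosen to mirror what the result page displays rather than our actual schema.

import java.time.Instant;
import java.util.List;

// Hypothetical shape of the per-comparison metadata stored by the service.
public record VisualTestResult(
        String scenarioName,           // Cucumber scenario that produced the screenshot
        String browser,                // e.g. "chrome"
        String viewport,               // e.g. "1920x1080"
        List<String> excludedElements, // selectors excluded from the comparison
        double diffPercent,            // share of differing pixels
        boolean passed,                // whether diffPercent is within the configured tolerance
        String baselineImagePath,
        String testImagePath,
        String diffImagePath,
        Instant executedAt) {
}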

Visual testing framework flowchart

Notable Findings

1. The system proved itself a few days after its launch, when it discovered the first visual bug. It turned up on the page where users choose which type of two-factor authentication method to set up on their account. The text in a number of components was broken, which caused a few other elements to shift position. Those kinds of changes were missed by functional testing, but were caught by visual testing.
Example of the two-factor authentication method bug

2. A major finding of our visual testing framework was that the “Buy now” button had completely disappeared from the visible web page. In spite of that, it was still clickable by the Selenium engine, which locates elements by ID and does not rely on their visibility. Therefore, the functional test did not fail, even though the website’s real users were not able to click the button and complete a purchase.

Example of missing “Buy now” button

This visual discrepancy is presented on the result page in two ways. First, as can be seen above, using a comparison slider. The second is by showing the test screenshot with red rectangles that mark the areas on the page that differ from the baseline screenshot.

Result page — comparison difference painted in red, excluded areas painted in green.

Challenges

While our visual testing framework marks a significant leap forward in quality assurance, it’s not without its challenges and limitations.

Handling Dynamic Pages

One of the most significant challenges we face is dealing with dynamic pages. Web applications nowadays are more dynamic than ever, with content that changes based on user interactions or time. This dynamic nature requires a flexible and adaptive approach to visual testing. To address it, we equipped our framework with two features:

  • A test developer can exclude an element on the page from being tested. This is useful when most of the page is static, but a few dynamic elements (animations, the user name, etc.) have to be excluded from the comparison (a sketch of how excluded regions can be neutralized during comparison appears after this list). The example below demonstrates how we excluded a randomly generated username from the test. Without this exclusion, elements such as the user strip or any other area displaying the current username would cause the visual test to fail. Below is an image from the result page of a test that failed because specific elements were not excluded. All elements displaying the random, logged-in username are marked in red. Additionally, other elements affected by the name’s size are also highlighted.
Image showing result of comparison: All comparison differences are highlighted red

In the following screenshot, excluded elements are marked in green. These elements are ignored during the comparison between the baseline and test screenshots. However, the elements that were not excluded are still marked in red and will cause the test to fail.

Image showing result of comparison: excluded elements are highlighted in green
  • A tester can choose only a specific element within a page to be tested, while excluding all the rest. This can be used when we want to test only one specific element on the page. It can also be used on a page with many dynamic elements, which saves us from sending a large number of excluded elements to the comparison.
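
One straightforward way to implement such exclusions, and the approach this illustrative sketch assumes, is to paint the excluded rectangles with a uniform color on both the baseline and the test screenshot before the pixel comparison runs, so differences inside those regions can never be flagged. This is a sketch of the technique, not our exact implementation.

import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

// Illustrative sketch: neutralize excluded regions before comparing pixel by pixel.
public class ExclusionMask {

    // Paints every excluded rectangle with a single solid color on both images,
    // so differences inside those regions cannot affect the comparison result.
    public static void apply(BufferedImage baseline, BufferedImage test, List<Rectangle> excluded) {
        for (BufferedImage image : List.of(baseline, test)) {
            Graphics2D g = image.createGraphics();
            g.setColor(Color.GREEN); // mirrors how excluded areas are highlighted on the result page
            for (Rectangle r : excluded) {
                g.fillRect(r.x, r.y, r.width, r.height);
            }
            g.dispose();
        }
    }
}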

“Sticky” elements

Some elements, like headers or banners, stay fixed on the screen when scrolling up or down the webpage. However, since we use the Watir screenshot stitch library to take a full-page screenshot, these elements might show up several times in different locations in the final image. To resolve this, we added an option to disable the “stickiness” of such elements, so that they stay in their original place in the final screenshot (a sketch of how this can be done follows the screenshots below). Below are two screenshots of the same page. In the first screenshot, the sticky header appears twice on the page. However, when the anti-stickiness mechanism is activated on this element, as shown in the second screenshot, the header appears only once at the top of the page.

1. Without disabling “sticky” header
2. Disabling “sticky” header
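
One simple way to achieve this, shown in the hedged sketch below, is to inject a small piece of JavaScript before the stitched capture starts, switching the element from fixed or sticky positioning to static so that it scrolls with the page and appears only once in the stitched image. The selector and approach are illustrative rather than our framework’s exact internals.

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;

// Illustrative sketch: make a "sticky" element scroll with the page before a stitched capture.
public class StickyDisabler {

    public static void unstick(WebDriver driver, String cssSelector) {
        ((JavascriptExecutor) driver).executeScript(
            "var el = document.querySelector(arguments[0]);" +
            "if (el) { el.style.position = 'static'; }",
            cssSelector);
    }
}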

Conclusion

In summary, the framework we have developed gives our test developers a fast and easy way to add visual testing to their routine workflow. These visual tests dramatically increase our test coverage while requiring only minor effort. It’s clear that our in-house visual testing framework is a critical component of our ongoing commitment to delivering visual integrity and user-friendly web experiences. By addressing unseen yet impactful visual bugs, we are not only enhancing user satisfaction, but also safeguarding MyHeritage’s brand reputation and financial health.
