Modernizing Automation For An Evolving Complex Legacy System

Yusuf@writing
Published in Kongsberg Digital
Dec 6, 2019 · 16 min read

The word “Automation” has gained plenty of traction in recent times with the advent of AI and Machine Learning. In fact, the predecessor to this word’s popularity in the realm of building software is “Test Automation”. With the advanced and complex software systems Kongsberg builds, test automation has inevitably been part of its deliverables for a long time.

In this article we shall discuss:

  • The components of the automation ecosystem for one of our critical applications in Kongsberg Digital (KDI)
  • A brief overview of the in-house Performance Lab built for the application
  • The core automation issues at scale and their corresponding solutions
  • Scalability challenges we had to sort out
  • The strategic transition to the cloud, with continued support for the on-premise flavour, and waking up to the DevOps reality
  • Future technological sophistications planned for our automation ecosystem

Along the way, we will also look at the approaches and solutions we are exercising to overcome certain challenges by repackaging the automation systems’ implementations.

Application Under Test (AUT)

The application and systems cater to the upstream oil & gas industry, overseeing and optimising drilling activity and managing other subsequent operations. The solution integrates all the information from the different service providers involved in oil well construction activities into a common platform and monitors the KPIs that matter to each of these activities. It addresses drill safety, speed, and operational and cost efficiency, among others. Predominantly it deals with real-time data from thousands of sensors and brings that data to the frontend for visualization through rich widgets.

The Beginning of the Official Automation Journey

In KDI, Test Automation practices debuted around the year 2010. The AUT in question had been in existence long before these practices came into being. Initially, the practice consisted of just a test framework and test methods leveraging that framework. It was based on Microsoft Coded UI (CUI for short), as the applications too were being built on .NET.

Automation Technology Stack

Listed below are the development languages, SDKs, tools, frameworks, database technologies and ALM platforms employed for the automation implementation.

  • Development/Scripting Languages : C#, PowerShell
  • Automation GUI Driver Engines/Tools/SDKs : CodedUI, WinAppDriver, MSTest runner, White Framework
  • Infrastructure Virtualization : VMware vCenter and vSphere SDK
  • Web Ecosystem : HTML, XML, CSS, ASP.NET, SQL Server, SQL Server Integration Services (SSIS)
  • ALM and Cloud : Azure DevOps, git, TFVC

Automation Technology Stack — Implementation Overview

Core Components of the Automation Ecosystem

Listed below are the core components of the automation ecosystem for the AUT.

  • Automation Core Framework
  • Test Suites
  • Test Suite Driver
  • ALM Integration Service
  • Email Reporting Service
  • Failure Analytics Solution
  • Aggregated Test Run Dashboard
  • Performance Lab
  • Performance Lab Dashboard
  • VMware vSphere Test Labs with Windows Server 2008, 2012 and 2016 VMs

Let us go through just a couple of these components below; I’ll cover the others in separate articles in the future.

Performance Lab

The AUT inherently deals with a large number of oil wells and wellbores, and a huge volume of data from sensors. Hence, it’s quite necessary to have the AUT tested under a defined load of wells, wellbores and multiple users. As a result, we decided to set up a Performance Lab with dedicated infrastructure.

Performance Lab Dashboard — Run Summary

We consider this lab a great asset among the core constituents of our automation ecosystem. Colloquially it is known as the “CI Lab for Performance”; technically, it is a customized “Continuous Testing” (CT) pipeline to a certain degree. Most notably, this type of CT pipeline came into existence at a time when KDI itself was not yet prepared for the DevOps journey we see today. The CT pipeline, running at a daily frequency, was essentially designed for the Performance Smoke, but it implicitly covers build verification as well — every day, against new builds.

The Performance Smoke proved useful in identifying functional and performance issues at the earliest, to the extent of the scope of the tests being run every day. The extensive set of performance counters used in the Smoke Suite helps trace the root of performance bottlenecks and issues. We have built the suite to run for 14 hours overall, of which 8 hours are durability and stress testing after the full load is applied.
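
Windows performance counters of the kind the Smoke Suite relies on can be sampled with the standard .NET System.Diagnostics API. The sketch below is only illustrative: the counter names, sample count and interval are assumptions, not our actual configuration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class SmokeCounterSampler
{
    static void Main()
    {
        // Illustrative counters only; the real Smoke Suite tracks a much larger set.
        var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        var mem = new PerformanceCounter("Memory", "Available MBytes");

        cpu.NextValue(); // the first read of a rate counter returns 0, so prime it

        for (int i = 0; i < 10; i++) // assumed sample count and interval
        {
            Thread.Sleep(TimeSpan.FromSeconds(15));
            Console.WriteLine($"{DateTime.Now:o} CPU={cpu.NextValue():F1}% " +
                              $"AvailableMemory={mem.NextValue():F0} MB");
        }
    }
}
```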

Performance Lab Dashboard — Build and Run Details with Charts for Some Key Performance Metrics

Typically, systems require only a single load factor to be applied — for example, user load or data load. The application under test in question, however, requires multi-factor load: multiple oil wells, users and data volumes at once. The daily load test run generates dozens of gigabytes of data, which are stored in a SQL Server database. A SQL Server Integration Services (SSIS) package, combined with an ASP.NET application, retrieves, filters and renders the data on the front end.

Performance Lab Dashboard — Summary Table With Comparison of Performance Metrics for Three Different Runs

The Performance Test Lab setup includes the “Performance Lab Dashboard” — a front-end ASP.NET web application built to serve the performance details of targeted backend processes and services, comparing them against the previous run details of the respective processes. The web app offers a rich list of charts and graphs representing performance stats. Differences in performance metrics are benchmarked and coloured accordingly: degradations in red, and normal or improved metrics in green.
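
The red/green classification behind that colouring is conceptually simple. Here is a minimal sketch of how a metric delta might be bucketed; the 5% threshold and the “higher is worse” convention are illustrative assumptions, not the dashboard’s actual logic.

```csharp
using System;

static class MetricComparer
{
    public enum Status { Improved, Normal, Degraded }

    // Assumed convention: metrics where higher is worse (e.g. response time).
    // The 5% threshold is an illustrative assumption, not the dashboard's real one.
    public static Status Classify(double previous, double current,
                                  double thresholdPct = 5.0)
    {
        if (previous <= 0) return Status.Normal;   // no baseline to compare against
        double deltaPct = (current - previous) / previous * 100.0;
        if (deltaPct > thresholdPct) return Status.Degraded;   // rendered in red
        if (deltaPct < -thresholdPct) return Status.Improved;  // rendered in green
        return Status.Normal;                                   // rendered in green
    }
}
```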

All that said, the very tenets of the Performance Lab require a good amount of maintenance, just like its functional counterpart. We have spent a decent amount of effort on many occasions maintaining the performance tests, SSIS packages, SQL queries, the Performance Lab Dashboard and the Build Sync utility, and upgrading the lab infrastructure.

Aggregated Test Run Dashboard

This is a custom ASP.NET web application built by the automation team to collect, collate and render detailed results for any automation run. We carry out various types of runs, and the permutations and combinations are very large: multiple customers, multiple release versions, multiple application brandings or flavours under a specific release version, and multiple OS platforms. This is where the stakeholders of the product fall back on the Aggregated Test Run Dashboard to refer to any ongoing or historical run results. The good part about this application is that only occasionally have we come across maintenance challenges.

Glimpse of Test Run Dashboard

The Continuous Improvement Drive

For a very long time, the CodedUI-based automated suites have been heavily used for Smoke and Regression runs. The daily CT Lab runs for performance and the monthly Microsoft patch verifications relied on the automation as well. Lately, however, the automation needed major systemic improvements to keep up with the ever-growing demands on the application side, driven by the exponential growth of the AUT and shrinking release cycles. Yet for a long time we could never afford to prioritize those improvements.

Some months back, we reached a peak pressure point and had to do something about the long-standing, serious pain points: lengthy scripting efforts even for simple cases, a scary maintenance outlook, and an unbearably long-running Regression Suite overlapping days and nights — with more still to catch up on the coverage front.

This time we decided to assess our whole automation ecosystem, study the root causes in detail and group them accordingly. We could spot serious challenges in many areas of our legacy automation implementation and practices, grouped here:

  1. UI Map Maintenance and Management
  2. Test Framework Efficiency
  3. Test Script Development & Maintenance
  4. Reducing Test Run Duration
  5. Test Run Analysis & Reporting
  6. Scalability & Robustness for Parallel Support of Cloud and On-premise Applications

In the subsequent sections, let’s look in detail at the problems, root causes and corrective measures we are undertaking for each of the identified challenges listed above.

UI Map Management

This section discusses the issues related to UI Map Maintenance and Management.

The UI Map repository in CodedUI had always needed improvement to reduce test maintenance efforts in a big way. We came up with an innovative, technically challenging and automation-engineer-friendly solution called the Element Interface Wrapper (EIW).

The CodedUI way of managing the UI Map works well only if elements have a single parent. However, we have had hundreds of elements with as many as five parents in our AUT. On many occasions we had to move elements from one parent to another or add new parents altogether. Overall, this produced deeply nested structures of element descriptions, resulting in a highly complex and tedious maintenance job.

The idea behind EIW was to drastically lower maintenance efforts and improve the readability of element descriptions as well. A separate article is needed to explain the EIW solution in its entirety. The snap below shows the difference between the EIW and CodedUI ways of describing a single UI element: 169 lines of code generated with CodedUI versus a meagre 26 lines with EIW (actually only 9 lines if we exclude the class and namespace statements), with exceptionally better readability in a single view of the page.

CodedUI In-built UI Map Feature vs. Element Interface Wrapper
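
Since the snapshot above cannot be reproduced in text, here is a purely hypothetical reconstruction of the shape of the difference. The first half mirrors the nested classes CodedUI’s Test Builder generates; the second half uses invented syntax merely to convey the flat, one-declaration-per-element idea — it is not the actual EIW code.

```csharp
using Microsoft.VisualStudio.TestTools.UITesting;
using Microsoft.VisualStudio.TestTools.UITesting.WpfControls;

// CodedUI's generated UIMap nests one class per parent level:
public class UIMainWindow : WpfWindow
{
    public UIMainWindow()
    {
        SearchProperties[UITestControl.PropertyNames.Name] = "Well Overview";
    }

    public UIToolbarPane Toolbar => new UIToolbarPane(this);

    public class UIToolbarPane : WpfPane
    {
        public UIToolbarPane(UITestControl parent) : base(parent)
        {
            SearchProperties[WpfControl.PropertyNames.AutomationId] = "MainToolbar";
        }
        // ...and so on: another nested class for every further level...
    }
}

// The EIW style (invented syntax, for illustration only) states each element
// once, with its whole parent chain readable on a single line:
public static class WellOverviewMap
{
    public static Element SaveButton =>
        Element.ById("SaveBtn").Under("MainToolbar").Under("Well Overview");
}
```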

Innovation in the automation domain did not stop there. To get the best out of EIW and CodedUI’s default UI element reader, the Test Builder, a conversion tool was built — the Element Interface Wrapper Converter. This tool converts the descriptions of thousands of CodedUI elements — stored in deeply nested XML structures — to the EIW format, saving months of human effort.
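
To give a feel for the kind of transformation involved, the sketch below flattens a simplified stand-in for the nested element XML into one readable line per element. The real .uitest schema and the EIW output format are far richer; everything here is an assumption for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class EiwConverterSketch
{
    static void Main()
    {
        // Simplified stand-in for a CodedUI UIMap fragment; the real schema
        // carries many more attributes per element.
        var uiMap = XElement.Parse(@"
            <TopLevelWindow Name='Well Overview'>
              <Element AutomationId='MainToolbar'>
                <Element AutomationId='SaveBtn' />
              </Element>
            </TopLevelWindow>");

        // Flatten each leaf element to a single line: its id plus parent chain.
        foreach (var leaf in uiMap.Descendants().Where(e => !e.HasElements))
        {
            var chain = leaf.AncestorsAndSelf()
                            .Select(a => (string)a.Attribute("AutomationId")
                                         ?? (string)a.Attribute("Name"))
                            .Reverse();
            Console.WriteLine(string.Join(" > ", chain));
            // e.g. "Well Overview > MainToolbar > SaveBtn" -- the flat, readable
            // form that an EIW-style declaration could be generated from.
        }
    }
}
```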

Automation With Two Silos — CodedUI + WinAppDriver

This section discusses issues related to Test Framework Efficiency, Test Script Development & Maintenance, and Reducing Test Run Duration.

In-house Framework Improvements

The CodedUI-based in-house test framework served well on the automation peripherals and test infrastructure side — email reports, result storage in the database, the test run portal, etc. But it did not meet the required degree of efficiency and robustness when it came to test development, execution and CUI reporting. For example, tests were written directly against the tool-specific APIs and functionality, without solid wrappers around those APIs to reduce maintenance pains. This posed great challenges whenever we wanted to update scripts in a few places and have the other relevant tests receive the updates. Those tests still exist, tightly coupled to the CodedUI APIs and libraries.

CodedUI Stopped Ticking the Clock

We relied heavily upon CodedUI and found that it was inefficient on many counts for our fast-growing testing needs. We got down to building a small wrapper on top of the existing CUI framework. It helped to a small extent in stabilizing scripts against failures and easing script maintenance. But this was never enough.

Soon we needed another tool in our automation kitty. But what about the existing tests hard-wired to CodedUI? With a suite of 400+ old tests, it was highly difficult at that juncture to re-engineer the entire framework to support multiple tools in a unified, seamless fashion and rewrite the tests accordingly. Nevertheless, after weighing various options, we added support for WinAppDriver alongside CodedUI — the tool recommended by Microsoft as CodedUI’s future replacement.

So, what was the deal? Old hard-wired tests stayed in CodedUI, while new tests were written with WinAppDriver, as sketched below. The best we could do then was to devise a strategy of building separate, independent action and assertion libraries for WinAppDriver. As a result, two test frameworks came into existence. This approach reduced the known challenges only to a small extent — again, never enough given the scale of the challenges.
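
For reference, this is roughly what a new-style test session looks like through the Appium C# bindings that WinAppDriver speaks; the application path and the locator are illustrative placeholders.

```csharp
using System;
using OpenQA.Selenium.Appium;
using OpenQA.Selenium.Appium.Windows;

class WinAppDriverSketch
{
    static void Main()
    {
        // Assumes WinAppDriver.exe is already listening on its default endpoint.
        var options = new AppiumOptions();
        options.AddAdditionalCapability("app", @"C:\Program Files\AUT\App.exe"); // illustrative path

        using (var session = new WindowsDriver<WindowsElement>(
                   new Uri("http://127.0.0.1:4723"), options))
        {
            // The locator is illustrative, not taken from our real UI map.
            session.FindElementByAccessibilityId("SaveBtn").Click();
        }
    }
}
```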

It didn’t take us long: within a few weeks, we decided not to continue with two independent test frameworks for two different tools. Why? The reason was compelling — in the future we might need to add yet another tool, which would lead to multiple silos of libraries and wrappers for each tool within our automation system. Moreover, this design approach encourages below-par test development practices. We’ll see in the next section how we finally solved this big challenge.

Paradigm Shift — From Standalone Tool Silos To Tool Agnostic Unified Test Framework

This section discusses issues related to Test Framework Efficiency, Test Script Development & Maintenance, and Reducing Test Run Duration.

Even after building the extra abstraction layer on top of our legacy test framework, we could not reach the required efficiency and robustness, keeping future development and challenges in consideration. Hence, we set out to build a highly flexible and robust framework encompassing many good features for future-generation automation use cases. We were confident that the proposed Unified Test Framework (UTF) would solve many of the long-standing automation pains we were grappling with.

With UTF, the amount of code the team has to write has come down dramatically — at least a whopping 50% reduction.

The Unified Test Framework (UTF) guarantees seamless and easy integration of any number of .NET-based tools. As of today, it unifies three major pieces: the Element Interface Wrapper, CodedUI and WinAppDriver. In addition, it has a highly configurable and extensible design.

After implementing the UTF, the amount of code the team has to write has come down dramatically — at least a whopping 50% reduction — without sacrificing coverage one bit. Productivity has improved considerably. It has helped accelerate the speed of development and the stabilization of tests, and has clearly reduced the test run duration in all comparisons. Moreover, the UTF lets us write test scripts in the same way regardless of the underlying tool. Test suites developed with this framework are easy to read and maintain, as the framework APIs are uniform; the sketch below illustrates the idea.
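
The UTF itself is in-house and proprietary, so the sketch below only illustrates the underlying design idea with hypothetical names: every tool hides behind one uniform interface, and tests depend only on that interface.

```csharp
// Hypothetical names throughout: a sketch of the tool-agnostic idea, not the real UTF.
public interface IUiDriver
{
    void Click(string elementId);
    void TypeText(string elementId, string text);
}

// One thin adapter per tool keeps tool specifics out of the tests.
public sealed class CodedUiAdapter : IUiDriver
{
    public void Click(string elementId) { /* resolve via UI map, then Mouse.Click(...) */ }
    public void TypeText(string elementId, string text) { /* Keyboard.SendKeys(...) */ }
}

public sealed class WinAppDriverAdapter : IUiDriver
{
    public void Click(string elementId) { /* FindElementByAccessibilityId(...).Click() */ }
    public void TypeText(string elementId, string text) { /* element.SendKeys(text) */ }
}

// Tests depend only on the interface, so the tool is swappable by configuration.
public class SaveWellTest
{
    private readonly IUiDriver _ui;
    public SaveWellTest(IUiDriver ui) => _ui = ui;

    public void Run()
    {
        _ui.TypeText("WellNameBox", "Well-42"); // illustrative element IDs
        _ui.Click("SaveBtn");
    }
}
```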

Currently, the testware primarily supports Windows Forms, WPF, UWP and other Windows desktop technologies. Test automation on Windows desktop and web applications gives different experiences and challenges, and Windows desktop applications are more challenging to automate than web applications for many reasons. The UTF is capable of accommodating any number of tools that support .NET: we can seamlessly and easily integrate Ranorex, Test Studio, Selenium, Appium, RPA tools such as UiPath and so on, with zero impact on the tests and very little training. No tool or vendor lock-in.

Failure Analytics

This section partly addresses the issues related to Test Run Analysis & Reporting.

The last frontier in realizing true automation value is the analysis of failures to decide the further course of action — be it triaging and logging bugs, estimating the effort to fix them, assessing their impact on the release schedule, or reviewing the overall readiness of the project for release. The true impact of the automation value chain lies in how quickly, efficiently and consistently failure analysis is carried out across the formats of automation.

With Failure Analytics, the drop in the effort is tremendous — just 4 hours.

Before we built our Failure Analytics solution, we had to spend three man-days on a run with 100 failing tests; more runs with more failures meant huge efforts. Soon after adding the solution to our automation process flow, the actual analysis effort came down to a mere 4 hours. This topic is worth an article of its own, because this is the most decisive phase of the Automation Development Life Cycle.
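
One idea at the heart of such a solution — grouping failures by a normalized error signature so that each root cause is analyzed once rather than per test — can be sketched in a few lines. The types and normalization rules below are illustrative assumptions, not our production logic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class FailureGroupingSketch
{
    record Failure(string TestName, string ErrorMessage);

    static void Main()
    {
        var failures = new List<Failure>
        {
            new("Test_A", "Element 'SaveBtn' not found after 30000 ms"),
            new("Test_B", "Element 'CancelBtn' not found after 30000 ms"),
            new("Test_C", "Timed out waiting for well 'W-17' to load"),
        };

        // Normalize volatile fragments (quoted names, numbers) into a signature,
        // so failures with the same root cause collapse into one bucket.
        string Signature(string message) =>
            Regex.Replace(Regex.Replace(message, "'[^']*'", "'*'"), @"\d+", "#");

        foreach (var bucket in failures.GroupBy(f => Signature(f.ErrorMessage)))
            Console.WriteLine($"{bucket.Count()} test(s): {bucket.Key}");
        // Output: two buckets to analyze instead of three individual failures.
    }
}
```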

Transition to Cloud and DevOps

This section partly addresses the issues related to Scalability & Robustness for Parallel Support of Cloud and On-premise Applications.

In the recent past, the AUT has been undergoing major transformations to support new business models that meet customers’ demands and expectations. These models were built around Azure cloud, SaaS, PaaS and DevOps. As part of this, the monolithic applications are being broken down into microservices and deployed on an Azure Service Fabric cluster. The major goal behind repurposing the applications is to offer customers precisely the features they want and have them pay for what they use. Certain other projects, meanwhile, mandate twice-weekly releases.

On the automation front, we had our work cut out, with plenty of automation design decisions to make and trade-offs in every direction. The major challenge was that the on-premise, monolithic version of the services would continue to exist, and we needed to keep testing it in the traditional way.

A major task to accomplish: the deployment models for the on-premise and cloud-based services are significantly different. However, we couldn’t go ahead with a separate codebase for cloud services — that would be suboptimal and result in huge maintenance overhead. Therefore, a new framework subsystem was built by extending the previous framework for the cloud-ready application components only. This framework extension addresses both on-premise and cloud services testing.
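
Conceptually, the extension keeps a single codebase and isolates the deployment-specific details behind one seam. The sketch below illustrates that idea with hypothetical names and endpoints; it is not the actual subsystem.

```csharp
// Illustrative sketch of a single-codebase seam between the two deployment
// models; all names and hosts are hypothetical, not the real framework extension.
public enum DeploymentModel { OnPremise, Cloud }

public interface IEnvironmentTopology
{
    string ResolveServiceEndpoint(string serviceName);
}

public sealed class OnPremiseTopology : IEnvironmentTopology
{
    // Monolith services addressed directly on lab servers.
    public string ResolveServiceEndpoint(string serviceName) =>
        $"http://labserver01/{serviceName}";   // illustrative host
}

public sealed class CloudTopology : IEnvironmentTopology
{
    // Microservices resolved through the Service Fabric cluster's reverse proxy.
    public string ResolveServiceEndpoint(string serviceName) =>
        $"https://fabric-cluster.example.com:19081/AppName/{serviceName}"; // illustrative
}

public static class TopologyFactory
{
    // Tests stay identical; only this switch changes per pipeline configuration.
    public static IEnvironmentTopology For(DeploymentModel model) =>
        model == DeploymentModel.Cloud
            ? new CloudTopology()
            : (IEnvironmentTopology)new OnPremiseTopology();
}
```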

The foundational and transformational automation solutions were successfully built with Azure Release Pipelines, tested and demoed to the customers. This has paved the way for easily extending automation coverage, which is the top priority at the moment and very much in progress. I will be glad to share more insights on this topic in a future article.

Future Technological Sophistications

So far, I have run through one of the AUTs in O&G, the automation ecosystem developed for it, the various automation challenges linked to it, and the solutions we have built — and are still building — as part of the continuous improvement drive. In this section, we’ll look at the technological advancements we plan to add to our automation ecosystem in the future.

  1. ALM Integration into Automation Runs with the Azure SDK: Simply put, this proposed feature enables us to retrieve the full details of bug work items linked to failing test cases from Azure DevOps (see the sketch after this list). We should also be able to dynamically collect more insights about environments and programmatically manage Azure Pipelines, and we can build much richer analytics on top of this solution.
  2. Test Reports With the Extent Library: Since the inception of the automation system, we have been using the default test result feature of CodedUI. The structure and format of test reports in the current automation system clearly have scope for improvement. To move away from the less efficient CUI test result format (.trx files), we have identified two options: a custom-built, sophisticated test report, or the Extent Reports library. We have opted to go ahead with a PoC of the second option for now.
  3. TROS (Test Run Operating System): This is a configurable, multi-feature run-time governance tool managing the various run conditions and events associated with UI automation runs.
    It encompasses many features to manage test run sessions end-to-end, spanning very long durations — as long as 40 hours. It continuously monitors and controls runs, takes corrective actions, flags important incidents at run time, collects and correlates important data and stats, generates test reports, and fires off emails with test results and stats. Re-running failed tests, recovering from run-time errors, managing dependencies among tests and test suites through various rules, ALM integration, failure analytics and pattern identification are a few of its many features. I’ll come up with a separate article on this tool in the future.
  4. Real-time Test Reporting Dashboard For Azure Pipelines: Azure DevOps lacks a real-time test result update feature for Release Pipeline runs. We have successfully completed a PoC for real-time test reporting, and we will invest more effort in building the full-scale solution down the line, based on our road map and priorities.
  5. Computer Vision Algorithms For Object Recognition: We employ the Unified Test Framework (UTF) for automation; under the hood, CUI and WinAppDriver are the tools driving the UI. However, automation is not glitch-free as far as object identification is concerned, even with the UTF. There are cases where we are helpless in this regard, and that puts critical automation coverage at risk, purely due to glitches in locating UI objects.
    Given our disappointing experience resolving object recognition issues with many other techniques, we are always on the lookout for better options whenever the underlying tools fail to deliver. I have been exploring opportunities to leverage AI in test automation for some time now, and there are genuine use cases and challenges that necessitate employing Computer Vision (CV) technology to salvage the coverage. We have just started using CV wherever applicable, and there are a handful of useful CV-based APIs I have built to resolve image comparison challenges in automation. I’ll share more details about employing AI in test automation in a separate article.
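
As a taste of the first item above, the Azure DevOps .NET client libraries (the Microsoft.TeamFoundationServer.Client package) can pull a bug work item’s details by ID. A minimal sketch follows; the organization URL, the PAT handling and the work item ID are placeholders, not our actual configuration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.TeamFoundation.WorkItemTracking.WebApi;
using Microsoft.VisualStudio.Services.Common;
using Microsoft.VisualStudio.Services.WebApi;

class AlmIntegrationSketch
{
    static async Task Main()
    {
        // Placeholders: substitute your organization URL and a securely stored PAT.
        var connection = new VssConnection(
            new Uri("https://dev.azure.com/your-org"),
            new VssBasicCredential(string.Empty,
                Environment.GetEnvironmentVariable("AZDO_PAT")));

        var workItems = connection.GetClient<WorkItemTrackingHttpClient>();

        // Fetch the bug linked to a failing test case (the ID is illustrative).
        var bug = await workItems.GetWorkItemAsync(12345);
        Console.WriteLine(
            $"{bug.Id}: {bug.Fields["System.Title"]} [{bug.Fields["System.State"]}]");
    }
}
```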

Conclusion

Here in KDI, automation evolution and modernization efforts are proceeding by leaps and bounds for the long-serving, highly complex application systems in domains like O&G, drilling and maritime. Since the AUT in question is a backbone of the oil & gas domain and carries high-stakes business value in the organization, automation too has been a high-focus area. Over the past two years or so, the automation practices have been strategically aligned, remodelled and integrated into KDI’s Automation Center of Excellence model, in pursuit of moving away from AUT-centred practices and approaches.

In this article, we walked through the core components of automation for the AUT. We saw in detail the benefits of the Performance Lab established in the very early days of the automation setup. We discussed the pain points of our legacy implementations across the areas of UI Map management, the test framework, test development, and failure analysis & reporting, and detailed the solution approaches for each of the challenges. Finally, we peeked into the mandatory transition of automation practices in response to cloud- and DevOps-driven application development and deployment scenarios.

I am glad to underline that we have set our modernization drive in the right direction and are progressing rapidly, in view of the growing multi-dimensional testing complexities, the automation technical challenges, and the huge permutations and combinations of the AUT’s testing requirements.

“Quality is not an act, it is a habit.” — Aristotle

Great Automation! Creative And Curious Teams! Better Human Engineering Productivity!
