The purpose of CI/CD is to improve the development process: to move faster and be more flexible. But what’s the point of moving quickly if you break things along the way? What if you cannot tell for sure whether a feature works as expected once it is integrated into master? Test automation is essential for any development process because, without robust and reliable automated tests, there is no CI/CD.
Here at TrueCar, we’ve come a long way toward finding the optimal test automation solution. This article is the first of two blog posts describing our journey, the challenges we’ve faced, and the lessons we’ve learned. We built a full-scale CI/CD pipeline, where every code change goes through numerous automated validations and tests. This gives us confidence in the quality of the features that we ship to production.
A Little Bit of History
We’ve had automated testing for 10+ years at TrueCar. Some old-timers still remember the QTP and VB scripts from 2009, which were replaced by a Selenium and Python framework in 2013. Around 2014–2015 we moved to Robot Framework, which enabled us to create test scripts much faster using its rich library of keyword functions. In 2016, as the company started working on Capsela (the initiative of rebuilding our legacy platform in the cloud), the QA team took on building Otto, an extensive test framework. At all points, the test automation platform evolved together with the development platform, adapting to the ever-changing needs of the fast-paced environment.
Pre-Capsela Testing Process
Back in 2014 through early 2016, TrueCar was still running on the legacy development platform hosted in our data center. For the QA team, it meant we had to support more than ten services and client applications written in Java, Python or C#. In fact, everything was fragmented and alterations of any kind were difficult. Changing the color of a button across www.truecar.com and our partner sites required close collaboration of at least three teams. PMs had to create three change management (CM) requests. Three teams had to hold three planning meetings. Different developers had to check in the code to separate repositories, and different QA engineers had to individually test each change. Complicating things even further, web and mobile versions of our partner sites were two separate applications. Given these complexities, test automation was a major challenge.
Our release process involved three different pre-production environments:
- QA, which was our primary integration and test environment
- Staging, which mirrored production and was used mainly for performance testing, but also for functional regression before a release
- UAT (User Acceptance Test), which was prod-like as well, but was used by our partners and stakeholders primarily for testing cross-platform integrations and accepting new features
Our regular release cadence was once every two weeks. A release would start with a CM ticket that had all the planned feature tickets linked to it. Developers began working on the release candidate, and three days before the release day they had to cut a release candidate branch and hand it over for testing. QA engineers had two full days to verify the changes in the shared QA environment.
That’s when all the “fun” started. The bugs were coming in (the bug fixes too), new builds had to be deployed and re-tested, and a single shared test environment was making it all very challenging.
With multiple apps releasing on the same schedule, we had to juggle the versions and make sure we were testing the exact combination that we needed to ship.
Some deployments could bring the environment down for up to 20 minutes, so the testing sessions were often interrupted.
We had a significant number of automated regression tests, but these interruptions made it almost impossible to get a clean run of the automated regression. After testing on QA, we deployed to our pre-prod environments and ran another round of regression tests. If everything looked good, we signed off on the release.
On the release day there was a CM coordination meeting to discuss the order of deploying the applications. Typically, backend services had to go first, followed by the client applications. Sometimes there were up to five or six apps in the queue.
Feature releases were tied to the code deploys, so if we got the order wrong (e.g., deployed FE components for the new feature that relied on the specific API endpoint that did not make it to production earlier), different parts of the sites could go down. We did use feature flags to control the state of the features, but that system required a code deploy too. If something was wrong, a hotfix was the only option.
This process left a lot of room for bugs to slip into production. The automated regression was essential to identify as many of them as possible on the test environments.
Our test automation framework was written in Python around the well-known Robot Framework. Robot Framework is an open-source framework designed for BDD/ATDD testing using keyword functions, which make test scripts readable as plain English text. These keywords were the top abstraction layer that we configured to call Selenium WebDriver for the UI tests and Appium for the native apps.
As for the backend service tests, we had Python scripts under that keyword abstraction.
To run the tests, we used Jenkins. Every team maintained its own Jenkins dashboard with a set of automated regression jobs that ran on that team’s custom schedule.
Even though we had automated tests for all the application layers, we still had some holes in our coverage. The process left very little time to write new tests and maintain older tests. As the tech debt increased, we had to rely more and more on manual testing.
Thankfully, this is when the Capsela initiative started. The company was migrating to the new engineering platform with a monolithic architecture that would save us from dependency hell. Our new platform stack was Ruby on Rails for the backend and React for the frontend.
Otto, the Test Automation Framework for Capsela
With Capsela, we got a fresh start. We were not only modernizing our technology stack but also transforming our development and testing culture. To reflect that change, the QA team was renamed Test Engineering. We wanted to move away from the vicious circle of half-manual, half-automated testing and get into the “automate everything” mindset.
Our primary goal was to build a new test automation platform for the new monolithic backend and frontend applications that would reduce our dependence on manual testing. At the same time, we had to continue supporting legacy applications and ensure ongoing maintenance releases.
This is when the Test Engineering team built Otto — a monolithic, single-repository automation framework that contained API, web, mobile web and data tests — designed as a universal testing solution for Capsela stack applications.
Otto is written in Ruby, just like our main backend application. We took our time to evaluate several popular Ruby frameworks (such as Cucumber and Capybara) as the candidates for the core test libraries.
Cucumber is similar to Robot Framework and is another open-source tool for data-driven BDD/ATDD-style testing. The Gherkin syntax allows writing Cucumber test scripts in plain English using the Given-When-Then format. The idea behind this approach is to make tests readable and easy to understand, even by non-technical team members, so that any stakeholder can access and review the actual test coverage and provide feedback. Additionally, the tests serve as living documentation, where product owners write down user scenarios and test engineers “translate” them into code.
Cucumber is a great framework, but we could not use it as a universal test runner for different types of tests. While BDD worked well for the UI end-to-end tests, it did not bring much value for the backend service and data tests. It was just an extra level of abstraction to maintain.
As for Capybara, it was just not mature enough at that time. Today it’s a tool loved by many, but unfortunately, we recall running into a lot of stability issues that turned us away from it.
That brought us to Minitest, the default test library that comes with Ruby. It’s exceptionally lightweight and simple, and it seemed like a perfect choice for a start. With Minitest, every test and every test step is a plain Ruby method, which makes tests easy to develop and maintain.
Watir is an open-source library for automating web tests in Ruby. It provides a simple Selenium-like syntax and support for different browser drivers. Configuring Watir to drive a headless browser was the first significant change to our automated testing process. UI tests are usually much slower than unit tests, but we wanted to keep UI regression execution time under ten minutes. Running headless allowed us to get closer to that goal without reducing the effectiveness of the tests.
As mentioned, we designed Otto as a meta-framework for data, API, and web (including mobile web) tests to centralize automated testing in a single repository and enable testing of the different application layers without duplicating code.
The three individual modules are queries, services, and utilities:
- The Queries module contains queries and query builders for different data sources: Postgres, Elasticsearch, legacy MsSQL, AWS S3, Redshift, and Salesforce. Query builders are utility functions that abstract the boilerplate code and simplify creating new queries. They handle all the logic of switching ES servers and indexes based on the application that we run the test for.
- The Services module holds the backend service wrapper functions, built on HTTParty, that encapsulate everything required to call specific APIs. Every API is represented by a class that contains requests to all the endpoints of that API and a link to its Swagger documentation. Keeping those functions in a separate module pays off in code reuse: we can call them from the API tests directory to run assertions against the response, and from the Web directory to find or create the data needed for specific test conditions. With a few customizations to the logger, all the HTTP requests performed by the tests are logged in a curl format that we can copy and paste into Postman to repeat test steps when debugging failures.
- The Utilities module contains core extensions, general utility tools, and helper methods, such as UrlBuilder and ApiCaller. The name UrlBuilder speaks for itself: it is the tool for building URLs for both legacy (data center) and new-stack (AWS-hosted) applications. It also handles building the URLs for 800 partner sites, some of which use custom domains. The ApiCaller provides methods to call the APIs with custom test headers, along with an interface to override HTTP errors and handle specific response conditions. One example is a retry that we added for 5XX response codes on a few internal APIs known to be unstable.
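The retry on 5XX responses mentioned for ApiCaller can be sketched roughly as follows. This is not Otto’s code: `call_with_retry`, `FakeResponse`, and the default values are hypothetical, and a canned list of responses stands in for a real HTTP client so the example runs on its own.

```ruby
# Minimal response object so the sketch needs no real HTTP client (assumption).
FakeResponse = Struct.new(:code, :body)

# Yields the attempt number to the block and retries while the block
# returns a 5XX response. max_attempts and wait are illustrative defaults.
def call_with_retry(max_attempts: 3, wait: 0)
  attempt = 0
  loop do
    attempt += 1
    response = yield(attempt)
    server_error = (500..599).cover?(response.code)
    # Stop on any non-5XX response, or once the attempts are used up.
    return response unless server_error && attempt < max_attempts

    sleep(wait)
  end
end

# Simulated unstable endpoint: fails twice, then succeeds.
responses = [
  FakeResponse.new(503, ""),
  FakeResponse.new(502, ""),
  FakeResponse.new(200, "ok")
]
result = call_with_retry { |attempt| responses[attempt - 1] }
puts "#{result.code} #{result.body}"
```

When every attempt fails, the last 5XX response is returned unchanged, so the calling test can still assert on it and fail with a meaningful message.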
Core extensions to Watir and Selenium WebDriver allowed us to take screenshots on errors. A few other customizations to Minitest allowed us to use its spec engine as a full-scale test runner, not only for the backend but also for the UI tests. For example, we added tag support and started using tags to organize the tests into suites. It was great for splitting the “backend” and “frontend” regression into more specific and targeted suites based on the application components, such as “New Cars,” “Used Cars,” “Registration,” etc. We also introduced a few custom variations on the default test definition:
- repeat, which served as a test stabilization tool by running a specific test in a loop as many times as needed; and
- variant, which provided support for data-driven tests.
One of the most important utilities for testing the legacy-to-Capsela migration was the deep-assert library, which later became a standalone Ruby gem. The library allowed us to verify that the new Capsela APIs were at parity with their legacy predecessors. We only needed to define a mapping object and pass the full or partial responses of the two APIs. The test would make HTTP requests to both endpoints, store the responses, compare them using the mapping rules, and output the differences. Running such tests in a loop against random input data allowed us to identify numerous edge conditions and bugs.
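The mapping-based comparison can be sketched in a few lines. This is not the deep-assert gem’s API: `parity_diff`, the shape of the mapping, and the sample payloads are all hypothetical, and hard-coded hashes stand in for the two HTTP responses.

```ruby
# Returns human-readable differences between two responses.
# `mapping` pairs a key path in the legacy response with the matching
# key path in the new response (assumed format for this sketch).
def parity_diff(legacy, modern, mapping)
  mapping.each_with_object([]) do |(legacy_path, modern_path), diffs|
    old_value = legacy.dig(*legacy_path)
    new_value = modern.dig(*modern_path)
    next if old_value == new_value

    diffs << "#{legacy_path.join('.')}=#{old_value.inspect} != " \
             "#{modern_path.join('.')}=#{new_value.inspect}"
  end
end

# Sample payloads standing in for the legacy and new API responses.
legacy = { "vehicle" => { "make" => "Honda", "msrp" => 24_000 } }
modern = { "data" => { "make" => "Honda", "price" => { "msrp" => 24_500 } } }

mapping = {
  %w[vehicle make] => %w[data make],
  %w[vehicle msrp] => %w[data price msrp]
}

puts parity_diff(legacy, modern, mapping)
```

An empty result means the mapped fields agree, so a parity test only needs to assert that the returned list is empty and print it otherwise.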
Otto’s infrastructure is quite complex. As we kept working on the platform, we added a few more bits to it:
- Otto Assist, a gem that contains all the scripts needed for running the tests in CI, including parallelization for the test runner and aggregated reporting
- Otto App, a web application that serves as an aggregator and a presentation layer for test results reported from CI
- Otto gem, a command line utility tool
Otto Assist is where we stored Docker images and CI configuration files, as well as the parallelization scripts. Otto’s parallelization service is quite simple. When a test job starts, it executes a dry run of the tests in a suite determined by tags in that job’s configuration. For example, if the tag parameter contains “@webAND@consumer_frontendAND@registration”, then the driver runs tests only for the registration component of the consumer-frontend application.
A dry run outputs the names of the test cases and their total number. Then Jenkins uses that information to trigger a new sub-job for each of these test cases, so that all the tests run concurrently.
As each test finishes, it stores the report in the job workspace. Once all the sub-jobs have finished, the reporter service combines all these separate reports into one and sends it to S3. The parent test job pulls that aggregated report, and that’s what we see when it completes.
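The dry run, fan-out, and aggregation flow can be sketched in plain Ruby. In this sketch threads stand in for the Jenkins sub-jobs and an in-memory hash stands in for the report stored in S3; the catalog, the method names, and the report shape are assumptions, not Otto Assist code.

```ruby
# Hypothetical test catalog: test name => tags.
TESTS = {
  "sign up with email"   => %w[@web @consumer_frontend @registration],
  "sign up with partner" => %w[@web @consumer_frontend @registration],
  "search used cars"     => %w[@web @consumer_frontend @used_cars],
  "pricing api contract" => %w[@api @pricing]
}.freeze

# "Dry run": list the tests whose tags satisfy every term of the AND expression.
def dry_run(catalog, tag_expression)
  required = tag_expression.split("AND")
  catalog.select { |_name, tags| (required - tags).empty? }.keys
end

# Stand-in for a CI sub-job; returns a per-test report.
def run_sub_job(name)
  { name: name, status: :passed }
end

# Fan out one worker per test, then merge the reports into one summary.
def run_suite(catalog, tag_expression)
  names = dry_run(catalog, tag_expression)
  reports = names.map { |name| Thread.new { run_sub_job(name) } }.map(&:value)
  passed = reports.count { |report| report[:status] == :passed }
  { total: names.size, passed: passed, reports: reports }
end

summary = run_suite(TESTS, "@webAND@consumer_frontendAND@registration")
puts "#{summary[:passed]}/#{summary[:total]} passed"
```

With one worker per test case, the wall-clock time of the suite approaches the duration of its slowest test rather than the sum of all tests.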
Otto App is an aggregating service that provides metrics and dashboards to track the performance of individual tests and test suites. It reports the total number of tests in the predefined suites, test execution time, and pass/fail rate. By monitoring and analyzing historical test status data, we can identify the so-called flaky tests (tests that can pass or fail under the same conditions) or the flaws in test logic when they meet some unforeseen environment conditions (data issues, configuration changes, cold cache).
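One simple way to surface flaky candidates from historical data is to look for tests with mixed outcomes. The sketch below assumes every recorded run happened under the same conditions; the record format and method names are hypothetical and do not reflect Otto App’s data model.

```ruby
# Hypothetical run history: one record per test execution.
HISTORY = [
  { test: "login", passed: true },
  { test: "login", passed: true },
  { test: "search", passed: true },
  { test: "search", passed: false },
  { test: "search", passed: true },
  { test: "checkout", passed: false },
  { test: "checkout", passed: false }
].freeze

# Pass rate per test, as a float between 0.0 and 1.0.
def pass_rates(history)
  history.group_by { |run| run[:test] }.transform_values do |runs|
    runs.count { |run| run[:passed] }.fdiv(runs.size)
  end
end

# A test that both passed and failed is a flaky candidate. A test that
# always fails points to a real bug or a broken test instead.
def flaky_candidates(history)
  pass_rates(history).select { |_test, rate| rate > 0 && rate < 1 }.keys
end

p flaky_candidates(HISTORY)
```

In practice the candidates would still need a manual look, since mixed results can also come from data issues or configuration changes between runs.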
Otto gem is a command line utility designed to simplify interactions with the main Otto library. It provides aliases for the most popular ways of running the tests by tag and by application, a stability tool for re-running tests in a loop to catch edge conditions, and other small utilities, such as lists of all available test tags, test owners, etc.
With the Otto framework in place, we were able to dramatically transform the way we tested and deployed new code. Development of the new ephemeral Spacepod environments allowed more effective and isolated testing. We wanted to reduce the release cycle and deployment time, and now we had everything we needed: new platform, new environments, new test automation platform. The CI/CD initiative was starting to pick up speed.
While Otto was a huge improvement, the evolution of test engineering at TrueCar did not end there. In the second post of this two-part series, we will share how we built the testing process that allowed us to achieve full CI/CD. We will discuss the challenges we faced and share the current state of our platform.