Improving End-to-End Testing at Coursera using Puppeteer and Jest

Billy Kirk
Coursera Engineering
6 min read · Jun 26, 2019

By Ankit Ahuja, Billy Kirk, and Sumit Gogia.

Coursera is committed to delivering a high level of product quality for every learning experience. This includes individual courses as well as courses within Degree & MasterTrack™ programs.

In the past, we used Selenium and Sauce Labs for running end-to-end tests. Stability issues with our setup and limited continuous integration (CI) tooling around test result reporting led to poor adoption.

To make end-to-end (e2e) testing an essential part of our culture and help deliver a quality experience on the platform, we focused on a few key principles:

Developer Experience

  • Make end-to-end tests simple to run and debug
  • Push test failure notifications to engineers in code review
  • Leverage open source libraries

Actionability

  • Push test reports to engineers in code review
  • Clearly highlight failures and suite stability through notifications and reporting
  • Assign clear owners to groups of tests

Scalability

  • Run many tests quickly and reliably
  • Make it straightforward for new engineers to learn

Recently, we revamped our end-to-end testing system using Puppeteer and Jest. We added tools to improve the development workflow and made sure test failures were actionable. The new system has improved the consistency of our tests and helped us deploy faster and with more confidence.

Writing Tests with Puppeteer and Jest

To facilitate a smooth test-writing experience for engineers, we wanted our tests to rely almost completely on Puppeteer and Jest APIs. Puppeteer is a Node library that provides a high-level API for controlling headless Chrome. Some highlights of Puppeteer include:

  • Built on top of the Chrome DevTools Protocol
  • Event-driven architecture
  • High-quality documentation and an active community of maintainers
  • Ability to test against different page viewports and network conditions

A significant tradeoff we weighed during this process was giving up cross-browser testing, since Puppeteer drives only Chrome. We found that many of our bugs were not browser-specific. Given the other methods we currently use for catching browser-specific bugs (avoiding browser-specific code, better linting, JavaScript and CSS polyfills, smoke-testing on the other browsers we support), we felt comfortable with this decision. Additional considerations included estimating migration time and setting up training to help engineers transition from Selenium.

We decided to use Jest for describing, running, and reporting tests, which was convenient for the following reasons:

  • Our unit tests were already written in Jest
  • Great community and high quality documentation
  • Configuration options for running tests in different environments

Additionally, we exposed a set of test utilities that abstract away complex work from our actual tests. These include authentication, emulating different devices and network conditions, interfacing with our A/B testing system, accessibility checks, and accessing user and course fixtures.
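As an illustration, device and network emulation helpers along these lines can be built directly on Puppeteer and the Chrome DevTools Protocol. The module layout, helper names, and defaults below are hypothetical rather than our actual utilities:

```js
// Hypothetical sketch of two such utilities; names and defaults are illustrative.
const puppeteer = require('puppeteer');

// Emulate a mobile device using Puppeteer's built-in device descriptors.
async function emulateDevice(page, deviceName = 'iPhone X') {
  await page.emulate(puppeteer.devices[deviceName]);
}

// Throttle the network through the Chrome DevTools Protocol.
async function emulateSlowNetwork(page) {
  const session = await page.target().createCDPSession();
  await session.send('Network.emulateNetworkConditions', {
    offline: false,
    latency: 400,                          // added round-trip latency in ms
    downloadThroughput: (500 * 1024) / 8,  // ~500 kbit/s, expressed in bytes/s
    uploadThroughput: (500 * 1024) / 8,
  });
}

module.exports = { emulateDevice, emulateSlowNetwork };
```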

Our resulting test files are compact and read similarly to our unit tests.
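A simplified, hypothetical example of such a test is sketched below; the helper imports, selector, and URL are illustrative rather than our actual code:

```js
const puppeteer = require('puppeteer');
const { login, getLearnerFixture } = require('../utils'); // hypothetical helpers

describe('Course home', () => {
  let browser;
  let page;

  beforeAll(async () => {
    browser = await puppeteer.launch();
    page = await browser.newPage();
    // Authenticate as a learner fixture before exercising the flow.
    await login(page, await getLearnerFixture());
  });

  afterAll(async () => {
    await browser.close();
  });

  it('shows course material to an enrolled learner', async () => {
    await page.goto('https://www.coursera.org/learn/some-course/home/welcome');
    await page.waitForSelector('[data-e2e="course-material"]'); // hypothetical selector
    expect(await page.title()).toContain('Coursera');
  });
});
```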

Coupled with Puppeteer’s debugging tips, this familiar syntax decreased the time for engineers to write a test. This meant that they spent more time thinking through edge cases in product flows.

Running Tests

We chose cloud tools that would allow us to run many tests in parallel, and integrated well with our existing build and deployment systems. We followed a few helpful examples and commentary when setting up our Puppeteer and Jest integration using AWS Lambda.

Our cloud test runner addressed two primary use cases:

  1. Run all tests against the deployed versions of front-end applications in production.
  2. Run app-specific tests for each pull request and reject if tests fail.
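As a rough sketch (not our actual runner), a Lambda handler can invoke Jest programmatically and report the aggregated result back to the caller. The event fields, environment variable, and Chromium setup referenced below are assumptions of this sketch:

```js
// Hypothetical AWS Lambda handler that runs a slice of the Jest/Puppeteer suite.
// Assumes the bundled tests launch a Lambda-compatible Chromium build
// (e.g. via the chrome-aws-lambda package) inside a custom Jest environment.
const { runCLI } = require('jest');

exports.handler = async (event) => {
  // `testPathPattern` and `targetEnv` are assumed to be provided by our cloud runner.
  process.env.E2E_TARGET_ENV = event.targetEnv || 'prod';

  const { results } = await runCLI(
    {
      runInBand: true,                               // one worker per invocation
      testPathPattern: event.testPathPattern || '.', // e.g. tests for a single app
    },
    [process.cwd()]
  );

  return {
    success: results.success,
    numPassedTests: results.numPassedTests,
    numFailedTests: results.numFailedTests,
  };
};
```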

We built a local command-line interface (CLI) using Commander to simplify local test development and debugging. The CLI encapsulates configuration options for both Jest and Puppeteer through one interface. Some example commands include:

```
// Run a folder or file path against a specific environment
yarn integrate --env { prod (default) | staging | local } -f <fileName>

// Show the browser during runtime, delay page actions for visibility, and display the Chrome DevTools pane alongside the page
yarn integrate --show --slow 500 --showDevtools
```
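A minimal sketch of how such a wrapper can be put together with Commander is shown below. The flag-to-option mapping and the environment variables used to pass launch options into tests are assumptions of this sketch, not our actual implementation:

```js
// Hypothetical CLI wrapper: translate friendly flags into Jest and Puppeteer options.
const { program } = require('commander');
const { runCLI } = require('jest');

program
  .option('--env <env>', 'target environment (prod | staging | local)', 'prod')
  .option('-f, --file <fileName>', 'folder or file path to run')
  .option('--show', 'run the browser in headed mode')
  .option('--slow <ms>', 'delay each page action by <ms> milliseconds')
  .option('--showDevtools', 'open the DevTools pane alongside the page')
  .parse(process.argv);

const opts = program.opts();

// Pass launch options to tests through environment variables read by a
// custom Jest test environment (an assumption of this sketch).
process.env.E2E_TARGET_ENV = opts.env;
process.env.E2E_LAUNCH_OPTIONS = JSON.stringify({
  headless: !opts.show,
  slowMo: opts.slow ? Number(opts.slow) : 0,
  devtools: Boolean(opts.showDevtools),
});

runCLI(
  { testPathPattern: opts.file || '.', runInBand: Boolean(opts.show) },
  [process.cwd()]
).then(({ results }) => process.exit(results.success ? 0 : 1));
```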

In contrast, our previous homegrown setup required engineers to run a standalone server alongside the local development environment in order to run tests, which often produced results that were inconsistent with cloud runs of the same tests. Failure debugging was often inconclusive and non-deterministic, and we frequently mitigated issues by adding retries and increasing timeouts.

Timely and Relevant Notifications

For each pull request submitted for review, our CI process generates a staging environment based on the changes. The end-to-end test suite runs against the apps that changed. A notification posts to our code review tool (Phabricator) when the test runs are complete.

Example of a pull request getting rejected if tests fail.

Additionally, we set up test group configurations, which define tests corresponding to a given suite. Based on the group configurations, the framework can ignore certain tests, report separate results for different teams, and notify group-specific channels. Furthermore, test results are still visible in an aggregated summary view. These features have enabled us to add new tests without causing noise within our production suite.
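To give a sense of the shape of these configurations, a group definition might look something like the following; the field names, groups, and paths are illustrative, not our actual schema:

```js
// Hypothetical test group configuration.
module.exports = {
  degrees: {
    owner: 'degrees-team',
    slackChannel: '#degrees-e2e',        // group-specific notifications
    tests: ['degrees/**/*.test.js'],
    ignore: ['degrees/experimental/**'], // excluded from the production suite
  },
  enterprise: {
    owner: 'enterprise-team',
    slackChannel: '#enterprise-e2e',
    tests: ['enterprise/**/*.test.js'],
  },
};
```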

Facilitating Debugging With Clear Reports and Tracing

We also wanted to clearly report results to make it easy for engineers to identify, diagnose, and resolve test failures. All end-to-end test jobs generate a report, which details the results of all the tests run for a given job. The top of the report highlights failed tests, and each test block contains the results of each assertion. We expose a stack trace for failed assertions, along with a command to run the test locally against the given version.

Example test report.

When building features, engineers frequently use Chrome DevTools to capture a timeline trace of a page. We leveraged the same timeline viewer by capturing a trace snapshot for each test failure, giving engineers a head start when debugging.
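Puppeteer exposes this capability through its tracing API. A minimal sketch of capturing a trace around each test might look like the following; the global page object and the traces/ output path are assumptions of this sketch:

```js
// Capture a DevTools-compatible trace for every test; the resulting file can
// be attached to the failure report and opened in the timeline viewer.
beforeEach(async () => {
  const name = expect.getState().currentTestName.replace(/\W+/g, '-');
  await page.tracing.start({ path: `traces/${name}.json`, screenshots: true });
});

afterEach(async () => {
  await page.tracing.stop();
});
```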

Addressing Flakiness by Monitoring Tests

Test flakiness remained a persistent issue. Common causes of flakiness included test concurrency, setup/cleanup of state, and API timeouts. As a baseline approach to combating flaky tests, the new framework runs each test three times. Passing status is calculated as a summary of all the retries.
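Our retry-and-summarize logic is built into the framework; as a rough approximation using stock Jest, jest.retryTimes (available with the jest-circus test runner) re-runs a failing test up to a given number of times:

```js
// With jest-circus as the test runner, a failing test is retried up to two
// more times, so each test gets at most three runs before being reported.
jest.retryTimes(2);

describe('Enroll flow', () => {
  it('lets a learner enroll in a course', async () => {
    // ... Puppeteer interactions ...
  });
});
```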

We needed a way to accurately summarize test runs to get a clear picture of end-to-end test health. To that end, we built a data visualization that illustrates test results over a 30-day period. It displays each test's overall health as the ratio of successful runs to total runs, and a user can drill down into an individual test to view summarized results on a per-assertion basis.

Example from our production e2e health dashboard.
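The aggregation behind that view is straightforward; a toy version over a list of run records (whose shape is assumed here) might look like:

```js
// Toy pass-rate aggregation: runs is an array of { testName, passed } records.
function passRates(runs) {
  const totals = {};
  for (const { testName, passed } of runs) {
    const t = (totals[testName] = totals[testName] || { passed: 0, total: 0 });
    t.total += 1;
    if (passed) t.passed += 1;
  }
  return Object.fromEntries(
    Object.entries(totals).map(([name, t]) => [name, t.passed / t.total])
  );
}
```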

We notify relevant Slack channels immediately when tests fail, so test and app owners can quickly respond. We also set up a weekly summary to increase visibility and bring gaps to light on a regular basis.

Evaluation

Proper metrics are paramount when evaluating the efficacy of a new system. From experience, we’ve found that good metrics are needed for indicating success and driving execution. In some cases, such as our site-wide availability push, the uptime metric accomplished both. With e2e testing, our primary end goal was to capture more critical bugs before they hit production. With this in mind, we wanted to choose metrics that were easily split and assigned to owners.

We are monitoring several metrics for e2e testing, including weekly pass rate per application, the number of high-priority bugs filed, the critical-bug capture rate, e2e test coverage per application, and the number of critical product flows tested. Next steps for this framework include setting up access to dynamic test data, running tests against backend changes, and implementing visual diffing.

With the advent of many new features augmenting the Coursera experience, this e2e testing framework became a team priority. Writing a full production suite of tests using this new framework has added an additional layer of stability to our CI process. Furthermore, it has helped codify product flows critical to Coursera’s success.
