How Trustpilot Tests and Monitors 200 Microservices

Heitor T. Sergent
Published in Runscope · 6 min read · Mar 7, 2017

Trustpilot’s journey with Runscope began back in 2015. They started off with our free trial plan and have since grown to almost 5 million test runs a month. They were kind enough to chat with us about their experience, and how they transitioned from a single monolithic application to a more efficient microservice architecture with over 200 microservices and hundreds of endpoints.

Here’s Dumitru Zavrotschi sharing their story:

First things first, tell us a little bit about yourself.

My name is Dumitru Zavrotschi, and I’m a Test Automation Engineer at Trustpilot. I have been working here for over 2 years, and I’ve worked in QA for over 7 years, mainly for companies in the telecom business.

What does Trustpilot do?

Trustpilot is an online review community, and our goal is to help people buy with more confidence by creating a community based on trust and transparency. Customers can write reviews on our platform about any company, service, or product. We also help businesses interact with our community by providing them with tools to build their reputation, drive conversion, and improve their services.

How big is the company? How many people are in the development team?

We are about 500 people located all over the world. Our biggest office is in Copenhagen, Denmark with around 300 people, and we have other offices in New York, Denver, London, Melbourne, and Berlin.

The development team is made up of about 50 people, divided into 7 teams. The QA team consists of 3 people right now, and we work across all development teams. We test and monitor every new application or microservice, from start to finish, and make sure that they match our quality standards and work well with other services before deploying them to production.

Tell us a little bit about the transition from a monolithic application to microservices. What was that like?

A few years ago, when the company was just getting started, all we had was a single repository with a monolithic application. As the team started growing, and we needed to add more features and more services to our product, we realized that we had to break down our application into smaller pieces to allow for faster iteration.

The transition happened gradually, and now the biggest parts of the monolithic application are decoupled. The process is still ongoing, but now our development teams are able to start working on new features and shipping them much faster.

“We were able to increase the number of deploys from twice a week to 70+ (40+ staging, 30+ production) times in a working day.”

But we did run into a side effect during that transition that we weren’t expecting. With the increase in changes and deployments, things started to break. For example, we would run into issues because of misconfigured routes, or a new service wouldn’t be able to handle the amount of traffic and other services would fall over.

These errors started happening, and sometimes we would struggle to find out what was wrong. We needed more visibility into our APIs and how they were communicating with each other.

Worst of all was when these errors happened and we wouldn’t notice them. Sometimes we would only find out that a microservice had been broken for the whole weekend when we got back to the office on Monday. We realized we needed a solution to monitor our APIs, and also a better way to catch things before they were pushed to production.

What made you choose Runscope as your API monitoring solution?

We tested a few different products but decided to go with Runscope for a few reasons. First, it was very easy to set up and start using. Second, it had most of the features that we wanted, the biggest one being the ability to write custom scripts. And lastly, it had integrations with other tools that we were using, such as Slack, PagerDuty, and Ghost Inspector.

We started by testing five critical features in our application, and then expanded from there. We were also able to leverage the API to automate and customize a few tasks to fit into our workflow.
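As an aside, scripting against the Runscope API to fold results into a workflow can look something like the sketch below, which lists a bucket’s tests and prints each one’s latest result. The endpoints follow Runscope’s public API, but treat the exact paths, fields, and placeholder credentials as best-effort assumptions rather than verified documentation.

```python
import requests

API = "https://api.runscope.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder
BUCKET_KEY = "<bucket-key>"  # placeholder

def latest_results() -> None:
    """Print the most recent result for every test in a bucket."""
    tests = requests.get(f"{API}/buckets/{BUCKET_KEY}/tests", headers=HEADERS)
    tests.raise_for_status()
    for test in tests.json()["data"]:
        # Assumed response shape: results are returned newest-first.
        results = requests.get(
            f"{API}/buckets/{BUCKET_KEY}/tests/{test['id']}/results",
            headers=HEADERS,
        )
        results.raise_for_status()
        runs = results.json()["data"]
        latest = runs[0]["result"] if runs else "no runs yet"
        print(f"{test['name']}: {latest}")
```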

Nowadays, we rely heavily on shared environments, script libraries, and subtest steps to avoid as much duplication as possible.

How do you use Runscope at Trustpilot?

Each team has their own separate Runscope bucket with tests for their microservices and APIs, and we have a few others for 3rd-party services and internal tools. We have over 20 buckets and more than 200 tests in use.

Our tests range from one or two requests that check a single endpoint, to full integration tests. Those include setup and teardown of test accounts, 3rd-party application tests, and tests with 10+ steps that simulate a user’s actions throughout our website and make sure that everything is working correctly.

For example, one test will set up a test account, create a review, publish the review, get the published review, and delete the test account. Another example is a test that integrates with our email platform, to retrieve an automated email link and test that it is valid.
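To make the shape of that first test concrete, here is a minimal sketch of the review-lifecycle flow in Python. The base URL, endpoint paths, and payload fields are hypothetical placeholders, not Trustpilot’s actual API, and in a Runscope test each step would be a request step rather than a script.

```python
import requests

BASE_URL = "https://staging.example-api.test"  # hypothetical staging API

def run_review_lifecycle_test():
    session = requests.Session()

    # 1. Set up a disposable test account.
    account = session.post(f"{BASE_URL}/test-accounts", json={"name": "qa-bot"})
    account.raise_for_status()
    account_id = account.json()["id"]

    try:
        # 2. Create a review as the test account.
        review = session.post(
            f"{BASE_URL}/accounts/{account_id}/reviews",
            json={"stars": 5, "text": "Automated test review"},
        )
        review.raise_for_status()
        review_id = review.json()["id"]

        # 3. Publish the review.
        session.post(f"{BASE_URL}/reviews/{review_id}/publish").raise_for_status()

        # 4. Fetch the published review and assert on its contents.
        published = session.get(f"{BASE_URL}/reviews/{review_id}")
        published.raise_for_status()
        assert published.json()["status"] == "published"
    finally:
        # 5. Tear down the test account even if an assertion failed.
        session.delete(f"{BASE_URL}/test-accounts/{account_id}")

if __name__ == "__main__":
    run_review_lifecycle_test()
    print("Review lifecycle test passed")
```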

We also have a Runscope bucket that contains a few shared tests that everyone can use. For example, we have a test to automatically refresh and update new access tokens, and other teams can use that as a subtest step.
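A shared token-refresh step like that could look roughly like the following, assuming a standard OAuth2 client-credentials flow. The token URL and credentials are placeholders; in Runscope, the fresh token would be written to a shared environment variable so other tests can pick it up.

```python
import requests

TOKEN_URL = "https://auth.example.com/oauth/token"  # placeholder auth server

def refresh_access_token(client_id: str, client_secret: str) -> str:
    """Fetch a fresh access token that downstream test steps can reuse."""
    response = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
    )
    response.raise_for_status()
    return response.json()["access_token"]
```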

What is your CI/CD process?

We use Octopus, Travis CI, and AppVeyor for our CI/CD process. Once something is committed to our master branch, we auto-deploy it to our staging environment. Our API tests run on a schedule against both our staging and production environments, so bugs don’t leak from staging to production.

Even with the continuous monitoring on staging, in some cases we have tests integrated into our Octopus deployments. They make sure that all features are working before a release gets deployed to production.
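A deployment gate along those lines can be sketched as: fire the test’s Trigger URL, then poll the run’s result before promoting the release. Trigger URLs are Runscope’s documented mechanism for starting runs remotely, but the response fields used below are illustrative assumptions rather than a verified contract.

```python
import time
import requests

TRIGGER_URL = "https://api.runscope.com/radar/<trigger_id>/trigger"  # per-test
API_TOKEN = "<personal-access-token>"  # placeholder

def gate_on_smoke_tests(timeout_seconds: int = 300) -> None:
    """Block a deployment until the triggered Runscope test run passes."""
    trigger = requests.get(TRIGGER_URL)
    trigger.raise_for_status()
    # Assumed response shape: the trigger links to the started test run.
    run_url = trigger.json()["data"]["runs"][0]["api_test_run_url"]

    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        result = requests.get(run_url, headers=headers).json()["data"]
        if result["result"] == "pass":
            return  # safe to promote the release
        if result["result"] == "fail":
            raise RuntimeError("Smoke tests failed; aborting deployment")
        time.sleep(10)  # still running; poll again
    raise TimeoutError("Smoke tests did not finish in time")
```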

What teams interact with Runscope?

All the development teams in the organization use Runscope, which includes managers, QA, developers, and DevOps.

Since every team has their own bucket, we also have some screens in the office displaying a bucket’s dashboard, so anyone can easily see the health of their microservices.

We have also built an internal “Alerts Dashboard”, which displays alerts coming from different sources like New Relic, Octopus, AWS, and Runscope, so every team has all of its alerts in one place, which is very convenient. For Runscope, we only display tests that have failed 3 consecutive times, to avoid flagging transient issues.
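That consecutive-failure rule is easy to state precisely. Here is a toy version, assuming results arrive as plain "pass"/"fail" strings ordered oldest to newest:

```python
from typing import List

def should_alert(recent_results: List[str], threshold: int = 3) -> bool:
    """Return True only when the last `threshold` runs are all failures."""
    if len(recent_results) < threshold:
        return False
    return all(r == "fail" for r in recent_results[-threshold:])

# A single blip stays off the dashboard; a streak of failures surfaces.
assert should_alert(["pass", "fail", "fail", "fail"]) is True
assert should_alert(["fail", "pass", "fail"]) is False
```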

What are your plans for improving QA at Trustpilot in the future?

I’m the person most familiar with Runscope on our team, so currently, whenever we create a new microservice, I’ll create a new set of tests for it, or a template for teams to start working with. But the goal is to empower more people in our organization to use Runscope and create tests themselves. We’re holding a few internal presentations in the coming weeks covering technical and non-technical usage, so our support team, managers, and designers also feel comfortable using it.

We’re also planning to expand our set of tests to include stress, performance, and security testing, to make sure our systems operate correctly beyond standard capacity and can scale to support heavy usage.

Conclusion

Vikram Mahishi (left) and Dumitru Zavrotschi (right) from the Trustpilot team

A big thanks to Dumitru and Vikram from the Trustpilot team for taking the time to share their story with us. If you want to learn more about Trustpilot, you can check out their website at trustpilot.com.

We’re looking to tell more stories about how people are using Runscope to make their jobs easier. If you’re interested in sharing your story, please reach out to us.

Originally published at blog.runscope.com.
