Breaking the Monolith

Modular redesign of Agoda.com

This article tells the story of the ongoing modular redesign of the Agoda website, as it stood at the time of writing. The first part describes the development process at Agoda, and the second part is a high-level overview of the modular architecture.

I am Vlad, an Engineering Technical Lead in Agoda’s Frontend department, primarily working on the modular redesign, including the traffic migration from Windows to Linux and the codebase migration from ASP.NET Framework to .NET Core.

Let’s dive in!

How We Work

Agoda's front-facing web applications use the .NET stack together with modern frontend technologies such as React, Redux, Webpack, and GraphQL.

The server side is mainly concerned with APIs and with returning minimal landing-page HTML. The client side handles the UI of the website (desktop and mobile) and WebViews for native apps.

Web Projects

For the Desktop website and a small number of WebView pages, all frontend teams work together in a single repository that contains the codebase of one monolithic ASP.NET Framework application. We move fast using GitHub Flow, and every day developers push dozens of pull requests to the main branch. We release a new version of this monolith application several times a day.

The Mobile version of Agoda's website is a Single-Page Application, which we call MSPA for short. We develop MSPA in a separate repository. The majority of MSPA requests are handled by a standalone API called Gateway.

Quality Standards

When a developer has finished their changes, the new release candidate needs to pass a set of tests (we will talk more about these tests later). Our CI/CD uses a “canary” deployment process: we deploy to one cluster first and, if all is good, we deploy to all data centers.
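The canary idea can be sketched in a few lines. This is a minimal illustration in Python, not Agoda's actual pipeline: the `deploy` and `is_healthy` callbacks and the cluster names are assumptions made up for the example.

```python
from typing import Callable, Iterable, List


def canary_deploy(
    version: str,
    canary_cluster: str,
    other_clusters: Iterable[str],
    deploy: Callable[[str, str], None],
    is_healthy: Callable[[str, str], bool],
) -> List[str]:
    """Deploy to one cluster first; continue only if it looks healthy.

    Returns the list of clusters that received the new version.
    """
    deployed = []
    deploy(canary_cluster, version)
    deployed.append(canary_cluster)

    # Stop here if the canary shows problems; the remaining clusters
    # keep running the previous version.
    if not is_healthy(canary_cluster, version):
        return deployed

    for cluster in other_clusters:
        deploy(cluster, version)
        deployed.append(cluster)
    return deployed
```

In a real setup the health check would look at the same signals the team already monitors (exceptions, latency, and so on) before the rollout continues.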

All new business logic (including bug fixes and refactoring) is covered by experimentation, known as A/B testing. We monitor and measure everything: booking rate, exceptions, latency, hits, and much more.

We need to wait for enough data to compare the 2 variants, and if B performs better than A (B wins), we can Take the code changes. Work on a particular Story is completed only when the code of the A variant has been removed from the codebase.

If you want to know more about A/B experimentation at Agoda, please read “How to Fail Like a Boss at Agoda”, an article by former Agoda Software Engineer and my good friend Max “Bear” Mahasak Pijittum, who shared his thoughts on this topic.
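To make the A and B code paths concrete, here is a minimal Python sketch. It assumes a hash-based allocation, which is a common technique but not something this article confirms about Agoda's Experiment Manager. The function names, the experiment key, and the page titles are all hypothetical.

```python
import hashlib


def assign_variant(experiment_id: str, user_id: str, b_percent: int) -> str:
    """Deterministically put a user into variant "A" or "B".

    The same (experiment, user) pair always gets the same variant,
    and roughly b_percent of users land in B.
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "B" if bucket < b_percent else "A"


def search_results_title(variant: str) -> str:
    # Both code paths live side by side while the experiment runs.
    if variant == "B":
        return "Top picks for you"  # new behaviour under test
    return "Search results"  # existing behaviour (variant A)
```

Once B wins and the change is Taken, the `if` and the A branch are deleted, which is the cleanup step that closes the Story.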

Bridge
Now you know how Agoda’s Frontend Department works; let us discuss how we measure success and what is driving us to make such a significant change.

How Do We Measure Success?

Agoda is a data-driven company where the single source of truth is data. For example, the B variant of each experiment must prove that it brings better metrics than variant A. To measure our development process, we use 3 main metrics: DevFeedback, Leadtime, and CI Success Rate.

DevFeedback

Imagine this workflow: a developer has made code changes and wants to be sure that the application is still stable enough to be deployed to production. The time it takes to answer this question is called Development Feedback. Here is what happens during this time: build the server side, build the client side, run unit tests and Jest tests, and run Feature Tests.

Feature Tests are Selenium tests that run on mocked data, with no data from external systems such as databases or APIs.

Leadtime

What is Leadtime? Leadtime is the time from the moment a Pull Request starts its trip through CI until it is merged into the main branch of the GitHub repo. In CI we additionally run Integration Tests.

Integration Tests are Selenium tests that run on real data. They check application behavior with actual data and all other system dependencies.

Formula of Success

The CI Success Rate shows the percentage of PRs that pass all CI stages and end up as ready-to-deploy release candidates. We keep the CI Success Rate under control: when it is lower than it should be, we improve our tests.

DevFeedback tests are cheap and fast, and we control how fast they are, but their disadvantage is that they cannot cover everything. So we add more Integration Tests. They are slow and complex, but they protect us from real trouble. If a bug somehow leaks to production, we cover the affected business logic with tests, and again, it is much better if unit or feature tests are enough for it. We should always keep these in balance.
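Both Leadtime and CI Success Rate reduce to simple arithmetic over CI records. The sketch below is a minimal Python illustration. The `PullRequestRun` record and its field names are assumptions, not Agoda's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PullRequestRun:
    entered_ci: datetime
    merged_at: Optional[datetime]  # None if the PR never merged
    passed_all_stages: bool


def ci_success_rate(runs: List[PullRequestRun]) -> float:
    """Percent of PRs that passed every CI stage."""
    if not runs:
        return 0.0
    passed = sum(1 for run in runs if run.passed_all_stages)
    return 100.0 * passed / len(runs)


def lead_time(run: PullRequestRun) -> Optional[timedelta]:
    """Time from entering CI until merge into the main branch."""
    if run.merged_at is None:
        return None
    return run.merged_at - run.entered_ci
```

For example, if 3 out of 4 PRs pass all stages, `ci_success_rate` returns 75.0.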

Bridge
Website modularization aims to solve existing technical problems and significantly improve our current metrics. Let’s discuss the current architecture in terms of problems and solutions.

Key Principles of New Design

Eight years ago, when only a small number of teams worked on the Agoda website, the monolith application was not an issue. In the last 4 years, the Agoda website has grown rapidly in terms of services and features: Hotels, Homes, Flights, Packages. Today the Agoda website is in the hands of 20 different frontend teams.

Problem: Monolith Website Application

When so many teams work on different things in the same place, each action may impact the others. An especially critical area is the Request Pipeline, where any minor change may significantly impact the performance of every request.

Solution: Domain Isolation

Isolate domains. One team handles one “product domain” and controls its codebase. This gives the team freedom to choose its internal design and coding conventions.

Screenshot from original movie “Braveheart” (1995) by Paramount Pictures, 20th Century Fox

Problem: Cross-repository Development

The structure of the repositories impacts the development process and slows it down. A new feature for a single page on both Desktop and Mobile requires development in 2 different repos, and your “time to market” is doubled.

Solution: Full-Stack Repository

Full-stack repository segregation: in each repository, a team has the server-side codebase for pages and APIs, plus the client side for all the Desktop, Mobile, and WebView pages it owns. A domain repository should also contain all required tests: unit tests, feature tests, and integration tests for domain and cross-domain use cases.

Problem: Test all, Deploy all

Sharing one CI/CD process across 20 teams can also be a bottleneck. For example, when we have a CI/CD environment issue, the whole website deployment is stuck until we fix it.

In the monolith application we run the tests of the whole application even for a minor code change. When a test is “flaky”, it slows down the deployment process until the owning team stabilizes it.

Solution: End-to-end Ownership

Each team should be responsible for the development and deployment of its Product, using a standalone CI/CD with full test coverage, including tests across systems.

Bridge
As you can see, all technical proposals lead to a “Monolith Break” and a recombination of repositories. In the last chapter, we will talk about our strategy and tactics for approaching a Modular Architecture.

Modular Architecture

Migration to a new architecture takes time. From the business side we have 2 requirements: start using Linux servers as soon as possible, and keep adding new features to the website. Therefore, migration to the new architecture is not a revolution but an evolution.

WebGate

The first system component that should be mentioned before we talk about the website itself is the WebGate proxy. The proxy sits in front of all frontend systems. Originally, WebGate was built for common needs, but it has become a cornerstone of the new design. In terms of website modularization, WebGate does 2 important things:

  • Route traffic to a particular downstream. The Website application is one of the downstreams. We can add new applications and manage traffic at the WebGate level using A/B experimentation: variant A sends a request to the Windows servers and the old website; variant B sends a request to the Linux servers and the new website.
  • Request enrichment. WebGate adds useful information to a request through Headers. For example, device detection logic runs at the WebGate level and every downstream reads the result from the Header, so we do not need to implement the same logic in every application.

WebGate centralization is great. It solves a lot of bugs caused by distributed logic and improves visibility. WebGate is a critical component, so we can’t lose it for a single ms. Thanks to our very high Uptime standards, WebGate is always in an amazing state.
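The two responsibilities can be sketched together in a few lines of Python. This is an illustration only: the downstream names, the `X-Device-Type` header, and the naive device detection are assumptions, since the article does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical names; the article does not specify hosts or header names.
DOWNSTREAMS = {"A": "windows-legacy-website", "B": "linux-new-website"}
DEVICE_HEADER = "X-Device-Type"


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


def detect_device(user_agent: str) -> str:
    # Deliberately naive detection, just enough to illustrate the idea.
    return "mobile" if "Mobile" in user_agent else "desktop"


def handle(request: Request, variant: str) -> str:
    """Enrich the request, then pick the downstream for this variant."""
    # Enrichment: downstream apps read this header instead of
    # re-implementing device detection themselves.
    user_agent = request.headers.get("User-Agent", "")
    request.headers[DEVICE_HEADER] = detect_device(user_agent)

    # Routing: variant A goes to the old website, B to the new one.
    return DOWNSTREAMS[variant]
```

Because the routing decision is an experiment variant, shifting more traffic to Linux is a matter of changing the experiment allocation rather than redeploying the proxy.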

Traffic Migration

We started with the Proof of Concept (POC) phase, where the goal was to build a new application as soon as possible. Using the same Website repository, we migrated one page (the City Page) from ASP.NET Framework to .NET Core.

Move fast, fail fast.

Once we had the first working page, we immediately sent the first user traffic to it. In September 2020 we had only 1% of the website’s traffic on Linux servers. By the end of the year we had 40%, and we keep working on it.

More and more teams are migrating their code to the .NET Core infrastructure, building more and more standalone Modules. During the migration we refactored a lot of code and applied various .NET Core optimizations, which you can read about in a related article written by Ilya Nemtsev. As of this writing, we have 12 Modules. Speaking of Modules, now is a good time to go into detail.

Module

A Domain Module, Product Module, or simply a Module is an independent unit of the distributed website. A Module is a WebApp that builds one website page together with its backend-for-frontend API. You can start one Module or combine several Modules and run them together.

What is a Module, and what are its design principles?

  • A Module belongs to one owner team
  • A Module has one or several Controllers for page rendering or API calls
  • A Module controls its external dependencies
  • A Module controls its appsettings
  • A Module has no dependency on any other Module
  • A Module has a dependency on the Bootstrap NuGet package

What is the Bootstrap NuGet package? The Bootstrap library makes your Module a part of one “distributed” website. Let’s talk about what the Bootstrap NuGet package is and why it is so important.

Bootstrap NuGet package
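Several of these principles are mechanical enough to check automatically. The Python sketch below shows one way such a check could look. The `ModuleManifest` structure is hypothetical and is not something the article says Agoda uses.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ModuleManifest:
    name: str
    owner_team: str
    controllers: List[str]
    package_dependencies: Set[str]


def check_module(manifest: ModuleManifest, all_module_names: Set[str]) -> List[str]:
    """Return a list of rule violations (empty means the module is fine)."""
    problems = []
    if not manifest.owner_team:
        problems.append("module must have exactly one owner team")
    if not manifest.controllers:
        problems.append("module needs at least one controller")
    if "Bootstrap" not in manifest.package_dependencies:
        problems.append("module must depend on the Bootstrap package")
    # A module may not depend on any other module.
    others = (manifest.package_dependencies & all_module_names) - {manifest.name}
    for other in sorted(others):
        problems.append(f"module must not depend on module {other}")
    return problems
```

A check like this could run in each Module's own CI, so a forbidden Module-to-Module dependency is caught before merge.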

Bootstrap is a NuGet package that provides “core” functionality to every Module. Here is a list of Bootstrap responsibilities:

  • Host. Bootstrap is responsible for website startup, manages connections to external systems, and registers your application in the website’s distributed network.
  • Middleware. Bootstrap adds request pipeline Middleware for Page rendering and API calls. The latency of the pipeline is 13ms (p99) for pages and 4ms (p99) for API calls. This time is used to execute common website logic and prepare Context-specific data for Controllers.
  • Context-specific data. Bootstrap middleware provides Module Controllers with an object holding context-related information about the request, user, page, pricing, and more.
  • Core Services. Bootstrap speeds up the development process by providing a set of common services such as Experiment Manager, Service Discovery, Data Access Provider, Logs, Measurement, CMS, and many others.
  • Client-side Commons. Bootstrap provides a Razor Layout for building Desktop/Tablet, Mobile, and WebView pages with the same Header and Footer, plus a dynamic dependency on a common CDN bundle of React components for the Header and Footer.

For the client side, we use React, and this is the last piece of the modular architecture: a micro-frontend approach using an npm package and a bundle.
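The Middleware and Context-specific data items describe a familiar pattern: shared logic runs once per request and hands a ready-made object to the Controller. Here is a minimal Python sketch of that pattern. It is not the real Bootstrap API, which is a .NET package, and the context fields and header names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PageContext:
    path: str
    device: str
    language: str


Controller = Callable[[PageContext], str]


def build_context(path: str, headers: Dict[str, str]) -> PageContext:
    # Common logic runs once here instead of in every controller.
    return PageContext(
        path=path,
        device=headers.get("X-Device-Type", "desktop"),
        language=headers.get("Accept-Language", "en"),
    )


def with_context(controller: Controller) -> Callable[[str, Dict[str, str]], str]:
    """Wrap a controller so it receives a ready-made context object."""
    def pipeline(path: str, headers: Dict[str, str]) -> str:
        return controller(build_context(path, headers))
    return pipeline


def city_page(context: PageContext) -> str:
    return f"city page for {context.device} in {context.language}"
```

The Controller (`city_page` here) stays small because it never parses headers itself. It only consumes the context it is given.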

Micro-Frontend

Quick recap: every Domain repository has a standalone client-side project in which teams create web pages using React and Redux. Pages are different, but we still want to make sure that the user experience is the same across all of them. The common parts of all pages are the Header and Footer. To provide this common functionality to all Modules, we built an npm package.

When we debated how to deliver client-side commons, we settled on the micro-frontend approach:

  1. Header and Footer React components are developed in a separate repository, packed into an npm package, and published to npm-artifactory. Additionally, we deploy them as a bundle to the CDN.
  2. The product team uses the HeaderFooter npm package in their client-side project and then develops website pages. Here is the main trick: when they create a bundle for their client side, they exclude this npm dependency from the resulting bundle.
  3. The Module uses the Razor Layout from the Bootstrap NuGet package, where we already inject a reference to the CDN bundle. We can dynamically change the version of the bundle using Consul. In the browser, the Page meets the Header and Footer from the bundle, and it works like a charm.

The micro-frontend approach gives us incredible agility. When we develop a new version of the npm package, we do not need to update and deploy each website application. We let teams work at their own pace. This flexibility allows us to work on the Header/Footer as an independent piece and deploy it to all website Modules in one click. We can still follow our favorite practice: move fast, fail fast.
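Step 3 is the part that makes one-click updates possible: the layout, not the page bundle, decides which Header/Footer version is loaded. The Python sketch below mimics that idea with a plain dictionary standing in for Consul. The CDN URL, config key, and file names are hypothetical.

```python
from typing import Dict

# Hypothetical CDN location and config key, for illustration only.
CDN_BASE = "https://cdn.example.com/header-footer"
VERSION_KEY = "header-footer/version"


def header_footer_bundle_url(config: Dict[str, str]) -> str:
    """Build the CDN URL for the currently configured bundle version."""
    version = config[VERSION_KEY]
    return f"{CDN_BASE}/{version}/bundle.js"


def render_layout(page_bundle_url: str, config: Dict[str, str]) -> str:
    # The shared bundle is loaded first so that the page bundle, which was
    # built without the Header/Footer code, can find it at runtime.
    shared = header_footer_bundle_url(config)
    return (
        "<html><body>"
        '<div id="root"></div>'
        f'<script src="{shared}"></script>'
        f'<script src="{page_bundle_url}"></script>'
        "</body></html>"
    )
```

Changing the version value in the config changes the script URL on every page that uses the layout, without rebuilding any page bundle.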

In conclusion

While this is all for now, our journey is not finished yet. At the time of writing, we are still working on the Traffic Migration phase. I hope that next time I can share the results of the whole redesign: where we were wrong, and where we made the right choice.

Thank you very much for reading.

Big thanks to the Agodans who helped review this article:
Max Panasenkov, Royee Goldberg, Niels Schroyen, Shaun Sit, Akshesh Doshi.

Join the team

careersatagoda.com

Vlad Batushkov

Engineering Manager @ Agoda. Neo4j Featured Community Member. Certified Neo4j Professional. Articles brewed on web, hops and indie rock’n’roll.