Booking Tool: A reengineering journey from a legacy monolith to microservices
Author: Stanislas Nichini, Software Engineering Manager at Groupon
Booking Tool — a monolith application built in 2014
One of Groupon’s key priorities is to have a frictionless booking experience for our merchants and customers. Booking Tool plays an important role in this strategy by providing a free booking management application to our merchants in the International and North American markets. From the start, however, the tool was designed not only for our merchants but also for our customers and operational teams. Booking Tool originated from Groupon’s French engineering team in 2014. The application was initially built for the French and German local markets, especially for the Food and Drink industry. It was originally used by 100 merchants in France and Germany, where 500 deal options were bookable on the platform.
By the end of 2014, merchants from France, Germany, Italy, Spain, the Netherlands, and the United Kingdom were using the Booking Tool. Customers made more than 15,000 bookings, and more than 470 merchants launched 2,100 deal options. This was the beginning of the Booking Tool journey.
The expansion of the tool across all International markets continued until 2020. It was launched in test markets like South Africa, Hong Kong, Brazil, Singapore, Sweden, Malaysia, Switzerland, Argentina, Denmark, Portugal, and Norway.
- More than 23 million bookings made since the launch in 2014
- 365k+ deal options across all markets
- More than 60k merchants onboarded
As of today, the Booking Tool is available across all the International and North American local markets. In September 2019, the tool was launched in North America. More than 2500 merchants are now bookable!
Booking Tool, a multi-headed beast that needed to be reengineered
The monolith tool became an increasingly complex and sophisticated platform. In practice, the legacy tool had multiple heads:
- Merchant experience: Set availabilities, manage bookings, redemption, communication
- Customer experience: Book their experience via an optional flow outside Groupon
- Operations and administration: Manage merchants setup, customers/merchants support, bulk editing
So why did we want to re-engineer and improve the platform? First of all, the tool lived outside the Groupon platform. It was also:
- Hosted on another hosting provider
- Built on a different software stack
- Optional and outside of the Groupon purchase funnel
- Not integrated with mobile apps and desktop website
- Unscalable due to the database architecture bottleneck
- Unreliable as the monolith codebase was too large and not entirely testable
- Burdened by growing technical debt that needed to be paid back
- Old, with a user interface that had not aged well
- Poorly documented, with documentation that had become obsolete
- Expensive to maintain
How could we transition from this legacy monolith architecture towards a microservices architecture? Before answering this question, the team came up with the following requirements:
- Business results-driven
- Additional product features
- Minimize the risk of transition between the services
- Fast iterations and cycles
- A seamless experience for end-users: merchants and customers
- Simplification of flow/business logic and role of each component
- Platformisation: part of the Groupon booking system
- Understandable: documented, peer-reviewed
The “Chicken Little” Incremental Migration
To tackle the transition from the monolith tool to several services, the team applied the incremental transition strategy called the “Chicken Little” approach. The name comes from an analogy with the very cautious and conservative Walt Disney character. The overall idea and methodology were described by Michael L. Brodie and Michael Stonebraker in the 1990s in their book “Migrating Legacy Systems.” The migration consists of 11 steps, which can be performed in any order, run in parallel, or skipped.
The key is to perform each step “incrementally”:
- Incrementally analyze the legacy system
- Incrementally decompose the legacy system structure
- Incrementally design the target interface
- Incrementally design the target application
- Incrementally design the target database
- Incrementally install the target environment
- Incrementally create and install the necessary gateways
- Incrementally migrate the legacy database
- Incrementally migrate the legacy application
- Incrementally migrate the legacy interface
- Incrementally cut over to the target system
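To illustrate the gateway step, here is a minimal sketch of a gateway that sits in front of both systems and sends each operation to either the legacy or the target implementation. This is not the Booking Tool's actual code. The class name `MigrationGateway`, the operation name `get_availability`, and the handler functions are all hypothetical, and a real gateway would also have to keep the two databases consistent.

```python
from typing import Callable, Dict, Set

# A handler takes a request payload and returns a response payload.
Handler = Callable[[dict], dict]


class MigrationGateway:
    """Routes each operation to the legacy or the target system.

    Every operation starts on the legacy system and is cut over
    individually, so a failed step can be rolled back on its own.
    """

    def __init__(self, legacy: Dict[str, Handler], target: Dict[str, Handler]) -> None:
        self._legacy = legacy
        self._target = target
        self._migrated: Set[str] = set()

    def cut_over(self, operation: str) -> None:
        """Start serving one operation from the target system."""
        if operation not in self._target:
            raise KeyError(f"target system does not implement {operation!r}")
        self._migrated.add(operation)

    def roll_back(self, operation: str) -> None:
        """Send one operation back to the legacy system."""
        self._migrated.discard(operation)

    def handle(self, operation: str, request: dict) -> dict:
        handlers = self._target if operation in self._migrated else self._legacy
        return handlers[operation](request)


# Hypothetical handlers standing in for the two systems.
def legacy_availability(request: dict) -> dict:
    return {"served_by": "legacy", "deal_id": request["deal_id"]}


def target_availability(request: dict) -> dict:
    return {"served_by": "target", "deal_id": request["deal_id"]}


gateway = MigrationGateway(
    legacy={"get_availability": legacy_availability},
    target={"get_availability": target_availability},
)

before = gateway.handle("get_availability", {"deal_id": 42})
gateway.cut_over("get_availability")
after = gateway.handle("get_availability", {"deal_id": 42})
gateway.roll_back("get_availability")
rolled_back = gateway.handle("get_availability", {"deal_id": 42})
```

Because callers only ever talk to the gateway, each cutover is invisible to them, which is what keeps every increment small and reversible.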
The benefits of Chicken Little reside in:
- Limiting the risk of the whole migration
- Early feedback on the progress and results
- Safer: No abrupt transition
- Smaller and shorter cycles
There are risks attached to this transition:
- A complex and sophisticated system: A hybrid architecture with multiple overlapping components and services carries increased risk
- Reverse engineering is still needed and the discovery/analysis phase cannot be neglected
- The cost of time and resources is hard to evaluate
The Booking Tool migration plan and cycles
For each cycle and step of the migration of the Booking Tool, the team adopted the “Chicken Little” incremental approach.
To explain the value of each cycle, we also added three major rules:
- Why do we need it? Satisfy users and business objectives
- How will we do it? The most important objectives first
- What will we build? Decomposition into independent operational deliverable milestones
Satisfy users and business objectives
This rule defines “why” this transformation is needed.
The question is: What is the measurement of the success of this cycle?
Each deliverable needs to be measured against one (or several) metric(s) of success (KPI: Key Performance Indicator).
This can be a measure of:
- Increase in the number of bookings
- Increase in conversion
- Decrease in merchant attrition
- Increase in voucher redemption using the booking funnel
- Increase in repeat customers
- Decrease in merchant or customer support interactions
- Improvement in system performance: stability, speed
- Increase in merchant onboarding capability
Most important objectives first
This rule defines “how” to do the transformation.
The question is: What are the key components that are indispensable and mandatory for the success of each cycle? What is the goal? The Product and Engineering groups work hand in hand with a common goal: To make our Groupon local merchants bookable. To ensure our success, the team identified the most important priorities:
- What are the pain points of our users and how can we address them?
- What are the gaps and how can we close them?
- How can we make the life of end-users easy?
- How can we connect our merchants and their customers?
- How can we guarantee, facilitate, and simplify the booking process?
- How can we facilitate the redemption process?
From an engineering perspective, we followed the same route as business and product. We aligned our goals and objectives to tackle the transformation.
Decomposition into independent operational deliverable milestones
This rule defines “what” to do during the transformation.
The question is: What are the deliverables of each phase and how do they interact with the overall engineering picture? What is the purpose of each cycle? Each system or component needs to be delivered and launched at the end of each cycle. Smaller milestones can also be set to launch part of the components. The sooner it’s deployed, tested, and used, the sooner you’ll receive feedback from the users, and the sooner you’ll be able to correct, troubleshoot, and improve the system. What are the dependencies of this component? Is there any other migration depending on the success of the deliverable?
The transition plan started in May 2018 and is scheduled to be completed by the first half of 2021.
To begin, the Product and Engineering teams decided to build a new customer experience as part of the purchase funnel on Desktop, Touch, and Mobile, allowing a seamless experience for the customer. The customer will be able to see the availability of a deal, pick a date and a time, and book their experience as part of the transaction.
In June 2018, the first step of the migration began. This was our architecture before:
[Figure: Booking Tool high-level architecture pre-migration]
And our target:
[Figure: High-level architecture post-migration]
- In June 2018, to allow this change, the Booking team built a new API exposing availability for a deal and allowing this “Pre-purchase” or “Book’n’Buy” feature.
- At the end of 2018, the new feature was launched in Italy, the United Kingdom, and France. Customers were slowly redirected to the new booking flow and the legacy customer experience was deprecated for specific deals.
Not all deals benefited from this change, as some setups are complex and require, for example, a phone number or a comment to complete the booking. Other deals are multi-session or travel oriented. Finally, some deals are bookable at multiple locations.
The MVP excluded these complex setups. The deal coverage was above the 60% mark.
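The exclusion rules above can be sketched as a simple eligibility check. This is only an illustration under assumed names: the `DealSetup` fields and the functions `is_mvp_bookable` and `coverage` are hypothetical and do not reflect the real deal model or how coverage was actually measured.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DealSetup:
    """Hypothetical view of the setup attributes that matter for the MVP."""
    requires_phone_number: bool = False
    requires_comment: bool = False
    multi_session: bool = False
    travel: bool = False
    location_count: int = 1


def is_mvp_bookable(deal: DealSetup) -> bool:
    """A deal is in scope only if it has none of the complex setups."""
    return not (
        deal.requires_phone_number
        or deal.requires_comment
        or deal.multi_session
        or deal.travel
        or deal.location_count > 1
    )


def coverage(deals: Iterable[DealSetup]) -> float:
    """Share of deals that the simplified booking flow can handle."""
    deals = list(deals)
    if not deals:
        return 0.0
    return sum(is_mvp_bookable(d) for d in deals) / len(deals)


# Made-up sample: three simple deals and two complex ones.
sample = [
    DealSetup(),
    DealSetup(),
    DealSetup(),
    DealSetup(requires_phone_number=True),
    DealSetup(location_count=3),
]
sample_coverage = coverage(sample)
```

Tracking a ratio like this per market is one way to see how much of the catalog each relaxed rule would unlock.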
- At the beginning of 2019, the team started to work on the new MVP of the Merchant Booking Tool. After 6 weeks of development, the team released two new sets of APIs for the customer and merchant experiences and a new frontend application built on top of the existing Booking Tool infrastructure. In parallel, the customer-oriented team increased the bookability of deals to more than 80%.
- In March 2019, the MVP launched for 4 pilot countries: Ireland, Australia, Italy, and the United Kingdom for the Food and Drink vertical. The results and feedback were very positive. The team was able to work an additional 2 months, adjusting the application, adding missing features, and learning from the user experience session with merchants.
- During the rest of 2019, the team worked on new features on both F&D and HBW versions of the new Booking Tool.
- In October 2019, the Booking Tool was set up, adapted, and launched in the North American market.
- November 2019 marked the engineering start of the BOTS API using the Java building block (J-Tier).
- In March 2020, we launched the FAQ/support page for our merchants on the Merchant Booking Tool ITA, deploying to production directly on the AWS cloud.
- In April 2020, it was the turn of BOTS to be launched, both in the cloud and on-prem. The first exclusive feature was launched in BOTS: support for Classes/Workshop merchants. From that day forward, the team exclusively launched new features on the BOTS API.
- August 2020 marked a simultaneous launch of the Google Calendar 2-way sync feature on both BOTS API and Merchant Booking Tool ITA. This is the key milestone for deprecating the old Legacy Booking Tool and capitalizing exclusively on the new merchant application stack.
- Since August 2020, the teams have been focused on the development of the Booking Tool Self Service on top of the Merchant Booking Tool. The teams are also finalizing the transition of existing features from the old tool to the new one.
The Squad Ownership
To tackle this reengineering project, we had 6 major teams or “squads” over the last 2 years. This article won’t go in depth about the squad model, but I want to highlight some of the key aspects of this ownership model. A squad is defined by:
- Owning a service or application
- Being the maintainer of this service
- Having end-to-end ownership of this application, from architecture to support
- Being collocated or at least in the same time zone
- Being a self-organizing and cross-functional team
- Having a maximum of 8 individuals
- Having one long-term mission
We also had some special squads that helped things go faster and brought the product to life: the SWAT and Acceleration teams. Another essential aspect was the cross-collaboration between squads. Having multiple services and multiple teams, we had to find solutions to facilitate:
- Working on the same code base: Innersource model
- Reducing silos, simplifying collaboration
- Working on several features at the same time
- Parallelization / Swim lanes, minimizing the overlap and conflict
- Prioritization of dependencies between teams and internal services
What did we learn?
Throughout the last 30 months, the different teams, squads, and stakeholders learned many important things:
Fail Early and Fast to Learn Faster
Launching a new application from scratch and learning from its users is key, but we also need to know early and rapidly if it’s a success or a failure. The entire research, design, architecture, development, deployment, and launch process for the MVP of the new merchant application lasted 6 weeks. The first feedback from our merchants and sales representatives was reviewed early. We were able to adapt to their ideas and we knew what needed to be changed. We ditched some undeveloped functionalities, re-evaluated some, and simplified others. We learned from different markets and from on-site business pitch sessions. Finally, we postponed some launches and prioritized others using business KPIs (key performance indicators). We were able to learn from some mistakes and we were nimble enough to address our merchants’ needs.
Make it work, Make it right, Make it fast: Kent Beck
Related to the previous learning, our primary goal was to make our application work by:
- Launching the application to our merchants
- Launching new features to more merchants
- Operating on more markets
Since then, we have been in the process of making it right:
- New application stack
- New platform
- Squad model
- Innersource model
The teams are starting to work on the last rule: making the application fast and more reliable through smart metrics, profiling, scaling, and alerting.
Squads framework: Spotify model
Along this reengineering journey, the team’s structure and framework also evolved. We are still in the process of the Agile transformation at Groupon.
Our Product and Engineering groups benefited from this transformation early. The teams operated at a fast pace, almost like a small startup within the organization. It allowed us to be very efficient and to remove our roadblocks quickly. There is still a lot of work to be done to transform the entire organization but this helped our project to be achieved step by step. Being aligned and autonomous were and still are two key values at Groupon.
This learning is still fresh, but our engineering leaders and owners support making the Innersource model a standard at Groupon. The teams applied this model to our two new applications: Merchant Booking Tool and BOTS. Other teams are able to build the necessary features and contribute to the codebase. This change cannot happen overnight but it will surely grow in the future.
Business Operational vs Application vs Infrastructure metrics
These three types of metrics are very important and complement each other. For infrastructure, we need to know when the servers are down, when the CPU load goes beyond the threshold, or when the database has a sudden spike of parallel threads. For the application, we need to understand when we have errors, uncaught exceptions, API dependency failures, and so on. But we should not neglect the importance of business and operational metrics. They are useful to detect when something changes over time, whether negatively or positively, and they are key to understanding why a feature is successful or why something is wrong with the application. For example, you could suddenly see a drop in bookings compared to last week, a feature used less frequently than last month, or a refund rate going up by 30% for a given vertical or category.
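As a rough sketch of how a business metric can be watched over time, the check below compares a value with the same metric from the previous period and flags large relative changes in either direction. The names `MetricAlert` and `check_metric`, the 30% default threshold, and the sample numbers are assumptions for illustration only, not our actual alerting setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricAlert:
    metric: str
    previous: float
    current: float
    change: float  # relative change, negative for a drop


def check_metric(
    metric: str, previous: float, current: float, threshold: float = 0.3
) -> Optional[MetricAlert]:
    """Return an alert when the relative change reaches the threshold."""
    if previous <= 0:
        # No meaningful baseline to compare against.
        return None
    change = (current - previous) / previous
    if abs(change) >= threshold:
        return MetricAlert(metric, previous, current, change)
    return None


# Made-up numbers: weekly bookings drop sharply, the refund rate jumps,
# and feature usage stays within the normal range.
bookings_alert = check_metric("weekly_bookings", previous=1000, current=650)
refund_alert = check_metric("refund_rate", previous=0.04, current=0.06)
usage_alert = check_metric("feature_usage", previous=500, current=480)
```

Using the absolute value of the change means the same check surfaces both unexpected drops and unexpected gains, which matches the idea that a shift can be negative or positive.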
Our journey is not over yet! Our teams are continuing the platformization project of the booking customer experience. We have also entered the deprecation phase of our legacy monolith:
- Moving our users to the new platform: merchants but also customers
- Ensuring that our dependencies are off the legacy platform
- Sunsetting the old system, VM, databases
We are aiming to pull the plug on the legacy monolith before the end of 2021!
- Migrate Incrementally (DARWIN: On the Incremental Migration of Legacy Information Systems)
- Legacy Migration: Brodie, M. and Stonebraker, M., Migrating Legacy Systems: Gateways, Interfaces & the Incremental Approach, Morgan Kaufmann, 1995.
- Method Reference: aim42 is a collection of practices and patterns to support software evolution, modernization, maintenance, migration, and improvement of software systems
- Spotify Engineering Culture (by Henrik Kniberg)
- Innersource model: GitHub Whitepapers