Written by Sekeol Kim
One of the more satisfying aspects of designing flows within a large Service-Oriented Architecture, such as the constellation of services which power every aspect of the Groupon ecosystem, is the constant struggle to make systems more efficient.
There are obvious benefits (new functionality!) and drawbacks (more maintenance!) that compound each time a new service is added to the stack. You also often don’t know up front what requirements your final product will have; it is an incredibly fortuitous rarity if your final architecture looks anything like the one you started with. More often than not, you find yourself in a situation where your aging stack needs a good Marie Kondo-esque revision.
These don’t have to be grueling spring cleaning sessions, and they often don’t have to be done in lieu of feature work. Finding synergies between components within an existing service architecture and combining them to reduce overall complexity can be at once an exercise in simplification and an exercise in risk reduction, which can be crucial to enhancing the anti-fragility of your architecture.
As I write this, our booking teams are in the middle of a long-dated simplification effort. The booking systems at Groupon began with early features baked directly into the monolithic web application which powered our entire website. As we began to outgrow our primary Rails-based web application, the online booking components were stratified into several discrete services:
In this diagram, a thin web client interacts with booking endpoints exposed through the Online Booking Appointments API. This API, for historical reasons, maintains its own data store, but primarily communicates state changes (that is, appointment creation, cancellation, or rescheduling) through a message bus to two different targets:
- A Calendar Service, which maintains a cache of appointment availability and a record of booked appointments (as opposed to fetching that data on every page refresh from the partner service);
- A Third Party Appointments Adapter, which communicates that appointment state to a booking partner service. This interface is also used for our own Merchant Booking Tool; depending on the booking partner information stamped onto the deal, we route calls to either an external partner or to the internal Booking Tool service.
While the arrows in the diagram above show CRUD operations propagating from the web UI through the API to the partner (the bi-directional arrow between the partner service and the third party adapter represents the request and response, which may trigger a retry or failure scenario), these systems also communicate with each other in other ways. For instance, loading the web appointments frontend fetches the reservations from the Calendar Service directly through the Appointments API, rather than via the message bus.
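To make the original fan-out concrete, here is a minimal in-memory sketch, assuming a simple topic-style bus where every subscriber receives every event. The class names mirror the services described above, but `AppointmentEvent`, the `"merchant-booking-tool"` partner identifier, and the routing strings are hypothetical illustrations rather than Groupon's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AppointmentEvent:
    appointment_id: str
    action: str           # "create", "cancel" or "reschedule"
    booking_partner: str  # booking partner information stamped onto the deal


class MessageBus:
    """In-memory stand-in for the bus: every subscriber sees every event."""

    def __init__(self) -> None:
        self.subscribers: list[Callable[[AppointmentEvent], None]] = []

    def subscribe(self, handler: Callable[[AppointmentEvent], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, event: AppointmentEvent) -> None:
        for handler in self.subscribers:
            handler(event)


class CalendarServiceCache:
    """Keeps the last known state of each appointment locally."""

    def __init__(self) -> None:
        self.appointments: dict[str, str] = {}

    def handle(self, event: AppointmentEvent) -> None:
        self.appointments[event.appointment_id] = event.action


class ThirdPartyAppointmentsAdapter:
    """Routes each event to the internal Booking Tool or an external partner."""

    INTERNAL_PARTNER = "merchant-booking-tool"  # hypothetical identifier

    def __init__(self) -> None:
        self.routed: list[tuple[str, str]] = []  # recorded instead of sent

    def handle(self, event: AppointmentEvent) -> None:
        if event.booking_partner == self.INTERNAL_PARTNER:
            target = "internal-booking-tool"
        else:
            target = f"external:{event.booking_partner}"
        self.routed.append((event.appointment_id, target))


bus = MessageBus()
calendar_cache = CalendarServiceCache()
adapter = ThirdPartyAppointmentsAdapter()
bus.subscribe(calendar_cache.handle)
bus.subscribe(adapter.handle)
bus.publish(AppointmentEvent("a1", "create", "merchant-booking-tool"))
bus.publish(AppointmentEvent("a2", "cancel", "acme-spa"))
```

Both consumers see the same two events; only the adapter cares which partner the deal is stamped with.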
So far, so good. This booking platform has all the components to fetch and cache bookable appointment availability, as well as to create, reschedule, and cancel bookings, either internally or with a third party. This works well when Groupon controls the inventory purchase funnel entirely and only needs to communicate appointment-related information to a third party. However, as the utility of the booking platform grows, we increasingly need to integrate with considerably more complex partner services. In addition to managing appointments, some partner platforms have their own inventory management, meaning that we need to communicate purchase events via a partner’s public API.
Here’s where our opportunity to simplify while moving forward comes in. While Groupon has a third party platform built expressly for coordinating supply and order management workflows with an external platform, those systems were not built with real-time appointment management in mind. Consequently, when designing this feature, we make the most productive use of engineering bandwidth by extending our service architecture so that it combines the functionality of the booking platform and the third party platform, ideally also allowing us to shrink our service footprint in the future by consolidating functionality in newer components:
There are a few different behaviors which this architecture needs to support:
Availability Fetching. At various points during the inventory lifecycle, customers need to know whether bookable inventory is present. There are actually two types of “availability” of interest to a consumer: the purchasable inventory itself, and the calendar of appointments. These are often 1:1 (a customer may purchase a ticket to a live show that corresponds directly to a specific seat and time) but may be 1:n if a customer purchases an item that corresponds to multiple sessions, or even to unlimited appointments within a limited time span. The Third-Party Orders & Appointments Adapter periodically fetches availability for caching purposes, but the Calendar Service can also call it synchronously to fetch live availability in cases where consistency is important (for instance, right before purchase).
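The cached-versus-live behavior can be sketched as a small cache with a time-based staleness window and an opt-in live fetch. This is a minimal sketch under those assumptions; `AvailabilityCache`, `fetch_from_partner`, and `require_live` are hypothetical names, and the partner call is stubbed out.

```python
import time
from typing import Callable


class AvailabilityCache:
    """Cached availability with an opt-in live fetch."""

    def __init__(self, fetch_from_partner: Callable[[str], list[str]],
                 max_age_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_partner  # hypothetical adapter call
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def refresh(self, deal_id: str) -> list[str]:
        """Fetch from the partner and store the result (periodic refresh)."""
        slots = self._fetch(deal_id)
        self._entries[deal_id] = (self._clock(), slots)
        return slots

    def get(self, deal_id: str, require_live: bool = False) -> list[str]:
        """Serve from cache unless the caller needs live data or the entry is
        missing or older than max_age_seconds."""
        entry = self._entries.get(deal_id)
        if require_live or entry is None:
            return self.refresh(deal_id)
        fetched_at, slots = entry
        if self._clock() - fetched_at > self._max_age:
            return self.refresh(deal_id)
        return slots


calls: list[str] = []


def fake_partner(deal_id: str) -> list[str]:
    calls.append(deal_id)
    return ["10:00", "11:00"]


now = [0.0]
cache = AvailabilityCache(fake_partner, max_age_seconds=60, clock=lambda: now[0])
cache.get("deal-1")                     # cache miss: calls the partner
cache.get("deal-1")                     # served from the cache
cache.get("deal-1", require_live=True)  # live fetch right before purchase
now[0] = 100.0                          # cached entry is now stale
cache.get("deal-1")                     # refreshed from the partner
```

Injecting the clock keeps the staleness check testable; a production cache would more likely live in a shared store than in process memory.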
Reservation Locking. When a customer indicates intent to purchase inventory with limited appointments, we usually want to remove that availability to prevent simultaneous reservations. Some partners provide an interface for this, allowing reservation locks that expire after a period of time; if the customer makes a purchase during this window, the locked availability is “upgraded” to a booking. (If a partner doesn’t support this feature, the Calendar Service still maintains the reservation as ‘locked’ to support the remaining flows; it just doesn’t communicate that status to the partner platform.)
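Locking can be sketched as a local lock table with a time-to-live, assuming locks are keyed by slot and that an expired, un-upgraded lock simply frees the slot. `ReservationLocks` and `partner_supports_locking` are hypothetical; the partner call is only recorded, not sent.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lock:
    customer: str
    expires_at: float
    status: str = "locked"  # becomes "booked" once upgraded by a purchase


class ReservationLocks:
    def __init__(self, ttl_seconds: float, partner_supports_locking: bool,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.partner_supports_locking = partner_supports_locking
        self.clock = clock
        self.locks: dict[str, Lock] = {}
        self.partner_calls: list[str] = []  # recorded instead of sent

    def _current(self, slot: str) -> Optional[Lock]:
        """Return the live lock or booking on a slot, dropping expired locks."""
        lock = self.locks.get(slot)
        if lock is not None and lock.status == "locked" and lock.expires_at <= self.clock():
            del self.locks[slot]
            return None
        return lock

    def lock(self, slot: str, customer: str) -> bool:
        if self._current(slot) is not None:
            return False  # slot is already locked or booked
        self.locks[slot] = Lock(customer, self.clock() + self.ttl)
        if self.partner_supports_locking:
            self.partner_calls.append(f"lock:{slot}")
        # Otherwise the lock is kept locally only, as described above.
        return True

    def upgrade(self, slot: str, customer: str) -> bool:
        """Promote this customer's unexpired lock to a booking."""
        lock = self._current(slot)
        if lock is None or lock.customer != customer or lock.status != "locked":
            return False
        lock.status = "booked"
        return True


now = [0.0]
locks = ReservationLocks(ttl_seconds=600, partner_supports_locking=False,
                         clock=lambda: now[0])
locks.lock("sat-10:00", "alice")   # True
locks.lock("sat-10:00", "bob")     # False: alice holds the lock
now[0] = 601.0                     # alice's lock has expired
locks.lock("sat-10:00", "bob")     # True
locks.upgrade("sat-10:00", "bob")  # True: bob's lock becomes a booking
```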
Inventory Purchase. Whereas the previous design left inventory and order management to other parts of the Groupon platform, we now need to send purchase signals as well as appointment bookings to the partner platform. While the Groupon orders system is not shown in the diagram above, purchases are propagated via the message bus to the Third-Party Orders & Appointments Adapter, which forwards the purchase information to the partner. The Calendar Service also receives the same message, which triggers the promotion of stored locked reservations into full appointments.
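Because both consumers receive the same purchase message, each can react independently. Here is a minimal sketch, assuming reservations are keyed by customer and deal and that the partner call is stubbed; `PurchaseEvent`, `publish`, and both handler classes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PurchaseEvent:
    order_id: str
    customer: str
    deal_id: str


class OrdersAppointmentsAdapter:
    def __init__(self) -> None:
        self.sent_to_partner: list[str] = []  # recorded instead of sent

    def on_purchase(self, event: PurchaseEvent) -> None:
        # Forward the purchase to the partner's order/inventory API.
        self.sent_to_partner.append(f"purchase:{event.order_id}")


class CalendarService:
    def __init__(self) -> None:
        # (customer, deal_id) -> reservations held for that purchase
        self.reservations: dict[tuple[str, str], list[dict[str, str]]] = {}

    def on_purchase(self, event: PurchaseEvent) -> None:
        # Promote the customer's locked reservations for this deal to bookings.
        for res in self.reservations.get((event.customer, event.deal_id), []):
            if res["status"] == "locked":
                res["status"] = "booked"
                res["order_id"] = event.order_id


def publish(event: PurchaseEvent,
            subscribers: list[Callable[[PurchaseEvent], None]]) -> None:
    """Stand-in for the message bus: every subscriber gets the same event."""
    for handler in subscribers:
        handler(event)


adapter = OrdersAppointmentsAdapter()
calendar_service = CalendarService()
calendar_service.reservations[("alice", "deal-7")] = [
    {"slot": "sat-10:00", "status": "locked"}
]
publish(PurchaseEvent("order-1", "alice", "deal-7"),
        [adapter.on_purchase, calendar_service.on_purchase])
```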
Appointment Reservation. This happens either automatically at purchase or, for some types of deals, afterward, when the customer makes one or more appointments from their My Groupons web page or the Groupon mobile application. The signal from the consumer frontend goes to the Calendar Service, which immediately returns a “reservation pending” status to the frontend and asynchronously propagates the reservation to the partner platform, which returns either a success or a failure. These confirmations and denials are sent to the customer as notifications, and the updated status is exposed via the consumer frontend.
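The "respond now, confirm later" flow might look like this with `asyncio`, assuming one background task per reservation and a stubbed partner call. `CalendarReservations`, `partner_book`, and the `"denied"` status are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


class CalendarReservations:
    """Replies 'reservation pending' at once, then syncs with the partner in a
    background task and records the outcome plus a customer notification."""

    def __init__(self, partner_book: Callable[[str], Awaitable[bool]]) -> None:
        self.partner_book = partner_book  # hypothetical partner adapter call
        self.status: dict[str, str] = {}
        self.notifications: list[tuple[str, str]] = []
        self._tasks: list[asyncio.Task] = []

    def request(self, reservation_id: str) -> str:
        # Must be called inside a running event loop (create_task needs one).
        self.status[reservation_id] = "reservation pending"
        self._tasks.append(asyncio.create_task(self._sync(reservation_id)))
        return self.status[reservation_id]

    async def _sync(self, reservation_id: str) -> None:
        ok = await self.partner_book(reservation_id)
        self.status[reservation_id] = "booked" if ok else "denied"
        self.notifications.append((reservation_id, self.status[reservation_id]))

    async def drain(self) -> None:
        """Wait for all outstanding partner calls (useful in tests)."""
        await asyncio.gather(*self._tasks)


async def fake_partner(reservation_id: str) -> bool:
    await asyncio.sleep(0.01)          # simulated partner latency
    return reservation_id != "r-full"  # pretend this slot was taken


async def demo() -> tuple[CalendarReservations, list[str]]:
    svc = CalendarReservations(fake_partner)
    replies = [svc.request("r-1"), svc.request("r-full")]
    await svc.drain()
    return svc, replies


svc, replies = asyncio.run(demo())
```

Both requests return "reservation pending" before any partner call completes; the final statuses and notifications arrive once the background tasks finish.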
Appointment Status. In general, reading the current appointment status doesn’t require a partner call, because the status is already stored in the Calendar Service. However, we occasionally need to refresh our booking status information from partners to cover the following cases:
- Sometimes the customer purchases an item with multiple bookings, but makes appointments directly through the partner platform or with the merchant; Groupon needs to update booking information on occasion to make sure we reflect the time and status of all appointments the customer has made.
- A customer may also decide to cancel or reschedule bookings through the partner platform or directly with the merchant. In this case, we want to ensure we provide up-to-date information when the customer checks their Groupon account.
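One way to express this periodic refresh is a reconciliation function that treats the partner's response as authoritative for the appointments it returns. This is a sketch under that assumption; `Appointment` and `reconcile` are hypothetical, and a real job would also need a policy for appointments the partner no longer reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appointment:
    appointment_id: str
    start: str   # ISO-8601 timestamp, kept as a string for brevity
    status: str  # e.g. "booked" or "canceled"


def reconcile(local: dict[str, Appointment],
              partner_view: list[Appointment]) -> list[str]:
    """Update local records in place from the partner's view of one
    customer's appointments, returning a description of each change."""
    changes: list[str] = []
    for remote in partner_view:
        current = local.get(remote.appointment_id)
        if current is None:
            # Booked directly on the partner platform or with the merchant.
            changes.append(f"added {remote.appointment_id}")
        elif current != remote:
            # Canceled or rescheduled outside Groupon.
            changes.append(f"updated {remote.appointment_id}")
        else:
            continue
        local[remote.appointment_id] = remote
    return changes


local = {"a1": Appointment("a1", "2024-05-04T10:00", "booked")}
partner_view = [
    Appointment("a1", "2024-05-05T10:00", "booked"),  # rescheduled with merchant
    Appointment("a2", "2024-05-11T10:00", "booked"),  # booked on partner platform
]
changes = reconcile(local, partner_view)
```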
Cancellation and Rescheduling. We also need to support the opposite case, where the customer decides to cancel a booking through Groupon. We handle this the same way as creating a reservation (with a “cancel requested” status): on success the reservation moves into the “canceled” state, and on failure it returns to the “booked” state, since a failed cancellation leaves the appointment booked. We still send the customer a notification to let them know what happened.
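The cancellation flow is essentially a small state machine. The transition table below is a hypothetical sketch of the states named above (event names such as `partner_failure` and the notification wording are assumptions); rescheduling could follow the same pattern with its own requested state.

```python
# (current state, event) -> next state
TRANSITIONS: dict[tuple[str, str], str] = {
    ("booked", "customer_cancel"): "cancel requested",
    ("cancel requested", "partner_success"): "canceled",
    ("cancel requested", "partner_failure"): "booked",
}

NOTIFICATIONS = {
    "partner_success": "Your booking has been canceled.",
    "partner_failure": "We couldn't cancel your booking; it is still booked.",
}


class Booking:
    def __init__(self) -> None:
        self.state = "booked"
        self.notifications: list[str] = []

    def apply(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot apply {event!r} in state {self.state!r}")
        self.state = TRANSITIONS[key]
        if event in NOTIFICATIONS:
            # Tell the customer how the partner responded.
            self.notifications.append(NOTIFICATIONS[event])
        return self.state


booking = Booking()
booking.apply("customer_cancel")   # "cancel requested"
booking.apply("partner_failure")   # "booked"
```

Keeping the allowed transitions in one table makes invalid sequences (for example, a partner response with no pending request) fail loudly instead of silently corrupting state.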
So what have we changed in this design, compared to the last design? What are we doing to improve the platform in a way that enhances functionality without increasing complexity?
- In the former design, the Calendar Service functioned as a cache for availability and appointment information, and a separate service managed appointment logic. However, since supporting this feature requires considerable frontend changes as well, we made the conscious decision to simplify the API and provide frontend-ready responses directly from the Calendar Service. This allows us to set a timeline for migrating the remaining features out of the Appointments API and into the Calendar Service.
- The Third-Party Appointments Adapter for the original service was a component of the booking platform, devoted to the singular task of synchronizing appointment information with external partners. As the requirements for an adapter increased, we chose to build a new adapter component to supersede the old one. There were a couple of reasons for this:
  - The older Third Party Appointments Adapter is fairly out of date and has been minimally maintained; bringing it in line with the stricter latency and higher throughput requirements of the newer adapter would require a sizable modernization effort.
  - The newer Third Party Orders & Appointment Adapter occupies a more integrated role in the Groupon ecosystem. Not pictured in the design above (for simplicity’s sake) are the platform services it uses to map partner IDs to internal IDs and to manage signed requests. The original adapter was a much more standalone component.
- Using the Calendar Service as the local source of truth for appointment information, and as the initiator of synchronization tasks, allows us to avoid a flaw that might be obvious from the initial diagram: if the Third-Party Appointments Adapter fails to act on a booking sync message, we can lose consistency, because the local Calendar Service data no longer reflects the partner’s appointment status.
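A common way to get this property is to write the local record first and keep a pending sync task until the partner acknowledges it, in the spirit of the transactional outbox pattern. The sketch below assumes a synchronous `push_to_partner` callable that returns `True` on success and a fixed retry cap; `CalendarSyncOutbox` and its fields are hypothetical, not the Calendar Service's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SyncTask:
    appointment_id: str
    desired_status: str
    attempts: int = 0


@dataclass
class CalendarSyncOutbox:
    """Local records are written first; a sync task stays pending until the
    partner call succeeds, so a failed delivery is retried, not lost."""
    push_to_partner: Callable[[str, str], bool]  # hypothetical adapter call
    max_attempts: int = 5
    local_status: dict[str, str] = field(default_factory=dict)
    pending: list[SyncTask] = field(default_factory=list)
    dead_letter: list[SyncTask] = field(default_factory=list)

    def set_status(self, appointment_id: str, status: str) -> None:
        self.local_status[appointment_id] = status  # local source of truth
        self.pending.append(SyncTask(appointment_id, status))

    def run_once(self) -> None:
        """One sync pass, e.g. triggered by a scheduler."""
        still_pending = []
        for task in self.pending:
            task.attempts += 1
            if self.push_to_partner(task.appointment_id, task.desired_status):
                continue  # partner now agrees with the local record
            if task.attempts >= self.max_attempts:
                self.dead_letter.append(task)  # surface for manual follow-up
            else:
                still_pending.append(task)
        self.pending = still_pending


responses = iter([False, True])
outbox = CalendarSyncOutbox(push_to_partner=lambda appt, status: next(responses))
outbox.set_status("a1", "booked")
outbox.run_once()  # first push fails; the task stays pending
outbox.run_once()  # retry succeeds

failing = CalendarSyncOutbox(push_to_partner=lambda appt, status: False,
                             max_attempts=2)
failing.set_status("a2", "canceled")
failing.run_once()
failing.run_once()  # second failure hits the cap; task moves to dead_letter
```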
After making the above changes, we now have two flows: the legacy flow through the old service architecture, and the newer behaviors that support synchronization of both appointment and purchase events. Since the capabilities of the newer platform are a superset of those of the older platform, deprecating the older Third Party Appointments Adapter and Appointments API and migrating their functionality into the new service is a fairly straightforward task.
These kinds of large changes are very difficult to scope when there is constant demand for new features and functionality. Often, in order to make meaningful iterative improvements to systems (and, more importantly, to avoid building hacky workarounds), it’s a useful exercise to consider how implementing newer features, especially large ones, might also help modernize your older flows. One should be mindful of the technical debt accrued in doing so, of course: in a heavily feature-driven environment, the cost of service deprecation, feature migration, and refactoring is not insignificant. But very often, if platform shortcomings are considered carefully alongside new feature requests, significant steps forward can be made in both areas for much less effort than tackling each separately.
To summarize, it pays to be cognizant of the limitations of an existing platform whenever a large feature presents an opportunity. If you consider the known pain points and future plans for related parts of your platform, you can often design a versatile solution that not only meets your immediate feature goals but also lays the foundation for improvements across the broader system in which your feature change resides.