Groupon follows a Service-Oriented Architecture (SOA), which enables communication between various platforms to build functional applications. Apart from enabling businesses to respond quickly, SOA has several other benefits, especially for enterprise-level software. However, it comes with its own challenges, such as managing service metadata, service discovery, the service lifecycle, and compliance. To deal with these challenges, enterprises may implement a set of processes, known as SOA governance, to ensure adherence to best practices, architectural principles, government regulations, laws, and other determining factors.
At Groupon, Service Portal is at the heart of SOA governance.
Groupon started as a Rails monolith and transitioned to SOA incrementally…
From its inception, Groupon has followed industry-wide best practices in automated testing, Agile development, and pragmatic decision-making about where to invest our engineering time to maximize the value of what we deliver. In the company's early years, Groupon had a small team of engineers, so a monolithic codebase (rather than SOA) was the more logical choice: it simplified code reuse and facilitated shared ownership.
As the company grew (the fastest-growing company in history at one point), we hired many more engineers, all working in the same codebase and distributed across the world.
This introduced complexity: large, long-running builds and test suites, complex deployments, and multiple teams touching the same code, which made it harder for teams to agree on a cohesive software design strategy. This led us down the path of introducing SOA, which allowed us to scale as a business, improve feature throughput, and simplify delivery.
SOA had its own complexities:
The first few services broken off from the monolith were very similar to the original codebase:
- They shared the same languages, owners, libraries, dependencies, engineering processes/practices.
- Over time, these diverged as needed.
- As the company further embraced SOA, the number of services grew significantly.
- Complexity increased, not for individual service teams (for most of them it was actually greatly reduced), but for the infrastructure and operations teams that had to understand how these separate services worked together.
- Complexity came from differences between services: in architectures and technologies; in logging and performance monitoring; in where the code lives, who to talk to with questions about a service, and who to page for issues; and in how to integrate between services and manage the health of those integrations.
- Service Portal was built to help manage these complexities.
Service Dependency Graph at Groupon.
SOA Governance at Groupon
To address these complexities, we built Service Portal.
- As an internal engineering solution provider, Service Portal supports the workflows for cross-service engineering concerns, such as engineering processes and best practices (generally referred to as “initiatives”). Its value comes from reducing the costs of managing, participating in, and acting on the outputs of these initiatives.
- Cross-service initiatives usually start outside of Service Portal, with a small group of people who want to address some need or desire within the organization. Examples include:
  - Operational Readiness Review: a process that manages the lifecycle of services, defining the stages of that lifecycle and the expectations within each stage in terms of change management, asset provisioning, regulatory compliance, etc.
  - Service metadata: information about each service at Groupon, including ownership and documentation on architecture, operations, REST API schema, infrastructure, etc.
  - InnerSource: an engineering program that facilitates shared ownership by enabling service teams to support contributions to their service from anyone within Groupon. It establishes guidelines so that those contributions adhere to the service team’s engineering practices and can be properly prioritized for incorporation by the service team.
  - Building Blocks: components, technology, standards, and services within Groupon Engineering that service teams can (re)use to achieve implementation consistency and to get features or functionality they don’t need to build or manage themselves.
- The owners of these initiatives often have to spread awareness, track adoption, manage changes across all dependent services, and manage compliance.
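To make the Operational Readiness Review more concrete, the following is a minimal sketch of a lifecycle modeled as ordered stages. A service may advance only once the expectations for the next stage are met. The stage names, expectation keys, and the `Service`/`advance` names are all hypothetical. The article does not describe Groupon's actual ORR stages or how they are stored.

```python
from dataclasses import dataclass, field

# Hypothetical lifecycle stages, in order; the real ORR stages are not described.
STAGES = ["proposed", "development", "production", "deprecated", "decommissioned"]

# Hypothetical expectations a service must satisfy before entering each stage.
EXPECTATIONS = {
    "development": {"owner_assigned", "repo_registered"},
    "production": {"runbook_published", "monitoring_configured", "change_mgmt_approved"},
    "deprecated": {"successor_documented"},
    "decommissioned": {"no_dependents", "assets_released"},
}


@dataclass
class Service:
    name: str
    stage: str = "proposed"
    met: set = field(default_factory=set)  # expectations already satisfied


def missing_for(service: Service, target: str) -> set:
    """Expectations the service has not yet met for entering `target`."""
    return EXPECTATIONS.get(target, set()) - service.met


def advance(service: Service) -> set:
    """Move the service to the next stage if every expectation is met.

    Returns the unmet expectations; an empty set means the transition happened.
    """
    idx = STAGES.index(service.stage)
    if idx + 1 >= len(STAGES):
        raise ValueError(f"{service.name} is already in the final stage")
    target = STAGES[idx + 1]
    missing = missing_for(service, target)
    if not missing:
        service.stage = target
    return missing


svc = Service("orders-api", met={"owner_assigned", "repo_registered"})
print(advance(svc), svc.stage)          # set() development
print(sorted(advance(svc)), svc.stage)  # ['change_mgmt_approved', 'monitoring_configured', 'runbook_published'] development
```

Because `advance` returns the unmet expectations rather than a bare boolean, the same result can drive both the stage change and a message telling the service team what is still outstanding.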
In the beginning, these initiatives are often managed centrally by the owning group. This is good for quick, focused iteration early on. However, Groupon has over 500 active services, so central management is time-consuming and makes coverage across all services slow. This can lead to stale tracking data and low or slow adoption.
- Once an initiative is mature and established enough, attention shifts from defining it adequately toward improving efficiency. Service Portal can help reduce costs and improve efficiency by:
  - Raising the initiative’s visibility and prominence
  - Centralizing and structuring data collection and reporting
  - Automating verification of data and state changes, both within Service Portal and across inter-related data sets
  - Enforcing adherence to agreed-upon company standards
  - Streamlining and automating communications to owners and service teams when state changes
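The following sketch illustrates automated verification and communication. It checks each service record against a hypothetical standard (three required metadata fields), builds a compliance report, and groups findings by owner so that each owner receives one message instead of one per service. The field names, the record shape, and the `initiative-owners` fallback recipient are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical standard: metadata fields every service record must provide.
REQUIRED_FIELDS = ("owner", "pager_alias", "runbook_url")


def verify(service: dict) -> list:
    """Required fields that are missing or empty in one service record."""
    return [f for f in REQUIRED_FIELDS if not service.get(f)]


def compliance_report(services: list) -> dict:
    """Map each non-compliant service name to its missing fields."""
    report = {}
    for svc in services:
        missing = verify(svc)
        if missing:
            report[svc["name"]] = missing
    return report


def notifications(services: list, report: dict) -> dict:
    """Group findings by owner so each owner receives a single message.

    Services without an owner fall back to a hypothetical "initiative-owners" recipient.
    """
    by_owner = defaultdict(list)
    for svc in services:
        if svc["name"] in report:
            recipient = svc.get("owner") or "initiative-owners"
            by_owner[recipient].append(f"{svc['name']}: missing {', '.join(report[svc['name']])}")
    return dict(by_owner)


services = [
    {"name": "orders-api", "owner": "team-orders", "pager_alias": "orders-oncall",
     "runbook_url": "https://wiki.example/orders"},
    {"name": "deals-search", "owner": "team-search", "pager_alias": ""},
    {"name": "legacy-batch"},
]
report = compliance_report(services)
print(report)
print(notifications(services, report))
```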
The Service Portal team works to fulfill this role as a hub for decentralizing these initiatives. Service Portal builds features that increasingly support the requirements of both the process owners and the service teams. Because our team is small, we build these features as efficiently and incrementally as possible.
In addition to building features and UX, we build Service Portal as a platform for internal engineering operations. Service Portal defines the contract for representing resources used by services at Groupon, and engineering teams provide the data for those resources to it.
This is as opposed to Service Portal itself trying to pull this data from an ever-growing, ever-changing engineering ecosystem. We may start with Service Portal pulling some piece of data directly, and then over time work to invert that relationship. This requires orienting engineering teams’ processes towards providing this data to Service Portal, which is an ongoing effort for us.
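The following is a minimal sketch of what "defining the contract" could look like, assuming a resource is a flat record with typed fields. Service Portal validates each pushed payload against the contract and rejects it with readable errors, instead of trying to collect the data itself. The `DATABASE_CONTRACT` fields and the `ResourceRegistry` class are hypothetical. The article does not describe Service Portal's actual schema or API.

```python
# Hypothetical contract for a "database" resource reported by a service team:
# field name -> expected Python type.
DATABASE_CONTRACT = {
    "service": str,
    "engine": str,
    "engine_version": str,
    "managed": bool,
}


def validate(payload: dict, contract: dict) -> list:
    """Return human-readable contract violations (empty list if valid)."""
    errors = []
    for key, expected in contract.items():
        if key not in payload:
            errors.append(f"missing field: {key}")
        elif not isinstance(payload[key], expected):
            errors.append(f"{key}: expected {expected.__name__}, got {type(payload[key]).__name__}")
    for key in sorted(payload.keys() - contract.keys()):
        errors.append(f"unknown field: {key}")
    return errors


class ResourceRegistry:
    """In-memory stand-in for the store that service teams push resources into."""

    def __init__(self, contract: dict):
        self.contract = contract
        self.records = {}  # one record per service, for simplicity

    def push(self, payload: dict) -> None:
        errors = validate(payload, self.contract)
        if errors:
            raise ValueError("; ".join(errors))
        self.records[payload["service"]] = payload  # the latest push replaces the old record


registry = ResourceRegistry(DATABASE_CONTRACT)
registry.push({"service": "orders-api", "engine": "postgresql", "engine_version": "13", "managed": True})
```

Validating at the boundary keeps malformed data from being discovered later, when someone queries it. It also tells the pushing team exactly what to fix.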
Another goal for Service Portal is to allow TPMs, PMs, engineering leads, and others to answer questions such as:
- Which services are using a particular database engine, offered as a managed asset (building block) by the company?
- Which services use version x.y.z of an internal Java library that integrates with different assets?
- Which services are part of the purchase funnel?
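Once services report this data in a common shape, questions like these reduce to simple filters. The following sketch assumes a hypothetical in-memory list of service records. The field names, the `asset-client` library, and its versions are invented for illustration. A real Service Portal would more likely query its database than a Python list.

```python
# Hypothetical service records, in the shape service teams might report them.
SERVICES = [
    {"name": "orders-api", "databases": ["postgresql"],
     "libraries": {"asset-client": "2.3.1"}, "tags": {"purchase-funnel"}},
    {"name": "deals-search", "databases": ["elasticsearch"],
     "libraries": {"asset-client": "2.4.0"}, "tags": set()},
    {"name": "checkout-web", "databases": ["mysql"],
     "libraries": {"asset-client": "2.3.1"}, "tags": {"purchase-funnel"}},
]


def using_database(services: list, engine: str) -> list:
    """Services that use the given (managed) database engine."""
    return [s["name"] for s in services if engine in s["databases"]]


def using_library(services: list, library: str, version: str) -> list:
    """Services pinned to an exact version of an internal library."""
    return [s["name"] for s in services if s["libraries"].get(library) == version]


def tagged(services: list, tag: str) -> list:
    """Services carrying a tag, e.g. membership in the purchase funnel."""
    return [s["name"] for s in services if tag in s["tags"]]
```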
This data will also help with service lifecycle management (a topic we will cover in detail in a separate article). For example, when a request is made to Service Portal to decommission a service, Service Portal can determine whether all dependent services have stopped depending on it and whether all of its assets have been decommissioned.
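The decommissioning check can be sketched as a reverse-dependency lookup plus an asset-status scan. This sketch assumes dependencies are stored as a mapping from each service to the set of services it calls, and assets as records with an owning service and a status string. Both data shapes and the `"decommissioned"` status value are hypothetical.

```python
def decommission_blockers(target: str, dependencies: dict, assets: dict) -> list:
    """Return the reasons `target` cannot be decommissioned yet (empty list if it can).

    dependencies: service name -> set of services it depends on.
    assets: asset id -> {"owner": service name, "status": str}.
    """
    blockers = []
    # Reverse lookup: any other service that still lists `target` as a dependency.
    dependents = sorted(s for s, deps in dependencies.items() if target in deps and s != target)
    for s in dependents:
        blockers.append(f"{s} still depends on {target}")
    # Any asset owned by `target` that has not been decommissioned yet.
    for asset_id, asset in sorted(assets.items()):
        if asset["owner"] == target and asset["status"] != "decommissioned":
            blockers.append(f"asset {asset_id} is still {asset['status']}")
    return blockers


dependencies = {
    "checkout-web": {"orders-api", "deals-search"},
    "orders-api": {"inventory"},
    "reports": set(),
}
assets = {
    "db-17": {"owner": "deals-search", "status": "active"},
    "queue-3": {"owner": "deals-search", "status": "decommissioned"},
    "db-2": {"owner": "orders-api", "status": "active"},
}
print(decommission_blockers("deals-search", dependencies, assets))
```

Returning every blocker, not just the first one found, lets a decommissioning request show the full list of remaining work at once.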
An example of Service Health in the Service Portal UI
How we operate:
Our small dev team is tightly knit. We use tools, technologies, and methods that allow us to deliver value to our stakeholders as early and as often as possible.
Service Portal is built with Ruby on Rails. Ruby is a language that gets out of our way and lets us quickly deliver value with a minimum of complexity, producing code that is well-designed, well-tested, and simple to maintain. Rails provides out-of-the-box features and conventions that let us focus more on the problem we’re trying to solve than on the technology used to solve it.
The Ruby ecosystem is rich with libraries and technologies. We use these libraries and technologies for database integrations, background job processing, distributed locking mechanisms, integration with back-office systems, integration with 3rd-party REST APIs, and more. Our platform is containerized to encapsulate runtime dependencies and simplify deployments.
Our team uses Agile principles to facilitate planning and incremental, iterative, sustainable delivery of features. We work directly with our end-users and other stakeholders and develop features to their specifications. We help them to define and refine their workflows, and to determine how best to support those workflows in Service Portal. We’re careful to clearly define the scope each step of the way, so as to avoid wasting time on speculative development. YAGNI (‘you aren’t going to need it’) is alive and well on our team. We avoid dynamic frontend behavior unless strictly necessary. “Don’t optimize for optics” is one of Groupon’s core values, and we don’t need a shiny frontend to demonstrate value for internal platforms like Service Portal. It does its job and gets out of your way.
We keep our eye on the future by maintaining a shared “technical goals” document. It describes what we think we’re building towards technically, and how we might avoid accruing technical debt or pay it down while maintaining a sustainable delivery pace. We do this by boy-scouting the code (leaving it better than we found it) and identifying opportunities to make incremental progress towards our goals.
Our testing is pragmatic, and we aim for:
- Good test coverage
- High value per test
- Low test maintenance costs
Keep an eye out for Part 2, where we delve into some of the cool things we’re working on and our future vision for Service Portal!