Enforcing Modularity inside a Rails Monolith
Like many startups of its age, Airtasker was born as a Ruby on Rails monolith which got a bit bloated.
We believe a modern API architecture consists of a set of loosely-coupled, highly-cohesive bounded contexts. A bounded context is a term from Domain-Driven Design (DDD) which refers to the logical boundaries surrounding a single subdomain of a business.
At Airtasker, greenfield bounded contexts usually come in the form of a Kotlin microservice. But what about the core functionality we already have? When you do the cost-benefit analysis of rewriting a large chunk of code that already exists and works, it’s easy to conclude that you shouldn’t rewrite.
Fortunately, a bounded context doesn’t necessarily imply a separate service. In fact, there are lots of benefits to having a modular monolith. Like many before us, we’ve made use of Rails engines for modularity within our Rails app.
Unfortunately, adding and enforcing boundaries between Rails Engines is a DIY job. In this blog, I’ll walk you through:
1. Where we started. The status quo of our Rails Monolith and our use of engines and adaptors.
2. Why we want to enforce boundaries. Our motivation for trying to enforce strict and explicit boundaries between our Rails Engines.
3. How we enforced boundaries. A walk-through of the methods we tried and the benefits and downsides of each.
1. Where we started
Our monolith is partially, but not yet entirely, carved up into a set of bounded contexts.
As we mentioned in the intro, a bounded context is a term borrowed from DDD that refers to a single business subdomain with well-defined boundaries. Since this can be difficult to conceptualise, let’s walk through an example.
Imagine you’re an engineer at the hotshot cat grooming startup Luxuricat that just made it through Series C funding. Your business and codebase have grown so large that it’s no longer feasible to have a unified domain model. Each product team seems to have their own concerns and vocabulary for their respective parts of the app. Yet they’re also constantly treading on each other’s toes.
Since you want to keep up momentum, you decide to try and modularise your codebase along the lines of the different subdomains of your business. To identify those subdomains, you chat with domain subject-matter experts.
You find a few obvious subdomains that aren’t special to Luxuricat, like Notifications and Payments. These Generic Subdomains are prime candidates to be extracted. You’re also surprised to find out that the needs of your pampered cats have little to do with the rostering of cat groomers. These seem like two different Core Subdomains of your business, so you decide to create two more bounded contexts: CatClients and CatGroomers.
Each of these bounded contexts has its own domain model and can be owned entirely by a single product team. As long as these contexts have well-defined boundaries, each team can work autonomously within its own business subdomain without fear of disrupting others.
In our monolith we’ve used Rails Engines to encapsulate our bounded contexts. According to the docs, Rails Engines are ‘miniature applications that provide functionality to their host applications’. While they do come with a bunch of useful tools out-of-the-box, the modularity they provide is mostly superficial. We’ll explore why soon.
Given this modularity is paper-thin, you may ponder their value. Why not just properly rip out functionality into a service? There’s no boundary harder than an HTTP call.
Aside from being a much smaller commitment, keeping functionality in an engine has one massive benefit.
Our engines are essentially nurseries for baby services. According to DDD, your domain model is constantly evolving and you’re going to get a lot wrong. The worst case scenario is you get it wrong enough to end up with a distributed monolith, which is miles worse than the monolith you started with.
Getting the boundaries wrong between services means weeks of refactoring. Getting the boundaries wrong between engines means spending 15 minutes moving some code around before your next meeting.
What do our boundaries look like?
For our Kotlin microservices, we jumped fully on board the Hexagonal hype train.
I don’t want to delve too deep into Hexagonal Architecture, but here’s the 5-second summary: protect your domain logic with adaptors.
Given ActiveRecord and the lack of interfaces, most of Hexagonal doesn’t translate well into the Rails world. You’ll be fighting uphill if you want to truly adopt it. However, one fundamental part does translate perfectly — protect the core domain logic with adaptors.
Hexagonal splits the adaptors into two kinds: inbound (driving) adaptors and outbound (driven) adaptors.
Inbound adaptors are entry points to an application that drive a particular user flow (for example, a Rails controller). Outbound adaptors are external calls that the application initiates — think the DB, payment providers, email, etc.
At its finest, Hexagonal promises to let you switch technologies without touching a single line of business logic. Want to move between REST, gRPC or CLI input? No worries, just write some more adaptors.
How does that apply to Rails engines?
Since we’ve established our engines are baby services, there’s no reason they shouldn’t attempt to follow the same pattern. Each of our engines defines a set of inbound and outbound adaptors specifically for inter-engine communication.
Let’s return to our cat grooming startup Luxuricat. If the CatGroomers engine needs to grab information about a cat in the CatClients engine, it doesn’t stick its hand over the fence and call:
cat = CatClients::Cat.find(cat_id)
This would put us straight on the highway to Coupling Town. Instead, we can go through an outbound adaptor:
deserialised_cat_model = CatGroomers::OutboundAdaptors::CatClients.fetch_cat(cat_id)
Which at some point would call an inbound adaptor in the CatClients engine:
serialised_cat_model = CatClients::InboundAdaptors::Cat.fetch(id)
The call between the outbound and inbound adaptor is the engine boundary. Since this essentially forms the interface of the CatClients engine, it’s worthwhile making that interface explicit. We often use dry-validation for this purpose.
At first glance this may seem like a lot of extra boilerplate. However, since these adaptors use the medium of method calls rather than HTTP, their contents are generally super slim. Serialisation is barely more than putting attributes of a domain model into a hash, and you don’t need to bang your head on the wall with retries or handling response codes.
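To give a feel for how slim these adaptors can be, here is a minimal sketch of the Luxuricat example in plain Ruby. It is an illustration under assumptions rather than our production code: the Struct models stand in for ActiveRecord models, the CATS hash stands in for the database, the attribute names are invented, and the REQUIRED_KEYS check is a plain-Ruby stand-in for a dry-validation contract (it does not use dry-validation's API).

```ruby
module CatClients
  # Domain model private to the CatClients engine.
  # (A Struct stands in for an ActiveRecord model; attributes are hypothetical.)
  Cat = Struct.new(:id, :name, :coat_length, :internal_notes, keyword_init: true)

  # Hypothetical in-memory store standing in for the database.
  CATS = {
    1 => Cat.new(id: 1, name: "Duchess", coat_length: "long", internal_notes: "bites")
  }.freeze

  module InboundAdaptors
    module Cat
      # Serialise only the attributes that form the engine's public contract.
      def self.fetch(id)
        cat = CatClients::CATS.fetch(id)
        { id: cat.id, name: cat.name, coat_length: cat.coat_length }
      end
    end
  end
end

module CatGroomers
  # CatGroomers' own view of a cat. It never sees CatClients::Cat.
  Cat = Struct.new(:id, :name, :coat_length, keyword_init: true)

  module OutboundAdaptors
    module CatClients
      # Stand-in for an explicit contract (e.g. a dry-validation schema).
      REQUIRED_KEYS = %i[id name coat_length].freeze

      def self.fetch_cat(cat_id)
        # Leading :: is needed because CatClients here would otherwise
        # resolve to this adaptor module rather than the other engine.
        payload = ::CatClients::InboundAdaptors::Cat.fetch(cat_id)

        missing = REQUIRED_KEYS - payload.keys
        raise ArgumentError, "contract violation, missing: #{missing}" unless missing.empty?

        # Deserialise into a model owned by this engine.
        CatGroomers::Cat.new(**payload.slice(*REQUIRED_KEYS))
      end
    end
  end
end

deserialised_cat_model = CatGroomers::OutboundAdaptors::CatClients.fetch_cat(1)
puts deserialised_cat_model.name
```

The only line that crosses the engine boundary is the call to the inbound adaptor, so that is the one place that would need to become an HTTP call if CatClients were ever extracted into a service.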
Adding these adaptors has additional value:
1. It encourages interface-first development.
2. It makes inter-engine communication explicit and observable.
3. If we do need to pull out an engine into a service, no domain logic needs to change.
So what’s the issue?
Great, so we can add a bunch of these adaptors and only communicate through them to other domains. Problem solved, right?
Rails has something it calls an ‘autoloader’. In Rails 6 this is handled by the gem Zeitwerk, which according to the README means you ‘don’t need to write require calls for your own files, rather, you can streamline your programming knowing that your classes and modules are available everywhere’.
In your Rails app, pretty much every class is available everywhere. Yet in our case, we’re deliberately trying to protect our domain logic so it isn’t available outside of its bounded context.
2. Why we want to enforce boundaries
What we’ve described is the state of Airtasker’s monolith. We’ve got subdomains inside engines using inbound and outbound adaptors, relying on culture and code review to encourage their use.
So why experiment with enforcing boundaries? There are two motivations.
2.1 — Make it hard to do the wrong thing
The first benefit is the obvious one that comes to mind: encouraging good behaviour.
When a developer writes code in a domain and realises they need information from an entity in another domain, ideally they’d pause and go back to the domain model and ask themselves questions.
1. Is this model in the right domain?
2. If it is, should I add adaptors between these two domains?
3. What should the contract look like?
Whatever the outcome of their domain puzzling session is, 90% of the time it’s going to be more effort than simply reaching out and grabbing the model directly. And if a developer is new to the codebase, they may not even be aware of the adaptor pattern at all.
‘But wait!’ you cry out. The developers that work on your application are super switched-on and go through a rigorous code review process. They’d never let human error like that get through.
That is fair: we can presume most instances of boundary leakage would never make it through PR. However, not all boundary leakage issues will be so obvious as leaky_model = AnotherDomain::LeakyModel. In our decoupling efforts, we discovered some real head-scratchers.
Cultural practices can only take you so far. When the path of least resistance is at cross-purposes with your architecture, time and entropy will slowly erode code quality.
2.2 — Measure progress
If it’s worth doing, it’s worth measuring.
If you start with a bloated monolith and want to take proactive steps towards decoupling it, you need to invest significant time, effort and creativity. Progress will be slow and improvements won’t be obvious. At first, developer productivity on the monolith may even start to decline. You may have to deal with sceptics.
Adding a bunch of adaptors and reducing coupling is a vague goal. Humans hate vague goals. They make us anxious and unmotivated.
On the flip side, humans love seeing numbers go up or down. We love discrete blocks of work that can be said to be done.
For us, the biggest motivator for enforcing boundaries is to produce 1) a metric for progress and 2) a set of small, discrete goals.
3. Methods of enforcing boundaries
We had decided to enforce our engine boundaries. But in order to proceed, we had to clearly define what our goal was. What does it actually mean to enforce a boundary?
We settled on the definition of an enforced engine boundary as one with 0 instances of inbound and outbound leakage. Inbound leakage is when another part of the codebase reaches in and pilfers domain logic inside your engine. Outbound leakage is when your engine reaches outwards to pilfer another.
With our goal spelled out, our next course of action was clear. Google the problem!
We quickly discovered that enforcing engine boundaries isn’t a road well travelled. However, we did manage to find a few trailblazing blogs (links below) that were extremely helpful. It seemed there was precedent for two methods of boundary enforcement: isolated test suites and linting. We decided to experiment with both.
Our guinea pig was an engine built before we introduced the pattern of adaptors. The team that owns the domain had a spare few weeks to dedicate to the project and were keen on introducing adaptors and reducing leakage.
3.1 — Isolating engine unit tests
Your Rails app is going to autoload every class it can get its hands on. But do your tests need to? What if each engine had an independent set of unit tests that only loaded the constants available inside the engine? That way if your engine tried to sneak into another domain, the class it accessed would be unavailable and the tests would fail.
This seemed really intuitive. If you have a bunch of services, each one would have its own independent set of unit tests. So why shouldn’t engines?
We set to work on our guinea-pig engine, determined to introduce an independent test suite that would enforce against outbound leakage.
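The mechanism is easy to see in miniature. The sketch below is plain Ruby with no Rails, and the class names (LeakyRoster, AdaptedRoster) and the leaks_outbound? helper are hypothetical. Only the CatGroomers namespace is loaded in the process, which is roughly the situation an isolated engine test suite puts you in.

```ruby
# Only CatGroomers is loaded here. CatClients is deliberately absent,
# as it would be in an isolated engine test suite.
module CatGroomers
  # Leaky: reaches straight into another engine's namespace.
  class LeakyRoster
    def cat_name(cat_id)
      CatClients::Cat.find(cat_id).name
    end
  end

  # Isolated: depends only on a collaborator handed in,
  # for example an outbound adaptor (or a stub of one in tests).
  class AdaptedRoster
    def initialize(cat_source)
      @cat_source = cat_source
    end

    def cat_name(cat_id)
      @cat_source.call(cat_id).fetch(:name)
    end
  end
end

# Returns true when the block touches a constant that is not loaded.
def leaks_outbound?
  yield
  false
rescue NameError => e
  # NoMethodError is a subclass of NameError, but it is not a boundary leak.
  raise if e.is_a?(NoMethodError)
  true
end

stub_source = ->(id) { { id: id, name: "Duchess" } }

puts leaks_outbound? { CatGroomers::LeakyRoster.new.cat_name(1) }
puts leaks_outbound? { CatGroomers::AdaptedRoster.new(stub_source).cat_name(1) }
```

In a real suite there is no helper like this. The uninitialized-constant error simply fails the spec, which is what makes the leak impossible to ignore.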
Adding tests to a Rails Engine
We’ve always run the tests for our monolith from a single RSpec config. So we were surprised to learn that the conventional way to test an engine is using a ‘dummy’ Rails application. This is an empty, yet fully-functional, Rails application used as a ‘mounting point’ for the engine during testing.
It seemed we’d generated our engine without the dummy app, so we needed to go back and regenerate it. Afterwards, we naively typed rspec and hoped nothing would break.
Turns out, a lot of things broke. This was our first learning: expect to spend a lot of time introducing and maintaining config.
At minimum, you’ll need to maintain separate rails_helper.rb and spec_helper.rb with only the config relevant to that specific engine and its dependencies. Your engine .gemspec needs to be in tip-top shape as well, with all gems manually imported with require.
Once we’d solved the config problem, we were greeted with a towering list of failing tests. And we had our second learning: isolated test suites are brutal at catching boundary leakage.
Isolated tests were effective
The tests caught a lot more leakage than we expected. Some of the leakage was obvious, some hidden and implicit. Some fixes were quick changes, while others took days of work for a single green test.
They were also a superb motivator. Slowly reducing the failing tests to 0 was addictive, and ultimately adding the isolated suite as a CI step was a proud moment.
There was an additional benefit which we hadn’t expected. We’d known our engine was coupled, but what about our tests? It turns out our tests, specifically their use of FactoryBot factories, were in a much worse state than the code they were testing. Our factories were rife with associations created across engine boundaries.
What about inbound leakage?
We’d had such a blast enforcing against outbound leakage for our guinea-pig engine, we were excited to do the same for inbound leakage. We eagerly sought the advice of our original inspiration, the awesome blog Modular Monolith.
It had an interesting approach. To illustrate, let’s return to our cat grooming startup with its two engines: CatClients and CatGroomers. CatClients is a dependency of CatGroomers: to groom a cat you need to know that cats exist, but cats don’t need to know that the cat grooming profession exists.
To follow the blog’s approach, the test suite for CatClients would run in complete isolation. However, the test suite for CatGroomers would also load the CatClients engine as a dependency.
In this instance, a portion of the tests inside the CatGroomers engine are acting as integration tests. If you test a CatGroomers method that fetches a cat, you run code inside the CatClients engine and can be confident that no integration issues occur for that user path.
That’s awesome. But how do you know that the CatClients domain hasn’t leaked into the CatGroomers domain? How do you know that a service in the CatGroomers domain isn’t ignoring adaptors and reaching in to grab CatClients domain models?
There is an immediate solution to this problem. If isolating the CatClients tests protects against outbound leakage, we can protect against inbound leakage by isolating the CatGroomers tests.
Therein lies another problem. If you have two engines that collaborate, both unit tested by mocking the boundaries between them, how can you gain confidence that there are no integration issues?
If these were truly two separate services, the solution would be clear. CatGroomers needs a separate set of integration tests.
While this does sound like the ideal long-term solution, the upfront commitment to write a new set of integration tests is high.
Is there a nearer-term method of measuring and enforcing inbound leakage?
3.2 — Linting with Rubocop
Enter our second experiment, courtesy of this amazing blog by Flexport. The idea is pretty neat — just lint boundary leakage with Rubocop!
Flexport wrote three custom cops: one to prevent new models being created in the core /app, one to stop engines accessing code inside the core /app, and one to protect against access into an engine. They’ve even admirably open-sourced them.
Our main interest was the final cop that linted direct access into engines. This sounded like a great next step. We could measure and enforce inbound leakage for our guinea-pig engine while still keeping the existing tests that were inadvertently acting as integration tests.
We dug into the cop implementation. To our pleasant surprise, our first learning was that custom Rubocop cops are pretty intuitive and a lot of fun. We definitely recommend playing around with them.
The idea behind the cop is fairly simple and replicable in our codebase. Essentially, the cop lints for two specific scenarios:
Uses of classes with an engine namespace:
And associations using an engine namespace:
has_one :leaky_model, class_name: "NotYourEngineNamespace::LeakyModel"
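To make the two scenarios concrete, here is a rough approximation of the check in plain Ruby. To be clear, this is not Flexport's cop and it does not use the Rubocop API. It swaps in Ruby's standard-library Ripper lexer so the sketch runs on its own. The ENGINE_NAMESPACES list, the helper names, and the rule that anything under InboundAdaptors counts as an engine's public interface are all assumptions made for illustration.

```ruby
require "ripper"

# Hypothetical list of engine namespaces in the app.
ENGINE_NAMESPACES = %w[CatClients CatGroomers].freeze
# Assumption: <Engine>::InboundAdaptors is that engine's public interface.
PUBLIC_SEGMENT = "InboundAdaptors"

# Collects constant paths (Foo::Bar) and string literal contents,
# each with the line number it appears on. Comments are ignored.
def candidate_names(source)
  names = []
  current = nil
  Ripper.lex(source).each do |(line, _column), type, token, _state|
    if type == :on_const || (current && type == :on_op && token == "::")
      # Keep building the current constant path.
      current ||= { line: line, name: +"" }
      current[:name] << token
    else
      names << current if current
      current = nil
      # String contents matter for class_name: "Engine::Model".
      names << { line: line, name: token } if type == :on_tstring_content
    end
  end
  names << current if current
  names
end

# Flags references into another engine that bypass its inbound adaptors.
def boundary_offences(source, own_namespace:)
  foreign = ENGINE_NAMESPACES - [own_namespace]
  candidate_names(source).select do |candidate|
    namespace, segment = candidate[:name].split("::")
    foreign.include?(namespace) && segment != PUBLIC_SEGMENT
  end
end

source = <<~RUBY
  cat = CatClients::Cat.find(cat_id)
  cat = CatClients::InboundAdaptors::Cat.fetch(cat_id)
  has_one :cat, class_name: "CatClients::Cat"
RUBY

boundary_offences(source, own_namespace: "CatGroomers").each do |offence|
  puts "line #{offence[:line]}: #{offence[:name]} crosses an engine boundary"
end
```

Run against the three lines above, the sketch is meant to flag the first and third and let the adaptor call through. It checks every string literal rather than only class_name options, so a real cop working on the AST would be more precise.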
The simplicity was reassuring, but also brought us back to earth. Linting wasn’t going to be a silver bullet — the subtle leakage issues the isolated tests caught were going to fly under the radar.
This approach does have a lot of benefits: it’s super easy to get set up and starts to enforce against leakage from the get-go. It’s definitely a lighter touch than the isolated tests, which required a whole heap of config and didn’t enforce until we cleared the failing tests and added it as a CI step.
That’s the result of our experimentation so far. We’ve got a single engine enforcing against outbound leakage with an isolated test suite and enforcing against inbound leakage via a custom Rubocop cop.
Using Rubocop was a lot easier to get started and produced results immediately. However, the coverage wasn’t exhaustive. Isolating the tests was a pain in the arse, yet caught way more leakage than we ever imagined.
Overall, we’re satisfied with these approaches and plan to use a mix of both — Rubocop in the short term, working towards the goal of an independent test suite for each engine and a supporting set of engine integration tests.
Let us know if you have any questions or want to share your own views of enforcing Rails engine boundaries!