Breaking up Barracuda

Kim Hendrick
Published in EarthVectors
12 min read · Aug 6, 2019

A few years ago, I had the opportunity to break up a monolithic Ruby on Rails web application, code-named “Barracuda”. I’d like to share my experience extracting a subset of Barracuda in order to scale and enhance it.

Background

I worked for a small, successful startup that created the Barracuda web application, mobile apps, and other tools such as business intelligence for the retail vertical. Barracuda was a typical Ruby/Rails monolith: quick to build, but reaching some architectural limitations.

One of Barracuda’s features was photo processing. Barracuda called a third-party vendor’s API to add metadata to the photos. For fun, let’s name the vendor “Atlantis”. Some of the responsibilities related to this feature included:

  • Associating metadata with customer photos
  • Billing customers accurately for photo processing on a per-photo basis
  • Handling errors when photos needed to be re-processed
  • Displaying the metadata in a separate web application for BI purposes

The feature was at a turning point: it had proven successful for one customer, and the company wanted to scale it to all their customers.

The Problems

The existing implementation prevented widespread adoption because of several limitations.

Manual Configuration

To enable photo processing for a new customer, a combination of code changes and database updates was required. The update process was tedious and error-prone. In addition, code changes required a code deployment and integration testing with Atlantis in production.

Testing

All new processing models were tested in production because Atlantis provided no integration sandbox. Code changes in this area were high risk: the lack of sufficient automated tests meant every change required manual regression testing.

Billing

To calculate billing for customers with photo processing enabled, database queries were run manually at the end of each month. The billing logic had to calculate costs based on the number of photos processed in the time period and exclude re-processing under certain conditions, such as processing mistakes. Some processing mistakes were categorized as misconfiguration errors and did not incur additional charges to the customer; others were passed along as additional fees. Many of these exceptions required manual intervention to determine the category.

Architecture

Because the photo processing was part of monolithic Barracuda, the data was stored in a single database. This data was used for Barracuda, for the photo display web app, and for billing calculations. Having the data commingled made the separate concerns difficult to manage: Barracuda might want to store the data one way to optimize configuration, while billing would have been easier with the data in a different format.

The diagram below shows some of the photo logic that Barracuda was responsible for. Although the photo logic is shown as separate processes, the code was commingled with Barracuda’s other concerns, which are not included in the diagram:

Original Barracuda Architecture

As demand for the photo processing feature grew, we knew that the existing design was reaching its limits. The company had two high-level goals: enable photo processing for additional customers, and expand the feature set of photo processing. To support these goals, the team identified these immediate needs:

  1. Eliminate manual configuration
  2. Extract photo processing to its own application
  3. Automate billing

Approach

Our long-term vision was to extract a separate application (code-named “Guppy”), create a new product team, and expand the feature to all our customers. Getting there was going to be a long process, and we were dedicated to these principles:

Test Coverage

The manual testing required with every release was severely inhibiting feature delivery. In addition, we lacked tribal knowledge of how the photo code worked. So we committed to increasing test coverage: we employed TDD for all new code and added tests for all legacy code we planned to refactor or redesign.

Releasability

We had other responsibilities in Barracuda including bug fixes and other features. So it was important to work in a fashion that allowed continued releases and short feedback loops.

Maintain Functionality

We operated within the constraint that we couldn’t disrupt or change our current customers’ expectations, nor our agreement with our third-party vendor, Atlantis. We had to move towards our new vision while keeping features backwards compatible.

Working with our product owner, we had a whiteboard session to identify dependencies between our three immediate needs. For example, improving billing would be dependent upon getting the data in a new format, so we decided the app extraction should happen first. Similarly, we could easily automate the configuration without making the app extraction harder, so we moved that to an early stage. As people raised issues to be resolved and work to be done, we decided as a team which phase fit best for each item. Phase I moved away from manual configuration. Phase II was refactoring to create a seam to extract the new app. Phase III involved the extraction, and Phase IV added features to the billing process.

Phase I

The goal of the first phase was to remove the hardcoded configuration that required code releases to add a customer to the photo processing feature. Additional whiteboard planning and design sessions allowed us to drill down into the details of the first phase. These are the steps we came up with:

1. Backfill tests

Because there were no automated integration tests for this behavior, we began by writing characterization tests to provide cover for our upcoming refactor. We wrote these at a low enough level to focus on just the changes we were about to make, but high enough to allow flexibility in how the code was written beneath the test umbrella.
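As a rough sketch of the idea (the model and method names here are hypothetical, not Barracuda’s actual code), a characterization test simply asserts whatever the system does today:

```ruby
# spec/photo_processing_characterization_spec.rb
# A characterization test pins down current behavior, right or wrong,
# so a refactor can't silently change it. All names are illustrative.
require "rails_helper"

RSpec.describe "photo processing configuration (characterization)" do
  it "enables processing only for the hardcoded pilot customer" do
    pilot = Customer.create!(name: "Pilot Retailer")
    other = Customer.create!(name: "Another Retailer")

    expect(PhotoProcessing.enabled_for?(pilot)).to be(true)
    expect(PhotoProcessing.enabled_for?(other)).to be(false)
  end
end
```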

2. Move hardcoded configuration

Keeping in mind our long-term vision of moving the configuration into Guppy, we created completely new tables with only minimal references to existing foreign keys. CUSTOMER_ID was used as the foreign key tying our new tables to existing customer data.
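A minimal sketch of what such a migration might look like (the table and column names are invented for illustration):

```ruby
# db/migrate/xxx_create_photo_processing_configs.rb
# New home for the formerly hardcoded configuration. customer_id is
# the only tie back to existing data.
class CreatePhotoProcessingConfigs < ActiveRecord::Migration[5.2]
  def change
    create_table :photo_processing_configs do |t|
      t.references :customer, null: false, index: true
      t.string  :atlantis_model_name, null: false
      t.boolean :enabled, null: false, default: false
      t.timestamps
    end
  end
end
```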

3. Populate tables

With our tests protecting us, we moved the hardcoded values into the database. The newly created tests ensured we didn’t break our existing customer’s experience.

4. Automate configuration

We planned to give our internal users a UI to edit the customer configuration data, but that logic needed to live in the new app, Guppy, which didn’t exist yet. For now, we created a rake task that allowed developers to make those changes without a code deployment.

The new code in the rake task was written with TDD, which helped drive our design and improved our test coverage.
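A sketch of what such a rake task could look like (the task, model, and argument names are hypothetical):

```ruby
# lib/tasks/photo_config.rake
namespace :photo_config do
  desc "Enable photo processing for a customer"
  task :enable, [:customer_id, :model_name] => :environment do |_t, args|
    config = PhotoProcessingConfig.find_or_initialize_by(
      customer_id: args[:customer_id]
    )
    config.update!(atlantis_model_name: args[:model_name], enabled: true)
    puts "Photo processing enabled for customer #{args[:customer_id]}"
  end
end
```

A developer could then run something like `bin/rake "photo_config:enable[42,shelf-model-v2]"` against any environment, with no deployment required.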

Phase II

The focus of the next phase was to separate concerns within the monolith in preparation for creating the new app. Extracting the logic into Guppy was going to be a high-risk change that required coordination with Atlantis, so we began with more test coverage. Pair programming, frequent rotation between pairs, and mini-design sessions were key elements that allowed us to respond to unexpected challenges along the way. During one retro, we decided to use “clicking start on a user story” as a trigger to pull the dev team together for a mini-design session. This was extremely helpful in keeping us all on the same page and avoiding getting bogged down in implementation details during story grooming.

1. Breaking external dependencies

Up to this point, the only way to verify functionality with our external vendor was to “test in prod” in close coordination with Atlantis. This was another obstacle to scaling — if we couldn’t change our architecture without involving the vendor, we would be forever dependent upon their availability. Breaking this external dependency in test was critical. We added test coverage for the photo processing logic by stubbing the calls to and from Atlantis. We covered all happy paths and edge cases with additional characterization tests. We uncovered several bugs in the process but prioritized those bug fixes separately to keep us focused on our long term vision.
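To illustrate the technique (not our exact code; the Atlantis URL, payload shape, and client class are all invented), a library like WebMock lets a test intercept the vendor call at the HTTP boundary:

```ruby
# spec/atlantis_client_spec.rb
require "rails_helper"
require "webmock/rspec"
require "net/http"
require "json"

# A minimal stand-in for the code that calls the vendor.
class AtlantisClient
  def process(photo_url:)
    uri = URI("https://api.atlantis.example/v1/photos")
    response = Net::HTTP.post(uri, { photo_url: photo_url }.to_json,
                              "Content-Type" => "application/json")
    JSON.parse(response.body)
  end
end

RSpec.describe AtlantisClient do
  it "returns the metadata Atlantis attaches to a photo" do
    # WebMock intercepts the outbound request; no vendor involvement.
    stub_request(:post, "https://api.atlantis.example/v1/photos")
      .to_return(body: { "tags" => ["shelf", "out-of-stock"] }.to_json,
                 headers: { "Content-Type" => "application/json" })

    result = described_class.new.process(photo_url: "https://cdn.example/1.jpg")

    expect(result["tags"]).to include("shelf")
  end
end
```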

2. Extract service class

We identified a seam in Barracuda where we could separate photo processing from its other responsibilities and pulled that logic into a class in a new service layer. We called it a service class to signal that it belonged to a separate layer. At this point, it was all still part of Barracuda’s monolithic architecture.

Pulling apart the logic in a single controller was messy. We had to separate the parts of the procedure that involved photo processing from the other steps. We began by identifying the if clause that checked whether a customer was enabled for photo processing and followed that logical seam.

As we worked, an interface between Barracuda and future-Guppy started to emerge. In addition to CUSTOMER_ID, we could see that some minimal details of the photos themselves would be required to coordinate between the two apps. We kept the service’s parameter list to that minimal interface. It would serve as our RESTful API later on.
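In spirit, the extracted service looked something like the sketch below (the names and exact parameter shape are illustrative):

```ruby
# app/services/photo_processing_service.rb
# The parameter list is deliberately minimal: a customer id plus just
# enough photo detail. This same shape later became the RESTful API.
class PhotoProcessingService
  def initialize(customer_id:, photos:)
    @customer_id = customer_id
    @photos = photos # e.g. [{ id: 1, url: "https://..." }]
  end

  def call
    return unless PhotoProcessingConfig.enabled_for?(@customer_id)

    @photos.each do |photo|
      AtlantisClient.new.process(photo_url: photo[:url])
    end
  end
end
```

The controller’s former inline if clause collapsed to a single call: `PhotoProcessingService.new(customer_id: customer.id, photos: photos).call`.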

As part of extracting the service class, we added unit tests to cover its existing behavior. Although the service we extracted was rather big and ugly, we knew that was good enough for now and we would have plenty of opportunity to improve it further.

Phase III

With the logic cleanly separated in Barracuda, it was time to create Guppy! This phase saw a lot of emergent design as we worked with the new data model. We had a “DO NOT ERASE!” whiteboard with our data model and lots of arrows to show how the legacy data would be mapped to the new data. We referred to this diagram daily and made updates as needed. It was a highly collaborative and fun process.

1. Guppy is born

We created a new Rails app for Guppy. We chose to stay with Rails because the code we wanted to move was already RoR, which would speed up the extraction. We were focused on getting Guppy separated from Barracuda as quickly as possible without getting sidetracked by additional architectural improvements (yet). We chose HTTP Basic Authentication for inter-app communication between Barracuda and Guppy, as it was simple to implement and we did not yet have more stringent security requirements driving us.
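In Rails, this can be as small as one declaration in the receiving app’s base controller; a sketch, with hypothetical environment variable names:

```ruby
# app/controllers/application_controller.rb (in Guppy)
# Every request from Barracuda must carry these credentials. Keeping
# them in the environment lets us rotate them without a deploy.
class ApplicationController < ActionController::Base
  http_basic_authenticate_with(
    name:     ENV.fetch("GUPPY_API_USER"),
    password: ENV.fetch("GUPPY_API_PASSWORD")
  )
end
```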

2. Separating responsibilities

At this point we could see that our configuration tables needed to be split further to support Barracuda’s and Guppy’s separate responsibilities. We decided that Barracuda would be responsible for knowing which customers had photo processing enabled (the what) and Guppy would be responsible for knowing how to do that processing. Barracuda maintained the list of customers enabled, and we created new tables in Guppy for the details of how to integrate photo processing with Atlantis.

The service class we had extracted moved to Guppy and Barracuda’s controller call to the service changed to a RESTful call to Guppy.
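On Barracuda’s side, the call might look like the following sketch using Ruby’s standard library (the endpoint path and payload shape are invented for illustration):

```ruby
# app/clients/guppy_client.rb (in Barracuda)
require "net/http"
require "json"

class GuppyClient
  def process_photos(customer_id:, photos:)
    uri = URI("#{ENV.fetch('GUPPY_URL')}/api/photo_batches")
    request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
    # Matches the basic auth credentials Guppy enforces.
    request.basic_auth(ENV.fetch("GUPPY_API_USER"),
                       ENV.fetch("GUPPY_API_PASSWORD"))
    request.body = { customer_id: customer_id, photos: photos }.to_json

    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request(request)
    end
  end
end
```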

Part of Guppy’s responsibilities would be to receive the callback from Atlantis, but that required Guppy to be fully deployed. So we wrote code for Barracuda to forward those requests to Guppy and put that logic behind a feature flag.
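Conceptually, the forwarding looked like this sketch (the flag mechanism and helper names are stand-ins for whatever is in use):

```ruby
# Barracuda's callback endpoint during the transition.
class AtlantisCallbacksController < ApplicationController
  def create
    if FeatureFlag.enabled?(:forward_atlantis_callbacks)
      # New path: hand the raw callback to Guppy untouched.
      GuppyClient.new.forward_callback(request.raw_post)
    else
      # Legacy path: process the metadata inside the monolith.
      PhotoMetadataIngest.call(params)
    end
    head :ok
  end
end
```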

3. Release management

The above changes could not be made wholesale without introducing downtime for the photo processing feature. Instead, we coordinated deployment between the two apps with the following strategy:

  1. Add new feature to Guppy alongside existing behavior
  2. Start using new feature in Barracuda
  3. Remove old behavior from Guppy

For example, to move a new responsibility from Barracuda to Guppy, we would:

  1. Modify Guppy to accept an optional parameter sent to it from Barracuda. If Guppy saw the parameter, it used it, but otherwise did not rely on it. Release Guppy with this optional capability.
  2. Modify Barracuda to start sending the parameter to Guppy. Since Guppy was already released, it could count on the parameter being handled. Release Barracuda with this change in functionality.
  3. Modify Guppy to require the new parameter and remove the optional branch in the code. Release Guppy with this updated contract now complete.

This involved many small releases and steps, but it allowed us to achieve zero downtime and work with small feedback loops to ensure everything kept working. Note that the releases were order-dependent, but the timing was otherwise flexible: they didn’t have to happen at the same time, or even on the same day. We could also have used feature flags, but they provided no additional benefit given our deployment process.
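To make step 1 concrete, here is a sketch of how Guppy might tolerate a new parameter before requiring it (the field and parameter names are hypothetical):

```ruby
# Guppy, during the "expand" stage of the sequence above.
def create
  photo = Photo.new(photo_params)
  # Fall back to a default until every Barracuda release sends
  # :capture_source (step 2); then this fallback is deleted (step 3).
  photo.capture_source = params.fetch(:capture_source, "unknown")
  photo.save!
  head :created
end
```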

4. Atlantis callback

Guppy was ready for prime time! We changed our internal routing of the callback URL to point to Guppy’s API, and the third-party vendor was unaware of any difference on their end.

5. Syncing data

Separating the photo processing code proved to be far easier than separating the data. Poor dependency management around Barracuda’s database meant that there were many dependencies on the data outside Barracuda’s control, and we had to maintain that data contract. But we also wanted some of that data to move to Guppy, so we introduced data synchronization between the two apps: Guppy would receive the metadata from Atlantis’ callback, store it internally as it saw fit, and then call Barracuda with the data in the legacy format Barracuda expected.
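A sketch of that flow from Guppy’s side (the model names and legacy payload shape are invented for illustration):

```ruby
# Guppy: receive the Atlantis callback, store natively, then sync the
# legacy shape back to Barracuda for its remaining dependents.
class AtlantisCallbacksController < ApplicationController
  def create
    metadata = PhotoMetadata.create!(
      photo_id: params.require(:photo_id),
      tags:     params.fetch(:tags, [])
    )

    BarracudaClient.new.sync_legacy_metadata(
      photo_id: metadata.photo_id,
      tag_csv:  metadata.tags.join(",") # the format Barracuda expects
    )
    head :ok
  end
end
```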

Phase IV

At this point, Barracuda and Guppy had clearly defined responsibilities with the logic and data separated and we were in a place to grow the feature.

1. Billing reports

Using the old queries as our baseline for comparison, we created a new billing endpoint in Guppy. We used manual comparisons to ensure that we captured the correct logic in the new code and then wrote automated tests to lock down that logic.

2. Data migration

In order for Guppy to accurately create billing reports, it needed old data that lived in Barracuda’s database. We created backfill tasks to migrate the data to Guppy. The amount of data was too large to migrate in a single batch operation so we sliced it by date range. Beginning with the most recent data, we moved backwards in time incrementally until it was all moved over.
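A sketch of the slicing approach (LegacyPhotoRecord standing in for a read-only model over Barracuda’s data; all names are invented):

```ruby
# lib/tasks/backfill.rake (in Guppy)
namespace :backfill do
  desc "Migrate one date-range slice of legacy photo data"
  task :photos, [:from, :to] => :environment do |_t, args|
    range = Date.parse(args[:from])..Date.parse(args[:to])

    LegacyPhotoRecord.where(processed_on: range)
                     .find_each(batch_size: 500) do |row|
      # Transform into the billing-optimized shape as we copy.
      BillingRecord.create!(
        customer_id:  row.customer_id,
        processed_on: row.processed_on,
        billable:     !row.misconfiguration_error?
      )
    end
  end
end
```

Each slice was a separate run, e.g. `bin/rake "backfill:photos[2019-06-01,2019-06-30]"`, starting with the newest range and working backwards.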

Now that Guppy was unhindered by Barracuda’s concerns, we were free to create data structures that made sense for creating billing reports. As the data was being migrated, it was also transformed into tables optimized for billing purposes.

As each date range was migrated, we were able to validate old billing reports from Barracuda against Guppy’s new report to ensure correctness. It was a cool way to validate our theories in real time instead of relying on exhaustive analysis.

3. Billing Enhancements

Whew, we were close! With Guppy handling the photo processing, data, and billing responsibilities, we were in a place to make the requested enhancements.

To finish off our highest priorities, we enhanced the billing API to allow date ranges (in addition to just end-of-month), scoping by customer, and more analytics.
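As an illustration of the final shape (the route, parameters, and fields are hypothetical), the endpoint accepted a date range and an optional customer scope:

```ruby
# app/controllers/billing_reports_controller.rb (in Guppy)
class BillingReportsController < ApplicationController
  def index
    from = Date.parse(params.fetch(:from))
    to   = Date.parse(params.fetch(:to))

    records = BillingRecord.where(processed_on: from..to)
    records = records.where(customer_id: params[:customer_id]) if params[:customer_id]

    render json: {
      photo_count:    records.count,
      billable_count: records.where(billable: true).count
    }
  end
end
```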

New Architecture

Our final architecture after extracting Guppy looked something like this:

Barracuda and Guppy Architecture

Future

We had accomplished our high-level goals of eliminating manual, error-prone processes, which set us up to turn the feature on for additional customers.

In the future, Guppy would see a new team created which grew the little app into a suite of tools and customer enhancements that enabled the company to grow a whole new line of business.

Lessons Learned

We learned a lot about how to work with legacy code. One of the secrets of our success was a continued focus on high-level goals. While adding tests to existing functionality and migrating data, we encountered a ton of areas for improvement. But we knew that if we fixed them all, we would delay getting to the place where Guppy could be extracted and improved. We kept a clear focus on either refactoring or adding features, never commingling the two.

We were surprised at how hard it was to separate the data. As mentioned earlier, allowing others to be dependent upon Barracuda’s data structure caused a lot of pain. Even though these dependencies were internal company processes, they were almost impossible to remove or change. (Obviously, avoid this situation by creating a layer of abstraction as soon as possible!) Direct database reads made our lives very difficult! We accomplished the separation with extra processes to copy data and a commitment to which app owned which part of the data.

Legacy data is difficult to work with and we encountered a lot of surprises when we moved it. Slicing the data by date range allowed us to make course corrections as we discovered data anomalies instead of having to predict and fix all of them at once. Iteration for the win — in code and data!

Another win for us was to keep a long term vision without nailing down all the details too soon. Every step was a mini-design and discovery process which worked well. We knew what we wanted to accomplish in each phase, but not exactly how to do it until we got there.

The whole process took about four months to complete. I credit much of the success of this project to the highly collaborative nature of the team and its willingness to adapt to change. We were fortunate to be working with a product organization that accepted the unknown timeframe and was willing to work with us in a highly transparent way.


Kim Hendrick is a software engineer with extensive practical experience in software development.