Rewriting the Monolith — Breaking Out APIs and Workers

Sharon Grossman · Published in Fabric Engineering · Jul 12, 2022

At Fabric, we’re all about the real world. We’re creating the future of retail through robot-enabled micro-fulfillment, with a platform focused on speeding up delivery times and reducing last-mile costs by bringing products closer to consumers. Recently, we were faced with a fairly typical challenge for a startup: We needed to scale up a system already in production to handle greater load.

In most companies, a bug is something you can solve quickly and then move on, but in our platform a bug in production can mean two robots crashing into each other, impacting service levels, customer satisfaction, and ultimately a retailer's bottom line. We can't let that happen, so we need all of our parts moving in perfect harmony.

Simply put, we overhauled our entire underlying platform architecture, moving from a monolith to a suite of microservices — all without impacting production.

Let’s take a look at why we did it — and how!

Typical Startup Growing Pains 😣

In our urgency to get features out the door and ensure our order fulfillment was bulletproof, we ended up with a single monolithic application driving a lot of our critical functionality. This monolith was riddled with tech debt, making it very challenging to identify and resolve bugs. Furthermore, as we grew the product, we discovered a few misconceptions we had about the underlying models, as well as a few anti-patterns we’d inadvertently introduced.

Ultimately, a few key points prevented us and our code from scaling accordingly:

  • Adding new features often ended in frustration, with cascading failures throughout the highly coupled code base.
  • Onboarding and training new recruits took quite some time, and even then it was hard to cover all of the monolith's different parts.
  • As the product evolved, much of the code lost its original context, and most of our time went to meeting new business demands instead of refactoring.

We needed to do things differently — and quickly.

Back to the basics 👇

To help us scale up and meet our customers' growing needs, we opted to hop on the microservice bandwagon. Sure, everyone's doing it and microservices are the current fashion, but a microservice architecture offers very real benefits.

To start tearing the monolith apart, we returned to the base principles in our system. This generally consisted of finding the “nouns” and “verbs” that drive our system, at a conceptual level. Imagine, for example, I have a pallet of Coca-Cola in warehouse bin 34. All of my warehouse bins are the same size and shape, and what they contain is important to the buyer and seller, but this is absolutely of no consequence to the robot responsible for lifting the item in bin 34 off the shelf and taking it to shipping station 7.

This allowed us to identify the core objects in our application and separate out the concerns. Because of our unique problem space (robotics and software), we didn’t have the luxury of re-using someone else’s code — whatever we were going to do, we would have to build it ourselves.

APIs and Workers to the rescue 🙌

We settled on a pattern that focuses on the divide between APIs and Workers, which lets us classify every service appropriately.

An API is a "dumb" interface, existing entirely to provide CRUD access to the underlying database. The API doesn't handle any "business" logic; instead, it stores, defines, and maintains the application's "models."
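
To make the idea concrete, here's a minimal sketch of such a "dumb" API in TypeScript with Express. The resource and field names are hypothetical, for illustration only, and the in-memory map stands in for the real database:

```typescript
import express from "express";

// Hypothetical in-memory stand-in for the underlying database.
const arms = new Map<string, { id: string; mode: string }>();

const app = express();
app.use(express.json());

// Pure CRUD: read a model by id.
app.get("/arms/:id", (req, res) => {
  const arm = arms.get(req.params.id);
  arm ? res.json(arm) : res.sendStatus(404);
});

// Pure CRUD: create or update a model. No business logic here;
// any reaction to the change happens elsewhere, in a Worker.
app.put("/arms/:id", (req, res) => {
  const arm = { id: req.params.id, mode: req.body.mode };
  arms.set(arm.id, arm);
  res.json(arm);
});

app.listen(3000);
```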

We were then able to group these APIs by business domain using Fabric, so our “Picking” models could be fundamentally distinct from our “Quality Check” models, and so forth.

To perform the actual work of the application, and further decouple these APIs from one another, we then created a suite of very small, very simple worker services. Each worker would be responsible for one portion of the model’s transformation as an operation was conducted, usually by consuming an event and reacting to it.
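
As a sketch of what one of these Workers might look like, here's a minimal event consumer in TypeScript using kafkajs. The topic, consumer group, and event fields are hypothetical assumptions, not our production configuration:

```typescript
import { Kafka } from "kafkajs";

// Illustrative broker, topic, and group names.
const kafka = new Kafka({ brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "example-worker" });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: "arm-mode-changed", fromBeginning: false });

  // Each Worker owns exactly one reaction: consume an event, apply
  // one transformation, then hand off via an API call or a new event.
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value!.toString());
      if (event.mode === "picking") {
        // e.g. call an API here to create downstream records
        console.log(`arm ${event.id} entered picking mode`);
      }
    },
  });
}

run().catch(console.error);
```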

With APIs and Workers separated and clearly defined by domain, we could split our team up so groups of developers took ownership of individual domains. The whole team moved faster as a result: change requests were distributed more effectively, and feature delivery became more efficient.

Let’s look at an example

With the deconstruction of the monolith, it became necessary to truly understand how the data objects and services interacted with our robotic hardware.

Here’s an example flow of how robots are requested to the different stations to pick items for incoming orders (a sketch of the final translation step follows the list):

  1. A request is sent to the robotic-arm-api service to move a Robotic Arm into Picking mode.
  2. robotic-arm-api persists this information and publishes a message to Kafka.
  3. picking-allocator, a Worker, consumes the Kafka message and calculates the orders that need to be processed in the micro-fulfillment center.
  4. picking-allocator then creates Allocations by calling allocation-api, using the provided endpoints.
  5. allocation-api persists this information and publishes a message to Kafka.
  6. The request-robots Worker consumes the “new allocations” Kafka message and translates it into a robot-request Kafka message, which is then sent to the robotic system.
  7. The robotic system now knows which robots should arrive at each station so items can be picked for the incoming orders.
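
To make step 6 concrete, here's a minimal TypeScript sketch of the kind of translation request-robots performs. The message shapes are illustrative assumptions, not our actual schemas:

```typescript
// Hypothetical shape of the "new allocations" event.
interface AllocationCreated {
  allocationId: string;
  stationId: number;
  binIds: string[];
}

// Hypothetical shape of the message the robotic system consumes.
interface RobotRequest {
  stationId: number;
  binId: string;
}

// Step 6: fan one allocation event out into the per-bin
// robot requests the robotic system understands.
function toRobotRequests(event: AllocationCreated): RobotRequest[] {
  return event.binIds.map((binId) => ({
    stationId: event.stationId,
    binId,
  }));
}
```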

Going Forward ⏩

We’ve instituted strong templates for new microservices, as well as a culture of continuous refactoring of our existing services to match new standards and technologies.

Some of the key cultural principles now include:

  • Development Velocity — Move fast, break fast
  • Collaboration across teams and domains, with clear boundaries in shared code and services.
  • Issue tracking is clearer now that each domain has an owner, and pinpointing bugs is a lot easier.
  • Standardizing on, and introducing, new technologies is far more flexible than it was inside the monolith.

We also collaborate heavily with our SRE team, and together we introduced new tools that enable our teams to apply these principles:

  • Tracing with Istio and Jaeger, which lets us quickly correlate adverse events across disparate services, creating a visual flow of information that helps us tackle bugs.
  • Dafka, which handles Kafka message consumption and production via HTTP requests, cleanly decoupling the messaging layer from our services.
  • Integration with Prometheus and Grafana, plus general HTTP-oriented tooling, to enhance error reporting, logging, monitoring, and system resilience (a minimal sketch follows below).
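
For instance, a service might expose metrics for Prometheus to scrape with something like the following TypeScript sketch using the prom-client library. The metric name and route are illustrative, not our actual instrumentation:

```typescript
import express from "express";
import { collectDefaultMetrics, Counter, register } from "prom-client";

// Process-level metrics (CPU, memory, event loop) out of the box.
collectDefaultMetrics();

// Hypothetical business counter; real metric names are team-specific.
const allocationsCreated = new Counter({
  name: "allocations_created_total",
  help: "Number of Allocations created by this service",
});

const app = express();

app.post("/allocations", (_req, res) => {
  allocationsCreated.inc();
  res.sendStatus(201);
});

// Prometheus scrapes this endpoint; Grafana charts the results.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.send(await register.metrics());
});

app.listen(3000);
```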

This process has enabled us to orchestrate many microservices and expand our business domains while meeting demand at high velocity. More importantly, it has shifted our core development principles, keeping us at the edge of technology and embedding a methodology that will take us higher, as a team.
