Lessons Learned Integrating Teams and Systems When Building Products with Microservices

Bo Motlagh
Published in United Effects™
Nov 3, 2021 · 12 min read

A few years ago, my team at the time and I needed to come up with a way to enable multiple distributed teams to build microservices that could be orchestrated to work together into a seamless experience. The trouble was that our distributed teams were tripping each other up because of dependency management between them.

  • Team A was responsible for a microservice for Domain A, while…
  • Team B was responsible for a microservice for Domain B, but…
  • Team C, which was responsible for Domain C, had transactional dependencies on both Domain A and Domain B.

There were several issues, both process-oriented and technological, that conspired to create the delays and difficulties we faced. They included how we mapped domains (too small) and the fact that our UI was a single monolithic UI interfacing with multiple backend components and API definitions. I won’t cover domain decomposition for domain-driven design in this article, but we will address it in the future. Regarding the UI, United Effects is implementing an approach using micro-frontends that mitigates the issue, but that too is a topic for another time. In this article, I’d like to focus on some of the high-level lessons learned about how development organizations and their teams perceive and address work in a distributed environment, and how those lessons continue to guide my approach today.

At the time, we piloted several processes and technologies to address these challenges. For example, we used SAFe Agile in an attempt to manage dependencies. We also tried PACT.io, a technical solution meant to ensure API integrations did not break. Each of these approaches had benefits and costs, but they often seemed to slow progress dramatically. In the end, what I learned was that we often employed these external crutches to compensate for a simple missing ingredient: software development maturity and quality. We were using these tools and processes to tell us we did something wrong so we could address it after the fact, an approach that inherently slows progress, when instead we should have had conversations up front and designed our code and processes to avoid these situations in the first place. That is the definition of mature quality control: ensuring that your work is free of issues rather than catching issues after the fact through testing alone.

With those experiences in mind, here are my lessons learned and my approach to building distributed software and APIs at United Effects.

Business vs Technical Concept of a Product

I’ve come to realize that a perceptual change is required when designing and building “products” in a microservice (distributed systems) environment. Typically, we are accustomed to the idea that a product is one concept from both a technology perspective and a business perspective. This 1:1 business-to-technology perspective is often a convenient way to describe all of the resource alignments, costs, and strategies that go into the product. We assume that because it is all in service to this one concept, we can and should finance the effort through a single silo of funding, which in turn leads to a single team of people who coordinate the stream of work required to take the product from concept to reality and then continue its support.

You may have heard of Conway’s Law:

Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.

This is an adage I’ve come to realize is almost certainly true; however, as executives and technology leaders attempt to glean organizational-design insights from its implications, I’ve noticed that they often ignore the impact a siloed funding pattern can have on the resulting organizational structure. The result is that our best notions of team allocation, separation of concerns, agile processes, and technology innovation can all be for nothing if the funding pattern ultimately forces us to treat our distributed efforts as a single working team, which leads to serialized work streams, then to dependencies, and finally to monolithic products.

The perception of what a “product” is impacts our notion of development processes as well. Developers are often trained to think about Agile implementation as taking a small vertical slice of the product, completing it, and then iterating forward from there. Again, this makes sense, because in a monolithic 1:1 tech-to-business model, a single team (however large) is held responsible for all of the components of the product.

The problem is that the business definition and the technology definition of a product are not 1:1 in a microservice environment. When dealing with decoupled services, each with their own teams and CICD pipelines, you are also dealing with concurrent work streams, all attempting to reach development milestones at the same time. Additionally, the microservices being developed may represent more than one business product. This reality means we must shift our understanding of a product from a technical perspective. The business-to-technology alignment is now 1:n, where n is the number of concurrently developed backend services required for the business concept being defined.

3 teams take on 5 bodies of work for 3 products where only 3 bodies of work are funded

In the above picture you can see that while Team A is fine in its funding silo with its monolithic product and technology, Team B and Team C are going to run into trouble. They will trip over each other trying to work out who works on Microservices 2 and 4. They are also taking on additional work that was not funded by their initial value stream.

Your Component is a Product

If you are a team assigned to build one microservice that will be used in one or more products, you must first trust that the partner teams in charge of those integrations will be responsible for correctly implementing the interface you provide. Next, you must relieve yourself of the notion that a vertical slice of the product in question requires you to have access to the external integration point.

Draw a box around your component and think of this as your product. If you must make logical separations of the work, do so within the context of your piece of the puzzle only (i.e. the box). You should also take a moment and try to understand who will actually be the end user of your component. Is it the customer buying the final product? Probably not. Your end user is the team that must integrate with your API. To that end, you should try to productionize your implementation as much as possible so as to ensure end users are successful. All of the rules of product design still hold true… you should work to delight your user, whether they are internal or external.

Iteration Is Different with Distributed Systems

When dealing with monoliths, there are only so many architectural patterns possible between the UI and database — or rather, you probably have a clear understanding that whichever pattern you choose will continue to be implemented for all functionality. In those situations, it may be ok to work your way across the functional spectrum of the product without fully understanding how the final elements will actually come together. In a distributed system, this can be a dangerous assumption.

It is true that you may not need to know the low-level implementation of every feature of your component, but the potential integrations and interface points are not nearly as concrete and well established as they are in a monolith. You have no idea how an integrating team will break your interface (and someone will break it).

It is my experience that to give yourself the best possible chance of success, you should create a wide but shallow, horizontally complete implementation of your interface immediately. Define the API, define temporary controllers, define the database schemas, and wire this all together into a skeleton of the overall component. From there you can back up and iterate one controller or class at a time to complete the functional implementation.
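As a purely illustrative sketch (the route names and the Node/Express framework choice are my assumptions, not prescriptions), a shallow-but-wide skeleton might define every route up front and stub each controller until its iteration comes around:

```javascript
// skeleton.js - a shallow but horizontally complete skeleton (illustrative names)
const express = require('express');

const app = express();
app.use(express.json());

// Temporary controllers: every operation in the API exists from day one,
// but responds with 501 until its iteration is implemented.
const notImplemented = (operation) => (req, res) => {
  res.status(501).json({ error: `${operation} is defined but not yet implemented` });
};

// The full interface is wired immediately so integrating teams can see its shape.
app.get('/api/widgets', notImplemented('listWidgets'));
app.post('/api/widgets', notImplemented('createWidget'));
app.get('/api/widgets/:id', notImplemented('getWidget'));
app.patch('/api/widgets/:id', notImplemented('updateWidget'));
app.delete('/api/widgets/:id', notImplemented('deleteWidget'));

app.listen(3000, () => console.log('skeleton service listening on 3000'));
```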

Define and Version your API Manually

I suspect this will be a more controversial point. I am of the opinion that an API should be defined and maintained manually, especially when that API is REST and communicated through JSON Schema via OpenAPI. Allowing auto-generation of the API definition is problematic for a few reasons, but three of my standouts are:

  1. We immediately break the previously stated lesson learned of creating an initial shallow but wide implementation of the interface.
  2. We are at the mercy of the library generating the API definition and its supported versions of the API specification language.
  3. Independently versioning an automated specification can be tricky.

Manually managing your API specification has many advantages:

  • Immediately defining the API before coding allows partner teams to get a sense of what they will need to build for client integrations.
  • API specifications such as OpenAPI can also be used to generate code for client integrations, test services and more, long before full functionality is completed.
  • API specifications can act as requirements for the code implementation, ensuring that teams discuss the intended result and do not accidentally code something unintended that would be a breaking change.
  • A single versioned API specification can be the source of truth for your component’s interface. Multiple concurrently operating teams, along with the implementing team, can all rely on this source of truth to build matching validation tests independently without the need for excess coordination.
  • Many API definition specifications like OpenAPI also allow you to include detailed descriptions, instructions, and other metadata which can then be used to generate easy-to-navigate documentation, further productionizing your component for your end users.

With all of this said, perhaps the best reason to do this is simply to force every member of the team to understand what the API does or does not do so that they can better support the component.
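To make this concrete, here is a minimal, hypothetical sketch of a hand-maintained OpenAPI definition kept in the repository as a JavaScript module; it could just as easily live as YAML or JSON, and the resource names and schemas are invented for illustration:

```javascript
// openapi.js - a hand-maintained OpenAPI definition (hypothetical resource names)
module.exports = {
  openapi: '3.0.3',
  info: {
    title: 'Widget Service',
    version: '1.2.3' // versioned deliberately, alongside the code
  },
  paths: {
    '/api/widgets': {
      get: {
        operationId: 'listWidgets',
        responses: {
          '200': {
            description: 'All widgets',
            content: {
              'application/json': {
                schema: {
                  type: 'array',
                  items: { $ref: '#/components/schemas/widget' }
                }
              }
            }
          }
        }
      }
    }
  },
  components: {
    schemas: {
      widget: {
        type: 'object',
        required: ['name'],
        additionalProperties: false, // unexpected properties are rejected up front
        properties: {
          id: { type: 'string' },
          name: { type: 'string' }
        }
      }
    }
  }
};
```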

Version the API and Code Together

The simple truth of managing distributed systems is that you have to be able to tell two running instances of the same service apart in a predictable way. The simplest way to do this is through versioning every component and every API.

I advocate versioning the API of a service along with the code as a single unit within a source control repository like GitHub. My preference is a semantic version syntax of “major.minor.patch” (e.g. 1.2.3). I think it is important to get into the habit of incrementing the version even when the change is very minor or a simple patch. This helps ensure a level of maturity and specificity regarding exactly what is being deployed at any moment, which will come in handy if and when your organization pursues any sort of SOC certifications in the future. Every commit to the main branch of your code should result in a version change that is documented, even if it is never deployed. Transparency is the name of the game.
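One lightweight way to keep the code version and the API version moving together is a build-time check. The sketch below assumes a project layout like the earlier examples (a package.json plus a hand-maintained openapi.js); it is an illustration, not a prescribed tool:

```javascript
// check-version.js - fail the build if the code and API versions drift apart
// (assumes a package.json and the hand-maintained openapi.js sketched earlier)
const assert = require('assert');
const pkg = require('./package.json');
const api = require('./openapi.js');

assert.strictEqual(
  api.info.version,
  pkg.version,
  `OpenAPI version ${api.info.version} does not match package version ${pkg.version}`
);

console.log(`code and API are both at version ${pkg.version}`);
```

Run as an early step in the CI pipeline, a forgotten version bump becomes a failed build rather than a surprise for an integrating team.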

If You Have to Integrate, Pick Stable Versions

I have seen teams attempt to work together by creating live dependencies on each other’s development branches in GitHub. The approach assumes neither team will ever have bugs or issues, so the fastest possible implementation is one that uses the newest possible code. This is a flawed idea. There will be bugs, and the only thing this approach will accomplish is that the teams take turns delaying each other through unexpected integration issues.

Remember that each component should be considered a product unto itself. You would (hopefully) never intentionally release untested and potentially buggy code to a customer, and so you should not do so to your potential end users (other teams) here.

If you must have integration points, designate a stable branch with a known version and release it for consumption as you would any other product. Alternatively, if you are using containers, define a stable version tag to indicate the latest safe image to pull (you should always have images tagged with your versions anyway, latest or otherwise). If you are integrating with another team, ask for this branch or image and only integrate with the version of the code and API represented there. If you need new functionality, ask for the changes as you would from any service provider and wait until they are released.
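As a hypothetical illustration of integrating only against a pinned, released version, a consuming service might record the partner version it was tested against and fail fast at startup if the deployed partner does not match. The /api/version endpoint and all names below are assumptions, not something every partner team will expose:

```javascript
// partner-client.js - integrate only against a pinned, released version of a partner API
// (the /api/version endpoint and names here are illustrative assumptions)
// requires Node 18+ for the global fetch
const assert = require('assert');

const PARTNER_BASE_URL = process.env.PARTNER_BASE_URL || 'https://widgets.example.com';
const EXPECTED_PARTNER_VERSION = '1.2.3'; // the released version we tested against

async function verifyPartnerVersion() {
  const response = await fetch(`${PARTNER_BASE_URL}/api/version`);
  assert.ok(response.ok, `partner version check failed with status ${response.status}`);
  const { version } = await response.json();
  assert.strictEqual(
    version,
    EXPECTED_PARTNER_VERSION,
    `expected partner version ${EXPECTED_PARTNER_VERSION} but found ${version}`
  );
}

module.exports = { verifyPartnerVersion };
```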

Have a Process to Deprecate APIs or Functionality

Define and communicate how deprecated interfaces or functions will be handled. Define a window of time in which you communicate the change, and then follow through with the change. The first few times, you may break someone who deprioritized handling the deprecation warning; however, this kind of hiccup is good for the larger organization as it promotes growth and maturity. If you don’t follow through, you will create an expectation that nothing will ever get deprecated, and over time this can become a headache. Moreover, if you do eventually pull the trigger, the resulting changes could create more risk than anticipated because of the technology debt that has piled up.
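One lightweight way to make that window visible to integrators is to advertise it on the deprecated routes themselves. The sketch below is one possible approach using Express middleware and the Sunset header from RFC 8594; the route and dates are invented for illustration:

```javascript
// deprecation.js - advertise a deprecation window on routes that are going away
const express = require('express');

// Attach deprecation metadata to responses so integrating teams can see the clock ticking.
const deprecate = (sunsetDate, docsUrl) => (req, res, next) => {
  res.set('Deprecation', 'true');   // draft header commonly used for this purpose
  res.set('Sunset', sunsetDate);    // RFC 8594: the date the endpoint goes away
  res.set('Link', `<${docsUrl}>; rel="deprecation"`);
  next();
};

const app = express();

// Hypothetical legacy route with a communicated removal date.
app.get(
  '/api/v1/widgets',
  deprecate('Sat, 01 Jan 2022 00:00:00 GMT', 'https://example.com/docs/deprecations'),
  (req, res) => res.json([])
);

app.listen(3000);
```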

Design Your Code for Quality Validation and Testing

A team should be writing unit tests, mocked integration tests, and some post-deployment smoke tests. These tests should all be part of the component CICD process. I am of the opinion that nothing about these tasks requires (though certainly does not preclude) a dedicated tester or automation engineer. I am also of the opinion that 100% coverage, while nice, is not a necessity.

Tests are not an artifact of business value in and of themselves. You don’t sell the tests. Tests are insurance… a price you pay in the hopes of averting an even bigger cost later. Nevertheless, we should evaluate the prioritization and cost of tests just as we would any feature of the system. My general approach is to create a decomposition of the component I am designing or implementing and identify high value or high risk functional areas that require coverage. High value functional areas are parts of the system that directly drive business opportunity. High risk functional areas are those that have the potential to drive business to a halt.

With that said, before tests can be written, the code must be designed for ease of testing. In my experience, this is a non-obvious skill set for a lot of developers to learn. A few observations:

  1. Unit tests require discernable units of functionality. If you can’t delineate those units just by looking at your code, you are probably going to have some trouble figuring out how to write a test.
  2. Whether you are using Object Oriented, Functional, or some other or hybrid coding approach, the key to testability is modularity and low complexity of design. If you create a highly performant algorithm that recursively iterates across 20 different looping functions, you’ve created complexity. Maybe that’s a necessity for your implementation, but more often than not there is a less complex way to express the same algorithm, one which may trade a negligible amount of performance for readability and easier testing of the individual methods being called (see the sketch after this list).
  3. Pay attention when mocking dependencies. Sometimes the need for a test seems obvious despite a large number of dependent components being required. I’ve had the experience of beginning the process of mocking those components one by one until I finally realize that my method is not actually doing all that much on its own, but rather coordinating a bunch of integrated classes and libraries. At that point, you have to ask, what are you really testing? Remember that unit coverage isn’t a good excuse for a test. If it were, you could just write an “assert(true)” test for everything and be done.
  4. If you can design an element of your functioning live code to self-validate at run time or build time, you eliminate the need for a test.
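As a small illustration of points 1 and 2, a discrete unit with no hidden dependencies needs no mocks at all. The sketch below uses Node’s built-in test runner, and the pricing rule is invented purely for illustration:

```javascript
// price.test.js - a focused unit test for a small, discernible unit of logic
// (the pricing rule is invented purely for illustration)
const test = require('node:test');
const assert = require('node:assert');

// The unit under test: no I/O, no hidden dependencies, nothing to mock.
function applyDiscount(total, percent) {
  if (percent < 0 || percent > 100) throw new RangeError('percent must be between 0 and 100');
  return Math.round(total * (1 - percent / 100) * 100) / 100;
}

test('applies a percentage discount', () => {
  assert.strictEqual(applyDiscount(100, 25), 75);
});

test('rejects impossible discounts', () => {
  assert.throws(() => applyDiscount(100, 150), RangeError);
});
```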

There are a few examples of this last point in action out there. Anyone who has ever coded in an interpreted language vs a compiled one knows that an interpreted language often requires additional testing to catch bugs that might otherwise be caught by a compiler. This is one of the clear advantages of TypeScript over straight Node JavaScript (though I admit I have not fully transitioned yet).

My Approach to REST APIs

One of my favorite ways to apply the design concept of self-validation is in the REST API itself, by validating all incoming requests against an OpenAPI specification. This implementation provides several benefits:

  • Confidence that requests coming into the service do not carry unexpected properties or instructions that the controller or business layers were never written to ignore, removing the need for a lot of negative tests.
  • Request bodies can be trusted and mapped to data-layer models without a lot of modification.
  • An integrating consumer will know immediately if an expectation about the API is wrong as they will get a 400 error upon the first request.
  • Service tests and code can use the OpenAPI specification directly to catch breaking or unexpected changes before deployment.
  • Updates to the specification will immediately create an inconsistency between the code and the API, which can serve as a TDD-style requirement for development and stop deployments in the CI pipeline until resolved.

You can find my approach to this concept in my boilerplate template here:
https://github.com/theBoEffect/boilerplate

diagram of implementation with code snippets
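A generic sketch of this pattern in Express (not necessarily the exact wiring used in the boilerplate) uses a validator library such as express-openapi-validator to reject non-conforming requests before they ever reach a controller:

```javascript
// app.js - reject any request that does not conform to the OpenAPI definition
// (a generic sketch of the pattern; see the boilerplate above for the author's version)
const express = require('express');
const OpenApiValidator = require('express-openapi-validator');
const apiSpec = require('./openapi.js'); // the hand-maintained definition sketched earlier

const app = express();
app.use(express.json());

// Every incoming request is validated against the spec before it reaches a controller.
app.use(OpenApiValidator.middleware({ apiSpec, validateRequests: true }));

app.get('/api/widgets', (req, res) => {
  // By the time we get here, the request shape has already been validated.
  res.json([]);
});

// Validation failures surface as 400s with a useful message for the integrating team.
app.use((err, req, res, next) => {
  res.status(err.status || 500).json({ message: err.message, errors: err.errors });
});

app.listen(3000);
```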

I hope you’ve enjoyed this overview of how United Effects approaches these common questions about development process as we bring our product Core EOS to market. If you’d like to hear more about this topic or Core EOS, visit unitedeffects.com and contact us!

Follow us on Twitter, LinkedIn and Medium
