Using a Proxy Microservice as Part of a Gradual Monolith Decomposition Process

WalkMe Editor’s Monolith Decomposition: Part 1 — the Proxy Microservice

Yedidya Schwartz
WalkMe Engineering
11 min read · Sep 3, 2020


In this article series, I’ll describe the gradual decomposition of our monolith into microservices. The process is still ongoing as I write these lines, but we have already encountered some pretty interesting architectural challenges and problems that we had to deal with.

The articles in this series are partially based on a meetup talk I gave at WalkMe’s offices at the beginning of this year. If you want a quick overview without diving deep into the details, you can watch it here.

Introduction

I’m not going to write about the pros and cons of microservice architectures, since you can easily find tons of articles about that. Instead, I’d like to share our specific case study, which centers on a Proxy Microservice solution that allowed us to gradually decompose the monolith without downtime, without stopping our product’s development flow, and with the option to roll back to the old architecture at any point, without any code deployment. I’ll also share a few achievements we owe to that process.

Why did we Start the Process?

The reasons for decomposing our monolith are the classic ones:

  • Stability issues began to emerge in our product
  • Old code that had become more and more cumbersome over the years
  • A technology we no longer want in our stack
  • Complicated, risky deployments, even for a single-line fix
  • Poor testability
  • And more…

If I focus on the main gains we wanted from the process, they can be summed up as two things: horizontal scalability and CRUD logic encapsulation.

Firstly, we had to scale out instead of up since, as I mentioned, the Monolith suffered from stability issues.

Secondly, we didn’t want to expose the “monster” behind the Monolith. Since that logic works and should not be touched, it needed to be encapsulated and served as an API, instead of being modified each time a new feature was added.

Below, I’ll detail exactly why these issues hurt us and how the Monolith decomposition solved them. I hope you can apply the ideas I bring here to your own project, or at least find this an interesting case study.

The Old Editor App’s Architecture

Our old .NET monolith is the backend of WalkMe’s Editor App, where people create and edit their guidance items (for example, Smart Walk-Thrus and ShoutOuts) before publishing them to their sites. Each guidance item is called a ‘deployable’. Until we started decomposing it into microservices, the Monolith was the destination of almost every request coming from the Editor’s client.

Figure 1: The architecture before we began the monolith decomposition process.

The Monolith is responsible, among other things, for the deployables CRUD functionality: every deployable CRUD-related action is performed through the Monolith. Since those are the most frequently performed actions in the Monolith, and we anticipated new features that could complicate it further, we decided to focus on that domain in the first stage of the Monolith decomposition.

As part of the process, even before any decomposition, we decided on a code freeze: no more new features inside the Monolith; from now on, new features would be written exclusively as microservices.

First step: Monolith Code Freeze

The new features we wrote as microservices were Visual Designs and Translations, which are sub-entities of a deployable. That is, a deployable’s structure may contain a Visual Design object and Translation objects.
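
For illustration, here is a minimal sketch of what a deployable’s structure might look like, with the Visual Design and Translations as optional sub-entities. The field names are hypothetical, not our actual schema:

```typescript
// Illustrative sketch only: these field names are hypothetical,
// not WalkMe's real schema.
interface VisualDesign {
  css: Record<string, string>;    // e.g. colors, fonts, positioning
}

interface Translation {
  language: string;               // e.g. "fr", "de"
  texts: Record<string, string>;  // translated content, keyed by element id
}

interface Deployable {
  id: string;
  type: 'SmartWalkThru' | 'ShoutOut';
  createdAt: string;
  ownerUserId: string;
  // Sub-entities owned by the new microservices:
  visualDesign?: VisualDesign;
  translations?: Translation[];
}
```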

In accordance with the Monolith code freeze decision, these new features were written outside the Monolith as stand-alone microservices. The only code we added to the Monolith was a call to those new microservices.

Figure 2: Screenshot from WalkMe’s Editor App: the WYSIWYG editor of a ShoutOut-type deployable. This deployable contains a Visual Design object (marked in red) and a Translation object (marked in green). They are saved as JSON objects that describe the deployable’s design and the translation of its content into various languages.

Here is the architecture as described so far, after adding the two new microservices:

Figure 3: The client sends requests to the monolith, and the monolith communicates with the two new microservices.

Second Step: Monolith Code Extraction — the Proxy Enters the Picture

After the Monolith code freeze decision, we reached the stage of reducing the Monolith’s responsibility. We wanted to decrease the dependency between the Monolith and the new features, so that the Monolith could eventually be deprecated.

Therefore, we decided to add a new microservice to the architecture, one that constitutes a Proxy between the Client and the new features (i.e., the new microservices). From now on, requests coming from the Client pass through the Proxy, and the Proxy “knows” to ask each relevant microservice for the relevant deployable enrichment. It then combines the results into a single object that is returned to the Client.

This is what happens in the Proxy when the Client asks it to get a deployable:

The Proxy fetches the basic deployable structure from the Monolith (creation date, owner user, etc.), without the enrichments of the new features (a visual design, for example). From each of the other microservices, the Proxy fetches the relevant enrichment. The Proxy then has all the data it needs to build the full deployable object; it combines everything and sends it to the Client.
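
Here is a minimal sketch of that aggregation step, reusing the Deployable interface sketched above. The URLs and endpoint paths are hypothetical, and error handling is omitted for brevity:

```typescript
// A sketch of the Proxy's aggregation step. URLs and endpoint paths
// are hypothetical.
const MONOLITH_URL = 'https://monolith.internal';
const VISUAL_DESIGN_URL = 'https://visual-design.internal';
const TRANSLATIONS_URL = 'https://translations.internal';

async function getDeployable(id: string): Promise<Deployable> {
  // 1. Fetch the basic deployable structure from the Monolith,
  //    without any of the new enrichments.
  const base: Deployable = await (
    await fetch(`${MONOLITH_URL}/deployables/${id}`)
  ).json();

  // 2. Fetch each enrichment from its own microservice, in parallel.
  const [visualDesign, translations] = await Promise.all([
    fetch(`${VISUAL_DESIGN_URL}/designs/${id}`).then(r => r.json()),
    fetch(`${TRANSLATIONS_URL}/translations/${id}`).then(r => r.json()),
  ]);

  // 3. Combine everything into the full object the Client expects.
  return { ...base, visualDesign, translations };
}
```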

Now the Monolith decomposition has become much easier: we can treat the Monolith as just another theoretical microservice (a pretty big one, so not really “micro”…) whose responsibility is to provide data to the Proxy.

This is how the architecture looks after adding the Proxy:

Figure 4: The Client communicates with the Monolith and the new microservices through the Proxy microservice. The responsibility of communicating with the new microservices has moved from the Monolith to the Proxy.

This is also how we made a huge step towards untying the dependency between the Monolith and the Client.

Now, let’s talk about the risk of such a wide-ranging change in architecture and how we did it safely.

“Backdoor” for Immediate Revert Without Need for Code Rollback

After finishing the setup of the new architecture, we didn’t want to direct all our customers to it at once. We wanted to start with an experiment group, so as not to hurt many users in case the new architecture turned out to be unstable or problematic.

Moreover, this is a significant change that can have many implications for system performance at high scale, and production often brings surprises. We couldn’t risk downtime.

This is why we left a “backdoor” that allows us to keep using the old architecture (the one without the Proxy, where the Monolith communicates with the new microservices directly; see figure 3) in parallel to the new architecture (figure 4).

To implement this ability, we used feature flags. This gave us the power to decide, for each individual user, whether they would use the new or the old architecture.

This is not a regular, simple feature flag: it is a cross-component feature flag. It affects a huge part of the application, since it decides which architecture each user’s requests will go through.

Figure 5: Both flows live in production in parallel: the new flow is in gray lines and the old flow is in red.

The first decision the flow has to make, based on the feature flag, is in the Editor Client: how should it send deployables CRUD actions? Through the Proxy, using the new architecture, or directly to the Monolith, using the old architecture?

So, to start the process, the Editor Client gets the feature flag value on its first load.

If the feature flag is turned on for a user, the Editor Client knows to send requests to the Proxy, and the Proxy asks the Monolith only for the basic deployables structure, without the new microservices’ enrichments. The Proxy thus has the responsibility of calling the relevant microservices for the deployables’ enrichments; it combines the results and returns the full object to the Client (gray lines, figure 5).

If the flag is turned off, the Editor Client communicates directly with the Monolith, as it did before we had the Proxy (red lines, figure 5). In this case, the Monolith fetches the enrichments from the new microservices by itself, because the Proxy is not there to do it.
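
Conceptually, the Client-side decision looks something like the following sketch. The flag name, the flag-service interface, and the URLs are all hypothetical:

```typescript
// A sketch of the Editor Client's routing decision on first load.
// The flag name, the flag service and the URLs are hypothetical.
declare const featureFlags: {
  isEnabled(flag: string, userId: string): Promise<boolean>;
};

const PROXY_URL = 'https://editor-proxy.internal';
const MONOLITH_URL = 'https://editor-monolith.internal';

async function resolveDeployablesApiBase(userId: string): Promise<string> {
  const useProxy = await featureFlags.isEnabled('editor-proxy-architecture', userId);
  return useProxy
    ? PROXY_URL     // new architecture: the Proxy aggregates everything
    : MONOLITH_URL; // old architecture: the Monolith fetches enrichments itself
}

// Every deployables CRUD request then targets the chosen base URL:
async function fetchDeployable(baseUrl: string, id: string) {
  return (await fetch(`${baseUrl}/deployables/${id}`)).json();
}
```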

Practically, this is what we did to make it work. Let’s take, for example, the GetDeployable endpoint in the Monolith:

Figure 6: GetDeployable endpoint in the Monolith.

In the Monolith’s endpoints, we’ve added a parameter called ‘skipExternal’. If that parameter equals ‘true’, it is a sign that the current request came from the Proxy (the Proxy always calls the Monolith with skipExternal set to ‘true’). This means the Monolith doesn’t need to fetch the deployable’s enrichments: it can skip the external calls to the microservices, since the Proxy will handle them.

If skipExternal equals ‘false’, the request wasn’t sent from the Proxy but directly from the Editor Client: the feature flag is turned off, and the Proxy is not in the picture. The Monolith then has to provide the Client with the fully enriched deployable object by fetching the relevant data from the external microservices itself (lines 7–8, figure 6).
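
Since the real endpoint is .NET code, the following is only an illustrative TypeScript rendering of that flow, kept in the same language as the other sketches in this article; the helper names are hypothetical:

```typescript
// An illustrative rendering of the Monolith's GetDeployable flow.
// The real code is .NET; only the skipExternal branching mirrors
// what the article describes.
declare function loadBasicDeployable(id: string): Promise<Deployable>;
declare function fetchVisualDesign(id: string): Promise<VisualDesign>;
declare function fetchTranslations(id: string): Promise<Translation[]>;

async function getDeployableEndpoint(
  id: string,
  skipExternal: boolean,
): Promise<Deployable> {
  // Basic structure only: creation date, owner user, etc.
  const deployable = await loadBasicDeployable(id);

  if (!skipExternal) {
    // The request came directly from the Editor Client (feature flag off),
    // so the Monolith itself fetches the enrichments.
    deployable.visualDesign = await fetchVisualDesign(id);
    deployable.translations = await fetchTranslations(id);
  }
  // skipExternal === true means the Proxy sent the request and will
  // fetch and merge the enrichments itself.

  return deployable;
}
```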

This way, a feature flag that seemingly affects only the destination of the first request sent from the Client at the start of the flow can be referenced later in the flow by more “sub-flags” like ‘skipExternal’. That “sub-flag” was necessary to keep the Monolith capable of handling both architectures in parallel.

Looking back, I can say that this “backdoor” saved us a few times when issues came up in production. For example, a performance issue arose immediately and slowed down requests. Once we noticed it, we simply turned the feature flag off for all users and investigated, free of production pressure, until we found the bottleneck. I will write about that issue, the investigation, and the solution in the next article in this series.

Things we Gained From the New Architecture #1: Super-Easy New Feature Integration

Before the Monolith decomposition process, adding new features to the deployables CRUD domain involved a lot of hassle, significant risk, and many problems. For example:

  • Further development on top of the Monolith added more burden to an already complicated project
  • Coordination with other teams about every new feature in the Monolith, so they would know the new code and be careful not to damage it
  • More use of the .NET technology we strive to phase out

After implementing the new architecture, a new deployables enrichment feature was developed as a microservice by an R&D team in another country. All it took to integrate it with the deployables CRUD mechanism was configuring a few values in the Proxy microservice and adding a few lines of code to it.

In fact, the team that developed the new microservice simply sent us a pull request for the Proxy microservice, and that was it. The process was fast and easy. For us, adding the new feature was almost a “black box”, since it required minimal effort and carried minimal risk.
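
To illustrate why integration was so cheap, here is a hypothetical sketch of a config-driven enrichment registry in the Proxy; the structure is my illustration of the idea, not our exact code:

```typescript
// A hypothetical config-driven enrichment registry inside the Proxy.
// Integrating a new enrichment microservice means adding one entry
// plus a little glue code, without touching the Monolith.
interface EnrichmentSource {
  field: string;                            // property to set on the deployable
  urlFor: (deployableId: string) => string; // where to fetch it from
}

const enrichmentSources: EnrichmentSource[] = [
  { field: 'visualDesign', urlFor: id => `https://visual-design.internal/designs/${id}` },
  { field: 'translations', urlFor: id => `https://translations.internal/translations/${id}` },
  // A newly integrated feature is one more entry:
  // { field: 'newEnrichment', urlFor: id => `https://new-service.internal/items/${id}` },
];

async function enrich(deployable: { id: string }): Promise<Record<string, unknown>> {
  const entries = await Promise.all(
    enrichmentSources.map(async source => {
      const res = await fetch(source.urlFor(deployable.id));
      return [source.field, await res.json()] as const;
    }),
  );
  return { ...deployable, ...Object.fromEntries(entries) };
}
```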

Just imagine how many tweaks, bugs and complications we could have experienced if the new feature had to be written inside the Monolith.

Figure 7: The easily-integrated new microservice

Today, a significant number of microservices interface with the Proxy microservice. In that respect, the change has totally proven itself.

Things we Gained From the New Architecture #2: Flexible and Dedicated Scale-Out of Every Component in the Architecture

In the old architecture, whenever we noticed that additional resources were needed, we had to scale the entire Monolith. Back then, making the Monolith multi-instance was not even possible, since it was stateful: we had to keep a single instance of it so as not to break existing features.

As a result, we scaled up our single instance instead of scaling out, which left us with one super-machine holding our Monolith. It was a fragile situation that exposed us to potential downtime if that server crashed.

The ironic thing about this situation was that most of the actions performed on the Monolith were deployables CRUD-related: stateless actions.

These are also the actions that consume most of the Monolith server’s memory.

The stateful actions are not related to deployables CRUD; they occur very infrequently and consume almost no resources. Yet they were the only reason we could not scale out the Monolith!

Now, after the architecture change, the Proxy uses the Monolith only for deployables CRUD actions, which are stateless. For all other actions, among them the stateful ones, the Editor Client calls the Monolith directly, not through the Proxy.

To take advantage of this separation, we created a group of Monolith instances that is used only by the Proxy, for the deployables CRUD actions. A single Monolith instance remains in use by the Editor Client for stateful actions, as before.
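
A simplified sketch of that split follows. In a real deployment this routing would typically live in a load balancer rather than in application code, and the hostnames are hypothetical:

```typescript
// A sketch of the instance split: the Proxy spreads stateless
// deployables CRUD calls across a dedicated Monolith group, while
// the Editor Client keeps calling the single stateful instance.
const CRUD_MONOLITH_INSTANCES = [
  'https://monolith-crud-1.internal',
  'https://monolith-crud-2.internal',
  'https://monolith-crud-3.internal',
];
let nextIndex = 0;

// Round-robin over the stateless CRUD group:
function nextCrudMonolithUrl(): string {
  const url = CRUD_MONOLITH_INSTANCES[nextIndex];
  nextIndex = (nextIndex + 1) % CRUD_MONOLITH_INSTANCES.length;
  return url;
}

// The single stateful instance, called directly by the Editor Client:
const STATEFUL_MONOLITH_URL = 'https://monolith-stateful.internal';
```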

Figure 8: The full picture: the scaled-out architecture. The Monolith’s single instance, serving the stateful actions, communicates with the Editor Client directly. The multi-instance Monolith group, serving the stateless deployables CRUD actions, communicates with the Proxy microservice.

The Monolith’s single instance also serves as the fallback to the old architecture when the feature flag is turned off, as described above.

(Before starting the Monolith decomposition process, we came up with a quick Redis pub/sub mechanism that solved the stateful-server problem described above, and thereby enabled the Monolith to scale out. You can read more about it in one of my previous articles.)

Bottom line: we gained the ability to add instances for the Monolith, and for every other component in the architecture, according to its needs.

Summary

These were some of the highlights we experienced in a gradual and challenging process.

In this article I focused on decomposing the deployables CRUD and enrichments out of the Monolith. The Monolith, as mentioned, still has many other responsibilities, like user settings, deployables publish and preview, and much more. These components are the next bumps in our journey towards the Monolith’s extinction.

To sum up, remember the two important decomposition steps that will help you make sure you are not moving too fast: Monolith code freeze first, code extraction later. In other words: stop growing the Monolith before splitting it.

The Proxy microservice we implemented is the practice we chose for gradually decomposing the Monolith, while keeping a “backdoor” that offers a quick architecture fallback without the need for a code rollback.

Finally, I gave two examples of the real value this move produced: an improved feature integration process, and horizontal scalability, which was one of the main goals of the process.

In the next article of the “WalkMe Editor’s Monolith Decomposition” series, I plan to write about a bottleneck we had in the Proxy microservice: how we discovered the source of the problem and how we solved it. Stay tuned.

Thanks to Mary Gofer.
