Design to Scale (Part 1 — Pitfalls of Monolithic Application)

Vikas Thareja
Xebia Engineering Blog
7 min read · Dec 10, 2019

In this series, we will discuss the pitfalls of monolithic applications and how modern architecture, and optionally cloud migration, can solve some of these issues.

The series will focus more on the WHAT and WHY and less on the HOW. We will cover topics like why Docker (or containers), why Kubernetes (or orchestrators), why Event Hub (or a message bus), and why migrate the architecture to the cloud (if at all).

The reason to focus on WHY instead of HOW is twofold:

a) There has been a lot of discussion in various enterprises that jumps straight to implementation, for example "let's go to the cloud within two years or else we will be irrelevant." Well, this all sounds great, but WHY exactly, and WHAT do we even mean by going to the cloud?

b) There is already a lot of documentation available on the HOW. We will add a few links along the way to help you get started or dive deeper into these modern technologies, and of course you can find a lot more through Google!

In this part, we will take a sample monolithic application and discuss some of its design limitations.

In the second and final part we will look for solutions to these problems. It is important to note that this is a simplified example; real-life applications have many more flows and much more complexity, and design decisions are made accordingly.

So let's get going.

Context

With the evolution of the internet and the adoption of mobile devices, the IT development world has changed dramatically over the last decade. There was a time when it was considered a great success to implement complex business functionality in a monolithic app and deploy it to production over weekend releases.

Issues related to scalability, maintenance, release frequency and production stability were accepted as an obvious consequence of a large application and addressed with whatever measures were available at the time.

For midsize or large organizations, these issues mean a high cost to run, extend and scale applications, or in other words a directly or indirectly higher cost-to-customer ratio. And if more money is going into running the apps, there is less money available for building the organization further, for executing research projects and, most importantly, for employee appraisals and bonuses. Well, we don't like that, do we?

Also, with the way technology is evolving, the relevance of a business depends on how well it can scale its customer base while controlling cost. If we look at the trends over the past decade, it is evident that the organizations which have managed to scale their customer base without proportionally increasing cost have thrived and become leaders in their space. And since the overall customer base in an industry is largely a zero-sum game, while one organization is thriving, a few others are unfortunately perishing.

So, just as it was important to adopt IT in the early 2000s, it is now imperative to be cost-effective in how you use your IT systems. Being cost-effective requires prudence across the whole life-cycle: development, the release pipeline, extending features, scaling and maintenance. And what can help here? Yes, you guessed it right: our ability to design for scale!

The problem scenario

Now, let's take a concrete example of a monolithic application and try to understand the problems and the underlying cost impact. Say there is an eCommerce application, and we are looking at the payments and account management services of the app.

In this scenario, a user connects to a payment form to deposit or withdraw funds over the web interface. After applying standard security checks and validations, the UI connects to a payments service, which talks to a SWIFT gateway to initiate the movement of funds.

SWIFT is used for payments handling and you can read more about it here.

To keep this scenario simple, after confirmation from the SWIFT gateway the payments service connects with the accounts service, which handles cash balances and currency conversions. A real-life scenario is more complex, where the payments and accounts services can potentially talk to hundreds of components.

Both the payments and accounts services connect to a central DB, and after confirmation from the accounts service, the payments service routes the confirmation back to the user.
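To make the shape of this coupling concrete, below is a minimal sketch of what such a flow often looks like inside a monolith. The class and method names (PaymentsService, AccountsService, SwiftGateway, deposit and so on) are hypothetical and only illustrate the dependency structure described above, not the actual application:

```java
// Hypothetical sketch of the synchronous, tightly coupled flow described above.
// All classes live in the same process and share one central database connection.
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.SQLException;

class SwiftGateway {
    String initiateTransfer(String account, BigDecimal amount) {
        // In reality this would call the external SWIFT network; we return a stub id here.
        return "SWIFT-CONF-123";
    }
}

class AccountsService {
    private final Connection db;
    AccountsService(Connection db) { this.db = db; }

    // Updates the cash balance in the same central database the payments service uses.
    void updateBalance(String account, BigDecimal amount) throws SQLException {
        try (var stmt = db.prepareStatement(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            stmt.setBigDecimal(1, amount);
            stmt.setString(2, account);
            stmt.executeUpdate();
        }
    }
}

class PaymentsService {
    private final SwiftGateway swift;
    private final AccountsService accounts;
    private final Connection db;

    PaymentsService(SwiftGateway swift, AccountsService accounts, Connection db) {
        this.swift = swift;
        this.accounts = accounts;
        this.db = db;
    }

    // The business process completes only when the gateway AND both services
    // have answered, all inside one blocking call from the web tier.
    String deposit(String account, BigDecimal amount) throws SQLException {
        String confirmation = swift.initiateTransfer(account, amount); // 1. external SWIFT gateway
        accounts.updateBalance(account, amount);                       // 2. synchronous in-process call
        try (var stmt = db.prepareStatement(
                "INSERT INTO payments (account_id, amount, swift_ref) VALUES (?, ?, ?)")) {
            stmt.setString(1, account);
            stmt.setBigDecimal(2, amount);
            stmt.setString(3, confirmation);
            stmt.executeUpdate();                                      // 3. same central DB
        }
        return confirmation;                                           // 4. routed back to the user
    }
}
```

Notice that the whole deposit runs as one blocking call chain against one shared database; every pitfall discussed below follows from that.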

This looks good: the feature is delivered and it's working fine. So what's wrong here? Let's see:

1. High coupling & low extensibility

The payments service has a hard dependency on the accounts service, and the business process completes synchronously only when both services have processed the request.

What if the accounts service is relatively slow or momentarily down?

Yes, you guessed right: it will clog the whole process, production alerts and incidents start coming in, and you are left wondering how to fix this immediately when the accounts service is slower than the payments service and there is too much load on the system right now!

Well, there is only one solution: spawn the system on more nodes. So to scale the accounts service we end up scaling the whole system, which means spending more money than needed.
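To see how quickly this clogs up, here is a small, hypothetical simulation (all pool sizes, request counts and delays are made-up numbers): every payment request holds a thread in the monolith's shared request pool until the synchronous accounts call returns, so a slow accounts service starves the pool for everyone.

```java
// Hypothetical illustration of thread-pool clogging in a monolith.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadPoolClogDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService requestPool = Executors.newFixedThreadPool(10); // the monolith's web threads
        long start = System.currentTimeMillis();

        for (int i = 1; i <= 50; i++) {
            final int requestId = i;
            requestPool.submit(() -> {
                try {
                    Thread.sleep(50);    // payments work: ~50 ms
                    Thread.sleep(2000);  // synchronous wait on a slow accounts service: ~2 s
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.printf("request %d done after %d ms%n",
                        requestId, System.currentTimeMillis() - start);
            });
        }
        requestPool.shutdown();
        requestPool.awaitTermination(1, TimeUnit.MINUTES);
        // With 50 concurrent requests and only 10 threads, the last requests finish
        // after roughly 10 seconds, even though the payments work itself takes ~50 ms.
        // Inside the monolith, the only remedy is to run more copies of the whole process.
    }
}
```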

What if a new developer or team working on the accounts service isn't aware of the tight coupling between the services, and enhances a feature in the accounts service that indirectly breaks the payments service?

Well, in a two-service example we can argue that developers simply need to search for dependencies across the system, but as the system grows this becomes a monstrous problem to solve. Many developers are apprehensive (if not scared) to extend parts of a legacy system because they are not sure about the overall impact.

How nice it would be if teams could work independently on their own service without worrying too much about the impact on other services in the system. Wouldn't it improve productivity for the organization? Wouldn't it also mean a shorter learning curve for new team members?

2. Scalability

Let's say that during the festival season this organization runs a discount offer which dramatically increases active customers for a week, increasing the load by 5 to 10 times. Now the CTO is asking the IT team to devise a cost-effective solution to handle this.

To make this problem a bit more interesting, imagine there are hundreds of similar apps in the overall offering of this organization, and all of them need to scale in peak season. All these apps are deployed on individual servers in an internal data center for the desired segregation and performance.

While contemplating how to solve this problem, the team is challenged by the following:

a. Server capacity

As stated, each server is dedicated to running one app. For the payments and accounts scenario, say these run on a server which normally operates at 40% load. During peak season, when the load is 10x, clients will get time-outs unless four servers are procured instead of one (40% of one server times 10 is 400%, i.e. four servers' worth of capacity), and even those four will be running at 100% during peak load.

Is this efficient? What do we do after the one week of peak load: waste all this processing power?

b. Service capacity

Another important problem here is the performance of each individual service. We will publish a separate article on improving performance, but for the sake of this example let's assume that the accounts service is slower than the payments service, which causes unwanted delays and a bad user experience. Even adding more servers does not solve this, because the services are tightly coupled while serving the business scenario.

c. DB as bottleneck

This is the most critical one. The design of most monolithic applications is based on a single database, and it is assumed that while you can scale out the servers, you can scale up the DB to meet the scalability goals. This is true to an extent, but beyond that the DB becomes a big bottleneck. Say, in our example, the customer base grows 100 times: you can scale the application by buying 100 servers (inefficient, but it will work), but what about the DB? How do we solve this challenge?
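One way to picture the bottleneck: every scaled-out application instance opens its own connection pool and sends its own writes to the same single database, so scaling the application tier multiplies the load on the one component that cannot be scaled out. A rough back-of-the-envelope sketch, with made-up numbers:

```java
// Hypothetical capacity estimate: scaling out the app tier multiplies
// connections and write load on the single shared database.
public class DbBottleneckEstimate {
    public static void main(String[] args) {
        int appServers = 100;              // scaled out 100x for the larger customer base
        int connectionsPerServer = 50;     // assumed connection pool size per instance
        int writesPerSecondPerServer = 80; // assumed payments + accounts writes per instance

        int totalConnections = appServers * connectionsPerServer;
        int totalWritesPerSecond = appServers * writesPerSecondPerServer;

        System.out.println("Connections the single DB must hold: " + totalConnections);         // 5000
        System.out.println("Writes/second the single DB must absorb: " + totalWritesPerSecond); // 8000
        // All of this lands on one database that, in this design, can only be
        // scaled up (a bigger box), not out.
    }
}
```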

3. Release downtime & CI / CD

Given the coupling, the complete system can only be released together, after thorough testing in an integration environment. This means downtime during releases and slower time to market.

What if the business wants to roll out a new payment feature very fast?

The application development team still must run the complete test suite to ensure there is no regression impact. If they don't, an unexpected issue may show up in the accounts service because of this change, with team members blaming each other, while what is really wrong is the process and the architecture!

4. Limitations

a. Not available on mobile devices

Most customers want to use the application on their mobile device. The team is contemplating whether a new application should be built from scratch.

They know that the features are similar and only the UI and the device change. Shouldn't we have a better solution here?

b. Can't leverage out-of-the-box analytics / cognitive services

Management is open to shelling out extra dollars to leverage some of the out-of-the-box cognitive services available on cloud platforms, but the IT teams are telling them it's not possible unless the whole solution is deployed on the cloud.

Really?

Can modern architecture solve the above issues? Is it mandatory to move to the cloud to fix all these problems?

Let’s see in Part 2 of this series.
