Scaling Applications Part 2 — Scaling the Monolith Web Application

Hughie Coles · The Startup · May 5, 2020

This is the second in a roughly seven-part series on how to scale applications. The series will cover common scaling patterns for applications, along with their pros, cons, and caveats. (Note: this structure may change as the series goes along.)

Setup

The example I’m going to use is a typical monolithic web application with a single monolithic database. The application consists of an application layer (think controllers, services, business logic) and a database; the details of those layers don’t matter for the purposes of this article. I’ve also left out the UI layer, since it isn’t relevant to scaling in this way. This is the starting point for the vast majority of modern systems, and the place I always suggest starting. One thing to keep in mind when designing applications is that it’s better to react to problems as they arise than to prematurely optimize for problems that may never occur. In most cases, it’s also better to start with a simple architecture and spend your effort on features rather than on infrastructure and communication structures.

This type of application will typically have a web server sitting in front of it. The web server accepts requests, passes them on to the web application, and sends the response back to the browser.
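As a concrete reference point, here is a minimal sketch of such a monolith. Flask and SQLite are stand-ins chosen purely for illustration (along with a hypothetical users table); what matters is the layering, not the libraries.

```python
# A minimal monolith sketch: one process containing the application layer
# (controller + business logic) talking to one database. Flask and
# sqlite3 are stand-ins; a hypothetical "users" table is assumed to exist.
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)

def fetch_user(user_id):
    # Data access: every request hits the same single database.
    with sqlite3.connect("app.db") as conn:
        return conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()

@app.route("/users/<int:user_id>")
def get_user(user_id):
    # Controller: the web server in front forwards requests here.
    row = fetch_user(user_id)
    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": row[0], "name": row[1]})

if __name__ == "__main__":
    app.run(port=8000)  # a web server (e.g. nginx) fronts this in production
```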

Problem

As the load on your application grows, the first performance issue you’ll likely see is requests taking longer to process, simply because of the volume of requests coming in.

Your web application can only handle so many requests per second; anything beyond that limit has to wait for an in-flight request to finish.
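A quick back-of-envelope calculation shows why. The numbers below are invented for illustration, but the shape of the math holds for any fixed worker pool:

```python
# Back-of-envelope throughput of a single instance: a fixed worker pool
# is bounded by workers / average request time. Numbers are illustrative.
workers = 8            # concurrent request handlers in one instance
avg_request_s = 0.05   # 50 ms average time to process one request

max_rps = workers / avg_request_s
print(f"one instance tops out near {max_rps:.0f} requests/second")  # ~160
```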

Solution

At some point, you’ll have to face the fact that a single instance of your application just can’t handle the number of requests you’re getting. In this case, the simplest way to handle more load is to add more instances of your web app. To do this, you direct traffic to a load balancer instead of to your application, and the load balancer distributes the load across the multiple instances of your web app. All of these instances interact with the same single instance of your database (for now). This is a simple example of scaling horizontally.
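To make the idea concrete, here is a toy round-robin balancer. In practice you’d reach for a real load balancer such as nginx rather than writing your own; the three backend addresses here are hypothetical.

```python
# Toy round-robin load balancer: forwards each GET to the next backend
# in rotation. The three instance addresses are hypothetical.
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKENDS = itertools.cycle([
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
    "http://127.0.0.1:8003",
])

class RoundRobinProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(BACKENDS)  # each request goes to the next instance
        with urlopen(backend + self.path) as upstream:
            body = upstream.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), RoundRobinProxy).serve_forever()
```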

Alternative Solutions

The first thing to do in this situation is performance analysis, which can illuminate performance issues in the code. If you can optimize slow spots, you’ll get more out of your current architecture. Scaling out costs money and time, and it complicates your architecture, so it’s best to delay it as long as possible by making sure your code is sufficiently fast. Those wins will also continue to compound when you eventually do scale out.
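In Python, for example, the standard-library profiler is enough to find hot spots; handle_request below is just a stand-in for whatever code path your metrics flag as slow:

```python
# Profile a hot code path before reaching for more servers.
# handle_request is a placeholder for whatever your metrics flag as slow.
import cProfile
import pstats

def handle_request():
    return sum(i * i for i in range(1_000_000))  # placeholder workload

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Show the ten most expensive call sites by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```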

Another alternative is to scale vertically: upgrade your server to one with more or faster processors and/or more memory. Depending on the size and aptitude of your team, this might be the best option, since it lets you keep the exact same architecture. Vertical scaling costs more in infrastructure but less in upkeep and effort. There is a limit to how far you can scale vertically, though, and the cost becomes prohibitive, since a single very high-performance server can cost more than many commodity servers combined.

Pros

This approach is simple and time-tested. You deploy multiple instances of your application, each with a different internal IP, deploy a load balancer (nginx is very popular), point traffic at the load balancer, and let it distribute requests among your application instances. As a bonus, this architecture also makes your application more fault-tolerant: if one instance goes down, you’ve still got others handling requests. It also allows you to scale dynamically, spinning up new instances when load spikes and removing them when load returns to average.
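The fault-tolerance bonus depends on the balancer knowing which instances are alive. Real load balancers do this for you continuously; the sketch below just shows the idea of a health-check pass, with a hypothetical /health endpoint and instance addresses:

```python
# Probe each instance and keep only the ones that answer; a real load
# balancer does this continuously. The /health endpoint is hypothetical.
from urllib.request import urlopen

INSTANCES = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
    "http://127.0.0.1:8003",
]

def healthy(url):
    try:
        with urlopen(url + "/health", timeout=1) as resp:
            return resp.status == 200
    except OSError:
        return False  # connection refused or timed out: instance is down

live_backends = [url for url in INSTANCES if healthy(url)]
print(f"routing to {len(live_backends)} of {len(INSTANCES)} instances")
```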

Cons

This takes a bit of know-how and makes everything a bit more complicated: there are now several machines to deploy, configure, and monitor instead of one. To get an accurate picture of errors, logs from all machines have to be aggregated.
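One small habit that makes aggregation workable is tagging every log line with the instance that produced it. A minimal Python sketch:

```python
# Tag every log line with the host that produced it, so logs shipped to
# a central aggregator stay traceable to a specific instance.
import logging
import socket

logging.basicConfig(
    format=f"%(asctime)s {socket.gethostname()} %(levelname)s %(message)s",
    level=logging.INFO,
)
logging.info("request handled")  # e.g. "... web-01 INFO request handled"
```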

Considerations

For this to work, your application should be stateless. If you store information anywhere but the database or a shared cache, then users have to be pinned to a specific server (so-called sticky sessions), and the whole thing becomes a mess.
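The fix is to move any per-user state into a store that every instance can see. Here is a sketch assuming the redis-py client and a shared Redis instance (the hostname is hypothetical):

```python
# In-process state breaks once there are multiple instances; a shared
# store fixes it. Assumes redis-py and a Redis server reachable at a
# hypothetical internal hostname.
import redis

# BAD: lives only in this process; another instance never sees it.
local_sessions = {}

# GOOD: visible to every instance behind the load balancer.
shared = redis.Redis(host="cache.internal", port=6379)

def save_cart(session_id, cart_json):
    shared.set(f"cart:{session_id}", cart_json, ex=3600)  # expires in 1h
```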

This pattern only works when your application is CPU-bound (requests are waiting on your calculations) or I/O-bound (specifically disk-bound or network-bound, waiting for the application to read from or write to disk, or to hit another service). If your performance issues stem from database load, this pattern will not help; in fact, it will make things worse, since it allows more connections to an already overburdened database. In that case, you’ll want the next part of the series, which covers read-write replication. Also keep in mind that as your application scales out across multiple instances, the database will eventually become the problem anyway (which is why that’s the next installment in the series): as the number of requests grows, the number of database queries grows with it, and your poor database will eventually be overwhelmed.

Summary

Scaling your application out horizontally into multiple instances is the first logical step in the face of increased load. It lets you leave your application code essentially untouched (unless you’ve implemented it in a stateful way) and pushes all of the additional complexity out to the infrastructure. It also adds redundancy and fault tolerance.

Hughie Coles
I’m an EM, full-stack developer, speaker, mentor, and writer. I blog about software development, software architecture, leadership, and culture. @hughiecoles