Software Design by Example: Scaling

Published in The Startup, Sep 1, 2020

This article is part of a multi-post series covering Software Design. In the previous post, we outlined a high-level architecture of our example platform. In today’s article, we are going to explore scaling that high-level architecture to meet new demands.

Since this article is “by example,” it is worth calling out that our scaling problems are contrived: they appear in the areas we expect, whereas in the real world, scaling issues show up in the most unexpected places.

In this article, we will scale the architecture into what I would call the second phase of our design.

Scale when we need to

One of the rules of software design is “don’t over-optimize.” It is sometimes a hard rule to follow because when most engineers see an issue, or even a potential issue, they want to solve it. It’s part of their natural wiring.

Scaling is one of those areas where it is easy to find yourself optimizing too early. It is often difficult to know the right level of scale to start with. Determining the correct initial capacity for the traffic you expect is a skill that takes experience to build.

In this section, we are going to call out different areas of the platform and the ways we can scale to meet demand.

Heavy Calendar API usage

During the first few days of our product launch, we saw a lot of user registrations and profile setups. The first reaction to this would be to add more web application instances to handle the load. However, as our initial launch traffic started to wane, we saw a different issue.

If we look at our high-level architecture, we can see that the calendar API serves both web and backend traffic. That means our web application is being hit both by web users working with the calendar through the web UI and by the backend system updating calendars through the calendar API.

With all this API activity, our entire web application is starting to slow down, including user sign-up pages that have nothing to do with the calendar API. This is because we built our web application as a macro service: it handles all web and calendar functionality.

So what can we do?

We can split the web application by function. We do this by moving the calendar API into a stand-alone service with its own dedicated database.

By splitting the calendar API off into its own service, we can scale it independently. We can scale up by increasing the CPU and memory of the Calendar API service, or we can scale out by increasing the number of service instances.

The key is that the calendar functionality is now 100% stand-alone. When calendar usage increases, we can focus our scaling efforts on that service alone.

Breaking out the calendar functionality is also a good moment to reassess our technology choices. What fits our user profiles may not fit our calendar. While everything lived in one macro service, it was essential to keep the database somewhat simple.

However, now that we’ve split our databases into User and Calendar, there is more freedom. If we want to select a database that better fits the calendar usage model, we can.

Luckily, making database changes to our Calendar API is relatively simple. Sure, it means work on the Calendar API code, but it doesn’t require changes anywhere outside the Calendar API service.

When we were coming up with the foundational design, we made a fundamental choice: we ensured that all calendar activity goes through the API. While this was the right choice, it wasn’t necessarily the natural one.

In our design, we have a process called Task Publisher. Its job is to identify calendars that need to be re-synced via background processing. As it stands, the Task Publisher calls the Calendar API, but it would have been simpler, and more performant, to have it read the Calendar database directly. It would also have been simple to have the Worker service update the database itself.

But we didn’t take the easy path, and that has saved us a lot of headaches. Had the Task Publisher and Worker accessed the database directly, a change to the Calendar API’s database structure would have meant a lot of work across all three services. Now, we only have to change the Calendar API service itself. The rest of our architecture is unaware of the changes.

The Task Publisher and Worker can also be unaware of our plan to split the Calendar API out. Today, these services talk to the web app for all calendar API functionality. Tomorrow, they will talk to the Calendar API service directly. Making that change is a simple URL configuration update, not a code change.
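To make the “URL configuration, not a code change” idea concrete, here is a minimal Python sketch. It assumes the Task Publisher reads the Calendar API’s base URL from an environment variable; the variable name `CALENDAR_API_URL`, the default host, and the `/calendars` endpoint with its query parameters are all hypothetical, not part of the original design.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class CalendarApiClient:
    """Builds Calendar API requests; knows nothing about the database."""

    base_url: str

    def calendars_to_sync_url(self, limit: int) -> str:
        # Hypothetical endpoint; only the base URL depends on configuration.
        query = urlencode({"needs_sync": "true", "limit": limit})
        return f"{self.base_url.rstrip('/')}/calendars?{query}"


def client_from_env() -> CalendarApiClient:
    # Hypothetical variable name. Today it points at the web app; after the
    # split, operators point it at the stand-alone Calendar API service.
    return CalendarApiClient(os.environ.get("CALENDAR_API_URL", "http://webapp.internal/api"))
```

Redeploying the Task Publisher with `CALENDAR_API_URL=http://calendar-api.internal` would move its traffic to the new service without touching this code.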

Design for change

The choices we made in designing the calendar API, like forcing all communication through the API, put into practice the architecture principle of “Designing for change.”

All architectures change over time. By accepting this as a given, we can often find opportunities to optimize our design so that later changes are faster.

As in our Calendar API example, it is common to have several services that perform tasks against the same set of data. In our case, we have scheduling, backend processing, and CRUD operations. These are common tasks that often get split across many services.

It is also common for these services to share a database. But this is a fundamental mistake, as the services are now tightly coupled: a change to the database structure means a change across all of them.

On the flip side, by forcing one service to own all operations against the database, the other services can be left unchanged during any database structure change. Yes, this means more effort in API design and creation. But by keeping in mind that all things change, and by paying that cost early, our architecture can change when it needs to.
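A small Python sketch of “one service owns all operations against the database.” The Task Publisher depends only on the Calendar API’s methods, so swapping the storage layout behind the API does not affect it. The class names and both storage layouts are hypothetical stand-ins for real database schemas.

```python
from typing import Dict, List, Protocol, Tuple


class CalendarStore(Protocol):
    """Storage layer, private to the Calendar API service."""

    def events_for(self, calendar_id: str) -> List[str]: ...


class NestedStore:
    """Schema v1 (hypothetical): calendar_id -> list of event titles."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self._data = data

    def events_for(self, calendar_id: str) -> List[str]:
        return list(self._data.get(calendar_id, []))


class FlatStore:
    """Schema v2 (hypothetical): one (calendar_id, title) row per event."""

    def __init__(self, rows: List[Tuple[str, str]]) -> None:
        self._rows = rows

    def events_for(self, calendar_id: str) -> List[str]:
        return [title for cid, title in self._rows if cid == calendar_id]


class CalendarApi:
    """The only component allowed to talk to the store."""

    def __init__(self, store: CalendarStore) -> None:
        self._store = store

    def event_count(self, calendar_id: str) -> int:
        return len(self._store.events_for(calendar_id))


def publish_sync_tasks(api: CalendarApi, calendar_ids: List[str]) -> List[Tuple[str, int]]:
    # The Task Publisher sees only the API, never the schema.
    return [(cid, api.event_count(cid)) for cid in calendar_ids]
```

Migrating from `NestedStore` to `FlatStore` changes only what is passed to `CalendarApi`; `publish_sync_tasks`, standing in for the Task Publisher, is untouched.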

Note: While designing for change has saved us work within our small platform, the concept matters even more on major platforms. If changing three services because of one database change sounds bad, imagine doing it in a large enterprise setting, where you must coordinate those changes across three different teams.

Backend Processing is taking too long

After splitting the Calendar API into its own service, our scaling and performance issues have changed. Now we are noticing that our backend processing is taking too long.

It is quite normal for scaling issues to move around. Once one bottleneck in a process gets fixed, the next bottleneck will surface. With our API performing better, we are now seeing that our backend can’t keep up with the increase in demand.

Specifically, the problem we are seeing is that the tasks pushed to our Message Queue are backing up. Luckily, we’ve planned for this.

Message Queuing and Workers

Figure: Publish and Subscribe in the Backend Processing

If we look at the design of our backend processing, it is easy to recognize, as it follows a widespread pattern. In our model, we push tasks to a Message Queuing service, which distributes those tasks across many worker processes.

This design is the classic Publish/Subscribe pattern. It is straightforward to put in place, and it also makes scaling and resiliency easy to deal with.

Scale, because the Message Queuing service does the job of load distribution for us. Resiliency, because messages stay in the queue until a service reads them off.

In our design, we have a Task Publisher that creates tasks for our Workers. Each task is delivered to one subscribing Worker process. Since our Workers subscribe to the queue, we can add new Workers with minimal effort.

Once a new Worker is added, the Message Queuing platform will start distributing tasks to it. This distribution is driven by the subscribers (Workers) reading from the queue, which is different from load distribution techniques for other protocols like HTTP.
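The pull-based distribution described above can be sketched in-process with Python’s standard `queue` and `threading` modules. This is only a stand-in for a real Message Queuing service: a `queue.Queue` is not durable and lives inside one process. The function names and the `None` shutdown sentinel are assumptions made for illustration.

```python
import queue
import threading
from typing import Dict, List

STOP = None  # Sentinel telling a Worker to exit (assumption for this sketch).


def publish(tasks: "queue.Queue", calendar_ids: List[str]) -> None:
    # Task Publisher: one task per calendar that needs a re-sync.
    for cid in calendar_ids:
        tasks.put(cid)


def worker(tasks: "queue.Queue", done: Dict[str, str], lock: threading.Lock) -> None:
    name = threading.current_thread().name
    while True:
        cid = tasks.get()  # Pull: a Worker only takes a task when it is free.
        try:
            if cid is STOP:
                return
            with lock:
                done[cid] = name  # Stand-in for "re-sync this calendar".
        finally:
            tasks.task_done()


def run(calendar_ids: List[str], num_workers: int) -> Dict[str, str]:
    tasks: "queue.Queue" = queue.Queue()
    done: Dict[str, str] = {}
    lock = threading.Lock()
    # Scaling out is just starting more subscribers on the same queue.
    threads = [
        threading.Thread(target=worker, args=(tasks, done, lock), name=f"worker-{i}")
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    publish(tasks, calendar_ids)
    for _ in threads:
        tasks.put(STOP)  # One sentinel per Worker, queued after all real tasks.
    tasks.join()
    for t in threads:
        t.join()
    return done
```

Each calendar is handled by exactly one Worker, and calling `run(ids, 8)` instead of `run(ids, 4)` is the in-process equivalent of adding Workers.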

With HTTP, when you load balance traffic across application instances, the load balancer pushes requests out to those instances. Because these requests are real-time, the instances must be able to handle the load, or the HTTP requests will time out.

With Message Queuing, even if our Workers take a long time, a task does not time out the way an HTTP request would. It stays in the queue until a Worker can process it. This design makes it very easy to deal with scaling issues.

Even if the processing takes longer than expected, it still happens. If the backlog of messages grows too large, we can add new Workers. The more Workers we add, the more tasks we can handle at the same time.
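A rough way to reason about “add more Workers” is to compare total Worker throughput with the rate at which new tasks arrive. The sketch below assumes each Worker processes tasks at a steady rate and that arrivals are steady too; all numbers in the example are made up for illustration.

```python
import math


def seconds_to_drain(backlog: int, arrivals_per_sec: float,
                     per_worker_per_sec: float, workers: int) -> float:
    """Time to clear a queue backlog, assuming steady rates.

    The backlog only shrinks by the capacity left over after new arrivals.
    """
    spare = workers * per_worker_per_sec - arrivals_per_sec
    if spare <= 0:
        return math.inf  # Workers can't keep up; the backlog never clears.
    return backlog / spare
```

With a backlog of 10,000 tasks, 50 new tasks per second, and Workers that each handle 10 tasks per second, 5 Workers never clear the backlog, while 10 Workers clear it in 200 seconds.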

The benefit of this approach is that scaling issues are a lot easier to handle: adding new Workers makes an immediate impact. By comparison, the same issue with real-time requests cannot always be solved by adding instances. Sometimes it requires tuning the application.

Scaling Publishing

With our message queue backlog fixed by scaling out our Workers, we now see a new bottleneck: we are not able to publish tasks fast enough.

Our product has been so successful that our simple Task Publisher now takes so long to create tasks that each scheduled run overruns into the next one.

After investigating, we find two problems. The first is that the Task Publisher uses a lot of CPU while it runs. This problem is easily solved by scaling up.

Scale-Up vs. Scale-Out

In our previous scaling solutions, we always increased the number of instances or services. This is called scaling out, and it is a common approach and today’s preferred scaling pattern. The alternative to scaling out is scaling up.

Where scaling out adds more workers, instances, or services, scaling up adds more resources (CPU, memory, or disk) to existing instances. The idea is that while most problems are solved by distributing work, some can be solved by giving an instance more resources.

After increasing the available CPU for our Task Publisher, we see some improvement, but the job still takes a long time to run. Looking deeper, we can see that CPU is no longer the constraint. The second problem is the time spent fetching the list of calendars.

Digging into the issue, we can see that even though we limit the number of calendars returned by the API, fetching all the calendars from the Calendar database takes a long time.

Now, this doesn’t mean all Calendar database operations are slow. In fact, for our standard API usage, our Calendar database exceeds our needs. The problem only appears when fetching such a large number of items at once.

So, how can we reduce the time it takes to pull this giant list without a massive overhaul of our API?

One of the most straightforward answers is in-memory caching.

Caching Read-Only Data

The go-to scaling solution I see engineers reach for is caching. That doesn’t mean it’s always the right solution, but it is popular because it is relatively simple and often yields good results.

In our case, we have a lot of data that needs to be accessed quickly. This data is read in high volumes but not necessarily written in high volumes. In other words, we write the information somewhat infrequently, but we read it over and over again.

This is a good use case for in-memory caching. If our API wrote data frequently and read it infrequently, that would be a different story, because cached entries would keep going stale before they were read again.

The reason in-memory caches make things faster is simple: data in memory is faster to read than data on disk. Depending on the backend database, an in-memory caching service can drastically improve reads.

Even though reads are usually faster, we must consider how often the same data gets read. The value is not in making a single read faster, since the first read of any item still has to go to the database. The benefit comes from making repeated access to the same data faster.
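A minimal in-process sketch of a read-through cache with a time-to-live shows where that benefit comes from: only the first read of a key, or the first read after it expires or is invalidated, reaches the database. A real deployment would typically use a separate in-memory caching service; the `ReadThroughCache` class, its `ttl_seconds` parameter, and the loader function are hypothetical. The injectable `clock` exists only to make expiry easy to test.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # Slow path, e.g. a Calendar database query.
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1             # Served from memory.
            return entry[1]
        self.misses += 1               # Missing or expired: go to the database.
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call on writes; frequent writes mean frequent misses.
        self._entries.pop(key, None)
```

Reading the same calendar three times triggers a single database call. The write-heavy case described earlier would show up as frequent `invalidate` calls that keep the hit count low.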

In our case, we need to access the same data repeatedly. Once we add the in-memory cache, our Task Publisher’s run time is cut in half. Where fetching calendars used to take a long time, that data is now quickly accessible, allowing the Task Publisher to finish faster.

Summary

In this post, we have outlined several bottlenecks in our original design. We also explored different approaches to scaling.

When one specific kind of usage was slowing down our web traffic, we explored the concept of isolating functionality, which is one of the core concepts behind microservices. This design made our calendar functionality independently scalable.

In our backend processing, we saw the all-too-common publish and subscribe pattern and talked about how easy it is to scale out.

We also talked about the difference between scaling up and scaling out. Not everything needs to scale out; sometimes scaling up is perfectly fine.

The last design concept we covered was in-memory caching: when it makes sense, and when it doesn’t.

At this point, our design looks quite a bit different. In the next post, it’s going to change even more, as we explore the changes needed to meet our availability requirements, including designing our system to run across multiple availability zones.