9 Key Strategies and Best Practices for Scaling Modern Systems

Tameem Rafay
9 min read · Apr 4, 2024

This article reviews the strategies required to build a highly scalable system. We will cover the best practices to follow on the infrastructure, backend, database, and frontend sides.

I have paid attention to every detail needed to develop a highly scalable system so that it performs well under heavy load.

What does a scalable system mean?

A scalable system is one that can handle increasing workloads or data without sacrificing performance. Scalability is a key property of modern systems, ensuring the system remains highly available as traffic increases.

Types of Scalability

  1. Vertical Scalability: Also known as scaling up, this involves increasing the capacity of a single server or resource to handle more workload. For example, adding more CPU power, RAM, or storage to a server.
  2. Horizontal Scalability: Also known as scaling out, this involves adding more instances of servers or resources to distribute the workload across multiple machines. It involves creating a cluster of servers that work together to handle incoming requests.

Strategies and Technologies for Building Highly Scalable Systems

Now we will discuss several strategies in detail, showing how simple modifications in the code, the database, or at the infrastructure level can reduce the overall cost of handling an increasing workload.

1. Event-Driven Messaging Systems (Asynchronous Communication):

In this approach, we first identify the critical and non-critical tasks, and we perform the non-critical tasks asynchronously because they don't need an immediate response. For example, sending an email to a client is a non-critical task.

Queues play a vital role in performing these operations asynchronously: we buffer the data or messages in queues to absorb bursts of workload.

With this approach, we don't need to add more resources to our existing infrastructure immediately to accommodate the increasing load. Instead, we add the messages to the queue and process them when the receiver is free.
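
As an illustration, here is a minimal sketch of this pattern using BullMQ, a Redis-backed queue library for Node.js (the queue name, payload, and sendEmail helper are assumptions for the example):

const { Queue, Worker } = require('bullmq');

const connection = { host: 'localhost', port: 6379 };

// Producer: enqueue the non-critical task instead of doing it inline.
const emailQueue = new Queue('emails', { connection });

async function onOrderPlaced(order) {
  // Respond to the user immediately; the email is sent later.
  await emailQueue.add('order-confirmation', { to: order.customerEmail });
}

// Consumer: a worker drains the queue at its own pace, so bursts of
// traffic are buffered in Redis instead of overloading the servers.
new Worker('emails', async (job) => {
  console.log('Sending email to', job.data.to);
  // await sendEmail(job.data); // hypothetical mail-sending helper
}, { connection });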

2. Containerization and Microservices: Decoupling for Scalability

Microservices play a vital role in decoupling a system. With containerization, we divide the application into modules called containers, and the greatest benefit of this approach is that each container can be packaged, deployed, and scaled independently.

Related Article: Containerize the Node.js application — Step-by-step Guide

Example: Suppose we are developing an Airbnb-style application. We can create a module for each requirement, e.g. a separate module for searching for a property, reviewing the seller, and booking the property.

If the application receives a large volume of property-search requests, we can auto-scale just the property-search module without scaling the whole application. And if the property-search module crashes due to an error, the rest of the application keeps working; only the property-search module is affected.
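
For illustration, each module can be packaged as its own image so it can be built, deployed, and scaled independently. A minimal, hypothetical Dockerfile for just the property-search service might look like this (file and service names are assumptions):

# Dockerfile for the property-search service only
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "search-service.js"]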

3. Load Balancing and Autoscaling: Ensuring High Availability and Performance

Auto-scaling is a horizontal type of scaling in which we increase the number of resources when receiving a large volume of requests. So, instead of increasing the specifications of the current server, we add more servers to handle the increased request volume efficiently.

Horizontal scaling provides further advantages. For example, if one server crashes, another server is launched to take its place, and if the volume of requests decreases, we can reduce the number of server units to lower the overall infrastructure cost. So, it is crucial that your infrastructure can scale to accommodate new requests.
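
As a small-scale illustration of the same idea, Node's built-in cluster module forks one worker process per CPU core and replaces any worker that crashes, much like an autoscaling group replaces a failed server instance:

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) { // cluster.isMaster on Node < 16
  // Fork one worker per CPU core; incoming connections are
  // distributed across the workers.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  // If a worker dies, start a replacement, just as autoscaling
  // replaces a crashed server instance.
  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died, starting a new one`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}`);
  }).listen(3000);
}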

4. Socket Management: Improve Performance with Cached Connections

Most of us have used sockets at some point to build a real-time chat module in an application. The code below creates the socket connection on the backend server.

const express = require('express');
const http = require('http');
const socketIo = require('socket.io');

const app = express();
const server = http.createServer(app);
const io = socketIo(server);

io.on('connection', (socket) => {
  console.log('Client connected');

  // Handle incoming messages from the client
  socket.on('message', (message) => {
    console.log('Received message from client:', message);

    // Broadcast the message to all connected clients
    io.emit('message', message);
  });

  // Handle client disconnect
  socket.on('disconnect', () => {
    console.log('Client disconnected');
  });
});

const PORT = 3000;
server.listen(PORT, () => {
  console.log(`Server started and listening on port ${PORT}`);
});

However, the above code is not scalable. When we have a lot of traffic and add more servers to scale horizontally, each client's socket connection lives on a single server, so messages emitted from other servers never reach that client.

This happens because the connection state between server and client is stored on the individual server. When we add more servers to handle other requests, we either need to re-initiate the connection with the new server, or, the better solution, keep the connection data in a shared place and read it from there.

So, here is a much cleaner solution to this problem. I use the Redis adapter, which shares events between server instances through Redis, so a message emitted on one server also reaches clients connected to any other server. Redis is a cache where we can store data temporarily in RAM so that we can access it very fast.

const express = require('express');
const http = require('http');
const socketIo = require('socket.io');
const redisAdapter = require('socket.io-redis');
const redis = require('redis');

const app = express();
const server = http.createServer(app);
const io = socketIo(server);

// Configure the Redis adapter. A Redis connection in subscriber mode
// cannot issue other commands, so pub and sub need separate clients.
const pubClient = redis.createClient();
const subClient = pubClient.duplicate();
io.adapter(redisAdapter({ pubClient, subClient }));

io.on('connection', (socket) => {
  console.log('Client connected');

  // Handle incoming messages from the client
  socket.on('message', (message) => {
    console.log('Received message from client:', message);

    // Broadcast the message to all connected clients, on every server
    io.emit('message', message);
  });

  // Handle client disconnect
  socket.on('disconnect', () => {
    console.log('Client disconnected');
  });
});

const PORT = 3000;
server.listen(PORT, () => {
  console.log(`Server started and listening on port ${PORT}`);
});

5. Scalable Data Models in NoSQL Databases

Designing scalable models, especially in NoSQL databases, is crucial, because a non-scalable schema can become the bottleneck of your application's performance over time.

Here is an example of a non-scalable schema; afterwards, I will improve that schema to make it scalable.

Non-Scalable Schema

In this schema, as the number of posts for a user grows, the size of the user document grows with it. Imagine the user has 1,000 posts: the document could exceed MongoDB's maximum document size of 16 MB.

const mongoose = require('mongoose');

// User document with embedded blog posts
const userSchema = new mongoose.Schema({
  name: String,
  email: String,
  posts: [{
    title: String,
    content: String,
    comments: [{
      text: String,
      user: {
        type: mongoose.Schema.Types.ObjectId,
        ref: 'User'
      }
    }]
  }]
});

Scalable MongoDB Schema

Here I created a separate collection to hold the post data, using the user ID as a reference (a foreign key of sorts). Now, if the number of posts for a user increases, it will not affect the size of the user document.

// User document with separate blog posts collection and references
const userSchema = new mongoose.Schema({
  name: String,
  email: String,
});

const postSchema = new mongoose.Schema({
  title: String,
  content: String,
  user: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User'
  },
  comments: [{
    text: String,
    user: {
      type: mongoose.Schema.Types.ObjectId,
      ref: 'User'
    }
  }]
});
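
With this design, a user's posts can be fetched and paginated independently of the user document. A small usage sketch (userId and page are assumed to come from the request):

const Post = mongoose.model('Post', postSchema);

// Fetch one page of a user's posts without ever loading the user's
// full post history into a single document.
const posts = await Post.find({ user: userId })
  .sort({ _id: -1 }) // newest first
  .skip(page * 20)
  .limit(20);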

6. File Upload Operations

Almost every project uploads images to a server for storage. Now let's talk about how to scale this operation so that a large volume of file-upload operations does not affect the overall performance of the application.

Upload Files Using Streams

Related Article: How to send a large file from server to client

The best way to upload a file is to use streams. Instead of buffering the entire file, we divide it into smaller chunks and upload them to the server one by one. This lowers RAM usage and also reduces the overall cost of your infrastructure.
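
As a minimal sketch, an Express endpoint can pipe the incoming request stream straight to disk (the route and destination path are illustrative, and this assumes the raw file bytes are sent as the request body rather than a multipart form):

const express = require('express');
const fs = require('fs');

const app = express();

// The incoming request is itself a readable stream, so we can pipe it
// to disk chunk by chunk instead of buffering the whole file in RAM.
app.post('/upload', (req, res) => {
  const out = fs.createWriteStream('./uploads/incoming.bin'); // assumes the uploads directory exists
  req.pipe(out);
  out.on('finish', () => res.status(201).send('Upload complete'));
  out.on('error', () => res.status(500).send('Upload failed'));
});

app.listen(3000);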

File Storage Options

You can also choose an appropriate file storage solution based on your scalability requirements. For small applications, local file storage might be a good option as it will reduce your storage cost.

However, for larger applications, consider using cloud storage services like Amazon S3 or Google Cloud Storage. These services scale across multiple servers.

Apart from the above, you also need to consider file size: if the file sent from the client is too large, it could crash your system. So, always add proper validation on the size and extension of uploaded files.

7. Caching Strategies To Reduce Latency

Caching is a technique for storing frequently accessed data so that we can reduce the cost of DB queries. It not only saves database cost but also reduces the response time of API calls and improves the performance of the application.

Redis and Memcached are the most widely used options for caching data, but the choice depends on your requirements. Redis supports a large number of data types, while Memcached is a slightly more cost-optimised option that comes with limited data types.

Example: You are working on an e-commerce application, and you can store the product data in the cache so that you can serve it without querying the database. You can invalidate the cache whenever the product data changes. This improves the response time, because database operations are slow compared to a cache, which stores data in RAM.
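
Here is a minimal cache-aside sketch using the node-redis v4 client (Product.findById stands in for whatever database call you actually use):

const redis = require('redis');

const client = redis.createClient();
await client.connect(); // requires an ES module or an async wrapper

async function getProduct(productId) {
  const key = `product:${productId}`;

  const cached = await client.get(key);
  if (cached) return JSON.parse(cached); // cache hit: skip the database

  const product = await Product.findById(productId); // hypothetical DB lookup
  await client.set(key, JSON.stringify(product), { EX: 3600 }); // expire after 1 hour
  return product;
}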

8. Database partitioning and sharding techniques

Database partitioning involves dividing a large database into smaller, more manageable segments called partitions. Each partition contains a subset of data based on specific criteria, such as range partitioning (based on values like dates), or hash partitioning (based on a hash function).

The major benefit of partitioning is that it improves query performance by allowing queries on different partitions to be processed in parallel.

It also helps in efficient data management, especially for large datasets, by dividing data into smaller chunks that can be managed and accessed more effectively.

Sharding:

Sharding is used in distributed database systems or architectures that scale data horizontally across multiple nodes or databases. (Horizontal scaling is good for massive scalability and is fault-tolerant: if one shard fails, it does not affect the others.)

MySQL supports only partitioning; sharding can be done with a manual approach, but it requires careful planning. NoSQL databases like MongoDB provide sharding out of the box.
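
For example, in MongoDB, sharding a collection takes only a couple of mongosh commands (the database and collection names here are illustrative):

// Run in mongosh against a sharded cluster
sh.enableSharding("appdb");

// Hash the user field so a user's posts spread evenly across shards
sh.shardCollection("appdb.posts", { user: "hashed" });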

Partitioning:

Partitioning is the logical separation and organization of data within the same database. It is more relevant to vertical scaling and has limited benefits, because the database is still hosted on a single instance.

9. Database Replication For High Availability

Database replication is a technique in which we create multiple copies of a database across multiple servers or locations. The goal of replication is to enhance data availability and fault tolerance. This point is not strictly about scalability, but it is crucial for highly available systems.

High Availability: Replication provides high availability of data because data is replicated across multiple servers. So, if data on one server is destroyed it will remain available on the other servers.

Improved Performance: Replication can enhance query performance by allowing read operations to be distributed among replica servers. This also reduces the load on the primary server and improves overall system responsiveness.
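
For example, with MongoDB and Mongoose, pointing the driver at a replica set and letting reads go to secondaries is a small connection-string change (hostnames and the replica-set name are illustrative):

const mongoose = require('mongoose');

// readPreference=secondaryPreferred routes reads to replicas when
// possible, taking load off the primary.
mongoose.connect(
  'mongodb://db1.example.com,db2.example.com,db3.example.com/appdb' +
  '?replicaSet=rs0&readPreference=secondaryPreferred'
);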

In summary, making a system scalable involves using different tools and techniques. By using things like messaging systems, containerization, efficient data handling methods, optimized APIs, and good infrastructure practices, we can create systems that can handle more work without breaking. These strategies also help us plan for the future and keep up with changes in technology and business.

If you enjoyed this article, please feel free to like, share, and subscribe! Your support is greatly appreciated.
