How to troubleshoot slowdowns in high data traffic APIs

Rômulo Pauliv
9 min read · Apr 26, 2023


Suppose you have been tasked with developing an API that provides data to users or other services. An elementary approach would be to implement an API that exposes the standard HTTP methods (GET, POST, PUT, and DELETE) and interfaces directly with a single database.
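As a rough sketch, that elementary version could look like the following, here using Flask and SQLite purely as illustrative choices (the "files" table and its columns are assumptions):

```python
# Elementary API: every HTTP method talks directly to a single database.
# Flask and SQLite are illustrative choices; the "files" table is assumed.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
DB_PATH = "main.db"

@app.get("/files/<name>")
def read_file(name):
    conn = sqlite3.connect(DB_PATH)
    row = conn.execute("SELECT name, payload FROM files WHERE name = ?", (name,)).fetchone()
    conn.close()
    if row is None:
        return jsonify(error="not found"), 404
    return jsonify(name=row[0], payload=row[1])

@app.post("/files")
def create_file():
    body = request.get_json()
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("INSERT INTO files (name, payload) VALUES (?, ?)",
                     (body["name"], body["payload"]))
    return jsonify(status="created"), 201
```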

This architecture does not hold up when we need a high-performance, scalable application. Under high traffic, the API may experience congestion in its services and concurrency issues when accessing the database, resulting in slow responses for users and other services.

Database Structure

Methods for indexing and organizing data inside the database will not be discussed here, but a common strategy for increasing scalability and distributing access across services is horizontal scaling.

In the image above, it's possible to notice the addition of two replicas of the main database. Note that in the API structure, the GET method has its own external service and connection, while the other methods still connect to the main database.

The GET method connects to a load balancer (such as NGINX) that splits traffic between the two read replicas. This way, if replica A is congested with multiple requests, the load balancer starts redirecting requests to replica B.
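In the application code, the split can be as simple as pointing reads at the load balancer's address and writes at the primary. The sketch below assumes SQLAlchemy with PostgreSQL; the hostnames, credentials, and table are placeholders:

```python
# Reads go through the load balancer sitting in front of replicas A and B;
# writes keep a direct connection to the main database.
# Hostnames, credentials, and the "files" table are placeholders.
from sqlalchemy import create_engine, text

write_engine = create_engine("postgresql://user:pass@db-primary:5432/app")
read_engine = create_engine("postgresql://user:pass@db-read-lb:5432/app")  # load balancer address

def get_file(name: str):
    # GET traffic: the load balancer decides whether replica A or B serves this read.
    with read_engine.connect() as conn:
        return conn.execute(
            text("SELECT payload FROM files WHERE name = :n"), {"n": name}
        ).fetchone()

def create_file(name: str, payload: str):
    # POST/PUT/DELETE traffic: always goes to the main database.
    with write_engine.begin() as conn:
        conn.execute(
            text("INSERT INTO files (name, payload) VALUES (:n, :p)"),
            {"n": name, "p": payload},
        )
```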

This is a simple yet very effective technique for solving issues related to slow data reading requests and database connection concurrency.

Caching

In addition to adopting horizontal scalability, it’s possible to implement caching in the data reading structure of the API. Before accessing the load balancer, the API can be programmed to check if the data is already stored in the cache database.

The conditional logic that decides whether to read from the cache or from the database should live in the API itself. The fundamental goal of caching is to give future requests a faster API response time by avoiding another trip to the database.

But for this benefit to reach a future request, the earlier request that missed the cache must be the one that triggers the cache write.
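A minimal cache-aside sketch of that read path, assuming Redis as the cache store and a helper that reads through the load balancer (key names and TTL are illustrative):

```python
# Cache-aside read: check the cache first, fall back to the database on a miss,
# then populate the cache so the next request can skip the database.
import redis

cache = redis.Redis(host="cache", port=6379)             # assumed cache host

def read_file(name: str):
    cached = cache.get(f"file:{name}")
    if cached is not None:
        return cached                                    # cache hit: no database access
    payload = fetch_from_read_replicas(name)             # assumed helper behind the load balancer
    if payload is not None:
        cache.set(f"file:{name}", payload, ex=300)       # write-back so future requests benefit
    return payload
```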

In the above image, the first request asks for files X and Y. After receiving the request, the API checks the cache for the existence of the files. Since they don’t exist, the API is forced to fetch the files from the database.

In the second request, files X and Z are requested. After receiving the request, the API checks the cache and identifies the existence of the file X. Thus, the API only fetches file Z from the database.

Note in the above image that Δt is much greater than Δt’. We can see that there is a significant benefit in the API’s response time with the use of the cache.

The area labeled Thread 2 is where the files that were missing from the cache are written to it. The API should be programmed in such a way that writing data to the cache does not interfere with the client's response time. Therefore, the response and the cache write should be performed on separate threads, executed concurrently.
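One way to keep that cache write off the response path is a background thread, sketched below with Python's threading module (check_cache, fetch_from_read_replicas, and write_to_cache are assumed helpers):

```python
# Thread 1 returns the response to the client; Thread 2 writes the missing
# files to the cache in the background so the client never waits for it.
import threading

def handle_get(names: list[str]):
    hits, misses = check_cache(names)            # assumed helper: cached files + missing names
    fetched = fetch_from_read_replicas(misses)   # only the misses touch the database
    response = {**hits, **fetched}

    # Thread 2: populate the cache without delaying the response.
    threading.Thread(target=write_to_cache, args=(fetched,), daemon=True).start()

    return response                              # Thread 1: respond immediately
```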

L2 Cache

The level 2 cache (or L2 cache) is a caching technique used to improve data access performance in systems that already have a primary cache (L1 cache) as we implemented above.

In general, the L1 cache is stored in the system’s RAM, while the L2 cache is a disk cache, often used in conjunction with a database. The L2 cache is responsible for storing data at an intermediate level between RAM and persistent storage, such as a hard drive or SSD. This allows data to be retrieved more quickly than if it were accessed directly from persistent storage, but still slower than if it were in the L1 cache.

For example, when a client makes a request, the API first checks whether the data is available in the L1 cache; if it's not, it checks the L2 cache. If the data is still not available, the API requests it from the read replicas and stores it in both caches for future access.
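That lookup order could be sketched as follows, using an in-process dictionary as the L1 cache and a disk-backed Redis instance as the L2 cache; both choices, and the database helper, are assumptions:

```python
# L1: in-process RAM cache. L2: disk-backed cache service. Database: last resort.
import redis

l1_cache: dict[str, bytes] = {}                      # lives in the API server's RAM
l2_cache = redis.Redis(host="cache-l2", port=6379)   # assumed disk-persisted cache instance

def read_file(name: str):
    if name in l1_cache:                             # fastest path: RAM
        return l1_cache[name]
    payload = l2_cache.get(name)                     # slower than RAM, faster than the database
    if payload is None:
        payload = fetch_from_read_replicas(name)     # assumed helper hitting the replicas
        if payload is not None:
            l2_cache.set(name, payload)              # store at the L2 level
    if payload is not None:
        l1_cache[name] = payload                     # promote to L1 for short-term reuse
    return payload
```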

In distributed database systems, the L2 cache can be shared among multiple nodes in the cluster, allowing data to be cached in a distributed manner and further reducing the traffic of access to the main database.

Above, we have an overview of the L1 and L2 caches. To implement them in our structure, we need to configure and verify the following parameters:

Storage: The capacity we allocate to the cache. Since the L1 cache uses the server's own RAM, it should be limited to a fraction of the available memory; otherwise, the whole system may be affected (see the sketch after this list).

TTL (Time-To-Live): The maximum time a resource should be kept in the cache before being considered obsolete and discarded.

RTT (Round-Trip Time): The total time a request takes to go from the source to the destination and back, including the processing times of each party involved in the communication.
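A brief sketch of how the first two parameters might be enforced with Redis, and how RTT can be measured around a call (the limits and key names are illustrative):

```python
# TTL: every cached entry expires after a fixed time and is then discarded.
# Storage: the cache is capped so it only uses a fraction of the server's RAM,
# e.g. in redis.conf:  maxmemory 256mb  /  maxmemory-policy allkeys-lru
import time
import redis

cache = redis.Redis(host="cache", port=6379)
CACHE_TTL_SECONDS = 300                      # illustrative TTL

def cache_file(name: str, payload: bytes) -> None:
    cache.setex(f"file:{name}", CACHE_TTL_SECONDS, payload)

def timed_get(name: str):
    # Rough RTT for a cache read, as seen from the API process.
    start = time.perf_counter()
    value = cache.get(f"file:{name}")
    return value, time.perf_counter() - start
```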

In the first case, the API requests the L1 and L2 caches, but is forced to retrieve the file from the database. After receiving file X, in a second thread, the API writes the file to the L1 and L2 caches.

In the second case, the API requests file X. In this case, the file is cached in the L2 cache. After the API receives file X, in a second thread, it performs the task of writing the file to the L1 cache. This optimizes short-term access to the file.

In the last case, the file is cached in the L1 cache, responding almost instantaneously to the API.

Connection Pools

Connection pooling is a technique for managing and reusing database connections in applications. Instead of opening and closing an individual connection for each transaction, the connection pool maintains a set of pre-opened connections ready for use. This reduces connection and disconnection time, as well as the number of connections that need to be established with the database, which can significantly improve API performance.

We can implement connection pooling (in the case of replicas) after the load balancer layer. This way, in addition to managing the traffic of accesses that require reading from the database, we will have the advantage of having pre-established connections, speeding up access to the database.

It is also possible to implement a connection pool with the main database, for writing and modifying data.
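Both pools could be sketched with SQLAlchemy as below; the pool sizes, timeouts, and connection URLs are assumptions to be tuned for the actual workload:

```python
# One pool of pre-opened connections for the read load balancer,
# another for the main database used by write operations.
from sqlalchemy import create_engine

read_pool = create_engine(
    "postgresql://user:pass@db-read-lb:5432/app",
    pool_size=20,        # connections kept open and reused across requests
    max_overflow=10,     # extra connections allowed during bursts
    pool_timeout=5,      # seconds to wait for a free connection before giving up
    pool_pre_ping=True,  # validate a connection before handing it out
)
write_pool = create_engine(
    "postgresql://user:pass@db-primary:5432/app",
    pool_size=10,
    max_overflow=5,
    pool_timeout=5,
)
```

With pool_timeout set, an exhausted pool raises an error after a few seconds instead of blocking indefinitely, which is exactly the condition the message queue below is meant to absorb.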

Message Queue

We already have a prototype of our solution for data reading, but for writing, more efficient measures need to be adopted. One of the problems with maintaining a direct connection to the main database for writing functions is the possibility of system overload.

If the API receives a high demand for modifications and writing requests to the database, it is possible that the pool connections will be exhausted, leaving other requests waiting and increasing the risk of receiving a timeout.

In this scenario, the API may take longer to respond to the client, congesting the traffic of other clients who also need to perform write operations on the database. It is important to adopt intelligent measures to manage data traffic and avoid system overloads. Implementing a message queue can be a solution to balance the workload and prevent overload on the main database.

In the structure above, the API is responsible for checking whether there is any available connection in the pool; if not, the request is redirected to a message queue (such as Kafka). In front of the queue service, there is a worker responsible for processing the requests coming from the queue.

The worker can be compared to a microservice that processes the request coming from the queue and establishes a connection with the database connection pool so that the POST request can be executed, modifying or writing data in the database.
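Such a worker could be sketched with kafka-python, consuming from an assumed "write-requests" topic and writing through the primary's connection pool defined earlier:

```python
# Worker: drains the write queue and executes each request against the main
# database through its connection pool.
import json
from kafka import KafkaConsumer
from sqlalchemy import text

consumer = KafkaConsumer(
    "write-requests",                        # assumed topic name
    bootstrap_servers="kafka:9092",          # assumed broker address
    group_id="db-writers",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    req = message.value                      # e.g. {"name": "X", "payload": "..."}
    with write_pool.begin() as conn:         # write_pool from the pooling sketch above
        conn.execute(
            text("INSERT INTO files (name, payload) VALUES (:n, :p)"),
            {"n": req["name"], "p": req["payload"]},
        )
```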

In case 1, there is a situation where there are no available connections in the database connection pool. When the API detects this condition, it redirects the request to be queued in the message queue. After queuing the request, the API returns the HTTP 202 status code, indicating to the consuming service that the request has been accepted and will be handled by another instance or process.
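On the API side, case 1 might be handled by catching the pool timeout and publishing to Kafka before answering 202; the exception, topic name, and Flask usage here are assumptions:

```python
# If no pooled connection becomes available within the timeout, queue the write
# and immediately return HTTP 202 Accepted; otherwise write synchronously.
import json
from flask import Flask, jsonify, request
from kafka import KafkaProducer
from sqlalchemy import text
from sqlalchemy.exc import TimeoutError as PoolTimeout   # raised when the pool is exhausted

app = Flask(__name__)
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

@app.post("/files")
def create_file():
    body = request.get_json()
    try:
        with write_pool.begin() as conn:          # case 2: a connection was available
            conn.execute(
                text("INSERT INTO files (name, payload) VALUES (:n, :p)"),
                {"n": body["name"], "p": body["payload"]},
            )
        return jsonify(status="created"), 201
    except PoolTimeout:                           # case 1: pool exhausted, hand off to the queue
        producer.send("write-requests", body)
        return jsonify(status="queued"), 202
```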

By using this method, no request is left pending in the API waiting for a response, freeing up network traffic and not overloading API processes.

In case 2, there is a request where there are available connections in the connection pool, avoiding the need to use the message queue.

However, the efficiency of this approach comes with a cost. In the scenario described above, the service requested a POST operation to write file X to the database, but the connection pool was overloaded, triggering the queuing system and worker.

If the service requests an operation to read file X from the database in a short period of time, there is a chance that the queue and worker are still processing the data sent in the previous POST request, resulting in a 404 error in the response to the GET request.

One possible solution is to establish a WebSocket between the API and its consumer, notifying the consumer when the resource becomes available for reading. However, this is a case-specific addition to the structure.

In a choreographed microservices architecture, attention must be paid to the delay that the message queue can introduce. A delay in reading a resource that is supposed to already be available can corrupt the system as a whole if the services are not programmed to tolerate it.

Infrastructure

With a clearly defined database access model, it is possible to complement the API infrastructure by implementing a load balancer between the containers of each redundant application, in order to efficiently distribute traffic.

Security is also a critical aspect of any high-performance service. It is essential to implement mechanisms that ensure data security, logging, network monitoring, and protection against potential cyberattacks.

Contact me for business purposes or inquiries:

LinkedIn | GitHub

Book Recommendation

Scalability Rules: 50 Principles for Scaling Websites | Martin L. Abbott and Michael T. Fisher

The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise | Martin L. Abbott and Michael T. Fisher

Building Microservices: Designing Fine-Grained Systems | Sam Newman

Site Reliability Engineering: How Google Runs Production Systems | Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy

Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services | Brendan Burns

Scalable Internet Architectures | Theo Schlossnagle


Rômulo Pauliv

ML/AI Engineer | Data Scientist. Working with simulations in the field of Computational Physics.