Breaking the Connection: How We Overcame Persistent HTTP Connection Issues When Migrating Traffic

Agoda Engineering
Agoda Engineering & Design
May 15, 2023

by Digvijay Singh and Jatin Garg

Introduction

In the world of technology, graceful termination refers to the safe shutdown of processes and connections by the operating system. This is crucial for preventing accidental data loss and other unexpected issues during intentional shutdowns.

However, for businesses like Agoda, persistent HTTP connections can pose a significant challenge: a connection stays open between the client and the server and keeps being reused, even after the server goes down or becomes unavailable.
Subsequent requests from the user are still directed to the same server, even when traffic is being routed to a different data center. This leads to a suboptimal user experience and can increase the risk of downtime.

In this blog, we explore Agoda’s approach to solving the persistent HTTP connection issue and its impact on the business. We discuss the importance of graceful termination, the challenges posed by persistent HTTP connections, and the steps we took to mitigate this issue. We also discuss the problems faced before and after implementing the solution and how the new approach has improved the user experience.

Background

Agoda’s Affiliate Program enables partners to integrate Agoda’s property data into their platforms and make bookings through an XML Data File/Feed and an intermediate Affiliate API that provides additional data such as room, rate, and allotment.

We use a geolocation-based F5 load balancer to route traffic to the nearest available server based on the user's geographical location. This approach improves the user experience by reducing latency and ensuring a faster response time. However, this setup has led to a persistent HTTP connection issue. The problem arises when a user establishes a connection with a server in a particular data center: subsequent requests from that user continue to be directed to the same server, even if traffic is being routed to a different data center.

Traffic routing is needed mainly when:

  1. the server goes down or becomes unavailable.
  2. the server is healthy, but a downstream dependency, blocking or non-blocking, has an issue. The non-blocking nature of the booking system exacerbates this problem because we cannot run a reliable health check.

Persistent HTTP connections can lead to a suboptimal user experience and potentially increase the risk of downtime.

The diagram illustrates the problem:

Old client connections should have moved to path 3 instead of persisting on path 1. In other words, old connections still go to the Primary DC, whereas new connections go to the Failover DC (2). As shown in the diagram, the issue with the Primary DC is that the Affiliate Service cannot connect to a downstream service. Because of this known issue, traffic is routed to the Failover DC. Old connections, however, keep going to the Primary DC, leading to booking loss.

Solving Persistent HTTP Connection Challenge

To address the issue of persistent HTTP connections in Agoda’s load-balancing system, several approaches can be explored. One possible solution is to utilize the headers available in the HTTP connection to reset the connection from the server side.

Closing via Connection header

  • Periodically: Send the header at a specified interval to reset the connection.
  • Trigger-based: Keep a boolean parameter that causes the header to be sent once it is set to true. This parameter may be updated via an API call or when traffic is routed to a different DC.
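
The two variants above can be sketched in a few lines of Python. This is an illustration under assumptions, not Agoda's code: the names `force_close` and `connection_headers` are hypothetical, and the caller is assumed to know how long the current connection has been open. The only protocol fact relied on is that a `Connection: close` response header signals that the connection will be closed once the response completes.

```python
import threading
from typing import Dict, Optional

# Hypothetical process-wide switch for the trigger-based variant. It could be
# set by an admin API call or by whatever signals that traffic has moved to
# another DC.
force_close = threading.Event()


def connection_headers(connection_age_seconds: float,
                       interval_seconds: Optional[float]) -> Dict[str, str]:
    """Return the extra response headers for one request.

    interval_seconds enables the periodic variant; None disables it.
    """
    periodic_due = (interval_seconds is not None
                    and connection_age_seconds >= interval_seconds)
    if force_close.is_set() or periodic_due:
        # Ask the client to stop reusing this connection.
        return {"Connection": "close"}
    return {}
```

With the switch cleared, a 10-second-old connection and a 60-second interval yield no extra header; once the age reaches the interval, or once `force_close.set()` is called, the response carries `Connection: close`.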

Closing via Keep-Alive header

  • Max Requests: An integer that is the maximum number of requests that can be sent on this connection before it is closed.
  • Timeout: An integer that is the time in seconds the host will allow an idle connection to remain open before it is closed.
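
For reference, these two parameters travel in a single header value of the form `Keep-Alive: timeout=5, max=1000`. The sketch below builds and parses that value; the function names are hypothetical and the numbers are placeholders, not values from Agoda's setup.

```python
from typing import Dict


def build_keep_alive(timeout_seconds: int, max_requests: int) -> str:
    """Format the value of a Keep-Alive response header."""
    return f"timeout={timeout_seconds}, max={max_requests}"


def parse_keep_alive(value: str) -> Dict[str, int]:
    """Parse a Keep-Alive header value into its integer parameters.

    Parameters without a numeric value are ignored.
    """
    params: Dict[str, int] = {}
    for part in value.split(","):
        name, sep, raw = part.strip().partition("=")
        if sep and raw.strip().isdigit():
            params[name.strip().lower()] = int(raw.strip())
    return params
```

These parameters are hints that the peer may or may not honor, which may be one reason this header behaves differently from `Connection: close` in practice.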

While testing these approaches, we observed that the Keep-Alive header could not reset the connection, whereas the Connection header could. Hence, we chose the Connection header approach to solve the issue.

Each variant of the Connection header approach has a trade-off. A periodic connection close header increases request time whenever a connection is reset, because the client has to establish a new connection, but it resets connections without any external dependency. The trigger-based variant keeps request time unchanged in normal operation, but it creates a dependency on an external party, which may or may not comply.

Implementation Details

To solve the persistent HTTP connection issue, we chose the periodic connection close header. To implement it, we create a middleware directive that is executed for each incoming HTTP request. This directive is responsible for checking whether a connection needs to be closed, based on the time elapsed since the connection was established.

To achieve this, we maintain an in-memory map where the key is the client IP address and the value is the start time of the connection. Whenever a new connection is established, we add an entry to the map with the client IP as the key and the current time as the value.

We also define a configurable TTL variable that holds the maximum lifetime of a connection. This variable determines whether a connection needs to be closed.

Every time the middleware directive is executed, it compares the TTL value with the difference between the current time and the connection start time stored for the client IP in the in-memory map. If the difference exceeds the TTL, the connection has persisted for too long and needs to be closed. In this case, the directive sends the connection close header to the client, initiating the connection closure process.
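
The check described above can be sketched as follows. This is a minimal illustration in Python rather than the actual directive: the class `ConnectionTtlTracker`, its method `headers_for`, and the injectable `clock` are hypothetical names. Dropping the map entry after the header is sent is also an assumption, made so that the client's next connection starts a fresh TTL window.

```python
import threading
import time
from typing import Callable, Dict


class ConnectionTtlTracker:
    """Tracks a start time per client IP and flags connections past the TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds            # configurable TTL
        self._clock = clock                       # injectable for testing
        self._started_at: Dict[str, float] = {}   # client IP -> start time
        self._lock = threading.Lock()             # requests may run concurrently

    def headers_for(self, client_ip: str) -> Dict[str, str]:
        """Call once per incoming request; returns extra response headers."""
        now = self._clock()
        with self._lock:
            # The first request seen from an IP records the start time.
            started = self._started_at.setdefault(client_ip, now)
            if now - started > self.ttl_seconds:
                # Assumption: forget the entry so the next connection
                # from this client gets a fresh TTL window.
                del self._started_at[client_ip]
                return {"Connection": "close"}
        return {}
```

One consequence of keying the map by client IP is that several connections from the same address, for example clients behind a shared NAT, share one timer. A sketch like this would also need some eviction of entries for clients that never return.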

Results

Traffic migration after solving the issue with the connection close header.

After implementing the solution of periodically closing persistent HTTP connections using a configurable TTL value and an in-memory map to track connection start times, we observed a significant improvement in the stability and reliability of Agoda’s load-balancing system.

  • We were able to address the persistent connection issue, which was causing a bottleneck in the system. By resetting connections periodically, we were able to distribute the load more evenly across multiple servers and reduce the risk of overloading any one server. This resulted in a more stable and scalable system overall.
  • We were able to implement graceful termination of applications. This means that when an application is shut down, it can notify the client to stop sending new requests to that application and allow existing requests to complete gracefully before shutting down completely. This prevents any data loss or errors that may occur when requests are abruptly terminated.
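
The graceful termination flow in the second point can be sketched as below. It is an illustration under assumptions, not Agoda's implementation: `GracefulDrainer` and its method names are hypothetical. During shutdown, every completed response carries `Connection: close` so that clients stop reusing the connection, and the process waits for in-flight requests to finish, up to a timeout.

```python
import threading
from typing import Dict


class GracefulDrainer:
    """Counts in-flight requests and drains them before shutdown."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def request_started(self) -> None:
        with self._cond:
            self._in_flight += 1

    def request_finished(self) -> Dict[str, str]:
        """Call when a response is ready; returns extra response headers."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()
            # While draining, tell the client not to reuse this connection.
            return {"Connection": "close"} if self._draining else {}

    def shutdown(self, timeout_seconds: float) -> bool:
        """Start draining; True if all in-flight requests finished in time."""
        with self._cond:
            self._draining = True
            return bool(self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout_seconds))
```

A server would call `request_started` and `request_finished` around each handler, and call `shutdown` from its termination hook before exiting.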

Overall, the implementation of periodically closing persistent HTTP connections was a success and resulted in a more stable and reliable load-balancing system for Agoda. By combining this with graceful termination of applications, we minimized downtime and reduced the risk of data loss or errors.

Conclusion

The issue of persistent HTTP connections can significantly impact the performance and reliability of a load-balancing system. It prevents traffic from being routed to the most optimal server and causes increased latency and potential service disruptions.

To mitigate this issue, we explored various solutions and ultimately implemented a periodic connection close header. This solution involved maintaining an in-memory map of client IP addresses and their connection start times, along with a configurable TTL variable that defines the maximum lifetime of a connection. A middleware directive then checks on each request whether the TTL has been exceeded and, if so, sends the connection close header to reset the connection.

Implementing this solution allowed us to effectively address the persistent HTTP connection issue, resulting in improved performance and reliability of our load-balancing system. Continuous monitoring and optimization of our systems remain crucial to ensure their efficient and effective operation.
