Solving SSL Handshake Delays with .NET Core and Proxies

Agoda Engineering
Agoda Engineering & Design
7 min read · Aug 31, 2023

by Phadtrapong Supakitudomkarn

Agoda offers more than 200 payment options to provide customers with the best travel experience. We use .NET Core to build payment services and connect with various third parties to handle payment processing.

Over the past few months, my team and I have been working on a project that involved integrating Agoda’s payment system with another company’s system. Although we encountered some challenging issues, we worked together seamlessly and successfully resolved them. Throughout this journey, we gained valuable insights and knowledge, which I am thrilled to share in this blog post.

How the payment system was set up

Our .NET Core API runs inside containers hosted across various servers and locations. These servers fully comply with Payment Card Industry (PCI) standards and offer a highly secure environment: no direct external internet connections are allowed.

How do we connect our services to the internet? That’s where our infrastructure team comes in. They provide what we call ‘proxy URLs’. The proxy URLs act as safe gateways for our services to reach the outside internet. The interesting part is that we have multiple proxy URLs for different regions.

This allows us to connect our services to the internet via the nearest proxy URL. For example, if our service is hosted in Asia, we can use a proxy URL from the Asia region for the best connection.

The Problem

Before launching this project, we observed an issue with our payment servers in a specific region. These servers encountered prolonged response times exceeding 30 seconds when making HTTP POST requests to a third-party API endpoint. Initially, we suspected the problem might lie with the third-party API itself. However, upon further investigation, we discovered that the API’s response times were normal and not the source of the delay.

To analyze the network connection to the third party, we needed to break it down into its individual events and pinpoint the one that consumed the most time. To achieve this, we did two things. First, we used the ‘tcpdump’ command to capture all outgoing packets directed towards the third-party IP. Second, we used the Application Insights library for .NET to monitor which external endpoints our payment API was connecting to and to measure the duration of each connection. This approach led us to identify key areas for further investigation.
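As a rough illustration of the second measurement, the sketch below times each outgoing request with a DelegatingHandler. It is not the Application Insights integration we used; ‘TimingHandler’ and its ‘report’ callback are hypothetical names introduced only for this example. Note that a handler like this only sees requests sent through the HttpClient it is attached to, so it would not observe connections that the runtime opens on its own.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: wraps another handler and reports how long each request took.
public sealed class TimingHandler : DelegatingHandler
{
    private readonly Action<Uri, TimeSpan> _report;

    public TimingHandler(HttpMessageHandler inner, Action<Uri, TimeSpan> report)
        : base(inner)
    {
        _report = report;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            // Delegate to the inner handler, which performs the actual request.
            return await base.SendAsync(request, cancellationToken);
        }
        finally
        {
            // Report the elapsed time whether the request succeeded or failed.
            stopwatch.Stop();
            _report(request.RequestUri, stopwatch.Elapsed);
        }
    }
}
```

A client could be built as new HttpClient(new TimingHandler(new HttpClientHandler(), (uri, elapsed) => Console.WriteLine($"{uri}: {elapsed.TotalMilliseconds:F0} ms"))). For the first request on a new connection, the elapsed time includes connection setup and the SSL handshake.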

Analyzing the tcpdump capture with Wireshark, we found that the SSL handshake between our container and the third-party API was taking far longer than it should have. Additionally, Application Insights flagged another concern: our servers were taking 30 seconds to establish a connection with a URL that we had never seen in our code. This unexpected delay seemed to occur alongside the third-party API connection.

Figure: tcpdump capture analyzed with Wireshark. The step from packet no. 7 to packet no. 8 took 30 seconds.

Before we explore the reason behind our servers connecting to the unfamiliar URL, let’s briefly revisit how SSL certificates work. Below is a simple overview to give you the big picture.

Imagine you’re planning a night out at a restaurant that claims to have a Michelin star — a hallmark of culinary excellence. This situation is similar to a website displaying an SSL certificate, which is a digital credential indicating that the site is secure and trustworthy. Both a Michelin star and an SSL certificate act as badges of approval from respected authorities: Michelin for restaurants and Certificate Authorities, or CAs, for websites.

Before you make a reservation, you’d probably verify the restaurant’s Michelin star status on the Michelin Guide’s official website. In a similar vein, your web browser consults a Certificate Authority to confirm a website’s SSL certificate is valid. These verification steps are crucial for ensuring you’re in for a high-quality experience, whether you’re dining out or browsing online.

Now, here’s an important point to consider: Just as a restaurant can lose its Michelin star if it doesn’t maintain certain standards, an SSL certificate can be revoked if a website becomes insecure. In the digital realm, there’s something called a Certificate Revocation List (CRL), which keeps track of certificates that have been invalidated.

Your web browser checks this list to make sure the site you’re visiting still meets security standards. Similarly, the Michelin Guide updates its list of starred establishments, revoking stars from restaurants that no longer make the grade. In both cases, ongoing verification is key to maintaining trust and ensuring a good experience.

Building on the Michelin star analogy, consider the extra steps your browser takes to maintain a secure connection to a website. Just as you would verify a restaurant’s Michelin star before dining, your browser performs a ‘CRL check’ during the SSL handshake to establish a secure HTTPS connection. The Certificate Revocation List (CRL) contains all the certificates that were initially issued but were later revoked for various reasons, such as suspected compromise or no longer being needed.

When your browser performs this ‘CRL check,’ it’s similar to ensuring you’re not dining at a restaurant that has lost its Michelin star. If the website’s certificate isn’t listed in the CRL, it’s like a restaurant retaining its Michelin star — giving you the green light for a secure, high-quality experience. You can then proceed to shop, browse, or provide sensitive information with the assurance that you’re in a secure and authentic environment.

Good news! We’ve figured out what that mysterious URL was about. It belonged to the Certificate Authority (CA), and reaching it for the CRL check required an external internet connection. However, one lingering question remained: why was the process taking so long?

So how did we fix this?

As we delved deeper into the issue, we discovered that the payment system within our server was attempting to establish a direct connection to the Certificate Authority (CA), bypassing the proxy provided by our infrastructure team. Interestingly, the same payment system hosted in a different region did not experience similar delays. We consulted with our infrastructure team and learned that this disparity was attributed to our network configuration.

We formulated a hypothesis to address this challenge while working within the existing network setup. What if we ensured that all third-party connections were routed through a proxy? With this idea in mind, we experimented with the .NET Core HttpClient in C# to validate its feasibility.

In our payment system’s code, we had already established a proxy for connecting to the API endpoint. This was done using the ‘Proxy’ property of the ‘HttpClientHandler’ class, which is described in detail in the Microsoft documentation.

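Here’s an example of how we set up the proxy in HttpClientHandler:

using System.Net;
using System.Net.Http;

var handler = new HttpClientHandler();
// Replace "port" with the proxy's actual port number.
handler.Proxy = new WebProxy("http://agoda-proxy-server:port");
var client = new HttpClient(handler);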

During CRL checks, .NET creates a separate HttpClient instance to connect to the CA. This new instance does not inherit the Proxy property from the HttpClientHandler used for the main HTTP request. As a result, the request to the CA does not go through the proxy we set up in our HttpClientHandler.

For more details, please visit the discussion on this GitHub issue.
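To make the distinction concrete, here is a minimal sketch of a client configured the way this problem arises: a handler-level proxy combined with revocation checking. The proxy address and target URL are hypothetical placeholders, not our real endpoints. The sketch only prints which proxy each setting resolves to; it makes no network calls.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical proxy address and target URL, for illustration only.
var proxy = new WebProxy("http://proxy.example.internal:3128");
var target = new Uri("https://api.example.com/payments");

var handler = new HttpClientHandler
{
    Proxy = proxy,                         // applies only to requests sent through this handler
    CheckCertificateRevocationList = true  // ask .NET to check revocation during the SSL handshake
};
using var client = new HttpClient(handler);

// The handler-level proxy is what the main request to the target would use.
Console.WriteLine($"Handler proxy: {handler.Proxy.GetProxy(target)}");

// The process-wide default proxy is resolved independently of the handler.
// If nothing is configured for the process, this prints an empty value.
Console.WriteLine($"Default proxy: {HttpClient.DefaultProxy.GetProxy(target)}");
```

If the second line prints nothing while the first prints the proxy address, any request that does not go through this handler, such as the CRL download described above, would attempt a direct connection.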

To ensure all HTTPS requests were routed via the proxy, we leveraged the static ‘DefaultProxy’ property of the ‘HttpClient’ class, part of .NET Core. More information about ‘DefaultProxy’ can be found in the Microsoft documentation. Our payment system operates within a Linux-based container. On Linux, the default proxy is initialized from environment variables, so we needed to set the ‘HTTPS_PROXY’ environment variable to the proxy URL.

Here’s an example of how you can set the variable that DefaultProxy reads, in a bash script on a Unix-based platform:

export HTTPS_PROXY=http://proxy_server:port
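The same default can also be assigned in code through the static HttpClient.DefaultProxy property. This is a sketch of that alternative, not what we deployed: the proxy address below is a hypothetical placeholder, and the assignment is assumed to run early during start-up, before any requests are sent.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical proxy address; in practice this would come from configuration.
HttpClient.DefaultProxy = new WebProxy("http://proxy.example.internal:3128");

// A client created without an explicit handler proxy falls back to DefaultProxy.
using var client = new HttpClient();

// Resolve which proxy would be used for a (hypothetical) HTTPS target.
var target = new Uri("https://api.example.com/payments");
var resolved = HttpClient.DefaultProxy.GetProxy(target);
Console.WriteLine($"Requests to {target.Host} would go via {resolved}");
```

The environment variable has the advantage of keeping region-specific proxy URLs out of the code, which suits a setup with a different proxy per region.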

Following these steps resolved the issue. Now, all HTTP requests are processed without delays, which confirms that the problem was linked to how our HTTPS requests were routed through the proxy.

Exploring an alternative solution: Bypassing CRL checks

An alternative way to solve this problem is to disable the CRL check. In .NET, this is controlled on the HTTP client through the HttpClientHandler.CheckCertificateRevocationList property.

Here’s how you can implement this in C#:

var handler = new HttpClientHandler();
// Skip the revocation (CRL) check during the SSL handshake.
handler.CheckCertificateRevocationList = false;
var client = new HttpClient(handler);

However, this approach reduces security, because we no longer check whether the certificate used to establish the connection has been revoked.

Conclusion

In summary, we successfully identified and resolved a latency issue in our payment system. The issue was caused by our network blocking the connections made to the CA during the SSL handshake. It’s important to note that this is not a commonly encountered problem. For those interested in a deeper understanding of this issue, further details can be found in the .NET runtime GitHub repository.

Acknowledgments

I’m truly grateful to my managers, Dawid Bojarowicz and Khan Shrouk, for their invaluable guidance and feedback. Special thanks to Janet John for editing this blog and to my Agoda colleagues who assisted with reviews.
