Bring the Best Out of NGINX

cwan-engineering · Clearwater Analytics Engineering · Apr 30, 2023

Nginx is one of the most widely used open-source web servers. It is primarily used as a reverse proxy, proxy cache, and load balancer to manage incoming requests. Nginx's event-driven, asynchronous architecture delivers high performance, low resource usage, and a high degree of concurrency.

Architecture Overview

Nginx includes a master process along with a set of worker processes, and it divides its work among those worker processes. Each worker process is independent and single threaded, and it can handle hundreds of connections simultaneously and efficiently. Each new connection creates another file descriptor and consumes a small amount of additional memory in the worker process, so there is little additional overhead per connection. Nginx can also use a disk-based cache to store responses and minimize the load on backend servers.

While the Nginx server architecture provides high performance out of the box, it comes with a wide set of configurations that can be fine-tuned to unlock its full potential.

At Clearwater, we use Nginx widely to host static content and as a reverse proxy. We have leveraged Nginx's configuration capabilities to solve various challenges: simplifying the configuration structure, fine-tuning caching, working with dynamic backends, and optimizing performance by adjusting various parameters. In this blog post, we will explore several ways to optimize Nginx configuration. We will assume you already have a basic understanding of Nginx configuration, so we will not cover the fundamentals.

Nginx Configuration Inheritance Model

Nginx configuration works with a set of contexts (written as blocks) organized in a hierarchical manner. Each directive is defined within a particular context, and directive values are inherited in a downward direction.

That means a directive value defined in a server block will be inherited by all the location blocks inside it, and any individual location block can override it.

In the example below, the proxy_read_timeout directive is defined in the http context, so it applies to all the servers and their locations. For the /xyz location context, we have overridden it with a different value.
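A minimal sketch of this inheritance (the timeout values and the backend name are illustrative):

http {
    # Inherited by every server and location below
    proxy_read_timeout 60s;

    server {
        listen 80;

        location / {
            # Uses the 60s value inherited from the http context
            proxy_pass http://backend;
        }

        location /xyz {
            # Overrides the inherited value for this location only
            proxy_read_timeout 300s;
            proxy_pass http://backend;
        }
    }
}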

Leveraging the inheritance model, we can keep Nginx configuration simple and concise.

Additionally, Nginx provides the “include” directive to pull in a set of configurations from another file. The included file must consist of syntactically correct directives and blocks, and the include directive can be placed in any type of context.

include <file>
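For example, a server block can pull its location definitions and other directives from a separate file (the file path here is illustrative):

server {
    listen 80;
    server_name example.com;

    # Directives and blocks defined in app.conf are inserted at this point
    include /etc/nginx/conf.d/app.conf;
}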

In the above example, the configuration defined inside “app.conf” will be included inside the server context. This is helpful when dealing with large and complex sets of configurations, because it keeps them maintainable by separating the overall configuration into multiple files.

Dynamic DNS Resolution

Have you ever faced a situation where your Nginx server suddenly started giving gateway timeouts (504), while your backend continued to work properly?

One of the possible reasons could be “Missing dynamic DNS resolution.”

When you configure your backend DNS name directly as the parameter of a proxy_pass directive, Nginx tries to resolve it only at startup, using the DNS resolver configured in /etc/resolv.conf.

Nginx then caches the resolved value until a restart occurs. That means it does not respect the TTL of the record, and if the IP of the load balancer behind that record changes, Nginx will continue to send requests to the already resolved IP, which leads to the gateway timeout error.

Since it is very common for backend services to be destroyed and re-created (especially in this era of infrastructure-as-code), it is important to solve this problem. Nginx offers a resolver directive to do so.

resolver: specifies the name servers that should be employed by Nginx to resolve hostnames.

To use the resolver:

a) Define a resolver directive with an appropriate name server in an appropriate context. In the example below, it is in the http context.

b) Use a variable to specify the domain name in the proxy_pass directive.
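A minimal sketch of both steps (the name server IP, the valid duration, and the backend hostname are illustrative):

http {
    # a) Name server used for runtime resolution; 'valid' optionally
    #    overrides the TTL-based caching duration
    resolver 10.0.0.2 valid=30s;

    server {
        listen 80;

        location / {
            # b) Using a variable defers resolution until the request is handled
            set $backend "backend.example.com";
            proxy_pass http://$backend:8080;
        }
    }
}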

If the proxy_pass value is defined as a variable, Nginx does not resolve the corresponding record at startup; instead, it uses the configured name server to resolve the name when it is referenced. It queries and caches the IP address along with its TTL value. When a new request is made, Nginx first checks whether the cached record's TTL has expired: if it has, Nginx queries the DNS server again, otherwise it uses the already cached IP. The “resolver” directive also comes with an optional “valid” parameter to override the caching duration.

Proxy Caching

Deploying Nginx as a reverse proxy provides a full set of caching features. To enable Nginx caching you just need to define two directives:

proxy_cache_path — This directive specifies a number of parameters: where the cache is stored on your file system, the name of the cache, the size of the in-memory zone for the metadata, and the size of the cache on disk.

proxy_cache — This activates caching of all content that matches the URL of the parent location context.
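A minimal sketch wiring these two directives together (the path, zone name, and sizes are illustrative):

http {
    # Cache location on disk, in-memory metadata zone (name:size),
    # and maximum cache size on disk
    proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m max_size=1g;

    server {
        location / {
            # Cache all content served through this location
            proxy_cache app_cache;
            proxy_pass http://backend;
        }
    }
}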

This minimum configuration can work for simple use cases, but there are always ways to optimize the configuration when the default behavior does not match the behavior that you want.

a) Cache only if it is popular

Have you ever run into cases where you are not sure how frequently a resource will be accessed, and so you want to cache it only if its access rate surpasses a given threshold (i.e., only if it is popular)?

Nginx provides the concept of delayed caching, which can be helpful for this. We can use the directive below to define it. It can be applied in http, server, and location blocks.

proxy_cache_min_uses <number>

This sets the number of requests after which the response will be cached. It can be of great use if you have a lot of content that is accessed only once or twice in an hour or a day.
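For example, a response could be cached only after it has been requested a handful of times (the threshold of 5 and the zone name are illustrative):

location / {
    proxy_cache app_cache;
    # Cache a response only after it has been requested at least 5 times
    proxy_cache_min_uses 5;
    proxy_pass http://backend;
}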

b) Cache Revalidation

Sometimes, a resource is large and might not change for a long period (even beyond its cacheable period). Cache revalidation can help to minimize upstream bandwidth and disk writes. In this case, we can use the following directive:

proxy_cache_revalidate <on | off>

The default value is “off”. It needs to be set to “on” to enable revalidation.

Instead of making a plain GET request to the origin server, Nginx makes a conditional request that fetches the resource content only if it has been modified since it was last cached at the proxy. It uses the If-Modified-Since HTTP header for this. The origin server then has the option to respond with a 304 Not Modified status without sending the content.
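A minimal sketch, reusing the illustrative cache zone from above and adding a cache lifetime so that entries actually become stale:

location / {
    proxy_cache app_cache;
    # Entries for 200 responses become stale after 10 minutes (illustrative)
    proxy_cache_valid 200 10m;
    # Refresh stale entries with conditional If-Modified-Since requests;
    # the origin can answer 304 Not Modified without resending the body
    proxy_cache_revalidate on;
    proxy_pass http://backend;
}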

c) Limit traffic (for the same resource) to the origin server in case of a MISS

Consider a use case where we get a lot of simultaneous requests for the same cacheable content that results in a MISS, which means all those requests will go to the origin server for the same resource. Nginx provides another directive to solve this:

proxy_cache_lock <on | off>

If it is on, only one request at a time will be allowed to populate a new cache element. What if that request takes too much time to populate the cache? Nginx provides another directive to control this:

proxy_cache_lock_timeout < time >

After this time, the request will be passed to the proxied server; however, the response will not be cached.
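A minimal sketch combining both directives (the 5-second timeout and the zone name are illustrative):

location / {
    proxy_cache app_cache;
    # Allow only one request at a time to populate a new cache element
    proxy_cache_lock on;
    # After 5s, waiting requests are passed to the origin server,
    # but their responses are not cached
    proxy_cache_lock_timeout 5s;
    proxy_pass http://backend;
}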

Static Content Caching

Static content is web content that remains the same over a period of time and doesn’t change frequently. Examples of static content include CSS, JavaScript files, images, videos, and other static documents like user manuals. If your website uses a lot of static content, then you can optimize its performance by enabling client-side caching where the browser stores copies of static content for faster access. Nginx provides the “expires” directive to enable client-side caching.

expires — This enables or disables adding or modifying the “Expires” and “Cache-Control” response header fields.

We should define an appropriate expiration based on the type of static content and its usage to achieve better performance. For instance, JS and CSS files can be cached forever, or for a long period of time, if their file names include a hash generated from their content (e.g., for Angular applications, webpack can be configured to include a hash in the generated file names). For icons and images, the expiration can be set to a lower value (or even higher if the naming is based on release versions).
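A sketch of how different expirations might be applied per content type (the root path, location patterns, and durations are illustrative):

server {
    root /var/www/app;

    # Hashed JS/CSS bundles can safely be cached for a long time
    location ~* \.(js|css)$ {
        expires 1y;
    }

    # Icons and images change more often, so use a shorter lifetime
    location ~* \.(png|jpg|jpeg|gif|ico|svg)$ {
        expires 7d;
    }
}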

Summary

Nginx is a performance-oriented and resource-optimized server. It comes with a wide set of configurations that can be kept simple using its inheritance-based configuration model and then fine-tuned for your specific use cases. It offers an impressive set of controls for static and proxy caching, and it can easily be configured with a name server to resolve DNS records at runtime while acting as a reverse proxy.

Note that this blog post only focuses on a few key aspects of the Nginx configuration. You can always explore the official documentation to see the complete set of contexts and directives.

About the Author

Prashant Mishra is a Senior Software Development Engineer at Clearwater Analytics, with over 10 years of experience working on large-scale distributed applications. His passion lies in creating innovative solutions to complex problems, with a particular interest in Java and cloud-related technologies. His experience in software development spans across multiple areas, including design, development, testing, and DevOps-related tasks.
In his free time, he enjoys expanding his knowledge by reading about new technologies and programming languages. He is also an active individual and likes to stay fit by practicing yoga and playing badminton.
When he’s not working or staying active, he enjoys spending quality time with his family.
