NGINX: Serving the Kenshoo Tag

Ifat Gavish
skai engineering blog
5 min read · Jul 29, 2019

The problem: supporting tens of thousands of download requests per second

Kenshoo enables customers to manage, optimize, and analyze their digital marketing activities across multiple advertising channels and devices.

It does so by embedding a tag (called “Kenshoo Tag”, or ktag for short) in every page of the customers’ websites.

The embedding of the tag in every page means that for every person who accesses a page in a customer’s website, the Kenshoo Tag is executed.

Running the tag is not a problem; it runs in the background on the client side, and doesn’t affect page performance or require server resources.

However, before the tag can run, the browser has to download the ktag.js file, which contains the code to run. And this is a problem — we have to make sure the file is delivered to each user, and fast, before they leave the page.
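As an illustration (the snippet and URL below are placeholders, not Kenshoo's actual embed code), such a tag is typically included with an async script element, so the download doesn't block page rendering:

```html
<!-- Hypothetical embed snippet; the URL is a placeholder -->
<script async src="https://tags.example.com/ktag.js"></script>
```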

We could potentially reach 20,000 download requests per second. This is a very high load, and it requires a good solution within a reasonable cost frame.

At Kenshoo, we considered two solutions to our problem before settling on the one that best suited our needs.

First Solution — A Content Delivery Network (CDN):

A CDN is a geographically distributed network of proxy servers and their data centers. At first, this seemed like the ideal solution to the problem. The whole purpose of a CDN is to distribute content (in our case, a file). It does so efficiently and reliably. A request is routed to the server with the lowest latency, and much like other cloud solutions, the service can be relied upon to operate without much effort.

Since a CDN was already in use at Kenshoo (Amazon's CloudFront, which serves the UI JavaScript pages), we decided to go with that.

However, this solution proved unsuitable when we checked the cost: it was very high, mostly due to the high request rate and the geographic distribution required. With that in mind, we decided to continue looking for a better solution.

Second Solution — NGINX

A CDN is one way to serve static files at a high rate. The second popular way is to use NGINX. NGINX is a free, open-source HTTP server and reverse proxy, known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.

Right from the start it was obvious that, cost wise, NGINX was definitely the better choice. The NGINX server itself is free (there is a paid version, but it wasn’t necessary for what we needed), so the cost we’d see would be only for the servers that run it. In our case we use AWS instances and an AWS ELB for load balancing. The cost for these servers is relatively low.

To achieve low latency across different geographical regions, we would also need to use a GeoDNS. Potentially, a GeoDNS can get a bit pricey as well, because just like with the CDN the request rate is factored into the price. However, even with the GeoDNS, the cost of this solution is considerably less than the cost of the CDN solution.

NGINX performance is also very good, due to its sophisticated single-threaded, event-driven architecture. A performance test showed that when using 6 c3.large AWS instances, NGINX can handle up to 30,000 requests per second with no degradation in response time. If we need more capacity, we simply add another instance (or another site in another geographical location).

A potential downside of this solution was the effort needed. As opposed to the CDN solution, here we would have to build the solution ourselves. We would need to implement the NGINX configuration and to handle the deployment and management.

As it turns out, this proved to be rather simple thanks to NGINX’s simplicity and Kenshoo’s proprietary microservice platform — “Microcosm”, which automates the deployment and pipeline of docker images on AWS.

Configuration

The NGINX configuration is quite simple. There is a primary configuration file, nginx.conf, and additional configuration files can be included. The configuration files consist of directives and their parameters. Directives are organized into groups known as contexts (such as http, server, location, etc.).
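As a sketch of how these contexts nest (the paths and server name below are placeholders, not Kenshoo's actual configuration), a minimal nginx.conf serving static files might look like:

```nginx
# Minimal illustrative nginx.conf; names and paths are placeholders.
user  nginx;
worker_processes  auto;          # one worker process per CPU core

events {                         # events context
    worker_connections  1024;
}

http {                           # http context
    include       mime.types;
    default_type  application/octet-stream;

    server {                     # server context
        listen       80;
        server_name  example.com;            # placeholder

        location / {             # location context
            root  /usr/share/nginx/html;     # placeholder docroot
        }
    }
}
```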

Performance configuration

location / {
    add_header Cache-Control public;
    expires 86400;
    gzip on;
    etag on;
    gzip_static on;
}

To improve performance, we enabled caching, gzip, and ETag. The “gzip_static on” directive tells NGINX to look for a pre-compressed version of the static files. We compress the files beforehand, so NGINX won’t have to do it at runtime.
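The pre-compression itself is a build/deploy-time step. A minimal sketch (ktag.js here stands in for the real deployed file) might look like:

```shell
# Hypothetical build step: create a stand-in for the deployed tag file.
printf 'console.log("ktag loaded");\n' > ktag.js

# Pre-compress at deploy time so NGINX's gzip_static directive can serve
# ktag.js.gz directly instead of compressing the file on every request.
gzip -9 -k ktag.js   # -k keeps the original ktag.js alongside ktag.js.gz

ls ktag.js ktag.js.gz
```

With gzip_static on, NGINX serves the .gz file to clients that send Accept-Encoding: gzip, and falls back to the uncompressed file otherwise.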

Logging configuration

log_not_found off;

map $status $loggable {
    ~^[23] 0;
    default 1;
}

error_log /var/log/app.log;
access_log /var/log/access.log acc_log_format if=$loggable;

Like most web and application servers, NGINX has two logs: an error log and an access log.

To improve performance, we wanted to make sure that only real errors appear in the logs (logging thousands of successful requests every second is redundant and inefficient).

The “log_not_found off” directive tells NGINX not to log 404 errors in the error log.

The map used in the access_log directive tells NGINX to only log requests that failed (received 4xx or 5xx errors, or more accurately, not 2xx and 3xx response codes).
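The access_log directive above references a named log_format, acc_log_format. The original definition isn't shown here, but a minimal sketch (the fields chosen are illustrative, not Kenshoo's actual format) could be:

```nginx
# Hypothetical definition of the named format; fields are illustrative.
log_format acc_log_format '$remote_addr [$time_local] '
                          '"$request" $status $body_bytes_sent';
```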

NGINX doesn't automatically rotate its logs. The NGINX master process must receive a specific signal (USR1) to reopen its log files. We decided to use the Linux utility logrotate for this purpose. You can see our logrotate configuration here.
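A minimal sketch of such a logrotate configuration (rotation schedule and retention are assumptions, not Kenshoo's actual settings) might look like:

```
# Hypothetical /etc/logrotate.d/nginx entry; schedule/retention are examples.
/var/log/access.log /var/log/app.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        # Tell NGINX to reopen its log files after rotation
        [ -f /var/run/nginx.pid ] && kill -USR1 "$(cat /var/run/nginx.pid)"
    endscript
}
```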

Monitoring configuration

server {
    listen 8081;
    location /nginx_status {
        stub_status on;
        access_log off;
        allow all;
    }
}

NGINX has a module called ngx_http_stub_status_module, which provides basic status information. We used this module and the Linux service collectd to collect request and connection statistics and send them to Hosted Graphite.
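Querying the status endpoint above (e.g. curl http://localhost:8081/nginx_status) returns a small plain-text report in this general shape (the numbers here are illustrative):

```
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
```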

The collectd service uses plugins. We used the nginx plugin to collect statistics from NGINX, the tail plugin to collect statistics from the NGINX logs (4xx and 5xx errors), and the write_graphite plugin to send the statistics to Hosted Graphite. You can see our collectd configuration here.
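A minimal sketch of wiring these two plugins together (the hostname and API-key prefix are placeholders, not Kenshoo's actual values) might look like:

```
# Hypothetical collectd.conf fragment; endpoint and prefix are placeholders.
LoadPlugin nginx
<Plugin nginx>
    # Points at the stub_status endpoint configured above
    URL "http://localhost:8081/nginx_status"
</Plugin>

LoadPlugin write_graphite
<Plugin write_graphite>
    <Node "hosted_graphite">
        Host "carbon.hostedgraphite.com"   # placeholder endpoint
        Port "2003"
        Protocol "tcp"
        Prefix "YOUR_API_KEY."             # placeholder metric prefix
    </Node>
</Plugin>
```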

In addition, we used collectd on the AWS instances to collect instance statistics such as CPU and memory.

The ngx_http_stub_status_module only provides basic statistics on requests and connections, and they cover the entire process rather than individual locations. To get better per-location statistics, we added the open-source module graphite-nginx-module.

Summary

NGINX turned out to be a good solution for our problem. It’s much cheaper than a CDN, its performance is very good, and it can scale easily by adding new instances (which happens automatically in AWS when needed). The development and management were also very simple due to NGINX’s simplicity.

As a bonus, NGINX provides additional capabilities that a CDN does not, such as reverse proxying and URL rewrites. We use these capabilities for advanced testing.
