Indexing performance boost and load balancing for Elasticsearch with NGINX as a sidecar of your application stack

Stefan Pöltl
Jul 31 · 4 min read

Feeding Elasticsearch works via its HTTP endpoint, reachable on port 9200 by default. Basically you can feed your data with plain HTTP/1.1 requests, which can be implemented in any language on any platform.

The benefit of the HTTP endpoint is the loose coupling between your application and the Elasticsearch cluster, plus the stateless communication. A big minus is that a new connection needs to be established for every request. So how can we reuse HTTP connections or keep them alive?
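Before reaching for a proxy, it helps to see what connection reuse means on the wire. The following sketch uses only the Python standard library (no Elasticsearch involved): it spins up a tiny local HTTP/1.1 server and sends five requests over one persistent connection, then counts how many TCP connections the server actually saw.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

connections = set()  # one entry per distinct TCP connection (client host, port)

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side

    def do_GET(self):
        connections.add(self.client_address)
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence request logging

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# One persistent HTTP/1.1 connection, reused for all five requests
conn = http.client.HTTPConnection("127.0.0.1", port)
for _ in range(5):
    conn.request("GET", "/")
    conn.getresponse().read()  # drain the response before reusing the socket
conn.close()
server.shutdown()

print(len(connections))  # 1 -> all five requests shared a single TCP connection
```

Without keep-alive, each request would open its own connection and the counter would read 5. This is exactly the saving the proxy setup below gives us on the Elasticsearch side.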

We need a proxy!!!

To keep connections alive we can use a transparent middleware: a reverse proxy. The simplest solution, also recommended by Elastic, is NGINX. NGINX is an open-source web server that scales well and supports multiple proxy setups.

Sidecar???

The sidecar pattern places a utility/tool beside your app to support it. In our case we can spin up an NGINX instance that speeds up the indexing performance of Elasticsearch.

Development setup

As usual for my development approach, I’m going to use a Docker-based setup to get a working example ramped up pretty fast. For testing purposes you should have curl available on your system. The docker-compose.yml looks like this:

version: '3'

services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    environment:
      - "discovery.type=single-node"
    ports:
      - 9200:9200

  proxy:
    image: nginx:1.17-alpine
    ports:
      - "8080:8080"
    volumes:
      - ./docker/vhost.conf:/etc/nginx/nginx.conf

The vhost configuration file for NGINX looks like this to work as a transparent proxy (note the empty events block — since the file is mounted as the main nginx.conf, NGINX refuses to start without one):

events {}

http {
    server {
        listen 8080;

        location / {
            proxy_pass http://es:9200;
        }
    }
}

Let’s test the setup with docker-compose up and a simple curl statement such as curl localhost:8080, which should return the familiar Elasticsearch cluster info through the proxy.

The NGINX proxy works as expected, and you can already use it for logging purposes. Now we want to tune the index feeding performance by keeping the HTTP connections alive:

events {
    worker_connections 1024;
}

http {
    client_max_body_size 50M;

    upstream elasticsearch {
        server es:9200;
        keepalive 15;
    }

    server {
        listen 8080;

        location / {
            proxy_pass http://elasticsearch;
            proxy_http_version 1.1;
            proxy_set_header Connection "Keep-Alive";
            proxy_set_header Proxy-Connection "Keep-Alive";
        }
    }
}

We can easily validate the result with curl:

curl localhost:9200/_nodes/stats/http?pretty | grep total_opened
Direct HTTP interface opened connection validation

As you can see, a new connection is opened for every request: the total_opened counter increases with each call. Let’s try the same call through the proxy:

curl localhost:8080/_nodes/stats/http?pretty | grep total_opened

Proxy opened connection validation

Now the HTTP connection is persistent! Less I/O and fewer TCP handshakes on the Elasticsearch side, and you end up with much better indexing performance.
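If you want to track the counter programmatically rather than eyeballing grep output, you can pull total_opened out of the node stats response. The sketch below parses a trimmed sample payload (the field names match the /_nodes/stats/http response, but the node id and the numbers here are made up):

```python
import json

# Trimmed sample of what GET /_nodes/stats/http returns;
# node id "abc123" and the counter values are illustrative only
sample = json.loads("""
{"nodes": {"abc123": {"name": "es01",
                      "http": {"current_open": 2, "total_opened": 17}}}}
""")

# Sum total_opened across all nodes: if this counter climbs with every
# request, connections are not being reused
total = sum(n["http"]["total_opened"] for n in sample["nodes"].values())
print(total)  # 17
```

Polling this value before and after a batch of requests gives you a quick regression check that the keep-alive setup is still working.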

Let’s discuss our NGINX settings to achieve this:

worker_connections 1024 -> the maximum number of connections per worker process, including both client connections and connections to proxied servers
keepalive 15 -> the number of idle keepalive connections to upstream servers kept open per worker process

Also be aware of the keepalive_timeout setting, which defaults to 75s. After 75 seconds without any HTTP call through the proxy, the idle connections get closed and a new one has to be opened for the next request. For most applications that interact with Elasticsearch regularly this works fine. But if you just have a single feeder that runs from time to time through the proxy, keep this setting in mind and adjust it if the gaps between data collection and feeding are too long.

For Elasticsearch Bulk API POST requests you need to be aware that NGINX has a client_max_body_size limit of 1MB by default. We increase it to 50MB in our example to support larger request bodies.
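A quick way to stay on the safe side of that limit is to check the size of the bulk body before sending it. The sketch below builds a Bulk API body in the newline-delimited format the API expects; the index name "logs" and the document shape are illustrative, not from this article:

```python
import json

# Hypothetical documents to index (shape and index name are illustrative)
docs = [{"id": i, "msg": "x" * 100} for i in range(1000)]

# Bulk API format: one action line followed by one source line per document,
# newline-delimited, with a mandatory trailing newline
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "logs", "_id": doc["id"]}}))
    lines.append(json.dumps(doc))
body = "\n".join(lines) + "\n"

MAX_BODY = 50 * 1024 * 1024  # must stay below NGINX's client_max_body_size
print(len(body.encode("utf-8")) < MAX_BODY)  # True for this small batch
```

If a batch would exceed the limit, split it into several bulk requests instead of raising client_max_body_size indefinitely — huge request bodies also put memory pressure on both NGINX and Elasticsearch.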

Load balancing your cluster

In a production setup you’re definitely running multiple nodes in a cluster (replication/performance). A must-have is a load balancer that distributes the requests across all the nodes. With NGINX it’s pretty easy: we just add all the nodes to the upstream block in the config and immediately get round-robin load balancing. First we need to change our docker-compose.yml to run three Elasticsearch nodes within one cluster:

version: '3'

services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=docker-cluster
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms256M -Xmx256M"
      - http.cors.enabled=true
      - http.cors.allow-origin=*
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=docker-cluster
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms256M -Xmx256M"
      - "discovery.zen.ping.unicast.hosts=es01"
      - http.cors.enabled=true
      - http.cors.allow-origin=*
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9201:9200

  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=docker-cluster
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms256M -Xmx256M"
      - "discovery.zen.ping.unicast.hosts=es01"
      - http.cors.enabled=true
      - http.cors.allow-origin=*
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9202:9200

  proxy:
    image: nginx:1.17-alpine
    ports:
      - "8080:8080"
    volumes:
      - ./docker/vhost.conf:/etc/nginx/nginx.conf

The NGINX configuration looks like this now:

events {
    worker_connections 1024;
}

http {
    client_max_body_size 50M;

    upstream elasticsearch {
        server es01:9200;
        server es02:9200;
        server es03:9200;
        keepalive 15;
    }

    server {
        listen 8080;

        location / {
            proxy_pass http://elasticsearch;
            proxy_http_version 1.1;
            proxy_set_header Connection "Keep-Alive";
            proxy_set_header Proxy-Connection "Keep-Alive";
        }
    }
}

Now we can debug the round robin solution with curl:

curl localhost:8080 | grep es0
Round robin with NGINX
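NGINX’s default upstream balancing is round robin: each request goes to the next server in turn. A minimal sketch of that selection logic, using the three node addresses from our config:

```python
from itertools import cycle

# The three upstream nodes from the NGINX config above
upstreams = ["es01:9200", "es02:9200", "es03:9200"]

# Round robin: hand out servers in order, wrapping back to the start
rr = cycle(upstreams)
chosen = [next(rr) for _ in range(6)]
print(chosen)
# ['es01:9200', 'es02:9200', 'es03:9200', 'es01:9200', 'es02:9200', 'es03:9200']
```

This is why the grep above shows a different node name on each call: every node receives every third request. NGINX also supports weighted and least-connections balancing if an even split doesn’t fit your cluster.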

Now we have a perfect setup to build our application on, with a performance-optimized Elasticsearch proxy. The example code can be found on:
