Setting up a Load-balanced Logstash behind an AWS ELB

At first glance, from a Beats perspective, Logstash appears to be built as a single-node application. With a few configuration tweaks, however, it is possible to run a load-balanced installation with multiple Logstash hosts sitting behind a load balancer.

In my case I have multiple hosts generating logs across the globe, and our existing design consisted of one Logstash instance per region. Though managing this setup with Ansible was quite possible, it was expensive and at the same time not failsafe: if the instance in a region crashed, that region's logs would not be transferred in time.

So we had the idea of creating a load-balanced Logstash installation behind an AWS ELB in an autoscaling group. This is how it looks:

Logstash running on EC2 instances behind an AWS ELB.

The autoscaling group scales out when the group's average CPU utilization exceeds 50%.
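As a sketch, a target-tracking scaling policy on the 50% CPU metric can be attached to the autoscaling group with the AWS CLI. The group name `logstash-asg` and policy name `logstash-cpu50` below are placeholders for whatever you named yours:

```shell
# Target-tracking configuration: keep average CPU across the
# autoscaling group at 50% (placeholder names throughout)
cat > cpu50-policy.json <<'EOF'
{
  "TargetValue": 50.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  }
}
EOF

# Attach the policy to the autoscaling group
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name logstash-asg \
  --policy-name logstash-cpu50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration file://cpu50-policy.json
```

With target tracking, AWS adds instances when the average rises above the target and removes them when it falls below, so no separate scale-in alarm is needed.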

Step 1: Instance Launch Template and Autoscaling Group

To create an autoscaling group we can start off by creating one of two things:

  1. Launch Configuration
  2. Launch Templates (newer)

Both are ways to define what an EC2 instance in that autoscaling group should look like. I use Launch Templates because they let me create new versions based on older ones, which helps me backtrack and retrieve the older configs I used, and those are sometimes better when you try out new things.
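This versioning workflow can be sketched with the AWS CLI; the template name `logstash-template`, the version numbers, and the instance type below are placeholders:

```shell
# New template data: change only the instance type; everything else
# is inherited from the source version (placeholder names and values)
template_data='{"InstanceType": "t3.medium"}'

aws ec2 create-launch-template-version \
  --launch-template-name logstash-template \
  --source-version 2 \
  --launch-template-data "$template_data"

# If the new version misbehaves, make the old version the default again
aws ec2 modify-launch-template \
  --launch-template-name logstash-template \
  --default-version 2
```

Since every version is kept, rolling back is just a matter of pointing the default at an earlier version number.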

Launch Template creation page

In the Launch Template, I specify the “user data”, a bash script that installs and sets up Logstash on the machine, and that retrieves the Logstash configuration from AWS Secrets Manager, which is a more secure place to store sensitive credentials than the “user data” itself.

Adding User data to Launch Template

Here’s my “user data” section:

#!/bin/bash
cd /tmp/

# Install Java and Logstash
sudo yum -y install java
wget https://artifacts.elastic.co/downloads/logstash/logstash-6.5.4.rpm
sudo rpm -vi logstash-6.5.4.rpm

# Fetch the temporary credentials for the instance's IAM role from the
# EC2 instance metadata service (fetch once, parse the three fields)
creds=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/my-logstash-role)
access_key=$(echo "$creds" | python -c 'import sys, json; print(json.load(sys.stdin)["AccessKeyId"])')
secret_key=$(echo "$creds" | python -c 'import sys, json; print(json.load(sys.stdin)["SecretAccessKey"])')
session_token=$(echo "$creds" | python -c 'import sys, json; print(json.load(sys.stdin)["Token"])')

mkdir -p /root/.aws/
echo "[default]
aws_access_key_id=$access_key
aws_secret_access_key=$secret_key
aws_session_token=$session_token
region=us-west-2" > /root/.aws/credentials

# Pull the Logstash pipeline configuration from AWS Secrets Manager
aws secretsmanager get-secret-value --secret-id My_Logstash_Configuration \
  | python -c 'import sys, json; print(json.load(sys.stdin)["SecretString"])' \
  > /etc/logstash/conf.d/log_config1.conf

# Install the geoip filter and the GeoLite2 databases it needs
/usr/share/logstash/bin/logstash-plugin install logstash-filter-geoip
sudo mkdir -p /usr/share/logstash/geoip/
sudo curl https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz | tar -xz -C /tmp/
sudo curl https://geolite.maxmind.com/download/geoip/database/GeoLite2-ASN.tar.gz | tar -xz -C /tmp/
sudo cp /tmp/GeoLite2-City*/GeoLite2-City.mmdb /usr/share/logstash/geoip/
sudo cp /tmp/GeoLite2-ASN*/GeoLite2-ASN.mmdb /usr/share/logstash/geoip/
sudo chown -R logstash:logstash /usr/share/logstash/geoip/

# Enable a disk-backed (persisted) queue so a surge of logs is buffered
echo "path.data: /var/lib/logstash
queue.type: persisted
path.queue: /tmp/logstash/queue
queue.max_bytes: 2gb
path.logs: /var/log/logstash" | sudo tee /etc/logstash/logstash.yml

sudo service logstash start

So here I’ve given my instance permission to retrieve secrets from AWS Secrets Manager by attaching an IAM service role to it.

So, the URL

169.254.169.254/latest/meta-data/iam/security-credentials/<role_name>

is used to retrieve the temporary access key ID, secret access key, and session token, which the aws cli then uses to fetch the secrets from Secrets Manager.
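The metadata endpoint returns a small JSON document. The values below are made up, but the field names are the ones the endpoint actually returns, and the one-liner is the same parsing trick used in the user data (shown here with python3 and parenthesized print, which works on both Python 2 and 3):

```shell
# Sample of the JSON returned by the metadata endpoint (values made up)
creds='{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAEXAMPLE",
  "SecretAccessKey": "wJalrEXAMPLEKEY",
  "Token": "FQoGZXIvYXdzEXAMPLETOKEN",
  "Expiration": "2019-04-27T22:39:16Z"
}'

# The same one-liner from the user data, run against the sample
access_key=$(echo "$creds" | python3 -c 'import sys, json; print(json.load(sys.stdin)["AccessKeyId"])')
echo "$access_key"
# -> ASIAEXAMPLE
```

Note that these credentials are temporary; the `Expiration` field tells you when they stop working, which is another reason a long-running box should rely on the role rather than baked-in keys.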

Since I use an Amazon Linux AMI, the aws cli comes pre-installed.

If you take a closer look at the user data, you’ll see that I’m enabling a persisted queue for Logstash. This lets me use smaller instance types without worrying about RAM when there’s a sudden surge in logs: the backlog is persisted to disk, and Logstash works through it once the load drops.

The security group on the EC2 instances in the launch template is what protects them from being accessed by just anyone, so make sure you configure it carefully; you wouldn’t want your Logstash to be exposed to everyone.

Step 2: ELB Configuration

You have to create a Network Load Balancer from the available load balancer types if you want to listen for Beats (Filebeat, Heartbeat, etc.), since Beats talk to Logstash over a plain TCP protocol rather than HTTP, and the NLB operates at layer 4.

We also have to create a target group for the load balancer, that is, the group of instances to which the load balancer forwards incoming requests, and then attach that target group to the autoscaling group.

Creating a target group for the network load balancer

Now you need to associate this with the autoscaling group.
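Both steps can be sketched with the AWS CLI. The names, the VPC ID, and the target group ARN below are placeholders; the real ARN comes back in the output of `create-target-group`:

```shell
# Create a TCP target group listening on the Beats port
# (names and VPC ID are placeholders for your environment)
beats_port=5044
aws elbv2 create-target-group \
  --name logstash-targets \
  --protocol TCP \
  --port "$beats_port" \
  --vpc-id vpc-0abc123 \
  --target-type instance

# Attach the target group to the autoscaling group so new instances
# get registered with the NLB automatically (ARN is a placeholder)
target_group_arn="arn:aws:elasticloadbalancing:us-west-2:...:targetgroup/logstash-targets/..."
aws autoscaling attach-load-balancer-target-groups \
  --auto-scaling-group-name logstash-asg \
  --target-group-arns "$target_group_arn"
```

Once attached, every instance the autoscaling group launches is registered with the target group, and instances it terminates are deregistered, with no manual bookkeeping.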

Step 3: The Filebeat Configuration

The Logstash configuration remains the same as it would be in a single-node deployment. What changes is the configuration of the Beats that connect to Logstash.
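For reference, a minimal pipeline config of the sort stored in the `My_Logstash_Configuration` secret might look like the following sketch. The Elasticsearch host, the index name, and the `clientip` source field are assumptions about your logs, not something the setup requires:

```
input {
  beats {
    port => 5044
  }
}

filter {
  geoip {
    source => "clientip"
    database => "/usr/share/logstash/geoip/GeoLite2-City.mmdb"
  }
}

output {
  elasticsearch {
    hosts => ["https://my-es-cluster:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```

The `beats` input on port 5044 is what the load balancer's target group forwards to, and the `geoip` filter uses the GeoLite2 database installed by the user data script.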

We would have to change the output section of the filebeat/heartbeat:

output.logstash:
  hosts: ["blah_blah_blah.elb.us-west-2.amazonaws.com:5044"]
  ttl: 120
  # ttl is not honored by the async Logstash client, so pipelining
  # must be disabled for the periodic reconnects to happen
  pipelining: 0

The hosts section contains the public DNS name of your Network Load Balancer. The ttl parameter is the important one in this setup: it tells the Beat to drop its TCP connection after 120 seconds and re-establish it to the Logstash servers.

Along with this, we also need to disable DNS caching on the machines running the Beats, so that the load balancer’s domain name doesn’t keep resolving to the same IP address as before, which would effectively pin each Beat to a single machine.

So now, every 120 seconds, each TCP connection is dropped and re-established through the load balancer, and this makes sure the load is spread evenly across the Logstash instances.