AWS Elasticsearch/Kibana for Serverless Log Aggregation

One of the most interesting applications I have architected in recent times is a completely serverless web application hosted on Amazon Web Services. “Serverless”, in addition to being a cool technology for me, is a real breadwinner for my family. Besides that, it has the potential to save my clients hundreds of thousands of dollars over the months to come. All this comes with the built-in high performance of DynamoDB, AWS Lambda, CloudFront and other cloud-native technologies. I could go on with my sales pitch for FastUp and Amazon Web Services, so I won’t get started.

Getting back to the topic: for a serious enterprise (think Fortune 1000), the serverless paradigm brings great advantages, and it also calls for a new way of thinking about compute, storage, content and peripheral solutions such as backup, log visualization and security. Today’s post is about Log Visualization. Once you go serverless, application and infrastructure logs end up in a variety of places, which makes it very hard for developers and operators to locate and search them when responding to production incidents. There are many third-party solutions, such as Splunk, that provide complete Log Visualization services. While these third-party solutions do provide superior features, they can be very expensive. Elasticsearch and Kibana, by contrast, are open source solutions that come very close to the required functionality and are very inexpensive to run (for our use case). Amazon provides Elasticsearch with Kibana as a service under its group of “Analytics” services. Elasticsearch and Kibana are very versatile products, and one should do independent research on their capabilities. For this post, I will focus on how we used them for Log Visualization. Here is a diagram.

Fig. Here is how our (mostly) Serverless stack sends log events to Elasticsearch via Kinesis Firehose and Lambda

Overview

On the left of the diagram are all the services we have implemented for our Serverless stack. Note that there are a few EC2 instances; those are for a product's legacy authentication solution, which we will be replacing with AWS Cognito very soon. The one service that is not shown is S3, where we store our HTML and JavaScript as well as User Generated Content. We don’t push S3 access logs into Elasticsearch simply because we don’t need them for day-to-day log visualization. We may need S3 access logs for forensics in case of a security breach. For that, we have a completely different solution.

End User Access Logs

The top two boxes on the left (in the diagram) are CloudFront and the Application Load Balancers. When an end user downloads a file from CloudFront or attempts to authenticate against our Legacy Authentication Solution, these two services send access logs to an S3 bucket. CloudFront has a well-defined log format and so does the ALB. That S3 bucket is configured to notify a couple of Python Lambda functions whenever new log files are written to it. Each Lambda function transforms the log messages into JSON and pushes them into a Kinesis Firehose Delivery Stream.
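As a rough illustration of the CloudFront half of this step, here is a minimal Python sketch. It is not our production function. It assumes the CloudFront standard log format (a gzipped, tab-separated file whose "#Fields:" header names the columns), and the function names and the "app-logs" delivery stream name are made up for the example. The ALB log format is different and would need its own parser.

```python
import gzip
import json
from urllib.parse import unquote_plus


def parse_cloudfront_log(raw: bytes) -> list:
    """Turn one gzipped CloudFront standard log file into a list of dicts."""
    text = gzip.decompress(raw).decode("utf-8")
    fields = []
    events = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            # The header names the tab-separated columns that follow.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            events.append(dict(zip(fields, line.split("\t"))))
    return events


def to_firehose_records(events: list) -> list:
    """Encode each event as a JSON document in the shape Firehose expects."""
    return [{"Data": json.dumps(e).encode("utf-8")} for e in events]


def handler(event, context, s3=None, firehose=None, stream="app-logs"):
    # "app-logs" is a hypothetical delivery stream name.
    # boto3 ships with the Lambda Python runtime; it is imported lazily so
    # the parsing helpers above can be exercised without it.
    if s3 is None or firehose is None:
        import boto3
        s3 = s3 or boto3.client("s3")
        firehose = firehose or boto3.client("firehose")
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(rec["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        records = to_firehose_records(parse_cloudfront_log(body))
        # put_record_batch accepts at most 500 records per call.
        for i in range(0, len(records), 500):
            firehose.put_record_batch(
                DeliveryStreamName=stream, Records=records[i:i + 500]
            )
```

Reading the column names from the "#Fields:" header, rather than hard-coding them, keeps the sketch working if CloudFront adds fields to the format.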

API Access Logs

The third box is the API Gateway, which writes its access and error logs to a CloudWatch Log Stream in a Log Group. The CloudWatch Log Group name is derived from the API Gateway id and stage and hence does not change. This allows us to easily hook these logs up to another Python Lambda function, which transforms log messages to JSON and pushes them to the Kinesis Firehose Delivery Stream.
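To give an idea of what such a function has to deal with: when CloudWatch Logs invokes a Lambda function, the log events arrive base64-encoded and gzip-compressed under event["awslogs"]["data"]. The sketch below only shows the unpacking step. The output field names (log_group, log_stream and so on) are my own choice for illustration, not necessarily what we index.

```python
import base64
import gzip
import json


def decode_cloudwatch_event(event: dict) -> list:
    """Unpack a CloudWatch Logs subscription event into JSON-ready dicts."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []  # control messages carry no log lines
    return [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp": e["timestamp"],  # epoch milliseconds
            "message": e["message"],
        }
        for e in payload["logEvents"]
    ]
```

Each returned dict can then be serialized with json.dumps and pushed to the delivery stream in the same way as the access logs.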

Application Event Log

Finally, the two bottom boxes on the left are all our compute (Lambda and EC2). Within our code, we send all log events as they arise into the Kinesis Firehose Delivery Stream using the AWS SDK for .NET. These log events are already formatted as JSON.
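Our application code is .NET, but to keep the examples in this post in one language, here is the same idea sketched in Python. The event fields (@timestamp, level, message) and the helper names are assumptions for illustration. The only real API call is the Firehose client's put_record, and the client is passed in so the helpers can be tried without AWS credentials.

```python
import json
from datetime import datetime, timezone


def build_log_event(level, message, **fields):
    """Build a JSON-serializable log event; the field names are illustrative."""
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
    }
    event.update(fields)
    return event


def send_log_event(firehose, stream_name, event):
    """Push one event into a Firehose delivery stream.

    `firehose` is expected to be a boto3 Firehose client,
    e.g. boto3.client("firehose").
    """
    firehose.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": json.dumps(event).encode("utf-8")},
    )
```

A call would look something like send_log_event(client, "app-logs", build_log_event("ERROR", "payment failed", request_id="abc")), where "app-logs" is a hypothetical stream name.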

Delivery to Elasticsearch

The Kinesis Firehose Delivery Stream is configured to deliver data into an AWS Elasticsearch cluster once it has buffered 1 MB of data or a minute has passed, whichever comes first (this is called buffering). This way, all logs reach Elasticsearch within about a minute to a minute and a half.
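For reference, the buffering settings described above map onto the BufferingHints block of the Elasticsearch destination configuration. The fragment below is a sketch of just that block as it would be passed to boto3's create_delivery_stream. The other required settings (role, domain, index name, S3 backup) are left out on purpose, since they are specific to each setup.

```python
# Fragment of the ElasticsearchDestinationConfiguration argument to
# firehose.create_delivery_stream; all other keys are omitted here.
buffering_hints = {
    "IntervalInSeconds": 60,  # flush at least once a minute
    "SizeInMBs": 1,           # or as soon as 1 MB has accumulated
}
```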

Log Visualization

AWS Elasticsearch comes with a Kibana front end where our developers go and search for log events. It is blazing fast and very featureful in terms of filtering for interesting logs, free text search, building visualizations of certain metrics and so on. Elasticsearch has a concept of index templates, which we have implemented so that logs are indexed the way we like and are sorted by log event time.
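To make the index template idea concrete, here is a minimal sketch of a template body, built as a Python dict and printed as JSON. This is not our actual template. The "logs-*" pattern and the field names are hypothetical, and the layout assumes the Elasticsearch 7.x style of the _template API; older versions use a "template" key instead of "index_patterns" and expect a mapping type name under "mappings". Check the version of your cluster before using it.

```python
import json

# Hypothetical template: applies to any index whose name starts with "logs-".
index_template = {
    "index_patterns": ["logs-*"],
    "mappings": {
        "properties": {
            # Mapping the event time as a date lets Kibana use it as the
            # time field for filtering and sorting.
            "@timestamp": {"type": "date"},
            "level": {"type": "keyword"},  # exact-match filtering
            "message": {"type": "text"},   # free text search
        }
    },
}

print(json.dumps(index_template, indent=2))
```

The printed JSON would be the body of a PUT request to the cluster's _template endpoint.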

The only disadvantage of using AWS’ Elasticsearch/Kibana setup is that AWS does not yet support X-Pack and hence we cannot do user authentication for the Kibana front end. That is not fun for log events that are security related or sensitive in nature. For the majority of enterprise use cases, this should not be a big deal yet. Hopefully, AWS will release an X-Pack update for Elasticsearch. Or will they?

As I mentioned previously, there are other alternatives to using AWS’ Elasticsearch. There may be supported solutions in the AWS Marketplace or third party services such as Splunk.