Serverless App: AWS CloudTrail Log Analytics using Amazon Elasticsearch Service

Kuldeep
All things cloud
Dec 10, 2017

In this article, I’ll show how to build a serverless application using the AWS Serverless Application Model (SAM) to perform log analytics on AWS CloudTrail data with Amazon Elasticsearch Service. The application creates a CloudTrail trail, configures log delivery to an S3 bucket that it creates, and sets up an SNS notification whenever a CloudTrail log file is written to S3. It also creates an Amazon Elasticsearch Service domain and an AWS Lambda function that is triggered by the SNS message: the function reads the S3 file location from the message, fetches the log file from S3, and writes its records to Elasticsearch for analytics.

I referenced the GitHub repo below while writing this blog post:

Let’s first cover what AWS CloudTrail, Elasticsearch, Amazon Elasticsearch Service, AWS Lambda, and AWS SAM are.

What is AWS CloudTrail?

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. With CloudTrail, you can log, continuously monitor, and retain account activity related to actions across your AWS infrastructure. CloudTrail provides event history of your AWS account activity, including actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. This event history simplifies security analysis, resource change tracking, and troubleshooting.
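CloudTrail delivers this event history as gzipped JSON log files whose top-level "Records" array holds one record per API call. Here is a minimal sketch of that shape (the field values are made up for illustration; real records carry many more fields, such as userAgent, requestParameters, and responseElements):

```python
import json

# A hypothetical, heavily trimmed CloudTrail record.
sample_record = {
    "eventVersion": "1.05",
    "eventID": "b1c2d3e4-1111-2222-3333-444455556666",
    "eventTime": "2017-12-10T12:34:56Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetBucketAcl",
    "awsRegion": "us-east-1",
    "userIdentity": {"type": "IAMUser", "userName": "example-user"},
}

# Each delivered log file is a JSON document with a "Records" array.
log_file = {"Records": [sample_record]}
print(json.dumps(log_file["Records"][0]["eventName"]))
```

The Lambda function we build later iterates over exactly this "Records" array.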

What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

What is Amazon Elasticsearch Service?

Amazon Elasticsearch Service makes it easy to deploy, secure, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more. Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities alongside the availability, scalability, and security that production workloads require.

What is AWS Lambda?

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume — there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service — all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.

What is AWS Serverless Application Model?

AWS Serverless Application Model (AWS SAM) prescribes rules for expressing Serverless applications on AWS. The goal of AWS SAM is to define a standard application model for Serverless applications.

Now let’s look at how we can build a Serverless App to perform Log Analytics on AWS CloudTrail data using Amazon Elasticsearch Service.

This is the architecture of the CloudTrail Log Analytics Serverless Application:

Architecture for Serverless Application: CloudTrail Log Analytics using Elasticsearch

An AWS SAM template is an extension of an AWS CloudFormation template. Before we look at the code for the SAM template, let’s package our AWS Lambda function.

On your workstation, create a working folder for building the Serverless Application.

Create a file called index.py for the AWS Lambda function:

""" This module reads the SNS message to get the S3 file location for cloudtrail
log and stores into Elasticsearch. """

from __future__ import print_function
import json
import boto3
import logging
import datetime
import gzip
import urllib
import os
import traceback

from StringIO import StringIO
from exceptions import *

# from awses.connection import AWSConnection
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3', region_name=os.environ['AWS_REGION'])

awsauth = AWS4Auth(os.environ['AWS_ACCESS_KEY_ID'], os.environ['AWS_SECRET_ACCESS_KEY'], os.environ['AWS_REGION'], 'es', session_token=os.environ['AWS_SESSION_TOKEN'])
es = Elasticsearch(
hosts=[{'host': os.environ['es_host'], 'port': 443}],
http_auth=awsauth,
use_ssl=True,
verify_certs=True,
connection_class=RequestsHttpConnection
)

def handler(event, context):
logger.info('Event: ' + json.dumps(event, indent=2))

s3Bucket = json.loads(event['Records'][0]['Sns']['Message'])['s3Bucket'].encode('utf8')
s3ObjectKey = urllib.unquote_plus(json.loads(event['Records'][0]['Sns']['Message'])['s3ObjectKey'][0].encode('utf8'))

logger.info('S3 Bucket: ' + s3Bucket)
logger.info('S3 Object Key: ' + s3ObjectKey)

try:
response = s3.get_object(Bucket=s3Bucket, Key=s3ObjectKey)
content = gzip.GzipFile(fileobj=StringIO(response['Body'].read())).read()

for record in json.loads(content)['Records']:
recordJson = json.dumps(record)
logger.info(recordJson)
indexName = 'ct-' + datetime.datetime.now().strftime("%Y-%m-%d")
res = es.index(index=indexName, doc_type='record', id=record['eventID'], body=recordJson)
logger.info(res)
return True
except Exception as e:
logger.error('Something went wrong: ' + str(e))
traceback.print_exc()
return False
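The trickiest part of the handler is unpacking the SNS notification to find the log file. You can sanity-check that parsing step locally without AWS at all; here is a Python 3 sketch of just that logic, run against a made-up event that mimics the shape CloudTrail publishes to SNS:

```python
import json
from urllib.parse import unquote_plus


def parse_sns_event(event):
    """Extract the S3 bucket and object key from a CloudTrail SNS notification."""
    # The SNS Message field is itself a JSON string.
    message = json.loads(event['Records'][0]['Sns']['Message'])
    bucket = message['s3Bucket']
    # s3ObjectKey is a list; the key may be URL-encoded.
    key = unquote_plus(message['s3ObjectKey'][0])
    return bucket, key


# A made-up event mimicking what CloudTrail publishes to SNS.
fake_event = {
    "Records": [{
        "Sns": {
            "Message": json.dumps({
                "s3Bucket": "my-trail-bucket",
                "s3ObjectKey": [
                    "AWSLogs/123456789012/CloudTrail/us-east-1/2017/12/10/log.json.gz"
                ],
            })
        }
    }]
}

print(parse_sns_event(fake_event))
# → ('my-trail-bucket', 'AWSLogs/123456789012/CloudTrail/us-east-1/2017/12/10/log.json.gz')
```

The bucket name and key here are placeholders; in the deployed function the equivalent logic runs on Python 2.7, as shown above.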

Create a file called requirements.txt listing the Python packages that are needed:

elasticsearch>=5.0.0,<6.0.0
requests-aws4auth

With the requirements.txt file created in your workspace, run the command below to install the required packages into the working folder:

python -m pip install -r requirements.txt -t ./

Create a file called template.yaml that will contain the AWS SAM template:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: >
  This SAM example creates the following resources:

    S3 Bucket: S3 Bucket to hold the CloudTrail Logs
    CloudTrail: Creates a CloudTrail trail for all regions and configures it to deliver logs to the above S3 Bucket
    SNS Topic: Configures an SNS topic to receive notifications when a CloudTrail log file is created in S3
    Elasticsearch Domain: Creates an Elasticsearch Domain to hold the CloudTrail logs for advanced analytics
    IAM Role: Creates an IAM Role for Lambda execution and assigns read-only S3 permission
    Lambda Function: Creates a function which gets triggered when SNS receives a notification, reads the contents from S3 and stores them in the Elasticsearch Domain

Outputs:
  S3Bucket:
    Description: "S3 Bucket Name where CloudTrail Logs are delivered"
    Value: !Ref S3Bucket
  LambdaFunction:
    Description: "Lambda Function that reads CloudTrail logs and stores them into Elasticsearch Domain"
    Value: !GetAtt Function.Arn
  ElasticsearchUrl:
    Description: "Elasticsearch Domain Endpoint that you can use to access the CloudTrail logs and analyze them"
    Value: !GetAtt ElasticsearchDomain.DomainEndpoint

Resources:
  SNSTopic:
    Type: AWS::SNS::Topic
  SNSTopicPolicy:
    Type: "AWS::SNS::TopicPolicy"
    Properties:
      Topics:
        - Ref: "SNSTopic"
      PolicyDocument:
        Version: "2008-10-17"
        Statement:
          - Sid: "AWSCloudTrailSNSPolicy"
            Effect: "Allow"
            Principal:
              Service: "cloudtrail.amazonaws.com"
            Resource: "*"
            Action: "SNS:Publish"
  S3Bucket:
    Type: AWS::S3::Bucket
  S3BucketPolicy:
    Type: "AWS::S3::BucketPolicy"
    Properties:
      Bucket:
        Ref: S3Bucket
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: "AWSCloudTrailAclCheck"
            Effect: "Allow"
            Principal:
              Service: "cloudtrail.amazonaws.com"
            Action: "s3:GetBucketAcl"
            Resource:
              !Sub |-
                arn:aws:s3:::${S3Bucket}
          - Sid: "AWSCloudTrailWrite"
            Effect: "Allow"
            Principal:
              Service: "cloudtrail.amazonaws.com"
            Action: "s3:PutObject"
            Resource:
              !Sub |-
                arn:aws:s3:::${S3Bucket}/AWSLogs/${AWS::AccountId}/*
            Condition:
              StringEquals:
                s3:x-amz-acl: "bucket-owner-full-control"
  CloudTrail:
    Type: AWS::CloudTrail::Trail
    DependsOn:
      - SNSTopicPolicy
      - S3BucketPolicy
    Properties:
      S3BucketName:
        Ref: S3Bucket
      SnsTopicName:
        Fn::GetAtt:
          - SNSTopic
          - TopicName
      IsLogging: true
      EnableLogFileValidation: true
      IncludeGlobalServiceEvents: true
      IsMultiRegionTrail: true
  FunctionIAMRole:
    Type: "AWS::IAM::Role"
    Properties:
      Path: "/"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        - "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: "AllowLambdaServiceToAssumeRole"
            Effect: "Allow"
            Action:
              - "sts:AssumeRole"
            Principal:
              Service:
                - "lambda.amazonaws.com"
  ElasticsearchDomain:
    Type: AWS::Elasticsearch::Domain
    DependsOn:
      - FunctionIAMRole
    Properties:
      DomainName: "cloudtrail-log-analytics"
      ElasticsearchClusterConfig:
        InstanceCount: "2"
      EBSOptions:
        EBSEnabled: true
        Iops: 0
        VolumeSize: 20
        VolumeType: "gp2"
      AccessPolicies:
        Version: "2012-10-17"
        Statement:
          - Sid: "AllowFunctionIAMRoleESHTTPFullAccess"
            Effect: "Allow"
            Principal:
              AWS: !GetAtt FunctionIAMRole.Arn
            Action: "es:ESHttp*"
            Resource:
              !Sub |-
                arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/cloudtrail-log-analytics/*
          - Sid: "AllowFullAccesstoKibanaForEveryone"
            Effect: "Allow"
            Principal:
              AWS: "*"
            Action: "es:*"
            Resource:
              !Sub |-
                arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/cloudtrail-log-analytics/_plugin/kibana
      ElasticsearchVersion: "5.5"
  Function:
    Type: 'AWS::Serverless::Function'
    DependsOn:
      - ElasticsearchDomain
      - FunctionIAMRole
    Properties:
      Handler: index.handler
      Runtime: python2.7
      CodeUri: ./
      Role: !GetAtt FunctionIAMRole.Arn
      Events:
        SNSEvent:
          Type: SNS
          Properties:
            Topic: !Ref SNSTopic
      Environment:
        Variables:
          es_host:
            Fn::GetAtt:
              - ElasticsearchDomain
              - DomainEndpoint

Packaging artifacts and uploading them to S3:

Run the following command to upload your artifacts to S3 and output a packaged template that can be readily deployed to CloudFormation.

aws cloudformation package \
--template-file template.yaml \
--s3-bucket bucket-name \
--output-template-file serverless-output.yaml

Deploying the AWS SAM template with AWS CloudFormation:

You can use the aws cloudformation deploy CLI command to deploy the SAM template. Under the hood, it creates and executes a change set and waits until the deployment completes. It also prints debugging hints when the deployment fails. Run the following command to deploy the packaged template to a stack called cloudtrail-log-analytics:

aws cloudformation deploy \
--template-file serverless-output.yaml \
--stack-name cloudtrail-log-analytics \
--capabilities CAPABILITY_IAM
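Once the stack is up and logs are flowing, you can query the domain directly over HTTPS. As a hedged sketch, the snippet below builds a search body you could POST to https://&lt;domain-endpoint&gt;/&lt;index&gt;/_search to list recent console logins; the daily ct-YYYY-MM-DD index name follows the scheme the Lambda function uses, but the exact field mappings (e.g. whether eventName is queryable with a term filter) depend on how Elasticsearch 5.5 mapped your documents:

```python
import datetime
import json

# Index name scheme used by the Lambda function: one index per day.
index_name = 'ct-' + datetime.datetime.now().strftime('%Y-%m-%d')

# Hypothetical query body: the ten most recent ConsoleLogin events.
query = {
    "query": {"term": {"eventName": "ConsoleLogin"}},
    "sort": [{"eventTime": {"order": "desc"}}],
    "size": 10,
}

print(index_name)
print(json.dumps(query, indent=2))
```

You could send this with any signed HTTP client (e.g. the same requests-aws4auth setup the Lambda function uses), or explore the same data interactively in Kibana.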

Refer to the documentation for more details.

I recommend reading the documentation on Amazon Elasticsearch Service access policies and modifying the access policy of the Elasticsearch domain to further fine-tune who can reach it.

Once the serverless application is deployed in your AWS account, it will automatically store AWS CloudTrail data in Amazon Elasticsearch Service as soon as each log file is delivered to S3. With the data in Elasticsearch, you can use Kibana to visualize it and build the dashboards you need on top of your CloudTrail data.
