Serverless App: AWS CloudTrail Log Analytics using Amazon Elasticsearch Service

Kuldeep
All things cloud
Dec 10, 2017

In this article, I’ll show how to build a serverless application using the AWS Serverless Application Model (SAM) to perform log analytics on AWS CloudTrail data with Amazon Elasticsearch Service. The application creates a CloudTrail trail, configures log delivery to an S3 bucket that it creates, and sets up an SNS notification whenever a CloudTrail log file is written to S3. It also creates an Amazon Elasticsearch Service domain and an AWS Lambda function that is triggered by the SNS message: the function reads the S3 file location from the message, fetches the log file from S3, and writes its records to Elasticsearch for analytics.

I referenced the GitHub repo below while writing this blog post:

Let’s first cover what AWS CloudTrail, Elasticsearch, Amazon Elasticsearch Service, AWS Lambda, and AWS SAM are.

What is AWS CloudTrail?

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. With CloudTrail, you can log, continuously monitor, and retain account activity related to actions across your AWS infrastructure. CloudTrail provides event history of your AWS account activity, including actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. This event history simplifies security analysis, resource change tracking, and troubleshooting.
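CloudTrail delivers this event history as gzipped JSON log files whose top-level "Records" array holds one record per API call. Here is a minimal sketch of that shape (the field values are made up for illustration; real records carry many more fields, such as userAgent, requestParameters, and responseElements):

```python
import json

# A hypothetical, heavily trimmed CloudTrail record.
sample_record = {
    "eventVersion": "1.05",
    "eventID": "b1c2d3e4-1111-2222-3333-444455556666",
    "eventTime": "2017-12-10T12:34:56Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetBucketAcl",
    "awsRegion": "us-east-1",
    "userIdentity": {"type": "IAMUser", "userName": "example-user"},
}

# Each delivered log file is a JSON document with a "Records" array.
log_file = {"Records": [sample_record]}
print(json.dumps(log_file["Records"][0]["eventName"]))
```

The Lambda function we build later iterates over exactly this "Records" array.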

What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

What is Amazon Elasticsearch Service?

Amazon Elasticsearch Service makes it easy to deploy, secure, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more. Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities alongside the availability, scalability, and security that production workloads require.

What is AWS Lambda?

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume — there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service — all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.

What is AWS Serverless Application Model?

AWS Serverless Application Model (AWS SAM) prescribes rules for expressing Serverless applications on AWS. The goal of AWS SAM is to define a standard application model for Serverless applications.

Now let’s look at how we can build a Serverless App to perform Log Analytics on AWS CloudTrail data using Amazon Elasticsearch Service.

This is the architecture of the CloudTrail Log Analytics Serverless Application:

Architecture for Serverless Application: CloudTrail Log Analytics using Elasticsearch

An AWS SAM template is an extension of an AWS CloudFormation template. Before we look at the code for the SAM template, let’s package our AWS Lambda function.

On your workstation, create a working folder for building the Serverless Application.

Create a file called index.py for the AWS Lambda function:

""" This module reads the SNS message to get the S3 file location for cloudtrail
log and stores into Elasticsearch. """

from __future__ import print_function
import json
import boto3
import logging
import datetime
import gzip
import urllib
import os
import traceback

from StringIO import StringIO
from exceptions import *

# from awses.connection import AWSConnection
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3', region_name=os.environ['AWS_REGION'])

awsauth = AWS4Auth(os.environ['AWS_ACCESS_KEY_ID'], os.environ['AWS_SECRET_ACCESS_KEY'], os.environ['AWS_REGION'], 'es', session_token=os.environ['AWS_SESSION_TOKEN'])
es = Elasticsearch(
hosts=[{'host': os.environ['es_host'], 'port': 443}],
http_auth=awsauth,
use_ssl=True,
verify_certs=True,
connection_class=RequestsHttpConnection
)

def handler(event, context):
logger.info('Event: ' + json.dumps(event, indent=2))

s3Bucket = json.loads(event['Records'][0]['Sns']['Message'])['s3Bucket'].encode('utf8')
s3ObjectKey = urllib.unquote_plus(json.loads(event['Records'][0]['Sns']['Message'])['s3ObjectKey'][0].encode('utf8'))

logger.info('S3 Bucket: ' + s3Bucket)
logger.info('S3 Object Key: ' + s3ObjectKey)

try:
response = s3.get_object(Bucket=s3Bucket, Key=s3ObjectKey)
content = gzip.GzipFile(fileobj=StringIO(response['Body'].read())).read()

for record in json.loads(content)['Records']:
recordJson = json.dumps(record)
logger.info(recordJson)
indexName = 'ct-' + datetime.datetime.now().strftime("%Y-%m-%d")
res = es.index(index=indexName, doc_type='record', id=record['eventID'], body=recordJson)
logger.info(res)
return True
except Exception as e:
logger.error('Something went wrong: ' + str(e))
traceback.print_exc()
return False
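The trickiest part of the handler is unpacking the SNS notification to find the log file. You can sanity-check that parsing step locally without AWS at all; here is a Python 3 sketch of just that logic, run against a made-up event that mimics the shape CloudTrail publishes to SNS:

```python
import json
from urllib.parse import unquote_plus


def parse_sns_event(event):
    """Extract the S3 bucket and object key from a CloudTrail SNS notification."""
    # The SNS Message field is itself a JSON string.
    message = json.loads(event['Records'][0]['Sns']['Message'])
    bucket = message['s3Bucket']
    # s3ObjectKey is a list; the key may be URL-encoded.
    key = unquote_plus(message['s3ObjectKey'][0])
    return bucket, key


# A made-up event mimicking what CloudTrail publishes to SNS.
fake_event = {
    "Records": [{
        "Sns": {
            "Message": json.dumps({
                "s3Bucket": "my-trail-bucket",
                "s3ObjectKey": [
                    "AWSLogs/123456789012/CloudTrail/us-east-1/2017/12/10/log.json.gz"
                ],
            })
        }
    }]
}

print(parse_sns_event(fake_event))
# → ('my-trail-bucket', 'AWSLogs/123456789012/CloudTrail/us-east-1/2017/12/10/log.json.gz')
```

The bucket name and key here are placeholders; in the deployed function the equivalent logic runs on Python 2.7, as shown above.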

Create a file called requirements.txt listing the Python packages that are needed:

elasticsearch>=5.0.0,<6.0.0
requests-aws4auth

With the requirements.txt file created in your workspace, run the command below to install the required packages into the working folder:

python -m pip install -r requirements.txt -t ./

Create a file called template.yaml that will contain the AWS SAM template:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: >
  This SAM example creates the following resources:

    S3 Bucket: S3 Bucket to hold the CloudTrail Logs
    CloudTrail: Creates a CloudTrail trail for all regions and configures it to deliver logs to the above S3 Bucket
    SNS Topic: Configures an SNS topic to receive notifications when a CloudTrail log file is created in S3
    Elasticsearch Domain: Creates an Elasticsearch Domain to hold the CloudTrail logs for advanced analytics
    IAM Role: Creates an IAM Role for Lambda execution and assigns read-only S3 permission
    Lambda Function: Creates a function which gets triggered when SNS receives a notification, reads the contents from S3 and stores them in the Elasticsearch Domain

Outputs:
  S3Bucket:
    Description: "S3 Bucket Name where CloudTrail Logs are delivered"
    Value: !Ref S3Bucket
  LambdaFunction:
    Description: "Lambda Function that reads CloudTrail logs and stores them into Elasticsearch Domain"
    Value: !GetAtt Function.Arn
  ElasticsearchUrl:
    Description: "Elasticsearch Domain Endpoint that you can use to access the CloudTrail logs and analyze them"
    Value: !GetAtt ElasticsearchDomain.DomainEndpoint

Resources:
  SNSTopic:
    Type: AWS::SNS::Topic
  SNSTopicPolicy:
    Type: "AWS::SNS::TopicPolicy"
    Properties:
      Topics:
        - Ref: "SNSTopic"
      PolicyDocument:
        Version: "2008-10-17"
        Statement:
          - Sid: "AWSCloudTrailSNSPolicy"
            Effect: "Allow"
            Principal:
              Service: "cloudtrail.amazonaws.com"
            Resource: "*"
            Action: "SNS:Publish"
  S3Bucket:
    Type: AWS::S3::Bucket
  S3BucketPolicy:
    Type: "AWS::S3::BucketPolicy"
    Properties:
      Bucket:
        Ref: S3Bucket
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: "AWSCloudTrailAclCheck"
            Effect: "Allow"
            Principal:
              Service: "cloudtrail.amazonaws.com"
            Action: "s3:GetBucketAcl"
            Resource:
              !Sub |-
                arn:aws:s3:::${S3Bucket}
          - Sid: "AWSCloudTrailWrite"
            Effect: "Allow"
            Principal:
              Service: "cloudtrail.amazonaws.com"
            Action: "s3:PutObject"
            Resource:
              !Sub |-
                arn:aws:s3:::${S3Bucket}/AWSLogs/${AWS::AccountId}/*
            Condition:
              StringEquals:
                s3:x-amz-acl: "bucket-owner-full-control"
  CloudTrail:
    Type: AWS::CloudTrail::Trail
    DependsOn:
      - SNSTopicPolicy
      - S3BucketPolicy
    Properties:
      S3BucketName:
        Ref: S3Bucket
      SnsTopicName:
        Fn::GetAtt:
          - SNSTopic
          - TopicName
      IsLogging: true
      EnableLogFileValidation: true
      IncludeGlobalServiceEvents: true
      IsMultiRegionTrail: true
  FunctionIAMRole:
    Type: "AWS::IAM::Role"
    Properties:
      Path: "/"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        - "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: "AllowLambdaServiceToAssumeRole"
            Effect: "Allow"
            Action:
              - "sts:AssumeRole"
            Principal:
              Service:
                - "lambda.amazonaws.com"
  ElasticsearchDomain:
    Type: AWS::Elasticsearch::Domain
    DependsOn:
      - FunctionIAMRole
    Properties:
      DomainName: "cloudtrail-log-analytics"
      ElasticsearchClusterConfig:
        InstanceCount: "2"
      EBSOptions:
        EBSEnabled: true
        Iops: 0
        VolumeSize: 20
        VolumeType: "gp2"
      AccessPolicies:
        Version: "2012-10-17"
        Statement:
          - Sid: "AllowFunctionIAMRoleESHTTPFullAccess"
            Effect: "Allow"
            Principal:
              AWS: !GetAtt FunctionIAMRole.Arn
            Action: "es:ESHttp*"
            Resource:
              !Sub |-
                arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/cloudtrail-log-analytics/*
          - Sid: "AllowFullAccesstoKibanaForEveryone"
            Effect: "Allow"
            Principal:
              AWS: "*"
            Action: "es:*"
            Resource:
              !Sub |-
                arn:aws:es:${AWS::Region}:${AWS::AccountId}:domain/cloudtrail-log-analytics/_plugin/kibana
      ElasticsearchVersion: "5.5"
  Function:
    Type: 'AWS::Serverless::Function'
    DependsOn:
      - ElasticsearchDomain
      - FunctionIAMRole
    Properties:
      Handler: index.handler
      Runtime: python2.7
      CodeUri: ./
      Role: !GetAtt FunctionIAMRole.Arn
      Events:
        SNSEvent:
          Type: SNS
          Properties:
            Topic: !Ref SNSTopic
      Environment:
        Variables:
          es_host:
            Fn::GetAtt:
              - ElasticsearchDomain
              - DomainEndpoint

Packaging artifacts and uploading them to S3:

Run the following command to upload your artifacts to S3 and output a packaged template that can be readily deployed to CloudFormation.

aws cloudformation package \
--template-file template.yaml \
--s3-bucket bucket-name \
--output-template-file serverless-output.yaml

Deploying the AWS SAM template with AWS CloudFormation:

You can use the aws cloudformation deploy CLI command to deploy the SAM template. Under the hood, it creates and executes a change set and waits until the deployment completes. It also prints debugging hints when the deployment fails. Run the following command to deploy the packaged template to a stack called cloudtrail-log-analytics:

aws cloudformation deploy \
--template-file serverless-output.yaml \
--stack-name cloudtrail-log-analytics \
--capabilities CAPABILITY_IAM
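Once the stack is up and logs are flowing, you can query the domain directly over HTTPS. As a hedged sketch, the snippet below builds a search body you could POST to https://&lt;domain-endpoint&gt;/&lt;index&gt;/_search to list recent console logins; the daily ct-YYYY-MM-DD index name follows the scheme the Lambda function uses, but the exact field mappings (e.g. whether eventName is queryable with a term filter) depend on how Elasticsearch 5.5 mapped your documents:

```python
import datetime
import json

# Index name scheme used by the Lambda function: one index per day.
index_name = 'ct-' + datetime.datetime.now().strftime('%Y-%m-%d')

# Hypothetical query body: the ten most recent ConsoleLogin events.
query = {
    "query": {"term": {"eventName": "ConsoleLogin"}},
    "sort": [{"eventTime": {"order": "desc"}}],
    "size": 10,
}

print(index_name)
print(json.dumps(query, indent=2))
```

You could send this with any signed HTTP client (e.g. the same requests-aws4auth setup the Lambda function uses), or explore the same data interactively in Kibana.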

Refer to the documentation for more details.

I recommend reading the documentation on Amazon Elasticsearch Service access policies and modifying the access policy of the Elasticsearch domain to further fine-tune who can reach it.

Once the serverless application is deployed in your AWS account, it will automatically store AWS CloudTrail data in Amazon Elasticsearch Service as soon as each log file is delivered to S3. With the data in Elasticsearch, you can use Kibana to visualize it and build the dashboards you need on top of your CloudTrail data.
