Amazon Kinesis Firehose: Send Your Apache Logs to S3

Amazon Kinesis Firehose is a managed service that loads streaming data into data stores and analytics tools. You can configure a Firehose delivery stream from the AWS Management Console and send the data to Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service. The service scales up to the throughput your data requires, and it can compress, transform, and encrypt the data while loading the stream. In this experiment we configure a Firehose delivery stream that accepts Apache web server logs from an EC2 instance and stores them in an S3 bucket. Kinesis Data Firehose buffers incoming data before delivering it to Amazon S3. You can choose a buffer size (1–128 MB) and a buffer interval (60–900 seconds); whichever condition is satisfied first triggers delivery to Amazon S3.
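The "whichever comes first" buffering rule can be illustrated with a small simulation. This is only a toy model, not Firehose's actual implementation: the function name simulate_flushes is hypothetical, and the model assumes that the interval timer starts when the first record arrives after a flush.

```python
MIB = 1024 * 1024


def simulate_flushes(records, size_limit_mib, interval_seconds):
    """Return (time, reason, bytes) for each simulated delivery.

    records is a list of (timestamp_seconds, size_bytes) in arrival order.
    Assumption: the interval timer starts at the first record after a flush.
    """
    if not 1 <= size_limit_mib <= 128:
        raise ValueError("buffer size must be between 1 and 128 MB")
    if not 60 <= interval_seconds <= 900:
        raise ValueError("buffer interval must be between 60 and 900 seconds")

    size_limit = size_limit_mib * MIB
    flushes = []
    buffered = 0
    window_start = None

    for ts, nbytes in records:
        # The interval expired before this record arrived: deliver what we have.
        if window_start is not None and ts - window_start >= interval_seconds:
            flushes.append((window_start + interval_seconds, "interval", buffered))
            buffered = 0
            window_start = None
        if window_start is None:
            window_start = ts
        buffered += nbytes
        # The size limit was reached first: deliver immediately.
        if buffered >= size_limit:
            flushes.append((ts, "size", buffered))
            buffered = 0
            window_start = None
    return flushes


sample = [(0, 400_000), (10, 700_000), (20, 100), (100, 100)]
for flush in simulate_flushes(sample, size_limit_mib=1, interval_seconds=60):
    print(flush)
```

With a busy web server the size limit tends to trigger delivery; with a quiet one, the interval does, which is why a test setup may have to wait for the full interval before anything shows up in S3.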

AMI: amzn-ami-hvm-2018.03.0.20180811-x86_64-gp2
OS: Amazon Linux 1
Kernel: 4.14.62-65.117.amzn1.x86_64
Packages: aws-kinesis-agent-1.1.3-1.amzn1.noarch
AWS Region: us-east-1
  • Configure Firehose
  • Prepare EC2 Instance
  • Install Kinesis Agent
  • Test the setup

Configure Firehose

1) Log in to the AWS web console, open the Amazon Kinesis service, and click “Get started”.

2) On the next screen, click “Create delivery stream” in the Firehose section.

3) On the “Step 1: Name and source” page, enter a name for your Firehose delivery stream. Note this name down, because you will need it when you configure the Kinesis Agent on the EC2 instance. Make sure that the Source is set to “Direct PUT or other sources”. Click Next.

4) On the “Step 2: Process records” page, set both “Record transformation” and “Record format conversion” to Disabled. (Record transformation lets Firehose pass incoming records through a Lambda function, which transforms them and returns them to Firehose. With record format conversion, Firehose can convert your input data from JSON to Apache Parquet or Apache ORC before storing it in Amazon S3, using a schema defined in AWS Glue.) Click Next.

5) On the “Step 3: Choose destination” page, select “Amazon S3” as the Destination.

Next to “S3 bucket”, click “Create new” and create a new S3 bucket. Also enter a Prefix. I used apache-log-105 as the bucket name and server-1 as the prefix.

6) On the “Step 4: Configure settings” page, click “Create new or choose” for the IAM role.

When the new window opens, click Allow so that a new IAM role named firehose_delivery_role is created automatically. When control returns to the original window, click Next. The remaining options can be left at their defaults.

7) On the “Step 5: Review” page, click “Create delivery stream”. Your Firehose setup is now ready.
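For reference, the console choices above map roughly onto the parameters of the Firehose CreateDeliveryStream API. The sketch below only builds and prints that parameter set; it does not call AWS. The helper name, the account ID 123456789012, and the buffer values of 5 MB and 300 seconds are illustrative assumptions, not values taken from this setup.

```python
import json


def s3_delivery_stream_params(stream_name, bucket_name, prefix, role_arn,
                              size_mb=5, interval_s=300):
    """Build CreateDeliveryStream parameters for a Direct PUT stream to S3."""
    if not 1 <= size_mb <= 128:
        raise ValueError("buffer size must be between 1 and 128 MB")
    if not 60 <= interval_s <= 900:
        raise ValueError("buffer interval must be between 60 and 900 seconds")
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # "Direct PUT or other sources"
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": f"arn:aws:s3:::{bucket_name}",
            "Prefix": prefix,
            "BufferingHints": {
                "SizeInMBs": size_mb,
                "IntervalInSeconds": interval_s,
            },
            "CompressionFormat": "UNCOMPRESSED",
        },
    }


params = s3_delivery_stream_params(
    stream_name="logdata-1",
    bucket_name="apache-log-105",
    prefix="server-1",
    # Placeholder account ID; substitute your own.
    role_arn="arn:aws:iam::123456789012:role/firehose_delivery_role",
)
print(json.dumps(params, indent=2))
```

If you prefer scripting over the console, a dictionary like this could be passed to an SDK call such as boto3's create_delivery_stream, assuming the role and bucket already exist.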

Prepare EC2 Instance

1) Create an IAM role that has write permission to the Firehose delivery stream.

2) Create the instance. I used Amazon Linux 1 in the us-east-1 Region. Attach the IAM role created above to the instance.
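The exact policy attached to the role is not shown here, so the following is a minimal sketch of what it might contain, assuming the agent only needs to put records into one delivery stream and publish CloudWatch metrics (because the agent configuration used later enables metrics). The helper name and the account ID 123456789012 are placeholders.

```python
import json


def agent_policy(region, account_id, stream_name):
    """Build an IAM policy document for an instance running the Kinesis Agent."""
    stream_arn = (
        f"arn:aws:firehose:{region}:{account_id}:deliverystream/{stream_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Write access, limited to the one delivery stream.
                "Effect": "Allow",
                "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
                "Resource": stream_arn,
            },
            {
                # Needed only when the agent emits CloudWatch metrics.
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
            },
        ],
    }


print(json.dumps(agent_policy("us-east-1", "123456789012", "logdata-1"), indent=2))
```

Scoping the Firehose statement to a single stream ARN keeps the instance from writing to other delivery streams in the account.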

Install Kinesis Agent

1) Log in to the EC2 instance as root and install the aws-kinesis-agent RPM package.

# yum install aws-kinesis-agent

2) Edit the file /etc/aws-kinesis/agent.json and add the lines below. (Note: I used us-east-1, which is the agent's default Region, so I did not need to specify it in the file. If you use a different Region, specify it in the file, for example through the agent's firehose.endpoint setting.)

{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/httpd/access_log",
      "deliveryStream": "logdata-1"
    }
  ]
}
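A typo in agent.json is easy to make, so it can help to sanity-check the file before starting the agent. The checker below is a hypothetical helper, not part of the Kinesis Agent; it only verifies that the text is valid JSON and that every flow names a file pattern and a delivery stream, as this walkthrough requires.

```python
import json

SAMPLE_CONFIG = """
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/httpd/access_log",
      "deliveryStream": "logdata-1"
    }
  ]
}
"""


def check_agent_config(text):
    """Return a list of problems found in an agent.json document."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    flows = config.get("flows")
    if not isinstance(flows, list) or not flows:
        return ["'flows' must be a non-empty list"]
    problems = []
    for i, flow in enumerate(flows):
        for key in ("filePattern", "deliveryStream"):
            if not isinstance(flow, dict) or not flow.get(key):
                problems.append(f"flows[{i}] is missing '{key}'")
    return problems


print(check_agent_config(SAMPLE_CONFIG) or "config looks fine")
```

To check the real file, read /etc/aws-kinesis/agent.json and pass its contents to check_agent_config.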

Make sure that the aws-kinesis-agent-user account has read permission on the log file access_log. For a test setup you can run the commands below.

# chmod 755 /var/log/httpd
# chmod 644 /var/log/httpd/access_log
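The effect of the two chmod commands can be verified with a short script. This is a simplified sketch: the helper name is hypothetical, and it inspects only the "other" permission bits on the log file and its immediate parent directory, ignoring ownership, group membership, ACLs, and directories higher up the path.

```python
import os
import stat


def others_can_read(log_path):
    """Check the 'other' permission bits on a log file and its directory."""
    parent = os.path.dirname(os.path.abspath(log_path))
    dir_mode = os.stat(parent).st_mode
    file_mode = os.stat(log_path).st_mode
    dir_ok = bool(dir_mode & stat.S_IXOTH)    # directory can be traversed
    file_ok = bool(file_mode & stat.S_IROTH)  # file can be read
    return dir_ok and file_ok


LOG_PATH = "/var/log/httpd/access_log"
if os.path.exists(LOG_PATH):
    print(LOG_PATH, "readable by others:", others_can_read(LOG_PATH))
else:
    print(LOG_PATH, "not found on this machine")
```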

3) Start the Kinesis Agent service and enable it at boot. Adjust the commands to match your Linux distribution; I used Amazon Linux 1.

# /etc/init.d/aws-kinesis-agent restart
# chkconfig aws-kinesis-agent on

Note: On Amazon Linux 2, install the agent with the command below.

# yum install -y https://s3.amazonaws.com/streaming-data-agent/aws-kinesis-agent-latest.amzn1.noarch.rpm

Test the Setup

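1) Go to your S3 console and open the S3 bucket that you chose as the delivery destination. The log objects appear once the buffer size or buffer interval configured for the delivery stream has been reached.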
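When browsing the bucket, it helps to know where to look. Firehose places a UTC time prefix of the form YYYY/MM/dd/HH after the prefix you configured, so the expected location can be computed as in the sketch below. The helper name is hypothetical, and the direct concatenation reflects my understanding of the documented behaviour, so verify it against your own bucket.

```python
from datetime import datetime, timezone


def expected_key_prefix(custom_prefix, when):
    """Return the S3 key prefix under which objects delivered at 'when' land.

    'when' must be a timezone-aware datetime; it is converted to UTC.
    """
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = when.astimezone(timezone.utc)
    # The custom prefix is prepended as-is, so it needs its own trailing
    # slash if it is meant to appear as a folder.
    return f"{custom_prefix}{utc:%Y/%m/%d/%H}/"


now = datetime.now(timezone.utc)
print(expected_key_prefix("server-1", now))
print(expected_key_prefix("server-1/", now))
```

Note the difference between the two calls: a prefix without a trailing slash runs straight into the year, whereas server-1/ shows up as its own folder in the console.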

Conclusion

A Firehose delivery stream can load data into Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service. We have now seen how to configure a Firehose delivery stream and send Apache logs from an EC2 instance to an S3 bucket with the help of the Kinesis Agent. This is one of several ways to send logs to S3; choose the one that best fits your scenario.