Ingest and Stream Huawei ECS Log Data in Real Time Using Amazon Kinesis Data Firehose and the Kinesis Agent
Introduction
The goal of this guide is to walk through ingesting streaming log data from a Huawei Cloud ECS instance using the AWS Kinesis Agent, aggregating that data with Amazon Kinesis Data Firehose, and persisting the aggregated data to Amazon S3 so that it can be analyzed and visualized.
We need to be familiar with three concepts:
Huawei Cloud ECS
The name Huawei Cloud ECS (Elastic Cloud Server) might confuse those who have worked with AWS ECS (Elastic Container Service): the two share an abbreviation but are totally different offerings. Huawei ECS is the equivalent of EC2 on AWS.
In this guide, we will create a Huawei ECS instance running CentOS; this instance will run the Amazon Kinesis Agent.
Amazon Kinesis Firehose
Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. It can capture, transform, and deliver streaming data to Amazon S3, Amazon Redshift, and other destinations.
Amazon Kinesis Agent
The Amazon Kinesis Agent is a standalone Java application that offers an easy way to collect and send data to Kinesis Data Firehose. The agent continuously monitors a set of files and sends new data to your Kinesis Data Firehose delivery stream.
Architecture
In this architecture example, the web server is a Huawei Elastic Cloud Service (ECS) instance.
- We will install Amazon Kinesis Agent on this Linux instance.
- We will create a new AWS IAM user to give the Huawei ECS instance access to the Firehose API.
- We will configure the aws-kinesis-agent to send data to the Firehose delivery stream.
- The Kinesis Agent continuously forwards log records to an Amazon Kinesis Data Firehose delivery stream.
- Amazon Kinesis Data Firehose writes each log record to Amazon Simple Storage Service (Amazon S3) for durable storage of the raw log data.
Resources
We will use the AWS console to set up a Firehose delivery stream, an S3 bucket to store the upcoming data, and an IAM user.
1. Create an Amazon Kinesis Data Firehose Delivery Stream
To create the Amazon Kinesis Data Firehose delivery stream:
Open the Amazon Kinesis console and, in the Get Started section, choose Kinesis Data Firehose, then choose Create Delivery Stream.
- For Delivery stream name, enter hwei-ecs-web-log-ingestion-stream.
- For Source, select Direct PUT or other sources.
- On the Process records screen, keep the default selections.
- On the destination screen, choose Amazon S3.
- For the S3 bucket, choose Create new. In the Create S3 bucket window, specify a unique bucket name, for example: kinesis-data-firehose-hwei-cloud-ecs-access-log.
- For the IAM role, choose Create or update the IAM role.
The Kinesis Data Firehose delivery stream is now ready to receive data.
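If you prefer the command line, the same resources can be created with the AWS CLI. This is a minimal sketch: the account ID, the role name firehose-delivery-role (which must already exist and allow Firehose to write to the bucket), and the bucket name are placeholders to adapt to your account.

```shell
# Create the destination bucket (name must be globally unique)
aws s3 mb s3://kinesis-data-firehose-hwei-cloud-ecs-access-log

# Create a Direct PUT delivery stream that writes to the bucket.
# RoleARN is a placeholder: use a role that grants Firehose s3:PutObject
# on the destination bucket.
aws firehose create-delivery-stream \
  --delivery-stream-name hwei-ecs-web-log-ingestion-stream \
  --delivery-stream-type DirectPut \
  --s3-destination-configuration \
      RoleARN=arn:aws:iam::123456789012:role/firehose-delivery-role,BucketARN=arn:aws:s3:::kinesis-data-firehose-hwei-cloud-ecs-access-log
```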
2. Create an AWS IAM user
To send data from the Huawei ECS instance, we need to give it permission to use two AWS services: Kinesis Data Firehose and CloudWatch.
Since the ECS instance is not an AWS service but rather an external “application” that we want to grant access to specific resources, we need to create an IAM user for it instead of an IAM role.
This is a straightforward step: open the IAM users management page in the console, click “Add users”, choose a name, and give the user programmatic access.
On the permissions screen, search for and check “AmazonKinesisFirehoseFullAccess” and “CloudWatchFullAccess”.
We don’t have to change the remaining values, but make sure to copy the newly created user’s Access Key ID and Secret Access Key; we will need these credentials to configure the Huawei ECS instance.
Now we are ready to set up our ECS instance.
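For reference, the same user can be created from the AWS CLI. The user name hwei-ecs-log-shipper below is just an example; the two policy ARNs are the AWS-managed policies mentioned above.

```shell
# Create the IAM user for the external (Huawei ECS) instance
aws iam create-user --user-name hwei-ecs-log-shipper

# Attach the two managed policies the agent needs
aws iam attach-user-policy --user-name hwei-ecs-log-shipper \
  --policy-arn arn:aws:iam::aws:policy/AmazonKinesisFirehoseFullAccess
aws iam attach-user-policy --user-name hwei-ecs-log-shipper \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchFullAccess

# Prints the Access Key ID and Secret Access Key -- store them safely,
# the secret is only shown once
aws iam create-access-key --user-name hwei-ecs-log-shipper
```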
3. Launching a Huawei ECS instance and installing the Amazon Kinesis Agent
In this demo, my instance runs on Open Telekom Cloud. Whether you are on Open Telekom Cloud (OTC), Flexible Engine (FE), or Huawei Cloud directly, the steps are the same :)
Don’t worry if you are running a freshly created instance or following the guide on a low-traffic web server: we will install Fake-Apache-Log-Generator, which will mock Apache access log files for us so we can test our data stream.
Now let’s go to the ECS page on the Huawei Cloud Console and create a new instance.
You can choose any OS you want; in this guide we are using CentOS 7. Make sure to give your instance access to the internet by binding an EIP.
In the User Data field, let’s install all the dependencies for the Faker script as well as the Amazon Kinesis Agent.
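A User Data script for this step might look like the following sketch. It assumes a CentOS 7 image with internet access; the package names, the Fake-Apache-Log-Generator repository URL, and the agent RPM URL are assumptions to adjust to your environment.

```shell
#!/bin/bash
# User Data runs as root, so no sudo is needed here.

# EPEL provides pip on CentOS 7; the Kinesis Agent is a Java application
yum install -y epel-release
yum install -y python python-pip git java-1.8.0-openjdk

# Dependencies of the fake Apache log generator
pip install pytz numpy Faker

# Clone the log generator into /tmp/logs, where we will run it later
git clone https://github.com/kiritbasu/Fake-Apache-Log-Generator.git /tmp/logs

# Amazon Kinesis Agent RPM published by AWS (URL is an assumption --
# check the Kinesis Agent documentation for the current location)
yum install -y https://s3.amazonaws.com/streaming-data-agent/aws-kinesis-agent-latest.amzn1.noarch.rpm
```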
Make sure to allow outbound traffic in the security group attached to this instance.
4. Setting up AWS credentials configuration
We have the credentials of the IAM user that we created earlier; now it’s time to put them to use!
After SSH’ing into our ECS instance, we set up the credentials so the instance can access AWS resources.
- Edit /etc/sysconfig/aws-kinesis-agent to specify your AWS Region and AWS access keys:
AWS_ACCESS_KEY_ID=<Your key id>
AWS_SECRET_ACCESS_KEY=<Your access key>
AWS_DEFAULT_REGION=<Your region>
5. Configuring the Amazon Kinesis Agent
To configure the agent:
- Open and edit the configuration file /etc/aws-kinesis/agent.json.
- In this configuration file, specify the files ("filePattern") from which the agent collects data and the name of the delivery stream ("deliveryStream") to which the agent sends data. The file name is a pattern, and the agent recognizes file rotations.
Don’t forget to change the value of “deliveryStream” to the name of the delivery stream you created (if you chose a different name).
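A minimal agent.json might look like the sketch below. The Firehose endpoint region and the file pattern are assumptions (adjust them to your region and log location); the sketch writes the file to the current directory so you can review it before copying it into place.

```shell
# Write a minimal Kinesis Agent configuration (review, then copy into place).
# firehose.endpoint and filePattern are assumptions for this demo setup.
cat > agent.json <<'EOF'
{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "firehose.eu-west-1.amazonaws.com",
  "flows": [
    {
      "filePattern": "/tmp/logs/*_access_log*",
      "deliveryStream": "hwei-ecs-web-log-ingestion-stream"
    }
  ]
}
EOF

# Then install it where the agent expects it:
# sudo cp agent.json /etc/aws-kinesis/agent.json
```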
6. Starting the Amazon Kinesis Agent
We are almost there!
We will launch the log-mocking Python script and start our agent so it begins listening to the log files being created!
Run this command to start generating fake log files:
sudo python /tmp/logs/apache-fake-log-gen.py -n 0 -o LOG &
Now if we go to the /tmp/logs directory, we will see new files being created, with names ending in *_access_logs.
Let’s launch the Amazon Kinesis Agent to start monitoring those files and streaming that data to Kinesis Firehose:
sudo service aws-kinesis-agent start
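If records don’t show up on the AWS side, the agent’s own log is the first place to look. These commands use the default log location of the aws-kinesis-agent package:

```shell
# Check that the agent is running and watch its log for records sent / errors
sudo service aws-kinesis-agent status
sudo tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log

# Optionally start the agent on every system boot
sudo chkconfig aws-kinesis-agent on
```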
Our Log Data is Streamed to S3!
Now let’s go to the AWS console and check the bucket that we set as the destination for Kinesis Firehose.
We did it! New log files are appearing there!
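The same check can be done from the CLI. By default, Firehose writes objects under a YYYY/MM/DD/HH (UTC) prefix; the bucket name below is the example name from step 1.

```shell
# List the objects Firehose has delivered so far
aws s3 ls s3://kinesis-data-firehose-hwei-cloud-ecs-access-log --recursive
```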
Conclusion
In this guide, we implemented a multi-cloud use case for log data ingestion and analysis, bringing Huawei Cloud data to AWS. Log analytics is a common big data use case that allows you to analyze log data from websites, mobile devices, servers, sensors, and more for a wide variety of applications such as digital marketing, application monitoring, fraud detection, ad tech, games, and IoT.