AWS Kinesis to Log Analysis
What kind of name is Kinesis? Well, it turns out it means "movement in response to an external stimulus". And it actually makes sense: in AWS, Kinesis works with data streams, so it "receives" something and "supplies" that something.
Looking at this definition, you might think: "wait a minute… this is Kafka!!". Yes, I thought the same. It's like a Kafka managed by Amazon!
AWS offers three flavors of Kinesis:
Kinesis Data Streams — used for real-time data capture.
Kinesis Data Firehose — used for "near real-time" delivery to destinations such as S3.
Kinesis Data Analytics — used for SQL analysis of real-time data.
Let's say you have an application and you need to analyze the logs it generates in real time so you can act on them. This application is not on AWS; let's say it runs on-premises on some Linux server you have.
First, let's get the "Fake Apache Log Generator" onto the Linux server. You can also use your own application; feel free to use any log source.
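If you just want something to test with and don't feel like installing a generator project, a few lines of shell can fake Apache-style access-log lines well enough for this walkthrough (the IPs, paths, and the /tmp/app.log location are just illustrative choices):

```shell
# Append a handful of fake Apache common-log-format lines to a file
# that the Kinesis agent will later be configured to watch.
for i in $(seq 1 5); do
  echo "192.168.0.$i - - [$(date '+%d/%b/%Y:%H:%M:%S %z')] \"GET /page$i HTTP/1.1\" 200 1024"
done >> /tmp/app.log
```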
The second thing you need to do is create the Kinesis Firehose stream that will receive the logs and ingest them into S3. Let's use the AWS console for this (you could use the CLI, but let's take the easy way, as we're only testing one stream). Go to the Kinesis page in the AWS console and select Kinesis Data Firehose.
Step 1: Name and Source: Just enter the name. Note that this name will be used later in our server's agent configuration. I'm using server-log-stream.
Step 2: Process records: Keep default values.
Step 3: Choose a destination: Select S3 and choose your bucket. In my case, I'm creating a new one called "myapp098123". Remember that S3 bucket names are globally unique, so you will need to create your own with a different name.
Step 4: Configure settings. This is the most important page in a real scenario, where you will tune things based on your requirements. For this case, let's keep the defaults; the only thing to do here is select an IAM role with the required permissions. I suggest creating a new role just for this purpose.
Step 5: Review. Just check everything and create the stream!
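For the record, the same stream can be created from the CLI in one command. A sketch only, with placeholder ARNs for the IAM role and the bucket (substitute your own account ID, role, and bucket name):

```shell
# Create the Firehose delivery stream from the CLI instead of the console.
# The role must allow Firehose to write to the bucket.
aws firehose create-delivery-stream \
  --delivery-stream-name server-log-stream \
  --delivery-stream-type DirectPut \
  --extended-s3-destination-configuration \
    RoleARN=arn:aws:iam::123456789012:role/firehose-s3-role,BucketARN=arn:aws:s3:::myapp098123
```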
Great, we now have the stream! Let's go back to the Linux server and install the Kinesis Agent. Kinesis provides many ways to send data to streams (KPL, the Kinesis SDK, third-party connectors, …); let's use the Agent for this purpose.
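On Amazon Linux, installing the agent is a one-liner; on other distributions you can build it from the awslabs/amazon-kinesis-agent repository on GitHub:

```shell
# Install the Kinesis agent from the Amazon Linux repositories
sudo yum install -y aws-kinesis-agent
```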
To configure it, you need to edit the /etc/aws-kinesis/agent.json file. This file should include your credentials and all the information about your stream. A few notes here:
1. Note the firehose.endpoint: it must point to YOUR region.
2. For the access and secret keys, use your own (of course).
3. For deliveryStream, use the name you chose.
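Putting those notes together, a minimal agent.json looks like this. The region, file path, and stream name below are just the values from this walkthrough; swap in your own:

```json
{
  "cloudwatch.emitMetrics": true,
  "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
  "awsAccessKeyId": "YOUR_ACCESS_KEY_ID",
  "awsSecretAccessKey": "YOUR_SECRET_ACCESS_KEY",
  "flows": [
    {
      "filePattern": "/tmp/app.log*",
      "deliveryStream": "server-log-stream"
    }
  ]
}
```

Each entry in "flows" pairs a file pattern to watch with the delivery stream that should receive its lines.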
OK, start the agent and check its logs to confirm everything is fine.
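The agent runs as a regular service, and its own log file is the first place to look when records are not arriving. A sketch, assuming the Amazon Linux package install:

```shell
# Start the agent and have it come back after reboots
sudo service aws-kinesis-agent start
sudo chkconfig aws-kinesis-agent on

# The agent's log reports how many records were parsed and sent
tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log
```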
Let's now generate some fake logs and check that the agent is sending them to Kinesis.
Great! That simple, and we have our logs being sent to Kinesis! Now let's create the Elasticsearch service, starting with a new domain.
For this step, I just set the name and changed the instance type to t2.small.
For Step 2, select your VPC, subnets, and security group. In "Access policy", choose "Allow open access to the domain".
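The same domain can be sketched as a CLI call. The domain name, version, and the subnet/security-group IDs below are placeholders for this walkthrough, not values from your account:

```shell
# Create a small Elasticsearch domain inside a VPC.
# Substitute your own subnet and security-group IDs.
aws es create-elasticsearch-domain \
  --domain-name myapp-logs \
  --elasticsearch-version 7.1 \
  --elasticsearch-cluster-config InstanceType=t2.small.elasticsearch,InstanceCount=1 \
  --ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=10 \
  --vpc-options SubnetIds=subnet-0123456789abcdef0,SecurityGroupIds=sg-0123456789abcdef0
```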
Review and confirm to finish this configuration. Now let's create a new stream to work with the aggregated data. The process is the same as before, but this time choose Elasticsearch as the destination. You also need to change the Linux agent to point to this new stream (agent.json, remember?).
Choose the domain that you created and, as the index, use "request_data" in this case.
You need to specify the IAM role for this, and you will need to give it more permissions, as Elasticsearch is a managed service. Permissions for EC2 and for Elasticsearch itself should be enough.
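As a sketch, the second delivery stream looks like this from the CLI. All the ARNs are placeholders, and note that Firehose also requires an S3 configuration for backing up records that fail delivery to Elasticsearch:

```shell
# Second Firehose stream, this time delivering into the Elasticsearch
# domain; failed documents are backed up to the S3 bucket.
aws firehose create-delivery-stream \
  --delivery-stream-name request-data-stream \
  --elasticsearch-destination-configuration '{
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-es-role",
    "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/myapp-logs",
    "IndexName": "request_data",
    "S3BackupMode": "FailedDocumentsOnly",
    "S3Configuration": {
      "RoleARN": "arn:aws:iam::123456789012:role/firehose-es-role",
      "BucketARN": "arn:aws:s3:::myapp098123"
    }
  }'
```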
Now let's go to Kinesis Data Analytics. Click on Create and enter the name you want.
Click on "Connect streaming data source" and choose the Firehose stream you just created.
Just go ahead and create it! You will now be able to open the SQL editor and start the analysis!
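By default, Kinesis Data Analytics exposes the connected source as an in-application stream named SOURCE_SQL_STREAM_001. As a hypothetical example of what you might paste into the editor (the "host" column is an assumption; the actual column names depend on the schema Analytics discovered from your logs), here is a query counting requests per host over one-minute tumbling windows:

```sql
-- Output stream: one row per host per one-minute window
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "host"          VARCHAR(64),
    "request_count" INTEGER
);

-- A pump continuously inserts query results into the output stream
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM "host", COUNT(*) AS "request_count"
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY "host",
             STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```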
Well, that's it. Kinesis is a really great Big Data tool from Amazon: you can simply create streams for consumers and publish data into them!