Ingesting data into Kinesis Data Streams using the KPL from an on-premises application

Carlos Andres Zambrano Barrera
Globant
Aug 22, 2020

Amazon Kinesis includes several services (Kinesis Data Firehose, Kinesis Data Analytics, Kinesis Data Streams). In this case I worked with Kinesis Data Streams, which lets us process data in real time and is widely used for data coming from applications, logs, social media feeds and IoT devices, among other sources. I had an on-premises Java application that generated a lot of information about user behaviour, and all of that information had to be queryable in AWS. So the data flow was:

On-premises application → Kinesis Data Streams → Kinesis Data Firehose → S3 → queries from Athena

This scenario came with some challenges. The goal was to send data from an on-premises application to a Kinesis data stream, but all of that traffic had to go through a VPN tunnel between the customer's network and AWS.

Architecture

Challenge # 1 — Install the KPL on-premise

To install it, I followed the AWS documentation.

However, we ran into some issues, such as how to handle the access key ID and secret access key in the on-premises application so that it could publish data to Kinesis.

The following environment variables must be set on the on-premises server:

export AWS_ACCESS_KEY_ID=…
export AWS_SECRET_KEY=…
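Because a missing variable usually only shows up later as an authentication error, it can help to fail fast at startup. The sketch below assumes the KPL picks up credentials through the AWS SDK for Java v1 default provider chain, which reads AWS_ACCESS_KEY_ID plus either AWS_SECRET_KEY or AWS_SECRET_ACCESS_KEY from the environment. The class name CredentialsCheck is hypothetical; it only reports variable names and never prints secret values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical startup check: verify the credential environment variables exist
// before starting the producer. Assumes the SDK v1 default provider chain.
final class CredentialsCheck {

    // Returns the names of missing variables (empty list if everything is set).
    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        if (isBlank(env.get("AWS_ACCESS_KEY_ID"))) {
            missing.add("AWS_ACCESS_KEY_ID");
        }
        // Either spelling of the secret key variable is accepted.
        if (isBlank(env.get("AWS_SECRET_KEY")) && isBlank(env.get("AWS_SECRET_ACCESS_KEY"))) {
            missing.add("AWS_SECRET_KEY or AWS_SECRET_ACCESS_KEY");
        }
        return missing;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    public static void main(String[] args) {
        List<String> missing = missingVariables(System.getenv());
        if (!missing.isEmpty()) {
            // Report only the variable names, never the values.
            System.err.println("Missing credentials: " + missing);
            System.exit(1);
        }
        System.out.println("Credentials found in the environment");
    }
}
```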

Then, in the KPL Java sample, set the stream name and the region where the stream lives in SampleProducerConfig.java:

public static final String STREAM_NAME_DEFAULT = "MyStream";
public static final String REGION_DEFAULT = "MyRegion";
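Hard-coded defaults are fine for the sample, but once the same code runs on several on-premises servers it can be handy to override them per host. The following is a minimal sketch, not part of the KPL: it reads the stream name from a made-up KINESIS_STREAM_NAME variable and the region from AWS_REGION, falls back to the defaults, and rejects values that do not look like a region code (which would catch a leftover "MyRegion" placeholder). The regex is only a rough approximation of region names.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical settings holder: environment overrides with fallback defaults.
final class ProducerSettings {
    // KINESIS_STREAM_NAME is an assumed name chosen for this sketch.
    static final String STREAM_VAR = "KINESIS_STREAM_NAME";
    static final String REGION_VAR = "AWS_REGION";

    // Rough shape of a region code such as us-west-1 or ap-southeast-2 (not exhaustive).
    private static final Pattern REGION_PATTERN = Pattern.compile("[a-z]{2}(-[a-z]+)+-\\d");

    final String streamName;
    final String region;

    private ProducerSettings(String streamName, String region) {
        this.streamName = streamName;
        this.region = region;
    }

    static ProducerSettings resolve(Map<String, String> env, String defaultStream, String defaultRegion) {
        String stream = env.getOrDefault(STREAM_VAR, defaultStream);
        String region = env.getOrDefault(REGION_VAR, defaultRegion);
        if (!REGION_PATTERN.matcher(region).matches()) {
            throw new IllegalArgumentException("Unexpected region code: " + region);
        }
        return new ProducerSettings(stream, region);
    }

    public static void main(String[] args) {
        ProducerSettings settings = resolve(System.getenv(), "MyStream", "us-west-1");
        System.out.println(settings.streamName + " in " + settings.region);
    }
}
```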

Challenge # 2 — VPC Endpoint Configuration

Security Group

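In the security group attached to the VPC endpoint, authorize inbound traffic from the VPN IP range. In this setup all traffic was allowed; at minimum, the endpoint needs HTTPS (TCP port 443).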
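To make the rule concrete, here is a small model of what an inbound rule with a CIDR source and a single port matches. It is an illustration only, not an AWS API; the class name InboundRule and the 10.20.0.0/16 VPN range are hypothetical placeholders.

```java
// Hypothetical model of an inbound rule: allow one TCP port from a source IPv4 CIDR.
final class InboundRule {
    private final int network;
    private final int mask;
    private final int port;

    InboundRule(String cidr, int port) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // Java masks shift distances to 5 bits, so a /0 prefix needs its own case.
        this.mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        this.network = toInt(parts[0]) & mask;
        this.port = port;
    }

    // True if a connection from sourceIp to destinationPort matches this rule.
    boolean allows(String sourceIp, int destinationPort) {
        return destinationPort == port && (toInt(sourceIp) & mask) == network;
    }

    // Converts dotted-quad IPv4 text to a 32-bit int.
    private static int toInt(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        int value = 0;
        for (String octet : octets) {
            int b = Integer.parseInt(octet);
            if (b < 0 || b > 255) {
                throw new IllegalArgumentException("Bad octet in " + ip);
            }
            value = (value << 8) | b;
        }
        return value;
    }

    public static void main(String[] args) {
        // 10.20.0.0/16 stands in for the VPN's address range.
        InboundRule httpsFromVpn = new InboundRule("10.20.0.0/16", 443);
        System.out.println(httpsFromVpn.allows("10.20.5.7", 443));
    }
}
```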

VPC Endpoint Policy

The VPC endpoint policy should look like the following, with the ARN of the Kinesis data stream in the Resource element:

{
  "Statement": [
    {
      "Sid": "AccessToSpecificDataStream",
      "Principal": "*",
      "Action": "kinesis:*",
      "Effect": "Allow",
      "Resource": "arn:aws:kinesis:us-west-1::stream/MyStream"
    }
  ]
}
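A Kinesis stream ARN has the form arn:aws:kinesis:region:account-id:stream/stream-name, and the account field must hold the 12-digit account ID; an ARN with an empty account field, such as arn:aws:kinesis:us-west-1::stream/MyStream, does not identify the stream. The sketch below builds and checks the ARN before it goes into the policy. It assumes the standard aws partition and the documented stream-name rules (1 to 128 characters of letters, digits, underscore, hyphen and period); the class name StreamArn and the account ID 111122223333 are placeholders.

```java
import java.util.regex.Pattern;

// Hypothetical helper to build and validate a Kinesis data stream ARN
// (standard "aws" partition only).
final class StreamArn {
    private static final Pattern ARN = Pattern.compile(
        "arn:aws:kinesis:[a-z0-9-]+:\\d{12}:stream/[a-zA-Z0-9_.-]{1,128}");

    static boolean isValid(String arn) {
        return ARN.matcher(arn).matches();
    }

    static String build(String region, String accountId, String streamName) {
        String arn = "arn:aws:kinesis:" + region + ":" + accountId + ":stream/" + streamName;
        if (!isValid(arn)) {
            throw new IllegalArgumentException("Malformed stream ARN: " + arn);
        }
        return arn;
    }

    public static void main(String[] args) {
        // 111122223333 is a placeholder account ID.
        System.out.println(build("us-west-1", "111122223333", "MyStream"));
    }
}
```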

Challenge # 3 — Testing and Troubleshooting

First, run the following command from the on-premises server:

dig kinesis.us-west-1.amazonaws.com

Verify that the IPs in the ANSWER SECTION are the private IPs assigned to the VPC endpoint's ENIs:

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.amzn2.0.4 <<>> kinesis.us-west-1.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49713
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;kinesis.us-west-1.amazonaws.com. IN A
;; ANSWER SECTION:
kinesis.us-west-1.amazonaws.com. 60 IN A PrivateIP_1
kinesis.us-west-1.amazonaws.com. 60 IN A PrivateIP_2
;; Query time: 2 msec
;; SERVER: 172.31.0.2#53(172.31.0.2)
;; WHEN: Thu Aug 20 15:00:44 UTC 2020
;; MSG SIZE rcvd: 92
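The same check can be run from Java on the application host, which is useful when dig is not installed there. This is a minimal sketch with a hypothetical class name: it resolves the Kinesis hostname through the JVM, which uses the host's configured resolver, and flags any address that is not in the private IPv4 ranges (10/8, 172.16/12, 192.168/16). Any IPv6 address returned would also be flagged, since isSiteLocalAddress() does not cover the IPv6 unique-local range.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical DNS check: does the Kinesis endpoint resolve only to private IPs?
final class EndpointDnsCheck {

    // Returns the textual form of every address outside the private IPv4 ranges.
    static List<String> nonPrivateAddresses(InetAddress[] addresses) {
        List<String> nonPrivate = new ArrayList<>();
        for (InetAddress address : addresses) {
            // isSiteLocalAddress() is true for IPv4 10/8, 172.16/12 and 192.168/16.
            if (!address.isSiteLocalAddress()) {
                nonPrivate.add(address.getHostAddress());
            }
        }
        return nonPrivate;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "kinesis.us-west-1.amazonaws.com";
        List<String> nonPrivate = nonPrivateAddresses(InetAddress.getAllByName(host));
        if (nonPrivate.isEmpty()) {
            System.out.println(host + " resolves only to private IPs");
        } else {
            // Public answers suggest traffic would not reach the VPC endpoint over the VPN.
            System.out.println(host + " resolves to non-private IPs: " + nonPrivate);
        }
    }
}
```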

Then use telnet to check that port 443 on the endpoint is reachable:

[ec2-user@ip-172-31-29-23 ~]$ telnet kinesis.us-west-1.amazonaws.com 443
Trying 172.31.20.159...
Connected to kinesis.us-west-1.amazonaws.com.
Escape character is '^]'.
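If telnet is not available on the server, a plain TCP connect with a timeout gives the same signal. Like telnet, it only proves the port is reachable; it does not test TLS or credentials. The class name PortCheck and the 3-second timeout are arbitrary choices for this sketch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical reachability check equivalent to "telnet host 443".
final class PortCheck {

    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections and unresolvable hostnames.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "kinesis.us-west-1.amazonaws.com";
        System.out.println(host + ":443 reachable = " + canConnect(host, 443, 3000));
    }
}
```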

Next, open the data stream in the Kinesis console and check the incoming data (bytes) graph on the Monitoring tab; it should show data arriving:

Finally, configure VPC Flow Logs to see the traffic in CloudWatch:

1- Create a log group in CloudWatch Logs.

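2- Create an IAM role that VPC Flow Logs can assume, using the trust policy below. The role also needs a permissions policy that allows it to write to CloudWatch Logs (for example, logs:CreateLogStream and logs:PutLogEvents).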

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "vpc-flow-logs.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

3- Create the VPC flow log with the filter set to All (all traffic), then check its log streams in CloudWatch for incoming traffic to the endpoint's ENIs.
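When the log streams get noisy, it helps to filter the records down to what matters here: accepted HTTPS connections arriving at the endpoint ENI. The sketch below assumes the default (version 2) flow log format, whose 14 space-separated fields are version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action and log-status. It skips NODATA and SKIPDATA records, whose fields are "-". The class name FlowLogFilter is hypothetical, and custom flow log formats would need different field positions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter over default-format VPC flow log records.
final class FlowLogFilter {
    private static final int TCP = 6;

    // Returns source IPs of accepted TCP/443 flows into the given ENI.
    static List<String> acceptedHttpsSources(List<String> lines, String eniId) {
        List<String> sources = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Skip malformed lines and records without traffic data (log-status != OK).
            if (f.length != 14 || !"OK".equals(f[13])) {
                continue;
            }
            boolean toEni = f[2].equals(eniId);
            boolean https = Integer.parseInt(f[7]) == TCP && Integer.parseInt(f[6]) == 443;
            if (toEni && https && "ACCEPT".equals(f[12])) {
                sources.add(f[3]);
            }
        }
        return sources;
    }

    // Usage: pipe exported records on stdin, pass the ENI ID as the first argument.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        for (String line; (line = in.readLine()) != null; ) {
            lines.add(line);
        }
        System.out.println(acceptedHttpsSources(lines, args[0]));
    }
}
```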

Lessons learned

  • As a security best practice in these architectures, route the traffic through a VPN or AWS Direct Connect.
  • From the on-premises app's host, check that DNS resolution for the Kinesis endpoint returns the VPC endpoint's private IPs, so the traffic goes through the VPN tunnel.
  • Before configuring the on-premises app, test the setup with the KPL sample application.
  • Once the sample application works on-premises, move on to customizing your own application.



AWS x10, Tech Director at Globant with more than 7 years of experience with AWS.