Write Streaming Data to Multiple Data Stores - AWS Kinesis

Giridharsm
AWS-Learning-through-experiments
4 min read · Jul 3, 2020

I have realized that some of you might be a little confused (like I was) about how streaming data works: how the data is read and processed, and how it can then be loaded into multiple destination data stores.

The “multiple destination” data stores concept was confusing because I always thought streaming data (though persisted durably and securely in a data stream) could be read only once; that is, once you read the latest record in a stream, it is no longer available for processing a second time.

So in my mind, if this data could be read only once, should I have a Lambda function or some such to duplicate it and pass it to multiple destinations? But that would defeat the purpose of having multiple consumers for a streaming application, wouldn't it?

I know this sounds stupid, but coming from the data warehousing side of IT, and having worked on persisted data stores all my career, wrapping my head around streaming data and Kinesis in general was a little difficult in the beginning.

So why am I even talking about this?

As part of my Data Analytics certification preparation, I was going through some AWS 'This is my Architecture' videos, to understand the logic behind architecting massive data lakes on-premises and in the cloud, and how AWS's offerings can be leveraged for different use cases.

One particular video describes the data lake architecture of Viber (a popular messaging service before WhatsApp came into the picture). They use a Kinesis data stream to read data from a producer (on-premises, which batches and sends events), and load this data both into an Apache Storm application sitting on a fleet of EC2 instances and into a Firehose delivery stream that further processes these events and puts them into an S3 bucket for storage and analysis. There is a lot more to this architecture, but at a high level, and to support my musings, I am limiting the description to this.

So, while the above 'watching the video' event was happening in my life, I thought about how I used to be confused about Kinesis and streaming data in general, and how I was not sure if Kinesis data streams could be used to load the same records into multiple destinations (consumers).

This thought process led to me creating a demo streaming application (in Kinesis, naturally, because this is a post about Kinesis), though not really 'streaming', because I demoed only one record into the stream. Ha ha, I am so lazy.

Architecture of my Demo Streaming App:

So I went about creating the following (a setup sketch in code follows the list):

1. A Kinesis Data Stream for my demo
2. Two Kinesis Firehose delivery streams
3. Two S3 buckets to persist the data
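For reference, here is a minimal sketch of that setup using boto3. Everything here is an assumption on my part: the stream, delivery stream, and bucket names are placeholders, and demo-firehose-role is a hypothetical IAM role that lets Firehose read from the stream and write to the buckets.

```python
import boto3

REGION = "us-east-1"  # assumed region for this sketch

kinesis = boto3.client("kinesis", region_name=REGION)
firehose = boto3.client("firehose", region_name=REGION)
s3 = boto3.client("s3", region_name=REGION)

# 1. The Kinesis Data Stream (one shard is plenty for a demo)
kinesis.create_stream(StreamName="demo-stream", ShardCount=1)
kinesis.get_waiter("stream_exists").wait(StreamName="demo-stream")
stream_arn = kinesis.describe_stream(StreamName="demo-stream")[
    "StreamDescription"]["StreamARN"]

# 3. The S3 buckets that will persist the data
# (bucket names must be globally unique; these are placeholders)
buckets = ["demo-destination-bucket-1", "demo-destination-bucket-2"]
for bucket in buckets:
    s3.create_bucket(Bucket=bucket)

# 2. Two Firehose delivery streams, both reading from the SAME data
# stream, each writing to its own bucket
role_arn = "arn:aws:iam::123456789012:role/demo-firehose-role"  # hypothetical
for name, bucket in zip(["demo-firehose-1", "demo-firehose-2"], buckets):
    firehose.create_delivery_stream(
        DeliveryStreamName=name,
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
        S3DestinationConfiguration={
            "RoleARN": role_arn,
            "BucketARN": f"arn:aws:s3:::{bucket}",
            "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
        },
    )
```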

So once all these were done, I went about writing an amazingly simple piece of code that calls the put_record API using the boto3 library for Python.
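It was something along these lines (a sketch; the stream name is the placeholder from the setup above):

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Put a single demo record; the partition key decides which shard it lands on
response = kinesis.put_record(
    StreamName="demo-stream",
    Data=json.dumps({"event": "demo", "message": "hello kinesis"}).encode("utf-8"),
    PartitionKey="demo-key",
)

print(response["ResponseMetadata"]["HTTPStatusCode"])  # 200 on success
print(response["ShardId"], response["SequenceNumber"])
```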

As you can see, it returned an HTTPStatusCode of 200, which means the put into the Kinesis data stream was successful.

So I went ahead and checked the S3 buckets and, lo and behold, the records had been delivered into both buckets through my Kinesis Firehose streams.

I verified the contents, and they were exactly the same as what I had passed into the Kinesis data stream.
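If you prefer verifying from code rather than from the console, here is a quick sketch (bucket names as assumed in the setup above; Firehose writes objects under a YYYY/MM/DD/HH/ prefix by default):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

for bucket in ["demo-destination-bucket-1", "demo-destination-bucket-2"]:
    listing = s3.list_objects_v2(Bucket=bucket)
    for obj in listing.get("Contents", []):
        body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        print(bucket, obj["Key"], body)  # same payload in both buckets
```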

So now you know how it's done. Or hopefully you are a little less confused about how things work in the streaming world.
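And if you want to see the 'read more than once' behaviour directly, without Firehose in the middle: each consumer keeps its own shard iterator, so reading a record with one iterator does not consume it for another. A minimal sketch against the assumed demo stream:

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

shard_id = kinesis.describe_stream(StreamName="demo-stream")[
    "StreamDescription"]["Shards"][0]["ShardId"]

# Two "consumers", each starting its own iterator at the oldest record
for consumer in ("consumer-A", "consumer-B"):
    iterator = kinesis.get_shard_iterator(
        StreamName="demo-stream",
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    records = kinesis.get_records(ShardIterator=iterator)["Records"]
    print(consumer, [r["Data"] for r in records])  # both see the same record
```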

Quiz:

1. It took some time for the records to show up in my S3 buckets. Why?

Ans: Because the records are buffered in the Kinesis Firehose delivery stream before they are delivered (that's why it's near-real-time). The minimum buffer interval is 60 seconds and the minimum buffer size is 1 MiB.

I set mine to the minimums: 1 MiB and 60 seconds. Also, if you are using PutRecords via the KPL and utilizing its batching feature, you will see a similar latency in the data movement.
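If you want to check what buffering hints a delivery stream is actually using, describe_delivery_stream exposes them (the delivery stream name is the placeholder assumed earlier):

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

desc = firehose.describe_delivery_stream(DeliveryStreamName="demo-firehose-1")
dest = desc["DeliveryStreamDescription"]["Destinations"][0]
# The S3 destination description carries the configured BufferingHints
s3_dest = dest.get("ExtendedS3DestinationDescription") or dest["S3DestinationDescription"]
print(s3_dest["BufferingHints"])  # e.g. {'SizeInMBs': 1, 'IntervalInSeconds': 60}
```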

References:

The 'This is my Architecture' video discussed in this article:

Boto3 Documentation for Kinesis:

