Introduction to AWS Kinesis

Ulaş Kılıç
Insider Engineering
Dec 26, 2022

In today’s world, data is almost everything, and data sources are growing at a massive rate. That raises an obvious question: how will we handle this amount of data? Data streaming solutions are one of the answers. In this article, we will talk about the AWS Kinesis service.
Before going any further, we first need to understand the concept of streaming data.

Streaming data is data that is generated continuously by thousands of data sources, which typically send in the data records simultaneously, and in small sizes (order of Kilobytes). Streaming data includes a wide variety of data such as log files generated by customers using your mobile or web applications, eCommerce purchases, in-game player activity, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices or instrumentation in data centers.

Today, Kinesis offers three different types of data streaming.

Kinesis Data Streams

Collect gigabytes of data per second and make it available for real-time processing and analysis.

In this type, the data retention period is quite a bit longer compared to the others, and produced data is delivered to consumers immediately. It is more suitable for collecting log, event, mobile, gaming, and IoT data.

High-level flow for data streams
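As a quick illustration, producers push records into a data stream through the AWS SDK. Here is a minimal sketch (the stream name and payload are hypothetical):

const AWS = require('aws-sdk')
const kinesis = new AWS.Kinesis()

// PartitionKey decides which shard the record lands on;
// records sharing a key keep their relative order
kinesis
  .putRecord({
    StreamName: 'my-data-stream',
    PartitionKey: 'user-123456',
    Data: JSON.stringify({ action: 'page_view', userId: '123456' }),
  })
  .promise()
  .then((res) => console.log('Stored with sequence number:', res.SequenceNumber))
  .catch((err) => console.error('putRecord failed:', err))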

Kinesis Data Firehose

Prepare and load real-time data streams into data stores and analytics tools.

In this type, the data retention period is short. It is more suitable for loading data into data stores, analytics tools, etc.

High-level flow for a firehose

Kinesis Data Analytics

Get actionable insights from streaming data in real time.

This type is more suitable for real-time analytics or monitoring.

So, let’s create a pipeline from real life.
Let’s assume we have an e-commerce website. We want to collect user interactions, such as which products are visited the most, and offer discounts on our products depending on their popularity.

First of all, we need a service for handling product visit events like this one:

{
  "userId": "123456",
  "product": "foo",
  "tags": ["technology", "mobile", "iphone"],
  "eventTime": 1652225241,
  "pageURL": "https://foo.bar/product/foo"
}

Lambda is a good fit for that, because this event will be sent on almost every product page visit.

const AWS = require('aws-sdk')
const firehose = new AWS.Firehose()

exports.handler = async (event) => {
  // With an API Gateway proxy integration, the raw request body arrives as a string
  const payload = event.body

  try {
    // putRecord takes a single Record whose Data is a string or Buffer;
    // putRecordBatch would take a Records array instead
    await firehose
      .putRecord({
        DeliveryStreamName: 'my-kinesis-delivery-stream',
        Record: { Data: payload },
      })
      .promise()

    return { success: true }
  } catch (err) {
    console.log('Firehose putRecord error:', err)
    return { success: false }
  }
}

The next step is storing/analyzing this data. For storage purposes, we will use S3. For data analysis purposes, we will use Athena.

Depending on the requirements, the storage/analysis part can be changed, for example to store the data in Elasticsearch or MongoDB instead. This can be done with another Lambda, as sketched below.
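A rough sketch of such a Lambda, assuming it is triggered by S3 object-created notifications on the Firehose bucket and that Firehose is configured to append newline delimiters between records (the MongoDB URI, database, and collection names are hypothetical):

const AWS = require('aws-sdk')
const { MongoClient } = require('mongodb')

const s3 = new AWS.S3()
const client = new MongoClient(process.env.MONGO_URI) // hypothetical connection string

exports.handler = async (event) => {
  // S3 event notifications carry the bucket and key of the newly written object
  const { bucket, object } = event.Records[0].s3

  const file = await s3
    .getObject({ Bucket: bucket.name, Key: decodeURIComponent(object.key) })
    .promise()

  // Firehose batches many records into one object; split on the newline delimiter
  const docs = file.Body.toString('utf-8')
    .split('\n')
    .filter(Boolean)
    .map((line) => JSON.parse(line))

  await client.connect()
  await client.db('analytics').collection('product_visits').insertMany(docs)
}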

At the end of the day, our overall flow will look like this:

Sample use-case of Firehose

So, let’s go step by step:

Creating a delivery stream in Firehose:

Step 1: Create a delivery stream

Configure the delivery stream to save our events to S3.

Step 2: Configuration of the delivery stream

For the destination part, we will store the data with dynamic partitioning. Whenever a new event is consumed, it will be stored in S3 under a pattern like this:

s3://test-space/product-visit/{{year}}/{{month}}/{{day}}

This makes it reasonable to aggregate over specific date ranges. For example, an event from Dec 26, 2022 would land under s3://test-space/product-visit/2022/12/26/.

Step 3: Destination settings

The S3 bucket prefix is built from the extracted partition keys:

/product-visit/!{partitionKeyFromQuery:year}/!{partitionKeyFromQuery:month}/!{partitionKeyFromQuery:day}/

Step 4: S3 partition settings
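The !{partitionKeyFromQuery:…} placeholders refer to partition keys that Firehose extracts from each record via inline parsing, which uses JQ expressions. A minimal sketch of those key definitions, assuming eventTime is a Unix epoch in seconds (the exact expressions depend on your record format):

year: .eventTime | gmtime | strftime("%Y")
month: .eventTime | gmtime | strftime("%m")
day: .eventTime | gmtime | strftime("%d")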

Now our pipeline is ready.

The next step is querying and analyzing that data.

In the Athena query editor:

Step 5: Load data into Athena

After that step, we need to provide our S3 path and a column specification to prepare our data for querying:

Step 6: Create a table in Athena
Step 7: Column specification
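For reference, an equivalent table definition written as plain SQL might look like the sketch below. The table name and column types are assumptions based on the event above, and since our S3 layout is not in Hive’s key=value style, each date partition has to be registered explicitly:

CREATE EXTERNAL TABLE product_visits (
  userId    string,
  product   string,
  tags      array<string>,
  eventTime bigint,
  pageURL   string
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://test-space/product-visit/';

-- Point each date partition at the matching Firehose prefix
ALTER TABLE product_visits ADD IF NOT EXISTS
  PARTITION (year = '2022', month = '12', day = '26')
  LOCATION 's3://test-space/product-visit/2022/12/26/';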

After those steps, we can aggregate our results with basic SQL queries, like the one below.
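As a sketch, assuming the hypothetical table definition above, this query would surface the most-visited products for a given month:

-- Most-visited products in December 2022
SELECT product, COUNT(*) AS visits
FROM product_visits
WHERE year = '2022' AND month = '12'
GROUP BY product
ORDER BY visits DESC
LIMIT 10;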

Hopefully, this article gives you some insight into AWS Kinesis usage.
See you in the next articles :]
