AWS Data Analytics — Kinesis Part-1

Kemalcan Bora
BilgeAdam Teknoloji
3 min read · May 23, 2020

How do you move data on AWS?

There are 3 different categories:

1) Real time

  • Kinesis Data Stream
  • Simple Queue Service (SQS)
  • IoT

2) Near-real time (reactive actions)

  • Kinesis Data Firehose
  • Database Migration Service

3) Batch - History Analysis

Note: This category is usually used when you want to move a large amount of data.

  • Snowball
  • Data Pipeline

AWS KINESIS OVERVIEW

  • An alternative to Apache Kafka
  • It’s great if you want to gather data such as application logs, metrics, IoT data, or clickstreams.
  • It integrates with a lot of stream processing frameworks such as Spark or NiFi.

Kinesis Streams: Low latency streaming

Kinesis Analytics: Perform real-time analytics on streams using SQL

Kinesis Firehose: Load streams into S3, Redshift, ES, or Splunk

Architecture Kinesis

Step by step:

Sources: clickstreams, IoT devices, metrics, and logs

The sources feed into Amazon Kinesis Streams. If you want to analyze the data, compute metrics, or raise alerts, you should use Amazon Kinesis Analytics. If you want to store the data or build a real-time dashboard, you use Amazon Kinesis Firehose. And what does Kinesis Firehose do? It delivers data to S3, ES, Redshift, or Splunk.
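The first step of that pipeline, a producer pushing a clickstream event into a stream, can be sketched with boto3 (the AWS SDK for Python). The stream name and event fields here are made up for illustration, and the actual `put_record` call needs AWS credentials configured:

```python
import json


# Hypothetical stream name, for illustration only.
STREAM_NAME = "clickstream-events"


def build_record(event: dict, partition_key: str) -> dict:
    """Serialize an event into the arguments Kinesis put_record expects."""
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(event).encode("utf-8"),  # the data blob, as bytes
        "PartitionKey": partition_key,              # routes the record to a shard
    }


def send_click(event: dict, partition_key: str):
    """Send one clickstream event (requires boto3 and AWS credentials)."""
    import boto3  # AWS SDK for Python
    kinesis = boto3.client("kinesis")
    return kinesis.put_record(**build_record(event, partition_key))
```

Using something stable like a user ID as the partition key keeps all of one user’s events on the same shard.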

Kinesis Streams Overview

Streams are divided into ordered shards. A shard is the equivalent of a partition.

For example: Producer -> 3 shards (shard-1, shard-2, shard-3) -> Consumers

Consumers read data from shards. Kinesis Streams does not store your data forever: it stores it for 24 hours by default, so you basically keep just 1 day. But if your data is critical, you can extend retention up to 7 days.

  • Kinesis has the ability to reprocess and replay data.
  • Multiple applications can consume the same stream.
  • It’s not a database!
  • Once data is inserted into Kinesis, it can’t be deleted (immutability); it’s an append-only stream.
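Extending retention from the 24-hour default to the 7-day maximum is a single API call. A minimal sketch with boto3, assuming a hypothetical stream name and configured credentials:

```python
# Hypothetical stream name, for illustration only.
STREAM_NAME = "clickstream-events"
MAX_RETENTION_HOURS = 7 * 24  # the 7-day maximum mentioned above


def extend_retention(hours: int = MAX_RETENTION_HOURS) -> None:
    """Raise the stream's retention period (requires boto3 and AWS credentials)."""
    import boto3  # AWS SDK for Python
    kinesis = boto3.client("kinesis")
    kinesis.increase_stream_retention_period(
        StreamName=STREAM_NAME,
        RetentionPeriodHours=hours,
    )
```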

Kinesis Streams Shards

  • One stream is made of many different shards (partitions).
  • An important point is billing! More shards means more dollars: you are billed per shard provisioned.
  • Batching is available, or you can put messages one at a time.
  • The number of shards can evolve over time: you can reshard (split) or merge shards.
  • Records are ordered per shard.
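Resharding can be sketched with the UpdateShardCount API, which performs the underlying splits or merges for you. The stream name is hypothetical, and doubling is used here because the API only allows scaling up to 2x the current count per call:

```python
# Hypothetical stream name, for illustration only.
STREAM_NAME = "clickstream-events"


def doubled(current_shards: int) -> int:
    """Splitting doubles capacity -- and the bill, since you pay per shard."""
    return current_shards * 2


def scale_up() -> None:
    """Double the stream's shard count (requires boto3 and AWS credentials)."""
    import boto3  # AWS SDK for Python
    kinesis = boto3.client("kinesis")
    summary = kinesis.describe_stream_summary(StreamName=STREAM_NAME)
    current = summary["StreamDescriptionSummary"]["OpenShardCount"]
    kinesis.update_shard_count(
        StreamName=STREAM_NAME,
        TargetShardCount=doubled(current),
        ScalingType="UNIFORM_SCALING",
    )
```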

Producer -> 3 shards (shard-1, shard-2, shard-3) -> Consumers

What does our producer send to these shards?

Kinesis Streams Records

  • Our producer’s records are made of a data blob. The data blob is serialized as bytes, can be up to 1 MB, and can represent anything.
  • Record key: helps Kinesis know which shard to send the data to. It’s something like a user ID.
  • Sequence number: it’s not something the producer sends; it gets added by Kinesis after ingestion. It’s a unique identifier for each record put in a shard.
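The record-key-to-shard routing works by hashing: Kinesis hashes the partition key with MD5 and maps the 128-bit result onto the shards’ hash-key ranges. A small sketch, assuming all shards cover equal slices of the hash space:

```python
import hashlib


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Return the shard index a partition key lands in, assuming the
    num_shards shards split the 128-bit MD5 hash space into equal ranges."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    # Scale the 128-bit hash down to a shard index in [0, num_shards).
    return hashed * num_shards // 2 ** 128
```

The same key always hashes to the same shard, which is what preserves per-shard ordering for a given user ID.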

LIMITS!

  • A producer can write at most 1 MB/sec or 1,000 messages/sec per shard. For example, if you have 10 shards you get 10 MB/sec or 10,000 messages/sec in total.
  • If you go over that limit you get a ProvisionedThroughputExceededException.
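A common way to handle that throttling is retrying with exponential backoff. A sketch (the exception class name matches what botocore raises; `put_fn` stands in for any put call, so the retry logic itself needs no AWS connection):

```python
import time


def put_with_backoff(put_fn, record: dict, max_attempts: int = 5):
    """Call put_fn(**record), retrying with exponential backoff whenever the
    stream is throttled (ProvisionedThroughputExceededException)."""
    for attempt in range(max_attempts):
        try:
            return put_fn(**record)
        except Exception as exc:
            if type(exc).__name__ != "ProvisionedThroughputExceededException":
                raise  # not a throttle: bubble it up
            time.sleep(0.1 * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```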

Kinesis has two types of consumers.

  1. Classic Consumer
  • 2 MB/sec read per shard, shared across all consumers.
  • 5 API calls per second, per shard.

2. Enhanced Fan-Out Consumer

  • No API calls needed (records are pushed to the consumer).
  • 2 MB/sec read per shard, per enhanced consumer.
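Registering an enhanced fan-out consumer is what gives it its own dedicated 2 MB/sec per shard; records are then pushed to it over HTTP/2 via SubscribeToShard. A sketch with boto3, where the stream ARN and consumer name are made up for illustration:

```python
# Hypothetical ARN and consumer name, for illustration only.
STREAM_ARN = "arn:aws:kinesis:eu-west-1:123456789012:stream/clickstream-events"
CONSUMER_NAME = "realtime-dashboard"


def register_consumer() -> str:
    """Register an enhanced fan-out consumer and return its ARN
    (requires boto3 and AWS credentials)."""
    import boto3  # AWS SDK for Python
    kinesis = boto3.client("kinesis")
    resp = kinesis.register_stream_consumer(
        StreamARN=STREAM_ARN,
        ConsumerName=CONSUMER_NAME,
    )
    # SubscribeToShard then uses this ConsumerARN to receive pushed records.
    return resp["Consumer"]["ConsumerARN"]
```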
