AWS SAA-27: Kinesis Data Streams

Overview of Kinesis Data Streams

Kiran Chhablani
3 min read · Oct 25, 2023
  • Amazon Kinesis Data Streams is a real-time data streaming service provided by AWS that enables you to ingest, process, and analyse streaming data at scale
  • It is designed to handle high-throughput, real-time data from various sources, allowing you to react to and gain insights from the data as it arrives
  • A stream is made up of multiple shards, numbered 1, 2, …, N, which you have to provision ahead of time
  • Data is split across all shards
  • Shards define the stream's capacity in terms of ingestion and consumption rate
  • Producers send data into the stream as records; at the lowest level they rely on the AWS SDK
  • Consumers receive records from the stream
  • Retention can be set between 1 day and 365 days
  • Ability to reprocess (replay) data
  • Once data is inserted in Kinesis, it can’t be deleted (immutability)
  • Data that shares the same partition key goes to the same shard (ordering)
  • Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
  • Consumers: write your own (Kinesis Client Library (KCL), AWS SDK) or use managed ones (AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics)
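
To make the partition key bullet concrete, here is a minimal Python sketch of how a key could be mapped to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The function name `shard_for_key` and the even split of the hash space are assumptions for illustration (an even split matches a freshly created stream, not necessarily one that has been resharded).

```python
import hashlib

MAX_HASH = 2 ** 128  # MD5 yields a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return a shard number in 1..shard_count for a partition key.

    Assumption: the 128-bit hash key space is divided evenly across
    the shards, which is a simplification of a real stream.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = MAX_HASH // shard_count
    # Clamp so the few values left over by integer division land in the last shard.
    index = min(hash_key // range_size, shard_count - 1)
    return index + 1


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-1"):
        print(key, "-> shard", shard_for_key(key, 4))
```

The same key always lands on the same shard, which is what gives per-key ordering. A key that is much busier than the others will also make its shard "hot".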

Capacity Modes

1. Provisioned Mode:
  • You choose the number of shards provisioned and scale manually or using the API
  • Each shard gets 1 MB/s in (or 1,000 records/s)
  • Each shard gets 2 MB/s out (shared across classic consumers, or per consumer with enhanced fan-out)
  • You pay per shard provisioned per hour
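
The per-shard numbers above translate into a simple sizing rule: take whichever limit demands the most shards. The sketch below is only illustrative. The function name `shards_needed` is hypothetical, and it assumes traffic is spread evenly across shards and that all consumers share the 2 MB/s read limit (classic consumers).

```python
import math


def shards_needed(in_mb_per_s: float, records_per_s: float, out_mb_per_s: float) -> int:
    """Estimate the shard count for a provisioned-mode stream.

    Uses the per-shard limits quoted above: 1 MB/s or 1,000 records/s in,
    2 MB/s out. Assumes an even spread of traffic across shards.
    """
    by_ingest_bytes = math.ceil(in_mb_per_s / 1.0)
    by_ingest_records = math.ceil(records_per_s / 1000)
    by_egress = math.ceil(out_mb_per_s / 2.0)
    # A stream always has at least one shard.
    return max(1, by_ingest_bytes, by_ingest_records, by_egress)


if __name__ == "__main__":
    # 3.5 MB/s in, 2,500 records/s, 6 MB/s out
    print(shards_needed(3.5, 2500, 6))
```

In this example the write throughput is the bottleneck (4 shards), even though the record rate and the read side would each be satisfied by 3.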

2. On-demand Mode:

  • No need to provision or manage the capacity
  • Default capacity provisioned (4 MB/s in or 4,000 records/s)
  • Scales automatically based on observed throughput peak during the last 30 days
  • Pay per stream per hour and data in/out per GB
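
As a rough illustration of how the two capacity modes differ when creating a stream, the sketch below builds the keyword arguments for boto3's `create_stream` call. `StreamName`, `ShardCount` and `StreamModeDetails` are real parameters of that API. The helper `create_stream_params` and the stream name `orders` are made up for this example, and nothing here actually calls AWS.

```python
from typing import Any, Dict, Optional


def create_stream_params(name: str, shard_count: Optional[int] = None) -> Dict[str, Any]:
    """Build kwargs for boto3's kinesis create_stream.

    No shard_count  -> on-demand mode (capacity managed by AWS).
    With shard_count -> provisioned mode with that many shards.
    """
    if shard_count is None:
        return {
            "StreamName": name,
            "StreamModeDetails": {"StreamMode": "ON_DEMAND"},
        }
    if shard_count < 1:
        raise ValueError("a provisioned stream needs at least one shard")
    return {
        "StreamName": name,
        "ShardCount": shard_count,
        "StreamModeDetails": {"StreamMode": "PROVISIONED"},
    }


# Usage with boto3 (not executed here; requires AWS credentials):
#   import boto3
#   boto3.client("kinesis").create_stream(**create_stream_params("orders", 4))

if __name__ == "__main__":
    print(create_stream_params("orders"))
    print(create_stream_params("orders", 4))
```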

Security

  • Control access/authorization using IAM policies
  • Encryption in flight using HTTPS endpoints
  • Encryption at rest using KMS
  • You can implement encryption/decryption of data on client side
  • VPC endpoints are available so that Kinesis can be accessed from within a VPC
  • Monitor API calls using CloudTrail
  • The capacity limits of a Kinesis Data Stream are defined by the number of shards within the data stream
  • The limits can be exceeded by either data throughput or the number of read calls
  • Each shard allows for 1 MB/s of incoming data and 2 MB/s of outgoing data
  • You should increase the number of shards within your stream to provide enough capacity
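
The limit bullets above can be turned into a quick check of which limit a given workload would hit. This is a sketch under assumptions: the function name `exceeded_limits` is hypothetical, traffic is assumed to be spread evenly across shards, and the figure of 5 read calls per second per shard comes from the AWS quotas for shared (classic) consumers rather than from this article.

```python
from typing import List

WRITE_MB_PER_S = 1.0        # per shard, from the bullets above
WRITE_RECORDS_PER_S = 1000  # per shard, from the capacity modes section
READ_MB_PER_S = 2.0         # per shard, from the bullets above
READ_CALLS_PER_S = 5        # per shard; AWS quota for shared consumers (not stated in this article)


def exceeded_limits(shards: int, in_mb: float, in_records: float,
                    out_mb: float, read_calls: float) -> List[str]:
    """Return the names of the stream-wide limits a per-second workload exceeds."""
    exceeded = []
    if in_mb > shards * WRITE_MB_PER_S:
        exceeded.append("write throughput")
    if in_records > shards * WRITE_RECORDS_PER_S:
        exceeded.append("write records")
    if out_mb > shards * READ_MB_PER_S:
        exceeded.append("read throughput")
    if read_calls > shards * READ_CALLS_PER_S:
        exceeded.append("read calls")
    return exceeded


if __name__ == "__main__":
    # 2 shards, 1.5 MB/s in, 2,500 records/s, 3 MB/s out, 12 read calls/s
    print(exceeded_limits(2, 1.5, 2500, 3.0, 12))
```

Here the byte throughput is within limits, but the record rate and the number of read calls are not, so the stream would still throttle until more shards are added.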
