AWS SAA-27: Kinesis Data Streams
Overview of Kinesis Data Streams
- Amazon Kinesis Data Streams is a real-time data streaming service provided by AWS that enables you to ingest, process, and analyse streaming data at scale
- It is designed to handle high-throughput, real-time data from various sources, allowing you to react to and gain insights from the data as it arrives
- A stream is made up of multiple shards, numbered 1, 2, …, N, which you provision ahead of time
- Data is split across all shards
- The number of shards defines the stream capacity in terms of ingestion and consumption rates
- Producers send data into the stream as records, typically relying on the SDK at a fairly low level
- Consumers receive records
- Retention period from 1 day to 365 days
- Ability to reprocess (replay) data
- Once data is inserted in Kinesis, it can’t be deleted (immutability)
- Data that shares the same partition key goes to the same shard (ordering)
- Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
- Consumers: write your own using the Kinesis Client Library (KCL) or the AWS SDK; managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics (see the sketch after this list)
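As a rough illustration of the producer and consumer sides, here is a minimal sketch using boto3 (the Python AWS SDK). The stream name my-stream, the region, and the partition key are assumptions made up for the example, not values from these notes.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Producer side: put a record into the stream.
# Records with the same PartitionKey always land on the same shard,
# which is what preserves per-key ordering.
kinesis.put_record(
    StreamName="my-stream",  # assumed stream name
    Data=b'{"sensor_id": "42", "temperature": 21.5}',
    PartitionKey="sensor-42",
)

# Consumer side: read records from one shard with the low-level SDK.
# (In practice you would usually use the KCL, Lambda, or Firehose instead.)
shard_id = kinesis.describe_stream(StreamName="my-stream")[
    "StreamDescription"
]["Shards"][0]["ShardId"]

iterator = kinesis.get_shard_iterator(
    StreamName="my-stream",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",  # start from the oldest retained record (enables replay)
)["ShardIterator"]

response = kinesis.get_records(ShardIterator=iterator, Limit=100)
for record in response["Records"]:
    print(record["PartitionKey"], record["Data"])
```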
Capacity Modes
1. Provisioned Mode:
- You choose the number of shards provisioned and scale manually or via the API (see the sketch after this list)
- Each shard gets 1MB/s in (or 1000 records/s)
- Each shard gets 2MB/s out (shared across classic consumers, or 2MB/s per consumer with enhanced fan-out)
- You pay per shard provisioned per hour
2. On-demand Mode:
- No need to provision or manage the capacity
- Default capacity provisioned (4MB/s in or 4000 records/s)
- Scales automatically based on observed throughput peak during the last 30 days
- Pay per stream per hour and data in/out per GB
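As a hedged sketch of how the two capacity modes look through the API (again with boto3), the snippet below creates one stream in each mode and scales the provisioned one; stream names and shard counts are made up for illustration.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Provisioned mode: you pick the shard count up front and pay per shard-hour.
kinesis.create_stream(
    StreamName="orders-provisioned",  # assumed name
    ShardCount=4,                     # 4 x 1MB/s in, 4 x 2MB/s out (classic)
    StreamModeDetails={"StreamMode": "PROVISIONED"},
)

# Scaling a provisioned stream manually via the API.
kinesis.update_shard_count(
    StreamName="orders-provisioned",
    TargetShardCount=8,
    ScalingType="UNIFORM_SCALING",
)

# On-demand mode: no shard count to manage; capacity adapts to observed throughput.
kinesis.create_stream(
    StreamName="orders-on-demand",    # assumed name
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)
```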
Security
- Control access/authorization using IAM policies
- Encryption in flight using HTTPS endpoints
- Encryption at rest using KMS (see the sketch after this list)
- You can implement encryption/decryption of data on client side
- VPC endpoints are available for Kinesis, so it can be accessed from within a VPC without traversing the public internet
- Monitor API calls using CloudTrail
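For example, server-side encryption at rest with KMS can be enabled per stream. This is a minimal sketch; the stream name and KMS key alias are assumptions for illustration.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Enable server-side encryption at rest with a KMS key.
# "alias/my-kinesis-key" is an assumed customer-managed key alias;
# the AWS-managed default key for Kinesis is "alias/aws/kinesis".
kinesis.start_stream_encryption(
    StreamName="orders-provisioned",  # assumed stream name
    EncryptionType="KMS",
    KeyId="alias/my-kinesis-key",
)
```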
Capacity Limits
- The capacity limits of a Kinesis data stream are defined by the number of shards within the stream
- These limits can be exceeded either by data throughput or by the number of read calls
- Each shard allows 1MB/s of incoming data and 2MB/s of outgoing data
- You should increase the number of shards in your stream to provide enough capacity (see the sketch below)
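When a shard's limits are exceeded, calls are throttled with a ProvisionedThroughputExceededException. A common producer-side pattern, sketched below with an assumed stream name, is to back off and retry while you add shards.

```python
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")


def put_with_backoff(data: bytes, partition_key: str, retries: int = 5) -> None:
    """Retry with exponential backoff when a shard is throttled."""
    for attempt in range(retries):
        try:
            kinesis.put_record(
                StreamName="orders-provisioned",  # assumed stream name
                Data=data,
                PartitionKey=partition_key,
            )
            return
        except kinesis.exceptions.ProvisionedThroughputExceededException:
            time.sleep(2 ** attempt * 0.1)  # 0.1s, 0.2s, 0.4s, ...
    raise RuntimeError("Record not written after retries; consider adding shards")
```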