CloudWatch Logs vs. Kinesis Firehose
No, I’m not putting out timber fires.
Logging infrastructure in AWS is awesome, but how do the different options stack up against each other, and what are the appropriate scenarios for each? Let’s see how two popular choices, AWS CloudWatch Logs and Kinesis Data Firehose, compare:
None of the current AWS offerings allow us to start sending log records without first setting up some kind of resource. This is reasonable, of course, because AWS needs to have some data structures in place before messages arrive to ensure they are properly handled.
CloudWatch requires a log group and log stream to exist prior to sending messages. The group serves to collect streams together and provide a single place to manage settings (retention, monitoring, access control) across them, and a stream is a sequence of log events from the same source.
Creating a log group requires only a name, but it’s a good idea to employ the optional tagging to keep things organized:
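A sketch of what that might look like (the group name and tags here are placeholders):

```shell
# Create the log group, tagging it to keep things organized
aws logs create-log-group \
  --log-group-name pihole-logs \
  --tags project=pihole,env=home
```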
Likewise, creating a new stream is a simple one-liner:
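Something like (names as before are placeholders):

```shell
# Create a stream inside the group to receive events from one source
aws logs create-log-stream \
  --log-group-name pihole-logs \
  --log-stream-name pihole-1
```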
And, once created, the description of the stream is concise and straightforward:
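For instance (names are placeholders, and the response is abbreviated here):

```shell
aws logs describe-log-streams --log-group-name pihole-logs
# {
#     "logStreams": [
#         {
#             "logStreamName": "pihole-1",
#             "creationTime": 1588291200000,
#             "arn": "arn:aws:logs:us-east-1:123456789012:...",
#             "storedBytes": 0
#         }
#     ]
# }
```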
Trying to create a log group or stream using a name that already exists results in a clear error:
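For example, re-running the create (the exact wording may vary by CLI version):

```shell
aws logs create-log-group --log-group-name pihole-logs
# An error occurred (ResourceAlreadyExistsException) when calling the
# CreateLogGroup operation: The specified log group already exists
```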
Finally, removing the group and stream can be accomplished in a single command:
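Deleting the group takes its streams (and their events) with it:

```shell
# One command; no need to delete each stream first
aws logs delete-log-group --log-group-name pihole-logs
```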
That’s a nice, easy API.
A Kinesis Data Firehose delivery stream is designed to take messages at a high velocity (up to 5,000 records per second) and put them into batches as objects in S3. Firehose requires a delivery stream to exist and be active before messages can be sent, and creating one requires an IAM role and an S3 bucket.
To demonstrate, first comes the bucket (unless you already have one for logs):
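Something like (the bucket name is a placeholder and must be globally unique):

```shell
aws s3 mb s3://my-log-archive-bucket
```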
Next, the IAM role (see the documentation for more detail on how to set up access for Firehose):
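A sketch, with a trust policy that lets the Firehose service assume the role (the role name is a placeholder):

```shell
aws iam create-role \
  --role-name firehose-to-s3 \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "firehose.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'
```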
And, to give the role access to the bucket we need to attach an appropriate policy:
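For example, an inline policy granting the S3 permissions Firehose needs for its destination bucket (names here are placeholders):

```shell
aws iam put-role-policy \
  --role-name firehose-to-s3 \
  --policy-name s3-write \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload", "s3:GetBucketLocation", "s3:GetObject",
        "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-log-archive-bucket",
        "arn:aws:s3:::my-log-archive-bucket/*"
      ]
    }]
  }'
```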
And, finally, we can create the delivery stream:
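Roughly (the stream name and account ID are placeholders):

```shell
aws firehose create-delivery-stream \
  --delivery-stream-name pihole-stream \
  --s3-destination-configuration \
    "RoleARN=arn:aws:iam::123456789012:role/firehose-to-s3,BucketARN=arn:aws:s3:::my-log-archive-bucket"
```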
Yeah, that hurt.
Next, note that it takes some time for a new stream to reach the ACTIVE state, at which point it can receive messages. A quick check using describe-delivery-stream will let us know when it’s ready:
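For example (the stream name is a placeholder):

```shell
aws firehose describe-delivery-stream \
  --delivery-stream-name pihole-stream \
  --query 'DeliveryStreamDescription.DeliveryStreamStatus'
# "CREATING" at first, then "ACTIVE" once it is ready
```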
And, when we’re done with it we can remove the delivery stream with a one-liner as well (but don’t forget to remove the policy, role and bucket, too):
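Something like (names as before are placeholders):

```shell
aws firehose delete-delivery-stream --delivery-stream-name pihole-stream

# ...and the rest of the cleanup:
aws iam delete-role-policy --role-name firehose-to-s3 --policy-name s3-write
aws iam delete-role --role-name firehose-to-s3
aws s3 rb s3://my-log-archive-bucket --force
```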
Setting up and cleaning up Firehose takes a lot more work than for CloudWatch Logs.
For example purposes, let’s look at simply sending the tail of some local log event file to CloudWatch. A concrete example of this might be something simple like pushing PiHole events from a Raspberry Pi so you can be alerted when blacklisted domains are being requested inside your network (a sign of malware, perhaps).
When sending messages to CloudWatch Logs you must know whether it is the first message or a subsequent one. This is important because the first message must be sent without a sequence token, while every subsequent message is required to include one.
In addition, we must also provide a timestamp for each message and those timestamps must be in increasing order over the messages included in the command:
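A sketch of the first send (names are placeholders; timestamps are milliseconds since the epoch, and `date +%s%3N` assumes GNU date):

```shell
# First message: no sequence token
aws logs put-log-events \
  --log-group-name pihole-logs \
  --log-stream-name pihole-1 \
  --log-events "timestamp=$(date +%s%3N),message=first event"
# The response includes a nextSequenceToken for the next call
```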
The sequence token returned must be provided when sending the next message(s):
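Continuing the sketch, with the token captured from the previous response:

```shell
# TOKEN holds the nextSequenceToken returned by the previous put-log-events
aws logs put-log-events \
  --log-group-name pihole-logs \
  --log-stream-name pihole-1 \
  --sequence-token "$TOKEN" \
  --log-events "timestamp=$(date +%s%3N),message=second event"
```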
Tailing lines to CloudWatch requires keeping track of the sequence token between calls. One way to do that is with an environment variable:
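A minimal sketch, assuming GNU date and log lines without commas (the shorthand --log-events syntax splits on them):

```shell
TOKEN=""
tail -F /var/log/pihole.log | while read -r LINE; do
  if [ -z "$TOKEN" ]; then
    # First call: no sequence token allowed
    TOKEN=$(aws logs put-log-events \
      --log-group-name pihole-logs --log-stream-name pihole-1 \
      --log-events "timestamp=$(date +%s%3N),message=$LINE" \
      --query nextSequenceToken --output text)
  else
    # Subsequent calls: sequence token required
    TOKEN=$(aws logs put-log-events \
      --log-group-name pihole-logs --log-stream-name pihole-1 \
      --sequence-token "$TOKEN" \
      --log-events "timestamp=$(date +%s%3N),message=$LINE" \
      --query nextSequenceToken --output text)
  fi
done
```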
Painful, and brittle. We need to sync the system clock, and re-fetch the sequence token via describe-log-streams when the system reboots (etc.) or messages will fail to send. This seems like extra work since it’s almost certain that the log messages contain a date already.
The command to send a message to Firehose is very simple, needing only the name of the delivery stream and the record data:
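For instance (the stream name is a placeholder; note that AWS CLI v2 expects the Data blob base64-encoded unless cli_binary_format is set to raw-in-base64-out):

```shell
aws firehose put-record \
  --delivery-stream-name pihole-stream \
  --record Data="a log line here"
```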
Tailing the logs to Firehose is simpler than for CloudWatch Logs since we don’t need to provide a timestamp or a sequence token:
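A sketch along the same lines as before (names are placeholders):

```shell
tail -F /var/log/pihole.log | while read -r LINE; do
  aws firehose put-record \
    --delivery-stream-name pihole-stream \
    --record Data="$LINE"
done
```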
Easy, and reliable.
Getting messages into a logging system is necessary but not sufficient. Getting them out is just as important, whether as event triggers, via search, or for archival/audit (likely all three).
Search over log groups and individual streams is provided in the AWS Console, with no setup needed. The search is basic (keyword and date/time range only) and sometimes quite slow, but it’s easy.
You’ll also find the console provides ways to create custom metrics and send filtered log stream events to Lambda for open-ended processing in small batches. From an operations perspective, the retention period can be configured from “forever” to as short as a day for a log group.
Search over single log files is provided in the S3 Console under the “S3 Select” moniker, but searching over multiple files isn’t. S3 Select is very capable, being based on a SQL-like syntax.
As for event sourcing, when Firehose creates new objects in S3, the resulting events can invoke Lambda for efficient, open-ended, large-batch processing.
For high volumes with long retention and deep analysis, Firehose supports Redshift as a destination. Redshift is a sophisticated, scalable and cost-effective analysis solution. Firehose and CloudWatch Logs are also commonly used with Elasticsearch for an easy-to-use analysis option, but with limited scale and comparatively high cost.
In S3, the log events are stored cheaply, and support random access by time (the key prefix includes the date and hour) and are subject to S3’s powerful data retention policies (send to Glacier, expire, etc.).
Economics matter — but only if you grow out of the free tier. Both CloudWatch and Firehose charge on a metered, pay-only-for-what-you-use basis, so they are genuinely serverless in that regard.
CloudWatch Logs pricing:

- $0.50 per GB ingested**
- $0.03 per GB archived per month***
- $1.00 per million custom events generated

** There is no Data Transfer IN charge for CloudWatch.
*** Data archived by CloudWatch Logs includes 26 bytes of metadata per log event and is compressed using gzip level 6.
Firehose pricing (the storage charge is S3’s, not Firehose’s):

- $0.029 per GB, Data Ingested, First 500 TB / month
- $0.018 per GB, Data Format Conversion (optional)
- $0.023 per GB per month, S3 Standard storage
For quick setup and simple searching of messages, CloudWatch Logs is a winner. The somewhat tedious and sprawling dependencies behind a Firehose mean there is more to learn (IAM) and maintain.
From an operations point of view, sending messages with Firehose is as simple and reliable as can be. The limits on Firehose mean each account may have only 50 delivery streams (more by request) and that each delivery stream can accept up to 5,000 records per second. Other limits on the size of individual records and number of records per batch are important, but they are quite high. And, at high volumes the price differences really add up. For example, with 1TB of logs per month with 90 day retention:
- CloudWatch Logs: 1,024 x $0.50 + 3 x 1,024 x $0.03 = $604.16/month
- Firehose: 1,024 x $0.029 + 3 x 1,024 x $0.023 = $100.35/month
Winner? I prefer Firehose. Get past the setup and enjoy the cost-effective, reliable and never-worry-about-scaling option. Unless you like it easy.