AWS DevOps — Part 2 Auditing and Messaging

Jakob Essig
Nov 5 · 5 min read

In the previous tutorial you learned how developers can use the AWS services CodeCommit, CodePipeline, CodeBuild, and CodeDeploy to mature code. In this tutorial you will learn about the monitoring, auditing, and messaging services that let developers track issues and events and react to them in real time.

Monitoring

CloudWatch

CloudWatch collects metrics and logs, triggers alarms based on metrics, and performs actions based on events. Developers can watch several components at once to verify performance and functionality.

Metrics measure a variable over time; logs retain detailed records for later analysis. Most AWS services publish metrics, and applications can publish custom ones. Metrics are grouped in a container called a namespace, which keeps the metrics of different applications separate so they can be monitored side by side. Dimensions are key/value pairs that identify what is being measured, such as an instance ID or an environment. Custom metrics can be stored at a resolution as fine as 1 second (high resolution). Logs can be sent to CloudWatch Logs with the AWS SDK or the CloudWatch agent and exported to S3 for archiving.
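As a minimal sketch of how namespaces, dimensions, and resolution fit together, the snippet below builds a dictionary shaped like a CloudWatch PutMetricData request. The helper name build_metric_payload, the namespace "MyApp/Checkout", and the metric and dimension values are made up for illustration; nothing is sent to AWS.

```python
from datetime import datetime, timezone


def build_metric_payload(namespace, metric_name, value, dimensions,
                         unit="Milliseconds", high_resolution=False):
    """Build keyword arguments shaped like a PutMetricData request.

    Hypothetical helper for illustration only; it does not call AWS.
    """
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": metric_name,
            # Dimensions are key/value pairs describing what is measured.
            "Dimensions": [{"Name": k, "Value": v}
                           for k, v in sorted(dimensions.items())],
            "Timestamp": datetime.now(timezone.utc),
            "Value": float(value),
            "Unit": unit,
            # 1 = high resolution (1 second), 60 = standard resolution.
            "StorageResolution": 1 if high_resolution else 60,
        }],
    }


payload = build_metric_payload(
    "MyApp/Checkout", "Latency", 182,
    {"Environment": "prod", "Service": "cart"},
    high_resolution=True,
)
# With boto3 installed and credentials configured, this could be sent with:
#   boto3.client("cloudwatch").put_metric_data(**payload)
```

Keeping the payload construction separate from the API call makes it easy to inspect what would be published before any request is made.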

Alarms trigger notifications based on metrics. An alarm can be in one of 3 states: OK (the metric is within the threshold), INSUFFICIENT_DATA (not enough data is available to decide), or ALARM (the threshold has been breached). Events perform an action either on a schedule (like a cron job) or when an event pattern matches, for example when an alarm changes state.
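The three alarm states can be illustrated with a deliberately simplified model. The function below is an assumption-laden sketch, not CloudWatch's actual evaluation logic (which also handles missing-data treatment and "M out of N" datapoints): it treats an alarm as breached only when every datapoint in the evaluation window is above the threshold.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return 'OK', 'ALARM', or 'INSUFFICIENT_DATA' for a simplified
    greater-than-threshold alarm (illustrative model only)."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough datapoints to fill the evaluation window.
        return "INSUFFICIENT_DATA"
    if all(point > threshold for point in recent):
        # Every datapoint in the window breached the threshold.
        return "ALARM"
    return "OK"


cpu = [50, 85, 90, 95]
state = alarm_state(cpu, threshold=80, evaluation_periods=3)
```

Here the last three datapoints (85, 90, 95) are all above 80, so the sketch reports ALARM; with fewer than three datapoints it would report INSUFFICIENT_DATA.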

CloudTrail

CloudTrail provides governance, compliance, and auditing of how AWS resources are being used. It records user activity and API calls, such as function invocations, which lets security personnel investigate and react to incidents quickly and identify potential threats or malfunctions. Logs are delivered to S3 and can also be sent to CloudWatch Logs.

Auditing

X-Ray

Application performance and functionality can be traced with X-Ray. It helps identify performance bottlenecks and map dependencies. X-Ray can be used with Lambda, Elastic Beanstalk, Elastic Container Service (ECS), API Gateway, and EC2 or other app servers.

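X-Ray traces a request from start to finish. Within a trace, segments can be defined for granular analysis of a request, and annotations can be attached so that traces can be filtered later. The X-Ray SDK sends segment data as UDP packets to the X-Ray daemon, which listens on port 2000 and relays the data in batches to the X-Ray API. The daemon is managed for you on Lambda and is included on Elastic Beanstalk, where it only needs to be enabled.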
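To make segments, annotations, and the UDP transport more concrete, here is a rough sketch that builds a segment document and the packet an X-Ray daemon on port 2000 would accept. It assumes the daemon's JSON header format and a locally running daemon; the service name "checkout-service" and the annotation are invented, and the send itself is left as a comment so the example runs without a daemon. In practice the X-Ray SDK does this work for you.

```python
import json
import os
import time

# Header line the daemon expects before each segment document.
DAEMON_HEADER = '{"format": "json", "version": 1}'


def new_trace_id(now=None):
    """Trace ID: version, 8 hex digits of epoch seconds, 24 random hex digits."""
    epoch = int(now if now is not None else time.time())
    return "1-{:08x}-{}".format(epoch, os.urandom(12).hex())


def build_segment(name, start, end, annotations=None):
    """Build a minimal segment document (illustrative subset of fields)."""
    segment = {
        "name": name,
        "id": os.urandom(8).hex(),          # 16 hex digits
        "trace_id": new_trace_id(start),
        "start_time": start,                 # epoch seconds
        "end_time": end,
    }
    if annotations:
        # Annotations are indexed key/value pairs used to filter traces.
        segment["annotations"] = annotations
    return segment


def to_daemon_packet(segment):
    """Serialize a segment as header line + JSON document."""
    return (DAEMON_HEADER + "\n" + json.dumps(segment)).encode("utf-8")


segment = build_segment("checkout-service", 1700000000.0, 1700000000.25,
                        {"customer_tier": "gold"})
packet = to_daemon_packet(segment)
# With a daemon running locally, the packet could be sent like this:
#   import socket
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.sendto(packet, ("127.0.0.1", 2000))
```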

Kinesis

Kinesis allows for real-time streaming of big data, which is useful for streaming logs, metrics, Internet of Things (IoT) data, and clickstreams. Kinesis data is automatically replicated across 3 Availability Zones. The main features of Kinesis are Streams, Analytics, and Firehose; combined, the three make powerful, dynamic applications possible.

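A stream is divided into units of capacity known as shards. One shard supports up to 5 read transactions per second with a total read rate of up to 2 MB per second, and up to 1,000 records written per second with a total write rate of up to 1 MB per second. Partition keys determine which shard a record is sent to, and records with the same key go to the same shard. This is why it is important to choose partition keys with many distinct values: a few heavily used keys will overload one shard. Shards scale automatically only in on-demand capacity mode; in provisioned mode, capacity is changed by adding or removing shards as demand increases or decreases.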
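The effect of partition key choice can be sketched in a few lines. Kinesis maps each partition key to a 128-bit integer with MD5 and routes the record to the shard that owns that part of the hash key range. The sketch below assumes the range is split evenly across shards, which is a simplification; the key names are invented.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming the hash key
    range is divided evenly between shards (a simplification)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    return hash_key * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many records land on each shard."""
    return Counter(shard_for_key(key, shard_count) for key in keys)


# One key for every record: all 1,000 records hit the same shard.
hot = distribution(["tenant-a"] * 1000, 4)

# Many distinct keys: records spread across all 4 shards.
spread = distribution(["user-%d" % i for i in range(1000)], 4)
```

Comparing hot with spread shows why a single popular key can exhaust one shard's 1,000 records per second while the other shards sit idle.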

Streams are the basic feature of Kinesis. Streams collect and distribute data between applications or services such as DynamoDB or Lambda. Stream data is stored for 24 hours by default, and retention can be extended to 7 days or longer. Once data is ingested it cannot be deleted; it only expires. Data is consumed from a stream using the AWS CLI, HTTPS, an SDK, or the Kinesis Client Library (KCL). The KCL is a Java library that reads from a stream. Each shard is read by one KCL instance at a time, so a stream with 6 shards can be read by up to 6 KCL instances in parallel. Changing the number of shards by splitting or merging them is known as resharding.

Kinesis Firehose is a managed service that processes streams and delivers the data to a destination such as S3. Firehose scales automatically, eliminating the need to provision capacity. Firehose has many of the same features as Streams, and adds control over how the data is captured and stored.

Kinesis Analytics allows for continuous querying of data coming through a stream. Queries are written in SQL and analyze the data in near real time.

Messaging

Messaging services allow for communication within or between applications. Messaging can be synchronous (app to app) or asynchronous (app to queue to app); asynchronous designs scale better when demand increases because producers and consumers are decoupled. AWS has two main messaging services: Simple Queue Service (SQS) and Simple Notification Service (SNS).

SQS

SQS allows for communication between application components. Messages can be sent, stored, and received. Standard queues have nearly unlimited throughput, scaling from 1 to 10,000 messages per second and beyond, with no limit on the number of messages in a queue. Messages have a maximum size of 256 KB. The default retention period is 4 days and can be increased to 14 days. Message delivery can be delayed for up to 15 minutes, although the default delay is 0.

A message consists of a body of up to 256 KB (larger payloads are possible by storing them in S3 with the extended client library), with optional attributes (metadata) and an optional delay. When a message is sent, SQS returns a message ID and an MD5 hash of the body. Consumers pull up to 10 messages at a time from the queue and delete a message by passing its receipt handle.
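The message anatomy described above can be sketched as a small validation helper. The function prepare_message is hypothetical; it checks the size and delay limits quoted in this article, builds a dictionary shaped like a SendMessage request, and computes the MD5 digest a sender could compare against the MD5OfMessageBody value in the response. The body and attribute values are invented, and the size check ignores attribute sizes for simplicity.

```python
import hashlib

MAX_MESSAGE_BYTES = 256 * 1024   # size limit quoted in this article
MAX_DELAY_SECONDS = 15 * 60      # delivery delay of up to 15 minutes


def prepare_message(body, attributes=None, delay_seconds=0):
    """Validate a message and return (request_params, expected_md5).

    Hypothetical helper for illustration; it does not call AWS.
    """
    encoded = body.encode("utf-8")
    if len(encoded) > MAX_MESSAGE_BYTES:
        raise ValueError("body exceeds 256 KB; store the payload in S3 instead")
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("delay must be between 0 and 900 seconds")
    params = {
        "MessageBody": body,
        "DelaySeconds": delay_seconds,
        "MessageAttributes": attributes or {},
    }
    # Hex MD5 of the body, for comparison with the value SQS returns.
    expected_md5 = hashlib.md5(encoded).hexdigest()
    return params, expected_md5


params, md5_hex = prepare_message(
    "hello",
    attributes={"source": {"DataType": "String", "StringValue": "web"}},
)
# With boto3, the request could be sent with:
#   boto3.client("sqs").send_message(QueueUrl=queue_url, **params)
# where queue_url is the URL of an existing queue.
```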

When a consumer receives a message, the message is hidden from other consumers for a defined period known as the visibility timeout. The default timeout is 30 seconds, but it can range from 0 seconds to 12 hours. If the timeout is too long and the consumer fails, it is a long time before the message can be read again; if it is too short, the message becomes visible again while it is still being processed and may be processed more than once. A message is not removed by being read: the consumer must delete it after processing it successfully. Messages that are not consumed successfully after a set number of receives go to the dead letter queue according to the redrive policy. Messages in this queue are useful for debugging, but must be processed before their retention period expires.
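The interplay between the visibility timeout and the dead letter queue is easier to follow with a toy model. The class below is an in-memory sketch, not the SQS API: SketchQueue and its methods are made up, time is passed in explicitly instead of using a clock, the message ID stands in for a receipt handle, and the sketch assumes the consumer deletes a message explicitly once it has been processed.

```python
class SketchQueue:
    """Toy in-memory queue modelling visibility timeout and redrive."""

    def __init__(self, visibility_timeout=30, max_receive_count=3):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.messages = {}       # id -> {"body", "visible_at", "receives"}
        self.dead_letters = []   # bodies moved to the dead letter queue
        self._next_id = 0

    def send(self, body, now=0):
        self._next_id += 1
        self.messages[self._next_id] = {
            "body": body, "visible_at": now, "receives": 0,
        }
        return self._next_id

    def receive(self, now, max_messages=10):
        batch = []
        for message_id, msg in list(self.messages.items()):
            if len(batch) == max_messages:
                break
            if msg["visible_at"] > now:
                continue  # still hidden by the visibility timeout
            if msg["receives"] >= self.max_receive_count:
                # Received too many times without deletion: redrive.
                self.dead_letters.append(msg["body"])
                del self.messages[message_id]
                continue
            msg["receives"] += 1
            msg["visible_at"] = now + self.visibility_timeout
            batch.append((message_id, msg["body"]))
        return batch

    def delete(self, message_id):
        """Called by the consumer after successful processing."""
        self.messages.pop(message_id, None)


queue = SketchQueue(visibility_timeout=30, max_receive_count=2)
queue.send("order-1", now=0)
first = queue.receive(now=0)     # delivered, hidden until t=30
hidden = queue.receive(now=10)   # nothing visible yet
retry = queue.receive(now=30)    # never deleted, so delivered again
after = queue.receive(now=60)    # receive limit reached: dead-lettered
```

Because the consumer in this run never calls delete, the message reappears after each timeout and ends up in dead_letters once it has been received max_receive_count times.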

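SQS can use long polling to avoid unnecessary API calls. With long polling, a receive request waits (up to 20 seconds) for a message to arrive instead of returning an empty response immediately. Queues can be first in, first out (FIFO); FIFO queues deliver messages to the consumer in the order they were sent. FIFO messages are delivered exactly once: duplicates sent within the 5-minute deduplication interval are dropped, identified either by an explicit deduplication ID or by the SHA-256 hash of the body. Ordering in a FIFO queue is maintained within each message group, defined by the group ID. FIFO queues have a throughput of 300 operations per second, or 3,000 messages per second with batching.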
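FIFO deduplication by SHA-256 hash can be sketched as follows. This is a simplified in-memory model rather than the SQS implementation: the class name FifoDeduplicator is invented, time is passed in explicitly, and a 5-minute deduplication window is assumed.

```python
import hashlib

DEDUP_WINDOW_SECONDS = 300  # 5-minute deduplication interval


class FifoDeduplicator:
    """Toy model of FIFO content-based deduplication."""

    def __init__(self):
        self._seen = {}  # deduplication id -> time it was first accepted

    def accept(self, body, now, deduplication_id=None):
        """Return True if the message is accepted, False if it is a
        duplicate inside the deduplication window."""
        # Without an explicit id, fall back to the SHA-256 of the body.
        dedup_id = deduplication_id or hashlib.sha256(
            body.encode("utf-8")).hexdigest()
        first_seen = self._seen.get(dedup_id)
        if first_seen is not None and now - first_seen < DEDUP_WINDOW_SECONDS:
            return False
        self._seen[dedup_id] = now
        return True


dedup = FifoDeduplicator()
sent = dedup.accept("charge order 42", now=0)          # accepted
resent = dedup.accept("charge order 42", now=10)       # dropped as duplicate
much_later = dedup.accept("charge order 42", now=301)  # window has passed
```

Passing an explicit deduplication_id is useful when two messages with identical bodies are both legitimate and should each be delivered.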

SNS

SNS coordinates the flow of messages from publishers to subscribers. Publishers send messages to a topic, which pushes them to one or many subscribers. Each topic is capable of serving 10,000,000 subscriptions, with a limit of 100,000 topics. Subscribers include SQS, HTTP/HTTPS, Lambda, email, SMS, and mobile push notifications.

SNS and SQS used in conjunction produce a fan-out pattern for distributing messages: SNS pushes each message to multiple SQS queues. This allows for a fully decoupled application design and helps prevent data loss, because each queue holds its messages until a consumer processes them. Further, SQS queues can be added as subscribers later, and queues allow for delayed processing and retries.
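The fan-out pattern can be illustrated with a small in-memory model. The Topic class and the queue names below are invented for the sketch; a plain deque stands in for each SQS queue, and no AWS calls are made.

```python
from collections import deque


class Topic:
    """Toy stand-in for an SNS topic with queue subscribers."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        """Copy the message to every subscribed queue; return the
        number of queues that received it."""
        for queue in self.subscribers:
            queue.append(message)
        return len(self.subscribers)


orders = Topic("orders")
billing, shipping = deque(), deque()
orders.subscribe(billing)
orders.subscribe(shipping)
orders.publish("order-1")

# A new consumer can be attached later without touching the publisher.
analytics = deque()
orders.subscribe(analytics)
orders.publish("order-2")
```

Each queue keeps its own copy of every message published after it subscribed, so billing and shipping can process at their own pace, while analytics only sees messages from the point it was added.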

Importance of Auditing, Monitoring, and Messaging

The exchange of information is essential to producing an effective, responsive, functional application. CloudTrail and CloudWatch allow technicians to monitor events from an application or AWS infrastructure. X-Ray and Kinesis provide auditing services that monitor application functions or manage incoming data. SQS and SNS are messaging services that allow for communication within or between application components.

Jakob Essig

Software Engineer based in Denver, CO. Bring technology to people through tutorials, reviews, and opinions. Enjoys playing guitar, hiking, and photography
