Logs for Service Monitoring

Unleashing the Hidden Power of Logs!

Muhammad Izzuddin Al Fikri
Haraj Tech
4 min read · Jul 12, 2024


At Haraj, we use a microservices architecture. While it offers numerous benefits, it also comes with challenges. One of them is monitoring each service and ensuring it is running as expected.

One of our strategies is utilizing structured logs and the “plain old” AWS CloudWatch. In this article, we will share how we did it.

Structured Logs

Implementing structured logs in our services is the most fundamental building block for unlocking the full potential of AWS CloudWatch.

What are structured logs, and why do we need them?

Structured logging is an approach where logs are stored in a consistent and structured format. Here are some key reasons why structured logging is essential for monitoring microservices-based services:

  1. Consistency and Ease of Analysis: A consistent log format makes it much easier to analyze, debug, and identify issues quickly using log management tools like AWS CloudWatch Logs.
  2. Automatic Alarms with Custom Metrics: Structured data makes creating custom metrics from logs easy. For instance, we can create metrics to count the number of failed requests due to specific errors or corrupted data in a workflow process. These metrics can trigger alarms, allowing us to detect problems sooner.

We use JSON for our structured logs because it is widely used, flexible, and most importantly, developers are familiar with it.

Below are example structured log entries, in JSON format, from a REST API service. The sample code for the service is available here.

{"time":"2024-07-19T08:54:28.205679162Z","level":"INFO","msg":"server running on port 8000"}
{"time":"2024-07-19T08:54:28.960020621Z","level":"INFO","msg":"request processing time: 53.581µs","req_method":"GET","req_path":"/transactions/1","req_query":""}
{"time":"2024-07-19T08:54:31.958675787Z","level":"INFO","msg":"request processing time: 58.101µs","req_method":"GET","req_path":"/transactions","req_query":""}
{"time":"2024-07-19T08:56:54.438630545Z","level":"INFO","msg":"request processing time: 50.111µs","req_method":"POST","req_path":"/transactions","req_query":"anonymous=true"}
{"time":"2024-07-19T08:57:13.401942813Z","level":"INFO","msg":"request processing time: 28.86µs","req_method":"GET","req_path":"/transactions/3","req_query":""}
{"time":"2024-07-19T08:57:23.510930706Z","level":"WARN","msg":"transaction not found","transaction_id":4}
{"time":"2024-07-19T08:57:23.510956746Z","level":"INFO","msg":"request processing time: 29.21µs","req_method":"GET","req_path":"/transactions/4","req_query":""}
{"time":"2024-07-19T08:58:21.137593838Z","level":"ERROR","msg":"transaction has empty user ID","payload":{"user_id":0,"name":"Order for Galaxy Journal Book","items":["Journal Book Galaxy Series","Cover for Journal"],"amount":120000}}
{"time":"2024-07-19T08:58:21.137633259Z","level":"INFO","msg":"request processing time: 77.242µs","req_method":"POST","req_path":"/transactions","req_query":""}

Three primary fields appear in every log entry. These are the fields we mainly use to build alarms:

  1. time. The time when the log was written.
  2. level. The severity level of the log.
  3. msg. The general message conveyed in the log.

Other fields are optional and can be included as needed.
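The exact implementation depends on the language and logging library, but the field names above match the defaults of Go's log/slog JSON handler. As an illustration, here is a minimal sketch of how entries like the ones shown earlier could be produced; the handler wiring and validation logic are illustrative, not the actual Haraj service code.

package main

import (
	"log/slog"
	"net/http"
	"os"
	"time"
)

// withRequestLog emits the per-request timing entries, attaching the
// optional request fields alongside the three primary ones.
func withRequestLog(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		slog.Info("request processing time: "+time.Since(start).String(),
			"req_method", r.Method,
			"req_path", r.URL.Path,
			"req_query", r.URL.RawQuery,
		)
	})
}

func handleCreateTransaction(w http.ResponseWriter, r *http.Request) {
	userID := 0 // in a real service this would come from the decoded request body
	if userID == 0 {
		// This is the entry the metric filter in the next section matches on.
		slog.Error("transaction has empty user ID", "payload", map[string]any{"user_id": userID})
		http.Error(w, "user_id is required", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusCreated)
}

func main() {
	// The JSON handler writes one entry per line with the default
	// "time", "level", and "msg" fields described above.
	slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stdout, nil)))

	mux := http.NewServeMux()
	mux.HandleFunc("/transactions", handleCreateTransaction)

	slog.Info("server running on port 8000")
	if err := http.ListenAndServe(":8000", withRequestLog(mux)); err != nil {
		slog.Error("server stopped", "error", err)
	}
}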

AWS CloudWatch

By leveraging structured logs, we create custom metrics in CloudWatch Logs, build dashboards for quick insights and analysis, and set up CloudWatch Alarms for immediate notifications if something goes wrong with the services at Haraj.

Our simple strategy involves reading specific levels and messages from the logs to create a metric and then setting up alerts if the value of that metric exceeds our defined safe thresholds.

For implementation, we create these metrics and alarms using AWS CloudFormation. By using CloudFormation templates, we achieve complete visibility over the monitoring system. This includes logging within the application, creating metrics, and setting up alarms, all of which are thoroughly documented.

Here’s an example of creating a metric to check the user ID in each incoming transaction request. If a transaction is made without a user ID, a notification will be sent to our team channel at Haraj.

First, we’ll create a custom metric from the log.

MetricFilterForTransactionEmptyUserID:
  Type: AWS::Logs::MetricFilter
  Properties:
    LogGroupName: "/payment-system-log-group"
    FilterPattern: '{ $.level = "ERROR" && $.msg = "transaction has empty user ID" }'
    MetricTransformations:
      - MetricName: TransactionEmptyUserID
        MetricNamespace: PaymentSystem
        MetricValue: "1"

In the FilterPattern, we describe how to extract metric data from ingested log events: whenever an entry with the ERROR level and the message “transaction has empty user ID” is detected, the metric is incremented by 1. Note that the pattern matches on the same level and msg fields that every log entry carries.

Additionally, we set up an alarm to notify us if there are five or more requests without a user ID within one minute.

AlarmForTransactionEmptyUserID:
  Type: AWS::CloudWatch::Alarm
  DependsOn: MetricFilterForTransactionEmptyUserID
  Properties:
    AlarmName: "AlarmTransactionEmptyUserID"
    AlarmDescription: "Alarm when there's a transaction with empty user ID. Please check the log manually"
    ComparisonOperator: GreaterThanOrEqualToThreshold
    EvaluationPeriods: 1
    MetricName: TransactionEmptyUserID
    Namespace: PaymentSystem
    Period: 60 # one minute
    Statistic: Sum
    Threshold: 5
    ActionsEnabled: true
    TreatMissingData: notBreaching
    AlarmActions:
      - "arn:aws:sns:us-east-1:123456789012:AlarmTeamChannel"

In today’s increasingly complex technological landscape, logs play a crucial role as the eyes and ears of our applications, especially when they serve a large user base. Structured logging provides significant advantages in consistency, visibility, and data analysis, all essential in a microservices architecture.

With log management tools like AWS CloudWatch, we can proactively monitor and respond to issues, ensuring our applications remain reliable and perform optimally.

Understanding and implementing structured logging is about more than enhancing technical capabilities; it’s also about delivering the best user experience, maintaining system reliability, and ensuring we can adapt quickly to changing needs and challenges.

As developers and system maintainers, we must prioritize the quality and reliability of our applications, and structured logging is a pivotal step toward achieving these goals.

So, let’s not underestimate logging, folks! 😉

Happy coding, and let’s keep building for a better world!
