Unlocking the Power of Amazon CloudWatch Anomaly Detection for Secrets Manager

Published in CyberArk Engineering, Dec 13, 2023 (8 min read)

Line graph with multiple peaks and troughs, representing data over time. There are two lines on the graph, one in red and another in white, indicating two different sets of data being compared or analyzed. Five specific points/areas on the graph are circled in red, highlighting significant peaks or patterns worth noting.

AWS Secrets Manager is a robust and secure service that simplifies the management of sensitive information by offering centralized control, automated rotation of credentials and integration with other AWS services for enhanced security and ease of use. However, it offers few visibility features, which limits detailed insight into access and usage patterns.

To discover which of your secrets is used the most, not used at all or used in a suspicious way, you’ll probably need to dig into your cloud environment’s audit services and then store the audit logs somewhere for further analysis.

While relying on audit logs is an essential step in understanding secrets usage patterns within your cloud environment, it’s only part of the equation since these patterns may change over time.

Here’s where Amazon CloudWatch Anomaly Detection steps in as a powerful ally.

How CloudWatch Anomaly Detection Leverages Machine Learning Algorithms

Beyond the historical data provided by audit logs, CloudWatch Anomaly Detection uses machine learning algorithms to determine the “normal” range of secrets utilization, continuously adjusting this range with every new audit log being added.

When CloudWatch Anomaly Detection notices something unexpected, like a sudden spike in secrets usage or an unusual access pattern, it raises an alert.

This proactive approach lets you quickly respond to possible security risks and irregular operational activities, making it an essential tool for managing secrets and keeping your cloud environment secure.

To gain a comprehensive view of our secrets usage patterns, we had to bridge the gap between AWS Secrets Manager and CloudWatch Anomaly Detection.

A flow diagram with four main elements connected by arrows, illustrating a process in security management. The first element is an icon labeled “Secrets Manager”. An arrow leads from the Secrets Manager to a question mark, representing an unknown step in the process. Another arrow leads from the question mark to an icon labeled “Anomaly Detection” which features symbols representing analytics or data processing and a final arrow leads to another icon labeled “Alarm”.

Before we get into how we did it, let’s go over the two main types of anomaly detection approaches.

Threshold-Based Anomaly Detection vs Machine Learning Anomaly Detection

There are several approaches to anomaly detection. The most common one is threshold-based anomaly detection, a method of identifying data points that deviate significantly from a predefined threshold or range of acceptable values. It is a simple yet effective technique for detecting prominent or static issues with devices or applications.

Let’s illustrate it with an example. If a user normally creates or modifies about ten files per day, but suddenly creates or modifies a hundred files in one day, that could be an anomaly that warrants further investigation.

Threshold-based anomaly detection relies on predefined static thresholds and is suitable for these types of simpler, well-defined scenarios.

However, it may miss complex anomalies and require constant manual adjustment.
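The file example above can be sketched in a few lines of Python. This is only an illustration: the function name, the sample counts and the threshold of 30 are assumptions made for this sketch.

```python
from typing import List


def static_threshold_anomalies(daily_counts: List[int], threshold: int) -> List[int]:
    """Return the indexes of the days whose count exceeds a fixed threshold."""
    return [day for day, count in enumerate(daily_counts) if count > threshold]


# Hypothetical week of file modifications: about ten a day, then a burst of 100.
files_modified = [9, 11, 10, 12, 8, 100, 10]

print(static_threshold_anomalies(files_modified, threshold=30))
```

The threshold has to be picked by hand, and it stays where it is even if normal behavior changes.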

On the other hand, machine learning-based anomaly detection uses advanced algorithms to learn and adapt to changing data patterns. It offers greater flexibility, sensitivity, and adaptability, making it well-suited for handling complex and dynamic data.
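To give a feel for what "adapting to changing data patterns" means, here is a deliberately simplified stand-in: a band of mean ± k standard deviations computed over a sliding window. It is not the algorithm CloudWatch uses; the function name, window size and sample data are assumptions for illustration only.

```python
from statistics import mean, stdev
from typing import List


def rolling_band_anomalies(values: List[float], window: int = 5, k: float = 2.0) -> List[int]:
    """Flag points that fall outside mean +/- k * stdev of the preceding `window` points."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if abs(values[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


# Hypothetical usage counts: the baseline moves from about 10 to about 50,
# and the last point is a spike relative to the new baseline.
usage = [10, 11, 9, 10, 12, 11, 10, 48, 50, 52, 49, 51, 50, 120]

print(rolling_band_anomalies(usage))
```

With this data the band flags the jump to the new baseline once, then adjusts to it, and later flags the final spike. A static threshold set for the old baseline would keep firing on every value after the jump.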

For example: Where would you set a threshold in this graph?

To answer this question, let’s explore the moving parts of a system that allows us to continuously monitor and detect complex anomalies.

Meet the Amazon CloudWatch Family

Amazon CloudWatch is a monitoring and observability service that helps you collect, store, and analyze data from various resources within your infrastructure that is hosted on AWS, as well as from applications and services that you run on AWS. Amazon CloudWatch provides insights into the performance, operational health and resource utilization of your resources and applications on AWS.

Key features include:

Metrics
Dashboards
Alarms
Logs
Events
Insights
Synthetics
Anomaly Detection

Of these, this post covers the ones that are related to Anomaly Detection: Metrics, Alarms, Logs and Anomaly Detection itself.

Amazon CloudWatch Metrics

CloudWatch collects and stores data in the form of metrics, which are numerical data points representing various aspects of your resources’ performance. These metrics can include CPU utilization, network traffic, disk usage and more.

Amazon CloudWatch Alarms

CloudWatch enables you to set up alarms based on predefined or custom thresholds. Alarms can be triggered when a metric crosses a certain threshold, such as CPU usage going above a certain percentage. When an alarm is triggered, you can configure actions to be taken, such as sending notifications or performing automated actions using AWS services like AWS Lambda.

Amazon CloudWatch Logs

CloudWatch Logs allows you to collect, store and analyze log data from your applications and services. You can search and filter logs, create metric filters to extract specific information from logs and set up alarms based on log patterns.

And last but not least…

Amazon CloudWatch Anomaly Detection for Metrics

CloudWatch Anomaly Detection uses machine learning to automatically detect and alert you about unusual behavior in your metrics. It continuously collects and analyzes metric data over time to establish patterns and trends. By understanding these patterns, the system can accurately determine what is considered ‘normal’ behavior. When there’s a deviation from these established patterns, CloudWatch Anomaly Detection raises an alert, signaling a potential anomaly. This approach allows it to identify and notify about unusual activities, ensuring timely responses to security threats or potential issues related to secret usage and beyond before they cause significant impact.

Finding a Metric that Emphasizes Secret Utilization

Knowing that Amazon CloudWatch Anomaly Detection relies on CloudWatch metrics, we looked for a Secrets Manager CloudWatch metric that emphasizes secrets utilization. Secrets Manager currently provides only a single CloudWatch metric, the “resource count” metric. While this metric offers a basic level of insight into the number of secrets, it says nothing about how those secrets are used. In practice, monitoring secret usage effectively demands metrics that go beyond a simple count.

So we had to broaden our scope.

What Emphasizes Secret Utilization?

We realized that the standard resource count metric wasn’t up to the task of giving us insights into how secrets were actually being used. To solve this problem, we needed a new metric that could provide a clearer picture of secrets utilization. In our search for a solution, we turned to AWS CloudTrail, which records activity across AWS services, including AWS Secrets Manager, and discovered that Amazon EventBridge can match Secrets Manager CloudTrail events, which was a game-changer for us.

Recently, Amazon published a blog post introducing support for read-only management events in Amazon EventBridge, in which they discuss how to enable read-only management events from CloudTrail, how to create an EventBridge rule for read-only management events and how to detect anomalous Secrets Manager GetSecretValue API calls. However, their suggested anomaly detection was rule-based (pattern matching), filtering only Secrets Manager CloudTrail events with “userAgent”: “aws-cli/*” for further investigation by a security team.

But this approach can be extended to a much wider range of use cases. To identify unknown anomalous secrets usage patterns, we need to measure how secrets are utilized. Thankfully, one of the many Amazon EventBridge targets is a CloudWatch Log Group, so let’s start by streaming our AWS Secrets Manager CloudTrail events to it and see where that leads us.

A flow diagram with six main elements connected by arrows, illustrating a process in security management. The first element is an icon labeled “Secrets Manager”. An arrow leads from it to a Cloud Trail icon. An arrow leads from it to an Event Bridge icon. An arrow leads from it to a question mark, representing an unknown step in the process. Another arrow leads from the question mark to an icon labeled “Anomaly Detection” and a final arrow leads to another icon labeled “Alarm”.

Log Secrets Manager AWS CloudTrail Events

We created an EventBridge rule to match Secrets Manager CloudTrail events. We chose the default event bus because it’s the only one that allows us to match AWS service events from AWS CloudTrail log entries. This configuration routes Secrets Manager CloudTrail events into any target supported by Amazon EventBridge.

Since our destination is CloudWatch Anomaly Detection, which relies on CloudWatch metrics, we chose to route the Secrets Manager CloudTrail events to a designated CloudWatch Log Group.
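As a rough sketch, an event pattern for such a rule could look like the one below. The post does not show the exact pattern we used, so treat this as one plausible form based on the standard shape of CloudTrail events in EventBridge; the variable names are ours.

```python
import json

# Event pattern matching Secrets Manager API calls recorded by CloudTrail.
# Assumption: every Secrets Manager call is matched, with no further filtering.
secrets_manager_pattern = {
    "source": ["aws.secretsmanager"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"eventSource": ["secretsmanager.amazonaws.com"]},
}

# EventBridge APIs expect the pattern as a JSON string.
event_pattern_json = json.dumps(secrets_manager_pattern, indent=2)
print(event_pattern_json)
```

Narrowing the pattern, for example by adding an "eventName" list under "detail", would restrict the stream to specific calls such as GetSecretValue.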

With this integration in place, we could finally see the Secrets Manager CloudTrail events streaming into our designated CloudWatch Log Group.
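For readers who prefer code to console clicks, the rule and its target could be sketched with boto3 roughly as follows. The rule name, target Id and ARN are placeholders, and the helper functions are hypothetical names for this sketch. We also assume the log group already exists and has a resource policy that lets EventBridge write to it, which is not shown here.

```python
import json


def build_rule_request(rule_name: str) -> dict:
    """Keyword arguments for EventBridge PutRule on the default event bus."""
    pattern = {
        "source": ["aws.secretsmanager"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {"eventSource": ["secretsmanager.amazonaws.com"]},
    }
    return {
        "Name": rule_name,
        "EventBusName": "default",
        "EventPattern": json.dumps(pattern),
        # This state also delivers read-only management events such as GetSecretValue.
        "State": "ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS",
    }


def build_target_request(rule_name: str, log_group_arn: str) -> dict:
    """Keyword arguments for EventBridge PutTargets, pointing the rule at a log group."""
    return {
        "Rule": rule_name,
        "EventBusName": "default",
        "Targets": [{"Id": "secrets-usage-log-group", "Arn": log_group_arn}],
    }


def apply_rule_and_target(rule_request: dict, target_request: dict) -> None:
    """Send both requests to AWS. Requires boto3 and credentials; not called here."""
    import boto3  # imported lazily so the builders above work without it

    events = boto3.client("events")
    events.put_rule(**rule_request)
    events.put_targets(**target_request)


# Placeholder names and ARN, for illustration only.
rule_request = build_rule_request("secrets-manager-usage")
target_request = build_target_request(
    "secrets-manager-usage",
    "arn:aws:logs:us-east-1:111122223333:log-group:/aws/events/secrets-usage",
)
```

Keeping the request dictionaries separate from the AWS call makes them easy to inspect or to translate into infrastructure-as-code templates.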

A flow diagram with seven main elements connected by arrows, illustrating a process in security management. The first element is an icon labeled “Secrets Manager”. An arrow leads from it to a Cloud Trail icon. An arrow leads from it to an Event Bridge icon. An arrow leads from it to a Cloud Watch icon. An arrow leads from it to a question mark. Another arrow leads from the question mark to an “Anomaly Detection” icon and a final arrow leads to an icon labeled “Alarm”.

Now that we have a stream of Secrets Manager CloudTrail events flowing into a designated CloudWatch Log Group, all we have to do is set up a CloudWatch metric and feed it to CloudWatch Anomaly Detection.

CloudWatch Logs Metrics for the Win

Since each log entry in our CloudWatch Log Group originates from a Secrets Manager CloudTrail event, it indicates that a secret was used.

We could use a CloudWatch custom metric, but if we think about it, we can also use IncomingBytes, a default CloudWatch Logs metric, as an indication of the volume of log events in that log group at a given time.

Now all that’s left is to set up CloudWatch Anomaly Detection on the IncomingBytes metric for that CloudWatch Log Group. Each Secrets Manager CloudTrail event that EventBridge delivers to the log group adds to IncomingBytes, so the sum of IncomingBytes over a given period indicates the volume of secrets utilization.
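In boto3 terms, creating the anomaly detector could look roughly like this. The log group name is a placeholder, the helper names are hypothetical, and the "Sum" statistic is our assumption, chosen to match the "sum over a period" reasoning above.

```python
def build_anomaly_detector_request(log_group_name: str) -> dict:
    """Keyword arguments for CloudWatch PutAnomalyDetector on IncomingBytes."""
    return {
        "SingleMetricAnomalyDetector": {
            "Namespace": "AWS/Logs",
            "MetricName": "IncomingBytes",
            "Dimensions": [{"Name": "LogGroupName", "Value": log_group_name}],
            "Stat": "Sum",
        }
    }


def create_anomaly_detector(request: dict) -> None:
    """Send the request to AWS. Requires boto3 and credentials; not called here."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("cloudwatch").put_anomaly_detector(**request)


# Placeholder log group name, for illustration only.
detector_request = build_anomaly_detector_request("/aws/events/secrets-usage")
```

The detector needs some history before its band becomes meaningful, so the model improves as more events flow into the log group.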

Now that we can detect anomalies, let’s look at how we can configure an alarm.

Anomaly Detection-Based Alarm Configurations

CloudWatch Anomaly Detection-based alarms allow you to set a value for the anomaly detection threshold. This threshold is tied to standard deviations and works according to the 68–95–99.7 rule: in a normal distribution, about 68%, 95% and 99.7% of the values fall within one, two and three standard deviations of the mean, respectively. A standard normal distribution is a normal distribution with mean μ = 0 and standard deviation σ = 1.

A normal distribution. The histogram is symmetrical and bell-shaped. There are percentage labels (99.7%, 95% and 68%) that mark the proportion of data within certain numbers of standard deviations from the mean in a normal distribution. There are also specific percentages labeled at each bar of the histogram to indicate the proportion of data each bar represents. The x-axis is labeled with “µ − 3σ”, “µ − 2σ”, “µ − σ”, “µ”, “µ + σ”, “µ + 2σ” and “µ + 3σ”, indicating standard deviations from the mean.

Source: Wikipedia
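The three percentages in the rule can be reproduced with the error function. For a normal distribution, the share of values within k standard deviations of the mean is erf(k / √2). The function name below is ours.

```python
import math


def coverage_within(k: float) -> float:
    """Share of a normal distribution that lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.1%}")
```

The results round to roughly 68%, 95% and 99.7%, which is where the rule gets its name.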

Here are examples of different anomaly detection thresholds (standard deviation values) and how they affect when an alarm would be triggered:

A threshold of 1 standard deviation (i.e., the range between μ − σ and μ + σ). Variations in the metric value for that log group do not trigger an alarm if they fall within what is seen 68% of the time. If the deviation’s magnitude is greater than what is typically seen 68% of the time, it triggers the alarm.
A threshold of 2 standard deviations. A deviation only triggers an alarm if it is greater than what has historically been seen 95% of the time.
A threshold of 3 standard deviations. A deviation only triggers an alarm if it is greater than what has historically been seen 99.7% of the time.
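An alarm on the anomaly detection band could be sketched with boto3 as follows. The alarm name, metric Ids, 5-minute period and single evaluation period are assumptions for this sketch, and the helper names are hypothetical. The second argument of ANOMALY_DETECTION_BAND is the threshold discussed above.

```python
def build_anomaly_alarm_request(log_group_name: str, band_width: float = 2.0) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm using an anomaly detection band."""
    return {
        "AlarmName": f"secrets-usage-anomaly-{band_width:g}-sigma",
        # Alarm when the metric leaves the band in either direction.
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": 1,
        "ThresholdMetricId": "band",
        "Metrics": [
            {
                "Id": "usage",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Logs",
                        "MetricName": "IncomingBytes",
                        "Dimensions": [{"Name": "LogGroupName", "Value": log_group_name}],
                    },
                    "Period": 300,
                    "Stat": "Sum",
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(usage, {band_width:g})",
                "Label": "Expected secrets usage",
            },
        ],
    }


def create_alarm(request: dict) -> None:
    """Send the request to AWS. Requires boto3 and credentials; not called here."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("cloudwatch").put_metric_alarm(**request)


# Placeholder log group name, with a band of 3 standard deviations.
alarm_request = build_anomaly_alarm_request("/aws/events/secrets-usage", band_width=3)
```

A wider band means fewer, more significant alerts, while a narrower band is more sensitive and noisier.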

Zero Lines of Custom Code

So there you go, a real-time, machine learning-based anomaly detection pipeline for secrets utilization, built with nothing but service-to-service integration!

Thanks to AWS’s service-to-service integrations, we could focus entirely on finding the right metric that emphasizes secrets utilization, and even then it turned out to be a default CloudWatch Log Group metric. By the way, I focused on secrets utilization, but with this pipeline you can create anomaly detection for pretty much anything you want.