One of the big questions we get when we talk about monitoring security metrics is, “How can we collect metrics from multiple services at zero cost?”
You might be tempted to say that it’s not possible, but there is a way. The open-source way.
You could argue that monitoring isn’t at zero cost if part of the project involves engineer hours to build and maintain it, and that’s fair.
However, the combination of:
- an open-source search stack like ELK
- an open-source tool for conformity checks like Prowler
- an incident-response tool like Cloud Custodian
- a security monitoring solution like Wazuh
gives you the ability to get security metrics with a high level of scalability, flexibility, customization, and integration at a low cost.
Usually, SIEMs (Security Information and Event Management systems) such as Splunk and other commercial products like Aqua Security or Cloud Conformity are very expensive.
So how can we address the second AWS security pillar with minimum resources and without losing any functionality?
The answer is open-source and manpower.
There are several factors to evaluate before adopting open-source software.
Some of them are:
- Active repository: The project should be backed by a solid company or an established developer, so you have assurance that the software will be maintained over the long term.
- Customizability: Most open-source software is highly customizable, but not all of it provides an easy way to integrate new tools and interact with other software.
- Support: The availability of support is another factor to check before adopting open-source software. A dedicated team of experts gives you confidence in resolving problems and in adapting the solution to your specific use case.
In order to provide security in the cloud, Groupon runs multiple technologies.
Considering the reasons described above, here is a partial list of the main tools adopted by Groupon to protect and monitor AWS services:
- AWS native services
How do they interact with each other?
The interaction between the components is complex and there are several steps to make the following diagram a reality.
1. Native AWS services
Before enabling AWS security services, each product should be evaluated against your environment.
To establish a mature environment capable of protecting customer information, we decided to enable the services listed on the diagram using Terraform/CloudFormation.
A good DevSecOps approach to enable the software to be delivered efficiently and securely is crucial.
All the valuable output that these services produce is stored in S3 buckets thanks to CloudWatch event rules.
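As a rough sketch of what this looks like in Terraform, the following enables GuardDuty and matches its findings with a CloudWatch event rule. The resource names, and the Firehose delivery stream and IAM role referenced by the target, are illustrative assumptions, not our actual configuration:

```hcl
# Enable GuardDuty in the account.
resource "aws_guardduty_detector" "main" {
  enable = true
}

# Match every GuardDuty finding with a CloudWatch event rule.
resource "aws_cloudwatch_event_rule" "guardduty_findings" {
  name = "guardduty-findings"
  event_pattern = jsonencode({
    source        = ["aws.guardduty"]
    "detail-type" = ["GuardDuty Finding"]
  })
}

# Forward the matched events toward S3 (here via a Firehose delivery
# stream and IAM role assumed to be defined elsewhere).
resource "aws_cloudwatch_event_target" "to_firehose" {
  rule     = aws_cloudwatch_event_rule.guardduty_findings.name
  arn      = aws_kinesis_firehose_delivery_stream.to_s3.arn
  role_arn = aws_iam_role.events_to_firehose.arn
}
```

The same pattern repeats for the other services: one event rule per event source, all landing in S3.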
2. Wazuh agent (Prowler)
The Prowler project is invaluable: it produces a huge number of interesting events about your account configuration and lets you know about deviations from the CIS benchmarks.
In our particular case, Prowler runs on a server that stores JSON events in a file collected by the Wazuh agent, which, in turn, sends them securely to the Wazuh manager.
The Wazuh managers evaluate each event and produce alerts based on custom rules prepared for Prowler.
Note: We recommend running Prowler and the Wazuh manager using IAM roles instead of IAM users.
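A minimal sketch of the collection side, assuming Prowler appends its JSON findings to a log file (the path is an illustrative choice): the agent’s ossec.conf tails that file and forwards each line to the manager.

```xml
<!-- Fragment of /var/ossec/etc/ossec.conf on the host running Prowler.
     The log path is an example; point it at wherever Prowler writes
     its JSON output (e.g. ./prowler -M json appended to this file). -->
<localfile>
  <log_format>json</log_format>
  <location>/var/log/prowler/prowler.json</location>
</localfile>
```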
3. Wazuh agents (host IDS). Historically, Wazuh agents were designed to monitor hosts regardless of the environment (on-premise, AWS, etc.).
To distribute the agent to all accounts, the best approach for our environment was to install the Wazuh agent on a base AMI and share that AMI with every account.
4. Cloud Custodian. As the project describes itself: “It uses a stateless rules engine for policy definition and enforcement. It integrates tightly with serverless runtimes to provide real time remediation/response with low operational overhead.”
A good way to implement Custodian policies is as code. Unfortunately, there is no native support for Cloud Custodian in HashiCorp’s Terraform. However, there are many ways to develop your own code and manage Cloud Custodian policies as code.
We use Terraform to provision all policies, with the end goal of a fully immutable infrastructure.
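To make “policies as code” concrete, here is a minimal sketch of a Cloud Custodian policy. The policy name and the tagging action are illustrative choices, not our production policy: it finds EC2 instances that have a public IP and tags them for review.

```yaml
policies:
  - name: ec2-internet-facing        # illustrative name
    resource: ec2
    filters:
      - type: value
        key: PublicIpAddress
        value: present               # the instance has a public IP
    actions:
      - type: tag                    # example action; could also stop or notify
        key: SecurityReview
        value: internet-facing
```

Keeping files like this in version control, and applying them through your pipeline, is what gives you the audit trail and repeatability.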
5. Wazuh managers. Wazuh managers play an important role in this project: they process the parsed data stored in the S3 buckets and also generate alerts based on rules for Prowler and the Wazuh agents.
Because the number of Wazuh agents in our scope might increase very quickly, we needed a way to scale the Wazuh managers automatically. Likewise, the amount of data collected and stored in ELK might be huge, so we have to scale the ELK nodes as well.
The answer to both problems is Kubernetes.
Running our own implementation with autoscaling for the Wazuh managers and Elasticsearch nodes solves the problem. The Wazuh manager and Elastic clusters are the central pieces of this project, and both should autoscale. Together with the AWS native services, these are the tasks that will demand most of your time. Make sure to pick the right instance types for your workers, calculate the disk space they will need, and so on. There is good documentation here and here that you can follow to accomplish this and to build the clusters efficiently.
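As a sketch of the autoscaling piece, a HorizontalPodAutoscaler for the Wazuh manager workload might look like this. The workload name, replica counts, and CPU threshold are illustrative assumptions, not our production values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wazuh-manager
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet        # Wazuh managers keep state, so a StatefulSet fits
    name: wazuh-manager
  minReplicas: 2             # keep at least two managers for availability
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU passes 70%
```

A similar object can target the Elasticsearch data nodes, keeping in mind that scaling a stateful search cluster also means shard rebalancing on the ELK side.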
6. Backup process. Finally, it is important to mention a backup process (not in the diagram). After all your hard work, you would hate to break something and not be able to restore it. Therefore, a good backup process should be in place so you can act in case of disaster. Velero is an excellent open-source tool that helps you back up your Kubernetes cluster and create snapshots of your instances according to your schedule.
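For example, a daily Velero schedule might be created like this (the schedule name, namespace, and retention period are illustrative):

```shell
# Back up the wazuh namespace every day at 03:00,
# keeping backups (and volume snapshots) for 30 days.
velero schedule create wazuh-daily \
  --schedule "0 3 * * *" \
  --include-namespaces wazuh \
  --ttl 720h
```

Test your restores periodically; a backup you have never restored is only a hypothesis.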
What can I see?
You will need talented employees who are able not only to enable AWS security services (as code, via Terraform) but also to manage a Kubernetes cluster, deploy a network load balancer, manage an ELK cluster, manage Wazuh, create custom Wazuh rules, etc. Besides that, Kibana skills are required to develop valuable visualizations and dashboards.
To give you an idea, in our use case the following visualizations were created:
- Root access: This panel shows root account access (secured with virtual MFA).
- GuardDuty new: All GuardDuty events.
- GuardDuty: types of events (IAM/Brute Force/ etc).
- IAM users from non-authorized IPs: User access from non-authorized IPs.
- Console login: Data table with Geolocation, account, and user identity.
- Unauthorized API calls: This panel reports any API call with one of the following errors, Client.OperationNotPermitted, Client.UnauthorizedOperation, AccessDenied, and Client.AuthFailure.
- Console login map: Same as console login but in a geolocation map.
- Sources: AWS event source in a PIE graph.
- Network ACL changes: Who modified the ACLs.
- IAM management: Role / user creation.
- Macie: Macie data security and privacy alerts.
- Incident Response: CloudCustodian remediations; lambda invoke.
- VPC Flow logs (inbound): Ingress traffic, accepted traffic by internal host.
- VPC Flow logs (outbound): Egress traffic, accepted traffic by external host.
- VPC Flow logs (dst addresses — rejection): Egress traffic, rejected traffic by external host.
- VPC Flow logs (src address — rejection): Rejected traffic by internal host. Shows the attacker IP.
- VPC Flow logs (dst port — rejection): Rejected traffic by internal host. Shows the port attacked.
- Prowler alerts: Only alerts that fail the check.
- Inbound rejected traffic (timeline): Rejected traffic by hour.
- Average bytes to Internet: Traffic to the Internet (hourly).
Let’s see a specific use case
We’ll start this case with a Prowler alert.
Prowler can generate many alerts, so make sure that you alert only on the rules that are interesting for your use case.
Because the perimeter is the first line of defense, the perimeter of your company should be protected. To minimize the attack surface, your resources should not be exposed to the Internet. Therefore, one of the alerts configured at Groupon to trigger (via Slack or email) is the check rule for “Internet facing EC2 instances”.
As you can see in the following screenshot, there is an alert about an instance i-00… with external IP 42.XXX.XXX.XXX from the account 012… exposed to the Internet.
A quick check in the Amazon console shows the internal IP address of the instance exposed. Note: Alternatively, you can use Kibana to find the internal IP.
The next step requires using some Kibana filters. We will filter by a non-compliant IP (or instance ID) and we’ll see the interactions with this instance.
Finally, let’s go back to our dashboard. After filtering by the internal IP of the instance (2) and other optional filters like account ID (1), the visualizations created previously allow us to see the following:
Even GuardDuty is not able to alert you about this, but since you have full visibility of the environment, other security metrics cover the gap. This monitoring redundancy follows the concept of defense in depth, where a multi-layered approach with intentional redundancies increases the security of the system as a whole and addresses many different attack vectors.
As we can see above, the attack was fully explained using the VPC Flow Logs and Prowler dashboards.
With this information, you might decide to terminate the instance, notify the owner, collect the attacker’s IP to build your own threat intelligence, etc.
This example might be extrapolated to GuardDuty alerts, Macie alerts, or any other service.
Other cool analyses
There are multiple correlations that you can query using Kibana, for example:
- See which roles are running instances.
- See root access.
- Geolocate console logins.
Tip: Based on this, use your expertise to create useful rules, like two or more successful logins from two different locations in a short timeframe.
- Perform investigations using Kibana’s Discover capabilities for more detailed information. The correlation of data is possible thanks to the Discover module.
- Integrate Wazuh agents to monitor EC2 instances or on-premise services.
- Easily integrate any security tool that is able to produce JSON events (like Falco). Falco, an open-source project to monitor containers at runtime, pushes JSON events into S3 buckets, which are ingested by Wazuh. Dashboards for container protection might be created as well.
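The tip about correlated logins can be sketched in a few lines. This is not a Wazuh rule, just an illustration of the detection logic; the event shape (`user`, `country`, `time`) is a simplified stand-in for the real CloudTrail/Wazuh fields.

```python
from datetime import datetime, timedelta

def impossible_travel_logins(events, window_minutes=30):
    """Flag users with successful logins from two or more distinct
    geolocations within a short time window.

    `events` is a list of dicts with 'user', 'country', and 'time'
    (ISO 8601 string) keys -- an illustrative schema, not the actual
    CloudTrail/Wazuh one.
    """
    window = timedelta(minutes=window_minutes)
    flagged = set()
    by_user = {}
    for ev in sorted(events, key=lambda e: e["time"]):  # ISO strings sort chronologically
        t = datetime.fromisoformat(ev["time"])
        history = by_user.setdefault(ev["user"], [])
        # Drop logins that fell out of the sliding window.
        history[:] = [(ht, hc) for ht, hc in history if t - ht <= window]
        # Any earlier login from a *different* country within the window?
        if any(hc != ev["country"] for _, hc in history):
            flagged.add(ev["user"])
        history.append((t, ev["country"]))
    return flagged
```

In production you would express this as a Wazuh frequency/correlation rule rather than a script, but the logic is the same: group by user, slide a time window, compare locations.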
Many services, tools, and resources are monitored by Wazuh, using ELK as the search layer, at a minimal cost. The ease of integrating new solutions makes Wazuh, as the central monitoring tool, the perfect fit.
These wonderful open-source tools allow us to get telemetry from multiple technologies, correlate events, and run investigations from a central management interface. The customization and integration with multiple tools are just some of the advantages of open source tools.
Kubernetes provides scalability for ELK and Wazuh managers allowing them to autoscale if the ELK servers are not able to handle the volume of data or the managers are not able to handle a large number of agents.