Incident detection and continuous monitoring in AWS

Published in

Empathy.co

Aug 30, 2018

Introduction

The General Data Protection Regulation (GDPR) is now in effect. It aims to protect people’s fundamental rights with regard to the processing, protection and portability of their personal data.

Art. 32 establishes the obligation to implement technical and organizational measures that guarantee the confidentiality, integrity, availability and resilience of the systems and services that process personal data, as well as the ability to quickly restore access to personal data in the event of a physical or technical incident.

Given these new regulations, and given that EmpathyBroker’s products serve eCommerce retailers and therefore affect consumers, we reviewed the potential risks arising from the processing of users’ personal data. This was especially important for us because our products are offered as SaaS on Amazon Web Services (AWS) infrastructure.

As a result, we implemented a set of administrative, operational and logical safeguards. Although all of them are necessary, here we discuss only the security controls related to incident detection and continuous monitoring, as set out in Art. 32:

“…a process for regularly testing, assessing and evaluating the effectiveness of technical and organizational measures for ensuring the security of the processing.”

This article begins with a brief description of the AWS Shared Responsibility Model and the concept of continuous monitoring. It then focuses on the essential AWS tools for achieving those goals: GuardDuty, CloudTrail and VPC Flow Logs.

Incident Detection

Security incidents can be detected by applying detective controls, those that “involve the use of practices, processes, and tools that identify and possibly react to security violations.” (Official (ISC)2 Guide to the CISSP Exam)

Although some controls are common to both on-premises and cloud environments, cloud controls are shaped by the division of responsibilities between provider and customer, as is the case between EmpathyBroker and AWS.

AWS Shared Responsibility Model

The model rests on the premise that AWS provides a secure global infrastructure and cloud services. Customers can use these as a foundation for designing an Information Security Management System (ISMS).

AWS manages the security of facilities, hardware, network infrastructure and virtualization infrastructure. EmpathyBroker is responsible for, among other things, operating systems, data in transit and at rest, storage and access credentials. Continuous security monitoring is also our responsibility.

Under Art. 32 of the GDPR, our responsibilities therefore cover protecting consumers’ assets and data in AWS. They also cover meeting the business requirements related to information security, including those for the processing of personal data set out in the regulation.

Continuous Monitoring

NIST Special Publication 800-137 opens by describing continuous monitoring as “maintaining ongoing awareness of information security, vulnerabilities and threats to support organizational risk management decisions.” Achieving this involves several elements, of which the following require special attention:

Collecting, correlating, and analysing security-related information;
Providing actionable communication of security status across all tiers of the organization.

The whitepaper “AWS Security Best Practices” likewise notes that security monitoring starts with answering a few questions about the collection, measurement and storage of security-related information. It identifies “What do I need to log?” as the most important of them.

Tools and Services

In our case, the answer to that question involves three primary sources of logs: the GuardDuty and CloudTrail services, and VPC Flow Logs.

Let’s see how each of these relates to the detection of security incidents, and which other elements we need to provide alerting and actionable communication.

GuardDuty

Described by AWS as a continuous monitoring service, GuardDuty analyses and processes three data sources: VPC Flow Logs, AWS CloudTrail events and DNS logs. It combines threat intelligence, such as lists of malicious IP addresses and domains, with machine learning to identify anomalous activity and unauthorized access to AWS infrastructure and environments.

By default, “when a potential threat is detected, the service delivers a detailed security alert to the GuardDuty console and AWS CloudWatch Events.” These security events, called findings, are identified by a unique ID that is shared by the first and all subsequent occurrences.
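The following minimal Python sketch makes the notion of a finding concrete. It pulls the identifying fields out of a GuardDuty finding as delivered by CloudWatch Events. It assumes that the finding ID lives in `detail.id` and the occurrence counter in `detail.service.count`. The sample event is heavily abbreviated, and every value in it is made up.

```python
import json


def summarize_finding(event: dict) -> dict:
    """Extract the fields that identify a GuardDuty finding and its latest occurrence."""
    detail = event["detail"]
    service = detail.get("service", {})
    return {
        "finding_id": detail["id"],          # shared by the first and all later occurrences
        "type": detail["type"],
        "severity": detail["severity"],
        "count": service.get("count", 1),    # number of times the activity has been seen
        "last_seen": service.get("eventLastSeen"),
    }


# Abbreviated CloudWatch Events envelope; all values are illustrative only.
sample_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {
        "id": "example-finding-id",
        "type": "Recon:EC2/PortProbeUnprotectedPort",
        "severity": 2,
        "service": {"count": 3, "eventLastSeen": "2018-08-30T10:00:00Z"},
    },
}

print(json.dumps(summarize_finding(sample_event)))
```

Each new occurrence arrives with the same `finding_id` and a higher `count`, which is what later lets us decide how to store occurrences.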

As a first step, we enabled the service in all of our AWS accounts and, for convenience, centralized its management in one of them. We then created a CloudWatch Events rule with two targets to:

Store the security events in S3 through Kinesis Firehose for audit purposes and as forensic evidence for a fixed period of time;
Send each finding occurrence through SNS to two Lambda functions that integrate with our existing alerting and logging capabilities.
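The rule only needs to select GuardDuty findings. Below is a sketch of the event pattern such a rule could use. It comes with a deliberately simplified Python matcher that reproduces exact-value matching so the pattern can be checked locally. The matcher is our own illustration, not the AWS implementation: it ignores array-valued event fields and operators such as `prefix` or `anything-but`. The two targets (the Firehose delivery stream and the SNS topic) are attached to the rule separately and are not shown.

```python
# Event pattern for the CloudWatch Events rule: select every GuardDuty finding.
GUARDDUTY_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified exact-value matcher (illustration only).

    Every key in the pattern must exist in the event. Dict values are matched
    recursively; list values mean "the event value must be one of these".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


finding = {"source": "aws.guardduty", "detail-type": "GuardDuty Finding", "detail": {}}
signin = {"source": "aws.signin", "detail-type": "AWS Console Sign In via CloudTrail"}
print(matches(GUARDDUTY_PATTERN, finding), matches(GUARDDUTY_PATTERN, signin))
```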

We provisioned the infrastructure with Terraform; the result is similar to the one shown in Figure 1 below. The alerting and communication capabilities are provided by a high-availability Alertmanager cluster that is part of the EmpathyBroker monitoring infrastructure, alongside Prometheus and other components. One Lambda function transforms the findings into alerts and sends them to each of the cluster members through the Alertmanager API. Alertmanager then dispatches notifications to the corresponding communication channels (Slack, PagerDuty, email, etc.), and we can take further action to remediate the problem. Figure 2 shows an example of Slack notifications with different priorities.

Figure 1. High level implementation details.
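The transformation done by the first Lambda function could look roughly like the following Python sketch. It makes three assumptions:

- The SNS message carries the full CloudWatch Events envelope.
- GuardDuty’s severity bands (low 1.0 to 3.9, medium 4.0 to 6.9, high 7.0 and above) map to a `severity` label.
- Alerts go to every cluster member through the Alertmanager `/api/v2/alerts` endpoint. Alertmanager releases from 2018 exposed `/api/v1/alerts` instead.

The member URLs and label names are hypothetical.

```python
import json
import urllib.request

# Hypothetical Alertmanager cluster members.
ALERTMANAGER_URLS = [
    "http://alertmanager-0.example.internal:9093",
    "http://alertmanager-1.example.internal:9093",
]


def priority(severity: float) -> str:
    """Map GuardDuty's numeric severity to a label value."""
    if severity >= 7.0:
        return "high"
    if severity >= 4.0:
        return "medium"
    return "low"


def finding_to_alert(finding: dict) -> dict:
    """Convert a GuardDuty finding (the event "detail") into an Alertmanager alert."""
    return {
        # Labels identify the alert; including the finding ID keeps findings apart.
        "labels": {
            "alertname": "GuardDutyFinding",
            "finding_id": finding["id"],
            "finding_type": finding["type"],
            "severity": priority(finding["severity"]),
            "account": finding["accountId"],
            "region": finding["region"],
        },
        "annotations": {
            "summary": finding.get("title", ""),
            "description": finding.get("description", ""),
        },
    }


def post_alerts(alerts: list, urls=ALERTMANAGER_URLS) -> None:
    """Send the same alerts to every cluster member of the HA setup."""
    body = json.dumps(alerts).encode("utf-8")
    for base_url in urls:
        request = urllib.request.Request(
            base_url + "/api/v2/alerts",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()


def handler(event, context):
    """Lambda entry point: each SNS record carries one CloudWatch Events envelope."""
    alerts = [
        finding_to_alert(json.loads(record["Sns"]["Message"])["detail"])
        for record in event["Records"]
    ]
    post_alerts(alerts)
    return {"sent": len(alerts)}
```

Alertmanager routes on the `severity` label, so Slack, PagerDuty or email can be selected per priority in its routing configuration.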

The other Lambda function stores the findings in Elasticsearch to simplify their search, analysis and correlation with the other sources described below. Note that a finding can have multiple occurrences. Each occurrence can either update the same document, using the finding ID as the document ID, or be indexed as a new document with a different ID. We chose the second approach.
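Here is a sketch of the two indexing options, assuming findings are written with the Elasticsearch `_bulk` API. Reusing the finding ID as the document ID makes every occurrence overwrite the previous one. A per-occurrence ID keeps each occurrence as a separate document. Deriving that ID from the finding ID plus `service.count` is our own illustrative choice; letting Elasticsearch generate the ID would work too. The index name is hypothetical, and older Elasticsearch versions also required a `_type` in each action line.

```python
import json

INDEX = "guardduty-findings"  # hypothetical index name


def doc_id_update(finding: dict) -> str:
    """Option 1: one document per finding, overwritten by each new occurrence."""
    return finding["id"]


def doc_id_per_occurrence(finding: dict) -> str:
    """Option 2 (the approach we chose): one document per occurrence.

    Appending the occurrence count is an illustrative way to get a distinct ID.
    """
    return f'{finding["id"]}-{finding["service"]["count"]}'


def bulk_body(findings: list, make_id=doc_id_per_occurrence) -> str:
    """Build the newline-delimited JSON body expected by the _bulk endpoint."""
    lines = []
    for finding in findings:
        lines.append(json.dumps({"index": {"_index": INDEX, "_id": make_id(finding)}}))
        lines.append(json.dumps(finding))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


first = {"id": "abc", "service": {"count": 1}}
second = {"id": "abc", "service": {"count": 2}}
print(bulk_body([first, second]))
```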

CloudTrail

CloudTrail enables “governance, compliance, operational auditing and risk auditing” of AWS accounts. It also lets you “log, continuously monitor and retain account activity” related to actions performed in AWS. In addition, it provides an event history that “simplifies security analysis, resource change tracking and troubleshooting.”

We enabled this service in our AWS accounts and configured it to capture events in all regions, store them in S3 and index them into Elasticsearch to achieve the objectives mentioned earlier. Before indexing, these events, like those from the other sources described below, are processed to add information such as IP address geolocation.
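Here is a minimal Python sketch of the enrichment step for CloudTrail records. A hypothetical in-memory table stands in for a real GeoIP database. Note that `sourceIPAddress` does not always contain an IP address: when a call is made by an AWS service, it contains the service name instead, so the lookup must tolerate that. The `sourceGeo` field name is our own.

```python
import ipaddress

# Hypothetical stand-in for a GeoIP database (documentation ranges only).
GEO_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): "ES",
    ipaddress.ip_network("203.0.113.0/24"): "US",
}


def geolocate(value: str):
    """Return a country code for an IP address, or None if unknown or not an IP."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        # e.g. "ec2.amazonaws.com" for calls made by an AWS service
        return None
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return None


def enrich(log_file: dict) -> list:
    """Return copies of a CloudTrail log file's records with a geolocation field."""
    enriched = []
    for record in log_file.get("Records", []):
        doc = dict(record)
        doc["sourceGeo"] = geolocate(record.get("sourceIPAddress", ""))
        enriched.append(doc)
    return enriched


sample_log = {"Records": [
    {"eventName": "ConsoleLogin", "sourceIPAddress": "198.51.100.7"},
    {"eventName": "RunInstances", "sourceIPAddress": "ec2.amazonaws.com"},
]}
print(enrich(sample_log))
```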

An important consideration here is that CloudTrail delivers events to S3 or CloudWatch Logs only about every five minutes. This delay can be inconvenient, for example, for a suspicious sign-in event, which should trigger a notification as soon as it happens.

Sign-in Events

These events are generated each time a user signs in to any of our AWS accounts. Note that authentication can happen in any region, and the event is recorded in that region.

These events can be processed as soon as they occur by using CloudWatch Events and SNS. Because they are recorded per region, this requires CloudWatch Events rules and targets in every account-region pair. Once the appropriate permissions and policies are in place, a Lambda function in the designated account is subscribed to the SNS topics to perform the processing.
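The processing Lambda function could look roughly like the sketch below. It assumes that the CloudWatch Events rule in each account-region pair matches `source` `aws.signin` with detail type `AWS Console Sign In via CloudTrail`, and that SNS forwards the whole event. The alerting criteria (root sign-ins, failures, no MFA) are examples chosen for illustration. They are read from the `userIdentity`, `responseElements.ConsoleLogin` and `additionalEventData.MFAUsed` fields of the `ConsoleLogin` event.

```python
import json

# Pattern assumed for the CloudWatch Events rule in every account-region pair.
SIGNIN_PATTERN = {
    "source": ["aws.signin"],
    "detail-type": ["AWS Console Sign In via CloudTrail"],
}


def signin_reasons(detail: dict) -> list:
    """Return the reasons (if any) why a ConsoleLogin event deserves an alert.

    These criteria are illustrative examples, not an exhaustive policy.
    """
    reasons = []
    if detail.get("userIdentity", {}).get("type") == "Root":
        reasons.append("root account sign-in")
    if detail.get("responseElements", {}).get("ConsoleLogin") == "Failure":
        reasons.append("failed sign-in")
    if detail.get("additionalEventData", {}).get("MFAUsed") == "No":
        reasons.append("sign-in without MFA")
    return reasons


def handler(event, context):
    """Lambda subscribed (via SNS) to the sign-in topics of every account-region pair."""
    suspicious = []
    for record in event["Records"]:
        cw_event = json.loads(record["Sns"]["Message"])
        detail = cw_event["detail"]
        reasons = signin_reasons(detail)
        if reasons:
            suspicious.append({
                "account": cw_event["account"],
                "region": cw_event["region"],
                "user": detail.get("userIdentity", {}).get("arn"),
                "ip": detail.get("sourceIPAddress"),
                "reasons": reasons,
            })
    # Forwarding `suspicious` to the alerting pipeline is left out of this sketch.
    return suspicious
```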

With more than half a dozen accounts, it’s advisable to create an alias of the Lambda function for each region and subscribe that alias to the SNS topics. The reason is that the Lambda function’s resource-based policy, which grows with every SNS subscription permission, has a limited size. This can be a little tricky to do with Terraform and requires some scripting to generate “.tf” files for each account-region pair.
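The scripting mentioned above can be as simple as rendering one Terraform file per account-region pair from a template. The Python sketch below assumes:

- Terraform 0.12 or later syntax.
- A role that Terraform can assume in every account.
- A processing function whose per-region aliases are named after the region.

Every account ID, name and ARN is a placeholder. The SNS topic policy and the `aws_lambda_permission` resources needed for delivery are omitted for brevity.

```python
from pathlib import Path

# Placeholder inputs: replace with your own accounts, regions and names.
ACCOUNTS = ["111111111111", "222222222222"]
REGIONS = ["eu-west-1", "us-east-1"]
ROLE_NAME = "terraform"            # hypothetical role assumed in each account
LAMBDA_ACCOUNT = "999999999999"    # hypothetical account hosting the processing function
LAMBDA_REGION = "eu-west-1"        # hypothetical region of the processing function
FUNCTION = "signin-events"         # hypothetical function name

# Literal braces are doubled because the template is filled with str.format.
TEMPLATE = """provider "aws" {{
  alias  = "{key}"
  region = "{region}"
  assume_role {{
    role_arn = "arn:aws:iam::{account}:role/{role}"
  }}
}}

resource "aws_cloudwatch_event_rule" "signin_{key}" {{
  provider      = aws.{key}
  name          = "console-signin"
  event_pattern = <<PATTERN
{{"source": ["aws.signin"], "detail-type": ["AWS Console Sign In via CloudTrail"]}}
PATTERN
}}

resource "aws_sns_topic" "signin_{key}" {{
  provider = aws.{key}
  name     = "console-signin"
}}

resource "aws_cloudwatch_event_target" "signin_{key}" {{
  provider = aws.{key}
  rule     = aws_cloudwatch_event_rule.signin_{key}.name
  arn      = aws_sns_topic.signin_{key}.arn
}}

resource "aws_sns_topic_subscription" "signin_{key}" {{
  provider  = aws.{key}
  topic_arn = aws_sns_topic.signin_{key}.arn
  protocol  = "lambda"
  # One alias per source region keeps each alias's resource policy small.
  endpoint  = "arn:aws:lambda:{lambda_region}:{lambda_account}:function:{function}:{region}"
}}
"""


def render(account: str, region: str) -> str:
    """Render the Terraform configuration for one account-region pair."""
    # Terraform identifiers must not start with a digit, hence the "a" prefix.
    key = f"a{account}_{region.replace('-', '_')}"
    return TEMPLATE.format(
        key=key, region=region, account=account, role=ROLE_NAME,
        lambda_region=LAMBDA_REGION, lambda_account=LAMBDA_ACCOUNT, function=FUNCTION,
    )


def write_all(out_dir: str) -> list:
    """Write one .tf file per account-region pair and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for account in ACCOUNTS:
        for region in REGIONS:
            path = out / f"signin_{account}_{region}.tf"
            path.write_text(render(account, region))
            paths.append(path)
    return paths
```

With the placeholder lists above, `write_all("generated")` would write four files, one for each combination of account and region.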

VPC Flow Logs

Although not a service by itself, VPC Flow Logs are “a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC.”

These records are very important from a security standpoint: monitoring them allows us to evaluate the effectiveness of the rules in security groups and network access control lists. Two cases can arise:

Restrictive rules that block legitimate traffic; and
Insufficient controls that allow undesired traffic.
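One way to look for both cases is to parse the records and compare them with what each interface is expected to accept. The Python sketch below assumes the default (version 2) record format and a hypothetical set of ports that should be reachable. It evaluates only inbound records for a given interface address, since flow logs also contain the return traffic. The sample record is modeled on the example in the AWS documentation.

```python
# Field order of the default (version 2) flow log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")

# Hypothetical expectation: the only port that should be reachable.
EXPECTED_OPEN_PORTS = {443}


def parse(line: str):
    """Parse one default-format record; return None for NODATA/SKIPDATA records."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    if record["log_status"] != "OK":
        return None  # these records carry no traffic details ("-" placeholders)
    for key in INT_FIELDS:
        record[key] = int(record[key])
    return record


def classify(record: dict, local_ip: str) -> str:
    """Flag rejected expected traffic and accepted unexpected traffic to local_ip."""
    if record["dstaddr"] != local_ip:
        return "not inbound"
    expected = record["dstport"] in EXPECTED_OPEN_PORTS
    if record["action"] == "REJECT" and expected:
        return "possibly too restrictive"
    if record["action"] == "ACCEPT" and not expected:
        return "possibly insufficient control"
    return "as expected"


sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
print(classify(parse(sample), "172.31.16.21"))  # prints "possibly insufficient control"
```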

In addition to adding geographic information, we check whether the public IP addresses in these records belong to AWS and tag them accordingly so they can be filtered.
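AWS publishes its public IP ranges in the `ip-ranges.json` document (https://ip-ranges.amazonaws.com/ip-ranges.json), so the check can be sketched as below in Python. The sample JSON is abbreviated and illustrative, not copied from the real file, and the `*_is_aws` field names are our own. The function expects records with real addresses, such as `OK` flow log records.

```python
import ipaddress
import json


def load_aws_networks(ip_ranges_json: str) -> list:
    """Parse an ip-ranges.json document into IPv4 and IPv6 network objects."""
    data = json.loads(ip_ranges_json)
    networks = [ipaddress.ip_network(p["ip_prefix"]) for p in data.get("prefixes", [])]
    networks += [ipaddress.ip_network(p["ipv6_prefix"]) for p in data.get("ipv6_prefixes", [])]
    return networks


def tag_record(record: dict, aws_networks: list) -> dict:
    """Add srcaddr_is_aws / dstaddr_is_aws flags for public addresses owned by AWS."""
    tagged = dict(record)
    for field in ("srcaddr", "dstaddr"):
        addr = ipaddress.ip_address(record[field])
        # Private addresses are never tagged; IPv4-in-IPv6 checks simply return False.
        tagged[f"{field}_is_aws"] = addr.is_global and any(addr in net for net in aws_networks)
    return tagged


# Abbreviated, illustrative document (the real file lists many more prefixes).
SAMPLE_RANGES = json.dumps({
    "prefixes": [{"ip_prefix": "100.20.0.0/16", "region": "us-west-2", "service": "EC2"}],
    "ipv6_prefixes": [],
})

nets = load_aws_networks(SAMPLE_RANGES)
print(tag_record({"srcaddr": "100.20.1.5", "dstaddr": "172.31.16.21"}, nets))
```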

Conclusions

This article has highlighted some of the elements relating to the security of EmpathyBroker’s products, particularly the detection of security events within the AWS infrastructure.

We use the logs provided by GuardDuty, CloudTrail and VPC Flow Logs. We enrich them with useful information before indexing them in Elasticsearch, which eases their analysis, and we store them in S3 as forensic evidence for a fixed period of time.

We receive notifications about the security events identified by GuardDuty and take the necessary actions to remedy each incident. We can also deepen the analysis in Elasticsearch by correlating the findings with the other stored events, which also provide value on their own.

Finally, with the main elements of continuous monitoring in place, we at EmpathyBroker are committed to continually checking and systematically evaluating the effectiveness of the measures taken to guarantee the security of data processing. Our goal is to offer our customers the highest level of protection.