Creating a Serverless Malware Scan Solution with ClamAV on AWS

Keep all files in your S3 buckets secure

Alejandro Castañeda Ocampo
Globant
5 min read · Jul 31, 2024


Image source: Unsplash

Securing files stored in the cloud is essential. Many organizations rely on AWS for their storage requirements, but how can you be sure that uploaded files are free of viruses? This post walks through the process of creating a serverless anti-malware scanning solution using AWS Lambda, S3, and ClamAV.

Prerequisites

Before diving into the solution, ensure you have the following prerequisites:

  • AWS Account: You need an active AWS account with the necessary permissions to create and manage AWS resources.
  • Basic Knowledge of AWS Services: Familiarity with AWS Lambda, S3, EFS, VPC network configuration, and CloudFormation.

Key Features

  • Serverless Architecture: No need to manage servers; AWS Lambda handles the execution.
  • Scalable: Automatically scales based on the number of files uploaded.
  • Cost-Effective: Pay only for the compute time you consume.
  • Automated Virus Definition Updates: A scheduled job refreshes the virus definitions every 6 hours, so scans use current signatures.

Overview of the Solution

The solution is built on a serverless architecture. Its implementation wraps and modifies the solution from the following AWS Labs repository:

Diagram of the implementation.

The diagram shows a basic network infrastructure: a VPC with private subnets, where the Lambda functions are deployed. In the middle of the diagram is EFS, the shared file system that lets the Lambda functions persist data, in particular the ClamAV virus definition database.

On the right side is an Amazon EventBridge rule that runs on a schedule every 6 hours to invoke the clam-av-database Lambda function. This function updates the ClamAV definition files and persists them in the shared EFS storage.
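The update step described above can be sketched roughly as follows. This is a minimal illustration, not the code of the clam-av-database function itself: the EFS mount path, the `build_freshclam_command` helper, and the handler body are assumptions, and a real image would also need a suitable freshclam configuration file.

```python
import os
import subprocess

# Assumption: the EFS access point is mounted into the function at this path.
EFS_DEFINITIONS_DIR = "/mnt/clamav/definitions"


def build_freshclam_command(database_dir):
    """Return the freshclam command that downloads definitions into database_dir."""
    return ["freshclam", f"--datadir={database_dir}", "--stdout"]


def handler(event, context):
    """Hypothetical entry point, invoked by the scheduled EventBridge rule."""
    os.makedirs(EFS_DEFINITIONS_DIR, exist_ok=True)
    result = subprocess.run(
        build_freshclam_command(EFS_DEFINITIONS_DIR),
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Raise so the invocation is recorded as failed and shows up in the logs.
        raise RuntimeError(f"freshclam failed: {result.stderr or result.stdout}")
    return {"database_dir": EFS_DEFINITIONS_DIR}
```

Writing the definitions to the shared file system, rather than to the function's own temporary storage, is what lets the separate scan function read them later.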

The virus-scan-clam-av Lambda function contains the logic to scan each file and report the scanning status. It then tags the scanned object in S3 with the resulting status (INFECTED or CLEAN).
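As a rough sketch of that flow, and not the function's actual code, the handler below extracts the object from the S3 event, runs clamscan against the definitions on EFS, and writes the result as a tag. The database path, the helper names, and the local file naming are assumptions; the exit-code mapping relies on clamscan's documented return codes.

```python
import subprocess
from urllib.parse import unquote_plus

TAG_KEY = "scan-status"
# Assumption: same EFS mount path the definition-update function writes to.
DATABASE_DIR = "/mnt/clamav/definitions"
# clamscan return codes: 0 = no virus found, 1 = virus found, 2 = error.
EXIT_CODE_TO_STATUS = {0: "CLEAN", 1: "INFECTED"}


def objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def status_from_exit_code(exit_code):
    """Map a clamscan exit code to a tag value; anything unexpected is ERROR."""
    return EXIT_CODE_TO_STATUS.get(exit_code, "ERROR")


def scan_file(local_path, database_dir=DATABASE_DIR):
    """Run clamscan on one file and return the resulting status string."""
    result = subprocess.run(
        ["clamscan", f"--database={database_dir}", "--no-summary", local_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return status_from_exit_code(result.returncode)


def handler(event, context):
    """Hypothetical entry point, invoked by the S3 event notification."""
    import boto3  # imported lazily so the helpers above run without the AWS SDK

    s3 = boto3.client("s3")
    for bucket, key in objects_from_event(event):
        local_path = "/tmp/" + key.replace("/", "_")
        s3.download_file(bucket, key, local_path)
        status = scan_file(local_path)
        # Note: put_object_tagging replaces the object's entire tag set.
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": TAG_KEY, "Value": status}]},
        )
```

Decoding the key matters in practice: a file uploaded as "my file.pdf" arrives in the event as "my+file.pdf", and using the raw value would make the download fail.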

Because the Lambda functions are deployed as container images, the solution also needs a registry, Amazon ECR, to store the Docker images from which the functions are created.

Finally, users can configure an S3 event notification on each bucket to invoke the scan function whenever an object is created.

Setting Up Implementation Using CloudFormation Template

To streamline the deployment process, use the following CloudFormation template. This template sets up the necessary AWS resources, including the VPC, EFS, Lambda functions, EventBridge rule, and S3 event notifications:

The following is the code of the clam-av-database Lambda function:

The following is the code of the virus-scan-clam-av Lambda function:

Because the Lambda functions run from a container image, the following Dockerfile definition is used to build the images:

How to Activate S3 Bucket to Trigger the Scan Activity

Once the stack is deployed in the desired AWS account, we must configure our S3 buckets to trigger the Lambda function that starts the scan. In our case, we’ll configure one bucket, but in real scenarios, feel free to configure as many buckets as your security requirements call for.

In the AWS console, navigate to S3 and select the desired bucket. Then select “Properties”, scroll down to the “Event notifications” section, and click “Create event notification”:

Create an event notification section.

To create the event notification, enter a name and select the event types that should trigger it:

Configuration example for all created events.

Then select the Lambda function that will receive the notifications (the function name is virus-scan-clam-av) and click “Save changes”:

Choose the lambda function as the listener for the S3 events.
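The same configuration can also be applied programmatically. The sketch below assumes boto3 and uses hypothetical helper names and a hypothetical notification Id; the function ARN comes from your own deployment. Two caveats apply: this API call replaces the bucket's existing notification configuration, and S3 must separately be granted permission to invoke the function, which the console sets up for you but the API does not.

```python
def build_notification_configuration(lambda_arn):
    """Build an S3 notification configuration that fires on every object creation."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "virus-scan-on-create",  # hypothetical name for the rule
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


def enable_scanning(bucket, lambda_arn, s3_client):
    """Attach the scan function to a bucket using a boto3 S3 client."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_configuration(lambda_arn),
    )
```

A call would look like `enable_scanning("my-bucket", function_arn, boto3.client("s3"))`, repeated for each bucket that should be scanned.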

Testing The Solution

After our S3 bucket(s) are configured, we can upload a file and check the object’s tags section to verify the status of the scanning process.

As soon as an object lands in the S3 bucket, the Lambda function is invoked and marks the object with the “scan-status” tag:

Initial status when the analysis starts.

When the Lambda function finishes the analysis, it updates the value of the tag:

The final status was marked as clean, and the object was analyzed.
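The same check can be scripted instead of done in the console. The following is a minimal sketch assuming a boto3 S3 client; the helper names, the number of attempts, and the polling interval are arbitrary choices for illustration.

```python
import time


def scan_status_from_tags(tagging_response, tag_key="scan-status"):
    """Extract the scan status from a get_object_tagging response, or None."""
    for tag in tagging_response.get("TagSet", []):
        if tag["Key"] == tag_key:
            return tag["Value"]
    return None


def wait_for_scan_result(s3_client, bucket, key, attempts=24, poll_seconds=5.0):
    """Poll the object's tags until the scan leaves the IN PROGRESS state."""
    for _ in range(attempts):
        response = s3_client.get_object_tagging(Bucket=bucket, Key=key)
        status = scan_status_from_tags(response)
        if status is not None and status != "IN PROGRESS":
            return status
        time.sleep(poll_seconds)
    return None  # no final status within the allotted attempts
```

Returning `None` on timeout keeps "not scanned yet" distinct from any of the real tag values.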

The virus-scan-clam-av Lambda function writes all its logs to Amazon CloudWatch, where we can review them:

Example of logs in AWS CloudWatch.
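The logs can also be pulled with a short script. This sketch assumes a boto3 CloudWatch Logs client and assumes the function uses Lambda's default log group name, `/aws/lambda/<function-name>`; if the stack configures a custom log group, the name will differ. The helper name and the time window are illustrative.

```python
import time


def recent_scan_log_messages(logs_client, function_name="virus-scan-clam-av",
                             minutes=15, limit=50):
    """Return recent log messages for the scan function, oldest first."""
    start_ms = int((time.time() - minutes * 60) * 1000)  # API expects milliseconds
    response = logs_client.filter_log_events(
        logGroupName=f"/aws/lambda/{function_name}",
        startTime=start_ms,
        limit=limit,
    )
    return [event["message"].rstrip() for event in response.get("events", [])]
```

It would be called as `recent_scan_log_messages(boto3.client("logs"))`.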

What are the Allowed Values for The Tag

  • IN PROGRESS: Set when the Lambda function receives the notification of a new object and starts the malware detection process.
  • CLEAN: Status when an object is free of malware.
  • INFECTED: Status for all the objects with malware detected.
  • ERROR: Status when the Lambda function fails during execution.
  • N/A: Status when ClamAV does not support the file type being analyzed.
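Code that consumes the tag may find it convenient to model these values explicitly. The sketch below is one possible way to do so; the `ScanStatus` enum and the `is_safe_to_serve` helper are hypothetical names, and treating every value other than CLEAN as unsafe is a design assumption rather than something the solution enforces.

```python
from enum import Enum


class ScanStatus(str, Enum):
    """The values the scan-status tag can take."""
    IN_PROGRESS = "IN PROGRESS"
    CLEAN = "CLEAN"
    INFECTED = "INFECTED"
    ERROR = "ERROR"
    NOT_APPLICABLE = "N/A"


def is_safe_to_serve(tag_value):
    """Return True only for objects explicitly tagged CLEAN."""
    try:
        return ScanStatus(tag_value) is ScanStatus.CLEAN
    except ValueError:
        # Missing or unrecognized tag values are treated as unsafe.
        return False
```

Defaulting to "unsafe" means an object that was never scanned, or whose scan failed, is handled the same way as an infected one.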

Handling Infected Files

When a file is found to be infected, it is crucial to act promptly to protect your systems and data. The right response depends on your requirements. One approach is to move the infected file to a quarantine S3 bucket designated for infected files. This keeps the file away from downstream consumers and allows further inspection in a controlled environment.
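A quarantine move could be sketched as follows, assuming a boto3 S3 client and a quarantine bucket that already exists. The function name and the choice to keep the same object key are assumptions. S3 has no native move operation, so the sketch copies and then deletes.

```python
def quarantine_object(s3_client, source_bucket, key, quarantine_bucket):
    """Copy an infected object to the quarantine bucket, then delete the original."""
    s3_client.copy_object(
        Bucket=quarantine_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
    # Delete only after the copy call has returned without raising.
    s3_client.delete_object(Bucket=source_bucket, Key=key)
```

Ordering the calls this way means a failed copy leaves the original in place, so the file is never lost without a quarantined copy.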

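Another way to enhance protection is to attach a resource-based policy to your S3 bucket that denies reads (s3:GetObject) of any object whose scan-status tag is anything other than “CLEAN.” This keeps infected files inaccessible, preventing accidental or malicious use. Keep in mind that an explicit Deny with a wildcard principal also applies to the scanning function’s own role, so that role needs an exception if the function reads objects from the same bucket. The base policy looks like this: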

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::your-bucket-name/*",
      "Condition": {
        "StringNotEquals": {
          "s3:ExistingObjectTag/scan-status": "CLEAN"
        }
      }
    }
  ]
}
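When the same policy is needed for several buckets, it can be generated rather than hand-edited. The sketch below is one possible variant, not part of the original solution: it adds an `ArnNotEquals` condition so that a scanner role, whose ARN is a hypothetical parameter here, is exempt from the deny and can still download objects to scan them. The function name and the Sid are also illustrative.

```python
import json


def build_deny_unclean_policy(bucket_name, scanner_role_arn):
    """Build a bucket policy denying reads of objects not tagged CLEAN.

    Both conditions must hold for the Deny to apply, so requests made by
    the scanner role (assumption: supplied by the caller) are not denied.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReadUnlessClean",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:ExistingObjectTag/scan-status": "CLEAN"
                    },
                    "ArnNotEquals": {"aws:PrincipalArn": scanner_role_arn},
                },
            }
        ],
    }


def policy_as_json(bucket_name, scanner_role_arn):
    """Serialize the policy to the JSON string that put_bucket_policy expects."""
    return json.dumps(build_deny_unclean_policy(bucket_name, scanner_role_arn))
```

The resulting string could then be passed to a boto3 S3 client as `put_bucket_policy(Bucket=bucket_name, Policy=policy_as_json(...))`.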

Conclusions

This article illustrated how to set up a serverless antivirus scanning system with AWS Lambda, S3, and ClamAV to safeguard the files stored in S3 buckets. By leveraging a serverless architecture, the solution is scalable, cost-effective, and requires minimal maintenance. For enhanced security, an optional step to create a resource-based policy on S3 was provided to restrict access to files that are not marked as clean. By implementing this solution, organizations can strengthen their security posture by scanning every file uploaded to their S3 buckets for malware.
