S3 Antivirus Scanning with Lambda and ClamAV
by Dennis Webb, AWS Cloud Expert and Slack Comedian
There have been many stories over the past months about S3 buckets being left unsecured, with terabytes of sensitive data available for the whole world to download. CyberSecurity 101 teaches, “Don’t leave private data open to the public,” but somehow many major companies have let this one slip through the cracks.
Another lesson that everybody knows is, “Never open a file that has not been scanned for viruses.” So how do you protect yourself and scan the files stored on S3? Amazon doesn’t have a built-in antivirus tool for the task. At Upside, we’ve created our own solution with Lambda and S3 events.
Using S3 Event Notifications, a Lambda function is invoked to scan the newly uploaded file. The function will download the object from S3 and scan it for viruses using the open-source antivirus software ClamAV. Once scanning is complete, the function will add 2 tags to the S3 object,
av-status can have a value of either
INFECTED. S3 bucket policies prevent anybody from reading a file where the status is
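The flow described above can be sketched roughly as follows. This is a minimal illustration, not the released function: the `CLEAN` value, the `scan-timestamp` tag name, the location of the `clamscan` binary, and all function names are assumptions made for the example. The S3 client is passed in (for instance `boto3.client("s3")`) so the logic can be exercised without AWS access.

```python
import os
import subprocess
import urllib.parse
from datetime import datetime, timezone

STATUS_TAG = "av-status"               # tag name taken from the article
TIMESTAMP_TAG = "scan-timestamp"       # hypothetical name for the second tag
CLEAN, INFECTED = "CLEAN", "INFECTED"  # "CLEAN" is an assumed value


def objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def status_from_exit_code(code):
    """Map clamscan's exit code to a status (0 = no virus, 1 = virus found)."""
    if code == 0:
        return CLEAN
    if code == 1:
        return INFECTED
    raise RuntimeError(f"clamscan failed with exit code {code}")


def scan_file(path, clamscan="clamscan"):
    """Run clamscan on one file; the binary path is an assumption."""
    result = subprocess.run([clamscan, "--no-summary", path], capture_output=True)
    return status_from_exit_code(result.returncode)


def handle(event, s3_client, scan=scan_file, tmp_dir="/tmp"):
    """Download, scan, and tag every object named in the event."""
    results = []
    for bucket, key in objects_from_event(event):
        local_path = os.path.join(tmp_dir, os.path.basename(key) or "object")
        s3_client.download_file(bucket, key, local_path)
        try:
            status = scan(local_path)
        finally:
            if os.path.exists(local_path):
                os.remove(local_path)  # free /tmp space for the next object
        # Note: put_object_tagging replaces the object's whole tag set.
        s3_client.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [
                {"Key": STATUS_TAG, "Value": status},
                {"Key": TIMESTAMP_TAG,
                 "Value": datetime.now(timezone.utc).isoformat()},
            ]},
        )
        results.append((bucket, key, status))
    return results
```

Because `put_object_tagging` overwrites existing tags, a production version would probably read the current tags first and merge the scan tags into them.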
Working with ClamAV and Lambda
If you’ve ever worked with Lambda, you’re probably wondering how we got ClamAV installed. A Lambda function is simply a ZIP archive containing your source code (Python for our scanner) and any other files required by your function. When we build the deployment archive for the antivirus function, we also include the binaries for ClamAV. To make sure the binaries will work on Lambda, we use the amazonlinux Docker image to download the RPMs using yum. After downloading, we only include the required files:
We do not include the definition files in our Lambda archives, as they are over 100MB in size and become outdated quickly. Instead, we store the most current version of the definitions in a separate, secure S3 bucket for fast downloading. The definitions are updated using freshclam from a separate Lambda that runs every hour, triggered by a CloudWatch Event. Another benefit of keeping the definitions in a bucket is that all 4 of our AWS accounts retrieve them from that single bucket.
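As a rough sketch of the download side, the scanner could pull the definition files from the shared bucket into local storage before scanning, skipping any file whose local copy is already current. The file names below are the usual ClamAV database names, and the function name, bucket, and prefix parameters are hypothetical; none of this is taken from the released source.

```python
import os

# Common ClamAV database file names (assumed; after incremental updates
# freshclam may produce daily.cld instead of daily.cvd).
DEFINITION_FILES = ("main.cvd", "daily.cvd", "bytecode.cvd")


def sync_definitions(s3_client, bucket, prefix, dest_dir):
    """Download definition files that are missing or older than the S3 copy.

    Returns the list of file names that were downloaded.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    for name in DEFINITION_FILES:
        key = f"{prefix}/{name}" if prefix else name
        local_path = os.path.join(dest_dir, name)
        # head_object returns LastModified as a timezone-aware datetime.
        head = s3_client.head_object(Bucket=bucket, Key=key)
        remote_mtime = head["LastModified"].timestamp()
        if os.path.exists(local_path) and os.path.getmtime(local_path) >= remote_mtime:
            continue  # the local copy is at least as new as the bucket copy
        s3_client.download_file(bucket, key, local_path)
        downloaded.append(name)
    return downloaded
```

Since `/tmp` can survive between invocations of a warm Lambda container, a check like this avoids downloading more than 100MB of definitions on every scan.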
There are 2 configuration steps for S3. One is the event to invoke the scan, and the other is a bucket policy that prevents accessing files tagged as infected.
S3 events can be configured in the AWS S3 console under bucket properties. We scan on all ObjectCreated events (s3:ObjectCreated:*). When the event is configured through the console, AWS handles assigning permissions for S3 to invoke the antivirus Lambda function.
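For anyone who prefers to script this step instead of using the console, the same notification can be expressed as a configuration document. The sketch below only builds that document; the function name and the optional prefix filter are illustrative additions, and the Lambda ARN is a placeholder.

```python
def build_notification_config(lambda_arn, prefix=None):
    """Build an S3 notification configuration that invokes a Lambda
    function for every object-created event, optionally limited to a
    key prefix."""
    config = {
        "LambdaFunctionArn": lambda_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"LambdaFunctionConfigurations": [config]}
```

The result can be passed to boto3 as `s3_client.put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=config)`. Outside the console, S3 must also be granted permission to invoke the function (for example with the Lambda `add_permission` API), since that grant is not created automatically.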
Configuring bucket policies is also done in the S3 console. We add the following to all of our policies to prevent anybody from accessing files tagged infected. It is also recommended to not allow any of your users to modify the object tags with a name of
"Action": ["s3:GetObject", "s3:PutObjectTagging"],
The Lambda has optional support for SNS notification of scan results for every file scanned. This has many uses, such as notifying a service that a new file has been received and marked as clean. Another use is triggering a second Lambda function to handle infected files, for example by moving them to a quarantine bucket or deleting them. We created the scanner with a single purpose and decided that different buckets would want different actions taken after scanning. SNS allows us to separate these responsibilities.
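To illustrate the separation of responsibilities, here is a sketch of both ends of such a notification: a helper that serializes a scan result, and a helper a downstream Lambda could use to pick the infected objects out of an SNS event. The message fields (`bucket`, `key`, `status`) and function names are hypothetical; the actual message format is defined in the source on GitHub.

```python
import json


def build_scan_message(bucket, key, status):
    """Serialize one scan result for publishing to an SNS topic.
    The field names are assumptions made for this sketch."""
    return json.dumps({"bucket": bucket, "key": key, "status": status})


def infected_objects(sns_event):
    """Yield (bucket, key) for each infected object in an SNS-triggered
    Lambda event. SNS delivers the published text under
    Records[i]["Sns"]["Message"]."""
    for record in sns_event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("status") == "INFECTED":
            yield message["bucket"], message["key"]
```

The scanner side would publish with something like `sns_client.publish(TopicArn=topic_arn, Message=build_scan_message(bucket, key, status))`, and each subscriber decides for itself whether to quarantine, delete, or ignore the object.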
DataDog metrics and events are published if the DATADOG_API_KEY environment variable is configured. DataDog is able to alert our NOC when infected files are received and to alert them to an attack if many infected files are being uploaded in a short period of time.
Our back-end customer support software relies on the antivirus status tags when sharing files and photos with customers. During chat conversations, both customers and our support staff, referred to as Navigators, can share files to assist in communication. These attachments are placed in an S3 bucket and are only made accessible to users after being scanned and confirmed to be safe. For the record, users upload and download the files from S3 using pre-signed URLs. This allows access to specific files for a limited time and prevents others from accessing the content.
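A back-end check along these lines could look like the sketch below, which only hands out a pre-signed download URL once the object carries a clean status tag. The `CLEAN` value, the function name, and the five-minute default expiry are assumptions for illustration; the client is expected to be a boto3 S3 client.

```python
def presigned_url_if_clean(s3_client, bucket, key, expires_in=300):
    """Return a time-limited download URL only when the object's
    av-status tag marks it as clean; otherwise return None."""
    tag_set = s3_client.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    tags = {tag["Key"]: tag["Value"] for tag in tag_set}
    if tags.get("av-status") != "CLEAN":  # "CLEAN" is an assumed value
        return None  # not scanned yet, or flagged as infected
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )
```

Treating a missing tag the same as an infected one means a file that has not been scanned yet is never shared.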
Large Object Limitation
There is one limitation with this setup. By default, AWS Lambda limits your function to 512MB of /tmp storage space. Since the S3 object must be copied locally before being scanned, and the ClamAV definitions already take up over 100MB of that space, the scan job will fail if the object is larger than about 400MB. For our use case, this size limitation has not been an issue.
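The arithmetic behind that limit can be written down as a small pre-check, which a scanner could run before attempting the download. The constants mirror the figures in the text (512MB of `/tmp`, roughly 100MB of definitions); the function names are hypothetical.

```python
MB = 1024 * 1024
TMP_CAPACITY = 512 * MB      # /tmp space available to the function
DEFINITIONS_SIZE = 100 * MB  # approximate size of the ClamAV definitions


def max_scannable_size(tmp_capacity=TMP_CAPACITY, definitions_size=DEFINITIONS_SIZE):
    """Space left in /tmp for the object once the definitions are stored."""
    return max(tmp_capacity - definitions_size, 0)


def fits_in_tmp(object_size, tmp_capacity=TMP_CAPACITY, definitions_size=DEFINITIONS_SIZE):
    """Return True if an object of object_size bytes can be downloaded
    next to the definitions without exceeding /tmp capacity."""
    return object_size <= max_scannable_size(tmp_capacity, definitions_size)
```

The object size is available without downloading anything, either from the `size` field of the S3 event record or from the `ContentLength` returned by `head_object`, so oversized objects can be skipped or reported instead of failing mid-scan.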
Upside Travel is releasing this solution under the Apache 2 license. Full source is available from GitHub. You will find more detailed installation instructions there to help with deployment into your own environment. If you have an improvement you’d like to contribute, please submit a pull request.