Fortune 500 Financial Company Improves Data Onboarding and Threat Detection with Anvilogic — Powered by Snowflake

Published in

Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

8 min read · Sep 15, 2022

The real-world security consequences of data evolution and how to evolve

The following is a guest blog post by Tim Frazier, Senior Solutions Engineer, Anvilogic.

TL;DR: A technical walk-through and a Fortune 500 customer case study of how fast and straightforward it can be to onboard massive amounts of data with Anvilogic Powered by Snowflake (in this example, Crowdstrike FDR into Snowflake), while quickly and easily deploying quality out-of-the-box (but not black-box) detections to gain visibility and better detection coverage across hybrid-cloud environments and a security data lake.

The traditional paradigm of trying to consolidate all of your security data into one place is no longer practical, whether because it’s cost-prohibitive or because it’s organizationally/politically impossible. The larger your organization is, the less likely you are to get everything consolidated into your Security Information & Event Management (SIEM) solution, which makes it difficult to achieve an ideal state of correlation across multiple data feeds. This has real-world consequences: without the data feeds and the ability to correlate them at scale, it becomes far harder to detect the attacks, breaches, and ransomware that can cause business disruption and cost your organization real money. Is there a way to fix this? (Would we be writing this blog if there wasn’t?)

Anvilogic has partnered with Snowflake to provide a modern security stack that addresses your concerns around growing data volumes, cost, scalability, and detection coverage. With Snowflake as your security data lake, you can eliminate data silos and run fast queries on petabytes of data at a fraction of the cost of traditional SIEMs. And with Anvilogic’s pre-built security detections, you can quickly run detections on all of your data in Snowflake and other data stores. We enable you to correlate detections across those otherwise disconnected data repositories without having to know the specific query language of each one. For the security professional, this is key because you don’t have to bring tool- and language-specific knowledge in addition to your security domain expertise. With Anvilogic and Snowflake, just bring your domain expertise of “what” you want to look for and we will take care of the “how” by providing the individual searches and the normalization necessary to combine individual detections into correlated scenarios in a performant and easy manner. But enough of the intro; let’s break down a real example of what we have done.

The Proof is in the Pudding: Fast set-up, data onboarding & detection deployment

With less than 1 hour of initial setup time and no investment of detection engineering or SIEM engineering from our customer, over the course of only 7 days Anvilogic helped to:

Onboard 7TB of Crowdstrike FDR (Falcon Data Replicator) logs into Snowflake
Apply 75 of Anvilogic’s pre-built detection rules against that data which produced 8.1M “warning signals”
Leverage Anvilogic’s machine learning-based ATD (Automated Threat Detection) to look for patterns and correlations across those 8.1M signals
Identify 2 unannounced red team tests, which the customer was running to simulate real-world attacks, and bring them to the SOC’s attention

In essence, with only an hour of the customer’s time over a 7-day period, Anvilogic detected 2 real-world-style attacks. It’s important to call out that the organization did this at significantly lower cost (40–60% less) than would have been possible with their existing legacy logging solution and toolset. Other customers we have worked with have spent over $1M on consulting engagements and software license fees, along with 5000+ hours, in attempts to onboard this specific data source (Crowdstrike FDR logs) into their existing tooling and write detection rules against that data.

Let’s get Technical…

You might be asking “How did you do this, exactly?” Well, I’m glad you asked. If you would prefer to watch us “do it live” on a webinar we did on September 1, go to this link and check it out. (If you want to skip straight to the technical setup portion, go to the ~6:40 mark in the video). If you would prefer to read about it, venture forward here. In a nutshell, the following diagram shows the architecture of what we set up:

The basic steps are as follows:

Get your data feed turned on (request Crowdstrike FDR, enable AWS Cloudtrail, VPC flow logs, etc.).
Get your feed into a usable state in your own AWS cloud infrastructure.
Use Anvilogic’s well-documented instructions and methodology to get an automated data pipeline into Snowflake.
Use the Anvilogic platform to enable pre-built detection content for these data sources in Snowflake.
Watch your “SOC Maturity Score” go up in Anvilogic as you onboard these data feeds and enable the pre-built detections that are mapped to the MITRE ATT&CK framework. You will be able to show tangible and objective measured improvement to your overall visibility and detection coverage for the platforms and threat groups your organization cares about.

The following will focus on steps 2–5, since step 1 involves well-known sources whose vendors already document how to turn them on. We will use Crowdstrike FDR logs as the data source for this example.

Step 2. Get Usable Crowdstrike FDR Logs into AWS

We’ll use this project on GitHub to set up the Cloud infrastructure: https://github.com/anvilogic-forge/aws-falcon-data-forwarder. This project will facilitate moving the Crowdstrike data from the Crowdstrike-owned S3 bucket in AWS to an S3 bucket that you own as shown in this diagram:
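In code terms, the forwarding step amounts to reading a notification that lists new files in the Crowdstrike-owned bucket and copying each file into your own bucket. The following is a minimal Python sketch of that planning logic, not the project's actual implementation (the project itself is written in Go). It assumes the notification is JSON with a "bucket" field and a "files" list whose entries have a "path" field; check these field names against the Crowdstrike FDR documentation. The function name plan_copies and the example bucket names are hypothetical.

```python
import json
from typing import Dict, List


def plan_copies(message_body: str, dest_bucket: str, dest_prefix: str) -> List[Dict[str, str]]:
    """Turn one FDR notification into a list of S3 copy operations.

    Assumption: the notification is JSON with a "bucket" field and a
    "files" list whose entries carry a "path" field.
    """
    message = json.loads(message_body)
    source_bucket = message["bucket"]
    copies = []
    for entry in message.get("files", []):
        copies.append({
            "source": f"s3://{source_bucket}/{entry['path']}",
            # Keep the original path under the destination prefix.
            "destination": f"s3://{dest_bucket}/{dest_prefix}{entry['path']}",
        })
    return copies


# Hypothetical notification, shaped after the assumption above.
example_message = json.dumps({
    "bucket": "crowdstrike-owned-bucket",
    "files": [
        {"path": "data/part-00000.gz"},
        {"path": "data/part-00001.gz"},
    ],
})

for copy in plan_copies(example_message, "my-log-bucket", "logs/"):
    print(copy["source"], "->", copy["destination"])
```

The sketch only plans the copies; performing them would require AWS credentials and an S3 client, which the GitHub project handles inside its Lambda function.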

The main project page has good instructions on what you need to do. First, you will need a few prerequisites that are listed on the page:

Tools:

go >= 1.11
aws-cli https://github.com/aws/aws-cli

Your AWS resources:

AWS Credential for CLI (like ~/.aws/credentials )
S3 bucket for log data (e.g. my-log-bucket )
S3 bucket for lambda function code (e.g. my-function-code )
A secret in Secrets Manager that stores AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for the data replicator
IAM role for Lambda function (e.g. arn:aws:iam::1234567890:role/LambdaFalconDataForwarder) with these permissions:
    s3:PutObject for my-log-bucket
    secretsmanager:GetSecretValue
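As a rough illustration of the two permissions listed for the Lambda role, here is a small Python sketch that builds an IAM policy document granting them. Note that IAM action names use a single colon (s3:PutObject). The function name build_forwarder_policy is hypothetical, and scoping the statements to the example bucket and secret is an assumption; adjust the resources to your own environment.

```python
import json
from typing import Any, Dict


def build_forwarder_policy(log_bucket: str, secret_arn: str) -> Dict[str, Any]:
    """Build an IAM policy document with the two permissions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the function write forwarded log files into your bucket.
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{log_bucket}/*",
            },
            {
                # Lets the function read the stored data replicator credentials.
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            },
        ],
    }


policy = build_forwarder_policy(
    "my-log-bucket",
    "arn:aws:secretsmanager:ap-northeast-1:1234567890:secret:your-secret-name-4UqOs6",
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the AWS console or referenced from Terraform. The role also needs a trust policy that allows the Lambda service to assume it.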

In the webinar, I used Terraform to create the above prerequisite AWS infrastructure, but you can create it manually in the AWS console just as easily. Once those resources are built, you need to create your JSON configuration file as shown on the README page:

{
    "StackName": "falcon-data-forwarder-staging",
    "Region": "ap-northeast-1",
    "CodeS3Bucket": "my-function-code",
    "CodeS3Prefix": "functions",

    "RoleArn": "arn:aws:iam::1234567890:role/LambdaFalconDataForwarder",
    "S3Bucket": "my-log-bucket",
    "S3Prefix": "logs/",
    "S3Region": "ap-northeast-1",
    "SqsURL": "https://us-west-1.queue.amazonaws.com/xxxxxxxxxxxxxx/some-queue-name",
    "FalconRegion": "ap-northeast-1",
    "SecretArn": "arn:aws:secretsmanager:ap-northeast-1:1234567890:secret:your-secret-name-4UqOs6"
}
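Before deploying, it can help to sanity-check the configuration. The following Python sketch is not part of the project; it simply checks that every key from the example above is present and that a few values have the expected shape. Treating every key as required is an assumption, and the name find_config_problems is hypothetical.

```python
from typing import Any, Dict, List

# Keys taken from the example configuration; treating all of them as
# required is an assumption.
REQUIRED_KEYS = [
    "StackName", "Region", "CodeS3Bucket", "CodeS3Prefix", "RoleArn",
    "S3Bucket", "S3Prefix", "S3Region", "SqsURL", "FalconRegion", "SecretArn",
]


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [
        f"missing or empty key: {key}"
        for key in REQUIRED_KEYS
        if not config.get(key)
    ]
    role_arn = config.get("RoleArn", "")
    if role_arn and not role_arn.startswith("arn:aws:iam::"):
        problems.append("RoleArn does not look like an IAM role ARN")
    secret_arn = config.get("SecretArn", "")
    if secret_arn and not secret_arn.startswith("arn:aws:secretsmanager:"):
        problems.append("SecretArn does not look like a Secrets Manager ARN")
    sqs_url = config.get("SqsURL", "")
    if sqs_url and not sqs_url.startswith("https://"):
        problems.append("SqsURL should be an https:// URL")
    return problems


example_config = {
    "StackName": "falcon-data-forwarder-staging",
    "Region": "ap-northeast-1",
    "CodeS3Bucket": "my-function-code",
    "CodeS3Prefix": "functions",
    "RoleArn": "arn:aws:iam::1234567890:role/LambdaFalconDataForwarder",
    "S3Bucket": "my-log-bucket",
    "S3Prefix": "logs/",
    "S3Region": "ap-northeast-1",
    "SqsURL": "https://us-west-1.queue.amazonaws.com/xxxxxxxxxxxxxx/some-queue-name",
    "FalconRegion": "ap-northeast-1",
    "SecretArn": "arn:aws:secretsmanager:ap-northeast-1:1234567890:secret:your-secret-name-4UqOs6",
}

for problem in find_config_problems(example_config):
    print(problem)
```

In practice you would load the dictionary from your own file with json.load instead of defining it inline.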

After that, just run the basic command and the code will take care of the rest:

$ env FORWARDER_CONFIG=myconfig.json make deploy

Step 3. Follow along to get an automated data pipeline into Snowflake

For this step, it’s recommended that you create a dedicated Snowflake account specifically for the purpose of this evaluation. You can follow along with our docs here for the Snowflake implementation steps:

Below are a couple of screenshots from the docs that show a few of our very simple, copy-and-paste code blocks that you can execute directly in a Snowflake worksheet.

[These docs are not publicly accessible and you will need to reach out to us to get full access. Running a free trial with us has no obligations on your end and is a very low effort to get going so I highly encourage you to reach out to us at Anvilogic or your Snowflake rep. We are happy to help.]
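For readers who want a feel for what an automated S3-to-Snowflake pipeline involves, here is a generic sketch of the standard Snowpipe auto-ingest pattern. It is not taken from Anvilogic's documentation and may differ from the steps they provide. The Python function below only builds the SQL statements as strings so you can review them before pasting them into a worksheet. All object names (fdr_raw, fdr_stage, fdr_pipe, my_s3_integration) are hypothetical, and the sketch assumes a storage integration with access to your bucket already exists.

```python
from typing import List


def build_pipeline_sql(table: str, stage: str, pipe: str,
                       s3_url: str, integration: str) -> List[str]:
    """Build SQL for a generic Snowpipe auto-ingest pipeline (a sketch)."""
    return [
        # Landing table: one VARIANT column holding each raw JSON event.
        f"CREATE TABLE IF NOT EXISTS {table} (raw VARIANT);",
        # External stage pointing at the S3 location the forwarder writes to.
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"URL = '{s3_url}' "
        f"STORAGE_INTEGRATION = {integration} "
        f"FILE_FORMAT = (TYPE = JSON);",
        # Pipe that loads new files as S3 event notifications arrive.
        f"CREATE PIPE IF NOT EXISTS {pipe} AUTO_INGEST = TRUE AS "
        f"COPY INTO {table} FROM @{stage};",
    ]


statements = build_pipeline_sql(
    table="fdr_raw",
    stage="fdr_stage",
    pipe="fdr_pipe",
    s3_url="s3://my-log-bucket/logs/",
    integration="my_s3_integration",
)
for statement in statements:
    print(statement)
```

With this pattern, the S3 bucket's event notifications also have to be pointed at the pipe's notification channel before new files load automatically.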

Step 4. Enable pre-built Anvilogic detection content for Snowflake

After following our instructions, you will have your Crowdstrike data flowing into Snowflake in a fully automated fashion. At this point, you will want to log into Anvilogic to review recommended use cases and detection packs (again, you’ll have to engage with us for this, but an evaluation is completely free).

As of this writing, the Armory holds 691 Threat Identifier use cases and 249 Threat Scenarios overall, but as you scroll down we tell you exactly which ones are relevant to you. In the bottom left of the 2nd screenshot, there is the “Snowflake — Crowdstrike Windows Endpoint” detection pack, which includes 50 rules specifically for Crowdstrike FDR data in Snowflake. This next screenshot is what you will see once you click into that detection pack:

Here we show why we are recommending these rules/searches: they match your prioritized MITRE ATT&CK techniques, they match “Windows” as a prioritized platform, and you have EDR logs coming into your Snowflake account. You can also see in the bottom left that deploying all the rules in this detection pack will increase your coverage by an additional 32%, up to a total of 35% in this case. This is a prime example of how Anvilogic shows you exactly how to improve your threat detection coverage and thus your overall SOC Maturity score, which is the last piece below.
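To make the coverage numbers concrete, here is one plausible way to compute technique coverage: the share of your prioritized techniques that at least one deployed rule detects. This is an illustrative assumption, not Anvilogic's actual scoring formula. The technique identifiers below are placeholders rather than real MITRE ATT&CK IDs, and the counts are chosen only to reproduce the figures above (3% before the pack, 35% after).

```python
from typing import Set


def coverage_percent(prioritized: Set[str], covered: Set[str]) -> float:
    """Percent of prioritized techniques detected by at least one rule."""
    if not prioritized:
        return 0.0
    return 100.0 * len(prioritized & covered) / len(prioritized)


# Placeholder identifiers: 100 prioritized techniques (hypothetical).
prioritized = {f"TECH-{i:03d}" for i in range(1, 101)}
# Techniques already covered before deploying the detection pack.
covered_before = {f"TECH-{i:03d}" for i in range(1, 4)}
# Techniques the detection pack would add.
pack_techniques = {f"TECH-{i:03d}" for i in range(4, 36)}

before = coverage_percent(prioritized, covered_before)
after = coverage_percent(prioritized, covered_before | pack_techniques)
print(f"before: {before:.0f}%  after: {after:.0f}%  gain: {after - before:.0f}%")
# prints "before: 3%  after: 35%  gain: 32%"
```

Measured this way, a pack only raises coverage where its rules map to techniques you have prioritized and do not already cover.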

Step 5. Map data feeds and detection coverage against the MITRE ATT&CK Matrix: Gain better reporting and overall visibility of your coverage & SOC’s maturity

Anvilogic provides a scoring framework, methodology and objective measurements for tracking the maturity of your SOC. By monitoring your data feeds, detection coverage against MITRE ATT&CK, and SOC productivity, you can finally show your management, auditors and SOC team members objective progress that proves you are getting better at protecting your organization against relevant threats.

The joint solution with Snowflake and Anvilogic allows you to scale your security operations program with ease by leveraging Snowflake as the back end for data storage, transformation and compute while using Anvilogic as the front end for security workflows and content.

If you’re interested in learning more, reach out to your Snowflake team or sign up for a free trial of Anvilogic. Let us show you how we can improve your SOC maturity in just two weeks!