This is Why You Should Use Snowflake for Security Analytics

Chris Herrera
Hashmap, an NTT DATA Company
Jun 7, 2019 · 6 min read

I will admit it: when asked what you should use as the core technology to understand your organization’s cybersecurity threats, a data warehouse is not usually the first tool that comes to mind.

Typically, your instinct will push you toward a traditional SIEM (Security Information and Event Management) system; after all, these systems are purpose-built to do this type of work for you.

However, let’s take a step back for a second. SIEMs are great at funneling data in and providing you with a base set of rules that work for everyone, but what ultimately ends up happening is that you get lost in an ocean of false positives that your poor SecOps team now has to filter through, instead of doing more important work like addressing real problems and improving your overall security posture.

Why is this, though? What is missing that causes all of these false positives? Context.

Context

The problem with traditional SIEMs is that they are missing context. For example: is this a sandbox machine? Is that user a DevOps engineer who is meant to run unknown applications? Is this security group supposed to be open because it is part of a DMZ?

Adding this context to a traditional SIEM is hard to maintain, because the SIEM is generally managed by a different group than the one using AWS, for example. The two groups end up with inconsistent views of that context, which leads to more false positives and unclear remediation paths (and sad faces).

If only I could join my asset data to my log data to my organizational and HR data. This is starting to sound like a data problem!

Security is a Data Problem

Like everything else in the data world, the volume of security data is increasing, and its value multiplies when it is joined with other contextual data sets. This is where a data warehouse becomes much more relevant in the security analytics space. Traditionally, though, this has been a daunting task: you have to model the events up front, integrate all of the data sources, and deal with the semi-structured nature of most of the events. There is additional complexity when you are trying to analyze across SaaS and cloud platforms, because their data is generally API-driven rather than log-file-driven.

Enter Snowflake Cloud Data Warehouse

Snowflake’s native ability to handle semi-structured data makes it a natural choice for analyzing your security data. The features that make it even more powerful are Snowpipe with auto-ingest, streams, and tasks. Ingesting CloudTrail from AWS becomes a five-minute exercise; the solution looks like the figure below:

Snowflake data pipeline for CloudTrail

In the diagram above, CloudTrail is configured to write to an S3 bucket, and a pipe with auto-ingest loads that data into a landing table. Because the source data is heavily nested JSON, a stream captures the changes on the landing table, and a task extracts key values for indexing and flattens the JSON (a SQL sketch of this pipeline follows the list below). This task makes reads much more efficient by:

  1. Allowing Snowflake to prune based on common predicates that the task extracts into columns
  2. Removing a LATERAL FLATTEN step for each row read
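To make the flow above concrete, here is a minimal sketch of what such a pipeline can look like in Snowflake SQL. The stage, table, pipe, stream, task, and warehouse names, and the particular columns extracted, are illustrative assumptions rather than a prescribed setup:

```sql
-- Landing table: one VARIANT column holding the raw CloudTrail JSON.
CREATE TABLE cloudtrail_landing (raw VARIANT);

-- Flattened reporting table with common predicates extracted as typed columns.
CREATE TABLE cloudtrail_events (
  event_time    TIMESTAMP_NTZ,
  event_name    STRING,
  event_source  STRING,
  aws_region    STRING,
  user_identity VARIANT,
  raw           VARIANT
);

-- Snowpipe with auto-ingest: loads new files as S3 event notifications arrive.
-- Assumes a stage (cloudtrail_stage) already points at the CloudTrail S3 bucket.
CREATE PIPE cloudtrail_pipe AUTO_INGEST = TRUE AS
  COPY INTO cloudtrail_landing
  FROM @cloudtrail_stage
  FILE_FORMAT = (TYPE = JSON);

-- Stream: tracks newly landed rows so the task only processes changes.
CREATE STREAM cloudtrail_stream ON TABLE cloudtrail_landing;

-- Task: flattens the nested Records array and extracts key values.
CREATE TASK flatten_cloudtrail
  WAREHOUSE = transform_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('CLOUDTRAIL_STREAM')
AS
  INSERT INTO cloudtrail_events
  SELECT
    r.value:eventTime::TIMESTAMP_NTZ,
    r.value:eventName::STRING,
    r.value:eventSource::STRING,
    r.value:awsRegion::STRING,
    r.value:userIdentity,
    r.value
  FROM cloudtrail_stream,
       LATERAL FLATTEN(input => raw:Records) r;

ALTER TASK flatten_cloudtrail RESUME;
```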

So, without any external tooling to manage, a robust ELT pipeline is created to analyze CloudTrail logs. The asset information can be pulled in in much the same manner; however, the AWS configuration APIs need to be queried with a set of Lambda functions using Boto3 (more on this in the next post) and written to an S3 bucket, where the same approach can be followed. Now you might be thinking about the EM side of SIEM. Enter SnowAlert.

SnowAlert

We have talked about SnowAlert before, but I wanted to put it in the context of this SIEM replacement strategy. SnowAlert allows a user to create alerts and violations. Alerts are activities that require attention right now; said another way, they are events triggered as the result of some action. Examples of alerts could be (a SQL sketch of one follows the list):

  • Modifications made to a CloudTrail configuration
  • Multiple access denied events within a specified time frame
  • Login by a terminated employee
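Because SnowAlert rules are expressed in SQL (more on that below), an alert for the first example might be sketched as a view over the flattened CloudTrail table. The view name and output columns here are illustrative; SnowAlert defines its own naming convention and required columns for alert queries, so treat this as the shape of the idea rather than its exact contract:

```sql
-- Hedged sketch: flag modifications to CloudTrail configuration.
-- The event names are the relevant CloudTrail management API calls.
CREATE VIEW rules.cloudtrail_config_change_alert AS
SELECT
  event_time                          AS alert_time,
  'CloudTrail configuration modified' AS title,
  event_name                          AS action,
  user_identity:arn::STRING           AS actor,
  raw                                 AS event_data,
  'high'                              AS severity
FROM cloudtrail_events
WHERE event_source = 'cloudtrail.amazonaws.com'
  AND event_name IN ('StopLogging', 'DeleteTrail', 'UpdateTrail', 'PutEventSelectors');
```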

Violations, on the other hand, are configuration issues. These are situations that increase your attack surface or reduce your overall visibility (see the sketch after this list). They could be issues like:

  • EC2 instance not shipping logs
  • Wide open security group
  • Cloud account not shipping logs
  • EC2 instance not tagged according to policy
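Violations follow the same pattern, but they typically run as scheduled checks against configuration and asset data rather than against an event stream. A sketch for the tagging example, assuming a hypothetical ec2_instances asset table loaded from the AWS configuration APIs (the table and its columns are assumptions):

```sql
-- Hedged sketch: EC2 instances missing a required "environment" tag.
-- ec2_instances is assumed to hold asset data, with tags stored as a VARIANT column.
CREATE VIEW rules.ec2_missing_tags_violation AS
SELECT
  instance_id,
  account_id,
  region,
  'EC2 instance not tagged according to policy' AS title,
  CURRENT_TIMESTAMP()                           AS detected_at
FROM ec2_instances
WHERE tags:environment IS NULL;
```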

SnowAlert allows you to make use of all of the data you are collecting by acting on it. That is powerful on its own, because you can join across all of your contextual data to generate these violations and alerts, but one equally important feature is that you can suppress alerts and violations as well (a suppression sketch follows this list). Examples would be:

  • A wide-open security group on port 443 is fine for a DMZ used for web hosting
  • An unknown package installation is fine on an EC2 instance tagged as a sandbox
  • An admin account without MFA is fine for a service account that does not support it
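Suppressions are also just SQL: queries that identify which alerts or violations should be dropped because context explains them. A sketch for the first example, reusing the hypothetical asset table above and assuming an open-security-group violation view with instance_id and port columns:

```sql
-- Hedged sketch: suppress open-443 findings for instances tagged as part of a DMZ.
-- Both the violation view and the tag layout are assumptions for illustration.
CREATE VIEW rules.open_443_dmz_suppression AS
SELECT v.*
FROM rules.open_security_group_violation v
JOIN ec2_instances i
  ON i.instance_id = v.instance_id
WHERE i.tags:network_zone::STRING = 'dmz'
  AND v.port = 443;
```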

The best part about all of this is that the rules (violations, alerts, and their respective suppressions) are defined in SQL, a clean and clear way to define, describe, and model them. SnowAlert also integrates with Slack and Jira for event management.

Data Sharing

One key piece of contextual information has not been discussed yet: how do I integrate curated lists of IOCs (Indicators of Compromise), and where do I get them from?

This is where Snowflake Data Sharing and the Data Exchange come in. You can now get a list of IOCs as easily as getting an app from an app store. The best part is that it simply shows up as another table that stays up to date without you having to do anything. This means you don’t have to stay on top of the latest security campaigns yourself, as long as you have a share set up from a source that does. Once the share is mounted, it is just another table to join against (a sketch below).
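As a rough illustration, once a provider’s share is mounted as a database, consuming it is a join like any other. The share, database, table, and column names below are hypothetical stand-ins for whatever the provider actually publishes:

```sql
-- Hypothetical: mount an inbound share from a threat-intel provider as a database.
CREATE DATABASE threat_intel FROM SHARE provider_account.ioc_share;

-- Join the shared IOC list against flattened CloudTrail events by source IP.
SELECT
  e.event_time,
  e.event_name,
  e.raw:sourceIPAddress::STRING AS source_ip,
  i.threat_type
FROM cloudtrail_events e
JOIN threat_intel.public.indicators i
  ON i.indicator_value = e.raw:sourceIPAddress::STRING
WHERE i.indicator_type = 'ipv4';
```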

While pulling in data is fantastic, having experts analyze your data for weak signals and threats you might be missing is a huge piece of the puzzle. Creating a data share with a service like Hunters.ai allows you to not only centralize your security analytics, but also put some of the most advanced threat hunters in the world to work keeping your infrastructure secure.

Conclusion

Security, now more than ever, is a data problem. Especially in the cloud-native space, the volume and variety of data can become overwhelming. Using Snowflake and SnowAlert as a SIEM replacement not only allows you to analyze and alert on your security data in context, reducing false positives, but also lets you build that context with the world leaders in threat intelligence and threat hunting.

The unique features in Snowflake that allow for simple data pipelines and controlled sharing (both incoming and outgoing) truly create an impressive replacement option that is both powerful and cost-conscious.

While this post was AWS focused, the true value of this solution is that it is Cloud/SaaS provider agnostic. It can be used to monitor AWS, GCP, Azure, GitHub, Slack, Okta and many more services in the same manner.

At Hashmap, we are well versed in using Snowflake and SnowAlert for security analytics because we do it not just for our customers but for ourselves as well. If you would like to have a discussion on what your next-generation security analytics strategy should look like, don’t hesitate to get in touch!

Feel free to share on other channels and be sure to keep up with all new content from Hashmap.

Chris Herrera is Chief Technology Officer at Hashmap working across industries with a group of innovative technologists and domain experts accelerating high value business outcomes for our customers, partners, and the community. You can tweet Chris @cherrera2001 and connect with him on LinkedIn and also be sure to catch him on Hashmap’s Weekly IoT on Tap Podcast for a casual conversation about IoT from a developer’s perspective.
