Security Analytics with Wiz and Snowflake

Published in

Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

6 min readFeb 7, 2024

If you’re looking for a high value, low effort way to boost your security program, analytics on cloud risk data is a fantastic contender. If you have Snowflake and you use Wiz then look no further than this article.

Snowflake and Wiz have partnered to create an integration that can automatically push Wiz-identified cloud security issues to Snowflake so you can analyze and accurately report on your cloud security metrics.

This article will dive into data ingestion, walk through building out visualizations and conclude with ideas for follow on analytics projects.

Why do security analytics?

When we bring security data into Snowflake and build a security data lake we find ourselves with a whole that is greater than the sum of its parts. Wiz data can be used to enrich other data sources such as threat detections or even make better the applications running on the data lake layer. By bringing data into Snowflake, analysts have access to native functionality such as python support, streamlit applications and cortex ML functions such as native anomaly detection. Furthermore, data in Snowflake is available indefinitely at cloud storage prices, this means that historical data can be queried hot, long after the retention period offered by Wiz.

Getting the data in

The simplest way to bring reports into Snowflake is through the native Wiz integration. Installation takes just a few minutes and configuration will happen in the standard reports view. Alternatively, its possible to bring in data either directly from the API, a cloud storage stage or via several third party applications such as a connected siem partner or security ETL tool. The rest of this article will use the schemas provided by the official integration.

Screenshot of Wiz tables in Snowflake, described below

We start by configuring the integration by following the official Documentation on the Wiz portal.

Once data is ingested via the integration we see that wiz provides us several different tables. At the time of writing we have the following reports, exposed as tables

Audit Log — Information about activity in Wiz itself
Cloud Resources — Assets in Cloud environments
Compliance — Detailed reporting containing every resource and how it relates to each Wiz compliance policy
Configuration Findings — Findings generated by cloud configuration rules
Host Configuration Findings — Host CIS assessment results per check and per resource
Issues — Detailed information about all issues in your account
Network Exposure — Information about resources that are exposed or can be exposed to the public internet
System Activity — Progress and status of actions performed in Wiz
Vulnerabilities — Details on resolved and unresolved vulnerabilities found in scanned environments

We also have a meta table wiz_report_runs that gets created automatically and contains the status on report runs and status.

“Latest” and “Historical” data models

Since Wiz is adding data to Snowflake on an incremental basis, we can create two main models to query data. The “latest” data model reflects the current data in the Wiz portal and would be used as a base or source for enrichment, for reporting on current activity or for ingestion and automation in other downstream systems. The simplest way to query this model is to create a view that shows only entries that have been added by the latest report, which we do by joining the source table with the wiz_report_runs meta and selecting the latest entries.

Here’s an example of a view that selects the latest report from the issues table

-- Issues from the latest report
create or replace view issues_latest as 
with latest_issue_report as (
    select wrr.report_run_id as run_id from issues join wiz_report_runs wrr on issues.report_run_id = wrr.report_run_id order by wrr.start_time desc limit 1
)
select issues.* from issues join latest_issue_report on issues.report_run_id = latest_issue_report.run_id;

-- Sample of the results
select * from issues_latest limit 10;

Be aware this approach is simple and generally performant, but for edge cases or extreme scale, it may be worth investigating other architectures to improve performance.

The second model for “Historical” data takes a different approach, instead of using data from the last report, it will take a look at the “latest” version of each individual entry (i.e. asset, issue, vulnerability…). This means that it will include the information from the “latest” model but will also include historical data such as issues closed over 90 days ago. This is useful for reporting on KPIs, or doing an investigation on an incident that happened in the past. Since Snowflake will need to look through more data to complete queries, using this model will be less performant than the “latest” one. Additionally, if reporting was paused at any point, there may be gaps in the narrative (such as an issue that was open, then no longer appeared, obscuring the details around its resolution).

It’s possible to create a view for this model by using a window function to look for the latest version of each entity. For example:

-- the most recent version of all issues including those no longer being reported on
create or replace view issues_historical as
select * from
(
select
    issues.*,
 row_number() over (partition by issues.issue_id order by wrr.start_time desc) as row_number
from
    issues
    join wiz_report_runs wrr on issues.report_run_id = wrr.report_run_id
) where row_number = 1;

-- Sample of the results

select * from issues_historical limit 10;

Running Queries, building dashboards

Once we have our models setup, it’s fairly easy and intuitive to start building dashboards and visualizations. Snowflake supports integrations with all major BI tools and provides native tooling for visualization, data applications and SQL and Python notebooks. To keep it simple, lets focus on writing a few basic queries and putting them into a Snowsight dashboard

If we want to drill into our current environment and break up the results by severity and the resource platform (i.e. cloud). We can write a SQL Statement like so:

select 
       severity,
       resource_platform, 
       count(issue_id) 
from issues_latest
WHERE resource_platform IS NOT NULL AND resource_platform != ''
and status = 'OPEN'
GROUP BY severity,resource_platform;

This will return a table of data which can be formatted as a heatgrid directly from the result Snowflake, then added to a dashboard

Similarly we may want to take a look at what historical time to remediation. Either all time

select
    avg(datediff(day, to_timestamp(created_at), to_timestamp(resolved_time))) as avg_days_to_remediation
from
    issues_historical
where
    resolved_time is not null
    and created_at is not null
;

Or based on what was closed each month

select
 severity,
 date_trunc('month', to_timestamp(resolved_time)) as resolution_month,
 avg(datediff(day, to_timestamp(created_at), to_timestamp(resolved_time))) as avg_days_to_remediation
from
 issues_historical
where
 resolved_time is not null
 and created_at is not null
group by
 severity,
 resolution_month
order by
 severity,
 resolution_month
;

Put it all together and you can make something like this

A dashboard of Wiz analytics data in Snowflake

Next Steps

Ad-hoc queries and reporting are often the first use cases for bringing Wiz data into Snowflake. Once you get started though there’s plenty more you can do

Add more sources: For many customers, posture management is the first ingredient in a security data lake. Adding more sources as you need them and joining them together. You can even use Snowflake as a backend to augment your siem bringing alerts, findings and historical or auxiliary data into Snowflake for ad hoc investigations, reporting or out of siem threat detection.

See: Best practices for Security Log Ingestion and Normalization white paper

Connect security applications directly to your Snowflake account. Connected applications can do all sorts of things including threat detection, risk quantification and compliance. They run directly on top of our data lake and within your perimeter, making your life easier and adding a significant level of performance and cost optimization.

See: Next Generation of Cyber Applications ebook

Aggregate, enrich and prioritize your vulnerabilities. At Snowflake we don’t just stop at ingesting our Wiz data. We combine it with our CMDB and prioritized assets, enrich the data with additional intelligence and create a process for prioritization and downstream automation.

See: Enriching investigations with ServiceNow Asset Data

Security Analytics with Wiz and Snowflake

Written by Jake Berkowsky