Simplifying Security Data Ingestion: Recent Snowflake features minimize cost and complexity

As the industry continues to embrace the concept of a security data lake, the first thing on many practitioners’ minds is how to get all that data in. Snowflake recently released two features that significantly reduce both the cost and the effort required for ingestion. Let’s dive in.

Snowpipe Streaming

Snowpipe is Snowflake’s primary mechanism for ingesting files from a cloud storage location. As files arrive, Snowpipe continuously copies them in, reducing latency to a matter of seconds. This pattern is generally acceptable, but for workloads where latency is critical, such as threat detection, or where ingestion involves large numbers of small files, Snowpipe Streaming can both decrease latency and reduce cost.
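As a sketch of the classic file-based pattern (the stage, table, and pipe names here are hypothetical), a Snowpipe definition looks roughly like this:

```sql
-- Auto-ingest new files as they land in the cloud storage stage
CREATE OR REPLACE PIPE raw.security_logs_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw.security_logs
  FROM @raw.security_log_stage
  FILE_FORMAT = (TYPE = JSON);
```

With AUTO_INGEST enabled, cloud storage event notifications (for example, S3 events delivered via SQS) trigger the pipe as each file arrives, rather than on a schedule.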

[Diagram: on top, Kafka streams use the client SDK to move data row by row into a staging table in Snowflake; on the bottom, Snowpipe with auto-ingest transfers data file by file. From the staging table, data moves through a transformation step and lands in refined tables. Everything after the staging table is contained in Snowflake.]
Snowpipe Streaming

Snowpipe Streaming ingests data on a row-by-row basis, which means customers can use Kafka or the Java SDK to ingest into Snowflake without first staging the data in cloud storage. This saves both time and cost. For use cases with large numbers of small files, Snowpipe Streaming eliminates the per-file overhead often seen with Snowpipe. For most customers this overhead is negligible, but for specific sources like XDR and network logs it can have a significant impact, reducing ingest costs by up to 50%.
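For the Kafka path, switching an existing Snowflake sink connector over to Snowpipe Streaming is essentially a one-line configuration change. A sketch of the connector properties (connection values and topic names are placeholders):

```properties
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
topics=security_events
snowflake.url.name=<account>.snowflakecomputing.com:443
snowflake.user.name=<user>
snowflake.private.key=<private-key>
snowflake.database.name=RAW
snowflake.schema.name=LOGS
snowflake.role.name=INGEST_ROLE
# Route rows through Snowpipe Streaming instead of file-based Snowpipe
snowflake.ingestion.method=SNOWPIPE_STREAMING
```

The `snowflake.ingestion.method` property is what selects row-based streaming; without it, the connector falls back to writing files and ingesting them through Snowpipe.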

Learn more about Snowpipe Streaming from the documentation and this quickstart.

External Access (Preview)

While big data was busy embracing ETL, exports, and data sharing, security lagged behind in a world of closed platforms and custom connectors. Though this is changing, there are still a number of security tools for which polling an API is the only reasonable way to get at their logs. This meant that until recently, customers looking to ingest data from an API had to use either cloud functions or a third-party tool.

[Visualization: various services surrounding a Snowpark icon, representing the ability to connect externally.]
External Access

Recently, Snowflake has launched capabilities to allow UDFs to reach out to external endpoints. This allows customers to authenticate to APIs and pull the data into Snowflake without needing an external function or staging area. This reduces complexity and allows centralized management of ELT jobs.

The following is a copy-pasteable example of pulling JSON data directly from an API. For production use cases, Snowflake has also expanded its secrets management capabilities, allowing API keys to be stored securely and separately from business logic.
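A sketch of how the pieces fit together — the API host, endpoint path, and all object names below are placeholders to substitute with your own:

```sql
-- 1. Allow egress to the API host.
CREATE OR REPLACE NETWORK RULE logs_api_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('api.example.com');

-- 2. Keep the API key out of the business logic.
CREATE OR REPLACE SECRET logs_api_key
  TYPE = GENERIC_STRING
  SECRET_STRING = '<your-api-key>';

-- 3. Bundle the rule and secret into an integration.
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION logs_api_integration
  ALLOWED_NETWORK_RULES = (logs_api_rule)
  ALLOWED_AUTHENTICATION_SECRETS = (logs_api_key)
  ENABLED = TRUE;

-- 4. A Python UDF that pulls JSON directly from the API.
CREATE OR REPLACE FUNCTION fetch_logs()
  RETURNS VARIANT
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.10'
  HANDLER = 'fetch'
  EXTERNAL_ACCESS_INTEGRATIONS = (logs_api_integration)
  SECRETS = ('api_key' = logs_api_key)
  PACKAGES = ('requests')
AS
$$
import _snowflake
import requests

def fetch():
    # Read the key from the secret rather than hardcoding it
    key = _snowflake.get_generic_secret_string('api_key')
    resp = requests.get(
        'https://api.example.com/v1/logs',
        headers={'Authorization': f'Bearer {key}'},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
$$;
```

From there the results can be landed with ordinary SQL, e.g. `INSERT INTO raw.api_logs SELECT fetch_logs();` — no external function, staging bucket, or third-party tool required.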

Learn more about External Access from this article or the official documentation.
