Simplifying Security Data Ingestion: Recent Snowflake features minimize cost and complexity
As the industry continues to embrace the concept of a security data lake, the first thing on many practitioners’ minds is how to get all that data in. Snowflake recently released two features that can significantly reduce both the cost and the effort required for ingestion. Let’s dive in.
Snowpipe Streaming
Snowpipe is Snowflake’s primary mechanism for ingesting files from a cloud storage location. As files are added, Snowpipe continuously copies them in, reducing latency to a matter of seconds. While this pattern is generally acceptable, for workloads such as threat detection, where latency is critical, or where ingestion involves large numbers of small files, using Snowpipe Streaming can both decrease latency and reduce cost.
Snowpipe Streaming ingests data on a row-by-row basis. This means that customers can use Kafka or the Java SDK to ingest into Snowflake without first staging that data in cloud storage, which saves both time and cost. For use cases with large numbers of small files, Snowpipe Streaming eliminates the per-file overhead that is often seen with Snowpipe. For most customers this overhead is negligible; however, for specific sources like XDR and network logs it can have a significant impact, and removing it can reduce ingest costs by up to 50%.
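To get a rough feel for where the per-file overhead comes from, the arithmetic can be sketched as below. The rate of 0.06 credits per 1,000 files is an assumption based on Snowflake's published Snowpipe pricing and should be checked against the current documentation; the function name and the file counts are hypothetical inputs for illustration, not measurements.

```python
# Rough estimate of Snowpipe's per-file overhead, separate from compute cost.
# ASSUMPTION: 0.06 credits per 1,000 files; verify against current Snowflake pricing.
ASSUMED_CREDITS_PER_1000_FILES = 0.06


def file_overhead_credits(files_per_day: int, days: int = 30) -> float:
    """Return the estimated credits spent on per-file overhead over `days` days."""
    if files_per_day < 0 or days < 0:
        raise ValueError("files_per_day and days must be non-negative")
    return files_per_day * days / 1000 * ASSUMED_CREDITS_PER_1000_FILES


if __name__ == "__main__":
    # Hypothetical source that lands one small file per second.
    files_per_day = 24 * 60 * 60
    print(f"{file_overhead_credits(files_per_day):.1f} credits per 30 days")
```

Because Snowpipe Streaming writes rows rather than files, this per-file term does not apply to it. Snowpipe Streaming has its own cost model, which is not modeled here.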
Learn more about Snowpipe Streaming from the documentation and this quickstart.
External Access (Preview)
While big data was busy embracing ETL, exports, and data sharing, security lagged behind in a world of closed platforms and custom connectors. Though this is changing, there are still a number of security tools for which polling the API is the only reasonable way to get at their logs. This means that until recently, customers looking to ingest data from an API had to use either cloud functions or a third-party tool.
Snowflake has now launched capabilities that allow UDFs to reach out to external endpoints. This allows customers to authenticate to APIs and pull the data into Snowflake without needing an external function or staging area. This reduces complexity and allows centralized management of ELT jobs.
The following is a copy-pasteable example of pulling JSON data directly from an API. For production use cases, Snowflake has also expanded its secrets management capabilities, allowing API keys to be stored securely and separately from business logic.
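A minimal sketch of the Python handler side of such a UDF might look like the following. The URL, the "events" key, and the function names are hypothetical placeholders, and the `opener` parameter exists only so the logic can be exercised without network access. The API key is passed in as an argument here; in production it would come from Snowflake's secrets management rather than being hard-coded.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the API of the security tool being polled.
API_URL = "https://api.example.com/v1/logs"


def parse_events(body: bytes) -> list:
    """Decode a JSON response body into a list of event records."""
    payload = json.loads(body)
    if isinstance(payload, dict):
        # ASSUMPTION: events are nested under an "events" key; if the key is
        # absent, treat the whole object as a single event.
        payload = payload.get("events", [payload])
    return payload


def fetch_events(url=API_URL, api_key=None, opener=urllib.request.urlopen, timeout=30):
    """Call the API once and return its events as a list."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if api_key:
        # ASSUMPTION: the API uses bearer-token authentication.
        request.add_header("Authorization", f"Bearer {api_key}")
    with opener(request, timeout=timeout) as response:
        return parse_events(response.read())
```

Inside Snowflake, a function like this would also need a network rule and an external access integration that cover the API host, and the function definition would reference that integration. Those objects, and the secret holding the API key, are not shown here.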
Learn more about External Access from this article or the official documentation.