Snowpark Protection Through Java/Scala and Python Isolation

Published in Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Nov 3, 2022

Learn how the Snowflake Data Cloud provides strong security by leveraging network egress proxying, namespaces, seccomp-bpf, and ptrace to isolate workloads.

Introduction

The most secure door is a wall. It eliminates unauthorized access, but it blocks authorized entry as well. A wall is easy to design; a door that is both secure and useful is much harder. Snowflake is all about the data: easily enabling governed access to near-infinite amounts of data with cutting-edge tools, applications, services, and security. However, that security is of limited value if customers find it’s more of a wall than a door.

Until recently, Snowflake provided a relatively constrained set of programming primitives: SQL queries and a fixed set of built-in functions. A potential attacker couldn’t run the latest exploit in Snowflake, because running arbitrary code simply wasn’t a service Snowflake provided. As we continue to make new workloads available in Snowflake’s Data Cloud, we’ve had to increase the surface area to accommodate them. Customers wanted a door, and that is why we created Snowpark.

Snowpark allows Data Cloud users to pull in third-party Java and Python libraries to build their user-defined functions (UDFs), user-defined table functions (UDTFs), and stored procedures. That naturally opens the door to additional risk. To compensate, we’ve gone back to our security principles, isolated workloads, and engineered multiple independent layers of defense.

Potential Threats

Snowpark’s third-party libraries represent untrusted code that could pose a very real threat to security. If that code finds a way to connect to a hostile endpoint on the Internet, it could pull down additional malicious payloads, establish command-and-control, and exfiltrate sensitive data. If malicious actors are able to observe or manipulate platform-level processes, they could corrupt data or mount a denial-of-service (DoS) attack.

Snowflake’s Security Architecture

Figure: Snowflake’s secure sandbox. A user-defined function runs inside a language runtime, which in turn runs inside the secure sandbox. The sandbox is protected by five components: namespaces and cgroups, seccomp-bpf, a chroot filesystem, ptrace, and threat detection. The query engine sits outside the sandbox and communicates bidirectionally with the language runtime.

As shown in the figure above, Snowflake deploys a multi-faceted security architecture to isolate potentially malicious workloads. These mechanisms protect the kernel, the network, the host filesystem, and the workload orchestration processes. The architecture combines several Linux isolation mechanisms:

namespaces and cgroups: namespaces give a process an isolated view of system resources such as process IDs, mounts, and the network stack, while cgroups limit the CPU, memory, and other resources a group of processes can consume.
seccomp-bpf: Seccomp BPF (SECure COMPuting with filters) restricts the system calls a process may make.
chroot: chroot confines the files a process can access locally to a designated directory tree.
ptrace: ptrace lets a tracing process observe the system calls another process makes.
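chroot itself is enforced by the kernel and requires elevated privileges, so it can’t be demonstrated in a few unprivileged lines. As a rough user-space analogy for the confinement it provides, the Python sketch below resolves a requested path against a designated root directory and refuses anything that lands outside it, whether through `..` segments or a symlink. The function name `resolve_inside_root` is hypothetical and is not part of Snowflake’s implementation.

```python
import os


def resolve_inside_root(root: str, requested: str) -> str:
    """Resolve `requested` against `root`, refusing any result outside `root`.

    A user-space analogy for chroot confinement, not a replacement for it.
    """
    real_root = os.path.realpath(root)
    # Interpret absolute paths relative to the root, the way a chrooted process would.
    joined = os.path.join(real_root, requested.lstrip("/"))
    # realpath collapses ".." and follows symlinks, so both escape routes are caught.
    candidate = os.path.realpath(joined)
    if os.path.commonpath([real_root, candidate]) != real_root:
        raise PermissionError(f"{requested!r} resolves outside {real_root!r}")
    return candidate
```

A check like this is open to time-of-check/time-of-use races if the filesystem changes between validation and use, which is one reason kernel-enforced confinement such as chroot, combined with a private mount namespace, is the stronger choice.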

Snowflake’s first line of defense is its own language runtime, which complicates attempts to carry out architecture-level attacks. It hinders or outright prevents a variety of potential tactics, techniques, and procedures (TTPs).

The language runtime executes inside a chroot filesystem that contains only the minimal set of shared libraries and other dependencies needed to run the UDF. It also runs inside its own set of namespaces: network, user, mount, PID, IPC, UTS, cgroup, and time. Namespace isolation is a primary mechanism used by popular container-based solutions.
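On Linux, a process’s namespace memberships are exposed as symbolic links under `/proc/<pid>/ns`, one per namespace type, with link text such as `net:[4026531840]`. The kernel documents that two processes are in the same namespace when the device and inode numbers of these links match; comparing the inode shown in the link text, as below, is a common shortcut. The Python sketch reports which of the namespace types listed above separate two processes. The helper names are hypothetical, and this is an inspection aid rather than a description of how Snowflake builds its sandbox.

```python
import os
import re

# Namespace types named in the article, as they appear under /proc/<pid>/ns.
NAMESPACE_TYPES = ("net", "user", "mnt", "pid", "ipc", "uts", "cgroup", "time")
_LINK_RE = re.compile(r"^(\w+):\[(\d+)\]$")


def namespace_ids(ns_dir: str = "/proc/self/ns") -> dict:
    """Map each namespace type to the inode number shown in its ns symlink."""
    ids = {}
    for ns in NAMESPACE_TYPES:
        try:
            target = os.readlink(os.path.join(ns_dir, ns))
        except FileNotFoundError:
            continue  # e.g. 'time' is absent on kernels older than 5.6
        match = _LINK_RE.match(target)
        if match:
            ids[ns] = int(match.group(2))
    return ids


def unshared_namespaces(pid_a: int, pid_b: int) -> list:
    """Return the namespace types in which two processes are isolated from each other.

    Reading another process's ns links needs ptrace-level access to it;
    otherwise os.readlink raises PermissionError.
    """
    a = namespace_ids(f"/proc/{pid_a}/ns")
    b = namespace_ids(f"/proc/{pid_b}/ns")
    return sorted(ns for ns in a.keys() & b.keys() if a[ns] != b[ns])
```

Comparing a sandboxed UDF process against a host process would be expected to list every namespace the sandbox was given; comparing a process with itself returns an empty list.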

All processes in the sandbox environment are subject to a seccomp-bpf filter that reduces the kernel’s system call surface to only the calls UDFs need to execute. Snowflake further governs system calls with ptrace, which we use as part of our threat detection capabilities to detect system call usage that may indicate malicious activity.
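Real seccomp-bpf filters are small BPF programs that the kernel runs on every system call, matching on the syscall number and arguments and returning an action such as `SECCOMP_RET_ALLOW`, `SECCOMP_RET_ERRNO`, `SECCOMP_RET_KILL_PROCESS`, or `SECCOMP_RET_TRACE`, the last of which asks the kernel to notify a ptrace-based tracer before the call runs. The Python sketch below models only the decision logic of a default-deny allowlist in user space, with a trace action feeding an alert list that stands in for threat detection. The rule set and class names are hypothetical illustrations, not Snowflake’s filter, and the article does not say whether Snowflake connects its seccomp filter to its ptrace monitoring in this way.

```python
from enum import Enum


class Action(Enum):
    # Loosely mirrors real seccomp return values (SECCOMP_RET_*).
    ALLOW = "allow"  # SECCOMP_RET_ALLOW: run the call
    TRACE = "trace"  # SECCOMP_RET_TRACE: notify a ptrace-based tracer first
    ERRNO = "errno"  # SECCOMP_RET_ERRNO: fail the call with an error code
    KILL = "kill"    # SECCOMP_RET_KILL_PROCESS: terminate the process


class SyscallFilter:
    """User-space model of a default-deny syscall filter with a tracer hook."""

    def __init__(self, allowed, traced=(), denied_with_errno=()):
        self.rules = {name: Action.ALLOW for name in allowed}
        self.rules.update({name: Action.TRACE for name in traced})
        self.rules.update({name: Action.ERRNO for name in denied_with_errno})
        self.alerts = []  # what the tracer would forward to threat detection

    def evaluate(self, syscall: str) -> Action:
        # Anything not explicitly listed falls through to KILL.
        action = self.rules.get(syscall, Action.KILL)
        if action is Action.TRACE:
            # A real tracer would inspect the arguments here before resuming.
            self.alerts.append(syscall)
        return action


# Hypothetical rule set for illustration only; not Snowflake's filter.
udf_filter = SyscallFilter(
    allowed={"read", "write", "mmap", "futex", "exit_group"},
    traced={"clone", "execve"},
    denied_with_errno={"open_by_handle_at"},
)
```

Default-deny is the key design choice: a syscall the filter’s author never anticipated, such as `mount` here, is fatal rather than permitted.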

Egress Traffic Control

Figure: Egress traffic control. Within a compute node, the secure sandbox and the query engine communicate bidirectionally; outbound traffic flows one way, from the query engine to an egress proxy outside the node.

All network traffic between the compute clusters and the Internet traverses an egress proxy that enforces access control policy and performs monitoring.

Snowflake treats all network traffic from compute clusters as untrusted. Traffic to internal services is limited to a set of authenticated endpoints. Traffic to external networks traverses an egress proxy that enforces access control policies and monitors for unexpected network activity.

The egress proxy blocks attempts to access unauthorized endpoints and reports any such attempts to the Snowflake incident response team.
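The core of an egress proxy’s access control can be sketched as an allowlist lookup on the destination host and port, with blocked attempts recorded for follow-up. The Python below is a minimal sketch under that assumption; `EgressPolicy`, the allowlist contents, and the logger name are hypothetical and do not describe Snowflake’s proxy.

```python
import logging
from dataclasses import dataclass, field
from urllib.parse import urlsplit

log = logging.getLogger("egress")


@dataclass
class EgressPolicy:
    """Hypothetical allowlist evaluated by an egress proxy for each outbound request."""
    allowed: set  # (hostname, port) pairs that have been explicitly granted
    blocked_attempts: list = field(default_factory=list)

    def check(self, url: str, account: str) -> bool:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        port = parts.port or {"http": 80, "https": 443}.get(parts.scheme, 0)
        if (host, port) in self.allowed:
            return True
        # Block, and record the attempt so it can be reported for investigation.
        self.blocked_attempts.append((account, host, port))
        log.warning("blocked egress from %s to %s:%d", account, host, port)
        return False
```

A production forward proxy usually sees less than a full URL for HTTPS traffic: the client sends a `CONNECT host:port` request (or the proxy reads the TLS SNI field), which is why the decision here is made on host and port alone.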

Attack Scenario

Suppose a threat actor, whom we’ll call “Malory,” seeks to gain unauthorized access to a compute node within Snowflake to try to compromise customer data. She writes a Python script that attempts to access arbitrary memory and storage on the compute node. However, isolation enforcement mechanisms, including namespaces, process isolation, and a chroot filesystem, prevent her from reaching any resources outside the restricted sandbox environment in which her script runs.

Malory’s next strategy is to break out of the sandbox environment, so she starts looking for ways to apply container-escape tactics. These include capability-based breakouts, mountable devices, exposed control sockets, and exploits for vulnerabilities such as CVE-2019-5736 or CVE-2020-15257. She finds a thoroughly locked-down environment in which the opportunity to introduce such techniques simply doesn’t exist.

Let’s further assume Malory is an advanced threat actor who uses a zero-day exploit to break out of the sandbox environment and take control of the compute node. She finds that the compute node is walled off from most Snowflake services by network security groups applied to its VPC. She discovers some endpoints she can reach, such as one that provides status information for in-flight queries in the account. However, she needs credentials to pass internal authentication checks, and the only credentials she has are scoped to the account whose work the compromised compute node is running. Because she launched the attack from that same account in the first place, she gains no access to customer data outside her own account.
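What stops Malory here is credential scoping rather than isolation: even a valid credential is only honored for the account it was issued to. The Python sketch below illustrates that check with a hypothetical HMAC-signed token format; `issue_token`, `authorize`, and `SERVICE_KEY` are invented for illustration, and the article does not describe how Snowflake’s internal credentials are actually structured.

```python
import hashlib
import hmac

# Hypothetical signing key held by the internal service, never by compute nodes.
SERVICE_KEY = b"hypothetical-service-key"


def issue_token(account_id: str, key: bytes = SERVICE_KEY) -> str:
    """Mint a credential bound to a single account (hypothetical format)."""
    sig = hmac.new(key, account_id.encode(), hashlib.sha256).hexdigest()
    return f"{account_id}.{sig}"


def authorize(token: str, requested_account: str, key: bytes = SERVICE_KEY) -> bool:
    """Accept only authentic tokens whose embedded account matches the request."""
    account_id, _, sig = token.rpartition(".")
    expected = hmac.new(key, account_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; reject forged or tampered tokens.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    # Scope check: a token for one account never unlocks another.
    return account_id == requested_account
```

Verifying the signature before trusting the embedded account ID is what matters: otherwise Malory could simply edit the account portion of her own token.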

Malory figures that she can at least try to disrupt Snowflake operations with a denial-of-service attack, but she finds that a rate-limiting mechanism prevents service degradation and alerts Snowflake’s threat detection team to potentially malicious activity on the network.
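The article does not say which rate-limiting algorithm Snowflake uses. A token bucket is one common choice: each request spends a token, tokens refill at a fixed rate up to a burst capacity, and a rising rejection count is the kind of signal that could feed an alert. A minimal Python sketch, with hypothetical class and parameter names:

```python
import time


class TokenBucket:
    """Token-bucket limiter: sustained `rate` requests/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock      # injectable for testing
        self.last = clock()
        self.rejected = 0       # a spike here is a signal worth alerting on

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        self.rejected += 1
        return False
```

For example, `TokenBucket(rate=2.0, capacity=3.0)` admits a burst of three requests and then roughly two per second thereafter.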

At this point, Malory gives up on directly attacking Snowflake’s infrastructure and instead attacks a target user’s software supply chain, introducing trojan code into a Python dependency in an attempt to exfiltrate the target user’s data over the network. Snowflake’s egress proxy detects the trojan code’s attempt to connect to an unauthorized endpoint on the Internet, blocks the connection, and alerts Snowflake’s threat detection team to the unauthorized network event.

Despite all of her efforts, Malory has been thwarted, detected, or both at every step, before she could infiltrate the system or wreak serious havoc.

Conclusion

This combination of security-first design, on-host access control, isolation mechanisms, and an egress proxy offers Snowflake and its customers a high degree of protection from potentially malicious code that may find its way into UDFs. Much of this involves applying established best practices, and the Data Cloud has inspired the Snowpark team to continue developing advanced technology to secure your data. Although Snowpark was an ambitious platform to create, the extremely positive feedback about the type, size, and diversity of workloads customers already run with Snowpark tells us it was worth the effort. We feel we’ve created something uniquely useful and secure.