Migrating Snowflake’s Metadata with No Downtime

Every aspect of our technology leverages metadata, from zero-copy cloning, time travel, and modern data sharing to the revolutionary elasticity that allows customers to scale instantly and near-infinitely. Snowflake's Engineering team recently performed surgery on FoundationDB (FDB), the "heart" of our metadata store, making it even more capable of enabling exciting new features for the Data Cloud.

Snowflake has relied on FDB as its metadata store since before the Data Cloud was commercially available. Then, in 2018, our engineering team embarked on a carefully planned, two-year mission to migrate from a proprietary version of FDB to the newly open-sourced version. After a successful migration, the performance and reliability improvements to our metadata system created amazing opportunities, but the accompanying challenges were just as significant. This blog series details how Snowflake Engineering accomplished this feat.

The Challenge

Snowflake has depended on FDB to power its metadata since 2014. After FDB was acquired in 2015, Snowflake created a team tasked with maintaining and improving this powerful but complex technology for the Data Cloud.

In 2018, FDB was open sourced, creating a crucial opportunity for Snowflake. We were working on several large projects to improve the scale and stability of our version of FDB, which was based on release 3 (FDB3). Open source FDB, which had reached release 6 (FDB6) by 2018, contained a number of performance improvements that complemented ours. By merging our changes into open source FDB6, we would improve both codebases and establish a collaborative relationship with our partners in the FDB community.

But migrating the Data Cloud's metadata would not be an easy task. We had more than 20 deployments of the Data Cloud across multiple regions in both AWS and Microsoft Azure, serving thousands of customers. At the heart of each deployment was an FDB3 cluster running on hundreds of CPUs and serving millions of transactions per second.

In addition, Snowflake's SLAs meant we could not rely on scheduled maintenance windows, and our customers couldn't experience any dip in performance during the migration. The Data Cloud also had to maintain strict ACID transactional guarantees on its metadata. Our service depends on constant, high-frequency reads and writes to its metadata at sub-millisecond latencies. A single lost or out-of-order read or write could cause long outages for hundreds of customers. Like a heart, FDB needed to be up and "beating" for the rest of the Data Cloud to function.

Snowflake’s project Pole Vault: migrating to open source FDB6

The codebases had diverged so much that FDB3 and FDB6 were essentially different systems. Updating FDB3 to be compatible with FDB6 would have been an enormous and risky effort, so upgrading the Data Cloud's metadata stores in place was not an option. Instead, Snowflake devised a custom process to migrate every deployment's data from its FDB3 cluster to a new FDB6 cluster. We called this migration "Pole Vault", as we were vaulting the Data Cloud's metadata store forward three major versions at once.

Architects and senior management across the entire company, including our founders, reviewed the plan for the delicate migration. After several design and operational reviews, everyone agreed on the following success criteria:

Guaranteed Availability
The FDB metadata store couldn’t be unavailable for more than 30 seconds.
Guaranteed Consistency (No Data Loss, No Data Corruption)
The metadata must be logically identical when a Data Cloud deployment switches to the new metadata store.
Feature Parity
All features and improvements we added to FDB3 had to be merged into FDB6.
Improved Performance
FDB6 had to perform at least as well as, if not better than, FDB3 for the workloads in Snowflake's deployments.
Multi-cloud Solution
The Pole Vault migration needed to work seamlessly across all our cloud providers, preserving the Data Cloud's cross-cloud capabilities. At the time, this included AWS and Microsoft Azure. Our deployments on Google Cloud Platform didn't require migration because they started out on FDB6.

These criteria were all essential, but availability and consistency were the most significant requirements of Snowflake’s SLAs.

The Difficulty of Guaranteed Availability: the 30-second window

Snowflake’s SLAs do not allow for scheduled maintenance windows. Our deployments had to remain available and continue to serve customer queries while the Pole Vault migration happened. We could not stop the heart of a deployment. But if FDB is the heart of the Data Cloud’s metadata, migrating a live deployment to a new FDB cluster would be similar to open-heart surgery.

The Data Cloud's Global Services (GS) layer is heavily dependent on its metadata store. Customer queries cannot begin or complete without accessing the metadata store, so every GS node (sometimes hundreds of machines per deployment) constantly reads and writes to the FDB metadata store to stay in sync. GS was developed to be resilient to brief FDB outages, but if FDB is unavailable for more than a minute, customer queries may time out. Because longer outages would affect our customers' queries, we gave ourselves a 30-second window for operations requiring FDB unavailability, such as consistency checks or switching every client to the new metadata store.
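
As a purely illustrative sketch (not GS's actual code), the pattern looks roughly like the following, using the standard FDB Python bindings. The helper name, key, and timeout values are assumptions chosen to mirror the budget described above:

```python
import time
import fdb

fdb.api_version(630)
db = fdb.open()  # connects using the default cluster file


def read_metadata_with_budget(key, budget_seconds=60):
    """Illustrative only: tolerate a brief FDB outage by retrying a read,
    but give up once a total time budget (the query timeout) is exhausted."""
    deadline = time.monotonic() + budget_seconds
    while True:
        try:
            tr = db.create_transaction()
            tr.options.set_timeout(5000)  # per-attempt timeout, in milliseconds
            return tr.get(key).wait()
        except fdb.FDBError:
            if time.monotonic() >= deadline:
                raise  # the outage outlasted the budget; the query fails
            time.sleep(0.5)  # brief backoff; short FDB blips resolve quickly
```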

The Challenge of Guaranteed Consistency

We could not compromise on the consistency check. Every GS operation, from scaling resources and locating encryption keys to cross-region replication and more, relies on the shared metadata being consistent. One missed or out-of-order transaction could take hours to repair, so the two metadata stores had to be identical when we switched GS to FDB6.

Ideally, we wanted a 100% guarantee that the two metadata stores would be identical before the switch to FDB6. But that would have been impossible without hours of FDB downtime. To maintain transactional consistency, GS could only talk to one metadata store at a time, so FDB3 had to stream new data to FDB6. This meant the two metadata stores would never be identical while the deployment was serving live traffic. To overcome this challenge, we devised a method to guarantee 100% consistency at a snapshot in time taken hours before the switch.
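
To make the idea of a point-in-time check concrete, here is a minimal, hypothetical sketch using the standard FDB Python bindings: read the same key range from a frozen snapshot of each cluster and compare digests. The cluster-file paths and key range are assumptions, this is not Snowflake's actual tooling, and a real check over terabytes of metadata would need chunking, parallelism, and care around FDB's per-transaction limits:

```python
import hashlib
import fdb

fdb.api_version(630)

old_db = fdb.open("/etc/foundationdb/fdb3.cluster")  # assumed cluster-file paths
new_db = fdb.open("/etc/foundationdb/fdb6.cluster")


@fdb.transactional
def hash_range(tr, begin, end):
    """Hash every key/value pair in [begin, end) read within one transaction."""
    digest = hashlib.sha256()
    for kv in tr.get_range(begin, end):
        digest.update(kv.key)
        digest.update(kv.value)
    return digest.hexdigest()


# On frozen, quiesced snapshots the digests must match exactly.
begin, end = b"", b"\xff"
assert hash_range(old_db, begin, end) == hash_range(new_db, begin, end)
```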

The engineering behind Pole Vault

Building a replication engine to stream metadata to the new database while retaining transactional consistency would be an enormous challenge. But before FDB became open source, we were already developing a high availability (HA) failover solution in FDB3, which we dubbed the "replication cluster". A big reason we chose the Pole Vault migration strategy was that our HA solution already required a similar replication engine, so we repurposed the replication cluster to push to a new database, effectively "failing over" to FDB6.
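
The next post in this series covers the actual Replication Service, but purely for intuition, the core obligation of any such engine is to apply the source's transactions to the destination atomically and in commit order. Below is a toy Python sketch under those assumptions; the keys, cluster-file path, and mutation-batch format are made up for illustration:

```python
import fdb

fdb.api_version(630)
dest_db = fdb.open("/etc/foundationdb/fdb6.cluster")  # assumed destination cluster file


@fdb.transactional
def apply_batch(tr, mutations):
    """Apply one source transaction's mutations atomically on the destination."""
    for op, key, value in mutations:
        if op == "set":
            tr[key] = value
        elif op == "clear":
            del tr[key]


# Batches must be applied strictly in source commit order; a real engine also
# records the last applied version so it can resume safely after a failure.
ordered_batches = [
    [("set", b"meta/table/42", b"v2")],
    [("clear", b"meta/tmp/1", None)],
]
for batch in ordered_batches:
    apply_batch(dest_db, batch)
```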

Pole Vault setup

Migrating Snowflake’s metadata safely without affecting the Data Cloud entailed many more engineering challenges. To guarantee point-in-time consistency, we built a feature that allowed both FDB clusters to take consistent disk snapshots. To complete the migration without downtime, we modified the Data Cloud service to orchestrate an atomic switch to a new metadata store across hundreds of machines in our 30-second window. We built new tools and test pipelines to guarantee FDB6 would perform at least as well as FDB3 for every customer workload. Adding to the challenge, the size of the Data Cloud continued to grow even as we stopped our development of FDB3 to focus on FDB6 and the migration.
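
One way such a near-simultaneous cutover could be coordinated (illustrative only, not necessarily how Snowflake implemented it) is with FDB's built-in watches: every client watches an agreed-upon key in the old cluster, and the orchestrator flips it once the new store is ready. The key name and cluster-file paths below are assumptions:

```python
import fdb

fdb.api_version(630)

CUTOVER_KEY = b"ops/cutover_to_fdb6"            # hypothetical coordination key
OLD_CLUSTER = "/etc/foundationdb/fdb3.cluster"  # assumed cluster-file paths
NEW_CLUSTER = "/etc/foundationdb/fdb6.cluster"


@fdb.transactional
def watch_cutover(tr):
    # The watch fires once the key's value changes after this transaction commits.
    return tr.watch(CUTOVER_KEY)


@fdb.transactional
def trigger_cutover(tr):
    # Run once by the orchestrator when the new metadata store is ready.
    tr[CUTOVER_KEY] = b"1"


def run_client():
    db = fdb.open(OLD_CLUSTER)
    watch_cutover(db).wait()      # block until the orchestrator flips the key
    return fdb.open(NEW_CLUSTER)  # every client converges on the new store
```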

Pole Vault’s Success

Pole Vault was one of the largest operations at Snowflake in recent history, spanning two years of development and operational effort across many of our engineering teams. We maximized our confidence at every step before touching customer metadata. During development, we took full advantage of FDB's unparalleled deterministic simulation framework, running automated tests of each piece of the migration for thousands of simulated hours. Then we performed hundreds of migrations on real internal Data Cloud deployments, migrating back and forth between two FDB clusters and vastly improving our confidence in the end-to-end pipeline.

When we were ready to migrate our customers' metadata, we carefully ordered the Pole Vault migrations by deployment size and client count across AWS and Azure, refining the process as we faced new scaling challenges and continuing to build our confidence before tackling our largest deployments.

(Figure: Pole Vault counts and success rates)

The migration operation on each of our larger deployments took several weeks to complete. While we focused on the Pole Vault migration, the Data Cloud continued to grow to the limits of what an FDB3 cluster could handle. Normal operations were already consuming most of the FDB3 clusters' available throughput, leaving little headroom for replication to the FDB6 clusters.

But the time and effort we put into development, safety guarantees, and testing paid off. When we completed the migration, our query performance was better across the board, the Data Cloud was no longer constrained by our metadata’s scale, and our customers were completely unaware of the operation. Today, our metadata is powered by the latest and greatest versions of FDB. We continue to collaborate with our partners in the open source community to improve FDB, unlocking exciting new features and abilities for Snowflake’s Data Cloud.

Please stay tuned for the next blog post in this series, which will detail the development of the FDB Replication Service and how we repurposed it for the Pole Vault migration.
