Snowflake and the Seven Challenges of Scaling Data Strategies

“Snow White, perform due diligence before you bite!”

In this not-so-fairy-tale the seven under-tall but very important helpers — Pushy, Sizey, Costy, Diversey, Designy, Stressy, and Sharey — show the seven challenges of scaling data strategies, and how Snowflake solves them!

Beware the pure heart of Snowflake, Grimhilde!

Pushy

Pushy wants to push all the data from disparate systems into one location to increase access, transparency, data mixity and therefore innovation for the whole community.

Pushy is looking for solutions that adhere to open standards like ANSI SQL.

He opposes the use of scheduled and unscheduled dynamite service windows, as they disrupt ongoing ingest, transformation and analytic activities.

Pushy enjoys cloud infrastructures that write data in triplicate for durability!

Snowflake offers the right fit with a single platform that can mix your corporate, sensor, social, and ecosystem data. With no service windows, Snowflake enables you to run data engineering, data lake, data warehousing, and cyber-security workflows. It also lets you apply one central governance model or, for example, a Data Mesh approach.

Pushy is wary of queens who have bundled and container-wrapped legacy systems to look like data spherical (get the reference?) shiny apples. He doesn’t like having to use multiple systems for storage, having to do complex transformations before landing the data, or being limited to structured data.

Sizey

Sizey doesn’t like capacity planning in chunks, in hot, warm or cold storage. He likes one magic pot, sized to the amount of data gold he got that day. Sizey also does not like to manage partitions or indexes.

Snowflake lets you load as much or as little data as you need, and you don’t have to manage it.

Snowflake’s largest customers have between 25 and 40 PB of data in Snowflake, while many other customers have around 1 TB of data.

Snowflake provides computation clusters (virtual warehouses in Snow-speak) of all sizes that are already running. Simply attach and go! Then stop, and stop paying. Or, keep going! It’s up to you. No need to plan capacity in advance.
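In SQL terms, "attach and go, then stop" might look roughly like the statements below. This is a minimal sketch in Python that only assembles the statement text and never connects to an account. The warehouse name `helper_wh`, the sizes, and the 60-second idle timeout are illustrative assumptions.

```python
# Sketch only: builds Snowflake SQL text; it does not connect to any account.
# The warehouse name "helper_wh" used below is hypothetical.

def warehouse_lifecycle(name: str, size: str = "XSMALL", idle_seconds: int = 60) -> list[str]:
    """Return SQL statements for a create / resize / suspend cycle."""
    return [
        # Create the warehouse suspended; it resumes on the first query
        # and suspends itself after `idle_seconds` without activity.
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WAREHOUSE_SIZE = '{size}' AUTO_SUSPEND = {idle_seconds} "
        f"AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE",
        # Resize in place, without recreating the warehouse.
        f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = 'LARGE'",
        # Stop explicitly; a suspended warehouse uses no compute credits.
        f"ALTER WAREHOUSE {name} SUSPEND",
    ]


statements = warehouse_lifecycle("helper_wh")
for statement in statements:
    print(statement + ";")
```

With auto-suspend and auto-resume set, Sizey would rarely need the explicit `SUSPEND` at all; it is shown here only to make the "stop, and stop paying" step visible.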

Sizey watches out for magic mirrors that describe data in Gigabytes, and not Terabytes or Petabytes. He doesn’t like having to stop his system in order to resize his storage or compute, being allowed to resize only once a month, or being unable to down-size when he needs to.

Costy

Costy leverages new technologies and cloud computing because of the economies of scale they provide, not because the Queen insists he must. Costy wants the flexibility of only paying for what one uses, like electricity or water utilities. He does his due diligence, and when solutions are 10x cheaper he wants to know why.

Snowflake lets you pay on demand and per second for computation power, plus storage billed on the average amount of data stored. Or contract with Snowflake to pre-buy credits for less.

Snowflake lets you adjust computation power on-the-fly or programmatically, depending on your needs. You only pay for what you use.
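Costy's per-second arithmetic can be sketched in a few lines of Python. The credits-per-hour table and the 60-second minimum per start reflect Snowflake's documented rates for standard warehouses as best I know them, so check the current pricing before relying on them. The price per credit is purely an assumption, since it varies by edition, region, and contract.

```python
# Credits consumed per hour by warehouse size; each step up doubles the rate.
CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8,
    "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128,
}

# Each time a warehouse starts or resumes, at least one minute is billed.
MINIMUM_BILLED_SECONDS = 60

# Assumption for illustration only; real prices depend on your contract.
ASSUMED_PRICE_PER_CREDIT = 3.00


def compute_cost(size: str, running_seconds: float, price_per_credit: float) -> float:
    """Cost of one continuous run of a warehouse, billed per second."""
    billed_seconds = max(running_seconds, MINIMUM_BILLED_SECONDS)
    credits = CREDITS_PER_HOUR[size] * billed_seconds / 3600
    return credits * price_per_credit


fifteen_minutes_on_medium = compute_cost("M", 900, ASSUMED_PRICE_PER_CREDIT)
ten_seconds_on_xs = compute_cost("XS", 10, ASSUMED_PRICE_PER_CREDIT)
print(f"15 min on M:  {fifteen_minutes_on_medium:.2f}")
print(f"10 s on XS:   {ten_seconds_on_xs:.2f} (billed as 60 s)")
```

Storage is billed separately and is not part of this sketch.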

Costy watches out for poison apples that promise extreme speed across all workflows, but don’t have the variable storage and compute to back up the claims. He thinks these solutions must be over-dimensioned most of the time.

Costy is also wary of systems that he has to pay for 24 hours a day, seven days a week. He educates his children, or rather his children educate him about conservation. Computation conservation should be the rule, not the exception. He also doesn’t like complicated, confusing licensing models.

Diversey

Diversey makes sure that any new solutions accept all types of data, structured (database, CSV), semi-structured (JSON, XML, Parquet, etc.), and unstructured, and that they are treated with respect. He also wants many languages to be spoken, so that everyone can contribute.

Snowflake treats structured and semi-structured data as first-class passengers. Ingest and egress are easy, and Snowflake manages your unstructured data as well. Customers run complete data lakes with all sorts of data formats and shapes directly in Snowflake.

Snowflake runs Java, JavaScript, Scala, Python, and SQL Scripting code inside of Snowflake through Snowpark. Your organization likely has coders skilled in these common coding languages.

Diversey watches out for systems that require exactly the right format for ingestion, that store the data outside of tables where it can’t mix with the other data, or that just don’t accept different types of data. He is also wary of systems that force you to use proprietary libraries and do not support open-source ones.

Designy

Designy likes to understand the underlying architecture of a solution, because a solution that is well designed and fit for purpose runs well, with low maintenance, for the provider and consumer.

Solutions that have been cobbled together tend to have a lot of back-office elves keeping them running, and cannot reach economies of scale, either for the providers or their clients.

Snowflake is built in the cloud, using cloud-native services that guarantee instant elasticity and scalability. It’s not on-premise technology adapted for the cloud (which is what containers are usually for).

Designy watches out for solutions that are disguised, repackaged legacy on-premise solutions, that have proprietary protocols, and can run out of memory during large queries.

Stressy

Stressy likes to believe, but verify. In the new cloud world, systems can be instantiated and destroyed in seconds, or so the theory goes. Stressy puts systems to the test with real data sizes, queries, and concurrency. Better to test now than to have some nasty surprises later on.

Snowflake allows you to stress-test and concurrency-test by dialing up and down the amount of computation from XS to 4XL (compute cluster sizes). Storage sizing is done automatically. Put in as much data as you want. Storage is always hot and immediately available.

For example, load 100 TB of data into Snowflake (or use the 100 TB sample data set that is available with every account, instantly and free of charge) and run queries on it.

Run your tests with real, good-sized queries and a good number of users to see how production operations will work. Size the clusters to get the best balance of response time and price.
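One way Stressy might turn test results into a sizing decision is sketched below in Python. The runtimes are hypothetical measurements, not benchmarks, and the credits-per-hour table is the commonly documented one for standard warehouses (an assumption worth checking against current pricing). The function picks the size that meets a response-time target for the fewest credits.

```python
# Credits per hour by warehouse size (assumed standard rates).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def cheapest_size_meeting_target(measured_seconds, target_seconds):
    """Return the size with the lowest credit use among those fast enough.

    measured_seconds maps a warehouse size to the observed runtime of the
    test workload on that size. Returns None if no size meets the target.
    """
    candidates = []
    for size, seconds in measured_seconds.items():
        if seconds <= target_seconds:
            credits = CREDITS_PER_HOUR[size] * seconds / 3600
            candidates.append((credits, size))
    if not candidates:
        return None
    # min() compares credits first, so the cheapest qualifying size wins.
    return min(candidates)[1]


# Hypothetical runtimes for the same workload on four sizes.
measured = {"XS": 900.0, "S": 480.0, "M": 230.0, "L": 150.0}

print(cheapest_size_meeting_target(measured, target_seconds=300))
print(cheapest_size_meeting_target(measured, target_seconds=1000))
```

The sketch ignores the one-minute billing minimum, which only matters for runs shorter than 60 seconds. A bigger cluster is not automatically more expensive: if it finishes the work proportionally faster, the credits used can come out about the same.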

Want to test something against all your production data? Not a problem: Zero-copy-clone it at no extra storage cost.
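Why can a clone cost no extra storage? The toy model below is a Python illustration of the general idea and not Snowflake's actual implementation. A table is modeled as a list of references to immutable storage blocks, a clone copies only the references, and new storage appears only when one side changes. In Snowflake itself the clone is made with a statement such as `CREATE TABLE ... CLONE ...`. All class and function names here are hypothetical.

```python
class ToyTable:
    """A table modeled as a list of references to immutable blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def clone(self):
        # Copy only the list of references, never the blocks themselves.
        return ToyTable(self.blocks)

    def append_rows(self, rows):
        # New data lands in a new block that belongs to this table alone.
        self.blocks.append(tuple(rows))


def stored_blocks(*tables):
    """Count distinct block objects across tables; shared ones count once."""
    return len({id(block) for table in tables for block in table.blocks})


production = ToyTable([("row1", "row2"), ("row3", "row4")])
test_copy = production.clone()

before = stored_blocks(production, test_copy)  # clone shares every block
test_copy.append_rows(["row5"])                # only the clone changes
after = stored_blocks(production, test_copy)

print(f"blocks stored before change: {before}, after change: {after}")
print(f"production still has {len(production.blocks)} blocks")
```

Production is untouched by the change to the clone, and the only extra storage is the block holding the new rows.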

Stressy hates waiting hours or days or even weeks to resize and configure systems correctly before he can test them.

Sharey

Sharey’s motto is “Sharing is Caring”, not “Sharing is Copying”. He likes building ecosystems where the latency of exchange goes to zero but the quality increases because there is only one master copy and updates happen there immediately.

Snowflake provides customers with the option of sharing curated data without moving it, either point-to-point, point-to-multipoint, in a private exchange, or in a monetized public exchange.
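A point-to-point share might be set up with statements along these lines. This is a minimal Python sketch that only assembles the SQL text and never connects to an account. The share, database, schema, table, and consumer account names are all hypothetical.

```python
# Sketch only: builds the SQL text for a point-to-point share.
# Every object name passed in below is hypothetical.

def share_statements(share, database, schema, table, consumer_account):
    """Return SQL statements that expose one table to one consumer account."""
    return [
        # The share is a named container for grants; no data is copied.
        f"CREATE SHARE {share}",
        # The consumer needs usage on the database and schema to reach the table.
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share}",
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share}",
        # Finally, name the account that is allowed to see the share.
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}",
    ]


statements = share_statements(
    share="gold_share",
    database="mine_db",
    schema="curated",
    table="daily_gold",
    consumer_account="partner_org.partner_account",
)
for statement in statements:
    print(statement + ";")
```

The consumer queries the provider's single master copy, which is what Sharey means by sharing rather than copying.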

Snowflake customers are building ecosystems of business partners to eliminate data exchange latency and manual effort. They use clean-rooms to share, but not expose, sensitive data to their partners. This is possible because they all share the underlying Snowflake infrastructure.

Whole industries, from Healthcare to Finance, are building data sharing ecosystems on top of Snowflake Cloud.

Sharey watches out for pretend marketplace friends who aren’t really there, or are just FTP-ing files around.

The Moral of the Story is…

Don’t be fooled by poisoned spherical apples or talking mirrors. Don’t take the solution just because Grimhilde says so. And don’t wait around for a prince!

Snowflake is an elegant solution in today’s modern data world. Just use the seven helpers (Pushy, Sizey, Costy, Diversey, Designy, Stressy, and Sharey) when you need to decide!

David Richert
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Before joining Snowflake I worked for SAP for 18 years in technical sales for their analytics portfolio. Snowflake fills the big data gap.