Navigating the Data Divide

Supporting Transactional and Analytic Workloads with Snowflake

For nearly four decades, the world of data management has been divided into two distinct realms: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). Throughout this time, organizations have extracted data from operational systems and transferred it to separate platforms optimized for specific tasks. OLTP systems handled operational workloads, while OLAP systems reigned supreme in the realm of Data Warehousing. In legacy, on-premises environments, organizations had their preferred database vendors for each domain, creating a clear division between the two camps.

Some vendors, like Oracle and Teradata, offer limited support for both, but each is clearly optimized for one side and stretches into the other only through highly specialized product offerings.

Why have we lived in this dual-platform world for so long? The answer lies in the inherent engineering tradeoffs between these two environments. Optimizations made for OLTP systems would often result in sub-optimal data warehousing, and vice versa. This created a separation where improving one aspect of data processing often came at the expense of the other. A couple of examples of where these tradeoffs occur:

Column / Row Orientation

OLTP databases must be able to retrieve, update, or insert a row of data with extremely low latency, often in single-digit milliseconds. For this to be possible, physics requires that the columns in a row be contiguous and co-located so that these operations can be performed with minimal I/Os and minimal cycles spent assembling rows from their individual columns. Many OLTP operations retrieve an entire row or insert a single new row, and keeping the columns of those rows together makes for more efficient I/O.

OLAP databases, on the other hand, typically see a much higher ratio of reads to writes than OLTP systems, so they will gladly pay the price of a bit more processing on writes if it makes read operations more efficient. These reads also tend to access a relatively small percentage of a table's columns and only rarely need to retrieve an entire row. As a result, the leading OLAP databases have opted for column-oriented storage architectures. A column orientation allows OLAP databases to scan large volumes of data more efficiently by accessing only the columns necessary to support a given query rather than bringing a larger number of unnecessary columns along for the ride.
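
To make the tradeoff concrete, here is a minimal SQL sketch of the two access patterns, assuming a hypothetical orders table (all names here are illustrative):

```sql
-- OLTP pattern: fetch one entire row by its key.
-- A row-oriented layout serves this with minimal I/O because
-- all of the row's columns are stored together.
SELECT *
FROM orders
WHERE order_id = 1042;

-- OLAP pattern: aggregate a couple of columns across millions of rows.
-- A column-oriented layout reads only order_date and amount and
-- skips every other column in the table.
SELECT order_date, SUM(amount) AS daily_sales
FROM orders
GROUP BY order_date;
```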

Local / Object Storage

To achieve the very low latency goals of OLTP databases, it's important to have a storage architecture that minimizes transfer time for individual I/O operations. In-memory and local SSD options can often meet these goals. The storage cost per GB can be much higher, but data volumes tend to be lower than in high-end OLAP use cases.

On the OLAP side, the goals are I/O throughput, cost, and scalability. Data volumes in modern OLAP databases can extend to multiple petabytes and beyond, so the economics of storing those volumes becomes the paramount driver, along with the ability to scan and process huge amounts of data per query. Cloud object stores tend to be the best fit here: they are managed separately from compute, scale without practical limits, and remain financially viable as data volumes skyrocket. OLAP use cases also tend not to have extremely low latency requirements.

The purpose of this article is to explore some of the engineering tradeoffs between these two workloads and platforms and to demonstrate how Snowflake can address the requirements of both analytic and transactional workloads, thereby reducing data silos.

The Gray Area: The Operational Data Store (ODS)

Amidst this clear division between OLTP and OLAP, a gray area emerged: a hybrid space where applications and workloads exhibited characteristics of both worlds. These workloads were designed to support decision-making, but in a tactical, near-real-time capacity. Think of OLAP queries that run within interactive applications, focus on current data, and have strict performance goals, typically in the range of 1 to 2 seconds. Or analytic applications that must react quickly to near-real-time events and require higher data freshness. These workloads came to be known as Operational Data Stores, or ODSs.

While the term “ODS” has somewhat fallen out of vogue in recent years, primarily due to the proliferation of transactional database options available in the cloud, there are still situations and workloads that demand the unique blend of OLAP support for large strategic queries and the ability to handle fast-moving transactions essential for OLTP workloads.

Technical Requirements of Transactional vs Analytic Workloads

Before exploring how Snowflake can support these hybrid workloads, let's take a deeper look at some of the characteristics that distinguish them:

The above table is generalized, and there are certainly gray areas between the two types of workloads, but it's easy to see the contrast between them and to appreciate how challenging it is to support both in a single platform. Let's now shift to how Snowflake can help in navigating these tradeoffs. Since its inception, Snowflake has maintained a maniacal focus on enabling customers to run all of their workloads against a single copy of their data, thereby eliminating data silos and the consequences that follow from them, such as redundancy and added complexity.

Snowflake set out to build a platform that eliminates many of the motivations for building silos in the first place, like scale limits, technical restrictions, and the inability to support multiple workloads on the same platform (Fig1). Having separate platforms for OLTP and OLAP workloads represents even more data silos, albeit silos that many have come to accept as unavoidable. The OLTP/OLAP divide will most likely never disappear completely, but Snowflake has invested considerably in blurring the lines and bringing these two worlds closer together: a platform with world-class support for OLAP/Data Warehouse workloads that also supports many popular use cases and scenarios in the OLTP arena.

Fig1: Snowflake’s Multi-workload Architecture

Unistore and Hybrid Tables

With the introduction of Unistore (in Public Preview at the time of this writing) to Snowflake's family of supported workloads, customers can now enjoy many of the benefits of both analytic and transactional workloads on the same logical table object. This is possible via a feature called Hybrid Tables. With Hybrid Tables, Snowflake maintains two underlying copies of the data: one supporting transactional requirements, the other supporting analytic requirements. This effectively resolves the hard engineering tradeoffs between these two workload types in a way that hides much of the underlying complexity from the customer. Hybrid Tables co-exist easily with standard tables and can be joined to them.

While it's important to note that Hybrid Tables currently will not meet the requirements of all OLTP workloads, a significant number of transactional use cases in modern data ecosystems fall within their capabilities. The initial sweet spot of Hybrid Table use cases generally breaks down into three categories: Application State, Data Serving, and Lightweight Transactional Apps.
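
As a minimal sketch of what this looks like in practice (the table and column names below are hypothetical, not from any particular application), a Hybrid Table is declared much like a standard table, with a required primary key and optional secondary indexes:

```sql
-- Hypothetical order-state table for a transactional app.
-- Hybrid Tables require a primary key, which backs fast keyed access;
-- secondary indexes speed up lookups on non-key columns.
CREATE OR REPLACE HYBRID TABLE order_state (
    order_id    NUMBER(38,0) NOT NULL PRIMARY KEY,
    customer_id NUMBER(38,0) NOT NULL,
    status      VARCHAR(20),
    updated_at  TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    INDEX idx_customer (customer_id)
);
```

Point lookups and single-row updates against order_id then behave like OLTP operations, while analytic queries can scan the same logical table.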

Key Requirements for Operational Workloads, Why They Matter, and How Snowflake Supports Them

In this section, let's dive into some of the specific requirements for operational workloads, why they are important, and how Snowflake supports them.

Multiple Data Model Support

Why it Matters: Operational and Data Warehouse (DW) workloads often require different data models to meet functional and performance requirements. Where OLTP workloads can benefit from more normalized or 3NF data models that are indexed for fast, keyed access, OLAP workloads generally prefer a more denormalized or dimensional model that minimizes joins, especially among very large tables.

Snowflake Support: Snowflake's versatile platform seamlessly accommodates various data models, such as dimensional/Star Schema, normalized (Relational/3NF), and Data Vault, empowering organizations to tailor their operational workloads to specific requirements without constraints. Hybrid Tables enhance this capability by providing primary/foreign key support and both column and row orientation of the underlying data, supporting transactional and analytical workloads without performance tradeoffs.
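
For instance, a keyed Hybrid Table can be joined directly to a standard, denormalized table in a single query. A sketch, reusing the hypothetical order_state table from above alongside an equally hypothetical dim_customer dimension:

```sql
-- Join a transactional Hybrid Table to a standard analytic table.
SELECT o.order_id,
       o.status,
       d.region,
       d.customer_segment
FROM order_state o
JOIN dim_customer d
  ON o.customer_id = d.customer_id
WHERE o.status = 'OPEN';
```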

ACID and Multi-Statement Transaction Support

Why it Matters: Transactional integrity is paramount for operational workloads, ensuring data consistency during complex write operations. ACID support guarantees this integrity and implies the need for multi-statement transactions with BT/ET or COMMIT/ROLLBACK functionality.

Snowflake Support: Snowflake’s ACID-compliant database and multi-statement transaction support guarantee the reliability and consistency of operational workloads, even when dealing with intricate transactions.
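
A minimal sketch of a multi-statement transaction in Snowflake SQL (the account table and values are illustrative):

```sql
-- Both updates succeed or fail as a unit.
BEGIN;

UPDATE account SET balance = balance - 100.00 WHERE account_id = 1;
UPDATE account SET balance = balance + 100.00 WHERE account_id = 2;

COMMIT;
-- If anything goes wrong mid-flight, ROLLBACK; undoes both statements.
```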

Lock Isolation & Deadlock Management

Why it Matters: When multiple transactions concurrently read and write to the same tables, maintaining transactional integrity without impacting throughput by causing excessive blocking or deadlocks is vital.

Snowflake Support: Snowflake's Read Committed isolation level and advanced deadlock management ensure that readers and writers never block each other and that transactional integrity is consistently upheld. Deadlocks cannot occur with multiple concurrent auto-commit DMLs; they can occur with concurrent, explicit multi-statement transactions. In those cases, detection is always in place and tunable, and if a deadlock does occur, the youngest transaction is identified and automatically aborted. Hybrid Tables improve write throughput even further by supporting lock granularity at the row level, which allows multiple writers against the same table with minimal blocking.
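
One related, tunable knob is the LOCK_TIMEOUT session parameter, which bounds how long a statement waits on a conflicting lock before giving up (the value below is illustrative):

```sql
-- Wait at most 60 seconds to acquire a lock before aborting
-- the statement; can be set at the session or account level.
ALTER SESSION SET LOCK_TIMEOUT = 60;
```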

Workload Isolation

Why it Matters: Effective workload isolation is critical for the efficient operation of platforms supporting both transactional and analytical requirements. Isolation ensures that different workloads can coexist without negatively impacting each other's performance and service levels as they compete for the same shared resources. This has historically been a problem for many on-prem platforms with fixed compute capacity, where admins must engage in the arduous task of managing and prioritizing competing workloads. Not an easy job if all workloads claim to be of high importance!

Snowflake Support: Snowflake's core architecture decouples storage from compute and allows workloads to be separated into multiple compute clusters, or virtual warehouses. This isolation effectively guarantees that each workload receives the resources it needs to meet its SLA without being impacted by others.
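
A minimal sketch of this pattern: give the transactional and analytic workloads their own virtual warehouses so neither competes with the other for compute (warehouse names and sizes are illustrative):

```sql
-- Small, always-responsive compute for the transactional app.
CREATE WAREHOUSE IF NOT EXISTS oltp_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND   = 60
    AUTO_RESUME    = TRUE;

-- Larger compute for heavy analytic scans, isolated from the above.
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
    WAREHOUSE_SIZE = 'LARGE'
    AUTO_SUSPEND   = 300
    AUTO_RESUME    = TRUE;
```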

High Availability

Why it Matters: Operational environments often support business-critical applications that demand uninterrupted operation with zero tolerance for planned or unplanned downtime. Traditional mechanisms for backup or disaster recovery on some database products will either compete with other workloads for resources or place locks on underlying objects to ensure consistency. These can both be disruptive to normal business operations and put service level goals at risk.

Snowflake Support: Snowflake's high availability features and robust architecture ensure continuous service, eliminating planned and unplanned downtime. Snowflake stores data persistently in highly durable cloud object storage, with up to 11 9s of durability, and spans multiple availability zones (AZs) to deliver up to 99.99% availability, with no downtime required for maintenance, backups, or disaster recovery. Traditional backups are unnecessary thanks to Snowflake's Time Travel. Replication processing for DR causes no online impact or disruption, requires no special locks, runs on isolated compute resources, and efficiently identifies only changed data to be replicated, thereby minimizing data movement.
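
As a quick sketch of how Time Travel replaces traditional restores (the standard table name and offset below are illustrative):

```sql
-- Query a standard table as it existed one hour ago.
SELECT COUNT(*)
FROM sales
AT (OFFSET => -3600);

-- Recover an accidentally modified table as a new object.
CREATE TABLE sales_restored CLONE sales
    AT (OFFSET => -3600);
```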

Low Latency

Why it Matters: Transactional applications must often meet very low-latency requirements that cannot be compromised when coexisting with other workloads. Transactions must operate at the speed of business, and variability in transactional performance can have a direct and immediate impact on business metrics.

Snowflake Support: With Hybrid Tables, Snowflake can consistently deliver very low latency performance on many transactional workloads with support for indexing, primary keys, and row orientation. And the workload isolation described above ensures that these low latency requirements can continue to be met regardless of other workloads that may have entered the mix.
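
A sketch of the kind of keyed, single-row operations this enables, reusing the hypothetical order_state Hybrid Table from earlier:

```sql
-- Point lookup and single-row update served via the primary key,
-- the classic low-latency OLTP access pattern.
SELECT status
FROM order_state
WHERE order_id = 1042;

UPDATE order_state
SET status = 'SHIPPED'
WHERE order_id = 1042;
```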

High Concurrency

Why it Matters: Operational workloads require high concurrency to support dynamic and real-time operational processes efficiently. Supporting platforms must be able to handle multiple users, requests, and transactions concurrently, enabling efficient decision-making, optimal resource utilization, and a responsive operational environment.

Snowflake Support: Snowflake's architecture, features, and design principles are specifically tailored to scale to the highest levels of workload concurrency. Snowflake's design optimizes resource allocation, query execution, and compute separation, enabling it to handle many concurrent users and queries efficiently while maintaining performance. Hybrid Tables elevate this even further, supporting high concurrency on transactional workloads through row-level lock granularity.
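
For spiky concurrency, a multi-cluster warehouse adds clusters as concurrent requests queue up and retires them when load drops; a sketch with illustrative settings:

```sql
-- Scale out to as many as 5 clusters under concurrent load,
-- then scale back in automatically.
CREATE WAREHOUSE IF NOT EXISTS serving_wh
    WAREHOUSE_SIZE    = 'SMALL'
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 5
    SCALING_POLICY    = 'STANDARD';
```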

Streaming Data Pipelines

Why it Matters: Continuous data ingestion and processing with low latency is crucial for operational workloads, and a growing number of analytic workloads also demand timely availability of fresh data. Often these applications must react to events happening in real time or near-real time. The time between a business event (order, call, cyber threat, etc.) and the moment that event is available for analysis needs to be minimized, which leaves no time for the delays that result from waiting for batches of files or tables to be processed. Ingestion needs to be streaming and continuous with minimal latency, and the data pipelines that transform data post-ingest must likewise be continuous, making the final data product available with the lowest possible latency.

Snowflake Support: With Snowpipe Streaming, customers can ingest data into Snowflake tables continuously without first landing the data in a file, resulting in very low-latency ingestion. Snowpipe Streaming can also be integrated with popular streaming and messaging technologies like Kafka and Kinesis Firehose. In addition, Snowflake's Dynamic Tables allow customers to build declarative data pipelines with a simple SQL statement that specifies a lag parameter. This offloads the need to manage sometimes complex and extensive multi-step data pipelines and provides low-latency, continuous delivery of incoming data to its point of consumption.
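
A minimal Dynamic Table sketch (the source table, warehouse name, and lag are illustrative): the single statement below declares both the transformation and the freshness target, and Snowflake manages the refreshes.

```sql
-- Keep this derived table within one minute of its source.
CREATE OR REPLACE DYNAMIC TABLE daily_orders
    TARGET_LAG = '1 minute'
    WAREHOUSE  = transform_wh
AS
SELECT order_date,
       SUM(amount) AS total_amount
FROM raw_orders
GROUP BY order_date;
```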

Conclusion

In the modern data landscape, the need for OLAP or Data Warehousing workloads to exhibit some characteristics of OLTP systems is central to a well-constructed data architecture in which the elimination of data silos, complexity, and redundancy are guiding principles. In these cases, Snowflake shines as an exceptional solution. The ability to combine advanced analytic capabilities with near-real-time operations and data freshness can empower organizations across industries to harness the power of their data at scale and to move at the speed of business. Snowflake's ability to continuously monitor, analyze, and act on data in real time can provide businesses with a competitive edge. It's not just about storing data; it's about leveraging it for actionable insights, proactive decision-making, and improved operational efficiency. With Snowflake, organizations can unlock the full potential of their data, adapt swiftly to changing conditions, and stay ahead in a rapidly evolving digital landscape.

I hope you found this article useful. If you enjoyed it and would like additional info, please follow my profile or connect with me on LinkedIn. For more information on how Snowflake can support the types of workloads discussed here, please contact your Snowflake account team.

All information and opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.


Tom Manfredi