Snowflake Unistore: Bridging Analytic and Transactional Databases

Traditionally, “transactional” databases are different from “analytical” databases, both from architecture and data processing perspectives. Transaction data is stored in row format for faster lookup and insertion, while analytics data is stored in columnar format to process large amounts of data efficiently. As a result, data are stored in different silos; however often time you need to do analytics on transactional data and the other way around i.e. some transactions on analytic data. As a result, need to set up a data pipeline to move data from one system to another systems. This brings up the problems associated with latency, cost, and security/governance.

Snowflake Data Cloud Platform is addressing this challenge with a new feature called Unistore, technically called Hybrid Table. This feature (public preview in AWS limited region) allows a single database that stores both types of data in a single place (transactional and analytical), which eliminates the need for ETL. This feature can also be used for application developers to develop applications that need a lightweight transaction workload.

SNOWFLAKE PLATFORM — UNISTORE

The hybrid table is a first-class Snowflake object that allows you to store data that are efficient in both transaction and analytical workloads without any architectural changes and all within the same Snowflake Platform. If you are doing transaction workload, such as single row lookup, or CRUD operation, on the hybrid table it utilizes row format. If you want to do an analytical query data is transformed on the node to a columnar representation and cached for future queries. In both cases, you will latest committed data. The best part is it is used as a single logical table and you do not worry about which type of query it is or which data store would be best for such a query. Snowflake handle these heavy lifting under the hood as a part of service.

Let’s see some of the characteristics of Hybrid Table:

  • Hybrid table is a first-class snowflake object that allows you to store data that are efficient in both transaction and analytical workloads.
  • Hybrid Table is optimized to support analytical and transactional workloads concurrently.
  • It has hybrid storage, table data is stored in the row format in high-speed storage and continuously replicated to the cloud’s BLOB storage (for analytical queries).
  • Supports transactional workloads such as fast, frequent selects, inserts, updates, and deletes with high concurrency, low latency (two-digit ms), and high throughput for fine-grain transactional point queries. (1000s of queries per second).
  • Supports analytical workloads such as complex queries, and fast large-scale scans, where the data is transformed on the node to a columnar representation and cached for future queries.
  • Hybrid table must have a primary key, and it could be a single or composite key.
  • You can have secondary indexes for non-primary keys for faster lookup and foreign keys for referential integrity constraints.
  • It supports ACID transactions with a Read Committed isolation level.
  • It supports multi-statement transactions, and cross-domain transactions within the same database (regular snowflake table and hybrid table)
  • It has Row-level locking for DMLs
  • You can use the same warehouse as you use today and data are isolated from the warehouse.
  • It supports data governance features such as Data Masking
  • If you want to convert an existing table to a hybrid table, use CTAS operation, see below for some command reference
CREATE OR REPLACE HYBRID TABLE yourtablename
(
employee_id int primary key,
department_id int,
name varchar,
email varchar constraint email_idx unique,
index secondary_idx1 (department_id, name)
) as
select * from db.schema.regulartable

SHOW HYBRID TABLES ;

SHOW INDEXES;

Use cases

Hybrid table should not be used to replace some heavy OLTP applications that require very low latency and transaction throughput. However, some of the use-cases are:

  • A lightweight Transaction application that. need to retrieve and update the data frequently and need faster response time.
  • Need data store that needs to update the current state of job or data pipeline that needs single row update or quick lookup.
  • Read-only data that is accessed through API that needs 1000s of queries per second.

Conclusion:

Hybrid tables make it easier (no learning curve) for you to work with all kinds of data, with one platform to manage, store, and process data. The real value you get out of a hybrid table is you have your data in one place, thereby, it enforces governance/security and reduces costs by eliminating infrastructures such as MySQL, Postgres, MongoDB etc., and costs for managing data pipelines. Furthermore, you get all the benefits of Snowflake Data Cloud Platform such as Cloud Agnostic, Multi-Cloud, Managed Service, Ease of Use, Security, and Governance.

Please note that today this feature is available only in AWS and in a few regions. Please read the document.

Disclaimer: The opinions expressed in this post are my own and not necessarily those of my employer (Snowflake).

--

--