Snowflake Hybrid Tables
What are hybrid tables ?
Hybrid table is a Snowflake table type that is designed for OLTP or transactional application.Transactional applications need low latency and high throughput which is provided by Hybrid tables.
Hybrid table features
1. Primary Key(mandatory).
2. Unique key constraint.
3. Referential Integrity constraint.
4. Index.
We can create hybrid tables with the CREATE HYBRID TABLE <tablename> syntax.
Let us jump into a lab exercise to see hybrid tables in action.In this lab exercise we will review the features of hybrid tables
We start off by creating a database called UNISTORE and a schema called HYBRID in which we will create the DEPARTMENTS and EMPLOYEES hybrid table and populate them.We can use any name for the database and schema, I have chosen to use UNITORE and HYBRID.
CREATE OR REPLACE DATABASE UNISTORE;
CREATE SCHEMA HYBRID;
CREATE HYBRID TABLE DEPARTMENTS (
DEPARTMENT_ID NUMBER(4) PRIMARY KEY,
DEPARTMENT_NAME VARCHAR2(30) ,
MANAGER_ID NUMBER(6) NOT NULL,
LOCATION_ID NUMBER(4)
);
Hybrid tables must have a primary key else the table creation will fail.
We could remove the primary key in the syntax above and we would get the below error.
We can see Hybrid tables in the schema by using show Hybrid tables or show tables, SHOW HYBRID TABLES will show only the Hybrid tables where as SHOW TABLES will show all tables including Hybrid tables
SHOW HYBRID TABLES ;
SHOW TABLES ;
SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA<>'INFORMATION_SCHEMA'
We can determine if a table is a Hybrid table by selecting data from INFORMATION_SCHEMA.TABLES and checking the TABLE_TYPE.
Let us populate the DEPARTMENTS table with insert statements.Since the DEPARTMENT_ID data values in the insert statement is unique the insert statements will succeed without any issues.
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 10, 'ADMINISTRATION', 200, 1700);
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 20, 'MARKETING', 201, 1800);
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 30, 'PURCHASING', 114, 1700);
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 40, 'HUMAN RESOURCES', 203, 2400);
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 50, 'SHIPPING', 121, 1500);
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 60, 'IT', 103, 1400);
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 70, 'PUBLIC RELATIONS', 204, 2700);
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 80, 'SALES', 145, 2500);
INSERT INTO DEPARTMENTS (DEPARTMENT_ID, DEPARTMENT_NAME, MANAGER_ID, LOCATION_ID) VALUES ( 90, 'EXECUTIVE', 100, 1700);
SELECT * FROM DEPARTMENTS;
Next we will create the EMPLOYEES table. When creating the employees table we will in addition to the primary key add a unique key and a foreign key to the table which will reference the DEPARTMENTS table DEPARTMENT_ID column. We will top that off by creating an index on the HIRE_DATE field.
CREATE OR REPLACE HYBRID TABLE EMPLOYEES
(
EMPLOYEE_ID NUMBER(6) PRIMARY KEY,
FIRST_NAME VARCHAR(20),
LAST_NAME VARCHAR(25),
EMAIL VARCHAR(25) UNIQUE,
phone_number VARCHAR(20),
HIRE_DATE DATE,
JOB_ID VARCHAR(10),
SALARY NUMBER(8,2) ,
COMMISSION_PCT NUMBER(2,2),
MANAGER_ID NUMBER(6) ,
DEPARTMENT_ID NUMBER(4) FOREIGN KEY REFERENCES DEPARTMENTS(DEPARTMENT_ID),
INDEX index_hire_date(HIRE_DATE)
);
We will insert rows into the EMPLOYEES table such that the primary key, unique key and FK constraints are not violated. We will have distinct EMPLOYEE_ID & EMAIL values and all DEPARTMENT_ID’s added to EMPLOYEES table will already be present in the DEPARTMENTS table.
INSERT INTO employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id) VALUES
( 100, 'Steven', 'King', 'SKING', '515.123.4567', TO_DATE('17-06-2003', 'dd-MM-yyyy'), 'AD_PRES', 24000, NULL, NULL, 90);
INSERT INTO employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id) VALUES
( 101, 'Neena', 'Kochhar', 'NKOCHHAR', '515.123.4568', TO_DATE('21-09-2005', 'dd-MM-yyyy'), 'AD_VP', 17000, NULL, 100, 90);
INSERT INTO employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id) VALUES
( 102, 'Lex', 'De Haan', 'LDEHAAN', '515.123.4569', TO_DATE('13-01-2001', 'dd-MM-yyyy'), 'AD_VP', 17000, NULL, 100, 90);
SELECT * FROM EMPLOYEES;
we will attempt to insert the employee ‘Alexander Hunold’ with EMPLOYEE_ID 100 into the table. This insert should fail as we have inserted EMPLOYEE_ID 100 as Steven King.
INSERT INTO employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id) VALUES
( 100, 'Alexander', 'Hunold', 'AHUNOLD', '590.423.4567', TO_DATE('03-01-2006', 'dd-MM-yyyy'), 'IT_PROG', 9000, NULL, 102, 60);
As expected the insert errors out with the primary key already exists error message which shows that the primary key in hybrid tables are enforced.
Next we will test the Unique key constraint. We will insert EMPLOYEE_ID of value 103 which is not present in the EMPLOYEE table and has an EMAIL of ‘AHUNOLD’. This will be followed by inserting EMPLOYEE_ID 104 which has the same email address as employee_id 103. Because we have a Unique constraint on email address we should expect this insert to fail.
INSERT INTO employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id) VALUES
( 103, 'Alexander', 'Hunold', 'AHUNOLD', '590.423.4567', TO_DATE('03-01-2006', 'dd-MM-yyyy'), 'IT_PROG', 9000, NULL, 102, 60);
INSERT INTO employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id) VALUES
( 104, 'Bruce', 'Ernst', 'AHUNOLD', '590.423.4568', TO_DATE('21-05-2007', 'dd-MM-yyyy'), 'IT_PROG', 6000, NULL, 103, 90);
As we see in the error message below we received a unique constraint error due to our attempt to load duplicate email address AHUNOLD.
Next we will test the Referential Integrity key constraint. We will insert EMPLOYEE_ID of value 105 which is not present in the EMPLOYEES table. We will use email address in this record with a value that does not exists in the target table ‘DAUSTIN’.
INSERT INTO employees (employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id) VALUES
( 105, 'David', 'Austin', 'DAUSTIN', '590.423.4569', TO_DATE('25-06-2005', 'dd-MM-yyyy'), 'IT_PROG', 4800, NULL, 103, 100);
However we are using DEPARTMENT_ID of 100 which does not exists in the DEPARTMENTS table and hence we should expect to receive a referential integrity error during the insert.
As expected we have the error message below. If we change the DEPARTMENT_ID to 90 which exits in the DEPARTMENTS table, the insert will be successful.
Lets take a quick look at indexes.We can create indexes during table creation and after table creation by using CREATE OR REPLACE INDEX .
In the example provided below we begin by creating an index on column phone_number followed by a composite index on first_name and last_name.
CREATE OR REPLACE INDEX INDEX_PHONE_NUMBER (PHONE_NUMBER) ON EMPLOYEES;
We can see all the indexes in the schema by running SHOW INDEXES.When primary key, unique key and Referential integrity constraints are created snowflake automatically creates indexes for those columns.So when doing SHOW INDEXES you will see indexes which you have not created.
SHOW INDEXES;
We can drop the indexes by running DROP INDEX
DROP INDEX INDEX_PHONE_NUMBER;
How can we drop constraints we have created
We cannot drop the PK constraint on a hybrid table. We are permitted to drop the Unique or referential integrity constraints. To be able to drop these constraints we need the constraint name which can be found in the INFORMATION_SCHEMA.TABLE_CONSTRAINTS.
SELECT * FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS;
We can see the name of all the constraints on the hybrid table and we can use the name of the constraint in the drop command
ALTER TABLE EMPLOYEES DROP CONSTRAINT "SYS_CONSTRAINT_6a217315-c041-40b5-8137-ce5b500af03a";
Successful dropping of the constraint will provide us with a success message.
However referential integrity constraints are not found in TABLE_CONSTRAINTS and we need to go to INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS to locate them.
SELECT * FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS;
What is Unistore ?( Uni means one)
There are 2 types of workloads that a database can encounter
§ Analytical workload.
§ Transactional workload.
Different databases are used for analytical and transactional workloads as some databases are more suited for transactional workloads and some are more suited for analytical workloads.
Unistore is Snowflake bold claim that Snowflake cloud database with Hybrid tables is well suited for transactional workloads and Snowflake non-hybrid tables are well suited for analytical workloads making it a single store (uni.) for processing Analytical and transactional workloads
Snowflake hybrid tables and non-hybrid tables can be created in the same database and joined with each other. Since both type of workloads can be handled by Snowflake database , snowflake calls it Unistore. A single store that can handle both analytical and transactional workloads.
Snowflake Architecture and Hybrid tables.
Snowflake architecture has undergone change to accommodate hybrid tables.Lets first talk about what has not changed and then we will discuss what has changed
No change in the Cloud Services layer :We will continue to select data from snowflake as before the introduction of hybrid tables with snowsight and SnowSQL
No change in the Query processing layer:Which means that the queries will still be processed by Virtual warehouses, and they will consume same amount of credits as would have been needed for a non-hybrid table.
Architecture change:
Addition of row store: Snowflake has introduced row-store to store hybrid table data in row-oriented manner to support transactional workloads. This is needed as columnar storage that is used in non-hybrid tables is not suitable for transactional workloads and cannot meet the latency and throughput requirement.These row-oriented data benefit of secondary columnarization which means that the row-oriented data is then re-stored in columnar format to speed up joins with non-hybrid snowflake tables and are used for Analytical workloads
Some other things to consider
Hybrid tables consume more space that non-hybrid table:Hybrid tables consume more space than non-hybrid tables which makes sense since the row-oriented data has to be re-stored in columnar format for joins with non-Hybrid tables and this extra storage adds to the space consumed by the hybrid table.
Hybrid tables consume additional serverless credits:Hybrid tables consume additional serverless credits since compute is needed to turn row-oriented data into secondary columnar format. This will happen in the background, and we don’t have to do anything explicitly.
Indexes when created consume space:· Indexes when created consume space as they should because like all other databases indexes are stored externally to the table to help improve select performance.
Limitation of Hybrid table :
Hybrid tables do not support the following features. Please consider them carefully to determine if hybrid tables will work for you in spite of these limitations.
§ Clustering Keys
§ Time Travel/Data Retention Period/UNDROP
§ Cloning
§ Fail-safe
§ Dynamic Tables
§ Materialized Views
§ Query Acceleration Service
§ Search Optimization Service
§ Snowpipe
§ Streams
§ Data sharing
§ Periodic rekeying
§ Tri-secret secure encryption
§ Collations
Summary
§ Hybrid tables are designed to support transactional applications and workloads.
§ Hybrid tables must be defined with a PK and the PK is enforced.
§ Hybrid tables support unique key and referential integrity constraints.
§ Hybrid tables support indexes which help improve select performance.
§ Hybrid table data is stored is row-oriented for transactional performance.
§ Snowflake architecture has added row store to accommodate hybrid table data.
§ Unistore means a single data store that can support analytical and transactional workloads.
§ Hybrid table row-oriented data is re-stored in columnar format for analytical workload performance when joining with non-hybrid tables.
§ Hybrid tables consume additional storage and serverless credits for table maintenance.
§ Hybrid tables and non-hybrid tables can be created in the same database.
§ Hybrid tables suffer from several limitations including lack of time travel and fail-safe.
§ Hybrid tables are created by using HYBRID in the create table statement.
Check out my courses on Udemy.
Connect with me on Linkedin https://www.linkedin.com/in/sujithnaircloudda/