Snowflake Tables

9 Snowflake Tables: A Briefing (as of 2023)

In this blog I will walk through the various Snowflake table types and some of the use cases each of them can solve. After reading it, I hope you will have some insight into each table type, why it matters for different data workloads, where it can be used, and so on.

Overall, Snowflake has 9 different kinds of tables, namely:

  1. Dynamic table
  2. Directory table
  3. Event table
  4. External table
  5. Hybrid table
  6. Iceberg table
  7. Permanent table
  8. Temporary table
  9. Transient table

Let us go through each one of them below:

DYNAMIC TABLES

Dynamic tables are the fundamental units of declarative data transformation pipelines. They substantially reduce the complexity of data engineering in Snowflake and offer an automated, dependable, and economical method of preparing your data for use.

A dynamic table allows you to specify a query and have its results materialized. You define the target table as a dynamic table and specify the SQL statement that performs the transformation, saving you the trouble of creating a separate target table and writing code to transform and update the data in that table. The feature was first introduced as “Materialized Tables” at Snowflake Summit 2022, a name that caused some confusion; it has since been renamed Snowflake Dynamic Tables and is now available for all accounts.

The syntax for creating a dynamic table is:

CREATE OR REPLACE DYNAMIC TABLE DEMO_DYNAMIC_TABLE_CUSTOMER
  TARGET_LAG = '1 minute'
  WAREHOUSE = COMPUTE_WH
AS
SELECT
    cst.c_name,
    cst_addr.ca_street_number,
    cst_addr.ca_street_name,
    cst_addr.ca_street_type,
    cst_addr.ca_suite_number
FROM DEMO_DB.DEMO_SCHEMA.DEMO_CUSTOMER_ADDRESS cst_addr
INNER JOIN DEMO_DB.DEMO_SCHEMA.DEMO_CUSTOMER cst
    ON cst_addr.ca_address_id = cst.c_address;
(Image: dynamic table components)

Benefits and use cases of Dynamic tables:

Let us now go through some of the benefits and use cases of dynamic tables:

(Image: some use cases and benefits of dynamic tables)

The best applications for Snowflake Dynamic Tables are those that require automated, straightforward data transformation. They are especially helpful when handling massive amounts of data, where manual transformation would be laborious and error-prone: for example, when we want to avoid writing code to manage data updates and dependencies, or to avoid controlling the refresh schedule ourselves.

To know more about dynamic tables, please go through the documentation at https://docs.snowflake.com/en/user-guide/dynamic-tables-about, which gives insights into cost, roles, managing dynamic tables, how they can be visualized and governed, and more.
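As a quick sketch of day-to-day management (building on the DEMO_DYNAMIC_TABLE_CUSTOMER table created above; the statements are standard, but verify the exact output columns in the docs), a dynamic table can be suspended, resumed, refreshed on demand, and its refresh history inspected:

-- List dynamic tables along with their scheduling state
SHOW DYNAMIC TABLES;

-- Pause and resume the automatic refreshes
ALTER DYNAMIC TABLE DEMO_DYNAMIC_TABLE_CUSTOMER SUSPEND;
ALTER DYNAMIC TABLE DEMO_DYNAMIC_TABLE_CUSTOMER RESUME;

-- Trigger a refresh on demand, outside the TARGET_LAG schedule
ALTER DYNAMIC TABLE DEMO_DYNAMIC_TABLE_CUSTOMER REFRESH;

-- Inspect recent refreshes (state, duration, lag)
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
    NAME => 'DEMO_DYNAMIC_TABLE_CUSTOMER'));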

DIRECTORY TABLES

A directory table stores file-level metadata about the data files in a stage. It is conceptually similar to an external table, but it is an implicit object layered on a stage rather than a separate database object, and there are no grantable privileges inherent to it. Directory tables support both internal (Snowflake) and external (cloud storage) stages. You can include a directory table on a stage when it is created (using CREATE STAGE) or later (using ALTER STAGE).

The event notification service of your cloud storage provider can be used to automatically update the metadata of a directory table. Alternatively, performing a manual refresh brings the metadata into alignment with the most recent set of files in the stage and path.
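As a minimal sketch (assuming the DEMO_INTERNAL_STAGE_01 stage used in the example below), a manual refresh is a single statement:

-- Synchronize the directory table with the current set of files on the stage
ALTER STAGE DEMO_INTERNAL_STAGE_01 REFRESH;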

/* Let us consider we have multiple files stored in a Snowflake internal stage.
Below is how we can see all the metadata details of the files. */

SELECT * FROM DIRECTORY( @DEMO_INTERNAL_STAGE_01 ); -- This is how the directory table is accessed.

/* We can even filter the table as given below */
SELECT * FROM DIRECTORY( @DEMO_INTERNAL_STAGE_01 ) WHERE size > 2000;

-- The output has the exact file URL where we do have our data stored.


/* Some more syntaxes with directory tables are as follows */

CREATE STAGE mystage
  DIRECTORY = (ENABLE = TRUE) -- This is the keyword
  FILE_FORMAT = myformat;

CREATE STAGE mystage
  URL = 's3://load/files/'
  STORAGE_INTEGRATION = my_storage_int
  DIRECTORY = (ENABLE = TRUE); -- This is the keyword
(Image: output of the directory table query)

Benefits and use cases of Directory tables:

Let us now go through some of the benefits and use cases of directory tables:

(Image: some benefits and use cases of directory tables)
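One pattern worth sketching here: combining the directory table with Snowflake's file URL functions to hand out short-lived download links for staged files. This is a hypothetical example built on the DEMO_INTERNAL_STAGE_01 stage from above:

-- Generate short-lived HTTPS URLs (valid for 3600 seconds) for every file
-- tracked by the directory table
SELECT
    relative_path,
    size,
    GET_PRESIGNED_URL(@DEMO_INTERNAL_STAGE_01, relative_path, 3600) AS download_url
FROM DIRECTORY( @DEMO_INTERNAL_STAGE_01 );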

EVENT TABLES

Event tables are specifically designed to capture logs and events natively within the platform. Snowflake's telemetry APIs are what allow you to collect and emit events; they are supported across various languages for UDFs, UDTFs, and stored procedures. These primitives can be used independently for querying and analysis, as well as when building Native Applications.

Currently, only specific operations can be performed on event tables, as mentioned below:

(Image: supported event table operations)

As we can see, we cannot directly update or insert data into event tables. Let us now see how they can be created within the platform.

-- Step 1: Create an event table
CREATE EVENT TABLE DEMO_DB.DEMO_SCHEMA.EVENT_TBL_V1;

-- Step 2: Associate the event table with the account
ALTER ACCOUNT SET EVENT_TABLE = DEMO_DB.DEMO_SCHEMA.EVENT_TBL_V1;

-- Step 3: Set the log level
ALTER SESSION SET LOG_LEVEL = INFO;

There are multiple levels at which logging can be set; the above example uses INFO, but values like TRACE, DEBUG, WARN, and ERROR are also available. A simple demo of the event table is given below:

-- This use case tracks just the INSERT operation counts, i.e., how many records got inserted.

CREATE OR REPLACE PROCEDURE DEMO_EVENT_TBL_PROC(OTPT VARCHAR)
RETURNS VARCHAR NOT NULL
LANGUAGE SQL
EXECUTE AS OWNER
AS
$$
DECLARE
    QUERY STRING;
    counts INT := 0;
BEGIN
    LET START_DATE TIMESTAMP := CURRENT_TIMESTAMP();
    SYSTEM$LOG('INFO', 'Procedure started at: ' || TO_CHAR(START_DATE, 'YYYY-MM-DD HH24:MI:SS'));

    -- Perform the inserts
    QUERY := 'INSERT INTO DEMO_DB.DEMO_SCHEMA.BKP_CUSTOMER SELECT * FROM DEMO_DB.DEMO_SCHEMA.DEMO_CUSTOMER LIMIT 10';
    EXECUTE IMMEDIATE :QUERY;

    -- The result of an INSERT is a single row whose first column is the row count
    SELECT $1 INTO :counts FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
    IF (counts > 0) THEN
        SYSTEM$LOG('INFO', 'Total number of records inserted: ' || :counts);
    END IF;

    LET END_DATE TIMESTAMP := CURRENT_TIMESTAMP();
    SYSTEM$LOG('INFO', 'Procedure completed at: ' || TO_CHAR(END_DATE, 'YYYY-MM-DD HH24:MI:SS'));
    RETURN OTPT;
END;
$$
;

CALL DEMO_EVENT_TBL_PROC('A');
SELECT * FROM DEMO_DB.DEMO_SCHEMA.EVENT_TBL_V1;
SELECT * FROM DEMO_DB.DEMO_SCHEMA.DEMO_CUSTOMER LIMIT 10;
SELECT * FROM DEMO_DB.DEMO_SCHEMA.BKP_CUSTOMER;
(Image: output from the event table)
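As a hedged sketch of digging into the captured logs (assuming the EVENT_TBL_V1 table above; RECORD_TYPE, RECORD, and VALUE are standard event table columns, but verify the exact schema in the docs):

-- Pull just the log records, newest first, with their severity and message
SELECT
    TIMESTAMP,
    RECORD['severity_text']::VARCHAR AS severity,
    VALUE::VARCHAR AS message
FROM DEMO_DB.DEMO_SCHEMA.EVENT_TBL_V1
WHERE RECORD_TYPE = 'LOG'
ORDER BY TIMESTAMP DESC;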

Benefits and use cases of Event tables:

Let us now go through some of the benefits and use cases of event tables:

(Image: event table use cases)

For more information please go through the docs: https://docs.snowflake.com/en/developer-guide/logging-tracing/tutorials/logging-tracing-getting-started

EXTERNAL TABLES

External tables allow us to query data in an external stage as if it were stored inside Snowflake. It is important to note that Snowflake does not store or manage the external stage; it is managed purely by the customer. External tables let you store (within Snowflake) certain file-level metadata, including filenames, version identifiers, and related properties. These are READ-ONLY tables.

External tables can also be configured with event notifications (from the cloud provider, e.g., SQS on AWS) to ensure that any activity in the external cloud storage is captured correctly by the table.

We cannot perform any DML operations on these tables, but we can create views on top of them and join them with standard Snowflake tables to get good insights (see the sketch after the creation example below).

Let us see the below method on how we can create the external table in Snowflake.

--Create the file format.
create or replace file format demo_csv_format
type = 'csv' field_delimiter = ',' skip_header = 1
field_optionally_enclosed_by = '"'
null_if = ('NULL', 'null')
empty_field_as_null = true;

--create the external stage via a storage integration object.
create or replace stage demo_db.demo_schema.ext_stage_snow_demo
url = 's3://demostorageint/'
STORAGE_INTEGRATION = s3_external_table_storageint -- This is the storage integration object
file_format = demo_csv_format;


--Create the external table.
create or replace external table demo_customer_tbl (
    CUST_SK varchar as (value:c1::varchar),
    CUST_ID varchar as (value:c2::varchar)
)
with location = @ext_stage_snow_demo
auto_refresh = false
file_format = (format_name = demo_csv_format);


--Query the external tables.
select * from demo_customer_tbl;
(Image: output of the external table query)
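As mentioned earlier, here is a minimal sketch of wrapping the external table in a view and joining it with a standard table (DEMO_CUSTOMER is the standard table used earlier in this blog; the join key below is purely illustrative):

-- A view over the external table, so consumers never touch the raw stage
create or replace view demo_customer_ext_vw as
select CUST_SK, CUST_ID from demo_customer_tbl;

-- Join the stage-backed data with a native Snowflake table
select v.CUST_ID, c.c_name
from demo_customer_ext_vw v
inner join DEMO_DB.DEMO_SCHEMA.DEMO_CUSTOMER c
    on v.CUST_ID = c.c_customer_id; -- hypothetical join key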

There is also a pseudocolumn named “metadata$filename” which stores information about the file each row came from. Hence, apart from holding the data from the files, external tables carry this metadata information as well.
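A small sketch of querying it (using the demo_customer_tbl table created above):

-- Each row carries the name of the staged file it came from
select metadata$filename as source_file, CUST_SK, CUST_ID
from demo_customer_tbl
order by source_file;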

(Image: metadata columns of an external table)

Benefits and use cases of External tables:

Let us now go through some of the benefits and use cases of external tables:

(Image: benefits of external tables)

Some caveats of external tables:

Let us now go through some of the caveats of external tables:

(Image: caveats of external tables)


HYBRID TABLES

Hybrid tables are a new Snowflake table type powering Unistore. A key design principle is for this table to support all the transactional capabilities needed. They are highly performant, which any transactional application requires, and support fast single-row operations. They run on an entirely new row-based storage engine, unlike other Snowflake tables, where data is stored in columnar form. These are currently still in private preview (PrPr), but once made generally available they have immense potential to unlock a multitude of OLTP use cases.

(Image courtesy Snowflake: hybrid tables)

Hybrid tables enable Snowflake’s new workload, “Unistore”, which delivers a modern approach to working with transactional and analytical data together in a single platform. Hybrid tables were announced at Snowflake Summit 2022, so this is also one of the features most eagerly awaited by data enthusiasts.

Below is how hybrid tables are defined; please note the use of the “PRIMARY KEY” and “FOREIGN KEY” constraints, which are enforced on hybrid tables.

CREATE HYBRID TABLE Customers (
    CustomerKey number(38,0) PRIMARY KEY,
    Customername varchar(50)
);

-- Create order table with a foreign key referencing the customer table
CREATE OR REPLACE HYBRID TABLE Orders (
    Orderkey number(38,0) PRIMARY KEY,
    Customerkey number(38,0),
    Orderstatus varchar(20),
    Totalprice number(38,0),
    Orderdate timestamp_ntz,
    Clerk varchar(50),
    CONSTRAINT fk_o_customerkey FOREIGN KEY (Customerkey) REFERENCES Customers(Customerkey),
    INDEX index_o_orderdate (Orderdate) -- secondary index to accelerate time-based lookups
);

Some of the key features/concepts of these tables are:

  1. Storage :: Hybrid table data is stored as two copies, one in row storage and the other in columnar storage. The trade-off is that this makes them a bit more costly.
  2. Locking concept :: Locking happens at the row level for hybrid tables, which enables higher-concurrency execution of single-row updates.
  3. Indexes :: Hybrid tables use B-tree indexes for both the primary key and secondary indexes; a query filtering on an indexed column can skip the full table scan in the query profile (see the sketch after this list).
  4. Compatible with connectors :: Hybrid tables are compatible with all connectors, just like other table types.
  5. Composite key :: More than one column can be defined as the primary key of a hybrid table, making it a composite key.
  6. Scalability :: Hybrid tables will have a size limit, in the range of some terabytes, unlike other tables in Snowflake.
  7. Pricing model :: They are likely to be a bit more costly, considering the data is stored in two forms (row and columnar), but as of today the pricing model is not yet finalized.
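A minimal sketch of the OLTP-style access pattern these tables target (building on the Customers and Orders tables defined above; the values are illustrative):

-- Fast single-row writes, with PRIMARY KEY / FOREIGN KEY constraints enforced
INSERT INTO Customers VALUES (1, 'Acme Corp');
INSERT INTO Orders VALUES (100, 1, 'OPEN', 250, CURRENT_TIMESTAMP(), 'Clerk#01');

-- Point lookup on the primary key: a fast single-row read
SELECT * FROM Orders WHERE Orderkey = 100;

-- Single-row update, locked at the row level
UPDATE Orders SET Orderstatus = 'SHIPPED' WHERE Orderkey = 100;

-- Range lookup that the secondary index index_o_orderdate can serve,
-- skipping the full table scan in the query profile
SELECT Orderkey, Orderstatus
FROM Orders
WHERE Orderdate >= DATEADD(day, -7, CURRENT_TIMESTAMP());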

Benefits and use cases of Hybrid tables:

Let us now go through some of the benefits and use cases of hybrid tables:

(Image: benefits and use cases of hybrid tables)

More reads: https://www.snowflake.com/blog/introducing-unistore/

ICEBERG TABLES

Iceberg tables are brand new tables that were made available in public preview for all accounts only recently. These tables are powered by the Apache Iceberg open table format; the idea is to store the data and metadata entirely in customer-managed storage while still being able to use standard Snowflake features such as DML and encryption. These tables have the ability to unlock a lot of “data lake” use cases.

Iceberg tables for Snowflake combine your own external cloud storage with the query semantics and performance of standard Snowflake tables. They are perfect for existing data lakes that you cannot, or do not want to, store in Snowflake.

There are 2 important concepts to understand here, namely the “Iceberg Catalog” and the “External Volume”.

What is an Iceberg Catalog?

The Iceberg catalog is the first architectural layer in the Iceberg table specification. It enables a compute engine to manage and load Iceberg tables. It supports:

  • Preserving the pointer to the current metadata for one or more Iceberg tables.
  • Updating a table’s current metadata pointer by carrying out atomic operations.

Snowflake currently supports 2 Iceberg catalog options, namely:

  1. Snowflake-managed Iceberg catalog.
  2. Externally managed Iceberg catalog (via catalog integrations).
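For the externally managed option, the link is made through a catalog integration object. A hedged sketch for an AWS Glue catalog (every name and ARN below is a placeholder):

-- Hypothetical catalog integration pointing at an AWS Glue catalog
CREATE CATALOG INTEGRATION my_glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my_glue_database'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-glue-role'
  GLUE_CATALOG_ID = '123456789012'
  ENABLED = TRUE;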

The variations between these catalog options are summarized in the table that follows.

(Image courtesy Snowflake: comparison of the catalog options)

What is an External Volume?

An external volume is a named, account-level Snowflake object that stores the identity and access management (IAM) configuration for your external cloud storage. Snowflake connects securely to your cloud storage through an external volume in order to access table data, Iceberg metadata, and manifest files containing the table schema, partitions, and other metadata.

Iceberg table creation.

To create an Iceberg table with Snowflake acting as the catalog, we need to designate an external volume and a base location (a directory on the external volume) where Snowflake can write table data and metadata:

CREATE ICEBERG TABLE myTable
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'myIcebergVolume'
  BASE_LOCATION = 'relative/path/from/extvol/location/';

External volume configuration steps can be seen here: https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume#label-tables-iceberg-configure-external-volume-s3
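As a hedged sketch of what that configuration boils down to for S3 (all names, buckets, and ARNs below are placeholders):

-- Hypothetical external volume backed by an S3 bucket
CREATE OR REPLACE EXTERNAL VOLUME myIcebergVolume
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location',
      STORAGE_PROVIDER = 'S3',
      STORAGE_BASE_URL = 's3://my-iceberg-bucket/',
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-iceberg-role'
    )
  );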

Benefits/Key points/Use cases of Iceberg tables:

Let us now go through some of the benefits and use cases of Iceberg tables:

More reads:

https://www.snowflake.com/blog/build-open-data-lakehouse-iceberg-tables/

https://docs.snowflake.com/en/user-guide/tables-iceberg

PERMANENT, TEMPORARY & TRANSIENT TABLES

Lastly, let us throw some light on the most widely used tables, i.e., Permanent, Temporary, and Transient tables. These tables have been around for years and are used by almost every individual who has worked with Snowflake.

What is a Permanent table?

Permanent tables are the typical, everyday database tables. Permanent is Snowflake’s default table type, and creating one does not require any extra syntax. The data kept in permanent tables takes up space and adds to the storage fees that Snowflake charges you.

In addition, it has extra features like Fail-Safe and Time-Travel that aid in data availability and recovery.

--The syntax for creating the permanent table is given as below:
create table student (id number, name varchar(100));
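Since permanent tables come with Time Travel (and Fail-safe behind it), here is a small sketch of what that unlocks, reusing the student table above:

-- Query the table as it looked 5 minutes ago (Time Travel)
select * from student at(offset => -60*5);

-- Recover the table if it was dropped within the retention period
drop table student;
undrop table student;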

What is a Transient table?

Transient tables in Snowflake are comparable to permanent tables, with the exception of having a very short Time-Travel period and no Fail-safe period. They work best in situations where the data in your table is not critical and can be recovered through other channels if necessary.

Like permanent tables, transient tables add to the overall storage costs associated with your account. However, since transient tables do not use Fail-safe, there are no Fail-safe costs (i.e., the costs of maintaining the data required for Fail-safe disaster recovery).

create transient table student (id number, name varchar(100));

What is a Temporary table?

In Snowflake, temporary tables are only available for the duration of the session in which they were created; other users or sessions cannot see them. When the session ends, the data in the table is fully deleted and irretrievably lost.

Temporary tables, like transient tables, have a very short Time-Travel period and no Fail-safe period. These work best for holding temporary, non-permanent data that is only needed for the duration of the creation session.

create temporary table student (id number, name varchar(100));

The Comparison

The differences between the three table types are outlined in the table below, with special attention to how they affect fail-safe and time travel:

Characteristic | Permanent | Transient | Temporary
Persistence | Until explicitly dropped | Until explicitly dropped | Remainder of the session
Time Travel retention | 0–1 day (up to 90 days on Enterprise edition and higher) | 0 or 1 day | 0 or 1 day
Fail-safe period | 7 days | None | None
Visibility | All sessions with privileges | All sessions with privileges | Creating session only
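A short sketch of the retention knobs behind that comparison (DATA_RETENTION_TIME_IN_DAYS is the standard parameter; the 90-day value assumes Enterprise edition or higher):

-- Permanent table: Time-Travel retention can be raised up to 90 days (Enterprise+)
create table student_perm (id number, name varchar(100)) data_retention_time_in_days = 90;

-- Transient table: retention is capped at 1 day, and there is no Fail-safe
create transient table student_stage (id number, name varchar(100)) data_retention_time_in_days = 1;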

Benefits and use cases of Permanent/Temporary/Transient tables:

Let us now go through some of the benefits and use cases of these three table types:

SUMMARY:

There are a number of Snowflake table types, and this blog was written to give insights into each one of them and what it does. Using the right kind of table for each scenario we handle will bring out the best of this platform.

Please keep reading my blogs; it only encourages me to post more such content. You can find me on LinkedIn by clicking here and on Medium here. Happy Learning :)

Awarded as a “Data Superhero” by Snowflake for the year 2023; click here for more details.

Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or future employers.


Somen Swain
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Snowflake Data Superhero 2024 & 2023 | 4XSnowpro Certified | AWS Solution Architect Associate | Cloud Computing| Principal-Data Engineering at LTIMindtree