Secret Snowflake data modeling features you need to know about

Discover virtual table columns and creative usage of comments and metadata functions.

Image by Canva AI

Greetings from the underground, my fellow developers. I am broadcasting this bulletin to alert my comrades of the secret tools hidden beneath your fingertips that can be of utmost importance in your struggle to develop efficient and cost-effective data models in Snowflake.

Perhaps these features have been deemed “too incendiary” by Big (data) Brother, or maybe the Data Polit-Bureau has established their ties to foreign data platforms — I know not. What I can say is that it is everyone’s duty to study the official party documentation and community updates to familiarize yourself with data modeling features that may have escaped your attention until now.

Let’s begin this dossier with Virtual columns and learn how they blur the line between tables and views.

Virtual columns

Those who have worked with Oracle might notice parallels to Snowflake syntax and functionality in things like Time Travel and the CONNECT BY clause. This is no great mystery, as Snowflake co-founders and architects Thierry Cruanes and Benoit Dageville cut their teeth in the Oracle world. But what other features can we uncover if we experiment?

Those who are familiar with Oracle will have heard of virtual table columns. Virtual columns allow a table to straddle the line between a physical table and a view. Like a view, a virtual column is defined through an expression (introduced with the AS keyword) and can incorporate business logic that is calculated on the fly instead of stored physically. When might you want to use a virtual column?

Suppose a customer table contained area code and phone number as separate integer columns: great for compression, not so good for readability. We could concatenate these columns into a neatly formatted, readable phone column with added parentheses and dashes. However, that would duplicate the data and be painful to reload if the format changed.

Virtual columns would be a good way to solve this problem, but they are not supported in Snowflake — or are they? While you won’t find any mention of virtual columns in the Snowflake documentation, you can go right ahead and declare them to see what happens:
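Here’s a minimal sketch to try (the table and column names are my own, not from any documentation); the AS expression is what defines the virtual column:

CREATE OR REPLACE TABLE customer_contact
(
    customer_id  number(38,0) NOT NULL,
    area_code    number(3,0)  NOT NULL,
    phone_number number(7,0)  NOT NULL,
    -- virtual column: declared with AS, calculated on the fly, never stored
    phone_formatted varchar AS (
        '(' || area_code || ') ' ||
        SUBSTR(LPAD(phone_number::varchar, 7, '0'), 1, 3) || '-' ||
        SUBSTR(LPAD(phone_number::varchar, 7, '0'), 4, 4)
    )
);

INSERT INTO customer_contact (customer_id, area_code, phone_number)
VALUES (1, 212, 5551234);

SELECT phone_formatted FROM customer_contact;
-- (212) 555-1234

The table stores only the compact integer columns; the formatted value is derived at query time, so a format change means altering one expression instead of reloading data.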

As I wrote in Data Modeling with Snowflake, the closest analog to virtual columns found in Snowflake’s documentation is the DEFAULT column property. A default is similar to a virtual column, except it stores data physically and only kicks in when no value is supplied on insert. Virtual columns, on the other hand, never store data and, therefore, only accept NULLs in DML operations.

Defaults are specified on a column through the DEFAULT keyword, while virtual columns use AS (just like expressions). Now that we’ve mastered virtual columns and understood their potential, let’s explore techniques for adding logical details to physical models.
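A side-by-side sketch makes the contrast concrete (again, illustrative names):

CREATE OR REPLACE TABLE default_vs_virtual
(
    my_number  number,
    my_default number DEFAULT 42,          -- stored physically when no value is supplied
    my_virtual number AS (my_number * 10)  -- never stored; evaluated at query time
);

INSERT INTO default_vs_virtual (my_number) VALUES (5);

SELECT * FROM default_vs_virtual;
-- my_number = 5, my_default = 42, my_virtual = 50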

Relationship names and cardinality

Conceptual and logical models can convey relevant nuances that a physical database can’t express. Concepts such as subtypes and supertypes, cardinality (e.g., one-to-many, zero-to-one), and relationship names are unfortunately lost when transforming a logical model for deployment on a physical database.

The following conceptual diagram can tell us much about the relationships between the entities and how to query their contents.

TPCH dataset visualized in SqlDBM

For example, the line between LOCATION and CUSTOMER tells us that a customer is always (i.e., NOT NULL) assigned to a location, but a location can exist even though no customers are based there (i.e., one-to-many optional — crow’s foot symbol). Unfortunately, neither relationship names nor cardinality definitions can be described in a physical model — unless we get creative.

Reviewing the constraint properties in Snowflake’s documentation, we can work with what we’re given to achieve our goal. In my book, I introduced a technique for preserving logical relationship names through descriptive foreign key naming conventions. Including the names of the parent and child tables and their relationship in the FK name not only makes the FK easier to work with (when a table has several) but also reintroduces the missing conceptual nuances.

The convention I suggest for foreign key constraints is FK_<child_table>_<relationship_name>_<parent_table>. In the given example, that would mean setting the FK name as FK_CUSTOMER_BASED_IN_LOCATION. But, unless we want to clutter the FK name beyond readability, we need to look for other options for declaring cardinality.

While the CONSTRAINT documentation doesn’t mention comments, they are alluded to on other pages. It turns out that comments can be set on constraints in the same manner as they are on columns.

Using comments, we can add fully descriptive, bidirectional relationship cardinality details and document how the data in these two entities is expected to behave. By studying the documentation and applying a little creativity, we arrive at the following code.

CREATE OR REPLACE TABLE customer
(
    customer_id         number(38,0) NOT NULL,
    name                varchar      NOT NULL,
    address             varchar      NOT NULL,
    phone               varchar(15)  NOT NULL,
    account_balance_usd number(12,2) NOT NULL,
    market_segment      varchar(10)  NOT NULL,
    location_id         number(38,0) NOT NULL,
    comment             varchar COMMENT 'yo dawg, I heard you like comments...',

    CONSTRAINT pk_customer PRIMARY KEY ( customer_id ),
    CONSTRAINT FK_CUSTOMER_BASED_IN_LOCATION
        FOREIGN KEY ( location_id ) REFERENCES location ( location_id ) RELY
        COMMENT 'A customer is based in one and only one location. A location can have zero or many customers.'
)
COMMENT = 'Registered customers'
;

Even without an ERD, users can review the physical definition and glean much of the intended logical detail. Not bad for some naming conventions and a liberal use of comments. Now that we have learned a couple of tricks for creating tables, let’s have a look at the tools available for reviewing their details.

One DESCRIBE to rule them all

The Snowflake Cloud Services layer manages the metadata for all account and database objects. Thanks to the services layer, metadata operations such as getting object definitions (i.e., DDL), creating users, and managing roles and permissions can be performed without an active warehouse. The services layer even handles some queries without requiring a warehouse (and consuming credits), such as simple table statistics (e.g., MIN, MAX, COUNT) and queries whose results were previously cached. Therefore, to save time and money, it pays to familiarize yourself with the features and functions the services layer handles.
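For instance, with no warehouse running, all of the following should return results without consuming credits (a sketch against the customer table from earlier; exact behavior can vary by data type and table state):

-- object definitions and listings are served by the cloud services layer
SELECT GET_DDL('table', 'customer');
SHOW TABLES LIKE 'customer%';

-- simple statistics can be answered from micro-partition metadata
SELECT COUNT(*), MIN(customer_id), MAX(customer_id) FROM customer;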

To get a peek at a table’s structure, users often run a SELECT * query accompanied by either a LIMIT or SAMPLE keyword. But if it’s structure we’re interested in, interrogating the metadata will give us much more information and won’t consume credits (yes, even tricks that return no records, like WHERE 1=2, require a running warehouse). For this, Snowflake offers the DESCRIBE <object> family of commands, used to display the details of any object, from query operations to account and database objects.
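To see the difference for yourself (table name illustrative):

DESC TABLE customer;                 -- answered by the services layer: no warehouse required

SELECT * FROM customer WHERE 1 = 2; -- returns zero rows, yet still requires a running warehouse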

There are over 30 DESCRIBE commands, and more are added every time new objects (e.g., dynamic tables and masking policies) are introduced. That’s certainly great when it comes to completeness, but it can be daunting to keep track of. The documentation hints at a measure of overlap between similar DESCRIBE commands. For example, DESCRIBE VIEW is explained as “Describes the columns in a view (or table).” However, the documentation doesn’t go as far as to say that DESCRIBE TABLE is the only one you’ll ever need.

Using DESCRIBE TABLE (or DESC TABLE, as the cool kids write it) works for any table-like object, including tables (internal and external), views (regular and materialized), dynamic tables, and event tables. Not only are these commands interchangeable (e.g., DESC MATERIALIZED VIEW will work for a dynamic table and vice versa), but they also return the same columns. This allows us to perform the following demonstration:
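A quick sketch, using throwaway object names of my own:

CREATE OR REPLACE TABLE t1 (id number, val varchar);
CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1;

DESC TABLE v1; -- works on a view
DESC VIEW t1;  -- works on a table
-- both return the same columns: name, type, kind, null?, default, ...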

DESC commands return the same columns and can be used interchangeably

So, if you’re ever uncertain about the nature of the data source you’re describing or want fewer commands to memorize, DESC TABLE should be your go-to.

Conclusion

Snowflake’s technical documentation is among the clearest and most comprehensive of any tool I’ve come across. Inevitably, though, some details and edge cases escape the coverage of official documentation. This is why we must not forget that documentation is only one part of the community of resources that Snowflake provides.

There is a wealth of knowledge and in-depth discussion happening on the Forums, with links to dedicated Stack Overflow and Reddit pages. There are User Groups for specialized topics and Education & Training resources (free and paid) to help you discover the techniques and best practices for achieving anything you set out to create with Snowflake.

Remember to check in regularly and stay on top of the latest release news and any “secrets” that the platform has to offer, and subscribe for more informative samizdat.


Serge Gershkovich
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

I am the author of "Data Modeling with Snowflake" and Product Success Lead at SqlDBM. I write about data modeling and cloud cost optimization.