Snowflake: Observability on DDL Transactions (Part 2) — Databases, Schemas, Tables, Views and Columns Affected

Path to getting Observability Insights in your Snowflake Data Platform

Nowadays, data platforms demand deep insight into what is happening in their systems. Who is doing what, how, and when? And we need to monitor it all with comfortable dashboards that provide valuable metrics, KPIs, and, above all, the capability to deliver this information in real time. This is where Observability comes into play, providing these features and more so our professionals can take the right actions to improve, correct, and design their systems and data.

Snowflake as a platform offers a wide range of Observability features. Among them we can mention:

Snowsight Dashboard:

  • Managing and monitoring Costs and Performance
  • Trackability about transactions

Data Objects: views that provide valuable information about queries, logins, costs, timings, and more.

As part of this Observability article series, we will now see how to use these Data Objects to check all the DDL transactions executed against the tables of our Snowflake databases. We want to list, in a structured way, all the tables and the columns affected by each query, the type of action, the rows inserted (if applicable), the timestamp of the transaction, and who executed it. For DML Observability transactions, you can check the previous article in the series.

Hands On

I will start by checking the different data objects we have on Snowflake; they are located in the SNOWFLAKE.ACCOUNT_USAGE schema.

QUERY_HISTORY data object

Here we find the QUERY_HISTORY view, which provides useful information about all the queries executed on the platform.

An example of this view's output could be the one below (note that I will only show the relevant fields we will need in this article):

Previewing QUERY_HISTORY view

QUERY_ID: Unique identifier of each query.

QUERY_TEXT: The SQL query itself.

START_TIME / END_TIME: Starting and ending time of the query.

QUERY_TYPE: The type of statement executed; it covers a great variety of query types: procedures, DML, DDL, SELECT, and more. For details, refer to the documentation.

USER_NAME: The user who executes the query.

ROWS_INSERTED: The number of rows inserted, which applies to CREATE TABLE ... AS SELECT (CTAS) DDL queries.
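
As a quick orientation, a minimal peek at these fields could look like the sketch below (the seven-day window is just an illustrative assumption):

SELECT QUERY_ID, QUERY_TEXT, START_TIME, END_TIME, QUERY_TYPE, USER_NAME, ROWS_INSERTED
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('day', -7, CURRENT_TIMESTAMP()) -- last 7 days, adjust as needed
ORDER BY START_TIME DESC
LIMIT 100;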

You can see that this view does not contain, in a structured way, the table affected by the query. We will find that in the next view.

ACCESS_HISTORY data object

Here we find the ACCESS_HISTORY view, which provides useful information about the relationship between queries and the objects modified on the platform.

An example of this view's output could be the one below (note again that I will only show the relevant fields we will need in this article):

Previewing ACCESS_HISTORY view

QUERY_ID: Unique identifier of each query (same as above).

OBJECTS_MODIFIED_BY_DDL: The detailed list of objects modified by the DDL query: databases, schemas, tables, views, columns, procedures, UDFs, etc. As you can imagine, this data is stored as semi-structured JSON.
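
A minimal sketch to peek at this JSON column (filtering out queries that did not modify any object by DDL):

SELECT QUERY_ID, OBJECTS_MODIFIED_BY_DDL
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE OBJECTS_MODIFIED_BY_DDL IS NOT NULL
LIMIT 100;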

It’s time to query:

  1. Join both views
  2. Filter the query types that interest us by QUERY_TYPE. In this case, the ones below:
  • 'DROP': drops a Database, Schema, Table or View
  • 'CREATE': creates a Database or Schema
  • 'CREATE_VIEW': creates a View, and it affects its Columns as well
  • 'CREATE_TABLE_AS_SELECT', 'CREATE_TABLE': creates a Table, and it affects its Columns as well
  • 'ALTER_TABLE_ADD_COLUMN', 'ALTER_TABLE_DROP_COLUMN': adds or drops a Column in a Table, so it affects both objects
  • 'RENAME_DATABASE', 'RENAME_SCHEMA', 'RENAME_TABLE', 'RENAME_VIEW', 'RENAME_COLUMN': renames the corresponding object: Database, Schema, Table, View or Column

Joining ACCESS_HISTORY vs QUERY_HISTORY
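
This is exactly the T CTE of the full view definition at the end of the article:

SELECT AH.QUERY_ID, QH.QUERY_TEXT, QH.START_TIME, QH.END_TIME,
AH.OBJECTS_MODIFIED_BY_DDL OBJECTS_MODIFIED,
QH.QUERY_TYPE, QH.USER_NAME LAST_DDL_BY, QH.ROWS_INSERTED
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY AH
INNER JOIN SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY QH
ON AH.QUERY_ID = QH.QUERY_ID
WHERE QH.QUERY_TYPE IN (
'DROP','CREATE','CREATE_VIEW','CREATE_TABLE_AS_SELECT','CREATE_TABLE',
'ALTER_TABLE_ADD_COLUMN','ALTER_TABLE_DROP_COLUMN',
'RENAME_DATABASE','RENAME_SCHEMA','RENAME_TABLE','RENAME_VIEW','RENAME_COLUMN'
);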

Now that we have this information, the next step is to extract the affected table in a structured way. This information lives in the field OBJECTS_MODIFIED_BY_DDL (aliased as OBJECTS_MODIFIED). The FLATTEN function will allow us to extract it.

In our case we will use: LATERAL FLATTEN(OBJECTS_MODIFIED)

We only want to keep the row for the object itself, not the JSON details; we will extract all that information in the next steps. The important filter here is: F1.KEY::VARCHAR = 'objectName'

Now we can finally code our query and retrieve the list of objects affected by each DDL (plus the JSON details, important for the next steps), the rows affected, auditing information, and the DDL executed:

Retrieving details from queries executed and objects affected all together
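
This step corresponds to the TABLES_DDL CTE in the full view definition at the end of the article (T is the joined result from the previous step):

SELECT DISTINCT
T.OBJECTS_MODIFIED, -- we keep the JSON details for the next steps
T.QUERY_ID, T.QUERY_TEXT, T.QUERY_TYPE,
UPPER(F1.THIS:"objectDomain"::VARCHAR) TYPE_OF_OBJECT_AFFECTED,
F1.THIS:"objectName"::VARCHAR OBJECT_NAME_FULL,
SPLIT_PART(OBJECT_NAME_FULL,'.',1) TABLE_CATALOG,
SPLIT_PART(OBJECT_NAME_FULL,'.',2) TABLE_SCHEMA,
SPLIT_PART(OBJECT_NAME_FULL,'.',3) TABLE_NAME,
END_TIME LAST_DDL,
LAST_DDL_BY, ROWS_INSERTED
FROM T
,LATERAL FLATTEN(T.OBJECTS_MODIFIED) F1
WHERE F1.KEY::VARCHAR = 'objectName' -- keep only the row holding the object name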

Once we have this information, it is important to take into account that different DDL statements store their JSON details in different ways, depending on the QUERY_TYPE of the DDL command. So we will handle the cases below: listing the columns affected, renaming columns, and renaming the remaining objects (Databases, Schemas, Tables and Views).

List of columns affected

In this query, we will find all the columns affected by any DDL transaction, by query ID. In addition, we retrieve the nested subOperation (in JSON format) because, when the DDL renames columns, we will need it in the next step.

Retrieving COLUMNs affected by DDL
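
This corresponds to the COLUMNS_DDL CTE in the full view definition, building on the TABLES_DDL step above:

SELECT DDL.QUERY_ID,
col.KEY COLUMN_NAME, -- the column name
col.THIS LOOKING_FOR_COLUMN_RENAMED -- nested subOperation JSON, needed to detect renames
FROM TABLES_DDL DDL
,LATERAL FLATTEN(DDL.OBJECTS_MODIFIED) F1
,LATERAL FLATTEN(INPUT=> F1.VALUE:"columns") col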

Renaming Columns

In this query, we will find all the columns renamed by ALTER DDL transactions, by query ID. Pay attention to how this data is stored, so you can extract it in the same way with the FLATTEN function. At the end, we have COLUMN_NAME as the new column name and OBJECT_NAME_FULL_OLD as the old one.

Retrieving the renamed columns, new and old name by Query_ID
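
This corresponds to the RENAME_COLUMN_DDL CTE in the full view definition, building on COLUMNS_DDL:

SELECT DISTINCT DDL.QUERY_ID,
DDL.COLUMN_NAME, -- new column name
col.VALUE::VARCHAR OBJECT_NAME_FULL_OLD -- old column name
FROM COLUMNS_DDL DDL
,LATERAL FLATTEN(INPUT=> DDL.LOOKING_FOR_COLUMN_RENAMED) F1
,LATERAL FLATTEN(INPUT=> F1.VALUE:"objectName") col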

Renaming Objects

In this query, we will find all the remaining renamed objects (Databases, Schemas, Tables and Views) for ALTER DDL transactions, as shown in the sketch below. Again, pay attention to how this data is stored, so you can extract it in the same way with the FLATTEN function. At the end, we have OBJECT_NAME_FULL as the new object name and OBJECT_NAME_FULL_OLD as the old one.
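
This corresponds to the RENAME_DDL CTE in the full view definition, again building on TABLES_DDL:

SELECT DISTINCT DDL.QUERY_ID,
col.VALUE::VARCHAR OBJECT_NAME_FULL_OLD -- old object name (Database, Schema, Table or View)
FROM TABLES_DDL DDL
,LATERAL FLATTEN(DDL.OBJECTS_MODIFIED) F1
,LATERAL FLATTEN(INPUT=> F1.VALUE:"objectName") col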

Joining the different objects affected by DDL

At this point, we have to join the previous results by QUERY_ID. In some cases, we also have to use the current column name to match it against the old one. Let's check how the query would look; it is quite easy:

Query result set to get all objects affected by DDL transaction queries
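
Extracted from the full view definition below, the OBJECTS_DDL_ALTERED CTE performs this join over the previous steps:

SELECT T.* EXCLUDE OBJECTS_MODIFIED, -- we no longer need the OBJECTS_MODIFIED JSON details
C.COLUMN_NAME,
COALESCE(R.OBJECT_NAME_FULL_OLD, RC.OBJECT_NAME_FULL_OLD) OBJECT_NAME_FULL_OLD -- old name, whether a renamed column or any other renamed object
FROM TABLES_DDL T
LEFT JOIN COLUMNS_DDL C ON T.QUERY_ID = C.QUERY_ID
LEFT JOIN RENAME_DDL R ON T.QUERY_ID = R.QUERY_ID
LEFT JOIN RENAME_COLUMN_DDL RC ON T.QUERY_ID = RC.QUERY_ID AND C.COLUMN_NAME = RC.COLUMN_NAME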

Finally, we are interested in getting some information about the different objects themselves: Databases, Schemas, Tables & Views. This information will be extracted from the different ACCOUNT_USAGE metadata views:

Tables & Views — ACCOUNT_USAGE.TABLES

TABLE_OWNER: Owner Role of the table

TABLE_TYPE: Table type

ROW_COUNT: Number of rows prior to dropping (aliased in the next query as ROW_COUNT_BEFORE_DROP)

IS_TRANSIENT: Indicates if the object is transient

Databases — ACCOUNT_USAGE.DATABASES

DATABASE_OWNER: Owner Role of the Database

IS_TRANSIENT: Indicates if the object is transient

Schemas — ACCOUNT_USAGE.SCHEMATA

SCHEMA_OWNER: Owner Role of the Schema

IS_TRANSIENT: Indicates if the object is transient

We have to be careful when joining these views together. They give us the state of each object as of when it was created, dropped or last updated. So we wrap all the previous queries into one CTE named TDDL (as we have seen), and join it with these ACCOUNT_USAGE views, taking into account the object lifetime interval just mentioned. This will be very easy; come with me and check this join:

SELECT  
TDDL.QUERY_ID, TDDL.QUERY_TEXT, TDDL.QUERY_TYPE, TDDL.TYPE_OF_OBJECT_AFFECTED, TDDL.OBJECT_NAME_FULL, TDDL.OBJECT_NAME_FULL_OLD,
TDDL.TABLE_CATALOG, TDDL.TABLE_SCHEMA, TDDL.TABLE_NAME, TDDL.COLUMN_NAME,
COALESCE(TB.TABLE_OWNER,SC.SCHEMA_OWNER,DB.DATABASE_OWNER) OBJECT_OWNER,
COALESCE(TB.IS_TRANSIENT,SC.IS_TRANSIENT,DB.IS_TRANSIENT ) IS_TRANSIENT,
TABLE_TYPE,
TB.ROW_COUNT ROW_COUNT_BEFORE_DROP,
TDDL.ROWS_INSERTED ROWS_INSERTED_DDL,TDDL.LAST_DDL_BY,TDDL.LAST_DDL
FROM OBJECTS_DDL_ALTERED TDDL
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE."DATABASES" DB
ON DB.DATABASE_NAME = TDDL.TABLE_CATALOG
AND TDDL.LAST_DDL BETWEEN DB.CREATED AND COALESCE(DB.DELETED ,CURRENT_TIMESTAMP()) -- match the database version alive at DDL time
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE."SCHEMATA" SC
ON SC.CATALOG_NAME = TDDL.TABLE_CATALOG
AND SC.SCHEMA_NAME = TDDL.TABLE_SCHEMA
AND TDDL.LAST_DDL BETWEEN SC.CREATED AND COALESCE(SC.DELETED ,CURRENT_TIMESTAMP()) -- match the schema version alive at DDL time
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE."TABLES" TB
ON TB.TABLE_CATALOG = TDDL.TABLE_CATALOG
AND TB.TABLE_SCHEMA = TDDL.TABLE_SCHEMA
AND TB.TABLE_NAME = TDDL.TABLE_NAME
AND TDDL.LAST_DDL BETWEEN TB.CREATED AND COALESCE(TB.DELETED ,CURRENT_TIMESTAMP()) -- match the table/view version alive at DDL time

At the end of our process, we obtain our monitoring table combining all of this information in a single, summarized structure.

So we have finished! We only have to create a view in order to expose this new custom Data Observability object for our needs. Here is the full code:

CREATE OR REPLACE VIEW OBSERVABILITY_DB.MY_SCHEMA.DIM_DDL_AFFECTED_OBJECTS AS
WITH T AS (
SELECT AH.QUERY_ID,QH.QUERY_TEXT,QH.START_TIME,QH.END_TIME,AH.OBJECTS_MODIFIED_BY_DDL OBJECTS_MODIFIED,QH.QUERY_TYPE,QH.USER_NAME LAST_DDL_BY,
QH.ROWS_INSERTED
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY AH
INNER JOIN SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY QH
ON AH.QUERY_ID =QH.QUERY_ID
WHERE QH.QUERY_TYPE IN (
'DROP','CREATE','CREATE_VIEW','CREATE_TABLE_AS_SELECT','CREATE_TABLE','ALTER_TABLE_ADD_COLUMN','ALTER_TABLE_DROP_COLUMN',
'RENAME_DATABASE','RENAME_SCHEMA','RENAME_TABLE','RENAME_VIEW','RENAME_COLUMN'
)
),
TABLES_DDL AS (
SELECT DISTINCT
T.OBJECTS_MODIFIED,
T.QUERY_ID,T.QUERY_TEXT,
T.QUERY_TYPE,
UPPER(F1.THIS:"objectDomain"::VARCHAR) TYPE_OF_OBJECT_AFFECTED,
F1.THIS:"objectName"::VARCHAR OBJECT_NAME_FULL,
SPLIT_PART(OBJECT_NAME_FULL,'.',1) TABLE_CATALOG,
SPLIT_PART(OBJECT_NAME_FULL,'.',2) TABLE_SCHEMA,
SPLIT_PART(OBJECT_NAME_FULL,'.',3) TABLE_NAME,
END_TIME LAST_DDL,
LAST_DDL_BY,ROWS_INSERTED
FROM T
,LATERAL FLATTEN(T.OBJECTS_MODIFIED) F1
WHERE F1.KEY::VARCHAR = 'objectName'
), COLUMNS_DDL AS (
SELECT DDL.QUERY_ID,col.KEY COLUMN_NAME,col.THIS LOOKING_FOR_COLUMN_RENAMED
FROM TABLES_DDL DDL
,LATERAL FLATTEN(DDL.OBJECTS_MODIFIED) F1
,LATERAL FLATTEN(INPUT=> F1.VALUE:"columns") col
), RENAME_COLUMN_DDL AS (
SELECT DISTINCT DDL.QUERY_ID,DDL.COLUMN_NAME ,col.VALUE::VARCHAR OBJECT_NAME_FULL_OLD
FROM COLUMNS_DDL DDL
,LATERAL FLATTEN(INPUT=> DDL.LOOKING_FOR_COLUMN_RENAMED) F1
,LATERAL FLATTEN(INPUT=> F1.VALUE:"objectName") col
), RENAME_DDL AS (
SELECT DISTINCT DDL.QUERY_ID,col.VALUE::VARCHAR OBJECT_NAME_FULL_OLD
FROM TABLES_DDL DDL
,LATERAL FLATTEN(DDL.OBJECTS_MODIFIED) F1
,LATERAL FLATTEN(INPUT=> F1.VALUE:"objectName") col
), OBJECTS_DDL_ALTERED AS (
SELECT T.* EXCLUDE OBJECTS_MODIFIED, -- we no longer need the OBJECTS_MODIFIED JSON details
C.COLUMN_NAME,
COALESCE(R.OBJECT_NAME_FULL_OLD,RC.OBJECT_NAME_FULL_OLD) OBJECT_NAME_FULL_OLD -- old name, whether a renamed COLUMN or any other renamed object
FROM TABLES_DDL T
LEFT JOIN COLUMNS_DDL C ON T.QUERY_ID=C.QUERY_ID
LEFT JOIN RENAME_DDL R ON T.QUERY_ID=R.QUERY_ID
LEFT JOIN RENAME_COLUMN_DDL RC ON T.QUERY_ID=RC.QUERY_ID AND C.COLUMN_NAME=RC.COLUMN_NAME
)
SELECT
TDDL.QUERY_ID, TDDL.QUERY_TEXT, TDDL.QUERY_TYPE, TDDL.TYPE_OF_OBJECT_AFFECTED, TDDL.OBJECT_NAME_FULL, TDDL.OBJECT_NAME_FULL_OLD,
TDDL.TABLE_CATALOG, TDDL.TABLE_SCHEMA, TDDL.TABLE_NAME, TDDL.COLUMN_NAME,
COALESCE(TB.TABLE_OWNER,SC.SCHEMA_OWNER,DB.DATABASE_OWNER) OBJECT_OWNER,
COALESCE(TB.IS_TRANSIENT,SC.IS_TRANSIENT,DB.IS_TRANSIENT ) IS_TRANSIENT,
TABLE_TYPE,
TB.ROW_COUNT ROW_COUNT_BEFORE_DROP,
TDDL.ROWS_INSERTED ROWS_INSERTED_DDL,TDDL.LAST_DDL_BY,TDDL.LAST_DDL
FROM OBJECTS_DDL_ALTERED TDDL
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE."DATABASES" DB
ON DB.DATABASE_NAME = TDDL.TABLE_CATALOG
AND TDDL.LAST_DDL BETWEEN DB.CREATED AND COALESCE(DB.DELETED ,CURRENT_TIMESTAMP())
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE."SCHEMATA" SC
ON SC.CATALOG_NAME = TDDL.TABLE_CATALOG
AND SC.SCHEMA_NAME = TDDL.TABLE_SCHEMA
AND TDDL.LAST_DDL BETWEEN SC.CREATED AND COALESCE(SC.DELETED ,CURRENT_TIMESTAMP())
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE."TABLES" TB
ON TB.TABLE_CATALOG = TDDL.TABLE_CATALOG
AND TB.TABLE_SCHEMA = TDDL.TABLE_SCHEMA
AND TB.TABLE_NAME = TDDL.TABLE_NAME
AND TDDL.LAST_DDL BETWEEN TB.CREATED AND COALESCE(TB.DELETED ,CURRENT_TIMESTAMP())
ORDER BY LAST_DDL DESC;

It’s time to rock!!

Now we are going to check what is happening in our data platform: which DDL statements are affecting the objects in our Snowflake account over time. How many queries, databases, schemas, tables or views, and associated columns are being affected?

SELECT 
TO_DATE(LAST_DDL) DT_DDL
, TABLE_CATALOG
, QUERY_TYPE
, COUNT(DISTINCT OBJECT_NAME_FULL) NBR_OF_TABLES_VIEWS_AFFECTED
, COUNT(DISTINCT OBJECT_NAME_FULL||'.'||COLUMN_NAME) NBR_OF_COLUMNS_AFFECTED
, COUNT(1) NBR_OF_QUERIES
, SUM(ROWS_INSERTED_DDL) ROWS_INSERTED_DDL
FROM OBSERVABILITY_DB.MY_SCHEMA.DIM_DDL_AFFECTED_OBJECTS
WHERE TYPE_OF_OBJECT_AFFECTED IN ('VIEW','TABLE')
GROUP BY ALL
ORDER BY 1 DESC,2,3;

An example of what is happening in my account:

Another example would be to check how many ADD or DROP COLUMN operations are happening, which tables/views are being affected, and when they are being altered. Yes! We will know it with our super Observability view! :)

SELECT 
LAST_DDL DT_DDL
, TABLE_CATALOG DB
, TABLE_SCHEMA SCHEMA_NAME
, QUERY_TYPE
, TYPE_OF_OBJECT_AFFECTED TYPE_OBJECT
, OBJECT_NAME_FULL
, COLUMN_NAME
, QUERY_TEXT
, LAST_DDL_BY
FROM OBSERVABILITY_DB.MY_SCHEMA.DIM_DDL_AFFECTED_OBJECTS
WHERE QUERY_TYPE IN ('ALTER_TABLE_DROP_COLUMN','ALTER_TABLE_ADD_COLUMN')
GROUP BY ALL
ORDER BY 1 DESC,2,3;

Now, it's your turn! You can practice and extract even more valuable metadata than we have used in our view. This metadata is waiting for you; why wait to discover it? These insights are there, ready to be inspected!

Considerations

Take into consideration that we are using the ACCOUNT_USAGE schema, which has the special characteristic of high latency, from minutes up to 1-2 hours depending on the view you use. This can change over time, so I recommend checking each view in the Snowflake documentation for the latest latency figures.

Conclusions

You have seen how Snowflake provides us with different Data Objects we can use to customize our data monitoring. We can improve this in multiple ways, for example by adding new information at each step, depending on our current requirements.

For example, we could improve our observability in these areas:

Data Objects:

  • Adding new action types of your query STATEMENTs. We have handled some specific ones for certain objects, but you could do the same to monitor when comments are altered, when tags are SET/UNSET, and so on.
  • Checking new objects. We are currently handling Databases, Schemas, Tables, Views and Columns, but there are many more: Stages, File Formats, Policies, and so on.

Dashboard Observability:

  • We could build our customized dashboards in our preferred BI tool, in order to monitor all the transactions in our Snowflake Data Platform. This can be done easily, for example, with the Snowsight UI.

Remember, we could also load this information incrementally, in order to maintain our own observability tables. That approach works best for running heavy queries on our metadata, for example for ML and forecasting processes.
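
As a minimal sketch of such an incremental load, assuming a hypothetical target table FCT_DDL_AFFECTED_OBJECTS with the same structure as the view:

INSERT INTO OBSERVABILITY_DB.MY_SCHEMA.FCT_DDL_AFFECTED_OBJECTS -- hypothetical target table
SELECT V.*
FROM OBSERVABILITY_DB.MY_SCHEMA.DIM_DDL_AFFECTED_OBJECTS V
WHERE V.LAST_DDL > (SELECT COALESCE(MAX(LAST_DDL), '1970-01-01'::TIMESTAMP)
                    FROM OBSERVABILITY_DB.MY_SCHEMA.FCT_DDL_AFFECTED_OBJECTS); -- only DDL newer than the last load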

I wrote this new article because people liked the previous one in the DML Observability series. So, if this article is well received, I will keep digging into the Observability and monitoring features that Snowflake provides us. Give me some feedback about what would be interesting for you. Thanks for reading.

About me: I am a Data Subject Matter Expert on different data technologies, with more than 20 years of experience in the data business. I am very passionate about Snowflake, Data Vault and the data world itself. If you would like to stay updated with my new articles, don't forget to follow me on LinkedIn here and on Medium as well.


Cesar Segura
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

SME @ SDG Group || Snowflake Architect || Snowflake Squad Spotlight Member || CDVP Data Vault