Handling Deadlocks in Snowflake

Jalindar Karande
Hashmap, an NTT DATA Company
Jun 17, 2021 · 5 min read

Recently, one of our Snowflake clients came to us for help. They were facing an unusual problem with some of their transformation jobs running in Snowflake. The jobs had been written a year earlier and had worked perfectly since then. Suddenly, on random days, they started running 100 times slower or even timing out. Manually re-executing the failed (timed-out) jobs usually worked fine. The behavior was not reproducible, and no code changes had been made recently. Increasing the size of the warehouse did not solve the problem and only increased the cost. This intermittent, mysterious behavior began to affect the critical analytics being delivered.

We did a quick analysis of the jobs, warehouses, database objects, and schedules. What we found surprised us: there was a deadlock. In Figure 1, Case 1 shows a typical day, and Case 2 shows the random day on which a deadlock occurred.

Figure 1. Typical cases in concurrent transactions on shared database objects in Snowflake

Case 1

  • Transaction T1 and Transaction T2, part of Job 1 and Job 2, both need access to Table 1 and Table 2 for DML operations.
  • In Case 1, Transaction T1 completes its execution just before Transaction T2 starts. There is no concurrent execution, and both complete normally.

Case 2

  • On a random day, represented in Case 2, Transaction T1 takes a few minutes longer than usual to execute, which is normal.
  • Transaction T2 starts at its scheduled time, i.e., before T1 has committed. Now T1 and T2 are executing concurrently.
  • T1 needs a lock on Table 2 to complete its execution, but T2 already holds the lock on that table.
  • Further, T2 needs a lock on Table 1 to complete, which has not yet been released by T1. This situation results in a deadlock (see the sketch after this list).
  • In the example scenario presented in Case 2, Transaction T1 times out and releases its locks.
  • Transaction T2 acquires the lock after T1 times out and completes its execution. The result is one failed job and one job that runs far too long; in the example scenario, Transaction T2 takes more than 12 hours instead of the usual 5 minutes.
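
Here is a minimal sketch of that interleaving as Snowflake SQL. The table and column names (TABLE1, TABLE2, ID, STATUS) are illustrative, not taken from the client's actual jobs:

-- Session 1: Transaction T1 (Job 1)
begin;
update TABLE1 set STATUS = 'PROCESSED' where ID = 1;  -- T1 now holds the lock on TABLE1
-- ... other long-running work ...

-- Session 2: Transaction T2 (Job 2), started before T1 committed
begin;
update TABLE2 set STATUS = 'PROCESSED' where ID = 1;  -- T2 now holds the lock on TABLE2

-- Session 1: T1 tries to continue
update TABLE2 set STATUS = 'PROCESSED' where ID = 2;  -- blocks: TABLE2 is locked by T2

-- Session 2: T2 tries to continue
update TABLE1 set STATUS = 'PROCESSED' where ID = 2;  -- blocks: TABLE1 is locked by T1

-- Neither transaction can proceed; each waits until LOCK_TIMEOUT expires and one is aborted.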

This analysis prompted more questions from our client about transactions in Snowflake. We think these are common questions that many Snowflake architects are asked, so we have collected them in this blog post:

When does a deadlock occur in Snowflake?

Deadlocks occur when concurrent transactions each wait on a resource that the other has locked.

Which statements need a lock?

The general rules of thumb about locks for concurrent transactions running in Snowflake are:

1. SELECT queries do not need a lock

2. COPY INTO and INSERT do not require a lock

3. DELETE and UPDATE do need to acquire a lock

4. MERGE statements, which involve update and delete operations, do need to acquire a lock (see the examples after this list)
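
As a rough illustration of these rules, here are sample statements annotated with whether they acquire a table lock. The table, stage, and column names (ORDERS, ORDERS_STAGING, @MY_STAGE, ORDER_ID, STATUS) are hypothetical:

-- No lock needed: reads only committed data
select count(*) from ORDERS;

-- No lock needed: INSERT and COPY INTO append new data
insert into ORDERS (ORDER_ID, STATUS) values (42, 'NEW');
copy into ORDERS from @MY_STAGE;

-- Acquire a lock: existing rows are modified
update ORDERS set STATUS = 'SHIPPED' where ORDER_ID = 42;
delete from ORDERS where STATUS = 'CANCELLED';

-- Acquires a lock: the MERGE may update or delete existing rows
merge into ORDERS o
using ORDERS_STAGING s on o.ORDER_ID = s.ORDER_ID
when matched then update set o.STATUS = s.STATUS
when not matched then insert (ORDER_ID, STATUS) values (s.ORDER_ID, s.STATUS);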

Is it safe to execute SELECT queries against a table that is still locked by other transactions/statements?

Yes, it is perfectly fine to execute SELECT statements against a table locked by other transactions or DML statements. READ COMMITTED is the only isolation level currently supported for tables in Snowflake. With the READ COMMITTED isolation level, a statement reads only data that was committed before the statement began. SELECT queries never see uncommitted (dirty) data, and they never wait for locks.
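
A minimal two-session sketch of this behavior, assuming a hypothetical ORDERS table:

-- Session 1: open transaction holding a lock on ORDERS
begin;
update ORDERS set STATUS = 'SHIPPED' where ORDER_ID = 42;  -- not yet committed

-- Session 2: does not block and does not see the uncommitted change;
-- it returns the last committed value of STATUS
select STATUS from ORDERS where ORDER_ID = 42;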

How can I check if some statements are blocked/waiting for locks?

Any user can see the locks held or awaited by statements in their own session using the following command:

show locks;

But this information alone is not enough to identify deadlocks, because locks are global, i.e., they span users and warehouses. You need the ACCOUNTADMIN role to view all locks, transactions, and sessions in the account using the following command:

show locks in account;

This command displays all locked tables, the transaction ID, the transaction start date, the status (HOLDING/WAITING), the time the lock was acquired (if the status is HOLDING), and the query ID. This information is helpful when you need to abort a transaction manually to resolve a deadlock, as shown below.
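
For example, once you have identified the blocking transaction from SHOW LOCKS IN ACCOUNT, it can be aborted with the SYSTEM$ABORT_TRANSACTION function. The transaction ID below is a placeholder for a value copied from that output:

select system$abort_transaction(1624000000000000000);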

How long does a transaction wait before it aborts?

The number of seconds to wait while trying to lock a resource before timing out and aborting the statement is set through the LOCK_TIMEOUT parameter. You can view this parameter using the following command:

show parameters like 'lock_timeout';

The value can range from 0 upward, with a default of 43200 (i.e., 12 hours). A value of 0 means that the statement must acquire the lock immediately or abort.

You can set the LOCK_TIMEOUT using the following command:

alter session set lock_timeout=<new value in seconds>;
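
LOCK_TIMEOUT is a session parameter that can also be set at the account or user level, so every new session inherits a tighter default. The 300-second value below is only an example:

alter account set lock_timeout = 300;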

Recommendations to avoid or reduce the impact of deadlocks

  • Avoid concurrent DML statements that involve DELETE or UPDATE against the same table
  • Avoid overly large transactions
  • Avoid grouping unrelated statements in transactions
  • Always set TRANSACTION_ABORT_ON_ERROR to true
  • Always keep AUTOCOMMIT = TRUE
  • Always set a LOCK_TIMEOUT value (the default of 12 hours is too long in many cases)
  • Always set STATEMENT_TIMEOUT_IN_SECONDS (the default of 2 days is too long in many cases; see the example after this list)
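
A minimal sketch of applying the parameter-related recommendations at the session level; the numeric values are illustrative and should be tuned to the expected run times of your jobs:

alter session set transaction_abort_on_error = true;
alter session set autocommit = true;
alter session set lock_timeout = 300;  -- 5 minutes instead of the 12-hour default
alter session set statement_timeout_in_seconds = 3600;  -- 1 hour instead of the 2-day default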

Closing Thoughts

Deadlocks are rare, but they have a significant impact on the cost and accuracy of data analytics. This article describes the cases in which a deadlock can occur in Snowflake, explains how to monitor Snowflake for potential deadlocks, and offers design recommendations that minimize deadlocks and their impact. I believe these recommendations will help when architecting Snowflake for optimal results.

Need Help with Data Integration Into Snowflake?

Are you looking to gain a better understanding of what approaches, solutions, and tools are available in the data integration space and how to best address your specific integration requirements?

Hashmap's Data Integration Workshop is an interactive, two-hour experience for you and your team where we provide a high-value, vendor-neutral sounding board to help you accelerate your data integration decision-making and selection process. Based on our experience, we'll talk through best-fit options for both on-premises and cloud-based data sources and approaches to address a wide range of requirements. Sign up today for our complimentary workshop.

Jalindar Karande is a Lead Data/Cloud Engineer with Hashmap, an NTT DATA Company providing Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.

Be sure and connect with Jalindar on LinkedIn and reach out for more perspectives and insight into accelerating your data-driven business outcomes.
