Getting Started with Snowflake Tasks

Photo by Renáta-Adrienn on Unsplash

In this blog, we are going to discuss how Snowflake Task can help you. A task in Snowflake is more of a scheduler like any other database or OS. Some similar examples are database scheduler, CRON job, etc. In Snowflake, you can schedule a single SQL statement or a Stored Procedure.

What is a task?

A Snowflake task in simple terms is a scheduler that can help you to schedule a single SQL or a stored procedure.

A task can be very useful when combined with streams to make an end-to-end data pipeline. The Snowflake task engine has CRON and NON-CRON variant scheduling mechanisms. The CRON variant should look familiar syntactically if you are a Linux guy.

Snowflake ensures only one instance of a task with a schedule (i.e. a standalone task or the root task in a tree of tasks) is executed at a given time. If a task is still running when the next scheduled execution time occurs, then that scheduled time is skipped.

When you create the task, it always gets created in suspended mode. So that means you have to explicitly resume the task to get things moving. You can use the below command to resume the task.

ALTER TASK <TASK_NAME> RESUME;

To avoid unexpected task executions due to daylight saving time, either:

  • Do not schedule tasks to run at a specific time between 1 AM and 3 AM (daily, or on days of the week that include Sundays), or
  • Manually adjust the cron expression for tasks scheduled during those hours twice each year to compensate for the time change due to daylight saving time.

Does a role that has only ownership privilege on a task is enough to execute a task?

No, a role having only OWNERSHIP privilege on a task is not enough to execute a task. It should also have EXECUTE TASK privilege. When you create a task you should have the privilege to execute the task if you don’t have an execute privilege you can get one from ACCOUNTADMIN or any privileged user as below:

GRANT EXECUTE TASK ON ACCOUNT TO ROLE <ROLE_NAME>;

Let’s see the action in the demo below:

How to schedule Snowflake Task in Cron Job Style?

You can schedule a task in Cron Job using the below sample syntax which is scheduled to run at 08:05 PM UTC every day:

You can see below the meaning of each * if are not aware of Cron jobs.

# __________ minute (0-59)
# | ________ hour (0-23)
# | | ______ day of month (1-31, or L)
# | | | ____ month (1-12, JAN-DEC)
# | | | | _ day of week (0-6, SUN-SAT, or L)
# | | | | |
# | | | | |
* * * * *

What is the simple tree of tasks in Snowflake?

In Snowflake, you can create a B-Tree-like task structure. You can have only 1 root task and all child tasks are linked to the root/predecessor tasks based on task dependency(i.e. Before or after). Root tasks will have scheduler defined, and all child tasks follow sequential execution as per their dependency defined.

A simple tree of tasks is limited to a maximum of 1000 tasks total (including the root task). An individual task in the tree is limited to a single predecessor task; however, a task can have a maximum of 100 children tasks (i.e. other tasks that identify the task as a predecessor).

All tasks must be owned by the same task owner in a simple tree of tasks and all objects should also be stored in the same database and schema.

A quick demo to explain what we read above:

How we can call Stored Procedure using Task?

Let’s see the same in action

What is Task Overlapping?

In Snowflake, a particular tree of tasks is allowed to run one instance at a time. Meaning, say if TASK 1 is scheduled to run every 5 min, and it has 2 child tasks. If for any reason one of the child tasks didn’t complete on planned time and overlapped the next run(after the next 5 min) in that scenario Snowflake make sure that the scheduled next run should be skipped at least once. This behavior is default and can be overridden by making the below parameter true during the creation of a root task to overlap and start the next run even if the first root task has yet to be completed. This is not recommended, but if you know that your execution has no dependency/issue with the previous run. The below parameter can be set during the creation or alteration of the root task.

ALLOW_OVERLAPPING_EXECUTION = TRUE | FALSE(Default)

To better align a task tree with the schedule defined in the root task:

  1. If feasible, increase the scheduling time between runs of the root task.
  2. Consider increasing the size of the warehouse that runs large or complex SQL statements or stored procedures in the task tree.
  3. Analyze the SQL statements or stored procedure executed by each task. Determine if code could be rewritten to leverage parallel processing.

What is Task Versioning for a run?

When we execute a task, it creates a version for itself, including all properties for all tasks in the tree. Root task runs under the same version of code and setup unless it is completed.

If we try to modify any of the child task code which is yet to run, it will not allow unless the root task is suspended.

But if you try to modify the root task, it will allow you to do so but it will be suspended after modification, hence all future scheduled runs of the root task are canceled. To see the reflection of changes in a task, the root task has to be resumed again.

If the definition of a stored procedure called by a task changes while the tree of tasks are executing, the new programming could be executed when the stored procedure is called by the task in the current run.

Let’s see what it's actually meaning in the demo below

What is the link between Predecessor & Child Task and how it behaves when root tasks are dropped?

Say you have 3 tasks. Task1 is the root task and Task2 is the child task for Task 1 and similarly, Task 3 is the child task for TASK2.

Scenario 1: If we remove TASK1 as predecessor from Task 2 what behavior changes will happen to TASK2.

If the predecessor for a child task is removed, then the former child task becomes either a standalone task or a root task, depending on whether other tasks identify this former child task as their predecessor.

Scenario 2: If we drop TASK1 from the database, then what will be the behavior of the immediate child.

If a predecessor task is dropped, or if ownership of a predecessor task is transferred to another role, then all former child tasks that identified this task as the predecessor become either standalone tasks or root tasks, depending on whether other tasks identify these former child tasks as their predecessor.

Let’s see both scenarios in action:

Things to Remember:

  • A brief lag occurs after a parent task finishes running, and any child task is executed.
  • The current owner of all tasks that compose the tree of tasks is dropped (using DROP ROLE). Ownership of the objects owned by the dropped role is transferred to the role that executes the DROP ROLE command.
  • Ownership of all tasks that compose the tree of tasks is explicitly transferred to another role (e.g. by executing GRANT OWNERSHIP on all tasks in a schema).
  • If the predecessor for a child task is removed (using ALTER TASK … REMOVE AFTER), then the former child task becomes either a standalone task or a root task, depending on whether other tasks identify this former child task as their predecessor.
  • If a predecessor task is dropped (using DROP TASK), or if ownership of a predecessor task is transferred to another role (using GRANT OWNERSHIP), then all former child tasks that identified this task as the predecessor become either standalone tasks or root tasks, depending on whether other tasks identify these former child tasks as their predecessor.

Hope this blog & YouTube video helps you to get insight on Snowflake Tasks. If you are interested in learning more details about Snowflake Task, you can refer to snowflake documentation. Feel free to ask a question in the comment section if you have any doubts regarding this. Give a clap if you like the blog. Stay connected to see many more such cool stuff. Thanks for your support.

You Can Find Me:

Follow me on Medium: https://rajivgupta780184.medium.com/

Follow me on Twitter: https://twitter.com/RAJIVGUPTA780

Connect with me in LinkedIn: https://www.linkedin.com/in/rajiv-gupta-618b0228/

Subscribe to my YouTube Channel: https://www.youtube.com/c/RajivGuptaEverydayLearning

#Keep learning #Keep Sharing #Everyday Learning.

References:-

--

--