SQL UDTF and Stored Procedure for Confidence Interval in Snowflake
In my previous blog, we went over a simple implementation of confidence interval using SQL in Snowflake.
Even though the below code isn’t complicated, it’s annoying to execute this code every time you want to compute the confidence interval of last 2 years’ average temperature in New York City. Let’s simplify it!!!
select
round(avg(avg_temperature_air_2m_f), 3) average_temperature,
round(stddev(avg_temperature_air_2m_f), 3) std_temperature,
round(average_temperature + (1.96 * std_temperature/sqrt(count(1))), 3) confidence_interval_upper,
round(average_temperature - (1.96 * std_temperature/sqrt(count(1))), 3) confidence_interval_lower,
min(date_valid_std) first_day,
max(date_valid_std) last_day
from
history_day
where
postal_code like '10060'
;
SQL UDTF(User-Defined Table Function) and Stored Procedure will help us to execute the code easily.
First, UDTF can be done by putting the above code in $$. One thing is you should define a table with data type as below.
create database experiments;
use database experiments;
use schema public;
create or replace function confidence_interval_95()
returns table(average_temperature number(24, 3),
std_temperature float,
confidence_interval_upper float,
confidence_interval_lower float,
first_day date,
last_dayfloat date)
as
$$
select
round(avg(avg_temperature_air_2m_f), 3) average_temperature,
round(stddev(avg_temperature_air_2m_f), 3) std_temperature,
round(average_temperature + (1.96 * std_temperature/sqrt(count(1))), 3) confidence_interval_upper,
round(average_temperature - (1.96 * std_temperature/sqrt(count(1))), 3) confidence_interval_lower,
min(date_valid_std) first_day,
max(date_valid_std) last_day
from
GLOBAL_WEATHER__CLIMATE_DATA_FOR_BI.STANDARD_TILE.HISTORY_DAY
where
postal_code like '10060'
$$
;
You can see that the function is created.
Using SELECT command, the UDTF is executed as below.
select * from table(CONFIDENCE_INTERVAL_95())
;
The result is the same as my previous blog. So we don’t need to execute the SQL query to query confidence interval. Just simply use the UDTF with SELECT command.
SQL stored procedure isn’t much different than UDTF. However it has more structures than UDTF such as LANGUAGE, DECLARE, BEGIN and END. Don’t be afraid of it. As any Snowflake feature, it isn’t that difficult. Please see the example.
create or replace procedure confidence_interval_95_proc()
returns table(average_temperature number(24, 3),
std_temperature float,
confidence_interval_upper float,
confidence_interval_lower float,
first_day date,
last_dayfloat date)
language sql
as
$$
declare
res resultset default (select
round(avg(avg_temperature_air_2m_f), 3) average_temperature,
round(stddev(avg_temperature_air_2m_f), 3) std_temperature,
round(average_temperature + (1.96 * std_temperature/sqrt(count(1))), 3) confidence_interval_upper,
round(average_temperature - (1.96 * std_temperature/sqrt(count(1))), 3) confidence_interval_lower,
min(date_valid_std) first_day,
max(date_valid_std) last_day
from
GLOBAL_WEATHER__CLIMATE_DATA_FOR_BI.STANDARD_TILE.HISTORY_DAY
where
postal_code like '10060');
begin
return table(res);
end;
$$
;
The stored procedure we want to create is registered as we expected. Main code between 2 $$’s is the same as the UDTF.
Unlike UDTF’s, CALL command is used to execute the procedure. And, the result is what we expected.
call CONFIDENCE_INTERVAL_95_PROC();
Can we simplify it more? YES!!! Take advantage of the UDTF that we created together. That is, replacing the main SQL code with the UDTF CONFIDENCE_INTERVAL_95. The only difference is res resultset default (select * from table(CONFIDENCE_INTERVAL_95()));
. Isn’t it easy?
create or replace procedure confidence_interval_95_proc()
returns table(average_temperature number(24, 3),
std_temperature float,
confidence_interval_upper float,
confidence_interval_lower float,
first_day date,
last_dayfloat date)
language sql
as
$$
declare
res resultset default (select * from table(CONFIDENCE_INTERVAL_95()));
begin
return table(res);
end;
$$
;
The result is the same. No problem at all!!!
In this blog, we take a closer look at how to create a UDTF and stored procedure from a SQL query. It takes more time to create a UDTF and Stored Procedure than to execute a SQL query. However, if you are going to use the SQL query frequently, I recommend creating a UDTF or stored procedure. In addition, UDTFs can be easily converted to stored procedures such as select * from table(CONFIDENCE_INTERVAL_95())
.
I hope this blog helps your snowflake and data science journey ^_______^