SQL & Snowflake- New capabilities

Published in

Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

8 min readApr 26, 2024

SQL has been one of the most powerful language used a across almost all the teams. And this language & its options to use has evolved over period of time. Snowflake is also ensuring that it makes the user experience while using SQL seamless by continuously improving and adding some new capabilities on how this can be used to enable the data workloads & also to improve the productivity.

In this blog, I would try to capture few of the important and latest announcements on SQL capabilities on Snowflake.

“SELECT *” with “EXCLUDE”
“SELECT *” with “RENAME”
REPLACE
MIN_BY & MAX_BY aggregate functions.
GROUP BY ALL
ILIKE
IF NOT EXISTS, IF EXISTS
ARRAY_SORT, ARRAY_MIN, ARRAY_MAX.
BANKER’s ROUNDING
“SELECT *” alternative “TABLE”

1. SELECT * with “EXCLUDE”:

When creating SQL, it’s usual practice to pick every column in a table except for a “select handful”. Naming each column in the SELECT clause is a typical workaround to accomplish this, but it can be time-consuming when there are a lot of columns. This is exactly where “EXCLUDE” keyword along-with SELECT can help :

---The below query gives all the columns--
SELECT * FROM demo_db.demo_schema.customer_01 LIMIT 10;

--The below query gives all columns except "C_CURRENT_CDEMO_SK" --
SELECT * EXCLUDE C_CURRENT_CDEMO_SK FROM demo_db.demo_schema.customer_01 LIMIT 10;

--The below query gives all columns except "C_CURRENT_CDEMO_SK" & "C_CURRENT_ADDR_SK" -- 
SELECT * EXCLUDE (C_CURRENT_CDEMO_SK,C_CURRENT_ADDR_SK)  FROM demo_db.demo_schema.customer_01 LIMIT 10;


Hence the last 2 are the examples of how EXCLUDE can be used in day to day 
usage of SQL constructs.

2. SELECT * with “RENAME”:

We can also rename individual columns in our SELECT statement using Snowflake. Consider it the older sibling of the previously discussed exclude feature; it operates similarly and the example is given as below:


--Renaming one column in the select statements.
SELECT * RENAME C_CUSTOMER_ID AS CUST_ID FROM demo_db.demo_schema.customer_01 limit 10;

--Renaming more than one columns in select statements.
SELECT * RENAME (C_CUSTOMER_ID AS CUST_ID, C_CURRENT_CDEMO_SK AS CDEMO_SK) FROM demo_db.demo_schema.customer_01 limit 10;


--Combining the RENAME & EXCLUDE keywords:
SELECT * EXCLUDE C_CUSTOMER_ID RENAME (C_CURRENT_CDEMO_SK AS CDEMO_SK) FROM demo_db.demo_schema.customer_01 limit 10;

3. REPLACE:

Another useful SQL feature in our Snowflake is called REPLACE, that allows us to easily change the values of certain columns within our statement. This is useful when we want to specifically adjust the values of one or more columns. Examples are given as below:

--Below query changes the values of C_FIRST_NAME to upper cases.
SELECT * REPLACE (UPPER(C_FIRST_NAME) AS C_FIRST_NAME) FROM demo_db.demo_schema.customer_01 LIMIT 10;

--Adjusting multiple column values using REPLACE keyword.
SELECT * REPLACE (UPPER(C_FIRST_NAME) AS C_FIRST_NAME, UPPER(C_LAST_NAME) AS C_LAST_NAME ) FROM demo_db.demo_schema.customer_01 LIMIT 10;

4. MIN_BY & MAX_BY:

These are the functions which allows to find the rows containing the minimum or maximum value of the column and returns the value of another column in that row. With this queries like SELF-JOINs or even SUB-Query can be avoided. Let us see the example, and please note over here we would see the comparison of traditional & modern way of writing the query.

If above is my data set, then if I am supposed to extract against each ORDER_ID what would be my latest status of the order, then how the query would look like traditionally & with modern method. We would see how simpler it is to get the values.

--Below is how they are used.
SELECT ORDER_ID, MAX_BY(STATUS, ORDER_DT) FROM DEMO_MINBYMAXBY_01 GROUP BY ORDER_ID;
SELECT ORDER_ID, MIN_BY(STATUS, ORDER_DT) FROM DEMO_MINBYMAXBY_01 GROUP BY ORDER_ID;

5. GROUP BY ALL:

In most of the cases where there is a need to do aggregation on multiple columns we tend to write GROUP BY with clauses like “GROUP BY C1, C2, C3, etc..” OR “GROUP BY 1,2,3,4”. Few of the downside with this approach is think of it like if we want to “add” OR “remove” some of the columns in the SELECT clause then our SQL query would FAIL.

With this option we just have to mention the keyword as “GROUP BY ALL” and that is it we do not have to worry about if we are keeping all column in the GROUP BY clause, or even if we add/forget anything this should take care of it. Below is how it can be used:

--Below is how we can use the GROUP BY ALL clause.
SELECT 
    O_ORDERSTATUS, 
    O_ORDERPRIORITY, 
    SUM(O_TOTALPRICE) 
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.ORDERS 
GROUP BY ALL;


-- Some of the caveats like we cannot combine other columns alongwith GROUP BY ALL:
SELECT 
    O_ORDERSTATUS, 
    O_ORDERPRIORITY, 
    SUM(O_TOTALPRICE) 
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.ORDERS 
GROUP BY ALL, O_ORDERSTATUS;           -----> THIS WOULD FAIL.

6. SELECT * “ILIKE”

ILIKE with SELECT clause can become very handy when we want to SELECT only limited columns with some pattern from a table. Imagine that we are attempting to obtain every column whose name contains the word “order” and then fire query. The ILIKE keyword is useful in this situation.

While the percent sign (%) can be used to match any sequence of zero or more characters, the underscore (_) symbol can match any single character. The matching procedure for patterns is case-insensitive, as the ILIKE operator implies.


--Below is how we just get the column names which has "order" key word in it.
SELECT * ILIKE '%order%' 
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.ORDERS 
LIMIT 10;


--Another way to get the output with pattern matching.
SELECT * ILIKE '%_r' 
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.ORDERS 
LIMIT 10;

7. IF EXISTS, IF NOT EXISTS

For developers and database managers, adding or removing columns is a standard procedure. However, there are times when we should only add a column if it isn’t there already or remove a column if it is. Snowflake extends the “IF [NOT] EXISTS” clause to the column level for this reason.

Let us see them as an example:

Consider the below table as my BASE table:

-- To add a new column only if it do not exist::
ALTER TABLE EMPLOYEE_TABLE ADD COLUMN IF NOT EXISTS DEPARTMENT_NAME VARCHAR;
  --> As we saw the column "department_name" was not present and hence it would get added.

-- To drop an existing column only if it’s present::
ALTER TABLE EMPLOYEE_TABLE DROP COLUMN IF EXISTS LAST_NAME;
--> As we saw the column "LAST_NAME" was present and hence it would get dropped.

8. ARRAY_SORT, ARRAY_MIN, ARRAY_MAX

These functions are really essential when we want to process the array data in an ordered way. Also these functions have the potential to consider NULL as one of there comparator operators and accordingly gives the output. Some examples are given as below:


--This query would give elements of an input array in a sorted manner.
SELECT ARRAY_SORT(ARRAY_GENERATE_RANGE(0,100,20), FALSE);
   o/p:: [80,60,40,20,0]

--This query would maximum value in this element even if we have NULLs in it.
SELECT ARRAY_MAX([20, 0, NULL, 10, NULL]);
    o/p:: 20

--This query would give minium value in this element even if we have NULLs in it.
SELECT ARRAY_MIN([20, 0, NULL, 10, NULL]);
    o/p:: 10

9. BANKER’s ROUNDING

Banker’s rounding or the option “rounding half to even” is something which is relatively new in Snowflake. Folks would now have the option to select between rounding half to even and this rounding mode with the addition of the new parameter. This is used in conjuction with Snowflake’s ROUND() function. This function allows to choose between 2 options i.e., “HALF_TO_EVEN” and “HALF_AWAY_FROM_ZERO”. Let us see some of the examples:

If we see above, then HALF_TO_EVEN gives us the value as “2” which is a nearest even number of 2.5

If we see above, then HALF_TO_EVEN gives us the value as “4” which is a nearest even number of 3.5

10. Keyword “TABLE” with new capability

We know that “SELECT *” gives us the output from the table by giving us all the columns. Now we have an alternative as well i.e., just “TABLE”. Below is the example:

SELECT * FROM EMPLOYEE_TABLE; --- This is the conventional way.

TABLE EMPLOYEE_TABLE; --- This is the new way. The o/p is still going to be the same.

Let us now discuss some of the PRIVATE PREVIEW features on Snowflake . These mostly can be used so that it can be used as DataOps or DevOps use cases.

“CREATE OR ALTER TABLE”:

Now, this is feature i.e., CREATE OR ALTER TABLE where we keep on adding new columns is a private preview feature. This command is quite helpful when dealing with slowly changing schemas, particularly when there is ambiguity about a table’s current state. The benefit is that you don’t have to start from scratch every time you make a table or figure out exactly which ALTER has to be done.

/*This is how the syntax is expected to work. 
/*Please note it is a Private preview feature hence we need follow Snowflake to get more insights about it. */

CREATE OR ALTER TABLE EMPLOYEE_TABLE(
  emp_id NUMBER,
  emp_name VARCHAR,
  salary NUMBER
);

SUMMARY:

I have shared some of the key features which are essentially new within Snowflake and also given some perspective on the potential use cases that these features solves. Please note that Snowflake is an well established and also an evolving platform for good. Hence actively following them and getting to know about all new features & capabilities helps in solving varied use cases.

Please keep reading my blogs it is only going to encourage me in posting more such content. You can find me on LinkedIn by clicking here and on Medium here. Happy Learning :)

Awarded consecutively as “Data Superhero by Snowflake for year 2024 & 2023”. Links: here

Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or future employers.