edX Microsoft: DAT201x Querying with Transaction-SQL

This post is my notes for https://courses.edx.org/courses/course-v1:Microsoft+DAT201x+2015_T4
The source files of the learning materials can be found https://github.com/MicrosoftLearning/QueryingT-SQL
I’m omitting Module 1 and 2 from my notes.

My honour code certificate can be found here https://courses.edx.org/certificates/be978f16f49148c4b21278a2a5bed46f

Module 3: Querying Multiple Tables with Joins

Image courtesy of C.L. Moffatt http://www.codeproject.com/KB/database/Visual_SQL_Joins/Visual_SQL_JOINS_orig.jpg

Module 4: Using Set Operators

  • UNION returns a result set of distinct rows combined from all statements.
  • UNION removes duplicates during query processing (affects performance).
  • UNION ALL retains duplicates during querying processing.
  • INTERSECT to return only rows that are returned by both queries.
  • EXCEPT to return rows from the first query that are not returned by the second query.

Module 5: Using Functions and Aggregating Data

  • Scalar functions return a single value based on zero or more input parameters. e.g. LTRIM(…)
  • Logical functions return Boolean values (true or false) based on an expression or column value. e.g. ISNUMERIC(…), IIF(…) and CHOOSE(…). The CHOOSE function returns the value in the 1-based ordinal position.
  • Window functions are used to rank rows across partitions or “windows”. Window functions include RANK, DENSE_RANK, NTILE, and ROW_NUMBER.
    The word window here refers to a set of rows.
  • Aggregate functions are used to provide summary values for mulitple rows — for example, the total cost of products or the maximum number of items in an order. Commonly used aggregate functions include SUM, COUNT, MIN, MAX, AVG, COUNT_BIG, GROUPING_ID, and STDEV.

  • GROUP BY with aggregate functions to return aggregations grouped by one or more columns or expressions.
  • All columns in the SELECT clause that are not aggregate function expressions must be included in a GROUP BY clause.
  • The order in which columns or expressions are listed in the GROUP BY clause determines the grouping hierarchy.
  • HAVING clause to filter the groups that are included in the query results. The WHERE clause comes into action before the HAVING clause. WHERE clause eliminates rows not wanted, while HAVING clause eliminates groups not wanted.

For documentation refer to https://msdn.microsoft.com/en-us/library/ms174318.aspx

Module 6: Using Subqueries and APPLY

  • Scalar subqueries return a single value. e.g. WHERE customerId = (SELECT MAX(customerId) FROM …)
  • Multi-valued subqueries return a single-column rowset. e.g. WHERE customerId IN (SELECT customerId from …)
  • Self-contained queries have no connection with the outer query, and it only passes values to the outer query. For instances, scalar subqueries and multi-valued subqueries.
  • Correlated subqueries reference objects in the outer query. See documentation https://technet.microsoft.com/en-us/library/ms187638

  • APPLY operator enables you to execute a table-valued function for each row in a rowset returned by a SELECT statement. Conceptually, this approach is similar to a correlated subquery.
  • CROSS APPLY returns matching rows, similar to an inner join. OUTER APPLY returns all rows in the original SELECT query results with NULL values for rows where no match was found.

Module 7: Using Table Expressions

  • Views are database objects that encapsulate SELECT queries.
  • Temporary tables are prefixed with a # symbol and stored in a temporary workspace (the tempdb database in SQL Server). Temporary tables are automatically deleted when the session in which they were created ends. Excessive use of temporary tables can negatively affect overall database server performance. While global temporary tables are created with ## symbol.
  • Table variables are prefixed with a @ symbol and are stored in memory. Table variables are scoped to the batch in which they are created. Table variables work best with small sets of data. e.g. DECLARE @varProduct table (…)
  • Table-Valued Functions (TVFs) are functions that return a table of rowset. e.g. CREATE FUNCTION …(…params) RETURNS TABLE AS …
  • Derived table is a subquery that generates a multicolumn rowset. Derived table must have AS e.g. SELECT … FROM (SELECT … ) AS tmp WHERE …
    Or it can be SELECT … FROM (SELECT…) AS tmp(output1, output2…) WHERE…

  • Common Table Expressions (CTEs) provide a more intuitive syntax or defining rowsets than derived tables, and can be used mulitple times in the same query. CTEs can be used to define recursive queries.
  • To perform recursion, use Common Table Expressions (CTEs). And if we need to perform recursive queries, CTE is pretty much the way to go.
Image courtesy of edX Microsoft: DAT201x Querying with Transact-SQL https://courses.edx.org/courses/course-v1:Microsoft+DAT201x+2015_T4/info

Module 8: Grouping Sets and Pivoting Data

  • GROUPING SETS to define custom groupings.
  • ROLLUP to include subtotals and a grand total for hierarchical groupings.
  • CUBE to include all possible groupings.
  • GROUPING_ID function indicates whether the specified column or expression has been used to generate the row. e.g. SELECT GROUPING_ID(…) FROM …
Image courtesy of edX Microsoft: DAT201x Querying with Transact-SQL https://courses.edx.org/courses/course-v1:Microsoft+DAT201x+2015_T4/info
  • PIVOT to re-orient a rowset by generating mulitple columns from values in a single column.
  • UNPIVOT to re-orient mulitple columns in a an existing rowset into a single column.

Additional Resources

  • WITH clause is a way to write subqueries, making the entire statement more readable.