SQL Gotchas and Other Things You Should Know

The clauses are logically processed in the following order:

In the WHERE clause, you specify a predicate or logical expression to filter the rows returned by the FROM phase. Only rows for which the logical expression evaluates to TRUE are returned by the WHERE phase to the subsequent logical query processing phase.

The GROUP BY phase allows you to arrange the rows returned by the previous logical query processing phase in groups. The groups are determined by the elements you specify in the GROUP BY clause.

If the query involves grouping, all phases subsequent to the GROUP BY phase—including HAVING, SELECT, and ORDER BY—must operate on groups as opposed to operating on individual rows. Each group is ultimately represented by a single row in the final result of the query. This implies that all expressions that you specify in clauses that are processed in phases subsequent to the GROUP BY phase are required to guarantee returning a scalar (single value) per group.

Elements that do not participate in the GROUP BY list are allowed only as inputs to an aggregate function such as COUNT, SUM, AVG, MIN, or MAX.

Note that all aggregate functions ignore NULL marks with one exception—COUNT(*).

If you want to handle only distinct occurrences of known values, specify the DISTINCT keyword in the parentheses of the aggregate function.

With the HAVING clause, you can specify a predicate to filter groups as opposed to filtering individual rows, which happens in the WHERE phase. Only groups for which the logical expression in the HAVING clause evaluates to TRUE are returned by the HAVING phase to the next logical query processing
phase. Groups for which the logical expression evaluates to FALSE or UNKNOWN are filtered out.

Because the HAVING clause is processed after the rows have been grouped, you can refer to aggregate functions in the logical expression.

The SELECT clause is where you specify the attributes (columns) that you want to return in the result table of the query. You can base the expressions in the SELECT list on attributes from the queried tables, with or without further manipulation.

Remember that the SELECT clause is processed after the FROM, WHERE, GROUP BY, and HAVING clauses. This means that aliases assigned to expressions in the SELECT clause do not exist as far as clauses that are processed before the SELECT clause are concerned. A very typical mistake made by programmers who are not familiar with the correct logical processing order of query clauses is to refer to expression aliases in clauses that are processed prior to the SELECT clause. Here’s an example of such an invalid attempt in the WHERE clause.

The ORDER BY clause allows you to sort the rows in the output for presentation purposes. In terms of logical query processing, ORDER BY is the very last clause to be processed.

One of the most important points to understand about SQL is that a table has no guaranteed order, because a table is supposed to represent a set (or multiset, if it has duplicates), and a set has no order. This means that when you query a table without specifying an ORDER BY clause, the query returns a table result, and SQL Server is free to return the rows in the output in any order. The only way for you to guarantee that the rows in the result are sorted is to explicitly specify an ORDER BY clause. However, if you do specify an ORDER BY clause, the result cannot qualify as a table, because the order of the rows in the result is guaranteed. A query with an ORDER BY clause results in what standard SQL calls a cursor—a nonrelational result with order guaranteed among rows. You’re probably wondering why it matters whether a query returns a table result or a cursor. Some language elements and operations in SQL expect to work with table results of queries and not with cursors; examples include table expressions and set operators, which I cover in detail in Chapter 5, “Table Expressions,” and in Chapter 6, “Set Operators.”

The ORDER BY phase is in fact the only phase in which you can refer to column aliases created in the SELECT phase, because it is the only phase that is processed after the SELECT phase.

T-SQL allows you to specify elements in the ORDER BY clause that do not appear in the SELECT clause, meaning that you can sort by something that you don’t necessarily want to return in the output.

However, when DISTINCT is specified, you are restricted in the ORDER BY list only to elements that appear in the SELECT list. The reasoning behind this restriction is that when DISTINCT is specified, a single result row might represent multiple source rows; therefore, it might not be clear which of the multiple possible values in the ORDER BY expression should be used.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>