Monday, September 24, 2007

SQL Confusion

It is bizarre how often I find laughable and silly code in production databases that are part of software which costs millions of dollars.

I'm sure there will be more of these to come. Here's my contribution for today.

SELECT distinct
...,
(select DATEADD(mi,-DatePart(mi, GetDate()), DATEADD(ss,-(DatePart(ss, GetDate())), GetDate()))) as orecordeddtm

WHOA! Run that statement and you'll see it's not only convoluted but it's wrong, too: it leaves the milliseconds portion of the time still on there (it's a datetime column). Here's my proposed simple and working alternative:

DateAdd(hh, DateDiff(hh, 0, GetDate()), 0)

The query is DISTINCT, which isn't a problem in this case because the expression is GetDate(). But what if the author of this used the same logic for a column? The left-in milliseconds would be messing everything up. Plus, why is the expression inside its very own select clause? Bizarre. Which of the following two statements makes more sense to YOU?

SELECT 1 + 1
SELECT (SELECT 1) + (SELECT 1)

Interesting is about the kindest description I can come up with for this.

Thursday, September 6, 2007

Mixed-Up Statement Types

I am going to start trying to put together the basic "What you need to know to get started with SQL Server." I see the same exact problems over and over again in SQL help forums. FAQs are good, but there are dozens and dozens of them, each generally solving a specific problem. I see a need for something before that. People are getting so confused they don't even know what to ask, let alone know what kind of FAQ is going to help them with their problem.

So I'll one at a time make some posts about different conceptual issues I see and then finally I'll pull it all together as the "Top 10 Concepts" for beginners. Or whatever name seems right at the time. :)

So today I'm going to start with the different types of statements in T-SQL, and particularly the difference between expressive, procedural, and manipulative statements. Other kinds of statements we won't concern ourselves with now are data definition, data control, and transaction control. People generally don't get as confused about those.

Expressive Statements

An expression consists of one or more literal values or functions tied together with operators, which when evaluated in the correct order result in a value or collection of values.

In most cases an expression returns only a single value. In computing there is a special term for "a thing that is only a single value" called a scalar. A composite or many-valued expression may be a list, array, or record. In SQL Server there are some cases where a rowset can be treated as a list and used in an expression.

Here are some example expressions in T-SQL:

Power(2, 10)
-- The literal values 2 and 10 are operated on by the built-in function power which returns a single value.

(18 + 23) * 7 % 5
— The literal values 18, 23, 7, and 5 are combined with operators: addition, grouping (parentheses), multiplication, and modulus, to return a single value.

CASE ... WHEN ... <WHEN ...> ELSE ... END

These are all scalar expressions. I'll give an example of a multi-valued expression later because to use it one has to mix some of the statement types.

Let me call attention specifically to the fact that with only one exception, all parts of an expression are evaluated. CASE statements do support "short-circuit" logic where the evaluation stops with the last WHEN condition that is true. But in general, you should think about expressions as an all-or-nothing proposition. Nothing can be done with all the parts of an expression until the whole thing is evaluated into a final and single scalar.

Procedural Statements

Procedural statements are called that because there is some procedure that must be followed. It isn't a simple case of order-of-operations resulting in a single value. There is in fact no value expressed at all. Here are some examples:

IF <true-or-false expression>
WHILE <true-or-false expression>
RETURN <integer expression>
THEN
ELSE
BEGIN
END
GOTO
RAISERROR(...)
PRINT

These all tell the query parser what statement it should proceed to next. Note that some of these procedural statements expect expressions immediately following. The server, while executing the statements, has to evaluate the expression in order to know what it is supposed to do. Let me call attention specifically to the fact that not every statement may be executed, and they may be executed repeatedly or out of order.

Manipulative statements

These are normally called queries (though some of them do things instead of just asking questions as the word query implies).

SELECT
INSERT
UPDATE
DELETE
UNION

The Point Of All This

Now that you know the three main kinds of statements (and I won't rule out the possibility of there being more or of there being subclassifications of these) the key concept you must know to get along well with SQL Server is that when a certain kind of statement is expected, you can't use a different one in its place.

Here are some examples of confusing one kind of statement with another.

-- Example 1
IF (IF @b = 2 THEN @i ELSE @j) = 1 PRINT 'True' ELSE PRINT 'False'

IF expects a true-false expression after it. You can't put another procedural statement there.

-- Example 2
EXECUTE CASE WHEN @Action = 'Kill' THEN MyKillingSP ELSE MyNonKillingSP END

EXECUTE isn't looking for an expression. It wants an SP name. Instead, do

IF @Action = 'Kill' EXECUTE MyKillingSP ELSE EXECUTE MyNonKillingSP.

There. EXECUTE is happy now because it has an SP name immediately following in each case.

-- Example 3 - An actual example from online
RETURN
   (
      IF @LINEID = '123'
         BEGIN
            select * FROM TABLE1
         END
      ELSE
         BEGIN
            SELECT * FROM TABLE2
         END
      END
   )

In a stored procedure, RETURN expects an integer and in a user-defined function it expects an expression or a rowset or nothing, depending on its type. You can't put procedural statements in. This is obviously a rowset-returning user-defined function. So it will have to be a table-variable returning one. Use procedural logic to insert the right rows into the table variable, and then simply RETURN.

-- Example 4
SELECT *
FROM MyTable
WHERE
   Status = (IF @Status IS NULL THEN '' ELSE @Status)

IF is a procedural statement. You can't just embed it in the middle of an expression. And the IF statement switches back to expressive again: the first keyword after THEN can't be an expression. What's supposed to be done with that? It has to be procedural or manipulative: SET or INSERT or GOTO or PRINT. If you ran the following all by itself as its own batch, you'd get an error:

''

So '' can't be the first thing after THEN.

The way to do an expressive if-then-else is with a CASE statement. I think it's worth spending a moment to explain that it has two forms:

--Simple equivalency
CASE CaseExpression WHEN TestEquivalentExpression THEN ResultExpression <WHEN ... THEN ...> ELSE ... END

CASE @i
   WHEN 1 THEN 'One'
   WHEN 2 THEN 'Two'
   WHEN 3 THEN 'Three'
   WHEN NULL THEN 'NULL'
 -- This doesn't work because "@i = NULL" will always fail.
 -- Use the next construction to test for this.

   ELSE 'Invalid'
END

-- fully expressive
CASE WHEN TrueOrFalseExpression THEN ResultExpression <WHEN ... THEN ...> ELSE ... END

CASE
   WHEN @i = 1 THEN 'One'
   WHEN @i = 2 THEN 'Two'
   WHEN @i = 3 THEN 'Three'
   WHEN @i IS NULL THEN 'NULL'
   ELSE 'Invalid'
END

But it resolves to a single final value. I believe (though I need confirmation on this) that the query engine expands the first syntax into the second. So be wary of using nondeterministic functions like Rand() in CASE statements. (Nullif() definitely gets expanded to a CASE statement so it has the same issue.)

SELECT *
FROM MyTable
WHERE
   Status = CASE WHEN @Status IS NULL THEN '' ELSE @Status END

This is just an example and may not be best coding practice.

Manipulative Statements As Expressions

Finally, there are some special cases where manipulative statements can be used in expressions or as expressions.

1. A manipulative statement that returns a single column and a single row can be used in most any place that an expression can be used. This is done by placing it in parentheses:

IF (
   SELECT OrderGroup FROM OrderStatuses WHERE OrderStatus = @OrderStatus
) = 2 ...

PRINT (SELECT Count(*) FROM MyTable)

2. A manipulative statement that returns a single column and many rows can be used in an expressive statement immediately following certain operators, again in parentheses. Those keywords are IN, EXISTS, and in conjunction with inequality operators, ANY, SOME, and ALL:

SELECT *
FROM Orders
WHERE OrderStatus IN (SELECT OrderStatus FROM OrderStatuses WHERE OrderGroup = 2)
-- Note this is better done with an INNER JOIN.

IF EXISTS (
   SELECT *
   FROM Orders
   WHERE
      ExpectedShipDate >= GetDate()
      AND OrderStatus = 1
) ...
-- note that because of EXISTS, no columns are returned, and the query engine stops as soon as it finds one row meeting the criteria.

SELECT *
FROM Orders
WHERE
   OrderStatus > ALL (
      SELECT OrderStatus
      FROM OrderStatuses
      WHERE OrderGroup = 2
   )
-- can use ANY and SOME as well as ALL

Note that some other SQL languages may allow multiple columns in some of these kinds of expressions.

That should be enough to get people started on understanding

Note: I might have mistakes or be missing some details in this blog entry. If I am, I'd appreciate you pointing it out so I can make the final version accurate and complete.