Wednesday, January 16, 2013

On Conversion, Data Types, and Flawed Code

I came across the following expression recently:

DisplaySequence = Convert(int, IsNull(A.DisplaySequence, 10000))
I was immediately struck by an inconsistency: IsNull is being handed an int literal as its second argument, yet the whole thing is wrapped in a Convert to int. That doesn't add up. If the Convert is needed in the first place, then the DisplaySequence column is not int already, so either the 10000 will be implicitly converted to some non-integer data type, or the column will be converted to match the 10000 inside the IsNull. Neither of those makes any sense. If a Convert is needed at all, it should come inside the IsNull, around the column, so that both operands are already the same data type and no strange conversions happen between them.
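If a conversion really were needed (it isn't, as we'll see), something along these lines would at least be internally consistent, converting the column before IsNull compares it to the int literal:

DisplaySequence = IsNull(Convert(int, A.DisplaySequence), 10000)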

Also, if this is working in production code and hasn't been causing errors or improper ordering, then all the values in the DisplaySequence column, when not NULL, are already integers or an equivalent data type (as far as ordering is concerned)! And if any could be non-integers, then some way to extract the numbers from them or convert them to NULL would be needed. So all in all, this expression inherently makes no sense.

But let's address the question--could this column contain non-integers? A quick sp_help TableName reveals that the column is... drumroll please... int. So the Convert is definitely unnecessary, because the column is already int!

Why does it matter to make your expressions make sense? It was working, wasn't it, without any errors in production code? Who cares?

Well, it has been said that any programmer can write code that a computer can understand, but it takes a good programmer to write code that a human can understand. The very fact that I had to go and check the data type of the DisplaySequence column is proof that there is a cost to this kind of thing. There is no way for anyone to know that an inherently flawed expression happens to be immune to error, and that prompts research on the spot to find out. Inherently flawed expressions should be corrected immediately. My work to correct this today pays down the technical debt that was taken out when it was first written--and now the next visitor to this code won't have to waste time.

Here is the correct expression:

DisplaySequence = IsNull(A.DisplaySequence, 10000)

Now there is no inherent contradiction, and we can implicitly trust that DisplaySequence is of type int. Even if that turns out to be wrong, we've removed the eyesore that would trip up the next visitor to the code.

Monday, January 7, 2013

The Excluded Possibility

Consider this code:

SELECT
   Value =
      CASE      
         WHEN Value IS NULL AND Text IS NOT NULL THEN Text
         WHEN Text IS NULL AND Value IS NOT NULL THEN Value
         ELSE NULL
      END

Does a problem leap out at you? It should. What if Value and Text are both not NULL? Then the whole expression is NULL. There's no WHEN Text IS NOT NULL AND Value IS NOT NULL THEN ... case.

Maybe the query can never return a non-NULL value for both of the columns at the same time. But in that case, why bother checking for the NULLity of the other column in each condition? It doesn't make sense.

This can be rewritten much more sensibly:

SELECT
   Value = IsNull(Value, Text)

It's that simple. If the two columns can't both be non-NULL at the same time, this returns the same value as the whole prior expression. If they can both be non-NULL at the same time, well, we have a problem... but no bigger a problem than before. We would simply show one of the columns instead of a NULL. Perhaps here is the real solution:

SELECT
   Value = IsNull(Value, '') + IsNull(Text, '')

Something along these lines (perhaps with a separator in the case they both are non-NULL) is the only thing that makes sense in terms of explicitly checking the NULLity of both columns.
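For example, something along these lines (just a sketch; the separator choice is arbitrary) covers all four NULL/non-NULL combinations explicitly:

SELECT
   Value =
      CASE
         WHEN Value IS NOT NULL AND Text IS NOT NULL THEN Value + ' / ' + Text
         ELSE Coalesce(Value, Text)
      END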

Friday, September 19, 2008

Think about what you are doing!

Last statement in a stored procedure running in production:

SELECT @ReturnStatus = @@ERROR
IF(@ReturnStatus <> 0)
   RETURN (@ReturnStatus)

Making things as simple as possible isn't just for self-congratulatory back-patting. It's so that the next time you're looking at the code, it's easier to understand what it's doing. So let's make it simple:

RETURN @@ERROR;

Wednesday, May 7, 2008

Some Pain Is Avoidable - So Spare Yourself

Ever heard of a runaway F5? This is when you intend to highlight only a small section of SQL code but you miss for some reason and when you hit F5, the entire script you have loaded runs instead of just the small part you tried to highlight.

It's not enough to put a syntax error at the top of your script, because there might be some batch-separating GOs in there--an error only stops the batch it's in. So there are a few things I routinely do to protect against this, because the results of such a mistake can be, shall we say, painful.

1. Put this at the top:

create database trash
GO
use trash
GO

This will make most of your code run in a harmless context. The GO statements are important: they separate the CREATE DATABASE and USE into their own batches so that they run without errors.

If your code uses three-part names to point to specific databases, this will not help, so keep reading.

2. Don't use the database name in dev-only code, and think carefully about your use of database context in your production code. If you put a USE statement in your manually-attended script, hide it inside comments:

/*
USE Somedatabase
*/

-- USE Somedatabase

The block comment way makes highlighting the whole line easier if and when you need to run it.

3. Comment out the modification part of your statements and turn them into selects:

Instead of the following query:

UPDATE MyTable
SET Salesrep = 42
WHERE CustID = 16

This is a much safer query to leave lying around in your script:

SELECT
   SalesRep,
   NewSalesRep = 42,
   *
-- UPDATE M SET Salesrep = 42
FROM MyTable M
WHERE CustID = 16

To actually run your update, you highlight from the word UPDATE (skipping the comment characters at the beginning of the line) through the end of the WHERE clause. You can still make a mistake by leaving off the WHERE clause, but at least you know no updates will happen unless you do some careful selecting.

Here's one rough idea for a way to protect against forgetting the where clause:

SELECT
   SalesRep,
   NewSalesRep = 42,
   *
-- BEGIN TRAN UPDATE M SET Salesrep = 42
FROM MyTable M
WHERE CustID = 16
ROLLBACK TRAN
-- COMMIT TRAN

You can still blow it but it's harder. Any full-line end selection is protected.

And another, simpler and possibly better:

SELECT
   SalesRep,
   NewSalesRep = 42,
   *
-- UPDATE M SET Salesrep = 42
FROM MyTable M /*
*/ WHERE CustID = 16

A full-line end selection is again protected.

4. Be paranoid. Don't let yourself get comfortable. THINK about each update you do to real, live data. Make an extra "unnecessary" backup. Convert your modification statements to selects and check them. Select the results into a new table and do some joins to make sure they were right. Run the update in a transaction and do some queries to verify it. Keep potential locking and blocking in mind, though: if people are accessing the table you're modifying, write the check statements ahead of time and run the update and the checks all at once:

BEGIN TRAN

UPDATE Blah -- My potentially dangerous modification

SELECT * FROM Blah B INNER JOIN Gorp G ON Splat = Bonk

ROLLBACK TRAN

Only run the whole block. If you don't like what you got, fix it, and run it again.

Be wary of leaving open any transaction, even one you don't think will affect other people. A stray SQL editor session holding some kind of lock is annoying and can cause problems.

I hope these ideas are useful to you and save you some pain! I had to learn this the hard way. I am simply fortunate that I never did anything REALLY bad before acquiring these habits.

Update 2008-05-16

5. If you must have a USE statement, check the server:

IF @@ServerName = 'DevServerName' USE DevDatabase

Depending on what you're doing in your code and how potentially destructive it is, you can do some pretty clever things to help prevent problems. Use the tools at your disposal.
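One example (a sketch; the server name is made up): SET NOEXEC ON tells SQL Server to compile but not execute everything that follows, and it stays in effect across GO batch separators, so a wrong-server check at the top can neutralize an entire script:

IF @@ServerName <> 'DevServerName'
BEGIN
   PRINT 'Wrong server - refusing to run the rest of this script.'
   SET NOEXEC ON -- everything after this compiles but does not execute
END
GO

-- ...the real work goes here...

GO
SET NOEXEC OFF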

Friday, April 18, 2008

SQL Waste Land

Welcome to Waste Land, the place where your server is full of char and nchar columns instead of varchar and nvarchar columns. In case you're not familiar with those, the difference between the two sets of data types is that the non-varying ones (the ones without "var" in the name) always consume their full declared length, padding with trailing spaces. So if you define a column as char(2) and then insert the value 'a', the database actually stores 'a ' with an extra space.
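A quick way to see the padding for yourself (just a throwaway demonstration with variables, no real tables involved):

DECLARE @fixed char(10), @varying varchar(10)
SELECT @fixed = 'a', @varying = 'a'
SELECT
   FixedBytes = DataLength(@fixed),    -- 10: padded out to the declared length with spaces
   VaryingBytes = DataLength(@varying) -- 1: only the data actually stored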

In SQL Server 2008, when you turn on row or page compression, char columns are stored as if they were varchar, and the values get compressed as well. But until then, char columns waste space.

Inspired by the wastefulness of a large database I work with, I wrote some code to find out: just how much of the database is wasted space?

First, some prerequisite functions:

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[TokenItem]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[TokenItem]
GO
/* Function TokenItem
Author: Erik E
Returns the [TokenNum]th item from a string
For example
PRINT dbo.TokenItem('The quick brown fox jumped over the lazy dog', 4, ' ')
will return the fourth token of the string, with tokens identified as delimited by spaces: fox
*/
CREATE FUNCTION TokenItem (
   @String varchar(8000),
   @TokenNum int,
   @Delimiter varchar(8000) = ','
)
RETURNS varchar(8000)
AS
BEGIN
   DECLARE @DelimLen int
   SET @DelimLen = Datalength(@Delimiter)
   IF @TokenNum < 1 OR @TokenNum > 1 + (Datalength(@String) - Datalength(Replace(@String, @Delimiter, ''))) / @DelimLen
      RETURN ''

   DECLARE @Pos int
   SET @Pos = 1

   WHILE @TokenNum > 1 BEGIN
      SET @Pos = CharIndex(@Delimiter, @String, @Pos) + @DelimLen
      SET @TokenNum = @TokenNum - 1
   END

   RETURN SubString(@String, @Pos, CharIndex(@Delimiter, @String + @Delimiter, @Pos) - @Pos)
END
GO

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[TokenItemRev]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[TokenItemRev]
GO
CREATE FUNCTION TokenItemRev (
   @String varchar(8000),
   @TokenNum int,
   @Delimiter varchar(1) = ','
)
RETURNS varchar(8000)
/*
Returns the @TokenNum-th token as separated by @Delimiter, counting from the last token
Returns an empty string if no token by that number exists (< 1 or > tokencount)
Example:
PRINT dbo.TokenItemRev('a,b,c,d,e', 2, ',')
returns 'd'
*/
AS
BEGIN
   RETURN Reverse(dbo.TokenItem(Reverse(@String), @TokenNum, @Delimiter))
END

Now the real meat of the tool.

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[SpaceCharInfoCollect]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[SpaceCharInfoCollect]
GO
/*************************************************************************************
Procedure:   SpaceCharInfoCollect
Version:      1.2
Date:         8/11/2006
Author:      Erik E

Description:
   Find out how much wasted space there is in a database,
   because of the use of char or nchar columns instead of varchar or nvarchar.
   Doesn't support object names containing periods.

Parameters:
   TableName      Table to analyze for space information. Since it defaults to the
                  database the stored procedure is in, include the database name
                  if the stored procedure is being run from a different database,
                  as in database.owner.table. Can also specify owner.
   DestTableName  Table to receive the result information, can be in another database
                  on the same server. Use database.owner.table if desiring a table
                  in a different database than the stored procedure is located in or
                  if you want a different owner.
   DateStamp      Time to store as the data collection time. Defaults to GetDate().

Planned Updates:
   A version that runs against all databases on a server, without using undocumented
   stored procedures (See SP SpaceInfoCollectAllDB).

Update History:
   1.0 <8/11/2006>:   Original version
   1.1 <10/13/2006>:  Updated to accept destination table, and database name for
                      table being reported on. Added database name to results table.
                      Added object owner logic and column to results table.
   1.2 <4/18/2007>:   Used numbers instead of names for column selection to reduce
                      script size, to handle more char columns in a table without errors

Examples:
USE ThisDatabase
EXECUTE SpaceInfoCollectAllDB

USE DesiredDatabase
EXECUTE sp_msforeachtable 'EXECUTE DatabaseOfThisSP.dbo.SpaceCharInfoCollect ''DesiredDatabase.?'', ''DesiredDatabase.dbo.SpaceCharInfo'''
*************************************************************************************/

CREATE PROCEDURE SpaceCharInfoCollect
   @TableName varchar(8000), -- table to analyze space information on, include databasename, use brackets if name has weird characters
   @DestTableName varchar(8000), -- table to receive the result information, can be in another database on the same server
   @DateStamp datetime = NULL, -- defaults to GetDate(),
   @Debug bit = 0
AS
DECLARE
   @NSQL nvarchar(4000),
   @PreSQL varchar(8000),
   @SQL1 varchar(8000),
   @MidSQL1 varchar(8000),
   @SQL2 varchar(8000),
   @MidSQL2 varchar(8000),
   @SQL3 varchar(8000),
   @PostSQL varchar(8000),

   @DB varchar(8000),
   @Owner varchar(8000),
   @Table varchar(8000),

   @DestDB varchar(8000),
   @DestOwner varchar(8000),
   @DestTable varchar(8000)

SET NOCOUNT ON

PRINT @TableName

IF @DateStamp IS NULL SET @DateStamp = GetDate()

SET @TableName = Replace(@TableName, '''', '''''')
SET @Table = WhatDatabase.dbo.TokenItemRev(@TableName, 1, '.')
SET @Owner = WhatDatabase.dbo.TokenItemRev(@TableName, 2, '.')
IF @Owner = '' SET @Owner = user
IF @Owner LIKE '[[]%]' SET @Owner = SubString(@Owner, 2, Len(@Owner) - 2)
SET @DB = WhatDatabase.dbo.TokenItemRev(@TableName, 3, '.')
IF @DB = '' SET @DB = Replace(DB_Name(), '''', '''''')
IF @DB LIKE '[[]%]' SET @DB = SubString(@DB, 2, Len(@DB) - 2)

IF @Table LIKE '[[]%]' SET @Table = SubString(@Table, 2, Len(@Table) - 2)

SET @DestTableName = Replace(@DestTableName, '''', '''''')
SET @DestTable = WhatDatabase.dbo.TokenItemRev(@DestTableName, 1, '.')
SET @DestOwner = WhatDatabase.dbo.TokenItemRev(@DestTableName, 2, '.')
IF @DestOwner = '' SET @DestOwner = user
SET @DestDB = WhatDatabase.dbo.TokenItemRev(@DestTableName, 3, '.')

SET @SQL1 = 'IF NOT EXISTS (SELECT * FROM '
   + @DestDB
   + '.Information_Schema.Tables WHERE Table_Name = ''' + @DestTable + ''' AND Table_Schema = ''' + @DestOwner + ''')
   SELECT
      SpaceDate = convert(datetime, NULL),
      DatabaseName = convert(varchar(128), NULL),
      OwnerName = convert(varchar(128), NULL),
      TableName = convert(varchar(128), NULL),
      ColumnName = convert(varchar(128), NULL),
      ExtraSpaces = convert(bigint, 0),
      RowCnt = convert(bigint, 0)
   INTO ' + @DestTableName + '
   WHERE 1 = 0'

EXECUTE (@SQL1)

SET @PreSQL = 'INSERT ' + @DestTableName + ' (SpaceDate, DatabaseName, OwnerName, TableName, ColumnName, ExtraSpaces, RowCnt)
SELECT
   SpaceDate = ''' + Convert(varchar(50), @DateStamp, 121) + ''',
   DatabaseName = ''' + @DB + ''',
   OwnerName = ''' + @Owner + ''',
   TableName = ''' + Replace(@Table, '''', '''''') + ''',
   ColumnName,
   ExtraSpaces = CASE ColumnNum'
SET @MidSQL1 = '
      END,
   RowCnt
FROM
   (
      SELECT'
SET @MidSQL2 = ',
         RowCnt = Count(*)
      FROM ' + Replace(@TableName, '''', '''''') + ' WITH (NOLOCK)
   ) X CROSS JOIN ('
SET @PostSQL = '
   ) Y (ColumnNum, ColumnName)'

SET @SQL1 = ''

SET @NSQL =
N'SELECT
   @SQL1 = @SQL1 + ''
      WHEN '' + Convert(varchar(11), Ordinal_Position) + '' THEN ['' + Replace(Column_Name, '''''''', '''''''''''') + '']''
      + CASE WHEN Data_type = ''nchar'' THEN ''* 2'' ELSE '''' END,
   @SQL2 = IsNull(@SQL2 + '','', '''') + ''
         ['' + Replace(Column_Name, '''''''', '''''''''''') + ''] = Sum(IsNull('' + LTrim(Str(Character_Maximum_Length)) + '' - Len(['' + Replace(Column_Name, '''''''', '''''''''''') + '']), 0))'',
   @SQL3 = IsNull(@SQL3, '''') + ''
      '' + CASE WHEN @SQL3 IS NULL THEN '''' ELSE ''UNION ALL '' END + ''SELECT ''
      + Convert(varchar(11), Ordinal_Position) + '', ''
      + '''''''' + Replace(Column_Name, '''''''', '''''''''''') + ''''''''
FROM
   ' + QuoteName(@DB) + '.Information_Schema.[Columns] C
   INNER JOIN ' + QuoteName(@DB) + '.Information_Schema.Tables T ON C.Table_Name = T.Table_Name and Table_Type = ''base table''
WHERE
   Data_Type in (''char'', ''nchar'')
   AND T.Table_Name = ''' + @Table + '''
   AND T.Table_Schema = ''' + @Owner + ''''

IF @Debug = 1 PRINT @NSQL

EXECUTE sp_executesql @NSQL, N'@SQL1 varchar(8000) OUTPUT, @SQL2 varchar(8000) OUTPUT, @SQL3 varchar(8000) OUTPUT', @SQL1 OUTPUT, @SQL2 OUTPUT, @SQL3 OUTPUT
IF @SQL3 IS NULL RETURN 0 --no columns found

IF @Debug = 1
   PRINT (@PreSQL + @SQL1 + @MidSQL1 + @SQL2 + @MidSQL2 + @SQL3 + @PostSQL)
   ELSE EXECUTE (@PreSQL + @SQL1 + @MidSQL1 + @SQL2 + @MidSQL2 + @SQL3 + @PostSQL)

RETURN 0

/*
--Compare before and after values from most recent two run dates:
SELECT
   DatabaseName = IsNull(x.databasename, y.databasename),
   OwnerName = IsNull(x.ownername, y.ownername),
   TableName = IsNull(x.tablename, y.tablename),
   ColumnName = IsNull(x.columnname, y.columnname),
   ExtraSpaces1 = Sum(x.extraspaces),
   ExtraSpaces2 = Sum(y.extraspaces),
   RowCnt1 = Min(x.rowcnt), -- / count(distinct IsNull(x.databasename, y.databasename) + IsNull(x.ownername, y.ownername) + IsNull(x.tablename, y.ownername)),
   RowCnt2 = Min(y.rowcnt), --/ count(distinct IsNull(x.databasename, y.databasename) + IsNull(x.ownername, y.ownername) + IsNull(x.tablename, y.ownername))
   CollectionTime1 = Min(x.Spacedate),
   CollectionTime2 = Min(y.Spacedate)
FROM
   (
      SELECT * FROM dbo.spacecharinfo d1 WHERE spacedate = (select max(spacedate) from dbo.spacecharinfo where spacedate < (select max(spacedate) from dbo.spacecharinfo))
   ) x
   FULL JOIN (
      SELECT * FROM dbo.spacecharinfo d1 WHERE spacedate = (select max(spacedate) from dbo.spacecharinfo)
   ) y ON x.databasename = y.databasename AND x.ownername = y.ownername AND x.tablename = y.tablename AND x.columnname = y.columnname
GROUP BY
   IsNull(x.databasename, y.databasename),
   IsNull(x.ownername, y.ownername),
   IsNull(x.tablename, y.tablename),
   IsNull(x.columnname, y.columnname)
WITH ROLLUP
ORDER BY
   IsNull(x.databasename, y.databasename),
   IsNull(x.ownername, y.ownername),
   IsNull(x.tablename, y.tablename),
   IsNull(x.columnname, y.columnname)

select
   spacedate,
   databasename,
   ExtraSpaces = sum(extraspaces),
   [Rows] = sum(rowcnt),
   [Average spaces per row] = convert(float, sum(extraspaces)) / convert(float, sum(rowcnt)),
   [Average spaces per column] = convert(float, avg(extraspaces))
from spacecharinfo
group by spacedate, databasename
with rollup
having grouping(spacedate) = 0
*/

GO
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[SpaceInfoCollectAllDB]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[SpaceInfoCollectAllDB]
GO
-- runs SpaceCharInfoCollect against all databases and tables, see that SP for more information
CREATE PROCEDURE SpaceInfoCollectAllDB(@DestTableName nvarchar(4000) = NULL)
AS
SET NOCOUNT ON
IF @DestTableName IS NULL SET @DestTableName = QuoteName(DB_Name()) + '.' + QuoteName(User) + '.SpaceCharInfo'
DECLARE
   @SQL nvarchar(4000),
   @DBName nvarchar(128),
   @Dt datetime
SET @Dt = GetDate()
SET @SQL =
'SELECT ''USE '' + [name] + '' EXECUTE sp_msforeachtable ''''EXECUTE ''
   + QuoteName(DB_Name()) + ''.dbo.SpaceCharInfoCollect '''''''''' + QuoteName(Replace([name], '''''''', '''''''''''')) + ''.''
   + Char(63) + '''''''''', ''''''''YourDesiredDefaultDatabase.dbo.SpaceCharInfo'''''''', ''''''''' + Convert(varchar(50), @Dt, 121) + '''''''''''''''
FROM master.dbo.sysdatabases
WHERE [name] NOT IN (''master'', ''model'', ''tempdb'', ''pubs'', ''msdb'')'

SET @DBName = DB_Name()

EXECUTE sp_executesql N'EXECUTE master.dbo.xp_execresultset @SQL, @DBName', N'@SQL nvarchar(4000), @DBName nvarchar(128)', @SQL, @DBName
SET @SQL = 'SELECT * FROM ' + @DestTableName + ' WHERE SpaceDate = ''' + Convert(varchar(50), @Dt, 121) + ''''
SET NOCOUNT OFF
EXECUTE (@SQL)

Whew! Now how do you use all that? Easy.

EXECUTE SpaceInfoCollectAllDB

When you're done, take a look at the commented code at the end of SpaceCharInfoCollect for some analysis queries to run against the output table. The database I ran this on had a whopping 10 GB of extra spaces, which is something like 14% of the entire database size.

Yes, the dynamic SQL is really difficult to work with. Making major changes directly in the script is nearly impossible: it's much easier to grab the section you want to change, paste it into a new window, and replace doubled single quotes with single ones. That doesn't work for values embedded from the outer script, of course, but it helps immensely. Then, when you have what you need, you double the quotes back up and paste in just the parts you changed. At times you have to go two or more levels of dynamic SQL deep and undouble/redouble your quotes at each step. Use the power of search and replace to help you.
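As a tiny illustration of why the quoting gets out of hand (a throwaway example, not part of the tool): every level of EXECUTE doubles each quote in the level below it.

DECLARE @inner varchar(100), @outer varchar(200)

SET @inner = 'SELECT Greeting = ''hello'''  -- one level deep: quotes doubled once
EXECUTE (@inner)

SET @outer = 'EXECUTE (''SELECT Greeting = ''''hello'''''')'  -- two levels deep: doubled again
EXECUTE (@outer)

Both return the same one-row result; the only difference is how many quotes you have to keep track of.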

In some cases for small changes it does work to use the debug mode to output the final script and then examine or try to run THAT, and iteratively modify the dynamic SQL and see what you get.

If you find any bugs, please let me know. I haven't tested this on nchar columns, but it should multiply the space used by 2 since we care about bytes rather than characters.

P.S. There are 15 single quotes in a row in this script. Wheeeeeeeee.

Friday, April 4, 2008

Database Time Storage and Display Problems

After seeing a post about UTC time in a forum, I did some more searching and reading on the web, and found an article, UTC datetime values in SQL Server 2000.

Earlier today I had the same basic idea as the one presented in that blog, though a little more simply:

SELECT UTCDateColumn + Getdate() - Getutcdate()

In the comments on the blog entry, someone tried to build a function that would automatically adjust for daylight saving time as well. This immediately struck me as problematic, and further comments bore that out, for example that the rules are different in Arizona. And not to be overly self-centered: what about the majority of the people in the world who do not observe DST at all? Follow the link in this paragraph to see a map of worldwide DST usage.

There were additional problems. One commenter said, "Say it's currently summer and I'm in daylight savings time, my difference will include DST. Now I try to convert a date from 6 months ago, I'll be an hour out."

So here are my rambling thoughts on the subject for your amusement (at me or with me, your choice).

  1. So you want to know the time. The time where and in what time reference? Is it at the client? Is it at the server? Is it at the place the time event corresponds to like the cash machine where a withdrawal was made?

  2. If you have web clients around the world, using the database to calculate "The Time" is pointless.

  3. Even if your client location(s) only span one time zone (and can you guarantee this forever) are you sure that client will always be in the same time zone as the server?

  4. The various methods to automatically calculate the UTC differential on the server are clever, but DST exposes them as inadequate, because all they answer is: what is the current UTC differential, including DST, at the server's location? One cannot reasonably expect that to apply to all clients and to all date ranges in the database.

In my mind, the correct answer is to let the user choose his UTC differential in the client, or let the client autodiscover the differential (through a server getutcdate(), or connecting to a time server, or reading the offset out of the registry, for example).

Handling older dates that were stored during a different phase of DST is an interesting problem. Are we answering the question "what was the time then in terms of our time now" or "what was the time then in terms of the time then?" And the time now/then where?

Before you answer that seemingly easy question too quickly, think about the one-hour period just after the DST adjustment makes the clock jump back and the same hour repeats. Imagine you're reviewing security logs for a crime. Your system reports:

  • 2:05 am - last person to exit the facility was researcher John Smith

  • 2:20 am - critical research samples verified to be secure

  • 2:35 am - later investigation pinpoints this as the time critical research samples were known to be missing

If the system is adjusting times for "what was the time then" then you can't know whether John Smith left the building before the samples were verified secure, or if he left after they were stolen, because there were in fact two segments of time that night covering 2 am - 2:59:59 am, and he could have cleverly left the building during the second hour in order to fool the computer systems.

Anyway, if you set a concern like this aside and still wish to adjust for the time things were when recorded, I can't see how this can be accomplished except on the client because only the client can know the actual location of the client and his desired DST rules. Unless you keep a comprehensive and carefully-updated catalog of the DST rules everywhere your clients can possibly be, this sounds like a challenge.

Maybe the answer is to store both dates in the database: the client time AND UTC time, or UTC time and client offset at that time. This data doesn't even have to be in the individual rows: it could be captured or updated in a UTC-offset history table for each client location every time a client connects. It has its own problems like: does the client know his location, UTC offset, and DST rules & current phase correctly? If so, can the client always be trusted to present this information accurately or could the user purposefully taint the data?
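For example, here is a minimal sketch (all names hypothetical) of storing the UTC instant together with the offset the client claimed at that moment:

CREATE TABLE EventLog (
   EventID int IDENTITY(1, 1) PRIMARY KEY,
   EventUTC datetime NOT NULL, -- the unambiguous instant
   ClientOffsetMinutes smallint NOT NULL, -- the offset the client reported, DST included
   ClientLocation varchar(100) NULL -- optionally, where the client claimed to be
)

-- The local time as the client saw it can then be reconstructed:
-- SELECT EventLocal = DateAdd(minute, ClientOffsetMinutes, EventUTC) FROM EventLog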

There is even more complication:

What if you are in some place that doesn't observe DST and you want to compare transactions from months ago that were recorded for two different places: your current location and a location that is in a different time zone and does observe DST.

What consistent method of adjustment do you plan to use to display all the times in the same list? Clearly they have to be adjusted the same for any sort of sensical ordering or comparison, and it makes sense to adjust them for the observer so he can view them in his own time. Yet now, adjusting the historical date of the time recorded for the observing-DST location has to be done in terms of the viewing location's DST rules, which can also lead to incorrect assumptions.

So yet again it seems that time data either needs to be stored in UTC and viewed in UTC, adjusted only for the CURRENT offset, with the user knowing this and making allowances for it; or it needs to be stored as UTC plus the client's time and the client's location, in some format or other, so that a true reconstruction of "the time" can be made from every perspective needed: any location, observing any DST rules, during any DST phase. At historical review time, there is a huge list of possible dates to display for a single recorded UTC time:

  • The UTC time of the event (the one place where all the madness is overcome!)

  • The time of the event at the event location, using the DST offset of the event location then

  • The time of the event at the event location, using the DST offset of the event location now

  • The time of the event at the user's location, using the DST offset of the user location then

  • The time of the event at the user's location, using the DST offset of the user location now

  • The time of the event at the event location, using the DST offset of the user location then

  • The time of the event at the event location, using the DST offset of the user location now

Yikes!

In closing, the getdate() - getutcdate() idea might be useful in conjunction with the client's stated known offset to determine if the client is observing DST *now*, but by itself can never solve the problem of fully-sensible historical time display in any time zone observing any DST rules (that themselves change over time).

Wednesday, January 9, 2008

What's the Percent Character (%) in SQL?


% - the percent character in SQL is the modulo (aka modulus) operator, which gives the remainder of a division. For example, 11 divided by 4 is 2 with a remainder of 3. It's the 3 that we're interested in.

I think about the operation in a few different ways. Depending on what form your numbers are in and what functions are available to you, or depending on which strikes you as the most natural, one of these might be more helpful to you than the others.

Form 1 - modulo is to subtract from n the highest multiple of x that is less than or equal to n:

n % x = n - floor(n / x) * x

Form 2 - modulo is to divide n by x, remove the whole number portion, then multiply by x again:

n % x = (n / x - floor(n / x)) * x

Form 3 - modulo is, using a clock with x numbers (from 0 to x - 1), to start on 0 and take n steps, then look at what number we end up on.

Operation: 11 % 4
Okay. Here's a 4-numbered clock to count on:

  0
3   1
  2

Starting at 0 we take 11 clockwise steps and see that we end up on 3.

So 11 mod 4 = 3.
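Here's a quick check that the forms agree, using SQL Server's own % operator and Floor function (just a throwaway SELECT):

SELECT
   SqlModulo = 11 % 4, -- 3
   Form1 = 11 - Floor(11.0 / 4) * 4, -- 3
   Form2 = (11.0 / 4 - Floor(11.0 / 4)) * 4 -- 3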

Here are some syntax notes for other programming environments:

n mod x <- how it’s written in VB
mod(n, x) <- how it’s written in Excel

It’s a nice way to get the remainder without having to use n twice in an expression.

Examples:

10 % 10 = 0
10 % 9 = 1
10 % 8 = 2
10 % 7 = 3
10 % 6 = 4
10 % 5 = 0

It helps to keep in mind that the result will always be in the range 0 to (x - 1). Anything mod 10 will always end up in the range 0 to 9. Modulo has the same precedence as multiply and divide (at least in SQL): below bitwise NOT, but above addition, subtraction, and comparison.

If you want to add 1 to an hour value of 0 to 23 and not just end up with 1 – 24, modulo is a great way to do that.

For an hour value that goes from 0 to 23, to add one hour:

(Hr + 1) % 24 -> 0 to 23 -> 1 to 24 -> 1 to 23, 0 (0 to 23 again, rotated up one)

For an hour value that goes from 1 to 24, to add one hour:

Hr % 24 + 1 -> 1 to 24 -> 1 to 23, 0 -> 2 to 24, 1 (1 to 24 again, rotated up one)

Since the modulo of negative numbers can give unexpected results depending on the programming language (-1 % x could be -1 or it could be x - 1; e.g., -1 mod 5 could be -1 or 4), you just add a multiple of x before taking the modulo. That doesn't change the result, but it gets you out of negative territory. Usually adding x itself is enough.
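In T-SQL the result takes the sign of the dividend, so adding the modulus first is the fix (a quick check):

SELECT -1 % 5 -- -1 in SQL Server
SELECT (-1 + 5) % 5 -- 4, the always-nonnegative answer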

Here’s an example of finding last month’s month number from this month’s month number. Note that when you have ranges that start with 1 instead of 0, sometimes you have to subtract 1 before the modulo and then add 1 after the modulo.

Note: we subtract one from the month to go back a month, then add 12 so we don't get negative numbers. We also subtract 1 before the modulo and add 1 after it (to convert the 1-based range to a 0-based range and back).

(Month – 1 + 12 - 1) % 12 + 1
which simplified is:
(Month + 10) % 12 + 1 -> 1 to 12 -> 11 to 22 -> 11, 0 to 10 -> 12, 1 to 11
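A couple of throwaway SELECTs confirm the rotation formulas behave as traced above:

SELECT NextHourFrom23 = (23 + 1) % 24 -- 0: the 0-to-23 clock wraps past midnight
SELECT NextHourFrom24 = 24 % 24 + 1 -- 1: the 1-to-24 clock wraps the same way
SELECT LastMonthFromJan = (1 + 10) % 12 + 1 -- 12: January's previous month is December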

Monday, January 7, 2008

SQL Trouble

The troubled code:

CASE
   WHEN theTime <=0030 THEN '0000'
   WHEN theTime >0031 AND theTime <=0130 THEN '0100'
   WHEN theTime >0131 AND theTime <=0230 THEN '0200'
   WHEN theTime >0231 AND theTime <=0330 THEN '0300'
   WHEN theTime >0331 AND theTime <=0430 THEN '0400'
   WHEN theTime >0431 AND theTime <=0530 THEN '0500'
   WHEN theTime >0531 AND theTime <=0630 THEN '0600'
   WHEN theTime >0631 AND theTime <=0730 THEN '0700'
   WHEN theTime >0731 AND theTime <=0830 THEN '0800'
   WHEN theTime >0831 AND theTime <=0930 THEN '0900'
   WHEN theTime >0931 AND theTime <=1030 THEN '1000'
   WHEN theTime >1031 AND theTime <=1130 THEN '1100'
   WHEN theTime >1131 AND theTime <=1230 THEN '1200'
   WHEN theTime >1231 AND theTime <=1330 THEN '1300'
   WHEN theTime >1331 AND theTime <=1430 THEN '1400'
   WHEN theTime >1431 AND theTime <=1530 THEN '1500'
   WHEN theTime >1531 AND theTime <=1630 THEN '1600'
   WHEN theTime >1631 AND theTime <=1730 THEN '1700'
   WHEN theTime >1731 AND theTime <=1830 THEN '1800'
   WHEN theTime >1831 AND theTime <=1930 THEN '1900'
   WHEN theTime >1931 AND theTime <=2030 THEN '2000'
   WHEN theTime >2031 AND theTime <=2130 THEN '2100'
   WHEN theTime >2131 AND theTime <=2230 THEN '2200'
   WHEN theTime >2231 AND theTime <=2330 THEN '2300'
   WHEN theTime >2331 THEN '0000'
END as GroupedTime

First of all, the logic excludes every number ending in 31. The result for those will be NULL. Oops.

So here's improvement #1:

GroupedTime =
   CASE
      WHEN theTime >= 2231 THEN '2300'
      WHEN theTime >= 2131 THEN '2200'
      WHEN theTime >= 2031 THEN '2100'
      WHEN theTime >= 1931 THEN '2000'
      WHEN theTime >= 1831 THEN '1900'
      WHEN theTime >= 1731 THEN '1800'
      WHEN theTime >= 1631 THEN '1700'
      WHEN theTime >= 1531 THEN '1600'
      WHEN theTime >= 1431 THEN '1500'
      WHEN theTime >= 1331 THEN '1400'
      WHEN theTime >= 1231 THEN '1300'
      WHEN theTime >= 1131 THEN '1200'
      WHEN theTime >= 1031 THEN '1100'
      WHEN theTime >= 0931 THEN '1000'
      WHEN theTime >= 0831 THEN '0900'
      WHEN theTime >= 0731 THEN '0800'
      WHEN theTime >= 0631 THEN '0700'
      WHEN theTime >= 0531 THEN '0600'
      WHEN theTime >= 0431 THEN '0500'
      WHEN theTime >= 0331 THEN '0400'
      WHEN theTime >= 0231 THEN '0300'
      WHEN theTime >= 0131 THEN '0200'
      WHEN theTime >= 0031 THEN '0100'
      ELSE '0000'
   END

But here's improvement #2. Now, oddly enough, theTime is a varchar column (it was extracted from a datetime column as varchar for some reason, and I'm working to fix that).

Groupedtime = Right('0' + Convert(varchar(2), (Left(theTime, 2) + CASE WHEN Right(theTime, 2) >= 31 THEN 1 ELSE 0 END) % 24), 2) + '00'

More improvements can be made if the theTime column comes in as a datetime or a proper numeric such as minutes (instead of the display varchar number, where '0100' means 1 o'clock or 60 minutes, not 100 minutes). I will be making that change, since the source of the data can also be changed.
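Once theTime arrives as a datetime (call the column theTimeDT here, a made-up name), the same grouping can be sketched with date arithmetic: adding 29 minutes pushes anything at :31 or later into the next hour, and then we just keep the hour.

GroupedTime = Right('0' + Convert(varchar(2), DatePart(hour, DateAdd(minute, 29, theTimeDT))), 2) + '00'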

SQL Puzzlement

The tortured SQL I find in production:

CASE WHEN ExtendedDescription is null
   THEN replace(replace(replace(MainDescription, '''', '`'), char(10), ';'), char(13), ' ')
   ELSE (replace(replace(replace(MainDescription, '''', '`'), char(10), ';'), char(13), ' ')
      + '(' + replace(replace(replace(ExtendedDescription, '''', '`') + ')', char(10), ';'), char(13), ' '))
      END as FullDescription,

A simpler way to express this:

FullDescription =
   Replace(Replace(Replace(
      MainDescription + Coalesce('(' + ExtendedDescription + ')', '')
      , '''', '`'), char(10), ';'), char(13), ' '
   ),
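This works because NULL propagates through string concatenation (with the default CONCAT_NULL_YIELDS_NULL setting), so when ExtendedDescription is NULL the whole parenthesized piece collapses to NULL and the Coalesce falls back to an empty string:

SELECT Coalesce('(' + NULL + ')', '') -- '' : the concatenation is NULL, so the fallback wins
SELECT Coalesce('(' + 'extra' + ')', '') -- '(extra)'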

It's not just about the simplest way to express something, but also about how easy it is to read and see what it's doing.

Oddly enough I do put commas at the beginning of lines sometimes when expressions are very long. I'll never put commas at the beginning of a new column expression, though.

Thursday, December 27, 2007

SQL Overelaboration

A random forum posting: a function that detects whether a string is made up only of digits:

CREATE function fn_numeric
-- input is varchar string.
   (@string varchar(12) )
returns bit -- 0 = false, 1 = true
as
begin
declare @x smallint, @y smallint, @z bit
set @string = replace(@string, ' ', '@')
set @x = len(@string)
set @y = 0
set @z = 1
while @y < @x
    begin
    set @y = @y + 1
    if substring(@string, @y, 1) between '0' and '9'
        continue
    else
        begin
        set @z = 0
        break
        end
    end
return (@z)
end

My response:

WHERE ColumnToTest NOT LIKE '%[^0-9]%'
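If you need the result as a value rather than a filter (to mimic the function's bit return), the same pattern drops into a CASE expression. Note that, like the original function, it counts an empty string as all digits:

IsAllDigits = CASE WHEN ColumnToTest NOT LIKE '%[^0-9]%' THEN 1 ELSE 0 END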