Wednesday, January 9, 2008

What's the Percent Character (%) in SQL?


% - the percent character in SQL is the modulo (aka modulus) operator, which means give the remainder of a division. For example, 11 divided by 4 is 2 with a remainder of 3. It's the 3 that we're interested in.

I think about the operation in a few different ways. Depending on what form your numbers are in and what functions are available to you, or depending on which strikes you as the most natural, one of these might be more helpful to you than the others.

Form 1 - modulo is to subtract from n the highest multiple of x that is less than n:

n % x = n – floor(n / x)

Form 2 - modulo is to divide n by x, remove the whole number portion, then multiply by x again:

n % x = (n / x - floor(n / x)) * x

Form 3 - modulo is, using a clock with x numbers (from 0 to x - 1), to start on 0 and take n steps, then look at what number we end up on.

Operation: 11 % 4
Okay. Here's a 4-numbered clock to count on:

  0
3   1
  2

Starting at 0 we take 11 clockwise steps and see that we end up on 3.

So 11 mod 4 = 3.

Here are some syntax notes for other programming environments:

n mod x <- how it’s written in VB
mod(n, x) <- how it’s written in Excel

It’s a nice way to get the remainder without having to use n twice in an expression.

Examples:

10 % 10 = 0
10 % 9 = 1
10 % 8 = 2
10 % 7 = 3
10 % 6 = 4
10 % 5 = 0

It helps to keep in mind that the result will always be in the range 0 to (x – 1). Anything mod 10 will always end up in the range 0 to 9. Modulo has the same precedence as multiply and divide (at least in SQL): lower than negation and bitwise not but higher than everything else.

If you want to add 1 to an hour value of 0 to 23 and not just end up with 1 – 24, modulo is a great way to do that.

For an hour value that goes from 0 to 23, to add one hour:

(Hr + 1) % 24 -> 0 to 23 -> 1 to 24 -> 1 to 23, 0 (0 to 23 again, rotated up one)

For an hour value that goes from 1 to 24, to add one hour:

Hr % 24 + 1 -> 1 to 24 -> 1 to 23, 0 -> 2 to 24, 1 (1 to 24 again, rotated up one)

Since the modulo of negative numbers can give unexpected results depending on the programming language (-1 % x could be 1 or it could be x – 1, e.g., -1 mod 5 could be 1 or 4) you just add a multiple of x (since that won't change the remainder but will get us out of negative territory), then subtract what you need. Usually you only need x.

Here’s an example of finding last month’s month number from this month’s month number. Note that when you have ranges that start with 1 instead of 0, sometimes you have to subtract 1 before the modulo and then add 1 after the modulo.

Note: we’re subtracting one from the month, then adding 12 so we don’t get negative numbers. We also subtract 1 before the modulo and add 1 after the modulo (to convert the 1-based range to a 0-based range and back).

(Month – 1 + 12 - 1) % 12 + 1
which simplified is:
(Month + 10) % 12 + 1 -> 1 to 12 -> 11 to 22 -> 11, 0 to 10 -> 12, 1 to 11

Monday, January 7, 2008

SQL Trouble

The troubled code:

CASE
   WHEN theTime <=0030 THEN '0000'
   WHEN theTime >0031 AND theTime <=0130 THEN '0100'
   WHEN theTime >0131 AND theTime <=0230 THEN '0200'
   WHEN theTime >0231 AND theTime <=0330 THEN '0300'
   WHEN theTime >0331 AND theTime <=0430 THEN '0400'
   WHEN theTime >0431 AND theTime <=0530 THEN '0500'
   WHEN theTime >0531 AND theTime <=0630 THEN '0600'
   WHEN theTime >0631 AND theTime <=0730 THEN '0700'
   WHEN theTime >0731 AND theTime <=0830 THEN '0800'
   WHEN theTime >0831 AND theTime <=0930 THEN '0900'
   WHEN theTime >0931 AND theTime <=1030 THEN '1000'
   WHEN theTime >1031 AND theTime <=1130 THEN '1100'
   WHEN theTime >1131 AND theTime <=1230 THEN '1200'
   WHEN theTime >1231 AND theTime <=1330 THEN '1300'
   WHEN theTime >1331 AND theTime <=1430 THEN '1400'
   WHEN theTime >1431 AND theTime <=1530 THEN '1500'
   WHEN theTime >1531 AND theTime <=1630 THEN '1600'
   WHEN theTime >1631 AND theTime <=1730 THEN '1700'
   WHEN theTime >1731 AND theTime <=1830 THEN '1800'
   WHEN theTime >1831 AND theTime <=1930 THEN '1900'
   WHEN theTime >1931 AND theTime <=2030 THEN '2000'
   WHEN theTime >2031 AND theTime <=2130 THEN '2100'
   WHEN theTime >2131 AND theTime <=2230 THEN '2200'
   WHEN theTime >2231 AND theTime <=2330 THEN '2300'
   WHEN theTime >2331 THEN '0000'
END as GroupedTime

First of all, the logic excludes every number ending in 31. The result for those will be NULL. Oops.

So here's improvement #1:

GroupedTime =
   CASE
      WHEN theTime >= 2231 THEN '2300'
      WHEN theTime >= 2131 THEN '2200'
      WHEN theTime >= 2031 THEN '2100'
      WHEN theTime >= 1931 THEN '2000'
      WHEN theTime >= 1831 THEN '1900'
      WHEN theTime >= 1731 THEN '1800'
      WHEN theTime >= 1631 THEN '1700'
      WHEN theTime >= 1531 THEN '1600'
      WHEN theTime >= 1431 THEN '1500'
      WHEN theTime >= 1331 THEN '1400'
      WHEN theTime >= 1231 THEN '1300'
      WHEN theTime >= 1131 THEN '1200'
      WHEN theTime >= 1031 THEN '1100'
      WHEN theTime >= 0931 THEN '1000'
      WHEN theTime >= 0831 THEN '0900'
      WHEN theTime >= 0731 THEN '0800'
      WHEN theTime >= 0631 THEN '0700'
      WHEN theTime >= 0531 THEN '0600'
      WHEN theTime >= 0431 THEN '0500'
      WHEN theTime >= 0331 THEN '0400'
      WHEN theTime >= 0231 THEN '0300'
      WHEN theTime >= 0131 THEN '0200'
      WHEN theTime >= 0031 THEN '0100'
      ELSE '0000'
   END

But here's improvement #2. Now, oddly enough, theTime is a varchar column (it was extracted from a datetime column as varchar for some reason, but I'm working to try to fix that).

Groupedtime = Right('0' + Convert(varchar(2), (Left(theTime, 2) + CASE WHEN Right(theTime, 2) >= 31 THEN 1 ELSE 0 END) % 24), 2) + '00'

More improvements can be made if the theTime column comes in as datetime or a proper numeric such as minutes (instead of the display varchar number where '0100' means 1 o'clock or 60 minutes, not 100 minutes. I will be making that change since the source of the data is also able to be changed.

SQL Puzzlement

The tortured SQL I find in production:

CASE WHEN ExtendedDescription is null
   THEN replace(replace(replace(MainDescription, '''', '`'), char(10), ';'), char(13), ' ')
   ELSE (replace(replace(replace(MainDescription, '''', '`'), char(10), ';'), char(13), ' ')
      + '(' + replace(replace(replace(ExtendedDescription, '''', '`') + ')', char(10), ';'), char(13), ' '))
      END as FullDescription,

A simpler way to express this:

FullDescription =
   Replace(Replace(Replace(
      MainDescription + Coalesce('(' + ExtendedDescription + ')', '')
      , '''', '`'), char(10), ';'), char(13), ' '
   ),

It's not just about the simplest way to express something, but also about how easy it is to read and see what it's doing.

Oddly enough I do put commas at the beginning of lines sometimes when expressions are very long. I'll never put commas at the beginning of a new column expression, though.