Monday, October 29, 2007

Sometimes Optional Keywords Shouldn't Be

I always use the CLUSTERED or NONCLUSTERED keywords in my constraint scripting. I used to not do it. But then I realized the following problem:

You have a table with some stuff in it.

CREATE TABLE Test(a int NOT NULL, b int NOT NULL)
INSERT Test VALUES (1, 2)
INSERT Test VALUES (2, 3)
INSERT Test VALUES (3, 4)

You go and add a primary key to it:

ALTER TABLE Test ADD CONSTRAINT PK_Test PRIMARY KEY (a) --<-- Clustered, right? Perhaps...

And now you have a clustered index, right? Yes. But let's say that this table had different needs than most and its primary key wasn't the clustered index and there was already a non-primary-key clustered index! So drop the constraint you just made, then re-add it after making our test table correspond with your "real" table we're imagining:

ALTER TABLE Test DROP CONSTRAINT PK_Test
CREATE CLUSTERED INDEX IX_Test_b ON Test (b)
ALTER TABLE Test ADD CONSTRAINT PK_Test PRIMARY KEY (a) --<-- Not clustered this time.

So now your primary key constraint is NOT clustered, even though the default for primary keys IS clustered. And it was the same DDL script each time, performing something very different. So I concluded that it's best to always include the keyword CLUSTERED or NONCLUSTERED so that you and other developers know that's what was intended. Plus now you won't go your merry way thinking that you did something you didn't. Let's try to re-add that PK but using the CLUSTERED keyword (which we are expecting to happen by default without using it):

ALTER TABLE Test DROP CONSTRAINT PK_Test
ALTER TABLE Test ADD CONSTRAINT PK_Test PRIMARY KEY CLUSTERED (a)

And now you get the information you needed to do your job properly:

Server: Msg 1902, Level 16, State 3, Line 1
Cannot create more than one clustered index on table 'Test'. Drop the existing clustered index 'IX_Test_b' before creating another.
Server: Msg 1750, Level 16, State 1, Line 1
Could not create constraint. See previous errors.

UNIQUE is another word that I've seen people forget to include in their NONCLUSTERED index creation (not primary keys of course) so don't forget to put that in when you are expecting uniqueness. On the one hand, you might think that checking for uniqueness on every update or insert would be a performance hit. But that is not a problem: the real performance hit comes from the extra bytes required on that non-unique index as a unique-ifier bloating the index's size and reducing its speed.

I should add that having extra unused indexes is a performance hit all by itself. Here's a good discussion about that. (This guy has a WAY better blog than I will ever have. If you've gotten anything from reading my blog, then you should forget about mine and go read his. When you're done reading all 50,000 of his posts, then come back here, where there's still a teensy tiny chance remaining you'll learn something from me that you didn't from him.)

Thursday, October 25, 2007

Table Variables Use Tempdb

create table #t(a int)
select * from tempdb.dbo.sysobjects where name not like 'sys%'
drop table #t
GO
declare @t table(a int)
select * from tempdb.dbo.sysobjects where name not like 'sys%'

Now remove the GO and run it again. Notice that the table variable exists in tempdb during the first select! This is generically in computer programming called hoisting (and see more interesting related stuff).

The query engine decides how much to put in tempdb and how much to keep in memory. Some reports I've read say that the data always lives in memory, but others say that tempdb can be used for some of the data. What I'm taking away is that, while there is always the possibility that your table variable will live only in memory, it is not guaranteed. But using a table at all will always incur a performance hit. And consuming too much RAM can be as much a detriment to a server as consuming too much tempdb space.

Create a correlated subquery and put it in the SELECT list of a simple query against a many-row table that has a clustered index, and put a literal derived table in the subquery that UNION ALL SELECTS the numbers 1 through 10. It doesn't matter what the subquery does as long as it logically runs once for each row in the main query's table (it has a reference to a column from the main table).

Now create a table variable and insert the same numbers 1 through 10 and modify the correlated subquery to use the table variable instead of the UNION list. Compare execution plans and traces. Note that the first query has 10 constant scans in it, repeated as many times as there are rows, but the number of reads are close to the number of pages in the physical table.

The second query has no constant scans in it, but the number of reads is now closer to the number of rows in the physical table.

You cannot think of table variables as no-cost, memory only, super-fast objects with no side effects. They are not. They have side effects that can very nearly be main effects.